To: Jonathan Fiscus <jonathan.fiscus@nist.gov>
From: Hubert Jin <hjin@bbn.com>
Subject: Re: TDT-2 Evaluation Website
Date: Mon, 13 Jul 1998 12:52:16 -0400 (EDT)
Hi Jon,
The evaluation code for TDT2 (the April-dry-dry-run) works neither for
the May release nor for the most recent June release, for the following
reasons:
(1) LDC changed the label 'FULL' for a topic story to 'YES' in the
May and June releases. The scoring procedure then fails with
"token "YES" occurs more than once in attribute definition
list". My workaround was to change the 'YES's in
topic_relevance.table back to 'FULL' [the label used in the April
release, where the scoring software works].
(2) The NIST/LDC scoring procedure cannot handle multiple FULL
topic levels. In the April release, each story is FULL on
at most one topic. However, in the May and June releases,
some stories are 'YES' for multiple topics. So, unless
the scoring code is updated, it is unlikely we can use it.
Note: LDC changed the labels of some stories in May, and
introduced the multiple FULL [i.e. YES] topic levels
for a single story.
April release:
<ONTOPIC topicid=2 level=FULL docno=CNN19980123.1130.0252
<ONTOPIC topicid=11 level=BRIEF docno=CNN19980123.1130.0252
May release:
<ONTOPIC topicid=2 level=YES docno=CNN19980123.1130.0252
<ONTOPIC topicid=11 level=YES docno=CNN19980123.1130.0252
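A minimal Python sketch of the workaround in (1) and of a check for the
situation in (2). It assumes the records in topic_relevance.table look
like the lines quoted above; the function names are illustrative only
and are not part of the NIST/LDC scoring software.

```python
import re
from collections import defaultdict

# Matches ONTOPIC records as quoted above. No closing ">" is required,
# because the quoted lines do not show one (assumption about the format).
ONTOPIC_RE = re.compile(
    r"<ONTOPIC\s+topicid=(\d+)\s+level=(\w+)\s+docno=([^\s>]+)"
)


def relabel_yes_as_full(line):
    """Rewrite level=YES as level=FULL, leaving all other text untouched."""
    return re.sub(r"\blevel=YES\b", "level=FULL", line)


def stories_with_multiple_full(lines):
    """Map docno -> sorted topic ids for stories that are FULL (or YES)
    on more than one topic."""
    topics = defaultdict(set)
    for line in lines:
        m = ONTOPIC_RE.search(line)
        if m and m.group(2) in ("FULL", "YES"):
            topics[m.group(3)].add(int(m.group(1)))
    return {doc: sorted(ids) for doc, ids in topics.items() if len(ids) > 1}


# The two May-release records quoted above.
may_release = [
    "<ONTOPIC topicid=2 level=YES docno=CNN19980123.1130.0252",
    "<ONTOPIC topicid=11 level=YES docno=CNN19980123.1130.0252",
]

for record in may_release:
    print(relabel_yes_as_full(record))
print(stories_with_multiple_full(may_release))
```

Relabeling alone only addresses (1); any story reported by the second
function would still be FULL on several topics after the relabeling,
which is the case described in (2).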
So far, we have been mapping TDT2's DOCNO to TDTID (similar to what
was done in the Pilot study) and using the TDT1 scoring software to
plot the DET curve.
Would we be able to get the updated TDT2 evaluation scoring code before
the workshop?
Also, the index files for TDT2 (May & June releases) are still not
available. Would we get them soon?
Thanks,
-Hubert
On Fri, 5 Jun 1998, Jonathan Fiscus wrote:
> Hubert,
>
> You're specifying the index file and the system
> output file incorrectly. Since a single evaluation involves many
> system runs, (one per topic), the TDTtrk.pl script uses file lists
> to identify index files and system output files.
>
> In the "Errors" directory, there are two example file lists,
> "tk_bn_indexes" which is a file list of indexes, and "tk_bn_outputs"
> which is a file list of system outputs. If you instead run the
> command:
>
> TDT2trk.pl -R ../.. -I tk_bn_indexes tk_bn_outputs
>
> It will work. In your example, you only wanted to score the output
> for a single topic; creating file lists containing just that single
> topic will do the trick.
>
> If this isn't clear, the online documentation explains this in more
> detail.
>
> Jon
>
Last updated Wed Sep 9 09:40:51 1998