To: Jon Fiscus <email@example.com>
From: Jonathan Fiscus <firstname.lastname@example.org>
Subject: Re: [Fwd: New Dry Run resources]
Date: Wed, 12 Jul 2000 14:48:54 -0400
The LDC has released a new set of TDT2 topic relevance tables, version
3.3. The new release includes the re-classified certified off-topic
stories for the TDT2 multilingual topics and updates to the
annotations for all topics. You can access the data
through the LDC's "TDT2" Project web page:
'http://www.ldc.upenn.edu/Projects/TDT2/' which contains the link
called: "Latest Version of TDT2-Multilangage Topic Tables (v3.3)".
During the dry run, I'll be scoring with these new annotations.
As promised on July 3rd, I have built new dry run index files using real
certified off-topic stories. The new dry run index files are at the
Please use these new index files for the dry run.
Jon Fiscus wrote:
> I'm releasing three things today for the TDT2000 Dry Run: dry-run index
> files, an updated evaluation plan (v1.1), and boundary files generated
> by automatic story segmentation. The relevant URLs are:
> Index Files
> These index files are an interim release. The files conform to the
> TDT spec. syntactically, but the certified off-topic stories are
> placeholders. The LDC is re-annotating the certified no stories to be
> in line with the new evaluation specification. As soon as I
> return from my vacation (on July 11th), I'll issue updated index files.
> Evaluation Plan
> The primary changes to the TDT2000 eval plan (from TDT3) are
> changes in the topic tracking task and the link detection task.
> These two tasks represent the primary interest of the sponsor
> and should serve as the primary focus of TDT R&D. The changes are:
> Topic Tracking:
> * Single-language training only. This does not indicate a lack
> of interest in cross-language topic tracking. On the contrary,
> cross-language tracking is of prime interest. However, dual-language
> training has been eliminated because dual-language training:
> - results in essentially single-language topic tracking.
> - avoids the hard cross-language issues.
> - clutters the results with data of secondary value.
> - fragments the research effort and causes unnecessary work.
> * Negative example training stories. This is done by certifying
> as off-topic those training stories that are off-topic yet very
> similar to the on-topic training stories.
> Link Detection:
> * Extension of the task to include cross-language story pairs.
> This is done without additional annotation effort by deriving
> the link judgements from topic annotation for the given topics
> (as was demonstrated successfully in the TDT3 dry run).
> Automatic Story Segmentation Boundary Files
> IBM graciously provided NIST with the output of their segmentation
> system for both the TDT2 and TDT3 corpora. This release contains
> boundary files (in TDT corpus format) generated from the output of
> their automatic story segmenter.
> Jonathan Fiscus
> National Inst. of Stds. and Tech.
> 100 Bureau Dr. Stop 8940
> Gaithersburg, MD 20899-8940
> Phone: (301) 975-3182
> Email: email@example.com
National Inst. of Stds. and Tech.
100 Bureau Dr. Stop 8940
Gaithersburg, MD 20899-8940
Phone: (301) 975-3182
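
The quoted note says that link judgements for cross-language story pairs are derived from the topic annotation rather than annotated separately. A minimal sketch of that idea follows, assuming a pair counts as linked when both stories are on-topic for the same topic. The function name, the data layout, and the story and topic IDs are hypothetical and are not taken from the TDT file formats or from the NIST scoring software. How pairs involving unannotated stories are treated is also an assumption here (they are simply marked unlinked).

```python
from itertools import combinations


def derive_link_judgements(on_topic, stories):
    """Derive pairwise link judgements from per-topic annotations.

    on_topic: dict mapping a topic id to the set of story ids judged
              on-topic for it (hypothetical layout).
    stories:  iterable of story ids to pair up.
    Returns a dict mapping (story_a, story_b) to True when some topic
    contains both stories, and False otherwise.
    """
    judgements = {}
    for a, b in combinations(sorted(stories), 2):
        judgements[(a, b)] = any(
            a in members and b in members for members in on_topic.values()
        )
    return judgements


# Hypothetical example: one English and one Mandarin story share topic_A.
on_topic = {
    "topic_A": {"eng_001", "man_007"},
    "topic_B": {"eng_002"},
}
stories = ["eng_001", "eng_002", "man_007"]
judgements = derive_link_judgements(on_topic, stories)
```

In this sketch the cross-language pair (eng_001, man_007) comes out linked with no extra annotation, which is the property the note describes.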
Last updated Thu Jul 13 16:02:18 2000