To: Jon Fiscus <jfiscus@nist.gov>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: [Fwd: New Dry Run resources]
Date: Wed, 12 Jul 2000 14:48:54 -0400

Folks,

The LDC has released a new set of TDT2 topic relevance tables, version
3.3. The new release includes the re-classified certified off-topic
stories for the TDT2 multilingual topics and updates to the annotations
for all topics. You can access the data through the LDC's "TDT2" Project
web page, 'http://www.ldc.upenn.edu/Projects/TDT2/', which contains the
link called "Latest Version of TDT2-Multilangage Topic Tables (v3.3)".
During the dry run, I'll be scoring with these new annotations.

As promised on July 3rd, I have built new dry run index files using real
certified off-topic stories. The new dry run index files are at the
URL:


ftp://jaguar.ncsl.nist.gov/tdt/tdt2000/dryrun2000/dryrun2000_indexfiles.20000712.tar.Z

Please use these new index files for the dry run.


Jon



Jon Fiscus wrote:
>
> Folks,
>
> I'm releasing three things today for the TDT2000 Dry Run: dry-run index
> files, an updated evaluation plan (v1.1), and boundary files generated
> by automatic story segmentation. The relevant URLs are:
>
> ftp://jaguar.ncsl.nist.gov/tdt/tdt2000/dryrun2000/dryrun2000_indexfiles.20000630.tar.Z
> ftp://jaguar.ncsl.nist.gov/tdt/tdt2000/evalplans/TDT00.Eval.Plan.v1.1.doc
> ftp://jaguar.ncsl.nist.gov/tdt/tdt2000/evalplans/TDT00.Eval.Plan.v1.1.ps
> ftp://jaguar.ncsl.nist.gov/tdt/tdt2000/AutoBoundary_20000629.tgz
>
> Index Files
> -----------
>
> These index files are an interim release. The files conform to the
> TDT spec. syntactically, but the certified off-topic stories are
> placeholders. The LDC is re-annotating the certified no stories to be
> in line with the new evaluation specification. As soon as I return
> from my vacation (on July 11th), I'll issue updated index files.
>
> Evaluation Plan
> ---------------
>
> The primary changes to the TDT2000 eval plan (from TDT3) are
> changes in the topic tracking task and the link detection task.
> These two tasks represent the primary interest of the sponsor
> and should serve as the primary focus of TDT R&D. The changes
> are:
>
> Topic Tracking:
> * Single-language training only. This does not indicate a lack
> of interest in cross-language topic tracking. On the contrary,
> cross-language tracking is of prime interest. However, dual-
> language training has been eliminated because dual-language
> training:
> - results in essentially single-language topic tracking.
> - avoids the hard cross-language issues.
> - clutters the results with data of secondary value.
> - fragments the research effort and causes unnecessary work.
> * Negative example training stories. This is done by certifying
> as off-topic those off-topic training stories that are very
> similar to the on-topic training stories.
>
> Link Detection:
> * Extension of the task to include cross-language story pairs.
> This is done without additional annotation effort by deriving
> the link judgements from topic annotation for the given topics
> (as was demonstrated successfully in the TDT3 dry run).
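
As a rough illustration of deriving link judgements from topic
annotations, here is a minimal Python sketch. It is not taken from the
evaluation plan: the story IDs, topic IDs, the `derive_link_judgements`
helper, and the rule that a pair counts as linked when the two stories
share at least one annotated topic are all assumptions made for
illustration.

```python
from itertools import combinations

# Hypothetical annotations: story ID -> set of topic IDs the story is on-topic for.
topic_annotations = {
    "story_en_1": {"topic_A"},
    "story_zh_1": {"topic_A", "topic_B"},
    "story_en_2": {"topic_C"},
}

def derive_link_judgements(annotations):
    """Label each unordered story pair True if the stories share a topic.

    Assumed rule for illustration only; the evaluation plan defines
    the actual judgement criteria.
    """
    judgements = {}
    for a, b in combinations(sorted(annotations), 2):
        judgements[(a, b)] = bool(annotations[a] & annotations[b])
    return judgements

links = derive_link_judgements(topic_annotations)
```

Because the judgements come straight from the existing topic labels,
a cross-language pair such as ("story_en_1", "story_zh_1") needs no
extra annotation pass.
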
>
> Automatic Story Segmentation Boundary Files
> -------------------------------------------
>
> IBM graciously provided NIST with the output of their segmentation
> system for both the TDT2 and TDT3 corpora. This release contains
> boundary files (in TDT corpus format) generated from the output of
> their automatic story segmenter.
>
> Jon
>
> --
> Jonathan Fiscus
> National Inst. of Stds. and Tech.
> 100 Bureau Dr. Stop 8940
> Gaithersburg, MD 20899-8940
>
> Phone: (301) 975-3182
> Email: jonathan.fiscus@nist.gov

--
Jonathan Fiscus
National Inst. of Stds. and Tech.
100 Bureau Dr. Stop 8940
Gaithersburg, MD 20899-8940

Phone: (301) 975-3182
Email: jonathan.fiscus@nist.gov

Last updated Thu Jul 13 16:02:18 2000