(178) previous ~ index ~ next

To: tdt-distrib@ldc.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: Re: The TDT3 second dry run and the October TDT workshop
Date: Tue, 07 Sep 1999 18:32:38 EDT

Folks,

I'm sorry not to have responded sooner to the question Bowden Wise posted to
this list last week about the status of the TDT2 Text corpus.
Yes, we have been working on a FULLY CORRECTED version of the TDT2
Multilanguage Text collection, and that version is now ready for shipment.
TDT3 participants will receive their copies later this week.

This newest version will show "Release Date: September 7, 1999" on the cdrom
label, and the data will be identified as "version 3.1", to distinguish it
from the two earlier (unsuccessful) attempts that I distributed a couple of
weeks ago.

Jon Fiscus is out of the country this week, so I have not been able to
confirm this with him, but I believe that the version of the scoring software
now available at the NIST TDT3 web site will work on this version of the data.
I do not know this for sure, since I have not been able to try it myself.
(Whoever happens to try it first, please be sure to let Rich Schartz know how
it turns out. ;^))

I do know that the glitches in data content and file format that affected
earlier versions -- both NIST's June 6 release (what I call "version 2" of the
corpus) and the August releases of "version 3.0" -- have been eliminated. Any
difficulty you encounter with the version 3.1 data should require only fairly
simple adjustments to programs or scripts that were tailored to the older
("version 2") directory and file structures. Overall, the front-end coding
will be simpler.

The format and organization of v3.1 are as described in my email of August 19,
which you can review at:

http://www.ldc.upenn.edu/Projects/TDT3/email/email_161.html

The same documentation is provided on the cdrom, both in the root directory
and in the tar file that contains the corpus itself. That tar file also
includes further documentation about the data.

Dave Graff



Last updated Wed Sep 22 10:26:04 1999