(015) previous ~ index ~ next

To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: Delivery of TDT2 materials for January -- Part 1
Date: Wed, 01 Apr 1998 05:39:23 EST

Those of you who have been watching the LDC's web pages regarding the
TDT collection will know that you should be expecting six items to be
available:

- the complete SGML text archive for the January TDT2 collection
- the ASR text provided by Dragon for the January acoustic data
- a table mapping story boundaries to the ASR text
- a tokenized, untagged text set derived from the SGML archive
- a table mapping story boundaries to the tokenized text data
- a table of topic relevance data, relating stories to topics

The items that are available as of Wednesday morning, April 1 (no
fooling) are the SGML archive and the relevance table. We have the
ASR text data in hand from Dragon, and we will complete the story
boundary table for it within the next day (I hope). The tokenized
version of the SGML text (and its associated table) should also be
ready within the same amount of time.

There will be a few gaps in the sampling provided in this week's
delivery, which will be filled in when the entire training partition
of the corpus (January and February data) has been completed, within
the next month or so.

I have made some minor updates to the documentation web pages; also,
please note that there are now two lists of topics posted on our main
TDT web page (http://www.ldc.upenn.edu/TDT/). The current data
delivery contains relevance data only for the first list of topics.

Here is how to obtain the first installment of the delivery:

[ftp instructions available on request from graff@ldc.upenn.edu]

The compressed file size is 29351085 bytes, and it will expand to
87483904 bytes when uncompressed.
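
For anyone who wants to confirm that a transfer completed, the two byte counts above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the path you pass in is whatever name the file has on your system, since the message does not give one.

```python
import os

# Byte counts quoted in the message above.
EXPECTED_COMPRESSED_BYTES = 29351085
EXPECTED_EXPANDED_BYTES = 87483904


def size_matches(path, expected_bytes):
    """Return True if the file at `path` is exactly `expected_bytes` long."""
    return os.path.getsize(path) == expected_bytes


def expansion_ratio(compressed=EXPECTED_COMPRESSED_BYTES,
                    expanded=EXPECTED_EXPANDED_BYTES):
    """Return how many times larger the expanded data is than the download."""
    return expanded / compressed


# Example (hypothetical path; substitute the name of your downloaded file):
#   size_matches("/path/to/downloaded/file", EXPECTED_COMPRESSED_BYTES)
```

With the figures quoted, the data expands to roughly three times its compressed size, so plan disk space accordingly.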

Good luck, and please do not hesitate to inform me (and the list
<tdt-distrib@ldc.upenn.edu>) of any difficulties or confusion that you
encounter.

Dave Graff

Last updated Wed Sep 9 09:40:46 1998