(098) previous ~ index ~ next

To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: New Topic Relevance Table for TDT2 English
Date: Fri, 14 May 1999 15:28:41 EDT

Folks,

I have just packaged up a complete topic_relevance.table file for the
TDT2 English Corpus. This table includes 11348 records (i.e.
on-topic hits), spanning all 100 TDT2 topics as judged against all
stories from the six English sources collected between January 4 and
June 30, 1998. (Actually, only 99 of the topics show up in the table
-- one topic was a complete miss; several others show very low yield.)

The file is now available via the usual "members_only" anon-ftp
retrieval method. The file name is:

tdt2-eng-toptable-v1.0.tar.gz

The exact file size is 106654 bytes. In addition to the topic table
itself, this tar file also includes a slightly modified DTD for the
table ("dtd/topic_relevance.dtd"). The modified DTD accommodates two
new SGML attributes on the "<TOPICSET>" element, which appears in the
first line of the table file, as follows:

<TOPICSET version="1.0" release_date="14-May-1999">
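For anyone scripting the retrieval, here is a minimal Python sketch that
checks a downloaded archive against the byte count quoted above and
reads the two new attributes from the "<TOPICSET>" line. The function
names (size_matches, parse_topicset) are hypothetical and not part of
the release, and the regular expression is a simplification that only
handles double-quoted attribute values like the ones shown above; it
is not a general SGML parser.

```python
import os
import re

# Values quoted in this message.
ARCHIVE = "tdt2-eng-toptable-v1.0.tar.gz"
EXPECTED_SIZE = 106654  # bytes

# Simplifying assumption: attributes are always written name="value".
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def size_matches(path, expected=EXPECTED_SIZE):
    """Return True if the file at `path` is exactly `expected` bytes long."""
    return os.path.getsize(path) == expected


def parse_topicset(line):
    """Return the attributes of a <TOPICSET ...> start tag as a dict."""
    line = line.strip()
    if not (line.startswith("<TOPICSET") and line.endswith(">")):
        raise ValueError("not a TOPICSET start tag: %r" % line)
    return dict(ATTR_RE.findall(line))


# The header line exactly as it appears in this message.
header = parse_topicset('<TOPICSET version="1.0" release_date="14-May-1999">')
print(header["version"], header["release_date"])
```

After downloading, size_matches(ARCHIVE) should return True if the
transfer was complete; a mismatch usually means the ftp transfer was
not done in binary mode or was cut short.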

Please bear in mind that this version of the complete table may be
incomplete with respect to "misses" on the part of LDC annotators
(i.e. on-topic stories that were not marked as such). We have done
some work to look for misses on the first 66 topics over the first
four months of data, but that work has not been incorporated into this
release of the table, because we have not yet had time to run
precision checks on the additional hits that were cited.

I am working on a final, comprehensive release of the TDT2 English
text data, and this should be ready for delivery on CD-ROM within a
few days. (By and large, the topic table provided above should work
with the text data as delivered previously.)

Dave Graff

Last updated Mon Jun 21 11:03:30 1999