(230) previous ~ index ~ next

To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: NEW VERSION OF TDT2 COMPLETE TOPIC TABLE
Date: Tue, 25 Apr 2000 16:48:06 EDT

Folks,

Mike Schultz brought an important fact to our attention: in the last release
of the TDT2 Multilingual text corpus (dating from October 1999), the
"complete" topic annotation table (file name "tdt2_topic_rel.complete_annot",
containing ``version="3.1" release_date="19-Aug-1999"'' in the initial line)
was in fact NOT complete -- we had failed to incorporate the results of
adjudication on the 1998 TDT2 Evaluation scoring.

Recall that in the LDC's adjudication of TDT2 test results, we added about 200
hits ("YES" and "BRIEF") on the 1998 eval-set topics (topicids 20067-20100),
changed "YES" to "BRIEF" or vice versa on about a dozen other hits, and
eliminated a small number of false alarms in the previous TDT2 annotations;
these all involved stories in the 1998 evaluation data set (May-June 1998).

A few months later, when we built the topic table that spanned all 100 topics
judged against all six months of TDT2 English data, we failed to incorporate
the adjudication changes.

Thanks to Mike's attentiveness, we have corrected the problem, and a new
version of the TDT2 topic table is now available via anonymous ftp:

ftp://ftp.cis.upenn.edu/pub/ldc/data_samples/tdt2_topic_rel.complete_annot.v3.2.gz

(i.e. no special LDC-membership group restrictions apply to accessing the file).


On the subject of topic tables, I'd like to ask those of you with functioning
TDT systems to recall one of the action items that was discussed at the TDT3
workshop a couple of months ago: it was agreed that folks would do a complete
"tracking" run for all 60 TDT3 topics, covering the entire TDT3 text corpus.
The intention here is to provide LDC with as many sets of system outputs as
possible, so that we can do a more thorough adjudication of topic labels in
this corpus.

In particular, the adjudication that we did for the 1999 test results covered
only those portions of the corpus that constituted the test epoch for each of
the 60 topics. If it makes things easier for participants, it would of
course be adequate for systems to run tracking only over the training epochs
for each topic; the primary goal is to look for annotation misses in that
portion of the corpus.

No schedule or deadline had been set for submission of site results to LDC,
but now that Jon Fiscus has circulated a schedule for upcoming TDT-2000 events
I'm worried that we may have let too much time go by already.

Please let me know if you can help with this task, and what sort of
turn-around schedule you can handle. Thanks.

Dave Graff
LDC



Last updated Wed May 24 17:18:23 2000