To: tdt-distrib@ldc.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: TDT 2000 meeting summary
Date: Sun, 26 Nov 2000 15:48:17 -0500
TDT people,
I thought I'd drop a note to the entire TDT mailing list to summarize
some of the conclusions at the TDT 2000 meeting at NIST. We had a
fairly intense and quick meeting, one that was a bit draining for a
good portion of the attendees since they had just spent 3-4 days
attending TREC. We had several people from the TREC filtering crowd
attend Thursday afternoon, and some who stuck around for the full
meeting.
The TDT Web site (http://www.nist.gov/TDT should work, but doesn't
seem to right now) will contain the various talks and papers that
people put together for the workshop. I'm sending Jon Fiscus my
summary and lessons-learned slides, so they'll be available for your
amusement, too.
Some tentative conclusions toward TDT 2001 and TREC 2001 (there will
not be a TREC-10) follow. I'm providing them to whet your appetite and
get you thinking about participating. If you need funding to work on
TDT, perhaps this'll inspire you to write a proposal so you can be
involved.
* TDT 2001's evaluation workshop will be held with TREC 2001.
Since TREC is Tues-Fri next year, that probably means TDT will
precede TREC rather than follow it.
* The TREC filtering track will probably adopt the TDT tracking
task, or a variation of it. (This is very tentative and depends
upon resolving some data access issues as well as the filtering
group's deciding this is the right choice.) They will probably
focus on English-only data, though they may be willing to include
SYSTRAN transcripts as well as ASR output. They will
not tackle system-generated boundaries.
* Variations on the tracking task (e.g., system-generated
boundaries, cross-language beyond SYSTRAN) may be reported out at
the filtering group sessions or at TDT.
* The Link Detection and Cluster Detection tasks are likely to be
the most heavily explored tasks in TDT (not counting tracking,
which is being explored at TREC). FSD may be dropped for now, due
to apparent lack of interest. Segmentation will be retained, but
there seems to be minimal interest in it without a new corpus for
evaluation.
* The Cluster Detection evaluation measure will probably change.
One proposal is to use the Link Detection task to create a measure
of the cluster quality.
* The TDT-4 corpus is being gathered (last four months of 2000), but
it does not appear at the moment that it will be useable for TDT
2001. We are shy of funding, and may not even have the time given
the funding. People are working toward making TDT-4 available for
some evaluation: TDT 2002 if not TDT 2001.
* The TDT-3 corpus, half of the TDT 1999 topics, and half of the TDT
2000 topics will probably be made available for training and
development of the technology. Evaluation will be on the
remaining topics and on the TDT-3 corpus.
* For evaluation, the TDT-3 corpus will probably be "augmented" by
the addition of other material with the potential of confusing the
systems. This results in an artificially different corpus, but
without the expense of relevance judgments. For example, the
July-September newswire data could be prepended, or extra newswire
stories could be sprinkled throughout the three months.
Note that these last two are *very* tentative. Please do not start
training your system on the TDT-3 corpus until we've finalized the
decisions. There has been discussion of trying to hold back the TDT-3
corpus again, though the hardliners on that position are weakening as
time goes by and the data become less fresh.
I'm positive I've missed some things. I hope that you'll find
something interesting, though, and that you'll plan on participating
in TDT 2001.
James Allan
Last updated Mon Mar 5 14:36:28 2001