
Topic Detection and Tracking
Topic Detection and Tracking (TDT) is a multi-site research project, now in its third phase, to develop core technologies for news understanding systems. Specifically, TDT systems discover the topical structure in unsegmented streams of news reporting as it appears across multiple media and in different languages. For a detailed discussion of the goals of TDT, see Charles Wayne's overview. The NIST web site describes the evaluation methodology and reports on previous phases of TDT research. LDC developed the corpus for the second phase of TDT and is currently developing the phase three corpus. More detailed information follows on the phases of TDT and the corpora they involve.
- Pilot-Study
- TDT 2 (corpus used for training and for 1998 test)
- TDT 3 (corpus used in 1999, 2000 and 2001 tests)
- TDT 2000 -- takes you to the NIST TDT-2000 web page
- TDT 2001 -- takes you to the NIST TDT-2001 web page
- TDT 4 (corpus used for 2002, 2003 tests)
- TDT 5 (corpus used for 2004 test)