To: tdt-distrib@ldc.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: Forum for TDT papers
Date: Tue, 05 Jan 1999 09:35:56 -0500
TDTers,
I'm co-chairing (with Bruce Croft) a "thematic session" at ACL this
year that touches on TDT. Previously, they had asked us to avoid
separate publicity on these sessions, but they've recently changed
their mind, so I thought I'd relay this to you.
Some small additional details about this session can be found at
http://ciir.cs.umass.edu/acl99 and the full conference call can be
found at http://www.mri.mq.edu.au/conf/acl99. The paper deadline is
January 25th, so for those of you who didn't have information about
this from other sources, I apologize.
The title of the session sounds like it's exactly TDT, but you'll see
when you read it that it's actually much broader.
-- james
ACL '99 thematic session: Topic Detection
Topic Detection is about discovering structure and themes in text; it
is about finding the topics that underlie the text. Given a set of
documents, the goal is to impose an organization on it that makes it
possible for someone to see the structure of the texts, or to
recognize the themes that run through it. This problem of detecting
and presenting that structure is of growing importance as people and
organizations generate and store more and more information on-line.
Document clustering is a form of topic detection that groups the
texts based on similarity of content. The Information Retrieval
research community has investigated clustering methods, applications,
and evaluation for decades, but has generally found serious drawbacks
in clustering's loose definition of a "group" and in its inability to
convincingly describe the content of a cluster.
Document classification, on the other hand, is unlike detection because it
expects the "classes" to be known in advance. The element of
discovery is a critical aspect of detection: scanning a set of
documents in order to find structure that was not known about in
advance.
The recent Topic Detection and Tracking (TDT) effort focuses the
detection problem slightly by limiting the domain to broadcast news
stories, by insisting that the detected structure be based upon the
underlying real-world events that cause news stories to be reported,
and by requiring that the detection be carried out "on-line", as the
news stories arrive. This more restricted setting has the potential
to generate more powerful clustering approaches--e.g., ones that
generate more succinct clusters that can be more easily described.
(TDT is an initiative arising out of the Information Retrieval and
speech recognition communities, with a broad range of associated
problems. Additional information about TDT is available on the Web at
http://www.nist.gov/speech/tdt98/tdt98.htm.)
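To make the "on-line" constraint described above concrete, here is a
minimal sketch of single-pass incremental clustering: each story is
compared against the clusters seen so far and either joins the closest
one or starts a new one. This is only an illustration, not the method
used by TDT or by any participant. The bag-of-words representation, the
cosine similarity measure, the function names, and the 0.3 threshold are
all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a story into lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_online(stories, threshold=0.3):
    """Assign each arriving story to a cluster in one pass.

    The threshold is a hypothetical value; a real system would tune it.
    Returns one cluster label per story, in arrival order.
    """
    centroids = []  # summed term counts for each cluster found so far
    labels = []
    for story in stories:
        vec = vectorize(story)
        best, best_sim = None, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best is not None and best_sim >= threshold:
            # Close enough to a known cluster: fold the story into it.
            centroids[best].update(vec)
            labels.append(best)
        else:
            # Nothing similar has been seen yet: start a new cluster.
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels
```

Because each decision is made when the story arrives and is never
revisited, the result depends on arrival order, which is part of what
makes the on-line setting harder than clustering a fixed collection.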
For this ACL theme, we are interested in papers that discuss the Topic
Detection problem directly, or any work on technology that can make it
more likely that high-quality solutions will be available. The topics
covered include, but are not limited to:
* Precise summaries of multiple texts
* Multi-document theme extraction
* TDT work (primarily detection)
* Concept extraction for use in grouping or summarizing
* Named entity recognition used for detection
* High-quality incremental clustering
* Novel clustering methods for discovery purposes
* Evaluation approaches for detection or summarization
* User studies to understand "topic", etc.
Papers may rely entirely on statistical and probabilistic Information
Retrieval techniques, may employ symbolic or statistical Natural
Language Processing methods, may combine the two, or may introduce
entirely different approaches. We are interested in new and exciting
work that will help in the process of detecting and presenting the
topics that underlie the structure of text.
Last updated Wed Feb 3 10:44:20 1999