To: tdt-distrib@ldc.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: Re: Evaluation plan
Date: Tue, 06 Jun 2000 18:35:35 -0400
Regarding Wessel's comments....
> but at least one can test subsystems on well known test-collections. So my
> first question to the TDT organizers was: being a newcomer to TDT, where is
> the evaluation data from last year, so that we can build our systems (some
> components have to be built from scratch), test them/bring them to an
> acceptable level *before* the dry run starts. The answer I got is this: one
> is not allowed to use the TDT-3 evaluation data for system development, but
> you are allowed to use TDT-2.
That is consistent with the evaluation plan, which has the goal of
minimizing as much as possible the effects of training on the test
data.
> So my question to the experienced groups is: is it possible to develop a
> system on the TDT-2 data? We are primarily interested in topic detection,
> but might do tracking and/or segmentation. If we could use the 60 topics
> from TDT-3, that would be even better. From a methodological point of view,
> I think an evaluation based on 60 new topics is more convincing than one
> that includes old topics of which the relevance data has been distributed
> already.
Note that if you use the existing TDT-3 topics for your training, you
are learning how to put those stories into clusters. Anything you do to
improve their clustering might also improve your clustering of stories
in the as-yet-untagged 60 new topics, which come from the same corpus,
so you would in effect be tuning on the test data. Of course, people
will try not to fall into a trap like that, but it's possible.
The TDT-2 corpus (Jan-Jun) has about 100 official topics. There are
roughly another 100 topics that were created for the summer workshop
at JHU; they cover only the English data and should be used somewhat
cautiously. So there's a decent amount of training information in the
TDT-2 corpus.
Jon raised the question of whether that's enough. It's a good
question.
-- james
Last updated Mon Jun 12 13:26:39 2000