(238) previous ~ index ~ next
To: Rich Schwartz <schwartz@bbn.com>, Jon_Yamron@dragonsys.com
From: "Kraaij, Wessel" <Kraaij@TPD.TNO.NL>
Subject: RE: Evaluation plan
Date: Fri, 2 Jun 2000 15:56:50 +0200
Jon and Rich,
we will be a new participant in TDT-2000. We have built up some experience in
the TREC evaluations, where it is common to develop, test, and tune systems on
the evaluation data of previous years. Sometimes a new task in TREC is a bit
of a gamble, but at least one can test subsystems on well-known test
collections. So my first question to the TDT organizers was: being a newcomer
to TDT, where is the evaluation data from last year, so that we can build our
systems (some components have to be built from scratch) and test them and
bring them to an acceptable level *before* the dry run starts? The answer I
got was this: one is not allowed to use the TDT-3 evaluation data for system
development, but one is allowed to use TDT-2.
So my question to the experienced groups is: is it possible to develop a
system on the TDT-2 data? We are primarily interested in topic detection,
but might do tracking and/or segmentation. If we could use the 60 topics
from TDT-3, that would be even better. From a methodological point of view,
I think an evaluation based on 60 new topics is more convincing than one
that includes old topics for which the relevance data has already been
distributed. I am not saying that it is wrong to have a mixed evaluation set.
For example, in the TREC adaptive filtering task, we have been using a data
set for which old relevance judgements from the ad hoc task were available,
but for part of the data set (AP90) new relevance judgements were produced.
My main concern is thus the availability of a sufficient amount of
development data, be it TDT-2 or TDT-3.
--Wessel
Wessel Kraaij, TNO-TPD
> -----Original Message-----
> From: Rich Schwartz [mailto:schwartz@bbn.com]
> Sent: Thursday, 1 June 2000 19:46
> To: Jon_Yamron@dragonsys.com
> Cc: tdt-distrib@ldc.upenn.edu
> Subject: Re: Evaluation plan
>
>
>
> Jon,
>
> On Thu, 1 Jun 2000 Jon_Yamron@Dragonsys.com wrote:
>
> > In short, if we want to get useful work done this year, I think we
> > need an evaluation plan that allows us to use the 60 topics from
> > Eval-99 for development.
>
> I agree that having a good development test set is critical for
> doing useful work. Otherwise we are all just gambling.
>
> Your suggestion of using the eval-99 topics as development and
> then testing on a new set of 60 topics from the same set sounds like a
> good compromise. Since we currently have no way to evaluate our systems
> on the new topics (since we don't know what they are yet), there is no
> way to cheat even though we may run our systems on the data many times.
>
> --Rich
>
>
Last updated Mon Jun 12 13:26:39 2000