(236) previous ~ index ~ next

To: tdt-distrib@ldc.upenn.edu
From: Jon_Yamron@Dragonsys.com
Subject: Evaluation plan
Date: Thu, 1 Jun 2000 17:52:54 -0400

In reading over the evaluation plan, I was disturbed to find the following:

"The TDT3 corpus will be augmented for TDT2000 to include 60 additional topics.
These topics, along with the original 60 TDT3 topics, will be used for formal
TDT2000 evaluation."

(I assume we are talking about topics annotated on the October-December data.)
Under this plan, the October-December data is off-limits for any use, which is
PROPER, in the sense that evaluation data should be off-limits, but VERY
LIMITING, since we are already starved for development data. Under this
plan, we cannot even look at our Eval-99 output (even looking at the DET curve
for Tracking or C_det for Detection would be a cheat, and that is something we
have all already done)!

I thought that what we actually decided was to use the 60 new topics for
evaluation, but to allow sites to use the original 60 topics for tuning (or more
precisely, as the development test set). Of course, sites would NOT be allowed
to incorporate the October-December data into their training (e.g., for building
discriminator models). While this is a little bit suspect, because we would be
tuning on a data set that contains the stories we will be evaluated on, it isn't
too bad, since the topics we would tune on are disjoint from the ones we will be
evaluated on. This seems like a reasonable accommodation for the research,
given that we were unable to acquire a new corpus for this evaluation.

In short, if we want to get useful work done this year, I think we need an
evaluation plan that allows us to use the 60 topics from Eval-99 for
development.

Comments?

- Jon


Last updated Mon Jun 12 13:26:39 2000