(239) previous ~ index ~ next

To: tdt-distrib@ldc.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: Re: Evaluation plan
Date: Tue, 06 Jun 2000 17:44:20 -0400

Does anyone else have comments on Jon's concerns about whether people
may look at the evaluation data in any way at all? So far, the only
comment has come from Rich, who agrees with Jon.
			-- james



An extract of Jon's message:
> (I assume we are talking about topics annotated on the
> October-December data.) Under this plan, the October-December data
> is off-limits for any use, which is PROPER, in the sense that
> evaluation data should be off limits, but VERY LIMITING, as we are
> already starved for development data as it is. Under this plan, we
> cannot even look at our Eval-99 output (even looking at the DET
> curve for Tracking or C_det for Detection is a cheat, something we
> have all already done)!
>
> I thought that what we actually decided was to use the 60 new topics
> for evaluation, but to allow sites to use the original 60 topics for
> tuning (or more precisely, as the development test set). Of course,
> sites would NOT be allowed to incorporate the October-December data
> into their training (e.g., for building discriminator models).
> While this is a little bit suspect, because we are tuning on a data
> set that contains the stories we will evaluate on, it isn't too bad,
> since the topics we are tuning on are disjoint from the ones we will
> be evaluated on. This seems like a reasonable accommodation for the
> research, given the fact that we were unable to acquire a new corpus
> for this evaluation.
>
> In short, if we want to get useful work done this year, I think we
> need an evaluation plan that allows us to use the 60 topics from
> Eval-99 for development.





Last updated Mon Jun 12 13:26:39 2000