(246) previous ~ index ~ next
To: tdt-distrib@ldc.upenn.edu
From: Paul van Mulbregt <paulvm@dragonsys.com>
Subject: Re: Evaluation plan
Date: Mon, 12 Jun 2000 12:39:58 -0400
I think the degree of cheating depends on the particular TDT task.
For Segmentation, even looking at the numbers from the 99 Eval tells you something about the 2000 Eval, since it is the same data set with exactly the same scoring.
For Topic Detection, one could argue that creating good clusters, as scored by the first 60 topics, will lead to good clusters as scored by the second set of 60 topics (and is even more likely to do so if scored by the full 120 topics).
For Topic Tracking, that same argument is harder to make.
-Paul
At 12:26 PM 12-06-00 -0400, Rich Schwartz wrote:
>Jon,
> It's hard to know how good different sites are at memorizing the
>answers (i.e., tuning). But as I understand it, there are additional
>topics defined now, isn't that right? In this case, I don't see what
>would be wrong with declaring the old 60 topics as dev test, and the new
>ones as eval test. It's pretty hard to see how training on the old topics
>is somehow cheating on the new ones, even though the corpus is the same.
>It would, at best, be a second-order effect.
>
>--Rich
>========================================
------------------------------------------------------------------
Paul van Mulbregt, Dragon Systems Inc., Newton, MA. (617) 965-5200
email: paulvm@dragonsys.com
Last updated Mon Jun 12 13:26:40 2000