(268) previous ~ index ~ next

To: TDT distribution <tdt-distrib@ldc.upenn.edu>
From: Jon_Yamron@dragonsys.com
Subject: Re: Proposed change to the topic tracking task
Date: Wed, 9 Aug 2000 15:16:08 -0400

I would like to know as soon as possible how many tracking runs will be
required in the evaluation. Before the meeting, it was 120 if one did the
Nt=1 condition, and 240 if one did the Nt=4 condition (since this entailed
doing the No=2 and No=0 conditions). After the meeting, if I understood
Charles correctly, it was 120 for the Nt=1 condition (which is now
required), plus 240 if one does the Nt=4 condition. If you are suggesting
that the minimum may now be 1500, or that the cost of doing the Nt=4
condition is an additional 1500 runs, I would like to know right away---our
tracker is not so fast that this is a trivial amount of resources, and it
may, in fact, eliminate certain strategies from consideration.

An additional (obvious, but unstated) note: for the comparison of using/not
using off-topic material to be valid, any run for Nt=x/No=2 must have a
corresponding run for Nt=x/No=0, where x is the same in both cases.
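
A minimal sketch of how a site might check that pairing before submitting, assuming each run is recorded as an (Nt, No) tuple. The function name and the tuple representation are made up for illustration and are not part of the task definition.

```python
def unpaired_runs(runs):
    """Return the (Nt, No) runs that lack a counterpart at the same Nt.

    Hypothetical helper: a run with No=2 needs a run with No=0 at the
    same Nt (and vice versa) for the off-topic comparison to be valid.
    """
    run_set = set(runs)
    missing = []
    for nt, no in sorted(run_set):
        if no not in (0, 2):
            continue  # only the No=0 / No=2 comparison is checked here
        partner = (nt, 0 if no == 2 else 2)
        if partner not in run_set:
            missing.append((nt, no))
    return missing


# The Nt=4/No=2 run below has no Nt=4/No=0 counterpart.
print(unpaired_runs([(1, 0), (1, 2), (4, 2)]))
```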

- Jon

P.S. Where do you get 1500? I thought it was 120 topics x (2^4 - 1 for
combinatorics) x 2 conditions = 3600, plus 120 for the Nt=1 eval.
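
The arithmetic above can be checked with a short sketch. It assumes, as the P.S. does, that every topic has all 4 on-topic training stories available, that each non-empty subset of them counts as a separate run, and that there are 2 conditions (No=2 and No=0). The function name is made up for illustration.

```python
from itertools import combinations


def count_tracking_runs(n_topics=120, max_nt=4, n_conditions=2):
    """Count runs if every non-empty subset of training stories is a run.

    Assumption (taken from the P.S., not from the task definition):
    each topic has exactly max_nt on-topic training stories.
    """
    subsets_per_topic = sum(
        1
        for k in range(1, max_nt + 1)
        for _ in combinations(range(max_nt), k)
    )  # same as 2**max_nt - 1
    return n_topics * subsets_per_topic * n_conditions


combinatoric_runs = count_tracking_runs()
total_runs = combinatoric_runs + 120  # plus the 120 runs of the Nt=1 eval
print(combinatoric_runs, total_runs)
```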





George Doddington <doddington@nist.gov> on 08/09/2000 02:04:27 PM

To: TDT distribution <tdt-distrib@ldc.upenn.edu>
cc:
Subject: Proposed change to the topic tracking task


Unless there is a coherent argument to the contrary, the topic
tracking task will be changed to demote Nt, the # of on-topic
training stories, from the status of "parameter" to the status
of "variable". In other words, the number of on-topic training
stories will be a variable function of topic, constrained only
to assume a value of 1, 2, 3 or 4. This will simplify the task
definition, conceptually at least, and reduce the number of
different sets of parameters. (Topic tracking performance as a
function of the value of Nt will still be tabulated and analyzed,
of course.) Note that this change will make the topic tracking
task more difficult, because of the need to normalize the score
across different values of Nt. (Performance conditioned on Nt
should NOT change, however.) As a footnote to the motivation
for this change, it has been observed that for the current
alternate condition of Nt = 4, this is actually Nt <= 4, because
not all topics will have 4 or more on-topic stories. Since this
condition is already in place, namely that Nt is a variable
function of topic, and since as it stands this provides a clue
to how many on-topic stories exist in the test set, it was
thought that a good move would be simply to change the status of
Nt from parameter to variable. This would then simplify the task
definition, obscure the extraneous (and illegal) clue to threshold
setting, and facilitate easy evaluation over multiple values of Nt.
Note that under this new task definition, there may be far more than
120 topics to track -- in the limit up to 1500 for the formal TDT2000
evaluation.
--
George Doddington at NIST: doddington@nist.gov or 301/975-3261





Last updated Tue Sep 19 14:30:46 2000