To: Padmini Srinivasan <padmini@uma.info-science.uiowa.edu>
From: James Allan <allan@cs.umass.edu>
Subject: Re: Request for clarification regarding TDT rules.
Date: Fri, 23 Oct 1998 15:34:21 -0400
Padmini (et al),
First, note that, technically, the allowable training examples (both
positive and negative) are specified in the index file for that topic
number. Many of us are bypassing the index file, though, and in that
case questions like yours become relevant. (They also help explain
how the index file is constructed.)
> Consider the situation where we opt to set Nt (the number of positive topic
> examples) to 6 for topic tracking. Then the TDT description says that we
> must use only the last 6 positive examples,
> that is, positive examples 11 through 16 (16 being the maximum number of
> positive examples given). Let us assume that the 11th positive example is
> item number 1000.
>
> 1) can we use the negative examples in items 1 through 999 during training?
Yes. In fact, you can use the negative examples that are between the
11th and 16th positive training examples, too. You just cannot use
the 1st through 10th positive training examples. You must treat the
training data as if those 10 stories had never existed. (They will
not be listed in the index file.)
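As an illustration only, the selection rule can be sketched in Python. This assumes the training window is available as a chronological list of (story id, is-positive) pairs; the function name `allowed_training_ids` and that data layout are hypothetical and are not part of the TDT index-file format or tools.

```python
from typing import List, Tuple


def allowed_training_ids(stories: List[Tuple[int, bool]], nt: int) -> List[int]:
    """Return the ids of stories usable for training (hypothetical helper).

    stories: (story_id, is_positive) pairs in chronological order, covering
             the training window only (assumed representation).
    nt:      number of positive examples to keep; the LAST nt positives are kept.
    """
    positive_ids = [sid for sid, pos in stories if pos]
    if not 0 <= nt <= len(positive_ids):
        raise ValueError("nt must be between 0 and the number of positives")
    # The earlier positives are dropped entirely, as if they never existed.
    excluded = set(positive_ids[: len(positive_ids) - nt])
    # Every other story (all negatives, plus the kept positives) stays usable.
    return [sid for sid, _ in stories if sid not in excluded]
```

For example, with four positives (ids 2, 4, 6, 8) and `nt=2`, the result keeps every negative plus positives 6 and 8: `[1, 3, 5, 6, 7, 8]`.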
> 2) can we extract statistics such as IDF from all items in 1 through 999
> during training?
Pretty much. The same rules apply as above: you cannot use the 1st
through 10th positive training stories in any way at all, so they
cannot participate in the idf calculations. Otherwise, you can extract
all the stats you want from the training set. And, again, you can even
get stats from the stories running from the 11th through the 16th
positive training story.
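A minimal sketch of the idf side of this rule, assuming stories are already tokenized and that the excluded ids are the dropped positives (e.g. the 1st through 10th). The helper name `idf_table` is hypothetical, and the plain `log(N / df)` form is just one common idf variant chosen for illustration; the message does not prescribe a formula.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def idf_table(docs: Dict[int, List[str]], excluded: Iterable[int]) -> Dict[str, float]:
    """Compute idf = log(N / df) over training stories, skipping excluded ids.

    docs:     story_id -> list of tokens (assumed representation).
    excluded: ids of the dropped positive stories; they contribute to
              neither N nor any document frequency.
    """
    skip = set(excluded)
    # One set of distinct terms per remaining story, so df counts stories.
    kept = [set(tokens) for sid, tokens in docs.items() if sid not in skip]
    n = len(kept)
    df = Counter(term for terms in kept for term in terms)
    return {term: math.log(n / count) for term, count in df.items()}
```

A term that occurs only in an excluded story, such as "toll" in `{1: ["quake", "rescue"], 2: ["quake", "toll"], 3: ["election"]}` with story 2 excluded, gets no idf entry at all, and N counts only the two remaining stories.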
-- james
Last updated Wed Oct 28 14:44:12 1998