
To: James Allan <allan@cs.umass.edu>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: Request for clarification regarding TDT rules.
Date: Mon, 26 Oct 1998 07:52:13 -0500

Padmini and James,

Thank you, James, for responding! I agree with your explanation.


Jon

James Allan wrote:
>
> Padmini (et al),
>
> First note that, technically, the allowable training examples--both
> positive and negative--are specified in the index file for that topic
> number. Many of us are kind of side-stepping the index file, though,
> and in that case questions like yours become meaningful. (They also
> help us understand how the index file is constructed.)
>
> > Consider the situation where we opt to set Nt (the number of positive topic
> > examples) to 6 for topic tracking. Then the TDT description says that we
> > must use the last 6 positive examples only.
> > That is, positive examples 11 through 16 (16 being the maximum number of
> > positive examples given). Let us assume that the 11th positive example is
> > item number 1000.
> >
> > 1) can we use the negative examples in items 1 through 999 during training?
>
> Yes. In fact, you can use the negative examples that are between the
> 11th and 16th positive training examples, too. You just cannot use
> the 1st through 10th positive training examples. You must treat the
> training data as if those 10 stories had never existed. (They will
> not be listed in the index file.)
>
> > 2) can we extract statistics such as IDF from all items in 1 through 999
> > during training?
>
> Pretty much. Same rules as above: you cannot use the 1st through 10th
> positive training stories in any way at all, so they cannot
> participate in the idf calculations. Otherwise, you can extract all
> the stats you want from the training set. And, again, you can even
> get stats from the stories running from the 11th to the 16th positive
> training stories.
>
> -- james
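
For anyone side-stepping the index file, the rule James describes could be
sketched roughly as follows. This is only an illustration, not the official
TDT procedure: the function names, the (tokens, is_positive) story layout,
and the log(N/df) form of IDF are all assumptions made for the example. The
input is assumed to already be limited to the training portion for the topic.

```python
import math
from collections import Counter


def usable_training_stories(stories, nt):
    """Drop all but the last `nt` positive stories; keep every negative story.

    `stories` is a chronologically ordered list of (tokens, is_positive)
    pairs (a hypothetical layout chosen for this sketch).
    """
    positive_idx = [i for i, (_, is_pos) in enumerate(stories) if is_pos]
    if nt <= 0 or nt > len(positive_idx):
        raise ValueError("nt must be between 1 and the number of positives")
    # The earlier positives are treated as if they had never existed.
    excluded = set(positive_idx[:-nt])
    return [s for i, s in enumerate(stories) if i not in excluded]


def idf(stories):
    """IDF over the usable stories only, using log(N / df) (one common form)."""
    n = len(stories)
    df = Counter()
    for tokens, _ in stories:
        df.update(set(tokens))  # count each term once per story
    return {term: math.log(n / count) for term, count in df.items()}


# Toy data: three positive stories, Nt = 2, so the first positive is dropped.
toy = [
    (["a"], True),
    (["b"], False),
    (["a", "c"], True),
    (["c"], False),
    (["a"], True),
]
kept = usable_training_stories(toy, nt=2)
weights = idf(kept)
```

The point of filtering first is that the excluded positive stories never
reach the IDF counts, while negatives before and between the retained
positives still do.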

--
Jon Fiscus
NIST
Email: jfiscus@nist.gov
Phone: (301) 975-3182

Last updated Wed Oct 28 14:44:12 1998