
To: strzalkowski@crd.ge.com (Strzalkowski, Tomek)
From: J michAel schuLtz <mschultz@unagi.cis.upenn.edu>
Subject: Re: TDT3, a variety of issues
Date: Mon, 8 Feb 1999 14:49:19 -0500 (EST)

> James wrote>
> > > b. Do Tracking without labeled background stories.
> >
> > I'm not completely clear on what this task means. Presumably I still
> > have my 1-16 on-topic stories that start the tracking. Is the idea
> > that I no longer have vast amounts of KNOWN off-topic stories? What
> > about including a small number (N_t?) of known off-topic stories for
> > contrast? Ideally, ones that are similar in nature. As if to say,
> > "I'm interested in the event discussed here, not the similar event
> > that's discussed in these".
> >
> -------------------
Tomek wrote>
>
> My understanding is that one gets N_t on-topic stories, and that's it.
> Makes it tougher to track, but is also more realistic. James's suggestion
> of carefully selected off-topic stories (as opposed to a vast amount
> of "random" off-topic stories) is a good one too. I think this would make
> a slightly different evaluation though (not unrealistic). This is in fact similar
> to TREC topic narratives that gave negative clues (and which were
> mostly disregarded). I can see adding negatives as part of interactive
> tuning: first you say: track these, then you realize that you get some
> F/A, so you throw in "but not these".

I agree with the suggestion that no off-topic stories should be given, for one simple
reason. If our definition of topic (an event occurring in a specific place and time,
etc.) is to result in any real functional distinction, then by definition any story
previous to the first story in the test corpus is off-topic, and all retrospective
corpora are off-topic, meaning virtually unlimited off-topic material is available.
If we need negative evidence from the test corpus, this weakens the topic definition
and makes the task more like categorization.

Last updated Thu May 13 09:28:12 1999