(024) previous ~ index ~ next
To: Jon_Yamron@dragonsys.com
From: J michAel schuLtz <mschultz@unagi.cis.upenn.edu>
Subject: Re: TDT3, a variety of issues
Date: Thu, 11 Feb 1999 18:54:38 -0500 (EST)
I agree with Jon's point that it is impossible to know the breadth or
bounds of a topic without some off-topic material, but I think this points
to a larger problem that won't be solved with some small number of off-topic
stories as they are _now_ supplied. Currently, off-topic stories are not chosen
to best delineate the bounds of a topic, but are simply those stories in the
test corpus up to the initial on-topic story. So for the James Earl Ray example
none of the provided off-topic stories would pertain to his death, since stories
about it appear only after the defining event. Do the users of such systems expect to supply
both on-topic and disambiguating-off-topic material?
Mike
> I think we should definitely consider supplying SOME off-topic material (on the
> order of Nt stories), or else change our expectations of performance. In the
> TDT2 dev-test data, for example, there were some decisions made in the topic
> selection that I can't see any system reproducing reliably if only on-topic
> material is included. For example:
>
> *) On the Tobacco Settlement topic, stories on the national settlement were
> considered on-topic, but stories on the state settlement were considered
> off-topic.
>
> *) On the James Earl Ray topic, stories about how he was trying to get a new
> trial before he died were on-topic, but stories about his death were off-topic.
>
> I claim that different human annotators would not necessarily reach the same
> conclusions about what to include and not to include in these cases if they only
> looked at a small number of on-topic examples. In fact, they
> didn't---consistency was achieved by having different annotators consult with
> each other and share both on-topic and off-topic examples.
>
> If the humans can't do it from on-topic material alone, it is unrealistic (to
> say the least) to expect a machine to do it. This is not to say that by
> including off-topic material we will actually succeed in designing systems that
> can make these subtle distinctions, but at least it's a fair test.
>
> - Jon
>
> "Strzalkowski, Tomek (CRD)" <strzalkowski@crd.ge.com> on 02/07/99 11:59:37 AM
>
> To: tdt-distrib@unagi.cis.upenn.edu, "'James Allan'" <allan@cs.umass.edu>
> cc: (bcc: Jon Yamron/Dragon Systems USA)
> Subject: RE: TDT3, a variety of issues
>
> > ----------
> > From: James Allan[SMTP:allan@cs.umass.edu]
> > Sent: Saturday, February 06, 1999 10:43 AM
> > To: tdt-distrib@unagi.cis.upenn.edu
> > Subject: TDT3, a variety of issues
> >
> > > b. Do Tracking without labeled background stories.
> >
> > I'm not completely clear on what this task means. Presumably I still
> > have my 1-16 on-topic stories that start the tracking. Is the idea
> > that I no longer have vast amounts of KNOWN off-topic stories? What
> > about including a small number (N_t?) of known off-topic stories for
> > contrast? Ideally, ones that are similar in nature. As if to say,
> > "I'm interested in the event discussed here, not the similar event
> > that's discussed in these".
> >
> -------------------
>
> My understanding is that one gets N_t on-topic stories, and that's it.
> Makes it tougher to track, but is also more realistic. James's suggestion
> of carefully selected off-topic stories (as opposed to a vast amount
> of "random" off-topic stories) is a good one too. I think this would make
> a slightly different evaluation though (not unrealistic). This is in fact
> similar to TREC topic narratives that gave negative clues (and which were
> mostly disregarded). I can see adding negatives as part of interactive
> tuning: first you say "track these", then you realize that you get some
> F/A, so you throw in "but not these".
>
> A separate issue I have is the False Alarm measure that we use.
> Should we report both FA and precision (and have both as part
> of the cost function)? For example (taking the GE1 run), topics 75 and 98
> get similar (very small) precision (approx. 2-4%), but their FA rates
> differ by a factor of 5 (5% vs. 0.9%). The difference is the number
> of stories tracked (ratio 5 to 1).
>
> ---- Tomek
>
>
>
>
>
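Tomek's point about FA versus precision can be illustrated with a small
calculation. This is only a sketch: the story counts below are hypothetical
numbers picked to land near the percentages he quotes, not figures from the
GE1 run, and the function names are mine. It assumes FA is the fraction of
off-topic stories that were flagged, and precision is the fraction of flagged
stories that were on-topic.

```python
def precision(hits, flagged):
    """Fraction of flagged (tracked) stories that are truly on-topic."""
    return hits / flagged


def false_alarm_rate(hits, flagged, n_off_topic):
    """Fraction of off-topic stories that the system wrongly flagged."""
    return (flagged - hits) / n_off_topic


# Hypothetical counts, NOT taken from any actual TDT run.
N_OFF_TOPIC = 10_000  # assumed number of off-topic stories per topic
topics = {
    "topic A": {"hits": 15, "flagged": 515},
    "topic B": {"hits": 3, "flagged": 93},
}

for name, t in topics.items():
    p = precision(t["hits"], t["flagged"])
    fa = false_alarm_rate(t["hits"], t["flagged"], N_OFF_TOPIC)
    print(f"{name}: flagged={t['flagged']:4d}  precision={p:.1%}  FA={fa:.1%}")
```

With these made-up counts both topics have a precision of about 3%, while
the FA rates are 5% and 0.9%. Almost all of the gap comes from topic A
flagging roughly five times as many stories, which is why FA alone does not
reveal how poor the precision is in either case.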
Last updated Thu May 13 09:28:14 1999