(021) previous ~ index ~ next
To: tdt-distrib@unagi.cis.upenn.edu
From: Jon_Yamron@dragonsys.com
Subject: RE: TDT3, a variety of issues
Date: Thu, 11 Feb 1999 16:31:25 -0500
I think we should definitely consider supplying SOME off-topic material (on the
order of N_t stories), or else change our expectations of performance. In the
TDT2 dev-test data, for example, there were some decisions made in the topic
selection that I can't see any system reproducing reliably if only on-topic
material is included. For example:
*) On the Tobacco Settlement topic, stories on the national settlement were
considered on-topic, but stories on the state settlement were considered
off-topic.
*) On the James Earl Ray topic, stories about how he was trying to get a new
trial before he died were on-topic, but stories about his death were off-topic.
I claim that different human annotators would not necessarily reach the same
conclusions about what to include and not to include in these cases if they only
looked at a small number of on-topic examples. In fact, they
didn't---consistency was achieved by having different annotators consult with
each other and share both on-topic and off-topic examples.
If the humans can't do it from on-topic material alone, it is unrealistic (to
say the least) to expect a machine to do it. This is not to say that by
including off-topic material we will actually succeed in designing systems that
can make these subtle distinctions, but at least it's a fair test.
- Jon
"Strzalkowski, Tomek (CRD)" <strzalkowski@crd.ge.com> on 02/07/99 11:59:37 AM
To: tdt-distrib@unagi.cis.upenn.edu, "'James Allan'" <allan@cs.umass.edu>
cc: (bcc: Jon Yamron/Dragon Systems USA)
Subject: RE: TDT3, a variety of issues
> ----------
> From: James Allan[SMTP:allan@cs.umass.edu]
> Sent: Saturday, February 06, 1999 10:43 AM
> To: tdt-distrib@unagi.cis.upenn.edu
> Subject: TDT3, a variety of issues
>
> > b. Do Tracking without labeled background stories.
>
> I'm not completely clear on what this task means. Presumably I still
> have my 1-16 on-topic stories that start the tracking. Is the idea
> that I no longer have vast amounts of KNOWN off-topic stories? What
> about including a small number (N_t?) of known off-topic stories for
> contrast? Ideally, ones that are similar in nature. As if to say,
> "I'm interested in the event discussed here, not the similar event
> that's discussed in these".
>
-------------------
My understanding is that one gets N_t on-topic stories, and that's it.
Makes it tougher to track, but is also more realistic. James's suggestion
of carefully selected off-topic stories (as opposed to a vast amount
of "random" off-topic stories) is a good one too. I think this would make
a slightly different evaluation though (not unrealistic). This is in fact
similar to TREC topic narratives that gave negative clues (and which were
mostly disregarded). I can see adding negatives as part of interactive
tuning: first you say "track these", then you realize that you get some
false alarms, so you throw in "but not these".
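
The "track these, but not these" idea can be sketched roughly as follows.
This is only an illustration under stated assumptions, not any site's
actual tracker: the bag-of-words cosine scoring, the function names, the
penalty weight, and the toy story strings are all hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    # Hypothetical representation: raw term counts over whitespace tokens.
    return Counter(text.lower().split())

def centroid(texts):
    # Sum of the term-count vectors of a set of example stories.
    total = Counter()
    for t in texts:
        total.update(vectorize(t))
    return total

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def track_score(story, on_topic, off_topic=(), penalty=1.0):
    # Similarity to the on-topic examples, minus (optionally) similarity
    # to a small set of known off-topic "contrast" examples.
    s = vectorize(story)
    score = cosine(s, centroid(on_topic))
    if off_topic:
        score -= penalty * cosine(s, centroid(off_topic))
    return score

# Toy strings, invented for illustration only.
on_examples = ["national tobacco settlement congress bill"]
off_examples = ["state tobacco settlement texas"]
state_story = "texas state tobacco settlement reached"
national_story = "congress debates national tobacco settlement bill"

without_negatives = track_score(state_story, on_examples)
with_negatives = track_score(state_story, on_examples, off_examples)
national_with_negatives = track_score(national_story, on_examples, off_examples)
```

With on-topic examples alone, the similar-but-off-topic story still gets a
positive score; adding the contrast examples pushes it below the on-topic
story, which is the effect the negative examples are meant to have.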
A separate issue I have is the False Alarm measure that we use.
Should we report both FA and precision (and have both as part
of the cost function)? For example (taking the GE1 run), topics 75 and 98
get similar (very small) precision (approx. 2-4%), but their FA rates
differ by a factor of 5 (5% vs. 0.9%). The difference is the number
of stories tracked (ratio 5 to 1).
---- Tomek
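
The point about FA rate versus precision can be made concrete with a small
calculation. The counts below are invented to roughly match the percentages
quoted above (they are not the real GE1 figures), and the function name and
the 10,000-story off-topic pool are assumptions for illustration.

```python
def fa_rate_and_precision(hits, false_alarms, n_off_topic):
    # FA rate: fraction of all off-topic stories that the system flagged.
    fa_rate = false_alarms / n_off_topic if n_off_topic else 0.0
    # Precision: fraction of flagged stories that were actually on-topic.
    flagged = hits + false_alarms
    precision = hits / flagged if flagged else 0.0
    return fa_rate, precision

N_OFF_TOPIC = 10_000  # hypothetical size of the off-topic pool

# Hypothetical counts: topic A flags about five times as many stories as B.
fa_a, prec_a = fa_rate_and_precision(hits=15, false_alarms=500, n_off_topic=N_OFF_TOPIC)
fa_b, prec_b = fa_rate_and_precision(hits=3, false_alarms=90, n_off_topic=N_OFF_TOPIC)

print(f"topic A: FA rate {fa_a:.1%}, precision {prec_a:.1%}")
print(f"topic B: FA rate {fa_b:.1%}, precision {prec_b:.1%}")
```

Under these assumed counts the two topics have nearly the same precision
while their FA rates differ by more than a factor of 5, because the FA rate
is normalized by the size of the off-topic pool and precision by the number
of stories flagged.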
Last updated Thu May 13 09:28:14 1999