To: "'Rich Schwartz' (E-mail)" <schwartz@bbn.com>
From: George Doddington <doddington@msn.com>
Subject: Re: TDT3 -- are topic boundaries arbitrary
Date: Fri, 12 Feb 1999 16:24:09 -0800

On Friday, February 12, 1999 11:31 AM, Rich Schwartz [SMTP:schwartz@bbn.com] wrote:
>
> Another two cents here:
>
> If two annotators were each given two topic definitions based only
> on a story or two as examples and WITHOUT any decisions about exactly what
> is NOT to be included, I bet the two people will (randomly) make
> different choices about how much to include. When we examine the errors
> in detection, we see (in our system) that most of the errors are accounted
> for by a small number of topics (say 2 or 3) where either the scope of the
> topic was different from the official one, but reasonable, or where some
> of the stories that are supposed to be included in the topic were
> actually more than 50% about some other (unlabeled) topic and were
> therefore merged with those other stories.

Given the same topic definition, annotators seem to be reasonably
consistent in their story labeling. LDC has measured this. Perhaps
this would be a good time for Chris Cieri to refresh our memory about
the statistics. I would also be interested in having LDC measure how
consistent topic DEFINITIONS are among annotators (as suggested in my
previous email on this issue). Then there would be no need for you to
bet on the outcome.
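
For illustration only: one common way to quantify how consistently two
annotators label the same stories is Cohen's kappa, which corrects raw
agreement for chance. The message does not say which statistic LDC
used, so the choice of kappa, the function name cohen_kappa, and the
YES/NO labels below are all assumptions, not LDC's procedure or data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of stories on which the two annotators gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up on-topic judgments for six stories, purely to exercise the function.
annotator_1 = ["YES", "YES", "NO", "NO", "YES", "NO"]
annotator_2 = ["YES", "NO", "NO", "NO", "YES", "NO"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

The same computation could in principle be applied to topic DEFINITIONS
by comparing which stories each annotator's definition admits, though
that is only one possible reading of the measurement proposed here.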

I'm quite sympathetic on the issue and viability of topic definition.
This is the reason that I've been pushing the "topic link" task (i.e., the
task of determining whether two stories discuss the same topic). Note
that in the topic link task, the topic definition is a free variable.
This avoids the problems of dealing with explicit topics and the
assumption of single-topic stories.
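
A minimal sketch of what a topic link decision could look like, assuming
a simple bag-of-words cosine similarity with a fixed threshold. The
message does not describe any system's method, so the function names,
the tokenization, and the 0.3 threshold are hypothetical. The point is
only that the decision takes two stories as input and never refers to
an explicit topic definition.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts for one story (hypothetical tokenization)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_topic(story_a, story_b, threshold=0.3):
    """Link decision: True if the two stories are judged to share a topic.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return cosine(term_vector(story_a), term_vector(story_b)) >= threshold


# Made-up one-line "stories", purely to exercise the functions.
linked = same_topic("the earthquake struck the city",
                    "earthquake damage in the city")
unlinked = same_topic("the earthquake struck the city",
                      "stock markets fell sharply")
```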
--------

> This would not be a problem since everyone faces these same
> problems, except that since most of the error is due to two or three
> topics where you have to make a somewhat random choice, different systems
> might make this decision differently and this might dominate the
> differences among systems.

Yeah. Detailed examples and derived insight would be most welcome.
--------

> It is for these reasons (as well as others), that we suggested the
> other metrics that are a little less sensitive to granularity decisions.

I assume that you are referring to the "YDZ" clustering metric. The
more I think about this, the more I like it for the detection task.
NIST does plan to implement this metric, time permitting.
--
George Doddington in Orinda, CA. doddington@nist.gov 925/631-6628

Last updated Thu May 13 09:28:15 1999