To: doddington@nist.gov
From: Rich Schwartz <schwartz@bbn.com>
Subject: Re: TDT3 -- are topic boundaries arbitrary
Date: Fri, 12 Feb 1999 14:30:57 -0500 (EST)
Another two cents here:
If two annotators were each given two topic definitions based only
on a story or two as examples and WITHOUT any decisions about exactly what
is NOT to be included, I bet the two people will (randomly) make
different choices about how much to include. When we examine the errors
in detection, we see (in our system) that most of the errors are accounted
for by a small number of topics (say 2 or 3) where either the scope of the
topic was different from the official one, but reasonable, or where some
of the stories that are supposed to be included in the topic were
actually more than 50% about some other (unlabeled) topic and were
therefore merged with those other stories.
So the overall performance is pretty good, I suppose, and it would be MUCH
better except for these few arguably impossible cases.
This would not be a problem, since everyone faces these same
problems, except that most of the error is due to two or three topics
where you have to make a somewhat random choice. Different systems
might make this decision differently, and this might dominate the
differences among systems.
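
To make the concentration argument concrete, here is a minimal sketch of
how one might tabulate the share of detection errors contributed by each
topic. The function name, the topic ids, and the error counts are all
hypothetical and chosen only for illustration; they are not taken from any
TDT system or result.

```python
from collections import Counter


def error_share_by_topic(error_topics):
    """Tabulate how detection errors are distributed over topics.

    error_topics: iterable of topic ids, one entry per error
    (a miss or a false alarm charged to that topic).
    Returns a list of (topic, error_count, cumulative_share) tuples,
    sorted from the topic with the most errors to the one with the fewest.
    """
    counts = Counter(error_topics)
    total = sum(counts.values())
    rows = []
    running = 0
    for topic, n in counts.most_common():
        running += n
        rows.append((topic, n, running / total))
    return rows


# Hypothetical error log: 100 errors spread over 7 made-up topics.
errors = (["t07"] * 40 + ["t12"] * 30 + ["t03"] * 10 +
          ["t01"] * 5 + ["t09"] * 5 + ["t15"] * 5 + ["t20"] * 5)

rows = error_share_by_topic(errors)
for topic, n, cum in rows:
    print(f"{topic}: {n:3d} errors, cumulative share {cum:.0%}")
```

With these made-up counts, the first two topics account for 70% of all
errors, so a different (equally defensible) scope decision on just those
two topics could swamp any real difference between two systems.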
It is for these reasons (as well as others) that we suggested the
other metrics that are a little less sensitive to granularity decisions.
--Rich
=========================================================================
On Thu, 11 Feb 1999, George Doddington wrote:
> Date: Thu, 11 Feb 1999 16:54:54 -0800
> From: George Doddington <doddington@msn.com>
> Reply-To: doddington@nist.gov
> To: Jon_Yamron@dragonsys.com
> Cc: tdt-distrib@unagi.cis.upenn.edu
> Subject: Re: TDT3 -- are topic boundaries arbitrary
>
> > I think we should definitely consider supplying SOME off-topic material (on the
> > order of Nt stories), or else change our expectations of performance. In the
> > TDT2 dev-test data, for example, there were some decisions made in the topic
> > selection that I can't see any system reproducing reliably if only on-topic
> > material is included.
> > .
> > .
> > .
> >
> > I claim that different human annotators would not necessarily reach the same
> > conclusions about what to include and not to include in these cases if they
> > only looked at a small number of on-topic examples.
>
> You bring up a good point. Several good points, actually:
>
> * Can systems realistically be expected to make reasonable
> decisions about topics and topic boundaries?
>
> While you bring up some really tough examples that, I agree,
> would be impossible for current technology to solve, I think
> that the technology has generally demonstrated that the answer
> is "yes". Current technology is up to the challenge, at least
> in an overall "rms" sense. (Or should I say in an overall
> cost function sense?)
> --
>
> * Is the topic concept, as we have defined it, reasonable?
> I.e., is it reasonable to expect human annotators to produce
> topic labels that are consistent among annotators without
> collaboration?
>
> There are two parts to this: detecting topics, and tracking
> topics. I'm fairly comfortable with the tracking part, given
> topic definitions. (The meta-definition of a topic as "a
> seminal event or activity, along with all directly related
> events and activities", along with the rules of interpretation
> for 11 generic types of topics given in the instructions to
> the annotators, seems to be working.) What makes me somewhat
> uncomfortable is the detection part. We really have no test
> of how consistent we (humans) are in identifying topics. And
> the results on topic detection so far have not been compelling.
> I think that it would be interesting to take two seasoned LDC
> topic definers, give them a small set of seed stories (that
> have been selected by a third person as "good" seed stories),
> and see how similar their corresponding topics (events) are.
> --------
>
> In any case, using stories certified as being off-topic isn't
> in the cards. I also agree with Schultz that a small number
> of off-topic stories aren't going to provide the data needed
> to make the fine distinctions that you have discussed.
> --
> George Doddington at NIST: doddington@nist.gov or 301/975-3261
>
Last updated Thu May 13 09:28:15 1999