(025) previous ~ index ~ next

To: Jon_Yamron@dragonsys.com
From: George Doddington <doddington@msn.com>
Subject: Re: TDT3 -- are topic boundaries arbitrary
Date: Thu, 11 Feb 1999 16:54:54 -0800

> I think we should definitely consider supplying SOME off-topic material (on the
> order of Nt stories), or else change our expectations of performance. In the
> TDT2 dev-test data, for example, there were some decisions made in the topic
> selection that I can't see any system reproducing reliably if only on-topic
> material is included.
> .
> .
> .
>
> I claim that different human annotators would not necessarily reach the same
> conclusions about what to include and not to include in these cases if they
> only looked at a small number of on-topic examples.

You bring up a good point. Several good points, actually:

* Can systems realistically be expected to make reasonable
decisions about topics and topic boundaries?

You raise some really tough examples that, I agree, would
be impossible for current technology to solve. Even so, I
think the technology has generally demonstrated that the
answer is "yes". Current technology is up to the challenge,
at least in an overall "rms" sense. (Or should I say in an
overall cost-function sense?)
--

* Is the topic concept, as we have defined it, reasonable?
I.e., is it reasonable to expect human annotators, working
without collaboration, to produce topic labels that are
consistent with one another?

There are two parts to this: detecting topics, and tracking
topics. I'm fairly comfortable with the tracking part, given
topic definitions. (The meta-definition of a topic as "a
seminal event or activity, along with all directly related
events and activities", along with the rules of interpretation
for 11 generic types of topics given in the instructions to
the annotators, seems to be working.) What makes me somewhat
uncomfortable is the detection part. We really have no test
of how consistent we (humans) are in identifying topics. And
the results on topic detection so far have not been compelling.
I think it would be interesting to take two seasoned LDC
topic definers, give them a small set of seed stories (chosen
by a third person as "good" seed stories), and see how similar
the resulting topics (events) are.
--------

In any case, supplying stories certified as off-topic isn't
in the cards. I also agree with Schultz that a small number
of off-topic stories won't provide the data needed to make
the fine distinctions you have discussed.
--
George Doddington at NIST: doddington@nist.gov or 301/975-3261

Last updated Thu May 13 09:28:14 1999