To: tdt-distrib@ldc.upenn.edu
From: Mark Przybocki <mark.przybocki@nist.gov>
Subject: [Fwd: Problem with initial (pre-adjudication) TDT2000 results]
Date: Mon, 16 Oct 2000 11:24:55 -0400
Folks,
Dave Graff advises us of an apparent error in the pre-adjudicated
TDT2000 results released last week by NIST (see message below). Jon
Fiscus and George Doddington are currently attending ICSLP in
Beijing. We will work to correct this problem; please stay tuned...
-mark
-------- Original Message --------
Subject: Problem with initial (pre-adjudication) TDT2000 results
Date: Mon, 16 Oct 2000 10:54:22 EDT
From: David Graff <graff@unagi.cis.upenn.edu>
To: David Pallett <david.pallett@nist.gov>, Mark Przybocki
<mark.przybocki@nist.gov>
CC: graff@unagi.cis.upenn.edu, George Doddington
<doddington@nist.gov>, clwayne@afterlife.ncsc.mil
Dave P. or Mark P.
I gather that Jon Fiscus is en route to Beijing, but I don't know
where George Doddington is. I sent email to both of them
yesterday, and haven't heard back.
It seems there is a serious problem with the way that Jon ran the
scoring for the TDT2000 evaluation. According to the scores that
he released last Friday, there is an enormous difference in system
performance between last year's 60 topics (30001-30060) and this
year's (new) 55 topics (31001-31060, but not all 60, because 5
topics were left out due to lack of training stories in English
sources).
The difference shows up as very large numbers of "Miss Stories" on
each of the 31000-series topics. The cause appears to be a
misuse of the topic_relevance table that we sent to Jon for these
topics. Whereas the earlier topic tables had entries that said
either "level=YES" or "level=BRIEF", the table for these new
topics had "YES" and "BRIEF" entries and also a large number of
"level=NO" entries. (This is because the new topics were done
using "partial annotation" rather than "complete annotation" --
only a subset of stories were read for each topic, and the
"tdt2000_topic_rel.partial_annot" table lists not only the
on-topic stories that were found, but also all the off-topic
stories that were read during the annotation process.)
Anyway, it looks like Jon's scoring software made a mistake
because of this difference in the topic table content. He
eliminated all the "BRIEF" entries, but then treated everything
else as "YES" -- the "NO" entries were not eliminated from the
table, but instead were treated the same as "YES" entries. So the
site scores for 31000-series topics are incorrect, because stories
that we explicitly labeled as "off-topic" were treated in the
scoring as if they were "on-topic".
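To make the difference concrete, here is a minimal sketch in Python of
the two filtering behaviors described above. It is not Jon's scoring
software: the record layout, the story names, and both function names
are hypothetical, and only the level values (YES, BRIEF, NO) come from
the topic tables. The "fixed" filter is one plausible correction,
assuming that only "level=YES" entries should count as on-topic.

```python
# Hypothetical records: the real topic_relevance table layout is not
# reproduced here; only the level values YES / BRIEF / NO are from it.
ENTRIES = [
    {"topic": "31001", "story": "story_a", "level": "YES"},
    {"topic": "31001", "story": "story_b", "level": "BRIEF"},
    {"topic": "31001", "story": "story_c", "level": "NO"},
    {"topic": "31001", "story": "story_d", "level": "NO"},
]


def on_topic_buggy(entries):
    """Drop BRIEF, then treat everything left (including NO) as on-topic."""
    return {(e["topic"], e["story"]) for e in entries if e["level"] != "BRIEF"}


def on_topic_fixed(entries):
    """Count only entries explicitly labeled YES as on-topic."""
    return {(e["topic"], e["story"]) for e in entries if e["level"] == "YES"}


buggy = on_topic_buggy(ENTRIES)
fixed = on_topic_fixed(ENTRIES)

# Stories wrongly counted as on-topic. A system that correctly rejects
# them would be charged with a "Miss Story" for each one.
spurious_targets = buggy - fixed
```

With a table that contains no "NO" entries, as in the earlier topic
tables, the two filters return the same set, which would explain why
the problem shows up only on the 31000-series topics.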
It appears that one site may still be working on the data, and has
not submitted its final results, so Jon has not released the
"answer key". But the scores are out there, and some of the
participants must be scratching their heads about the results on
the new topics. I think some sort of announcement should be made
to the list today to explain the problem.
Dave G.