(184) previous ~ index ~ next
To: tdt-distrib@ldc.upenn.edu
From: Christopher Cieri <ccieri@ldc.upenn.edu>
Subject: First Review of Possible Judgment Errors
Date: Wed, 16 Sep 1998 12:04:53 -0400
TDT Folk,
As promised, here is a review of possible errors suggested in Ralf
Brown's e-mails:
9/8/1998 Errors in devset judgment file
9/8/1998 tagging errors on TDT training set
9/9/1998 more missed labels in devset
9/9/1998 more probable misses on training set
We are indebted to Ralf for providing very detailed and thorough data.
Among those messages, we identified 121 possible misses. Of those, 33
were cases of miscellaneous text. In 3 cases Ralf appeared to
attribute a miss, but an inspection of the released relevance
tables revealed that the stories were in fact labeled for the correct
topic. Of the remaining 85, adjudication revealed that 4 stories were on
multiple topics, 44 should have been labeled "yes" for some topic, and 14
should have been labeled "brief" for some topic. In 23 cases,
adjudication confirmed the original judgment.
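The counts above partition the 121 candidates, and the arithmetic can be checked in a few lines. This is a minimal Python sketch; the variable and category names are our own labels for the figures in the paragraph, not fields from any LDC file.

```python
# Counts taken from the review summary above.
possible_misses = 121
misc_text = 33        # miscellaneous text rather than news stories
misattributed = 3     # already labeled for the correct topic in the release

# Stories that actually went to adjudication.
adjudicated = possible_misses - misc_text - misattributed

# Outcomes of adjudication for those stories.
outcomes = {
    "multiple topics": 4,
    "should be yes": 44,
    "should be brief": 14,
    "original judgment confirmed": 23,
}

print(adjudicated)             # 85
print(sum(outcomes.values()))  # 85
```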
During this review the following issues arose.
Miscellaneous Text
Over 1/4 of the putative misses were instances of "miscellaneous
text". Of the 27 stories Ralf had identified as possible misses for
topic 48 (Jonesboro Shooting), 24 were cases of "miscellaneous text". We
have reviewed these and confirmed 20; 4 are still under review. Most
cases come from ABC and CNN, where the Jonesboro incident was often
mentioned in the list of upcoming stories at the top of the
broadcast or before a commercial break.
Stories on Multiple Topics
In the process of adjudication, we identified 4 stories that are
actually on topic for two different topics. Following the evaluation
spec, these stories were excluded from the totals here.
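The exclusion can be applied mechanically. A minimal sketch, assuming each adjudicated story is represented as a set of labels such as "44B" or "2Y" (a hypothetical in-memory representation, not the relevance-table format, and not a restatement of the evaluation spec's wording): any story whose labels name more than one topic is dropped.

```python
def exclude_multi_topic(judgments):
    """Keep only stories judged on topic for at most one topic.

    judgments maps a story ID to a set of labels; each label is a topic
    number followed by Y (yes) or B (brief), e.g. {"2Y", "44B"}.
    """
    kept = {}
    for story_id, labels in judgments.items():
        # Strip the trailing Y/B judgment to get the bare topic number.
        topics = {label.rstrip("YB") for label in labels}
        if len(topics) <= 1:
            kept[story_id] = labels
    return kept

judgments = {
    "ABC19980430.1830.0216": {"2Y", "44B"},  # on two topics after adjudication
    "NYT19980304.0423": {"44Y"},
}
print(sorted(exclude_multi_topic(judgments)))  # ['NYT19980304.0423']
```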
Topic 1 - Asia crisis
The Asian Economic Crisis was a very difficult topic to define and
predates our current practice of topic definition/explication. As a
result, we see numerous cases of fringe stories (for instance,
discussions of how individual Asian companies were performing) being
treated inconsistently by annotators.
Topic 44 - National Tobacco Settlement
Ralf identified 27 possible misses. Adjudication revealed that 7
should have been labeled "yes" and 5 should have been labeled "brief".
There are two sources of confusion:
Stories discussing teen smoking are not on topic unless they also discuss the construction of the national settlement.
Stories discussing state settlements are not on topic unless they also discuss the national settlement. There were many such stories, particularly after the national settlement proceedings broke down.
Articles that discuss state lawsuits are therefore likely to account for
a number of false alarms.
Nii notes, however, that the Minnesota case (and only the Minnesota
case) is well represented in the corpus with many stories on topic.
Evidence revealed in the Minnesota case is believed to have had an
impact on the national settlement. Stories about the Minnesota case or
the evidence revealed in it are not on topic unless they also discuss
the impact on the national settlement. Since some stories make this
connection explicitly, there is fair overlap between the Minnesota case
and the national settlement. This is a source of confusion both for the
technology and for the human annotators.
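The Topic 44 rules reduce to one test: teen smoking, state settlements and the Minnesota case never make a story on topic by themselves; only discussion of the national settlement does. The sketch below encodes that rule over hypothetical yes/no judgments about a story (the `Story` fields are our own invention; in practice an annotator, not a flag, decides what a story "discusses") and also marks the related-but-off-topic stories that are likely to surface as false alarms.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Story:
    # Hypothetical annotator judgments about one story; not fields of any LDC file.
    national_settlement: bool = False  # discusses the national settlement
    teen_smoking: bool = False
    state_settlement: bool = False
    minnesota_case: bool = False

def judge_topic_44(story: Story) -> Tuple[bool, str]:
    """Apply the Topic 44 rule: related subjects count only via the national settlement."""
    if story.national_settlement:
        return True, "on topic"
    related = [
        name
        for name, present in (
            ("teen smoking", story.teen_smoking),
            ("state settlement", story.state_settlement),
            ("Minnesota case", story.minnesota_case),
        )
        if present
    ]
    if related:
        # Related subject without the national settlement: off topic, and
        # the kind of story likely to appear as a system false alarm.
        return False, "off topic, confusable: " + ", ".join(related)
    return False, "off topic"

print(judge_topic_44(Story(state_settlement=True)))
# (False, 'off topic, confusable: state settlement')
print(judge_topic_44(Story(minnesota_case=True, national_settlement=True)))
# (True, 'on topic')
```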
Topic 56 - James Earl Ray
Several stories about James Earl Ray escaped the notice of the
annotators. All 9 misses came from CNN over a period of one week;
in fact, 6 of them came from a two-day period and a single
annotator.
We believe that these particular misses also escaped the notice of the
senior annotators during QC because of several factors, including a
case-sensitivity flaw in our Recall test, the nature of the seed story,
and the poor discriminating power of the words James, Earl and Ray. This
problem was discussed briefly in Cambridge, and several of the
suggestions made then will be used in the next QC pass.
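Both problems are easy to reproduce with plain string matching. A minimal sketch, assuming a recall check that searches story text for a name; the function names, the sample sentences, and the matching strategy are illustrative and are not LDC's actual QC tool. A case-sensitive search misses the name when a transcript is in capitals, while a bare "Ray" matches unrelated text such as "X-ray"; matching the full name as a whole-word phrase, ignoring case, avoids both.

```python
import re

def naive_match(text: str, name: str) -> bool:
    # Case-sensitive substring test: fails on "JAMES EARL RAY" in capitals.
    return name in text

def phrase_match(text: str, name: str) -> bool:
    # Whole-word, case-insensitive match on the full name, allowing any
    # run of whitespace between the words.
    words = [re.escape(w) for w in name.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

transcript = "A JUDGE IN MEMPHIS HEARD FROM JAMES EARL RAY TODAY."
unrelated = "Doctors reviewed the X-ray results."

print(naive_match(transcript, "James Earl Ray"))   # False
print(phrase_match(transcript, "James Earl Ray"))  # True
print(phrase_match(unrelated, "James Earl Ray"))   # False
print(phrase_match(unrelated, "Ray"))              # True: one name word alone discriminates poorly
```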
Error Rates
The tables that follow give an estimate of the percentage of errors
uncovered by a review of Ralf's lists. There are two important points
to note at the outset.
There are a number of instances in which the miss rate for a single
topic (42, 53, 63) is high primarily because there are very few stories
on topic, so the few stories the annotation crew missed account for a
large percentage.
Ralf notes that he provided only a partial review of his putative
false alarms (that is, the annotators' putative misses). Therefore, the
rates calculated as confirmed misses/(stories labeled on topic + confirmed
misses) should not be viewed as a miss rate but rather as a lower
bound on the possible miss rate. In other words, we know that the
process that created the corpus at the time of the review had a miss
rate of at least 0.9%.
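The per-topic figure in the summary table is confirmed "yes" misses divided by (stories labeled "yes" + confirmed "yes" misses). A minimal sketch of that calculation, using four rows taken from the summary table; the function name is ours. Topic 53 shows the small-count effect noted above: a single miss among five on-topic stories already yields 20%.

```python
def miss_rate_lower_bound(labeled_yes: int, yes_misses: int) -> float:
    """Confirmed yes misses / (stories labeled yes + confirmed yes misses), in percent.

    Only part of the candidate list was reviewed, so this is a lower bound
    on the true miss rate, not an estimate of it.
    """
    total_on_topic = labeled_yes + yes_misses
    if total_on_topic == 0:
        return 0.0
    return 100.0 * yes_misses / total_on_topic

# (topic, stories labeled yes, confirmed yes misses), from the summary table
rows = [(1, 656, 13), (44, 147, 7), (53, 4, 1), (56, 36, 9)]
for topic, yes, misses in rows:
    print(f"topic {topic}: {miss_rate_lower_bound(yes, misses):.1f}%")
# topic 1: 1.9%
# topic 44: 4.5%
# topic 53: 20.0%
# topic 56: 20.0%
```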
************************************************************
Topic   Yes  Brief  Possible   Brief     Yes  Yes Misses /
                      Misses  Misses  Misses  (Yes + Yes Misses)
1       656    308        19       2      13    1.9%
2       564    196         2       0       0    0.4%
3         1      0         0       0       0    0.0%
4        16      0         0       0       0    0.0%
5         9      1         0       0       0    0.0%
6         4      3         0       0       0    0.0%
7        15      7         0       0       0    0.0%
8        43      1         0       0       0    0.0%
9        47      3         0       0       0    0.0%
10        7      0         0       0       0    0.0%
11       47     46         6       0       0    0.0%
12      147     19         3       0       1    0.7%
13      526    113         0       0       0    0.0%
14        1      0         0       0       0    0.0%
15     1273    186         0       0       0    0.2%
16        6      0         0       0       0    0.0%
17       21      1         0       0       0    0.0%
18       75      1         0       0       0    0.0%
19       65      4         1       0       0    0.0%
20       34      0         0       0       0    0.0%
21       49      7         3       2       0    0.0%
22       30      0         0       0       0    0.0%
23       93      4         0       0       0    0.0%
24       20      5         0       0       0    0.0%
25        0      0         0       0       0    0.0%
26       69      0         0       0       0    0.0%
27        0      0         0       0       0    0.0%
28        9      0         0       0       0    0.0%
29        6      3         0       0       0    0.0%
30        2      0         0       0       0    0.0%
31       26      4         0       0       0    0.0%
32       56      0         0       0       0    0.0%
33       73     39         0       0       0    0.0%
34       16      1         0       0       0    0.0%
35        3      3         0       0       0    0.0%
36        5      0         0       0       0    0.0%
37       14      2         0       0       0    0.0%
38       57      4         0       0       0    0.0%
39       57      4         3       1       2    3.4%
40        3      0         0       0       0    0.0%
41       23      1         2       0       1    4.2%
42       24      4         6       0       3   11.1%
43       13      1         1       0       0    0.0%
44      147      9        27       5       7    4.5%
45        0      0         0       0       0    0.0%
46        5      0         0       0       0    0.0%
47       24      2         3       0       2    7.7%
48      123     20        27       2       1    1.6%
49        0      0         0       0       0    0.0%
50       11      0         0       0       0    0.0%
51        0      0         0       0       0    0.0%
52        5      0         0       0       0    0.0%
53        4      2         1       0       1   20.0%
54        1      1         0       0       0    0.0%
55        1      0         0       0       0    0.0%
56       36      8        11       1       9   20.0%
57       16      1         1       0       1    5.9%
58        1      0         0       0       0    0.0%
59        1      0         0       0       0    0.0%
60        7      0         0       0       0    0.0%
61        1      3         0       0       0    0.0%
62        2      0         0       0       0    0.0%
63       14      1         2       0       2   12.5%
64        9      2         1       0       0    0.0%
65       44      0         2       0       1    2.2%
66        6      0         0       0       0    0.0%
TOTAL  4663   1020       121      12      49    0.9%
************************************************************
The tables that follow provide the raw output of adjudication used in
the summary table above. Note these abbreviations.
SO - site observed (SO != LO, mis-attribution)
SE - site expected
LO - LDC observed
LA - LDC adjudication (LO != LA, miss)
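The two comparisons in this key can be applied mechanically to the colon-delimited records below. A minimal sketch, assuming exactly the "ID: SO: SE: LO: LA:" layout shown; the `Record`, `parse_record` and `flags` names are ours. It computes only the two flags the key defines; the grouping into the lists below also relies on MISC markers and multi-topic labels, which this sketch does not attempt.

```python
from typing import NamedTuple

class Record(NamedTuple):
    story_id: str
    so: str  # site observed
    se: str  # site expected
    lo: str  # LDC observed
    la: str  # LDC adjudication

def parse_record(line: str) -> Record:
    """Parse a line such as 'ID: SO: SE: LO: LA:' (trailing colon included)."""
    fields = [f.strip() for f in line.split(":")]
    if len(fields) < 5:
        raise ValueError(f"expected 5 colon-separated fields: {line!r}")
    return Record(*fields[:5])

def flags(rec: Record) -> dict:
    """The two comparisons defined in the key."""
    return {
        "misattribution": rec.so != rec.lo,  # SO != LO
        "miss": rec.lo != rec.la,            # LO != LA
    }

misattributed = parse_record("ABC19980112.1830.0331: 0: 1Y: 1Y: 1Y:")
missed = parse_record("NYT19980304.0423 : 0: 44Y: 0: 44Y:")
print(flags(misattributed))  # {'misattribution': True, 'miss': False}
print(flags(missed))         # {'misattribution': False, 'miss': True}
```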
MISCELLANEOUS TEXT
Story ID SO SE LO LA
ABC19980319.1830.1708: 0: 43B: MISC: MISC:
ABC19980324.1830.0743: 0: 48B: MISC: MISC:
ABC19980326.1830.0000: 0: 48B: MISC: MISC:
ABC19980326.1830.1438: 0: 48B: MISC: MISC:
ABC19980331.1830.0000: 0: 48B: MISC: MISC:
CNN19980123.0130.0000: 0: 2Y: MISC: MISC:
CNN19980312.1130.0000: 0: 19B,42B: MISC: MISC:
CNN19980325.1130.1698: 0: 48B: MISC: MISC:
CNN19980326.1130.1682: 0: 48B: MISC: MISC:
CNN19980326.2130.1679: 0: 48B: MISC: MISC:
CNN19980327.1130.0000: 0: 48B: MISC: MISC:
CNN19980327.1600.1637: 0: 48B: MISC: MISC:
CNN19980328.0130.1690: 0: 48B: MISC: MISC:
CNN19980328.1130.0000: 0: 48B: MISC: MISC:
CNN19980328.1130.0972: 0: 48B: MISC: MISC:
CNN19980328.1130.1690: 0: 48B: MISC: MISC:
CNN19980329.1000.1651: 0: 48B: MISC: MISC:
CNN19980329.1130.0544: 0: 48B: MISC: MISC:
CNN19980329.1130.0974: 0: 48B: MISC: MISC:
CNN19980331.0130.0427: 0: 48B: MISC: MISC:
CNN19980331.2130.1698: 0: 48B: MISC: MISC:
CNN19980401.0130.1694: 0: 48B: MISC: MISC:
CNN19980401.1130.0000: 0: 44B: MISC: MISC:
CNN19980403.1130.1684: 0: 56Y: MISC: MISC:
CNN19980419.1130.0948: 0: 65B: MISC: MISC:
CNN19980422.1130.0000: 0: 41Y: MISC: MISC:
PRI19980227.2000.0000: 0: 21B: MISC: MISC:
PRI19980312.2000.0000: 0: 42B: MISC: MISC:
PRI19980331.2000.2881: 0: 48B: MISC: MISC:
VOA19980408.2300.2647: 0: 48B: MISC: MISC:
VOA19980409.2100.0000: 0: 44B: MISC: MISC:
VOA19980409.2300.3321: 0: 44B: MISC: MISC:
MISATTRIBUTIONS
Story ID SO SE LO LA
ABC19980112.1830.0331: 0: 1Y: 1Y: 1Y:
ABC19980128.1830.0095: 0: 2B: 2B: 2B:
NYT19980315.0160 : 0: 42B: 42B: 42B:
ADJUDICATION AGREES WITH ORIGINAL JUDGEMENT
Story ID SO SE LO LA
APW19980121.0631 : 0: 1B: 0: 0:
CNN19980112.1600.1009: 0: 1Y: 0: 0:
CNN19980123.0130.0312: 2Y: 12Y: 2Y: 2Y:
CNN19980123.0130.0931: 2Y: 12Y: 2Y: 2Y:
CNN19980129.0130.0327: 2B: 11B: 2B: 2B:
CNN19980129.0130.0392: 0: 11B: 0: 0:
CNN19980307.1300.0342: 0: 44Y: 0: 0:
CNN19980307.1600.0077: 0: 44Y: 0: 0:
CNN19980307.1600.0136: 0: 44Y: 44B: 44B:
CNN19980309.1130.0683: 0: 44B: 0: 0:
CNN19980310.1600.0283: 0: 44Y: 44B: 44B:
CNN19980331.0130.1560: 0: 48Y: 0: 0:
CNN19980404.1000.0109: 0: 47Y: 0: 0:
CNN19980408.2300.2647: 0: 48B: 0: 0:
CNN19980429.1130.0250: 0: 44Y: 0: 0:
NYT19980413.0419 : 0: 44B: 0: 0:
NYT19980421.0351 : 0: 44B: 0: 0:
NYT19980429.0478 : 0: 64Y: 0: 0:
PRI19980427.2000.0102: 0: 44B: 0: 0:
VOA19980104.2300.1338: 0: 11B: 0: 0:
VOA19980113.2300.0918: 0: 1Y: 0: 0:
VOA19980421.2300.1476: 0: 44B: 0: 0:
VOA19980429.1700.1596: 0: 44Y: 0: 0:
YES MISSES
Story ID SO SE LO LA
ABC19980308.1830.0687: 0: 53Y: 0: 53Y:
APW19980105.0021 : 2Y: 1Y: 2Y: 1Y:
APW19980105.0549 : 2Y: 1Y: 2Y: 1Y:
APW19980105.0550 : 2Y: 1Y: 2Y: 1Y:
APW19980105.0808 : 2Y: 1Y: 2Y: 1Y:
APW19980105.0810 : 2Y: 1Y: 2Y: 1Y:
APW19980105.1105 : 2Y: 1Y: 2Y: 1Y:
APW19980108.0631 : 0: 1B: 0: 1Y:
APW19980301.0188 : 0: 39Y: 0: 39Y:
APW19980402.1858 : 0: 57Y: 0: 57Y:
APW19980425.0642 : 0: 63Y: 0: 63Y:
CNN19980108.1600.1005: 0: 1B: 0: 1Y:
CNN19980121.1600.0270: 2Y: 12Y: 2Y: 12Y:
CNN19980308.1000.0365: 0: 44Y: 0: 44Y:
CNN19980313.1130.0355: 0: 42Y: 0: 42Y:
CNN19980313.2130.0636: 0: 42Y: 0: 42Y:
CNN19980323.1600.1057: 0: 47Y: 0: 47Y:
CNN19980402.1600.0489: 0: 56Y: 0: 56Y:
CNN19980402.2130.0077: 0: 56Y: 0: 56Y:
CNN19980402.2130.0978: 0: 56Y: 0: 56Y:
CNN19980403.0130.0489: 0: 56Y: 0: 56Y:
CNN19980404.1130.0236: 0: 56Y: 0: 56Y:
CNN19980404.1300.0047: 0: 56Y: 0: 56Y:
CNN19980404.1600.0050: 0: 56Y: 0: 56Y:
CNN19980408.2130.0378: 0: 56Y: 0: 56Y:
CNN19980408.2130.0924: 0: 44Y: 0: 44Y:
CNN19980409.0130.0387: 0: 56Y: 0: 56Y:
CNN19980420.2130.0237: 0: 44Y: 0: 44Y:
CNN19980421.1130.0900: 0: 41Y: 0: 41Y:
CNN19980421.1600.0350: 0: 47Y: 0: 47Y:
CNN19980422.2130.0374: 0: 44Y: 0: 44Y:
NYT19980114.0891 : 0: 1Y: 0: 1Y:
NYT19980115.0002 : 0: 1Y: 0: 1Y:
NYT19980115.0045 : 0: 1Y: 0: 1Y:
NYT19980115.0901 : 0: 1Y: 0: 1Y:
NYT19980118.0099 : 0: 1Y: 0: 1Y:
NYT19980304.0423 : 0: 44Y: 0: 44Y:
NYT19980310.0170 : 0: 44Y: 0: 44Y:
NYT19980312.0354 : 0: 42Y: 0: 42Y:
NYT19980402.0408 : 0: 63Y: 0: 63Y:
NYT19980412.0094 : 0: 44Y: 0: 44Y:
VOA19980302.1600.0499: 0: 39Y: 0: 39Y:
VOA19980326.2300.1347: 0: 48YB: 0: 48Y:
VOA19980430.1800.1485: 0: 65Y: 0: 65Y:
BRIEF MISSES
Story ID SO SE LO LA
ABC19980406.1830.0784: 0: 48Y: 0: 48B:
APW19980318.0654 : 0: 39B: 0: 39B:
APW19980327.0768 : 0: 48B: 0: 48B:
CNN19980116.1130.1023: 0: 1B: 0: 1B:
CNN19980120.1600.1032: 0: 1B: 0: 1B:
CNN19980309.1600.0380: 0: 44B: 0: 44B:
CNN19980309.2130.0442: 0: 44B: 0: 44B:
CNN19980421.1130.1417: 0: 44B: 0: 44B:
CNN19980424.1130.0123: 0: 65Y: 0: 65B:
CNN19980427.1600.0030: 0: 44B: 0: 44B:
VOA19980203.2100.2469: 0: 21B: 0: 21B:
VOA19980224.2300.1482: 0: 21Y: 0: 21B:
VOA19980429.1700.1482: 0: 44Y: 0: 44B:
STORIES ON MORE THAN ONE TOPIC
Story ID SO SE LO LA
ABC19980430.1830.0216: 0: 44B: 0: 2Y,44B:
VOA19980128.2100.0592: 0: 11B: 0: 2B,11B:
VOA19980129.2100.0036: 15Y: 11B: 15Y: 15Y,11B:
VOA19980129.2300.0013: 15Y: 11B: 15Y: 15Y,11B:
Last updated Fri Oct 2 19:04:21 1998