(161)
To: tdt-distrib@ldc.upenn.edu
From: Ralf Brown <Ralf_Brown@v.gp.cs.cmu.edu>
Subject: Errors in devset judgement file
Date: Tue, 08 Sep 1998 18:24:53 -0400
This relates to using the trackers for finding misses in the tagging. Last
night, I did just that, looking at the stories for which my tracker said
YES and the official judgements file says NO (it started out as a simple
error analysis, but I kept going...).
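For anyone who wants to repeat this kind of check, here is a minimal sketch of the comparison described above. It assumes the tracker output and the judgements have already been loaded into Python dictionaries keyed by (event number, story ID). The key layout, the function names `candidate_misses` and `missing_tag_rate`, and the choice to treat an unjudged story as NO are all illustrative assumptions. They are not the LDC judgement-file format or the tracker's actual code.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical key layout: (event number, story ID).
Key = Tuple[int, str]


def candidate_misses(tracker: Dict[Key, bool], judged: Dict[Key, str]) -> List[Key]:
    """Pairs the tracker marked YES but the judgements mark NO.

    A pair absent from the judgements is treated as NO (an assumption).
    BRIEF and YES judgements are not counted as false alarms.
    """
    return sorted(
        key
        for key, said_yes in tracker.items()
        if said_yes and judged.get(key, "NO") == "NO"
    )


def missing_tag_rate(false_alarms: Iterable[Key], confirmed_misses: Set[Key]) -> float:
    """Fraction of apparent false alarms that manual review found to be missing tags."""
    alarms = list(false_alarms)
    if not alarms:
        return 0.0
    return sum(1 for key in alarms if key in confirmed_misses) / len(alarms)


if __name__ == "__main__":
    # Toy data with placeholder story IDs, not real TDT judgements.
    tracker = {(1, "A"): True, (1, "B"): True, (1, "C"): False, (2, "A"): True}
    judged = {(1, "A"): "YES", (1, "B"): "NO", (2, "A"): "BRIEF"}
    print(candidate_misses(tracker, judged))  # [(1, 'B')]
```

Every pair the first function returns still has to be read by a person. The second function only computes the share of those apparent false alarms that the reading confirmed as missing tags.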
The results of checking about 300 stories make me question the validity
of the evaluation results, since a good 20% of the supposed false alarms
from my tracker are in fact missing tags in the judgements file. I'm
appending the results of my checks on the development data (nwt+asr); I
also have a number of erroneous labelings to report for the training data,
which will follow in a separate message.
Ralf
In my opinion, the following stories should be YES for the indicated event:
39:
VOA19980302.1600.0499
42:
NYT19980312.0354
CNN19980313.1130.0355
CNN19980313.2130.0636 (at least BRIEF)
44:
CNN19980307.1600.0077 (since ABC19980429.1830.0653 is YES)
CNN19980307.1600.0136
CNN19980308.1000.0365
CNN19980310.1600.0283
NYT19980310.0170
CNN19980408.2130.0924
NYT19980412.0094
CNN19980420.2130.0237
CNN19980422.2130.0374
CNN19980429.1130.0250
VOA19980429.1700.1482
VOA19980429.1700.1596
48:
the "unassigned" text after CNN19980331.0130.1560 [there is apparently
no manual transcription of that story]
-- allegations against one of the Jonesboro shooters
56:
CNN19980402.1600.0489
CNN19980402.2130.0077
CNN19980402.2130.0978
CNN19980403.0130.0489
CNN19980408.2130.0378
CNN19980409.0130.0387
57:
APW19980402.1858
63:
NYT19980402.0408
APW19980425.0642
64:
NYT19980429.0478 (another panel formed in response to town-hall meetings)
65:
VOA19980430.1800.1485
The following stories should be BRIEF:
39:
APW19980318.0654
42:
CNN19980312.1130.0000 (also for Event 19)
PRI19980312.2000.0000
NYT19980315.0160
43:
ABC19980319.1830.1708
44: (only if teen smoking is on topic, as in CNN19980105.2000.0191)
CNN19980309.1130.0683
CNN19980309.2130.0442
NYT19980421.0351 (YES if on-topic, BRIEF if not)
VOA19980421.2300.1476
PRI19980427.2000.0102
44:
CNN19980309.1600.0380
CNN19980401.1130.0000
VOA19980409.2100.0000
NYT19980413.0419 (possibly even YES)
CNN19980421.1130.1417
CNN19980427.1600.0030 (possibly even YES)
ABC19980430.1830.0216
48:
ABC19980324.1830.0743
CNN19980325.1130.1698
CNN19980326.1130.1682
ABC19980326.1830.0000
ABC19980326.1830.1438
CNN19980326.2130.1679
VOA19980326.2300.1347 (possibly even YES)
APW19980327.0768
CNN19980327.1130.0000
CNN19980327.1600.1637
CNN19980328.0130.1690
CNN19980328.1130.0000
CNN19980328.1130.0972
CNN19980328.1130.1690
CNN19980329.1000.1651
CNN19980329.1130.0544
CNN19980329.1130.0974
CNN19980331.0130.0427
ABC19980331.1830.0000
PRI19980331.2000.2881
CNN19980331.2130.1698
CNN19980401.0130.1694
VOA19980408.2300.2647 ??
VOA19980409.2300.3321
56:
CNN19980403.1130.1684 ??
CNN19980404.1130.0236
CNN19980404.1300.0047
CNN19980404.1600.0050
CNN19980424.1130.0123 (possibly even YES)
Possible bad boundaries:
VOA19980316.2100.0500 is a continuation of VOA19980316.2100.0422, and
should thus also be YES for Event 39
the split between ABC19980325.1830.1645 and the following "unassigned"
text occurs in the middle of a sentence, and the "unassigned" text
should be YES for Event 48
Other issues:
ABC19980406.1830.0784 and ABC19980406.1830.0922 should either both be
YES or both be NO for Event 48 (they're about gun education and
easy access to guns by children, and the renewed discussion thereof
sparked by the Jonesboro shooting)
Of note:
PRI19980414.2000.2591 and VOA19980415.1700.0773 are about a
*different* group of three Americans being held hostage by
Colombian guerrillas since 1993.... Naturally, they popped up
for Event 56.
--- end of report ---
Last updated Wed Sep 9 09:40:57 1998