To: tdt-distrib@unagi.cis.upenn.edu, graff@unagi.cis.upenn.edu
From: strzalk@schedar.crd.ge.com (Tomek Strzalkowski)
Subject: Re: Clarification about the latest release
Date: Mon, 20 Apr 1998 17:10:45 -0400
I started to take a closer look at some of the TDT2 data, and discovered
some apparent annotation errors. How are story boundaries assigned? Are
there any extra-textual cues used?
--- Tomek
Here is an example where the boundary appears misplaced:
File: 19980106_1600_1630_CNN_HDL.sgm
<DOC>
<DOCNO> CNN19980106.1600.0000 </DOCNO>
<DOCTYPE> MISCELLANEOUS TEXT (manually segmented) </DOCTYPE>
<DATE_TIME> 01/06/1998 16:00:00.00 </DATE_TIME>
<BODY>
<TEXT>
president clinton proposes big changes to medicare that could affect
millions of early retirees.
and nasa hopes to head back to the moon tonight.
from atlanta, this is "cnn headline news."
i'm chuck roberts.
good afternoon. <<<<<---- boundary here???
he first came to attention as half of the husband-and-wife singing team
sonny and cher.
</TEXT>
</BODY>
<END_TIME> 01/06/1998 16:00:16.80 </END_TIME>
</DOC>
<DOC>
<DOCNO> CNN19980106.1600.0016 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 01/06/1998 16:00:16.80 </DATE_TIME>
<BODY>
<TEXT>
after their divorce, sonny bono went on to become a small-town mayor and
then a u.s. congressman.
he died yesterday after apparently skiing into a tree at a resort which
sits on the california/nevada border.
...
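
To review boundaries like this one in bulk, something along the following lines might help. It is only a rough sketch in Python, not part of any TDT2 tooling: the function names (`field`, `parse_docs`, `boundaries`) are made up for illustration, the sample is a trimmed copy of the excerpt above, and the parser assumes every file uses the same tag layout as that excerpt. It lists, for each pair of adjacent segments, the last lines before the boundary and the first line after it, together with the time stamp.

```python
import re

# Trimmed copy of the excerpt above. The second record is left unclosed,
# as in the message, so the parser has to tolerate truncation.
SAMPLE = """<DOC>
<DOCNO> CNN19980106.1600.0000 </DOCNO>
<DOCTYPE> MISCELLANEOUS TEXT (manually segmented) </DOCTYPE>
<DATE_TIME> 01/06/1998 16:00:00.00 </DATE_TIME>
<BODY>
<TEXT>
i'm chuck roberts.
good afternoon.
he first came to attention as half of the husband-and-wife singing team
sonny and cher.
</TEXT>
</BODY>
<END_TIME> 01/06/1998 16:00:16.80 </END_TIME>
</DOC>
<DOC>
<DOCNO> CNN19980106.1600.0016 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 01/06/1998 16:00:16.80 </DATE_TIME>
<BODY>
<TEXT>
after their divorce, sonny bono went on to become a small-town mayor and
then a u.s. congressman.
"""


def field(tag, chunk):
    """Return the stripped content of <tag>...</tag>, or None if absent."""
    m = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", chunk, re.DOTALL)
    return m.group(1) if m else None


def parse_docs(sgml):
    """Split the text on <DOC> and collect the header fields and text lines."""
    docs = []
    for chunk in sgml.split("<DOC>")[1:]:
        # Accept a missing </TEXT> so a truncated final record still parses.
        m = re.search(r"<TEXT>\s*(.*?)\s*(?:</TEXT>|\Z)", chunk, re.DOTALL)
        docs.append({
            "docno": field("DOCNO", chunk),
            "doctype": field("DOCTYPE", chunk),
            "date_time": field("DATE_TIME", chunk),
            "end_time": field("END_TIME", chunk),
            "lines": m.group(1).splitlines() if m else [],
        })
    return docs


def boundaries(docs):
    """For each adjacent pair: (end time, last two lines before, first line after)."""
    return [
        (prev["end_time"], prev["lines"][-2:], nxt["lines"][:1])
        for prev, nxt in zip(docs, docs[1:])
    ]


if __name__ == "__main__":
    for end_time, before, after in boundaries(parse_docs(SAMPLE)):
        print("boundary at", end_time)
        for line in before:
            print("  before:", line)
        for line in after:
            print("  after: ", line)
```

A listing like this only puts the text on either side of each boundary next to each other for manual inspection; it does not decide whether a boundary is misplaced.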
Last updated Wed Sep 9 09:40:47 1998