To: strzalk@schedar.crd.ge.com (Tomek Strzalkowski)
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: Re: Clarification about the latest release
Date: Mon, 20 Apr 1998 17:41:26 EDT

Tomek,

Thanks for the specific example of an apparent misplacement of a
story boundary in the TV broadcast data.

The method of operation we established at the outset for segmenting
TV material was that story boundary marks presented in the
closed-caption signal should be kept as-is as much as possible, taking
care to insert, move or delete the given boundaries only in the cases
where their original placement is egregiously bad.

In the case of radio broadcasts, where transcripts are created by
outside contractors, it is left to the discretion of the professional
transcribers to follow their own common conventions for placing story
boundaries in the text, and the LDC annotators are supposed to simply
identify the time stamps for those boundaries.

The case you cited would, I think, qualify as bad enough to warrant a
correction. We will need to consider how (or whether) this sort of
check on the boundaries can be done across the data that has been
annotated so far, and what sort of schedule should be worked out for
release of improved versions.

Dave Graff


> Date: Mon, 20 Apr 1998 17:10:45 -0400
> From: strzalk@schedar.crd.ge.com (Tomek Strzalkowski)
> Subject: Re: Clarification about the latest release
>
> I started to take a closer look at some of the TDT2 data, and
> discovered some apparent annotation errors. How are story
> boundaries assigned? Are there any extra-textual cues used?
>
> Here is an example where the boundary appears misplaced:
>
> File: 19980106_1600_1630_CNN_HDL.sgm
>
> <DOC>
> <DOCNO> CNN19980106.1600.0000 </DOCNO>
> <DOCTYPE> MISCELLANEOUS TEXT (manually segmented) </DOCTYPE>
> <DATE_TIME> 01/06/1998 16:00:00.00 </DATE_TIME>
> <BODY>
> <TEXT>
> president clinton proposes big changes to medicare that could
> affect millions of early retirees.
> and nasa hopes to head back to the moon tonight.
> from atlanta, this is "cnn headline news."
> i'm chuck roberts.
> good afternoon. <<<<<---- boundary here ???
> he first came to attention as half of the husband-and-wife
> singing team sonny and cher.
> </TEXT>
> </BODY>
> <END_TIME> 01/06/1998 16:00:16.80 </END_TIME>
> </DOC>
> <DOC>
> <DOCNO> CNN19980106.1600.0016 </DOCNO>
> <DOCTYPE> NEWS STORY </DOCTYPE>
> <DATE_TIME> 01/06/1998 16:00:16.80 </DATE_TIME>
> <BODY>
> <TEXT>
> after their divorce, sonny bono went on to become a small-town
> mayor and then a u.s. congressman.
> he died yesterday after apparently skiing into a tree at a resort
> which sits on the california/nevada border.
>
> ...
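
The kind of boundary check discussed above could be supported by a small
script that prints the text on either side of each story boundary for a
human to review. The following is only a sketch under stated assumptions:
it assumes the .sgm files look like the excerpt quoted in this message,
the function names (parse_docs, boundary_contexts) are hypothetical, and
the closing tags of the second story are added just so the truncated
excerpt can be parsed. It does not decide whether a boundary is wrong; it
only lays out the context and checks that one story's END_TIME matches the
next story's DATE_TIME.

```python
import re

# Excerpt taken from the message above. The second story is truncated in the
# message, so its closing tags are ADDED here (an assumption) and it has no
# END_TIME. The reviewer's "boundary here ???" note is left out.
SAMPLE = """\
<DOC>
<DOCNO> CNN19980106.1600.0000 </DOCNO>
<DOCTYPE> MISCELLANEOUS TEXT (manually segmented) </DOCTYPE>
<DATE_TIME> 01/06/1998 16:00:00.00 </DATE_TIME>
<BODY>
<TEXT>
president clinton proposes big changes to medicare that could
affect millions of early retirees.
and nasa hopes to head back to the moon tonight.
from atlanta, this is "cnn headline news."
i'm chuck roberts.
good afternoon.
he first came to attention as half of the husband-and-wife
singing team sonny and cher.
</TEXT>
</BODY>
<END_TIME> 01/06/1998 16:00:16.80 </END_TIME>
</DOC>
<DOC>
<DOCNO> CNN19980106.1600.0016 </DOCNO>
<DOCTYPE> NEWS STORY </DOCTYPE>
<DATE_TIME> 01/06/1998 16:00:16.80 </DATE_TIME>
<BODY>
<TEXT>
after their divorce, sonny bono went on to become a small-town
mayor and then a u.s. congressman.
</TEXT>
</BODY>
</DOC>
"""

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)


def field(block, tag):
    """Return the stripped contents of <tag>...</tag>, or None if absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
    return m.group(1).strip() if m else None


def parse_docs(sgml):
    """Split the text into <DOC> blocks and pull out the fields of interest."""
    docs = []
    for block in DOC_RE.findall(sgml):
        text = field(block, "TEXT") or ""
        docs.append({
            "docno": field(block, "DOCNO"),
            "doctype": field(block, "DOCTYPE"),
            "start": field(block, "DATE_TIME"),
            "end": field(block, "END_TIME"),
            "lines": [ln.strip() for ln in text.splitlines() if ln.strip()],
        })
    return docs


def boundary_contexts(docs, n=2):
    """Yield the n lines before and after each boundary, for human review."""
    for prev, nxt in zip(docs, docs[1:]):
        yield {
            "boundary": (prev["docno"], nxt["docno"]),
            "times_agree": prev["end"] == nxt["start"],
            "before": prev["lines"][-n:],
            "after": nxt["lines"][:n],
        }


if __name__ == "__main__":
    for ctx in boundary_contexts(parse_docs(SAMPLE)):
        print(ctx["boundary"], "times agree:", ctx["times_agree"])
        for ln in ctx["before"]:
            print("  before |", ln)
        for ln in ctx["after"]:
            print("  after  |", ln)
```

On the excerpt, the "before" lines are the Sonny and Cher sentence and the
"after" lines begin "after their divorce", which is the sort of
side-by-side view that makes a misplaced boundary easy to spot by eye.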

Last updated Wed Sep 9 09:40:47 1998