
To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: New bug report on TDT2 text data
Date: Tue, 27 Jul 1999 18:15:50 EDT

Folks,

It was just brought to my attention that the ASR data associated with
this broadcast sample is NOT USABLE:

19980528_1600_1630_CNN_HDL

The problem affects both the Dragon (asr) file and the NIST/BBN (as1)
file for this program. It involves the absence of about 60% of the
program's content from the asr data, together with a mis-alignment of
time-stamps between the asr data and the reference text.

(The problem stemmed from an initial faulty recording of the
broadcast; the bad recording was used for both asr runs, but a
different recording -- redigitized from video tape -- was used to
generate the reference text and do story segmentation for the
complete broadcast.)

I don't know yet whether we can arrange for new asr data to be
generated from the correct version of the audio file. In the
meantime, it may be sufficient to make sure that the asr/as1 files
for this program are removed from any index files that include them.
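
For anyone who wants to script that cleanup, here is a rough sketch in
Python. It rests on an assumption: that an index file is plain text
with one entry per line, and that each entry contains the program ID
together with "asr" or "as1" as a separate token (for example, as a
file extension). The function names are hypothetical, and the matching
will need adjusting if your index files are laid out differently.

```python
import re

# Program ID taken from the bug report above.
BAD_PROGRAM = "19980528_1600_1630_CNN_HDL"
# The two ASR file types reported as unusable for this program.
BAD_TYPES = ("asr", "as1")


def is_bad_entry(line):
    """Return True if the line refers to the asr or as1 file of the bad program.

    Assumption: the file type appears as its own token in the entry,
    e.g. as an extension or a directory name.
    """
    if BAD_PROGRAM not in line:
        return False
    # Split on every non-alphanumeric character so that "asr" and "as1"
    # are only matched as whole tokens.
    tokens = re.split(r"[^A-Za-z0-9]+", line)
    return any(token in BAD_TYPES for token in tokens)


def filter_index(lines):
    """Return the index entries with the bad asr/as1 entries dropped."""
    return [line for line in lines if not is_bad_entry(line)]
```

Entries for the reference text (tkn) of the same program are kept,
since only the asr and as1 data are affected.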

There are some on-topic stories in the reference text (tkn) data for
this program, but any use of the associated asr data would yield bad
results.

Dave Graff

Last updated Thu Aug 19 16:14:47 1999