To: tdt-distrib@ldc.upenn.edu
From: Jon_Yamron@Dragonsys.com
Subject: BBN ASR
Date: Mon, 27 Sep 1999 14:06:08 -0400

I have encountered the following anomaly in the corpus. The file

as1/19980107_1130_1200_CNN_HDL.as1

contains the sequence of lines

<X Bsec=1602.48 Dur=0.23 Conf=NA>
<W recid=3189 Bsec=1602.71 Dur=0.00 Clust=NA Conf=NA>
<X Bsec=1602.71 Dur=2.89 Conf=NA>

In other words, the transcript contains a line indicating the presence of a word
of zero duration, sandwiched between two short pauses. No word actually appears
at the end of that line, although our format requires one.

I don't know if there are other examples, but it didn't take long for our parser
to find (and break on) this one...
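
For anyone who wants to check for further instances, here is a minimal sketch of
a scan, not our actual parser. It assumes every record has the shape
"<TAG key=value ...> word", as in the lines quoted above. The names LINE_RE and
find_anomalies are hypothetical, chosen only for this illustration.

```python
import re

# Assumed record shape: <TAG key=value ...> optional_word
LINE_RE = re.compile(r"^<(?P<tag>\w+)\s+(?P<attrs>[^>]*)>\s*(?P<word>.*)$")


def find_anomalies(lines):
    """Return (line_number, reasons, text) for suspicious <W ...> records.

    A <W> record is flagged if no word follows the closing '>' or if its
    Dur attribute is exactly zero (or cannot be read as a number).
    """
    anomalies = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        m = LINE_RE.match(text)
        if m is None or m.group("tag") != "W":
            continue  # only word records are checked here
        # Split "key=value" pairs; tokens without '=' are ignored.
        attrs = dict(
            pair.split("=", 1) for pair in m.group("attrs").split() if "=" in pair
        )
        reasons = []
        if not m.group("word"):
            reasons.append("missing word")
        try:
            if float(attrs.get("Dur", "nan")) == 0.0:
                reasons.append("zero duration")
        except ValueError:
            reasons.append("unparseable Dur")
        if reasons:
            anomalies.append((lineno, ", ".join(reasons), text))
    return anomalies
```

Calling find_anomalies(open(path)) on an .as1 file returns one tuple per flagged
line. Whether such records should be skipped or treated as errors is left to the
caller.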

- Jon


Last updated Tue Sep 28 10:38:17 1999