(114) previous ~ index ~ next
To: Jon Yamron <Jon@dragonsys.com>
From: "Charles L. Wayne" <clwayne@afterlife.ncsc.mil>
Subject: Re: More Problems with TDT2 devset data -- follow-up -Reply
Date: Wed, 5 Aug 1998 20:58:00 -0400 (EDT)
This is clearly the right thing to do. Please check those items quickly.
On Wed, 5 Aug 1998, Jon Yamron wrote:
> We should identify the audio disks and see what's going on. I know there were some early
> cases in which shows were not recorded properly. If that is not the case here, we should
> re-recognize the show.
>
> - Jon
>
> >>> David Graff <graff@unagi.cis.upenn.edu> 08/04/98 04:45pm >>>
>
> Regarding the earlier report about boundary-table records for "NEWS" stories that
> appeared to contain no "recid" information (hence no words): there were a total of 34
> such cases in the ASR boundary tables of the two releases, as I described in my
> previous message.
> These appear to be cases where the ASR system simply failed to produce output over the
> duration of the given story. (I haven't checked every case, but this is true for the ones I've
> checked.)
>
> There are also 9 boundary records in the "bndtkn" files -- 8 in the training release, 1 in the
> devtest release -- that have "doctype=NEWS" but lack "recid" information. These 9 cases
> should all have been marked as "MISCELLANEOUS" (or possibly as
> "UNTRANSCRIBED") rather than as "NEWS" -- they really do not have any text content.
> (Three of the cases, from APW, actually do contain weather information, but they also
> contain some anomaly that caused them to be mis-parsed by the newswire conditioning
> filter -- they should have been eliminated as rejects.)
>
> Here is the listing of these 9 erroneous entries from the "bndtkn" files -- "doctype" should be
> "MISCELLANEOUS" in all cases:
>
> tdt_deliv_980522/tables/19980104_1600_1630_CNN_HDL.bndtkn:<BOUNDARY
> docno=CNN19980104.1600.1304 doctype=NEWS Bsec=1304.89 Esec=1316.29>
> tdt_deliv_980522/tables/19980106_1600_1630_CNN_HDL.bndtkn:<BOUNDARY
> docno=CNN19980106.1600.1348 doctype=NEWS Bsec=1348.07 Esec=1355.54>
> tdt_deliv_980522/tables/19980115_2130_2200_CNN_HDL.bndtkn:<BOUNDARY
> docno=CNN19980115.2130.1163 doctype=NEWS Bsec=1163.35 Esec=1173.50>
> tdt_deliv_980522/tables/19980118_1743_1920_APW_ENG.bndtkn:<BOUNDARY
> docno=APW19980118.0856 doctype=NEWS>
> tdt_deliv_980522/tables/19980120_2300_2400_VOA_TDY.bndtkn:<BOUNDARY
> docno=VOA19980120.2300.1691 doctype=NEWS Bsec=1691.54 Esec=1786.66>
> tdt_deliv_980522/tables/19980205_1106_1135_APW_ENG.bndtkn:<BOUNDARY
> docno=APW19980205.0916 doctype=NEWS>
> tdt_deliv_980522/tables/19980206_1600_1630_CNN_HDL.bndtkn:<BOUNDARY
> docno=CNN19980206.1600.1401 doctype=NEWS Bsec=1401.82 Esec=1414.26>
> tdt_deliv_980522/tables/19980208_1643_1851_APW_ENG.bndtkn:<BOUNDARY
> docno=APW19980208.0822 doctype=NEWS>
>
> tdt_deliv_980708/tables/19980331_1700_1800_VOA_TDY.bndtkn:<BOUNDARY
> docno=VOA19980331.1700.2954 doctype=NEWS Bsec=2954.97 Esec=2961.88>
>
>
> Dave Graff
>
>
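The check described above (a boundary record with doctype=NEWS but no recid information) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tooling used to produce the listing. It assumes each record is a `<BOUNDARY ...>` tag with space-separated `name=value` attributes, as in the listing. Because the exact recid attribute names are not given in this thread, it treats any attribute whose name contains "recid" as recid information. The function names `news_without_recid` and `scan_tables` are hypothetical, and so are the `Brecid`/`Erecid` attributes in the sample.

```python
import re
from pathlib import Path

# One <BOUNDARY ...> tag; [^>]* also spans a tag that is wrapped across lines.
BOUNDARY_RE = re.compile(r"<BOUNDARY\b([^>]*)>")
# name=value pairs inside the tag, e.g. docno=CNN19980104.1600.1304
ATTR_RE = re.compile(r"(\w+)=([^\s>]+)")


def news_without_recid(text):
    """Return the docno of every BOUNDARY record that has doctype=NEWS
    but no attribute whose name contains 'recid' (assumed naming)."""
    hits = []
    for match in BOUNDARY_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        has_recid = any("recid" in name.lower() for name in attrs)
        if attrs.get("doctype") == "NEWS" and not has_recid:
            hits.append(attrs.get("docno", "?"))
    return hits


def scan_tables(root):
    """Print 'path: docno' for each suspect record in <root>/*/tables/*.bndtkn."""
    for path in sorted(Path(root).glob("*/tables/*.bndtkn")):
        for docno in news_without_recid(path.read_text(errors="replace")):
            print(f"{path}: {docno}")


# Usage on an in-memory sample; the second record's recid attributes are hypothetical.
sample = (
    "<BOUNDARY docno=CNN19980104.1600.1304 doctype=NEWS Bsec=1304.89 Esec=1316.29>\n"
    "<BOUNDARY docno=HYPOTHETICAL.0001 doctype=NEWS Brecid=1 Erecid=42>\n"
)
print(news_without_recid(sample))  # ['CNN19980104.1600.1304']
```

To scan local copies of both releases, call `scan_tables()` on the directory that holds `tdt_deliv_980522/` and `tdt_deliv_980708/`. Matching on the substring "recid" instead of exact attribute names keeps the sketch independent of the table layout, at the cost of possibly accepting an unrelated attribute that happens to contain that string.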
Last updated Wed Sep 9 09:40:54 1998