To: "Paul van Mulbregt" <paulvm@dragonsys.com>
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: Re: TDT2 ASR February 3rd through 15th
Date: Wed, 11 Nov 1998 10:52:15 EST
Paul,
Thanks for pointing out the problem with the "fileid=" values in the
first lines of some asrtext files, and for the thorough listing of
files that are affected.
Folks,
The problem that Paul noticed involves a "clerical" error on the
initial line of each asrtext file for data between 19980203 and
19980215, inclusive. The following example, from the first file in
this set (19980203_0130_0200_CNN_HDL.asr), shows the nature of the
problem:
<DOCSET type=ASRTEXT fileid=022_19980200>
It should be:
<DOCSET type=ASRTEXT fileid=19980203_0130_0200_CNN_HDL>
A total of 64 asrtext files are affected. The problem does not affect
the corresponding boundary table files (*.bndasr), or any other points
in the corpus that refer to data in this period.
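For anyone who wants to check or repair local copies by hand, the correction can be sketched as below. This is only an illustration, not the script used to build the patch: it assumes the correct fileid is simply the file name with the ".asr" extension removed (as in the example above), and the function names are made up for this sketch.

```python
import re
from pathlib import Path

# Matches an initial line such as: <DOCSET type=ASRTEXT fileid=022_19980200>
DOCSET_RE = re.compile(r"^<DOCSET type=ASRTEXT fileid=\S+>\s*$")


def in_affected_range(filename):
    """True if the file name's leading date is 19980203..19980215 inclusive."""
    date = Path(filename).name[:8]
    return date.isdigit() and 19980203 <= int(date) <= 19980215


def expected_docset_line(filename):
    """Build the initial line, assuming fileid is the file name minus '.asr'."""
    name = Path(filename).name
    if name.endswith(".asr"):
        name = name[: -len(".asr")]
    return f"<DOCSET type=ASRTEXT fileid={name}>"


def fix_first_line(text, filename):
    """Return text with its first line replaced by the expected DOCSET line."""
    first, sep, rest = text.partition("\n")
    if not DOCSET_RE.match(first):
        raise ValueError(f"unexpected first line in {filename}: {first!r}")
    return expected_docset_line(filename) + sep + rest
```

Only the first line is touched; the rest of the file is passed through unchanged.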
Paul van Mulbregt indicated that having a bad value for the fileid
attribute caused problems when trying to run scoring on these files.
I have prepared a patch to replace them, which you can get now via ftp:
[ftp instructions available on request from graff@ldc.upenn.edu]
The "Veterans' Day" patch file contains only the 64 asrtext files
affected by this bug report.
As with previous patches, I have also prepared a "latest" tar file,
which contains all tables (none of which have changed since the
previous patch of Oct. 28), together with all the asrtext files whose
content has changed since the Oct. 6 cdrom publication.
The file sizes are:
tdt_patch_981111.tar.gz 4578928 bytes
tdt_tables_latest.tar.gz 6310694 bytes
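If you want to confirm that a transfer completed, one option is to compare the downloaded files against the sizes listed above. The following is a rough sketch; the function names and the idea of a single download directory are assumptions for illustration only.

```python
import os

# Byte counts as listed in this message.
EXPECTED_SIZES = {
    "tdt_patch_981111.tar.gz": 4578928,
    "tdt_tables_latest.tar.gz": 6310694,
}


def size_mismatches(actual_sizes):
    """Return {name: (expected, actual)} for files whose size differs.

    actual_sizes maps file name to size in bytes; a missing file is
    reported with an actual size of None.
    """
    problems = {}
    for name, expected in EXPECTED_SIZES.items():
        actual = actual_sizes.get(name)
        if actual != expected:
            problems[name] = (expected, actual)
    return problems


def sizes_in_directory(directory):
    """Collect on-disk sizes for the expected files found in directory."""
    sizes = {}
    for name in EXPECTED_SIZES:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            sizes[name] = os.path.getsize(path)
    return sizes
```

Calling size_mismatches(sizes_in_directory(".")) returns an empty dictionary when both files are present with the listed sizes. A matching size does not guarantee an intact file, but a mismatch usually points to a truncated or ASCII-mode transfer.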
As always, let me know if you have any questions or problems.
Dave Graff
Last updated Fri Dec 4 12:05:49 1998