
To: David Graff <graff@unagi.cis.upenn.edu>
From: Ron Papka <papka@dandenong.cs.umass.edu>
Subject: Potential TDT2 ASR fileid problem beyond: February 3rd through
Date: Wed, 18 Nov 1998 12:04:44 -0500 (EST)

Dave,

File 19980217_0130_0200_CNN_HDL.asr has the following header in
our data:

<DOCSET type=ASRTEXT fileid=19980218_0130_0200_CNN_HDL>

I'm using this fileid from the .asr file as opposed to the Operating
System's filename.

Can we rely on this fileid field, or should we be relying
on another source?
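
For reference, here is a minimal sketch of the kind of check that turns up
this mismatch. It assumes every .asr file begins with a DOCSET line in the
form shown above, and that the expected fileid is simply the filename
without its .asr extension. The function names and the latin-1 encoding
are illustrative assumptions, not part of any TDT2 tooling.

```python
import re
from pathlib import Path

# Header pattern taken from the example lines quoted in this thread;
# real files may vary, so treat this as an assumption.
DOCSET_RE = re.compile(r"<DOCSET\s+type=ASRTEXT\s+fileid=([^\s>]+)\s*>")


def header_fileid(first_line):
    """Return the fileid value from a DOCSET header line, or None if absent."""
    match = DOCSET_RE.search(first_line)
    return match.group(1) if match else None


def fileid_matches(filename, first_line):
    """True when the header fileid equals the filename minus its extension."""
    return header_fileid(first_line) == Path(filename).stem


def find_mismatches(directory):
    """Yield (filename, header fileid) for each .asr file whose header disagrees."""
    for path in sorted(Path(directory).glob("*.asr")):
        # Encoding is a guess; only the first line is read.
        with open(path, encoding="latin-1") as handle:
            first_line = handle.readline()
        if not fileid_matches(path.name, first_line):
            yield path.name, header_fileid(first_line)
```

Running find_mismatches over the asrtext directory would list each file
whose header and filename disagree, along with the fileid found in the
header.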


BTW, we loaded patch tdt_patch_981111.tar.gz, and did not have
a problem with the range of files specified below.

Ron


On Wed, 11 Nov 1998, David Graff wrote:

>
> Paul,
>
> Thanks for pointing out the problem with the "fileid=" values in the
> first lines of some asrtext files, and for the thorough listing of
> files that are affected.
>
> Folks,
>
> The problem that Paul noticed involves a "clerical" error on the
> initial line of each asrtext file for data between 19980203 and
> 19980215, inclusive. The following example, from the first file in
> this set (19980203_0130_0200_CNN_HDL.asr), shows the nature of the
> problem:
>
> <DOCSET type=ASRTEXT fileid=022_19980200>
>
> It should be:
>
> <DOCSET type=ASRTEXT fileid=19980203_0130_0200_CNN_HDL>
>
> A total of 64 asrtext files are affected. The problem does not affect
> the corresponding boundary table files (*.bndasr), or any other points
> in the corpus that refer to data in this period.
>
> Paul van Mulbregt indicated that having a bad value for the fileid
> attribute caused problems when trying to run scoring on these files.
> I have prepared a patch to replace them, which you can get now via ftp:
>
> [ftp instructions available on request from graff@ldc.upenn.edu]
>
> The "Veterans' Day" patch file contains only the 64 asrtext files
> affected by this bug report.
>
> As with previous patches, I have also prepared a "latest" tar file,
> which contains all tables (none of which have changed since the
> previous patch of Oct. 28), together with all the asrtext files whose
> content has changed since the Oct. 6 cdrom publication.
>
> The file sizes are:
>
> 	tdt_patch_981111.tar.gz		4578928 bytes
> 	tdt_tables_latest.tar.gz	6310694 bytes
>
> As always, let me know if you have any questions or problems.
>
> Dave Graff
>
>

Last updated Fri Dec 4 12:05:49 1998