
To: tdt-distrib@unagi.cis.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: Re: index files for devtest
Date: Wed, 02 Sep 1998 21:50:36 -0400

TDTers,

I think that Mike's concerns about the makeup of the corpus are valid.
I looked at the ASR output, where the CNN_HDL entries include both asr
and tkn files.

The only tkn file included is:

tkntext/19980424_1600_1630_CNN_HDL.tkn

When I look at the set of tables in the data itself, I see:

19980424_2130_2200_CNN_HDL.bndasr
19980424_2130_2200_CNN_HDL.bndtkn

19980424_1600_1630_CNN_HDL.bndtkn

So the 2130-2200 file has both asr and text, but the 1600-1630 has
only text. Does that mean that the index file is "falling back" to
text when there is no asr data?

I haven't looked at any of the other questionable cases, but perhaps
that's the explanation. If so, does that give you a workaround, Mike?
That is, use the tkn file when there isn't an appropriate asr file?
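
Here is a minimal sketch of that workaround, assuming the boundary table
names listed above and that you can check which files exist by name. The
helper name pick_boundary_table and the "available" set are made up for
illustration; they are not part of any TDT tooling.

```python
def pick_boundary_table(show_id, available):
    """Return the asr table for a show if present, else the tkn table.

    show_id   -- e.g. "19980424_1600_1630_CNN_HDL" (no extension)
    available -- set of file names known to exist in the data
    Returns None when neither table exists.
    """
    # Preference order: asr first, tkn only as the fallback.
    for ext in ("bndasr", "bndtkn"):
        name = f"{show_id}.{ext}"
        if name in available:
            return name
    return None


# The three tables mentioned in this message.
available = {
    "19980424_2130_2200_CNN_HDL.bndasr",
    "19980424_2130_2200_CNN_HDL.bndtkn",
    "19980424_1600_1630_CNN_HDL.bndtkn",
}

for show in ("19980424_2130_2200_CNN_HDL", "19980424_1600_1630_CNN_HDL"):
    print(show, "->", pick_boundary_table(show, available))
```

With those three files, the 2130-2200 show resolves to its asr table and
the 1600-1630 show falls back to its tkn table.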
			-- james


Last updated Wed Sep 9 09:40:55 1998