To: tdt-distrib@unagi.cis.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: Re: index files for devtest
Date: Wed, 02 Sep 1998 21:50:36 -0400
TDTers,
I think that Mike's concerns about the makeup of the corpus are valid.
I looked at the ASR output, where the CNN_HDL entries include both asr
and tkn files.
The only tkn file included is:
tkntext/19980424_1600_1630_CNN_HDL.tkn
When I look at the set of tables in the data itself, I see:
19980424_2130_2200_CNN_HDL.bndasr
19980424_2130_2200_CNN_HDL.bndtkn
19980424_1600_1630_CNN_HDL.bndtkn
So the 2130-2200 broadcast has both asr and tkn (text) files, but the
1600-1630 broadcast has only the tkn file. Does that mean that the index
file is "falling back" to text when there is no asr data?
I haven't looked at any of the other questionable cases, but perhaps
that's the explanation. If so, does that give you a workaround, Mike?
That is, use the tkn file when there isn't an appropriate asr file?
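Here is a minimal sketch of that fallback, in case it helps. It assumes the .bndasr/.bndtkn naming shown above; pick_source_files is a hypothetical helper of mine, not part of any TDT tool, and I am not claiming this is how the index files are actually built.

```python
def pick_source_files(filenames):
    """Choose one file per broadcast: the asr file if present, else the tkn file.

    Assumes names of the form <broadcast>.bndasr / <broadcast>.bndtkn,
    as in the listing above. Hypothetical helper, for illustration only.
    """
    by_broadcast = {}
    for name in filenames:
        stem, _, ext = name.rpartition(".")
        if ext in ("bndasr", "bndtkn"):
            by_broadcast.setdefault(stem, {})[ext] = name

    chosen = {}
    for stem, found in sorted(by_broadcast.items()):
        # Prefer asr; fall back to tkn when no asr file exists.
        chosen[stem] = found.get("bndasr", found.get("bndtkn"))
    return chosen


files = [
    "19980424_2130_2200_CNN_HDL.bndasr",
    "19980424_2130_2200_CNN_HDL.bndtkn",
    "19980424_1600_1630_CNN_HDL.bndtkn",
]
for broadcast, path in pick_source_files(files).items():
    print(broadcast, "->", path)
```

With the three files listed above, the 2130-2200 broadcast would resolve to its asr file and the 1600-1630 broadcast to its tkn file.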
-- james
Last updated Wed Sep 9 09:40:55 1998