To: James Allan <allan@cs.umass.edu>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: index files for devtest
Date: Thu, 03 Sep 1998 11:01:50 -0400
James,
James Allan wrote:
>
> TDTers,
>
> I think that Mike's concerns about the makeup of the corpus are valid.
> I looked at the ASR output where CNN_HDL includes both asr and tkn
> files.
>
> The only tkn file included is:
>
> tkntext/19980424_1600_1630_CNN_HDL.tkn
>
> When I look at the set of tables in the data itself, I see:
>
> 19980424_2130_2200_CNN_HDL.bndasr
> 19980424_2130_2200_CNN_HDL.bndtkn
>
> 19980424_1600_1630_CNN_HDL.bndtkn
>
> So the 2130-2200 file has both asr and text, but the 1600-1630 has
> only text. Does that mean that the index file is "falling back" to
> text when there is no asr data?
Yes. When a show has no ASR data, the index file falls back to its
text (tkn) data instead.
> I haven't looked at any of the other questionable cases, but perhaps
> that's the explanation. If so, does that give you a workaround, Mike?
> That is, use the tkn file when there isn't an appropriate asr file?
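For anyone scripting that workaround, here is a minimal sketch of the
fallback rule, not the actual index-file code. The function name
pick_source and the ".asr" extension are assumptions for illustration
(only the ".tkn" name appears above); adjust them to match the files
you actually have.

```python
def pick_source(show_id, available, asr_ext=".asr", tkn_ext=".tkn"):
    """Pick the file to use for one show: ASR if present, else text.

    show_id   -- base name such as "19980424_1600_1630_CNN_HDL"
    available -- collection of file names that exist in the corpus
    asr_ext   -- assumed extension for ASR files (hypothetical)
    tkn_ext   -- extension for token (text) files
    """
    asr_name = show_id + asr_ext
    tkn_name = show_id + tkn_ext
    if asr_name in available:
        return asr_name   # preferred: ASR output exists
    if tkn_name in available:
        return tkn_name   # fallback: only text exists
    return None           # neither source is available


# Mirrors the case in this thread: 2130-2200 has both, 1600-1630 has text only.
files = {
    "19980424_2130_2200_CNN_HDL.asr",
    "19980424_2130_2200_CNN_HDL.tkn",
    "19980424_1600_1630_CNN_HDL.tkn",
}
for show in ("19980424_2130_2200_CNN_HDL", "19980424_1600_1630_CNN_HDL"):
    print(show, "->", pick_source(show, files))
```

Passing the set of available names in, rather than checking the disk
inside the function, keeps the rule easy to test against a file listing.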
Jon
--
Jon Fiscus
NIST
Email: jfiscus@nist.gov
Phone: (301) 975-3182
Last updated Wed Sep 9 09:40:55 1998