
To: Thomas C Pierce <tp26+@andrew.cmu.edu>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: oopsie!
Date: Thu, 03 Dec 1998 10:05:55 -0500

Folks,

I've verified that the second problem exists only for topic 88, and I've
also verified that this problem exists in the dev-test index files...

I'll fix the problems today, build a new release of index files and make
them available via anonymous ftp later today....

Jon

Thomas C Pierce wrote:
>
> hi folks,
>
> i have a heads-up for everyone. i'm sending this out 'en masse' so
> people can (if they want) do some quick fixes locally, while we await a
> standard fix.
>
> i found some problems with the index files circulated for the test set.
> for one, the files contain many duplicate lines. for example:
>
> % grep CNN19980517.1000.0921 trk_nwt+asr_98.ndx
> # Non_topic_training_story CNN19980517.1000.0921
> asrtext/19980517_1000_1030_CNN_HDL.asr 2269 2299
> # Non_topic_training_story CNN19980517.1000.0921
> asrtext/19980517_1000_1030_CNN_HDL.asr 2269 2299
>
> the other problem i found is a bit more serious. for event 88 all "on
> topic" training stories are also listed as off-topic training stories.
>
> % grep APW19980506.1942 trk_nwt+asr_88.ndx
> # Topic_training_story APW19980506.1942
> text/19980506_2159_2347_APW_ENG.tkn 4638 5101
> # Non_topic_training_story APW19980506.1942
> text/19980506_2159_2347_APW_ENG.tkn 4638 5101
>
> hopefully, it can be verified that this didn't happen anywhere else, and
> that "BRIEFS" were not similarly included as off-topic training material.
>
> -tom
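
For anyone attempting the quick local fix Tom mentions, the two cleanups can be sketched in Python roughly as follows. This is only a sketch under assumptions, not the official fix: it assumes an index file consists solely of two-line records (a "# <label> <story id>" line followed by a "<file> <start> <end>" line, as in the grep output above), and it assumes the right repair for topic 88 is to drop the off-topic copy of any story that is also listed as on-topic. The function names (parse_entries, clean_entries, format_entries) are made up for illustration.

```python
from typing import Iterable, List, Tuple

# One index record: (label, story id, location line).
Entry = Tuple[str, str, str]


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Pair each '# <label> <story id>' line with the location line after it.

    Assumption: the file holds nothing but such two-line records.
    """
    entries: List[Entry] = []
    label = story = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line[1:].split()
            if len(parts) != 2:
                raise ValueError(f"unexpected header line: {line!r}")
            label, story = parts
        else:
            if label is None:
                raise ValueError(f"location line without header: {line!r}")
            entries.append((label, story, line))
            label = story = None
    return entries


def clean_entries(entries: List[Entry]) -> List[Entry]:
    """Drop exact duplicates, and off-topic copies of on-topic stories."""
    on_topic = {story for label, story, _ in entries
                if label == "Topic_training_story"}
    seen = set()
    cleaned: List[Entry] = []
    for entry in entries:
        label, story, _ = entry
        if entry in seen:
            continue  # exact duplicate record
        if label == "Non_topic_training_story" and story in on_topic:
            continue  # same story is already listed as on-topic
        seen.add(entry)
        cleaned.append(entry)
    return cleaned


def format_entries(entries: List[Entry]) -> List[str]:
    """Turn records back into index-file lines, preserving order."""
    out: List[str] = []
    for label, story, location in entries:
        out.append(f"# {label} {story}")
        out.append(location)
    return out
```

Reading a file's lines, passing them through parse_entries, clean_entries and format_entries in turn, and writing the result to a new file keeps the surviving records in their original order. If the real index files contain other kinds of lines, the parser will raise an error rather than guess.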

--
Jonathan Fiscus
NIST
Phone: (301) 975-3182
Email: jfiscus@nist.gov
Snailmail: Nat'l Inst. of Stds. and Tech.
           100 Bureau Dr. Stop 8940
           Gaithersburg, MD 20899-8940

Last updated Fri Dec 4 12:05:50 1998