
To: Thomas C Pierce <tp26+@andrew.cmu.edu>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: oopsie!
Date: Thu, 03 Dec 1998 09:44:45 -0500

Folks,

I've verified that this problem exists in a number of the evaluation
tracking index files, and that it occurs in the devtest index files as
well. So the problem has apparently been around for a while, but no one
detected it.

I'm looking deeper into this problem and I'll release new index files.

Jon

Thomas C Pierce wrote:
>
> hi folks,
>
> i have a heads-up for everyone. i'm sending this out 'en masse' so
> people can (if they want) do some quick fixes locally, while we await a
> standard fix.
>
> i found some problems with the index files circulated for the test set.
> for one, the files contain many duplicate lines. for example:
>
> % grep CNN19980517.1000.0921 trk_nwt+asr_98.ndx
> # Non_topic_training_story CNN19980517.1000.0921
> asrtext/19980517_1000_1030_CNN_HDL.asr 2269 2299
> # Non_topic_training_story CNN19980517.1000.0921
> asrtext/19980517_1000_1030_CNN_HDL.asr 2269 2299
>
> the other problem i found is a bit more serious. for event 88 all "on
> topic" training stories are also listed as off-topic training stories.
>
> % grep APW19980506.1942 trk_nwt+asr_88.ndx
> # Topic_training_story APW19980506.1942
> text/19980506_2159_2347_APW_ENG.tkn 4638 5101
> # Non_topic_training_story APW19980506.1942
> text/19980506_2159_2347_APW_ENG.tkn 4638 5101
>
> hopefully, it can be verified that this didn't happen anywhere else, and
> that "BRIEFS" were not similarly included as off-topic training material.
>
> -tom
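
For anyone who wants to check their local copies in the meantime, here
is a rough sketch of a checker for the two problems Tom describes. It
assumes an index file is a sequence of "# <label> <story_id>" header
lines, each followed by one location line, which is only what the grep
output above shows; the real .ndx format may carry more than that. The
function names parse_records and check_index are made up for this
example and are not part of any released tool.

```python
from collections import Counter, defaultdict

ON_TOPIC = "Topic_training_story"
OFF_TOPIC = "Non_topic_training_story"


def parse_records(lines):
    """Yield (label, story_id, location) for each header/location pair.

    Assumption: every '# <label> <story_id>' line is followed by exactly
    one location line, as in the grep output quoted in this message.
    """
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line[1:].split()
            header = (parts[0], parts[1]) if len(parts) >= 2 else None
        elif header is not None:
            yield header[0], header[1], line
            header = None


def check_index(lines):
    """Return (duplicates, conflicts) found in one index file.

    duplicates: records that appear more than once, verbatim.
    conflicts:  story ids listed as both on-topic and off-topic training.
    """
    records = list(parse_records(lines))
    counts = Counter(records)
    duplicates = sorted(rec for rec, n in counts.items() if n > 1)

    labels_by_story = defaultdict(set)
    for label, story_id, _location in records:
        labels_by_story[story_id].add(label)
    conflicts = sorted(
        story_id
        for story_id, labels in labels_by_story.items()
        if ON_TOPIC in labels and OFF_TOPIC in labels
    )
    return duplicates, conflicts
```

Calling check_index on an open file handle (for example one opened on
trk_nwt+asr_88.ndx) returns the two lists. This only reports problems;
deciding which of two conflicting entries is correct still needs the
original topic judgments.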

--
Jonathan Fiscus
NIST
Phone: (301) 975-3182
Email: jfiscus@nist.gov

Snailmail: Nat'l Inst. of Stds. and Tech.
           100 Bureau Dr. Stop 8940
           Gaithersburg, MD 20899-8940

Last updated Fri Dec 4 12:05:50 1998