To: Thomas C Pierce <tp26+@andrew.cmu.edu>,
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: oopsie!
Date: Thu, 03 Dec 1998 09:47:41 -0500

I have only verified the first problem; I'm still looking at the second one.


Jonathan Fiscus wrote:
>
> Folks,
>
> I've verified that this problem exists in a number of the evaluation
> tracking index files, AND I've also verified that this occurs in the
> devtest index files as well so apparently the problem has been around
> for a while, but no one detected it.
>
> I'm looking deeper into this problem and I'll release new index files.
>
> Jon
>
> Thomas C Pierce wrote:
> >
> > hi folks,
> >
> > i have a heads-up for everyone. i'm sending this out 'en masse' so
> > people can (if they want) do some quick fixes locally, while we await a
> > standard fix.
> >
> > i found some problems with the index files circulated for the test set.
> > for one, the files contain many duplicate lines. for example:
> >
> > % grep CNN19980517.1000.0921 trk_nwt+asr_98.ndx
> > # Non_topic_training_story CNN19980517.1000.0921
> > asrtext/19980517_1000_1030_CNN_HDL.asr 2269 2299
> > # Non_topic_training_story CNN19980517.1000.0921
> > asrtext/19980517_1000_1030_CNN_HDL.asr 2269 2299
> >
> > the other problem i found is a bit more serious. for event 88 all "on
> > topic" training stories are also listed as off-topic training stories.
> >
> > % grep APW19980506.1942 trk_nwt+asr_88.ndx
> > # Topic_training_story APW19980506.1942
> > text/19980506_2159_2347_APW_ENG.tkn 4638 5101
> > # Non_topic_training_story APW19980506.1942
> > text/19980506_2159_2347_APW_ENG.tkn 4638 5101
> >
> > hopefully, it can be verified that this didn't happen anywhere else, and
> > that "BRIEFS" were not similarly included as off-topic training material.
> >
> > -tom
>
> --
> Jonathan Fiscus Snailmail: Nat'l Inst. of Stds. and Tech.
> NIST 100 Bureau Dr. Stop 8940
> Phone: (301) 975-3182 Gaithersburg, MD 20899-8940
> Email: jfiscus@nist.gov

--
Jonathan Fiscus          Snailmail: Nat'l Inst. of Stds. and Tech.
NIST                                100 Bureau Dr. Stop 8940
Phone: (301) 975-3182               Gaithersburg, MD 20899-8940
Email: jfiscus@nist.gov
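
For anyone wanting a quick local check while the corrected index files are pending, the two problems described above (duplicate entries, and stories listed as both on-topic and off-topic training material) could be detected with something like the following. This is only a rough sketch, not the official fix. It assumes each entry is a "# <label> <story_id>" line followed by a single location line, as the quoted grep output suggests; the real .ndx layout may differ. The function names parse_index, dedupe, and conflicting_stories are made up for illustration.

```python
from collections import defaultdict


def parse_index(lines):
    """Yield (label, story_id, location) records from index-file lines.

    Assumption: a record is a '# <label> <story_id>' line followed by one
    location line. Other '#' lines and stray location lines are skipped.
    """
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line[1:].split()
            # Only treat the line as a record header if it has label + id.
            header = (parts[0], parts[1]) if len(parts) >= 2 else None
        elif header is not None:
            yield header[0], header[1], line
            header = None


def dedupe(records):
    """Drop exact duplicate records, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


def conflicting_stories(records):
    """Map each story id carrying more than one label to its sorted labels."""
    labels = defaultdict(set)
    for label, story_id, _location in records:
        labels[story_id].add(label)
    return {sid: sorted(found) for sid, found in labels.items() if len(found) > 1}


if __name__ == "__main__":
    import sys

    # Usage: python check_ndx.py trk_nwt+asr_88.ndx
    with open(sys.argv[1]) as handle:
        records = list(parse_index(handle))
    unique = dedupe(records)
    print(f"{len(records) - len(unique)} duplicate record(s)")
    for story_id, found in conflicting_stories(unique).items():
        print(f"{story_id}: {', '.join(found)}")
```

The script only reports conflicts; it does not decide which label is correct for a story, since that depends on the corrected index files.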

Last updated Fri Dec 4 12:05:50 1998