
To: "G. Bowden Wise" <wisegb@crd.ge.com>
From: Jon Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: MISC Files in Tracking Output
Date: Fri, 04 Sep 1998 20:05:06 -0400

Folks,

George will be sending a more detailed message soon, but I feel compelled to
respond briefly now. The tracking eval software, like the other evaluation
programs, expects simply this: a judgement for every word after the
final training story, regardless of any auxiliary information, such as story
type, that systems might be privy to because of the corpus's construction.
In fact, evaluation systems should not be using any information other than
that stated in the eval spec, namely:

the index files
the tokenized text file (indicated by the index file), and
story boundary information if given (i.e. beginning and ending times or
recids)

Conditioning your output based on the doctype is not in the "style" of
evaluation that the test is trying to duplicate. An operational system
probably won't give you that information, so the evaluation doesn't give
that as side information. We decided at a previous meeting that while systems
are expected to 'track' miscellaneous stories, the evaluation software won't
score them.
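
To make the expectation concrete, here is a rough Python sketch, not the
actual TDT2trk.pl logic. It assumes a hypothetical Story record (an id plus
tokenized text) and a hypothetical score_fn supplied by the tracking system;
the real trackout file format is defined by the eval spec and is not
reproduced here. The point is only that every test story gets a decision and
that the doctype is never consulted.

```python
from typing import Callable, Iterable, List, NamedTuple


class Story(NamedTuple):
    """Hypothetical test story: an id from the index file plus its text."""
    story_id: str
    text: str


class Decision(NamedTuple):
    """Hypothetical decision record: YES/NO judgement and a score."""
    story_id: str
    on_topic: bool
    score: float


def track_all(stories: Iterable[Story],
              score_fn: Callable[[str], float],
              threshold: float = 0.5) -> List[Decision]:
    """Emit one decision per story. No doctype (NEWS/MISC) is used,
    so miscellaneous stories are tracked like any other story."""
    decisions = []
    for story in stories:
        score = score_fn(story.text)
        decisions.append(Decision(story.story_id, score >= threshold, score))
    return decisions


def missing_decisions(stories: Iterable[Story],
                      decisions: Iterable[Decision]) -> List[str]:
    """Return ids of stories with no decision, in story order."""
    decided = {d.story_id for d in decisions}
    return [s.story_id for s in stories if s.story_id not in decided]
```

Running missing_decisions over your own output before submitting is a cheap
way to catch the "some files were missing" complaint ahead of the eval
software.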

Jon

G. Bowden Wise wrote:

> Sorry to spawn yet another e-mail to everyone :/
>
> I wrote:
>
> > We chose to ignore MISC files and not do anything with them.
> > So in our initial runs, our trackout files would not have
> > any output for MISC files only the NEWS files. But we found
> > the eval software blowing up saying some files were missing
> > from the trackout files. So, now we still don't track
> > MISC files but add a line to the out file anyway with a
> > NO decision and 0 score.
>
> I take that back, we only add the line if the first doc in the
> source file is of type MISC. Otherwise, we skip all other MISC
> files during tracking.
>
> Can anyone at NIST confirm whether the MISC files need to be
> present in the tracking output files or not?? What does the
> TDT2trk.pl script expect?
>
> G. Bowden Wise wrote:
> >
> > James Allan wrote:
> > > I believe that we are supposed to track everything. You will not be
> > > evaluated on what you do with MISC stories. They will be discounted
> > > by the evaluation software. This point is not stated explicitly (I
> > > don't see it at least), but is implicit in the discussion at the top
> > > of page 2 of the eval plan (v3.7).
> >
> > We chose to ignore MISC files and not do anything with them.
> > So in our initial runs, our trackout files would not have
> > any output for MISC files only the NEWS files. But we found
> > the eval software blowing up saying some files were missing
> > from the trackout files. So, now we still don't track
> > MISC files but add a line to the out file anyway with a
> > NO decision and 0 score.
>
> --
> -------------------------------------------------------------------
> G. Bowden Wise General Electric Company
> wisegb@crd.ge.com Corporate Research and Development
> Phone: 518 387-5175 Dial Comm: 8*833-5175 FAX: 518-387-6845




Last updated Wed Sep 9 09:40:57 1998