To: Rich Schwartz <schwartz@bbn.com>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: Language-dependent scoring
Date: Thu, 24 Jun 1999 13:09:41 -0400

Rich Schwartz wrote:
>
> Jon,
>
> In analyzing our results we find it impossible to make sense out
> of the results unless we have separate scores for each of the test
> languages. This is because the proportion of English and Mandarin docs
> are not the same, and furthermore, the proportion of test documents varies
> by language and topic.
>
> Would you be able to add an option to report the results by test
> language? Otherwise it's possible to get a CTRACK score that looks normal
> but not notice that you have returned absolutely no Chinese documents.
>
> I realize you probably can't do this in the next day. This is a
> request for whenever you can do it.

Rich,

There's already a way to do this (although it's not the most efficient
one). The technique is to run the eval script separately for each
subset you want scored, using modified index files.

- First, create a new set of index files (via grep -v) that include
only the source files you would like to score over. (E.g., for last
year's eval I built separate index files for the newswire and broadcast
news sources.)

- Then add the option -S to the command line (to skip system output
files that do not exist in the index files).

and let 'er rip......

Like I said, not efficient, but flexible.
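
For the language split, the index-file step might look roughly like the
sketch below. The file names, the one-name-per-line layout, and the
_ENG/_MAN tags are all made up for illustration; the real index files
may well look different, so the grep patterns would need adjusting.

```shell
# Hypothetical index file: one source file name per line.
# (Assumed layout and names, for illustration only.)
workdir=$(mktemp -d)
cat > "$workdir/all.index" <<'EOF'
19980601_APW_ENG.sgm
19980601_XIN_MAN.sgm
19980602_NYT_ENG.sgm
19980602_VOA_MAN.sgm
EOF

# grep -v drops every line that matches the pattern, so removing the
# Mandarin entries leaves an English-only index, and vice versa.
grep -v '_MAN' "$workdir/all.index" > "$workdir/english.index"
grep -v '_ENG' "$workdir/all.index" > "$workdir/mandarin.index"
```

Each resulting index file then goes to its own run of the eval script,
with -S added so that system output for the excluded sources is skipped.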

Jon

--
Jonathan Fiscus			    Snailmail: 	Nat'l Inst. of Stds. and Tech.
NIST						100 Bureau Dr. Stop 8940
Phone: (301) 975-3182				Gaithersburg, MD 20899-8940

Email: jfiscus@nist.gov

Last updated Thu Jun 24 13:12:45 1999