
To: tdt-distrib@unagi.cis.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: Re: Example Evaluation Mismatch
Date: Wed, 02 Sep 1998 19:39:42 -0400

> James can you double check your initial counts to
> be sure you don't get 10488 total stories with
> 8558 NEWS.

Sigh. Good thing someone took the challenge. Yes, it appears that
during the course of playing around, I accidentally removed the first
test file from my "overall" calculations. That means that:

> Bowden                                        James
>    418  test files                              418
>  10488  stories (BOUNDARY's) in those files   10466
>   8558  NEWS stories                           8543
>   1720  MISC stories                           1714
>    210  UNTRANSCRIBED stories                   209

Bowden's numbers are all correct (my first number there was really
417). If you add the corresponding numbers from the first test file
into my numbers, you get Bowden's numbers.
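
A quick way to sanity-check that claim is to compare the two columns
directly. The following is only a sketch using the figures quoted above;
the variable names are made up for illustration and are not part of any
TDT tool.

```python
# Counts quoted in the message, as (Bowden, James) pairs per story type.
counts = {
    "NEWS": (8558, 8543),
    "MISC": (1720, 1714),
    "UNTRANSCRIBED": (210, 209),
}
total_bowden, total_james = 10488, 10466

# Per-type shortfall in James's column, i.e. what the omitted first
# test file would have to contribute.
missing = {kind: b - j for kind, (b, j) in counts.items()}
missing_total = total_bowden - total_james

# Each column's per-type counts should add up to its story total, and
# the per-type shortfalls should add up to the overall shortfall.
assert sum(b for b, _ in counts.values()) == total_bowden
assert sum(j for _, j in counts.values()) == total_james
assert sum(missing.values()) == missing_total

print(missing, missing_total)
```

Both columns are internally consistent, and the gap between them is the
same whether it is computed per type or from the totals, which is what
you would expect if a single file had been left out.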

> Deducting those 14 training NEWS stories we have
> 8558 - 14 = 8544 stories. In those
> there are 14 which do not have a Brecid present. This
> means that there are 8544 - 14 = 8530
>
> 8530 NEWS stories in the training set
>
> Which is what NIST counted.

So it appears that NIST is at least in good shape.

> However, when I compute statistics for the tracking
> task for topic 50 I am counting 8541 documents.

Note that there are 11 stories in the test set that lack Brecid's. If
you accidentally count those, you get 8541. Also, if you accidentally
include the 11 training stories in the first test file, you can get
8541. Those are two obvious possibilities.
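
The arithmetic behind those two guesses can be sketched as follows. The
counts of 11 are the ones I give above, and the labels are just
illustrative names, not anything from the evaluation software.

```python
nist_count = 8530      # NEWS stories as counted by NIST
observed_count = 8541  # documents counted for the topic 50 tracking task

# Number of extra stories that need explaining.
surplus = observed_count - nist_count

# Hypothetical sources of extra stories, with the counts given above.
candidates = {
    "test stories lacking a Brecid": 11,
    "training stories in the first test file": 11,
}

# Keep every candidate that accounts for the surplus on its own.
explanations = [name for name, n in candidates.items() if n == surplus]
print(surplus, explanations)
```

Either one alone accounts for the difference, so the counts by
themselves cannot tell you which mistake (if either) was made.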

Thanks, Bowden, for double-checking my numbers.
			-- james


Last updated Wed Sep 9 09:40:55 1998