
To: tdt-distrib@unagi.cis.upenn.edu
From: strzalk@schedar.crd.ge.com (tomasz strzalkowski)
Subject: Example Evaluation Mismatch
Date: Wed, 2 Sep 1998 16:30:43 -0400

TDT'ers,

We have discovered some worrisome discrepancies between the numbers
returned by the TDT evaluation program (latest version) and our own counts.
Could someone please take a look? The actual results are not too far apart,
but it's still a bit spooky. This is all on TDT-2 March-April ASR tracking.

--- Tomek

----- Begin Included Message -----

>From wisegb@crd.ge.com Tue Sep 1 12:59:02 1998
Sender: wisegb@crd.ge.com
Date: Tue, 01 Sep 1998 13:00:00 -0400
From: "G. Bowden Wise" <wisegb@crd.ge.com>
Organization: GE Corporate Research & Development
X-Mailer: Mozilla 4.06 [en] (X11; U; SunOS 5.5.1 sun4m)
Mime-Version: 1.0
To: Tomek Strzalkowski <strzalk@thuban.crd.ge.com>,
Gees Stein <steing@crd.ge.com>
Subject: Example Evaluation Mismatch
Content-Transfer-Encoding: 7bit

Here is an example of a discrepancy between the numbers computed
by the GE tracker and the numbers computed by the tracking
evaluation program. If we could find out why there
is a discrepancy (maybe we are counting something different
than expected), it would be nice to know!

For topic 50, using the topic_relevance.table provided with
tdt_deliv_980708, we find that there are 11 relevant
documents for this topic (all of type YES):

Topic 50 has 11 entries; YES=11

For topic 50 using the index
indexes_devtest_version2/trk_nwt+asr_50.ndx
we count
Topic 50 counts: #training=4 #nontopic=8490 #sources=418

When we do tracking we find that there are a total
of 8541 documents to track within those 418 source files.

And we find the following from our tracking run

Topic         correct             correct
              detect   FA   MISS  !detect  #R  #total  #NR   MISS%   FA%
50     out-5  7        141  0     8393     7   8541    8534  0.0000  0.0165

However, when we use the evaluation software
TDT2eval_v0.3/TDT2trk.pl
different numbers are reported:

Filename                           Topic  Train  Test   Corr  Corr   Miss   F/A    Pct.    Pct.    Ctrack
                                          Story  Story  Det.  !Det.  Story  Story  Miss    F/A
---------------------------------  -----  -----  -----  ----  -----  -----  -----  ------  ------  ------
out-5/trk_nwt+asr_50.ndx.trackout  50     4      8530   6     8384   0      140    0.0000  0.0164  0.0161
=================================  =====  =====  =====  ====  =====  =====  =====  ======  ======  ======
Sums                                             8530   6     8384   0      140
Means                                            8530   6     8384   0      140    0.0000  0.0164  0.0161



Note that there are inconsistent results between our statistics
and those computed by the evaluation software

              TDT2trk      GE
Test Story  =  8530   !=  8541
Corr Det.   =     6   !=     7
Corr !Det   =  8384   !=  8393
FA Story    =   140   !=   141
Miss Story  =     0   ==     0   (the only one that is identical)

Although the miss and false alarm rates are very similar, we still would
like to know why our numbers are off.
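
As a quick cross-check of the figures above, here is a minimal Python sketch that recomputes the totals and rates from the four outcome counts in each report. The dictionary keys and the helper name summarize are made up for illustration; only the counts come from the two reports. It assumes the miss rate is misses over on-topic test stories and the false alarm rate is false alarms over off-topic test stories, which reproduces the printed rates.

```python
# Outcome counts for topic 50, copied from the two reports above.
ge = {"corr_det": 7, "fa": 141, "miss": 0, "corr_not_det": 8393}
tdt2trk = {"corr_det": 6, "fa": 140, "miss": 0, "corr_not_det": 8384}


def summarize(counts):
    """Derive totals and error rates from the four outcome counts.

    Assumption: rates are normalized by on-topic / off-topic test stories.
    """
    on_topic = counts["corr_det"] + counts["miss"]
    off_topic = counts["fa"] + counts["corr_not_det"]
    return {
        "total": on_topic + off_topic,
        "on_topic": on_topic,
        "off_topic": off_topic,
        "p_miss": counts["miss"] / on_topic if on_topic else 0.0,
        "p_fa": counts["fa"] / off_topic if off_topic else 0.0,
    }


ge_summary = summarize(ge)
tdt2trk_summary = summarize(tdt2trk)

# Per-cell difference between the two reports (GE minus TDT2trk).
diff = {key: ge[key] - tdt2trk[key] for key in ge}

print(ge_summary)
print(tdt2trk_summary)
print(diff, "sum =", sum(diff.values()))
```

The per-cell differences come out as 1 (correct detect), 1 (FA), 0 (miss), and 9 (correct !detect), which add up to the 11-story gap between the two Test Story totals. Each report is therefore internally consistent, and the mismatch is in which stories get counted rather than in how the rates are computed.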


Loading Topic Boundary tables:
Loading Index File indexes_devtest_version2/trk_nwt+asr_50.ndx
.... Topic Boundary tables loaded
Loading Topic Index
.... Topic Index loaded
Performing Tracking scoring on system output file list 'TRACKLIST.50'.
Reading tracking output file 'out-5/trk_nwt+asr_50.ndx.trackout'.
..................................................................................................................................................................................................................................................................................................................................................................................................................................
Verifying completness of system output
Complete system output
-------------------------------------------------------------------------------
-------------------- TDT Tracking Task Performance Report ------------------

Command line: /projects/NL5/NL/TDT/TDT2/software/TDT2eval_v0.3/TDT2trk.pl -R /projects/NL5/NL/TDT/TDT2/data/tdt_deliv_980708 -I ../flists/FLIST.50 TRACKLIST.50
Execution Date: Tue Sep 1 12:31:33 EDT 1998

Story Weighted (Pooled) Tracking: P(Miss) = 0.0000
                                  P(Fa)   = 0.0164
                                  Ctrack  = 0.0161

Topic Weighted Tracking:          P(Miss) = 0.0000
                                  P(Fa)   = 0.0164
                                  Ctrack  = 0.0161

Tracking Performance Calculations:

Filename                           Topic  Train  Test   Corr  Corr   Miss   F/A    Pct.    Pct.    Ctrack
                                          Story  Story  Det.  !Det.  Story  Story  Miss    F/A
---------------------------------  -----  -----  -----  ----  -----  -----  -----  ------  ------  ------
out-5/trk_nwt+asr_50.ndx.trackout  50     4      8530   6     8384   0      140    0.0000  0.0164  0.0161
=================================  =====  =====  =====  ====  =====  =====  =====  ======  ======  ======
Sums                                             8530   6     8384   0      140
Means                                            8530   6     8384   0      140    0.0000  0.0164  0.0161

Execution parameters:

LDC TDT Corpus Root Dir: /projects/NL5/NL/TDT/TDT2/data/tdt_deliv_980708
Index File list: ../flists/FLIST.50
Index Files: indexes_devtest_version2/trk_nwt+asr_50.ndx
System Output File List: TRACKLIST.50
System Output File: out-5/trk_nwt+asr_50.ndx.trackout Name: GETopicTracker
Pointer Type: RECID

Ctrack parameters:
P(topic) = 0.02
Cmiss = 1
Cfa = 1

----------------- End of TDT Tracking Task Performance Report ---------------
-------------------------------------------------------------------------------
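
The report prints the Ctrack parameters but not the formula. The sketch below assumes the usual TDT tracking cost, Ctrack = Cmiss * P(Miss) * P(topic) + Cfa * P(Fa) * (1 - P(topic)); that is an assumption on our part, not something read out of TDT2trk.pl. The function name ctrack and its argument names are made up for illustration. With the parameters listed in the report it reproduces the printed value.

```python
def ctrack(p_miss, p_fa, p_topic=0.02, c_miss=1.0, c_fa=1.0):
    """Tracking cost under the assumed TDT cost definition.

    Defaults are the 'Ctrack parameters' printed in the report above.
    """
    return c_miss * p_miss * p_topic + c_fa * p_fa * (1.0 - p_topic)


# Rates recomputed from the raw counts in each report (topic 50).
tdt2trk_cost = ctrack(p_miss=0 / 6, p_fa=140 / 8524)
ge_cost = ctrack(p_miss=0 / 7, p_fa=141 / 8534)

print(f"TDT2trk counts: Ctrack = {tdt2trk_cost:.4f}")
print(f"GE counts:      Ctrack = {ge_cost:.4f}")
```

Under that assumption, the TDT2trk counts give 0.0161, matching the report, while the GE counts give 0.0162, so the count mismatch moves the cost only in the fourth decimal place.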




--
-------------------------------------------------------------------
G. Bowden Wise General Electric Company
wisegb@crd.ge.com Corporate Research and Development
Phone: 518 387-5175 Dial Comm: 8*833-5175 FAX: 518-387-6845


----- End Included Message -----


Last updated Wed Sep 9 09:40:55 1998