
To: tomasz strzalkowski <strzalk@schedar.crd.ge.com>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: Example Evaluation Mismatch
Date: Thu, 03 Sep 1998 09:27:55 -0400

Tomek,

There are a number of discrepancies that you've noted.

Probably the most worrisome is that you have 7 correct detections, while
NIST has 6. This is the most easily explained and may well be the cause
of the other discrepancies.

If you grep for topicid=50 and run wc, you get 11 lines, but TDT2trk's
6 correct detections plus the 4 training stories add up to only 10!
Where's the missing one? Here it is:

<ONTOPIC topicid=50 level=YES docno=VOA19980331.2100.2730
fileid=19980331_2100_2200_VOA_WRP>

There's no asrtext, tkntext, or sgml data for this file, so it was
excluded from the test index. I don't have that set of files from
the LDC, so I can't take a look at it. Have you augmented your version
of the 'tdt_deliv_980708' release with those files? Does your system
run from the index files, or does it read in the database as it exists?
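
The grep-and-count check above can be sketched in Python. This is only
an illustration: it assumes the relevance table is a sequence of
<ONTOPIC ...> records with key=value attributes, as in the record quoted
above, and that the set of indexed file ids is already known. The
function names, the first sample record, and EXAMPLE_FILE_A are made up.

```python
import re

# Assumed record layout: <ONTOPIC key=value key=value ...>, possibly wrapped
# across lines, as in the single record quoted in this message.
ONTOPIC_RE = re.compile(r"<ONTOPIC\s+([^>]*)>")
ATTR_RE = re.compile(r"(\w+)=(\S+)")


def parse_ontopic(text):
    """Return one attribute dict per <ONTOPIC ...> record found in text."""
    return [dict(ATTR_RE.findall(m.group(1))) for m in ONTOPIC_RE.finditer(text)]


def missing_from_index(records, topicid, indexed_fileids):
    """List docnos judged on-topic whose source file is not in the index."""
    return [
        r["docno"]
        for r in records
        if r.get("topicid") == str(topicid)
        and r.get("fileid") not in indexed_fileids
    ]


# Hypothetical sample: the first record and EXAMPLE_FILE_A are invented;
# the second record is the one quoted above.
SAMPLE_TABLE = """\
<ONTOPIC topicid=50 level=YES docno=EXAMPLE0001.0001
fileid=EXAMPLE_FILE_A>
<ONTOPIC topicid=50 level=YES docno=VOA19980331.2100.2730
fileid=19980331_2100_2200_VOA_WRP>
"""
SAMPLE_INDEXED_FILES = {"EXAMPLE_FILE_A"}

records = parse_ontopic(SAMPLE_TABLE)
print(len(records), missing_from_index(records, 50, SAMPLE_INDEXED_FILES))
```

Run against the real table and the real list of indexed files, a check
like this would name any on-topic story that the index leaves out.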

The rest of the discrepant numbers are higher by your counting. If
those differences match the boundary file for this VOA file, then we're
done. Can you verify that?
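
As a rough sanity check on that idea, the per-category differences can
be compared with the difference in the test story totals. The counts
below are copied from the quoted message; the dictionary keys are just
illustrative labels. Whether the VOA file really holds that many stories
is exactly what the boundary file would have to confirm.

```python
# Counts taken from the quoted comparison (TDT2trk vs. GE).
nist = {"test": 8530, "corr_det": 6, "corr_nondet": 8384, "fa": 140, "miss": 0}
ge = {"test": 8541, "corr_det": 7, "corr_nondet": 8393, "fa": 141, "miss": 0}

# Per-category surplus in the GE counts.
diff = {key: ge[key] - nist[key] for key in nist}

# Every extra test story must land in exactly one outcome category,
# so the outcome surpluses should add up to the surplus in test stories.
outcome_surplus = sum(value for key, value in diff.items() if key != "test")

print(diff)
print("extra test stories:", diff["test"], "accounted for:", outcome_surplus)
```

If the two totals agree, the extra stories are at least consistent with
coming from a single source file that one side scored and the other did
not.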



Jon



tomasz strzalkowski wrote:
>
> TDT'ers,
>
> We have discovered some worrisome discrepancies between the numbers
> returned by the TDT evaluation program (latest version) and our own counting.
> Could someone please take a look? The actual results are not too far apart
> but it's still a bit spooky. This is all on TDT-2 March-April asr tracking.
>
> --- Tomek
>
> ----- Begin Included Message -----
>
> >From wisegb@crd.ge.com Tue Sep 1 12:59:02 1998
> Sender: wisegb@crd.ge.com
> Date: Tue, 01 Sep 1998 13:00:00 -0400
> From: "G. Bowden Wise" <wisegb@crd.ge.com>
> Organization: GE Corporate Research & Development
> X-Mailer: Mozilla 4.06 [en] (X11; U; SunOS 5.5.1 sun4m)
> Mime-Version: 1.0
> To: Tomek Strzalkowski <strzalk@thuban.crd.ge.com>,
> Gees Stein <steing@crd.ge.com>
> Subject: Example Evaluation Mismatch
> Content-Transfer-Encoding: 7bit
>
> Here is an example of a discrepancy between the numbers computed
> by the GE tracker and the numbers computed by the tracking
> evaluation program. If we could find out why there is a
> discrepancy (maybe we are counting something different
> than expected), it would be nice to know!
>
> For topic 50 using the topic_relevance.table provided with
> tdt_deliv_980708 we find that there are 11 relevant
> documents for this topic (all of type YES)
>
> Topic 50 has 11 entries; YES=11
>
> For topic 50 using the index
> indexes_devtest_version2/trk_nwt+asr_50.ndx
> we count
> Topic 50 counts: #training=4 #nontopic=8490 #sources=418
>
> When we do tracking we find that there are a total
> of 8541 documents to track within those 418 source files.
>
> And we find the following from our tracking run
>
>               correct             correct
> Topic  Run    detect   FA   MISS  !detect  #R  #total  #NR   MISS%   FA%
> 50     out-5  7        141  0     8393     7   8541    8534  0.0000  0.0165
>
> However, when we use the evaluation software
> TDT2eval_v0.3/TDT2trk.pl
>
> There are different numbers reported
> Filename                           Topic  Train  Test   Corr  Corr   Miss   F/A    Pct.    Pct.    Ctrack
>                                           Story  Story  Det.  !Det.  Story  Story  Miss    F/A
> ---------------------------------  -----  -----  -----  ----  -----  -----  -----  ------  ------  ------
> out-5/trk_nwt+asr_50.ndx.trackout  50     4      8530   6     8384   0      140    0.0000  0.0164  0.0161
> =================================  =====  =====  =====  ====  =====  =====  =====  ======  ======  ======
> Sums                                             8530   6     8384   0      140
> Means                                            8530   6     8384   0      140    0.0000  0.0164  0.0161
>
> Note that there are inconsistent results between our statistics
> and those computed by the evaluation software
>
> TDT2trk GE
> Test Story = 8530 != 8541
> Corr Det. = 6 != 7
> Corr !Det = 8384 != 8393
> FA Story = 140 != 141
>
> Miss Story = 0 == 0 (the only one that is identical)
>
> Although the miss and false alarm rates are very similar, we still would
> like to know why our numbers are off.
>
> Loading Topic Boundary tables:
> Loading Index File indexes_devtest_version2/trk_nwt+asr_50.ndx
> .... Topic Boundary tables loaded
> Loading Topic Index
> .... Topic Index loaded
> Performing Tracking scoring on system output file list 'TRACKLIST.50'.
> Reading tracking output file 'out-5/trk_nwt+asr_50.ndx.trackout'.
> ..................................................................................................................................................................................................................................................................................................................................................................................................................................
> Verifying completness of system output
> Complete system output
> -------------------------------------------------------------------------------
> -------------------- TDT Tracking Task Performance Report ------------------
>
> Command line:
> /projects/NL5/NL/TDT/TDT2/software/TDT2eval_v0.3/TDT2trk.pl -R /projects/NL5/NL/TDT/TDT2/data/tdt_deliv_980708 -I ../flists/FLIST.50 TRACKLIST.50
> Execution Date: Tue Sep 1 12:31:33 EDT 1998
>
> Story Weighted (Pooled) Tracking: P(Miss) = 0.0000
> P(Fa) = 0.0164
> Ctrack = 0.0161
>
> Topic Weighted Tracking: P(Miss) = 0.0000
> P(Fa) = 0.0164
> Ctrack = 0.0161
>
> Tracking Performance Calculations:
>
> Filename                           Topic  Train  Test   Corr  Corr   Miss   F/A    Pct.    Pct.    Ctrack
>                                           Story  Story  Det.  !Det.  Story  Story  Miss    F/A
> ---------------------------------  -----  -----  -----  ----  -----  -----  -----  ------  ------  ------
> out-5/trk_nwt+asr_50.ndx.trackout  50     4      8530   6     8384   0      140    0.0000  0.0164  0.0161
> =================================  =====  =====  =====  ====  =====  =====  =====  ======  ======  ======
> Sums                                             8530   6     8384   0      140
> Means                                            8530   6     8384   0      140    0.0000  0.0164  0.0161
>
> Execution parameters:
>
> LDC TDT Corpus Root Dir: /projects/NL5/NL/TDT/TDT2/data/tdt_deliv_980708
> Index File list: ../flists/FLIST.50
> Index Files: indexes_devtest_version2/trk_nwt+asr_50.ndx
> System Output File List: TRACKLIST.50
> System Output File: out-5/trk_nwt+asr_50.ndx.trackout Name: GETopicTracker
> Pointer Type: RECID
>
> Ctrack parameters:
> P(topic) = 0.02
> Cmiss = 1
> Cfa = 1
>
> ----------------- End of TDT Tracking Task Performance Report ---------------
> -------------------------------------------------------------------------------
>
> --
> -------------------------------------------------------------------
> G. Bowden Wise General Electric Company
> wisegb@crd.ge.com Corporate Research and Development
> Phone: 518 387-5175 Dial Comm: 8*833-5175 FAX: 518-387-6845
>
> ----- End Included Message -----
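
The rates in the quoted report can be reproduced from the raw counts.
A minimal sketch, assuming P(Miss) = misses / on-topic test stories,
P(Fa) = false alarms / off-topic test stories, and the usual TDT cost
form Ctrack = Cmiss*P(Miss)*P(topic) + Cfa*P(Fa)*(1 - P(topic)). The
report lists the parameters but does not spell out the formula, and the
function name here is made up for illustration.

```python
def tracking_rates(corr_det, miss, fa, corr_nondet,
                   p_topic=0.02, c_miss=1.0, c_fa=1.0):
    """Return (P(Miss), P(Fa), Ctrack) under the assumed cost formula."""
    n_on_topic = corr_det + miss        # test stories that are on topic
    n_off_topic = fa + corr_nondet      # test stories that are off topic
    p_miss = miss / n_on_topic if n_on_topic else 0.0
    p_fa = fa / n_off_topic if n_off_topic else 0.0
    c_track = c_miss * p_miss * p_topic + c_fa * p_fa * (1.0 - p_topic)
    return p_miss, p_fa, c_track


# Counts from the TDT2trk report and from the GE tally in the quoted message.
nist_rates = tracking_rates(corr_det=6, miss=0, fa=140, corr_nondet=8384)
ge_rates = tracking_rates(corr_det=7, miss=0, fa=141, corr_nondet=8393)

print("TDT2trk:", [round(x, 4) for x in nist_rates])
print("GE:     ", [round(x, 4) for x in ge_rates])
```

Because the cost depends only on the two rates, a handful of extra
stories out of roughly 8500 barely moves it, which is why the summary
numbers look so close even though the counts differ.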

--
Jon Fiscus
NIST
Email: jfiscus@nist.gov
Phone: (301) 975-3182

Last updated Wed Sep 9 09:40:55 1998