To: "'doddington@nist.gov'" <doddington@nist.gov>
From: "Strzalkowski, Tomek (CRD)" <strzalkowski@crd.ge.com>
Subject: RE: TDT3 -- Precision vs False Alarm (postscript attachment)
Date: Thu, 11 Feb 1999 17:12:22 -0500
George -- thanks for the tech note. I understand your rationale -- of course
we knew from IR that topics with lots of relevants are "easier". TREC
organizers also used topic "hardness" to show that we are actually making
progress in spite of sinking precision. What I worry about is the
practicality of the FA measure. The harder the topic (the less rich it is),
the lower the FA rate has to be to make the results practical. Precision
tells you whether your result is usable or not: I have little use for, say,
1% FA if precision is also 1%.
---- Tomek
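
The trade-off described above can be made concrete with a small sketch.
It assumes that "richness" means the fraction of all stories that are
on-topic, and that miss and false alarm rates are measured over on-topic
and off-topic stories respectively; the function names are hypothetical
and are not taken from the TDT evaluation software.

```python
def precision(richness, p_miss, p_fa):
    """Precision implied by a miss rate, a false alarm rate, and topic richness.

    richness: assumed here to be the fraction of stories that are on-topic.
    p_miss:   fraction of on-topic stories the system fails to flag.
    p_fa:     fraction of off-topic stories the system wrongly flags.
    """
    hits = (1.0 - p_miss) * richness          # on-topic stories flagged
    false_alarms = p_fa * (1.0 - richness)    # off-topic stories flagged
    if hits + false_alarms == 0.0:
        raise ValueError("no stories were flagged, precision is undefined")
    return hits / (hits + false_alarms)


def max_false_alarm(richness, p_miss, target_precision):
    """Largest false alarm rate that still reaches target_precision.

    Requires 0 < target_precision <= 1 and richness < 1.
    """
    hits = (1.0 - p_miss) * richness
    return hits * (1.0 - target_precision) / (target_precision * (1.0 - richness))


if __name__ == "__main__":
    # Same miss and false alarm rates, progressively poorer topics.
    for r in (0.01, 0.001, 0.0001):
        print(r, precision(r, p_miss=0.1, p_fa=0.01), max_false_alarm(r, 0.1, 0.5))
```

With the error rates held fixed, precision falls as richness falls, and
the false alarm rate needed to reach a given precision shrinks roughly in
proportion to richness.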
> ----------
> From: George Doddington[SMTP:doddington@msn.com]
> Reply To: doddington@nist.gov
> Sent: Tuesday, February 09, 1999 10:53 AM
> To: Strzalkowski, Tomek (CRD)
> Cc: tdt-distrib@unagi.cis.upenn.edu
> Subject: RE: TDT3 -- Precision vs False Alarm (postscript attachment)
>
> <<File: Precision.vs.FalseAlarm.ps>>
> >A separate issue I have is the False Alarm measure that we use.
> >Should we report both FA and precision (and have both as part
> >of cost function)? For example (taking the GE1 run) topics 75 and 98
> >get similar (very small) precision (approx 2-4%), but their FA rates
> >differ by a factor of 5 (5% vs. 0.9%). The difference is the number
> >of stories tracked (ratio 5 to 1).
>
> While you have found an example in which two selected topics
> happen to exhibit similar values of precision but different values
> of false alarm, in general false alarm will be a more stable
> indicator of performance than precision. For example, if you
> compute the mean and variance statistics for precision and
> false alarm for ALL 21 topics in the GE1 run, you will find that
> the normalized standard deviation (std dev / mean) of precision
> is significantly greater than that for false alarm.
>
> Furthermore, precision is clearly a function of topic richness,
> while we presume that false alarm is not. For the data taken
> from the GE1 run, topic richness accounts for over half of the
> total variance of precision. I've prepared a technical note that
> explains this in more detail, including a figure that shows the
> effect of richness on precision, taken from the TDT2 tracking
> task and using GE1 results. I'm attaching a postscript copy
> of this note. I hope this helps explain why we have selected
> miss and false alarm rates for evaluation purposes.
> --
> George Doddington in Orinda, CA. doddington@nist.gov 925/631-6628
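
The stability comparison in the quoted message (std dev / mean, computed
per measure over all topics) can be sketched as follows. The per-topic
numbers are made-up placeholders, not GE1 results, and the use of the
population standard deviation is an assumption; the message does not say
which estimator was used.

```python
import statistics


def normalized_std_dev(values):
    """Standard deviation divided by the mean (coefficient of variation).

    Uses the population standard deviation; this choice is an assumption.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean is zero, normalized std dev is undefined")
    return statistics.pstdev(values) / mean


if __name__ == "__main__":
    # Hypothetical per-topic scores for illustration only (NOT GE1 data).
    precision_by_topic = [0.02, 0.04, 0.35, 0.60, 0.10]
    false_alarm_by_topic = [0.009, 0.050, 0.020, 0.015, 0.030]
    print("precision  :", normalized_std_dev(precision_by_topic))
    print("false alarm:", normalized_std_dev(false_alarm_by_topic))
```

The measure with the smaller ratio varies less across topics relative to
its own mean, which is the sense of "more stable" used in the message.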
Last updated Thu May 13 09:28:14 1999