(339) previous ~ index ~ next

To: TDT distribution <tdt-distrib@ldc.upenn.edu>
From: Victor Lavrenko <lavrenko@cs.umass.edu>
Subject: First Story Detection -- implications of the new decision rule (fwd)
Date: Sun, 18 Nov 2001 23:41:59 -0500 (EST)

Folks,


I'd like to point out that perhaps we don't really need to change FSD
in the way George proposed:

1. I see no reason why the format of the FSD output needs to change.
It seems that outputting <DOCID YES/NO SCORE> will work just fine,
since every story is viewed individually as a possible alarm. The
stories will just be scored differently from how they used to be,
that's all...

2. An FSD system does not really need to maintain affiliations between
stories. It will help to maintain these affiliations, so the system
does not generate multiple alarms per topic, but it is not required.
For instance, a system based on counting the number of novel words in
a story will still be able to detect novelty (albeit poorly). Such a
system does not maintain a notion of a topic.

3. If we switch to detection output format, we will lose the ability to
plot DET curves (DET logs don't have scores). Note that the new task
definition does not actually rule out DET curves; it is still possible
to construct them...

4. Furthermore, introduction of F_skip will require changes to FSD output
format that will make it incompatible with DETECTION output format
(thanks James!)
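
To make points 1-3 concrete, here is a rough sketch (not any site's
actual system) of the novel-word baseline from point 2. It emits
<DOCID YES/NO SCORE> records and then sweeps a threshold over the
scores to get the miss / false-alarm pairs a DET curve would be
plotted from. The function names, the 0.5 threshold, the tokenizer
and the use of the *fraction* of novel words as the score are all
assumptions made for illustration.

```python
import re


def novel_word_fsd(stories, threshold=0.5):
    """Score each story by the fraction of its words not seen before.

    stories: iterable of (doc_id, text) pairs in arrival order.
    threshold: hypothetical cutoff for the YES/NO decision.
    No topic affiliations are kept; only the set of words seen so far.
    """
    seen = set()
    records = []
    for doc_id, text in stories:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        novel = words - seen
        score = len(novel) / len(words) if words else 0.0
        decision = "YES" if score >= threshold else "NO"
        records.append((doc_id, decision, score))
        seen |= words
    return records


def format_record(record):
    """Render one record as a 'DOCID YES/NO SCORE' line."""
    doc_id, decision, score = record
    return f"{doc_id} {decision} {score:.3f}"


def det_points(scored):
    """Sweep a threshold over (score, is_first_story) pairs.

    Returns (threshold, miss_rate, false_alarm_rate) tuples; a story
    counts as an alarm when its score is >= the threshold.
    """
    n_target = sum(1 for _, is_first in scored if is_first)
    n_nontarget = len(scored) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target")
    points = []
    for thr in sorted({score for score, _ in scored}):
        misses = sum(1 for s, t in scored if t and s < thr)
        false_alarms = sum(1 for s, t in scored if not t and s >= thr)
        points.append((thr, misses / n_target, false_alarms / n_nontarget))
    return points
```

For example, calling novel_word_fsd on a made-up stream such as
[("d1", "earthquake hits city"), ("d2", "earthquake hits city again")]
and passing each record to format_record gives one line per story.
The point is only that per-story scores are enough to trace out the
curve; a real system would of course need a far better novelty score.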

Let me know what you think...

-- Victor

________________________________________________________
Victor Lavrenko mail: lavrenko@cs.umass.edu
(413) 545-0728 / 259-1655 http: cs.umass.edu/~lavrenko
"If you can fill the unforgiving minute..." [R.Kipling]


---------- Forwarded message ----------
Date: Fri, 16 Nov 2001 10:39:01 -0800
From: George Doddington <doddington@nist.gov>
To: TDT distribution <tdt-distrib@ldc.upenn.edu>
Subject: First Story Detection -- implications of the new decision rule

I hope that most TDTers now accept that FSD requires topic detection
as an underlying technology. Essentially, FSD merely skims the first
story from each topic (aka event). When we first implemented FSD, we
changed the output format to conform with the evaluation, so that only
the first story of each topic was output. Now, however, since we want
to soften the evaluation to allow credit for detecting on-topic stories
that are "close" to the first story, we are forced to change the format
for FSD output. FSD systems must now output multiple early on-topic
stories, and topic affiliation among these stories must be maintained.
For this reason, FSD output format must revert to the same format that
is used for topic detection. Thus each FSD output record must contain
a topic ID, just as in topic detection. Note, however, that FSD output
need not include ALL on-topic stories. Only those involved in scoring
must be output.
--
George Doddington in Orinda, CA: doddington@nist.gov or 925/250-8346
-------------------------------------------------------------
To unsubscribe from tdt-distrib, email majordomo@ldc.upenn.edu
with "unsubscribe tdt-distrib" in the body of the message.




Last updated Mon Nov 19 09:14:07 2001