(040) previous ~ index ~ next

To: "TDT distribution list" <tdt-distrib@ldc.upenn.edu>
From: "George Doddington" <doddington@email.msn.com>
Subject: Summary of TDT2 Meeting at NIST on May 7-8 1998
Date: Mon, 11 May 1998 13:28:49 -0700

Here is my summary of the decisions and action items arising in our meeting
last week. I'm sure this is incomplete, so please review my summary
carefully and inform me of any items that I've forgotten. Thanks.
--------

TDT2 CORPUS

The specification of topic definitions has changed. A topic definition now
has five parts:
· Title. This is a mnemonic handle, intended mainly as a memory aid for the
annotators.
· Identification of the seminal event in terms of its specifying triple –
what/where/when.
· Link to the reference story from which the topic was derived.
· Reference to the principles of interpretation used to define the topic.
· Explicit topic description and summary.
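
For illustration only, the five-part definition could be held in a record
like the one below. This is a sketch, not an agreed file format: the class
names, field names, and the idea of a story identifier string are all
assumptions, and the example values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeminalEvent:
    """The specifying triple for the seminal event (field names assumed)."""
    what: str
    where: str
    when: str


@dataclass(frozen=True)
class TopicDefinition:
    """One topic definition with the five parts listed above."""
    title: str                      # mnemonic handle for annotators
    seminal_event: SeminalEvent     # what/where/when triple
    reference_story: str            # hypothetical story identifier
    interpretation_principles: str  # reference to the principles used
    description: str                # explicit description and summary


# Placeholder values only; no real topic is implied.
example_topic = TopicDefinition(
    title="<title>",
    seminal_event=SeminalEvent(what="<what>", where="<where>", when="<when>"),
    reference_story="<story-id>",
    interpretation_principles="<principle reference>",
    description="<description and summary>",
)
```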

Annotation schedule – LDC will make and supply a quantitative estimate of
annotation effort and schedule.

ASR transcriptions of the 11 kHz VOA data will be excluded from the TDT2
corpus. This means that there will be no ASR transcriptions of VOA data for
the first two months of the corpus (the training portion).
--------

SEGMENTATION

Newswire text was deleted from the segmentation task.

Data within non-stories will not be considered in the computation of
segmentation errors. This will be achieved by excluding from the error
accumulation all points where the comparison interval lies wholly within
non-stories.
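
A minimal sketch of this exclusion rule follows, assuming a word-based
comparison interval of k words with inclusive endpoints and a per-word flag
marking non-story material. The function name, the flag list, and the
interval convention are assumptions for illustration, not the evaluation
software.

```python
def scored_positions(is_nonstory, k):
    """Return the start indices whose comparison interval is scored.

    is_nonstory: list of bools, one per word, True inside a non-story.
    k: interval length in words; the interval for start i is [i, i + k].
    An interval is dropped only if every word in it is non-story material.
    """
    n = len(is_nonstory)
    return [
        i
        for i in range(n - k)
        if not all(is_nonstory[i:i + k + 1])
    ]


# Words 2..4 are non-story material; only the interval starting at 2
# lies wholly inside them, so only that point is excluded.
flags = [False, False, True, True, True, False]
kept = scored_positions(flags, k=2)
```

Intervals that straddle a story/non-story boundary are still scored under
this reading, since only intervals lying wholly within non-stories are
dropped.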

A default set of segmentation task parameters was defined. All sites
participating in a segmentation task evaluation must perform an evaluation
using these parameters. They are:
· source = ASR
· deferral = 10,000 words
--------

TRACKING

Some of the training stories for each target topic will be excluded from use
in training the tracker for that topic. Namely, these are all of the stories
tagged as "BRIEF" for the target topic and all of the stories tagged as
"YES" for the target topic for the given value of Nt. (There will be 16 -
Nt such stories.)

The specification of tracking output has been changed. Previously, output
was made impulsively, with a score being output for a particular word or
time. Now the convention has been changed, so that the output score applies
to all following words in addition to the indicated word, up to the point
where a subsequent score is output. Going along with this change, the
mapping of system output scores onto stories is made by computing the
average score over the reference story, rather than by taking the maximum
score, as was previously specified. (I think that y’all will be sorry for
this one! No harm done, though, because it can be changed back to maximum
with no effort.)
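
The new convention can be sketched as follows, assuming system output is a
sorted list of (word index, score) pairs and reference stories are half-open
word ranges. Those representations, the function names, and the default
score used for words that precede the first output are assumptions. The
combining function is a parameter, so switching back to maximum is a
one-argument change.

```python
from bisect import bisect_right
from statistics import fmean


def score_at(events, word, default=0.0):
    """Score in force at `word`: the latest output at or before it."""
    positions = [w for w, _ in events]
    idx = bisect_right(positions, word) - 1
    return events[idx][1] if idx >= 0 else default


def story_scores(events, stories, combine=fmean, default=0.0):
    """Map held scores onto stories given as (start, end) word ranges."""
    return [
        combine([score_at(events, w, default) for w in range(start, end)])
        for start, end in stories
    ]


# A score of 0.25 holds for words 0..2, then 0.75 from word 3 onward.
events = [(0, 0.25), (3, 0.75)]
stories = [(0, 4), (4, 6)]
by_average = story_scores(events, stories)               # [0.375, 0.75]
by_maximum = story_scores(events, stories, combine=max)  # [0.75, 0.75]
```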

A default set of tracking task parameters was defined. All sites
participating in a tracking task evaluation must perform an evaluation using
these parameters. They are:
· source = ASR
· story boundaries = given
· number of training stories, Nt, = 4
--------

DETECTION

The deferral periods for the detection task, Nd, were changed from {1, 10,
100} to {0, 10, 100}. Nd is the number of additional source files that may
be processed prior to making decisions on the stories contained in the
current source file.

The specification of detection output has been changed. Previously, output
was made impulsively, with a decision being output for a particular word or
time. Now the convention has been changed, so that the decision applies to
all following words in addition to the indicated word, up to the point where
a subsequent output is made. Going along with this change, the mapping of
output decisions onto stories is made by majority vote over the reference
story.
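
A sketch of the majority-vote mapping follows, under the same assumed
representation: output is a sorted list of (word index, decision) pairs and
stories are half-open word ranges. The names, the treatment of words before
the first output (they cast no vote), and the tie-breaking rule (the
decision seen first in the story wins) are assumptions, since none of these
were specified.

```python
from bisect import bisect_right
from collections import Counter


def decision_at(events, word):
    """Decision in force at `word`, or None before the first output."""
    positions = [w for w, _ in events]
    idx = bisect_right(positions, word) - 1
    return events[idx][1] if idx >= 0 else None


def story_decision(events, start, end):
    """Majority decision over the words of one story, [start, end)."""
    votes = Counter()
    for word in range(start, end):
        decision = decision_at(events, word)
        if decision is not None:
            votes[decision] += 1
    if not votes:
        return None
    # most_common keeps first-seen order among equal counts.
    return votes.most_common(1)[0][0]


# Decision "A" holds for words 0..1, then "B" from word 2 onward.
events = [(0, "A"), (2, "B")]
long_story = story_decision(events, 0, 5)   # "B": three of five words
short_story = story_decision(events, 0, 3)  # "A": two of three words
```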

A default set of detection task parameters was defined. All sites
participating in a detection task evaluation must perform an evaluation
using these parameters. They are:
· source = ASR
· story boundaries = given
· number of source files to defer, Nd, = 10
--------

EVALUATION

Bottom-line (one-number) evaluation – This will be a detection cost
function, defined as a linear combination of miss and false alarm
probabilities. Absent a principled selection of weights for the error
probabilities, the weights will be selected to approximate equal
contribution of miss and false alarm decisions. (This means that the
weights will be proportional to the frequency of type 1 and type 2 trials,
as estimated from training or development test data.) This same cost
function will also be used to determine the mapping of reference topics onto
system output topics for the detection





Last updated Wed Sep 9 09:40:48 1998