(040) previous ~ index ~ next
To: tdt-distrib@ldc.upenn.edu
From: Paul van Mulbregt <paulvm@dragonsys.com>
Subject: Contrasts involving Segmentation
Date: Mon, 08 Mar 1999 12:59:01 -0500
3 points on contrasts involving automatically generated segmentations.
1. At the end of the PI meeting on Tuesday, it seemed to be decided that
the Tracking with No Boundaries would be done on just the ASR data.
This is so that we can first measure how much is lost by using automatically
generated story boundaries, rather than confounding it with the newswire
text (which reduces the reported loss by averaging, and potentially reduces
the loss by allowing recovery through adaptation).
I presume this means that the contrast test will require two runs:
1. Test on ASR data alone, with known boundaries.
2. Test on ASR data alone, with automatically generated boundaries.
In both cases the training data would be the same as the training data
for the ASR+NWT, namely Nt stories from ASR+NWT, with known boundaries.
(I'm expecting that the performance on ASR with known boundaries will be
similar to ASR+NWT with known boundaries, given some initial graphs and
comments I've seen and heard. And a site which doesn't use any online
adaptation could presumably just pull the ASR results out of their
ASR+NWT run.)
Is this a fair assessment of the situation?
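
To make the averaging concern in point 1 concrete, here is a minimal sketch.
All numbers are invented for illustration and are not taken from any TDT
evaluation. The function name pooled_cost is hypothetical, and the sketch
assumes a simple story-weighted average of a per-source cost in which
newswire is unaffected by the boundary condition; the real scoring is more
involved.

```python
def pooled_cost(costs, counts):
    """Story-weighted average of per-source costs."""
    total = sum(counts.values())
    return sum(costs[src] * counts[src] for src in costs) / total

# Hypothetical story counts and costs, for illustration only.
counts = {"ASR": 500, "NWT": 500}
known = {"ASR": 0.10, "NWT": 0.10}  # known story boundaries
auto = {"ASR": 0.16, "NWT": 0.10}   # automatic boundaries; NWT assumed unchanged

# Loss measured on ASR alone versus loss measured on the pooled ASR+NWT data.
asr_loss = auto["ASR"] - known["ASR"]
pooled_loss = pooled_cost(auto, counts) - pooled_cost(known, counts)

print(f"ASR-only loss: {asr_loss:.3f}")
print(f"Pooled loss:   {pooled_loss:.3f}")
```

Under these assumed numbers the pooled figure shows only half the loss seen
on ASR alone, which is the dilution that an ASR-only contrast avoids.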
2. The same for detection? Namely one run on ASR with known boundaries
compared to one run on ASR with automatically generated boundaries.
3. For the segmentation contrast involving transcripts (FDCH vs CCAP),
I'd like to suggest that this be done on a subset of broadcasts for which
FDCH and CCAP are both available. In the Dec 1998 year eval, 2 CCAP shows
were substituted for missing FDCH transcripts, and I'm not convinced that it
was necessary. In fact, this just complicates the run, since CCAP has
different parameters than FDCH or ASR, and complicates the reporting, since
there were two ABC lines in the NIST scoring output. So the 2 shows were in
fact ignored for all results reporting, and hence not needed.
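
The subset suggested in point 3 could be selected roughly as follows. This
is only a sketch: the broadcast identifiers are made up, and the helper
common_broadcasts is a hypothetical name, not part of any NIST or LDC
tooling.

```python
def common_broadcasts(fdch_ids, ccap_ids):
    """Return the broadcasts that have both an FDCH and a CCAP transcript."""
    return sorted(set(fdch_ids) & set(ccap_ids))

# Hypothetical broadcast identifiers, for illustration only.
fdch = ["ABC_0101", "ABC_0102", "ABC_0104"]
ccap = ["ABC_0102", "ABC_0103", "ABC_0104"]

subset = common_broadcasts(fdch, ccap)
print(subset)
```

Restricting both runs to this intersection means no CCAP show has to stand
in for a missing FDCH transcript, so each source is scored on its own line.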
Paul
------------------------------------------------------------------
Paul van Mulbregt, Dragon Systems Inc., Newton, MA. (617) 965-5200
email: paulvm@dragonsys.com
Last updated Thu May 13 09:28:18 1999