To: tdt-distrib@unagi.cis.upenn.edu
From: James Allan <allan@cs.umass.edu>
Subject: TDT2 data confusion
Date: Tue, 14 Jul 1998 16:11:45 -0400

TDTers,

I have a few questions related to the evaluation dry-dry runs. They
are in line with Hubert's questions about incompatibilities between
corpora and evaluation software. I suspect the burden of answering
these falls mainly on NIST and on George, though it's possible some
issues may require deeper discussion.

First, we have the following bits of stuff, distributed on the dates
indicated:

4/7   Jan-Feb
4/29  TDT2 eval V0.1 software
4/30  Jan-Feb, including dry-dry-run
5/22  Jan-Feb
7/8   Mar-Apr

The 4/29 evaluation software works fine with the 4/30 distribution.
As Hubert mentioned, it does NOT work with the 5/22 data's format.
Presumably that means it will not work with the Devset (7/8), either.
Will an updated version of the evaluation software be coming out soon?

Second, we have some confusion about what exactly we should be using
for the "audio - manual transcription" runs. We have FDCH and CCAP
transcripts; should we use one, the other, some combination, or should
it somehow be specified in our run? Or is FDCH considered "newswire"?
Note that this information is not included anywhere in the output
files, either.

Third, assuming that we're to use either the FDCH or the CCAP transcripts
(and not a combination), how is the choice specified? The sample index
files (from 4/30) do not contain any indication of what type of source
should be used. Since in some cases the types do not overlap completely,
it's not clear what we should do. Specifically, for the ABC
sources in the 5/22 Jan-Feb data, the following files exist as
closed-caption files, but are missing EITHER the FDCH or the ASR (but
not both):

19980104_1830_1900_ABC_WNT FDCH missing
19980106_1830_1900_ABC_WNT ASR missing
19980111_1830_1900_ABC_WNT FDCH missing
19980120_1830_1900_ABC_WNT ASR missing
19980125_1830_1900_ABC_WNT FDCH missing
19980217_1830_1900_ABC_WNT ASR missing
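
For anyone who wants to run the same check on their own copy, here is a
rough sketch of the comparison. The function name and the way the ID lists
are built are my own assumptions and are not part of any TDT2 software; in
practice the lists would come from whatever directory listing you have for
each source type. The IDs below are just the six from the list above.

```python
from typing import Dict, Iterable, List


def missing_by_type(reference: Iterable[str],
                    available: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """For each show ID in `reference`, list the source types that lack it.

    `reference` is the set of IDs we expect (here, the closed-caption files);
    `available` maps a source-type label to the IDs present for that type.
    Shows that are present in every type are left out of the result.
    """
    present = {label: set(ids) for label, ids in available.items()}
    report: Dict[str, List[str]] = {}
    for show in sorted(reference):
        gaps = [label for label in sorted(present) if show not in present[label]]
        if gaps:
            report[show] = gaps
    return report


# Hypothetical inputs reconstructed from the list above: all six shows exist
# as closed-caption files, and each lacks exactly one of FDCH or ASR.
closed_caption = [
    "19980104_1830_1900_ABC_WNT",
    "19980106_1830_1900_ABC_WNT",
    "19980111_1830_1900_ABC_WNT",
    "19980120_1830_1900_ABC_WNT",
    "19980125_1830_1900_ABC_WNT",
    "19980217_1830_1900_ABC_WNT",
]
fdch_missing = {"19980104_1830_1900_ABC_WNT",
                "19980111_1830_1900_ABC_WNT",
                "19980125_1830_1900_ABC_WNT"}
fdch = [s for s in closed_caption if s not in fdch_missing]
asr = [s for s in closed_caption if s in fdch_missing]

gaps = missing_by_type(closed_caption, {"FDCH": fdch, "ASR": asr})
for show, labels in gaps.items():
    print(show, ", ".join(labels), "missing")
```

Keying the report on the closed-caption IDs is a choice made only because
those are the files known to exist in every case above; any other source
type could serve as the reference instead.
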

A variant of this problem occurs in the Mar-Apr data, according to the
README file that Dave Graff sent out on the 8th. If we're supposed to
be using the FDCH version, what do we do in the three cases where it's
missing? Similarly, what do we do where the ASR data is missing?

Thanks.
			-- james

Last updated Wed Sep 9 09:40:52 1998