To: James Allan <allan@cs.umass.edu>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: TDT2 data confusion
Date: Tue, 14 Jul 1998 16:21:25 -0400
James, Hubert and all,
Most of the issues that you have raised will be solved by the updated
release of the scoring software and index files that I'm desperately
trying to get finished. I've finished the re-write of the segmentation
and tracking modules and I'm about to begin the detection module.
Please be patient while I conclude work on the modules, but if there
are any other issues, let us know so they can be included in the
revision.
Thanks
Jon
James Allan wrote:
>
> TDTers,
>
> I have a few questions related to the evaluation dry-dry runs. They
> are in line with Hubert's questions about incompatibilities between
> corpora and evaluation software. I suspect the burden of answering
> these falls mainly on NIST and on George, though it's possible some
> issues may require deeper discussion.....
>
> First, we have the following bits of stuff, distributed on the dates
> indicated:
>
> 4/7   Jan-Feb
> 4/29  TDT2 eval V0.1 software
> 4/30  Jan-Feb, including dry-dry-run
> 5/22  Jan-Feb
> 7/8   Mar-Apr
>
> The 4/29 evaluation software works fine with the 4/30 distribution.
> As Hubert mentioned, it does NOT work with the 5/22 data's format.
> Presumably that means it will not work with the Devset (7/8), either.
> Will an updated version of the evaluation software be coming out soon?
>
> Second, we have some confusion about what exactly we should be using
> for the "audio - manual transcription" runs. We have FDCH and CCAP
> transcripts; should we use one, the other, some combination, or should
> it somehow be specified in our run? Or is FDCH considered "newswire"?
> Note that this information is not included anywhere in the output
> files, either.
>
> Third, assuming that we're to use either the FDCH or CCAP (and not a
> combination), how is it specified? The sample index files (from 4/30)
> do not contain any indication about what type of source should be
> used. Since in some cases there is not a complete overlap between
> types, it's not clear what we should do. Specifically, for the ABC
> sources in the 5/22 Jan-Feb data, the following files exist as
> closed-caption files, but are missing EITHER the FDCH or the ASR (but
> not both):
>
> 19980104_1830_1900_ABC_WNT FDCH missing
> 19980106_1830_1900_ABC_WNT ASR missing
> 19980111_1830_1900_ABC_WNT FDCH missing
> 19980120_1830_1900_ABC_WNT ASR missing
> 19980125_1830_1900_ABC_WNT FDCH missing
> 19980217_1830_1900_ABC_WNT ASR missing
>
> A variant of this problem occurs in the Mar-Apr data, according to the
> README file that Dave Graff sent out on the 8th. If we're supposed to
> be doing the FDCH version of stuff, what do we do in the three cases
> it's missing? Similarly, what about the ASR data?
>
> Thanks.
> -- james
--
Jon Fiscus
NIST
Email: jfiscus@nist.gov
Phone: (301) 975-3182
Last updated Wed Sep 9 09:40:52 1998