To: "'Jonathan Fiscus'" <jonathan.fiscus@nist.gov>
From: "Strzalkowski, Tomek (CRD)" <strzalkowski@exc01crdge.crd.ge.com>
Subject: TDT2 data/software questions
Date: Wed, 15 Jul 1998 09:40:42 -0400

Jon,

Are we going to get the revised software in time to get any results
for the July 30-31 meeting?

Also, regarding the data index files for the tracking task: these are nowhere
to be found in any of the data releases. Where do I get them? We can of course
make them up for now using topic relevance tables and alphanumeric order
of files. I thought this would be okay -- there was some discussion on this
a while ago.

Finally, since the default parameters of the tracking task specify
auto-transcribed speech files, we use only 4 speech sources for tracking, right?

---- Tomek

> ----------
> From: Jonathan Fiscus[SMTP:jonathan.fiscus@nist.gov]
> Sent: Tuesday, July 14, 1998 4:21 PM
> To: James Allan
> Cc: tdt-distrib@unagi.cis.upenn.edu
> Subject: Re: TDT2 data confusion
>
> James, Hubert and all,
>
> Most of the issues that you have raised will be solved by the updated
> release of the scoring software and index files that I'm desperately
> trying to get finished. I've finished the re-write of the segmentation
> and tracking modules and I'm about to begin the detection module.
>
> Please be patient while I conclude work on the modules, but if there
> are any other issues, let us know so they can be included in the
> revision.
>
> Thanks
> Jon
>
>
> James Allan wrote:
> >
> > TDTers,
> >
> > I have a few questions related to the evaluation dry-dry runs. They
> > are in line with Hubert's questions about incompatibilities between
> > corpora and evaluation software. I suspect the burden of answering
> > these falls mainly on NIST and on George, though it's possible some
> > issues may require deeper discussion.....
> >
> > First, we have the following bits of stuff, distributed on the dates
> > indicated:
> >
> >   4/7   Jan-Feb
> >   4/29  TDT2 eval V0.1 software
> >   4/30  Jan-Feb, including dry-dry-run
> >   5/22  Jan-Feb
> >   7/8   Mar-Apr
> >
> > The 4/29 evaluation software works fine with the 4/30 distribution.
> > As Hubert mentioned, it does NOT work with the 5/22 data's format.
> > Presumably that means it will not work with the Devset (7/8), either.
> > Will an updated version of the evaluation software be coming out soon?
> >
> > Second, we have some confusion about what exactly we should be using
> > for the "audio - manual transcription" runs. We have FDCH and CCAP
> > transcripts; should we use one, the other, some combination, or should
> > it somehow be specified in our run? Or is FDCH considered "newswire"?
> > Note that this information is not included anywhere in the output
> > files, either.
> >
> > Third, assuming that we're to use either the FDCH or CCAP (and not a
> > combination), how is it specified? The sample index files (from 4/30)
> > do not contain any indication about what type of source should be
> > used. Since in some cases there is not a complete overlap between
> > types, it's not clear what we should do. Specifically, for the ABC
> > sources in the 5/22 Jan-Feb data, the following files exist as
> > closed-caption files, but are missing EITHER the FDCH or the ASR (but
> > not both):
> >
> >   19980104_1830_1900_ABC_WNT   FDCH missing
> >   19980106_1830_1900_ABC_WNT   ASR missing
> >   19980111_1830_1900_ABC_WNT   FDCH missing
> >   19980120_1830_1900_ABC_WNT   ASR missing
> >   19980125_1830_1900_ABC_WNT   FDCH missing
> >   19980217_1830_1900_ABC_WNT   ASR missing
> >
> > A variant of this problem occurs in the Mar-Apr data, according to the
> > README file that Dave Graff sent out on the 8th. If we're supposed to
> > be doing the FDCH version of stuff, what do we do in the three cases
> > where it's missing? Similarly, what about the ASR data?
> >
> > Thanks.
> > -- james
>
> --
> Jon Fiscus
> NIST
> Email: jfiscus@nist.gov
> Phone: (301) 975-3182
>

Last updated Wed Sep 9 09:40:52 1998