(089) previous ~ index ~ next
To: Jonathan Fiscus <email@example.com>
From: Rich Schwartz <firstname.lastname@example.org>
Subject: Re: TDT2 data/software questions
Date: Wed, 15 Jul 1998 10:32:53 -0400 (EDT)
Jon et al.,
OK now. Today is Wednesday 7/15. We are having a workshop in two
weeks. From the May workshop until now we have had to use what we had
in hand to do our research. Ideally, the new definitions of data (index
files, what is training and test) and the new scoring software would have
been available at the beginning of this period in May so that we could use
them for our research. But they weren't available and still won't be
until later this week or next week.
So, back in May we converted all of the data back to TDT1 format.
One critical issue is what we train and test on. Unfortunately, there
was no agreement on this either, so we have chosen to perform
two kinds of experiments:
1. On text (both newspaper and closed caption, merged) from the May
release of data.
2. To compare the performance of closed-caption and manual transcriptions
with ASR transcriptions, we are performing experiments on the audio
data from the July 8th release of data.
We have developed a large number of different methods that we
compare and combine to get results. We can try to use the new index
files and software, but my guess is that we won't have time to do this
before the workshop. Besides changing our software and formats and trying
to find bugs in the newly released software and index files, it means more
computation and manpower than is available. So even if we ignored the
problem of running things in a different way than we have been doing in
the past two months, I don't think we could finish.
I think it will be more informative in the workshop for people to
describe what they have been doing for the past two months in a somewhat
stable environment than to try to compare lots of partial and broken
results that could not be trusted.
But the newly released software and index files are not wasted.
They are here just in time to use for the next two months of research
leading up to the dry run tests.
On Wed, 15 Jul 1998, Jonathan Fiscus wrote:
> Strzalkowski, Tomek (CRD) wrote:
> > Jon,
> > Are we going to get the revised software in time to get any results
> > for the July 30-31 meeting?
> Yes, my goal is to release the software by the end of the week. We'll
> see how it goes.
> > Also, regarding data index files for tracking task: these are nowhere to
> > be found in any of the data releases. Where do I get these? We can of course
> > make them up for now using topic relevance tables and alphanumeric order
> > of files. I thought this would be okay -- there was some discussion on this
> > a while ago.
> The release of the software will include updated index files.
> > Finally, since the default parameters of the tracking task specify auto-transcribed
> > speech files, we use only 4 speech sources for tracking, right?
> That's correct, but there will be additional index files for the other
> sources, e.g., newswire text and manually transcribed speech files.
Last updated Wed Sep 9 09:40:52 1998