To: Jonathan Fiscus <jonathan.fiscus@nist.gov>
From: Jaime Carbonell <jgc@NL.CS.CMU.EDU>
Subject: Re: TDT2 Dev-Test submission instructions
Date: Fri, 4 Sep 98 15:40:35 EDT
Jon et al,
We are in a bit of a quandary as to exactly what to submit for
the "official" evaluation because there are known bugs that have
not been addressed (e.g. the missing ABC stories and the "nyt" vs "NYT"
mismatch, both reported to you earlier on 27 August).
If we submit runs according to specification, the evaluation software
bombs out (error message, no results) because of the bugs. This
is not useful.
An alternative is to introduce hacks into our system in order to
make it bug-compatible with the current evaluation software. In
other words we could waste considerable time introducing mirror
images of the current bugs in order to try to get the eval sw to
run to completion -- and throw away all this work when the eval software
is finally working right. This is inefficient not only because
it is wasted work in the long run, but also because it must be done
by every site. And it leaves a bad taste.
So, what we'll do at CMU is, barring further developments:
1. Submit runs done correctly (even if the eval sw can't digest them)
2. Try to run our own evaluation for the above, using internal
eval methods + adapting TDT1 eval
3. If time permits, we may introduce workaround hacks to our code
to try to get something half-way meaningful from the TDT-2 eval
software. This may require extending submission by a day or two.
This has been a frustrating time for us. We want the eval software
to help us develop better algorithms and methods -- i.e. to sharpen
and focus the research process. TDT-1 worked well in this respect.
Things have been otherwise for TDT-2. And we do realize it's more
complicated, and that Jon is overworked with multiple projects, and
trying the best he can. And this is massive work for LDC, and so on.
However, it's hard to do experimental research without functional
measuring devices.
--Jaime et CMUers
Last updated Wed Sep 9 09:40:56 1998