To: Rich Schwartz <schwartz@bbn.com>
From: Doug Oard <oard@glue.umd.edu>
Subject: Re: TDT3 dry run
Date: Fri, 26 Mar 1999 20:54:06 -0500 (EST)
I forgot to include a couple of useful resources in my last email that I
had meant to throw in:
Cross-language IR (CLIR) papers and systems:
http://www.clis.umd.edu/dlrg/clir
Chinese (monolingual) IR:
http://trec.nist.gov (TREC-4 and TREC-5, if I recall correctly)
And I'll be happy to provide pointers to other work on request. In
particular, I have a more recent survey that is presently only available
as marked-up page proofs (appearing "real soon now" in print) that I could
snail mail to people who are seriously interested in the cross-language
aspect of things and don't mind reading past the editorial glitches.
One more comment below ...
Doug
On Fri, 26 Mar 1999, Rich Schwartz wrote:
> P.S. The reason I was suggesting that people not try to just do a 'better'
> translation, is that this is likely to be a small difference compared to
> doing something more fundamental. This is just like for TDT2, no one
> bothered trying to do a better speech recognition than Dragon, because we
> all knew that a 1 or 2 point difference in word error rate wouldn't
> matter that much. Now if it turns out after a few years of trying, that
> the best thing we can do for this is just to translate, and if it also
> turns out that the loss due to imperfect translation is large, then
> someone might be tempted to just devote effort to better translation.
The key difference in this case is that CLIR is quite fast, while speech
recognition on 600 hours that finishes in a reasonable time requires a lot
of compute horsepower. But in the main I agree with the sentiment - both
require resources and expertise that would serve as barriers to entry to
some teams. That's the reason I think making the Systran-output task the
required one is the best choice.
Last updated Thu May 13 09:28:21 1999