(056) previous ~ index ~ next
To: George Doddington <doddington@nist.gov>
From: Doug Oard <oard@glue.umd.edu>
Subject: Re: TDT3 dry run
Date: Sat, 27 Mar 1999 14:41:16 -0500 (EST)
Thanks George - I left out the key caveat: all other things being equal.
I certainly agree that we should look for ways of leveraging more
processing power as it becomes available, and that we can start looking
for those opportunities now. I've tried massive document translation for
CLIR and found that - at least for my translation system - it was no more
effective than query translation. What I have read about a recent set of
experiments at two other sites tends to confirm that with another
translation approach. From this I conclude that there is not yet much
evidence that massive translation can improve effectiveness (caveat: in
subject-based searching), and this led me to observe that even in the long
run the extra processing necessary to do it might be wasted. That said,
we are looking at yet another document-translation scheme here for TDT-3
:-). It's not a dead issue, we just haven't shown any benefit yet and we
have tried.
I actually was thinking about the stream case when I referred to high
volume - my initial experiments on this were for a filtering application.
As long as translation is an order of magnitude slower than tokenization
(at present this difference is closer to three orders of magnitude in
English), document-language searches will always have an advantage over
massive translation approaches from an efficiency perspective. And with
transborder information flows growing rapidly, we may be playing catch-up
for a very long time before we get to the point where we have cycles to
waste.
As an aside, massive advance translation is presently a win when
presenting documents to users because they don't have to wait for the
translation before trying to assess their utility. But as machines
get faster this effect is likely to diminish since the speed threshold we
need to hit (100 ms translations?) is referenced to human perception and
not to other computational processes. This is somewhat outside the scope
of the present focus of TDT, of course, but it is a point worth considering
when designing systems.
My actual goal is to infect the TDT community with a desire to look at the
CLIR literature as they start thinking about translingual TDT. I realize
that many good ideas come from clean-sheet-of-paper thinking, but on
balance I come down in favor of building on related work rather than
reinventing the wheel.
Doug
On Sat, 27 Mar 1999, George Doddington wrote:
> > From an efficiency standpoint, [cross-language document retrieval is
> > MUCH faster than MT]. And that's a big deal in real applications --
> > you want to train your topic tracker to process in the language
> > of the documents if you need to do high volume, and then only
> > translate the documents that are detected.
>
> Since you've made this assertion several times, I feel obliged to
> respond, lest your comments infect the entire TDT community. You are
> of course correct, but I would like to temper your perspective with
> a couple of observations about the technology and the application:
>
> * Progress has inevitably come from the existence of ever greater
> amounts of processing power and memory, coupled with development
> of techniques that exploit this power. To restrict research to
> those ideas which are economical of this power is to ensure failure.
>
> * TDT applications are generally of the type that process streams of
> data in real time. This is unlike the typical document retrieval
> applications which involve continually searching large and static
> or accreting databases. Thus computer power is not as critically
> important for TDT as it is for document retrieval, because the
> volume of processing is limited by the source data rate.
> --
> George Doddington in McLean, VA: doddington@nist.gov or 703/556-3434
>
Last updated Thu May 13 09:28:21 1999