(063) previous ~ index ~ next

To: tdt-distrib@unagi.cis.upenn.edu, "'Jaime Carbonell'" <jgc@nl.cs.cmu.edu>
From: "Strzalkowski, Tomek (CRD)" <strzalkowski@crd.ge.com>
Subject: RE: TDT3, deconflating the issues
Date: Mon, 29 Mar 1999 23:09:00 -0500

I enjoyed Jaime's fine story. I don't know if MT "factoring" is good or bad,
but the reality is that for most languages the best we can hope for is a
bilingual dictionary, and some sample (non-parallel) texts. Such resources
are sufficient to build quick-and-dirty translation for translingual IR/TDT
purposes. An MT system could do better or not -- and this is a question
to be answered. I believe, btw, that the TIDES spirit is translingual, not MT-all.

ASR "factoring" is different because even a quick-and-dirty ASR is not that
easy to come by (or maybe it is). Whether TDT-2 transcriptions were
optimal or not couldn't have possibly mattered much, as we know by now.
What do we do about languages for which there is no ASR available?

---- Tomek

> ----------
> From: Jaime Carbonell[SMTP:jgc@NL.CS.CMU.EDU]
> Sent: Monday, March 29, 1999 8:56 PM
> To: tdt-distrib@unagi.cis.upenn.edu
> Cc: jgc@NL.CS.CMU.EDU
> Subject: TDT3, deconflating the issues
>
> OK Folks, There is more than one issue on the table, but they sometimes
> get conflated. Let me attempt to clarify:
>
> 1. Judgements for evaluation (whether stories are on topic for each topic) --
> English stories are being judged by English speakers, Mandarin by Mandarin speakers,
> if I understood correctly. End of issue, so long as the understanding of
> what belongs to each topic is the same for the raters in each language.
>
> 2. Required vs Optional conditions. George cleared that up in his last
> email. Again, end of issue.
>
> 3. Factoring vs coupling. There are workload, optimality and generality
> issues here. Jon and Rich addressed workload minimization (bless their
> souls), and together with Doug recognized the optimality concern. I seem
> to be the only person worrying about generality. Indulge me with a simple
> story, the kind I used to make up for my kids:
>
> Problem: How do we fetch the milk and honey from the other side
> of the river? Answer 1: Presuppose a bridge even if it sways
> in the wind and we occasionally spill some goodies on the way.
> Answer 2: Learn to build a raft or a boat or a quick pontoon.
> Which will be less effort? Pretty clear. Which will be better?
> No way to know a priori. Which leads to developing more general
> technology, the kind you want when you don't know whether there'll
> be a bridge at the next river? Also pretty clear.
>
> Hence, the points are:
>
> - MT does not cover most language pairs, hence "factoring"
> a la Jon and Rich is usually not possible. Since it is
> possible in the present case, however, we may indeed take
> the shortcut.
>
> - Even when factoring is possible, the shortcut may not
> lead to the optimal solution. This is a matter for research
> to address, and it requires one or more sites to do it both
> ways so as to eliminate confounding factors (such as different
> classifiers or clustering methods).
>
> - We should be clear that TDT need not presuppose MT (even if
> factoring does work relatively well) because building MT
> systems for new language pairs is more onerous than building
> TDT systems, at least given the present state of MT technology.
>
> Note that given our collective resource constraints it may indeed make
> sense for sites to opt for saving effort at the cost of generality,
> and dedicate that effort to achieving better performance via improving
> language models, classifiers, clustering methods, etc. Nonetheless,
> I want us all to recognize the consequences of our choices.
>
>
> 4. Task clarification -- sorry if I missed this in earlier communications:
> Are we required/advised to:
>
> - Do the TDT tasks independently in each language?, or
>
> - Do the tasks jointly? This implies that we may get
> only Mandarin training stories for tracking English and
> vice-versa. It also implies joint clusters of English and
> Mandarin stories if these are on the same topic for detection.
>
> Note that this issue is separate from the factoring imbroglio.
>
>
> Cheers,
>
>
> --Jaime (still within reach of my suit-of-armor)
>

Last updated Thu May 13 09:28:22 1999