(061) previous ~ index ~ next

To: tdt-distrib@unagi.cis.upenn.edu
From: Jaime Carbonell <jgc@NL.CS.CMU.EDU>
Subject: TDT3, deconflating the issues
Date: Mon, 29 Mar 99 20:56:00 EST

OK Folks, There is more than one issue on the table, but they sometimes
get conflated. Let me attempt to clarify:

1. Judgements for evaluation (whether stories are on topic for each topic) --
English stories are being judged by English speakers, Mandarin by Mandarin speakers,
if I understood correctly. End of issue, so long as the understanding of
what belongs to each topic is the same for the raters in each language.

2. Required vs Optional conditions. George cleared that up in his last
email. Again, end of issue.

3. Factoring vs coupling. There are workload, optimality and generality
issues here. Jon and Rich addressed workload minimization (bless their
souls), and together with Doug recognized the optimality concern. I seem
to be the only person worrying about generality. Indulge me with a simple
story, the kind I used to make up for my kids:

Problem: How do we fetch the milk and honey from the other side
of the river? Answer 1: Presuppose a bridge even if it sways
in the wind and we occasionally spill some goodies on the way.
Answer 2: Learn to build a raft or a boat or a quick pontoon.
Which will be less effort? Pretty clear. Which will be better?
No way to know a priori. Which leads to developing more general
technology, the kind you want when you don't know whether there'll
be a bridge at the next river? Also pretty clear.

Hence, the points are:

- MT does not cover most language pairs, hence "factoring"
a la Jon and Rich is usually not possible. Since it is
possible in the present case, however, we may indeed take
the shortcut.

- Even when factoring is possible, the shortcut may not
lead to the optimal solution. This is a matter for research
to address, and it requires one or more sites to do it both
ways so as to eliminate confounding factors (such as different
classifiers or clustering methods).

- We should be clear that TDT need not presuppose MT (even if
factoring does work relatively well) because building MT
systems for new language pairs is more onerous than building
TDT systems, at least given the present state of MT technology.

Note that given our collective resource constraints it may indeed make
sense for sites to opt for saving effort at the cost of generality,
and dedicate that effort to achieving better performance via improving
language models, classifiers, clustering methods, etc. Nonetheless,
I want us all to recognize the consequences of our choices.


4. Task clarification -- sorry if I missed this in earlier communications:
Are we required/advised to:

- Do the TDT tasks independently in each language, or

- Do the tasks jointly? This implies that we may get
only Mandarin training stories for tracking English and
vice-versa. It also implies joint clusters of English and
Mandarin stories if these are on the same topic for detection.

Note that this issue is separate from the factoring imbroglio.


Cheers,


--Jaime (still within reach of my suit-of-armor)


Last updated Thu May 13 09:28:21 1999