
To: Rich Schwartz <schwartz@bbn.com>
From: Jon_Yamron@dragonsys.com
Subject: Re: Mandarin Resources and Large, Non-Punctual Topics
Date: Wed, 10 Feb 1999 10:53:56 -0500

I agree with Rich that the big problem is getting hold of some kind of
bilingual concordance. This is made at least a little trickier in Mandarin
because it interacts with the tokenization problem---basically, if we agree on a
bilingual dictionary, we also must agree on a word list. In short, the word
list the LDC uses in the tokenization of the Mandarin text data, the word list
used in the Mandarin recognizer used to process the Mandarin broadcast sources,
and the word list used in the English-Mandarin concordance, had better all be
pretty similar.

- Jon





Rich Schwartz <schwartz@bbn.com> on 02/10/99 09:20:01 AM

To: Christopher Cieri <ccieri@ldc.upenn.edu>
cc: tdt-distrib@ldc.upenn.edu (bcc: Jon Yamron/Dragon Systems USA)
Subject: Re: Mandarin Resources and Large, Non-Punctual Topics






Chris,

On Tue, 9 Feb 1999, Christopher Cieri wrote:

> James asked us to list any Mandarin resources we might contribute to
> TDT-3.

The resource we all need most is a bilingual dictionary, because we
are supposed to find Mandarin and English documents that are about the
same topic. We could each start to work on techniques for estimating
a probabilistic bilingual dictionary from general news. But I'm
assuming that is beyond the scope of this effort. Am I wrong there?
It's certainly an interesting and current topic. I just didn't think
we were going to do that here.

Doug Oard's message is encouraging.

me.




Last updated Thu May 13 09:28:14 1999