
To: tdt-distrib@unagi.cis.upenn.edu
From: Mark Liberman <myl@unagi.cis.upenn.edu>
Subject: New versions of TDT Mandarin<->English glossaries
Date: Thu, 25 Mar 1999 21:30:58 EST

For anyone who has downloaded the earlier versions:
Under http://www.ldc.upenn.edu/Projects/Chinese you will find
new Mandarin->English and English->Mandarin glossaries.

Xiaoyi Ma has added 2,600 city names, and also added a simple inversion
of the Mandarin->English glossary to the English->Mandarin glossary, and
vice versa. Note that the inversion process may produce odd results in
some cases; our view has been that a larger word list with some
useless items is better than a smaller list.
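
For readers who want to picture the inversion step, here is a minimal
sketch in Python. It is not the script used to build the LDC glossaries.
The in-memory layout (a dict mapping each headword to a list of
translations), the function names invert_glossary and merge_glossaries,
and the toy entries are all assumptions made for illustration.

```python
from collections import defaultdict


def invert_glossary(glossary):
    """Turn {headword: [translations]} into {translation: [headwords]}."""
    inverted = defaultdict(list)
    for headword, translations in glossary.items():
        for translation in translations:
            # Skip duplicates so each headword appears once per entry.
            if headword not in inverted[translation]:
                inverted[translation].append(headword)
    return dict(inverted)


def merge_glossaries(base, extra):
    """Return a copy of base with the entries of extra folded in."""
    merged = {headword: list(trans) for headword, trans in base.items()}
    for headword, translations in extra.items():
        entry = merged.setdefault(headword, [])
        for translation in translations:
            if translation not in entry:
                entry.append(translation)
    return merged


# Toy data (hypothetical, not taken from the actual glossaries).
man_eng = {
    "北京": ["Beijing", "Peking"],
    "银行": ["bank"],
    "岸": ["bank", "shore"],
}
eng_man = {"bank": ["银行"]}

# Add the inverted Mandarin->English entries to English->Mandarin.
eng_man_expanded = merge_glossaries(eng_man, invert_glossary(man_eng))
```

In this toy case the English headword "bank" ends up with both a
financial and a riverside sense, which gives a feel for how a plain
inversion can grow the list without checking that every new pairing is
useful.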

There are now 18,196 entries in the English->Mandarin direction, and
24,298 entries in the Mandarin->English direction.

If someone who believes that their named-entity detector is a good one
will provide us with a frequency-coded list of named entities from TDT
2, we will try to add items from the top of this list to the
glossaries.
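
As a rough illustration of how such a list might be used, the sketch
below picks the most frequent entities that are not yet glossary
headwords. The input layout (one "count<TAB>entity" pair per line), the
function name top_missing_entities, and the sample data are assumptions
for illustration only; no particular file format is specified here.

```python
def top_missing_entities(lines, glossary, n):
    """Return the n most frequent entities not already in glossary.

    Each line is assumed to look like "count<TAB>entity".
    """
    counts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        count, entity = line.split("\t", 1)
        counts.append((int(count), entity))
    # Highest counts first; ties keep their input order (stable sort).
    counts.sort(key=lambda pair: -pair[0])
    missing = [entity for _, entity in counts if entity not in glossary]
    return missing[:n]


# Hypothetical sample data, not real TDT counts.
sample = ["120\tKosovo", "95\tClinton", "3\tSmallville", "40\tBeijing"]
known = {"Beijing": ["北京"]}

candidates = top_missing_entities(sample, known, 2)
```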

You should continue to check the site from time to time, since we'll
continue to upgrade the glossaries as the opportunity arises.

-Mark


Last updated Thu May 13 09:28:20 1999