(086) previous ~ index ~ next

To: Xiaoyi Ma <xma@unagi.cis.upenn.edu>
From: Hubert Jin <hjin@bbn.com>
Subject: resources for TDT3
Date: Wed, 21 Apr 1999 12:24:20 -0400 (EDT)

Hi Xiaoyi,

Do you know where to get a Chinese-character-to-pinyin dictionary?

We are starting to develop some stuff for TDT3 and would like to have
the following data (even if the whole data set is not ready
yet, a few examples would certainly help us debug and test):

(1) some stories in Chinese that will be in TDT3
[before and after segmentation]
(2) some Chinese-English sentences from the parallel data corpus
[parallel at which level? sentence or paragraph or story?]

Thanks,

-Hubert

On Thu, 8 Apr 1999, Xiaoyi Ma wrote:

> > Do you know of any other resources that might be helpful in the TDT
> > project? (such as a parallel corpus of Chinese-English
> > sentences)
> >
> > Thanks,
> >
> > -Hubert
>
> Yes, we do have a parallel Chinese-English corpus. It comes from 3
> sources:
>
> Hong Kong Law Code:
> -------------------
> 8M words on English side, total size 78M bytes.
>
>
> News Released by Hong Kong Government
> -------------------------------------
> 19970701 - present, 21 months
> 5-6M words on English side, total size 60M bytes.
>
>
> Hong Kong Hansard:
> ------------------
> 199510 - 199901
> 5-6M words on English side, total size roughly 60M bytes.
>
>
> We have distribution permission from the Hong Kong Government for the first
> 2 sources; permission for the Hansard is still pending.
>
> I am working on cleanup and alignment. They will be coming out one by one; I'll
> send everybody email when they are available.
>
> Thanks.
>
>
> Xiaoyi
>


Last updated Thu May 13 09:28:23 1999