(080) previous ~ index ~ next

To: hjin@bbn.com (Hubert Jin)
From: Xiaoyi Ma <xma@unagi.cis.upenn.edu>
Subject: Re: new version of bilingual lexicons
Date: Thu, 8 Apr 1999 11:37:25 -0400 (EDT)

> Do you know any other resources that might be helpful in the TDT
> project? (such as some parallel data corpus of Chinese-English
> sentence)
>
> Thanks,
>
> -Hubert

Yes, we do have parallel Chinese-English corpora. They come from 3
sources:

Hong Kong Law Code:
-------------------
8M words on the English side, total size 78M bytes.


News Releases by the Hong Kong Government:
------------------------------------------
19970701 - present, 21 months
5-6M words on the English side, total size 60M bytes.


Hong Kong Hansard:
------------------
199510 - 199901
5-6M words on the English side, total size roughly 60M bytes.


We have distribution permission from the Hong Kong Government for the first
2 sources; permission for the Hansard is still pending.

I am working on cleanup and alignment. The corpora will be released one by
one, and I'll email everybody as each becomes available.

Thanks.


Xiaoyi

Last updated Thu May 13 09:28:23 1999