To: hjin@bbn.com (Hubert Jin)
From: Xiaoyi Ma <xma@unagi.cis.upenn.edu>
Subject: Re: new version of bilingual lexicons
Date: Thu, 8 Apr 1999 11:37:25 -0400 (EDT)
> Do you know any other resources that might be helpful in the TDT
> project? (such as a parallel corpus of Chinese-English
> sentences)
>
> Thanks,
>
> -Hubert
Yes, we do have parallel Chinese-English corpora. They come from 3
sources:
Hong Kong Law Code:
-------------------
8M words on English side, total size 78M bytes.
News Released by Hong Kong Government:
--------------------------------------
19970701 - present, 21 months
5-6M words on English side, total size 60M bytes.
Hong Kong Hansard:
------------------
199510 - 199901
5-6M words on English side, total size roughly 60M bytes.
We have distribution permission from the Hong Kong Government for the first
2 sources; permission for the Hansard is still pending.
I am working on cleanup and alignment. They will be coming out one by one;
I'll send everybody email when they are available.
Thanks.
Xiaoyi
Last updated Thu May 13 09:28:23 1999