The allowable training data for each evaluation condition is listed below. Registered participants should contact the LDC to acquire this data:
Small Data Condition
| LDC2002E17 | English Translation of Chinese Treebank |
| Not Applicable | The 10k-word dictionary from CMU (S. Vogel) |

Large Data Condition
| LDC catalog # | Title |
| LDC2003E14 | FBIS data |
| LDC2003E25 | Hong Kong News Parallel Text, sentence-aligned |
|  | Hong Kong News Parallel Text |
|  | Hong Kong Hansard Parallel Text, aligned at the document level |
| LDC2004E09 | Hong Kong Hansard Parallel Text, aligned at the sentence level |
| LDC2002E17 | English Translation of Chinese Treebank |
| LDC2002E18 | Xinhua Chinese-English Parallel News Text Version 1.0 beta 2 |
| LDC2004E12 | UN Chinese-English Parallel Text Version 2 |
|  | Chinese English Translation Lexicon version 3.0 |
| LDC2002E58 | Sinorama Chinese-English Parallel Text |
|  | Multiple-Translation Chinese Corpus |
|  | Multiple-Translation Chinese Part 2 NIST June 2002 MT evaluation data |
| LDC2003E01 | Chinese-English Name Entity Lists version 1.0 beta |
| LDC2003E04 | Multiple Translation Chinese Corpus Part 3 |
|  | Chinese Treebank Version 4.0 |
| LDC2003E07 | Chinese Treebank English Parallel Corpus |
| LDC2003E08 | Chinese News Translation Corpus Part 1 |
| Not Applicable | NIST May 2003 MT evaluation data can be acquired from NIST |

Unlimited Training Condition
| All publicly available data up to |