The allowable training data for each evaluation condition is listed below. Registered participants should contact the LDC to acquire this data.

 

Small Data Condition

LDC catalog #     Title
LDC2002E17        English Translation of Chinese Treebank
Not Applicable    The 10k-word dictionary from CMU (S. Vogel)

 

Large Data Condition

LDC catalog #     Title
LDC2003E14        FBIS data
LDC2000T47        Hong Kong Laws Parallel Text
LDC2003E25        Hong Kong News Parallel Text, sentence-aligned
LDC2000T46        Hong Kong News Parallel Text
LDC2000T50        Hong Kong Hansard Parallel Text, aligned at the document level
LDC2004E09        Hong Kong Hansard Parallel Text, aligned at the sentence level
LDC2002E17        English Translation of Chinese Treebank
LDC2002E18        Xinhua Chinese-English Parallel News Text Version 1.0 beta 2
LDC2004E12        UN Chinese-English Parallel Text Version 2
LDC2002L27        Chinese English Translation Lexicon version 3.0
LDC2002E58        Sinorama Chinese-English Parallel Text
LDC2002T01        Multiple-Translation Chinese Corpus
LDC2003T17        Multiple-Translation Chinese Part 2
                  NIST June 2002 MT evaluation data
LDC2003E01        Chinese-English Name Entity Lists version 1.0 beta
LDC2003E04        Multiple Translation Chinese Corpus Part 3
LDC2004T05        Chinese Treebank Version 4.0
LDC2003E07        Chinese Treebank English Parallel Corpus
LDC2003E08        Chinese News Translation Corpus Part 1
Not Applicable    NIST May 2003 MT evaluation data can be acquired from NIST

 

Unlimited Training Condition

All publicly available data up to Jan. 1st, 2004
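
As a rough illustration only, the condition lists above could be encoded so that a training manifest can be checked before submission. The following Python sketch is not part of the evaluation plan. The names `SMALL_DATA`, `LARGE_DATA`, `UNLIMITED_CUTOFF`, `disallowed`, and `allowed_unlimited` are hypothetical. It covers only resources that carry an LDC catalog number, so the CMU dictionary and the NIST evaluation sets listed above are not represented, and it assumes the January 1, 2004 cutoff is inclusive.

```python
from datetime import date

# Catalog numbers copied from the Small and Large Data Condition lists above.
# Resources without a catalog number (the CMU dictionary, the NIST
# evaluation sets) are deliberately left out of this sketch.
SMALL_DATA = {"LDC2002E17"}
LARGE_DATA = {
    "LDC2003E14", "LDC2000T47", "LDC2003E25", "LDC2000T46", "LDC2000T50",
    "LDC2004E09", "LDC2002E17", "LDC2002E18", "LDC2004E12", "LDC2002L27",
    "LDC2002E58", "LDC2002T01", "LDC2003T17", "LDC2003E01", "LDC2003E04",
    "LDC2004T05", "LDC2003E07", "LDC2003E08",
}

# Cutoff for the Unlimited Training Condition.
# Assumption: "up to Jan. 1st, 2004" is treated as inclusive.
UNLIMITED_CUTOFF = date(2004, 1, 1)


def disallowed(condition, catalog_ids):
    """Return, sorted, the catalog numbers not listed for `condition`.

    `condition` is "small" or "large" (hypothetical labels).
    """
    allowed = {"small": SMALL_DATA, "large": LARGE_DATA}[condition]
    return sorted(set(catalog_ids) - allowed)


def allowed_unlimited(release_date):
    """True if data released on `release_date` falls within the cutoff."""
    return release_date <= UNLIMITED_CUTOFF
```

For example, `disallowed("small", ["LDC2002E17", "LDC2003E14"])` flags the FBIS data as outside the Small Data Condition, while the same manifest passes under `"large"`.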