The allowable training data for each evaluation condition is listed below. Registered participants should contact the LDC to acquire this data:

Large Data Condition

LDC catalog #    Title
-------------    -----
LDC2003T07       Arabic Treebank Part 1 10k word English Translation
LDC2004E13       UN Arabic English Parallel Text Version 2
LDC2004E08       Arabic English Parallel News Text Part 1
LDC2002L49       Buckwalter Arabic Morphological Analyzer Version 1.0
LDC2003T06       Arabic Treebank Part 1 v2.0
LDC2003T18       Multiple Translation Arabic Corpus Part 1
                 NIST June 2002 MT evaluation data
LDC2003E05       Arabic News Translation Corpus Part 1
LDC2003E09       Arabic News Translation Corpus Part 2
LDC2004E07       Arabic News Translation Corpus Part 3
LDC2004E11       Arabic News Translation Corpus Part 4
LDC2004T02       Arabic Treebank: Part 2 v 2.0
Not Applicable   NIST May 2003 MT evaluation data can be acquired from NIST


Unlimited Training Condition

All publicly available data up to January 1, 2004