Allowable training data for each evaluation condition is listed below. Registered participants should contact the LDC to acquire this data:

Large Data Condition
LDC Catalog Number | Title | Description
* LDC2005E46 | Arabic Treebank English Translation | Translation of ATB parts 1, 3, and 4; 551K Arabic words total, single translation
LDC2004E13 | UN Arabic English Parallel Text Version 2 | Parallel text from the UN; 101M Arabic words, sentence aligned
* LDC2004T18 | Arabic English Parallel News Text Part 1 | Parallel text from Ummah Press; 2M Arabic words, sentence aligned
LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 | See the catalog page for details
* LDC2005T02 | Arabic Treebank: Part 1, v3.0 | POS with full vocalization + syntactic analysis; 145K words
* LDC2004T02 | Arabic Treebank: Part 2, v2.0 | 144K words
* LDC2004T11 | Arabic Treebank: Part 3, v1.0 | 293K words
LDC2003T18 | Multiple Translation Arabic Corpus Part 1 | See the catalog page for details
* LDC2005T05 | Multiple Translation Arabic Corpus Part 2 | See the catalog page for details
* LDC2004T17 | Arabic News Translation Text Part 1 | 441K words
* LDC2004E72 | eTIRR Arabic English News Text | 5.6K words
Not Applicable | NIST May 2004 MT evaluation data | Can be acquired from NIST
Unlimited Data Condition
All publicly available data up to November 30th, 2004

* Denotes resources created after the 2004 TIDES MT Evaluation.