The following training data is allowed for each evaluation condition. Registered participants should contact the LDC to acquire this data:
| Large Data Condition | ||
| LDC Catalog Number | Title | Description |
| | Arabic Treebank English Translation | Translation of ATB part 1, 3 and 4; 551K Arabic words total, single translation |
| | UN Arabic English Parallel Text Version 2 | Parallel text from UN; 101M Arabic words; sentence aligned |
| * LDC2004T18 | Arabic English Parallel News Text Part 1 | Parallel text from Ummah Press; 2M Arabic words, sentence aligned |
| LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 | See the catalog page for details |
| * LDC2005T02 | Arabic Treebank: Part 1, v3.0 | POS with full vocalization + syntactic analysis; 145K words |
| * LDC2004T02 | Arabic Treebank: Part 2, v2.0 | 144K words |
| * LDC2004T11 | Arabic Treebank: Part 3, v1.0 | 293K words |
| LDC2003T18 | Multiple Translation Arabic Corpus Part 1 | See the catalog page for details |
| * LDC2005T05 | Multiple Translation Arabic Corpus Part 2 | See the catalog page for details |
| * LDC2004T17 | Arabic News Translation Text Part 1 | 441K words |
| | eTIRR Arabic English News Text | 5.6K words |
| Not Applicable | NIST May 2004 MT evaluation data | Can be acquired from NIST |
| Unlimited Data Condition | ||
| All publicly available data up to November 30th, 2004 | ||
* denotes resources created after the 2004 TIDES MT Evaluation