ACE 2005 Corpora
Data Profile
Training data: 300,000 words per language
Test data: 50,000 words per language
| Data Type | English | Chinese | Arabic |
|---|---|---|---|
| newswire (NW) | 20% | 40% | 40% |
| broadcast news (BN) | 20% | 40% | 40% |
| broadcast conversation (BC) | 15% | 0% | 0% |
| weblogs (WL) | 15% | 20% | 20% |
| usenet newsgroups, discussion forums (UN) | 15% | 0% | 0% |
| conversational telephone speech (CTS) | 15% | 0% | 0% |
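
As a rough illustration of the arithmetic behind the table, the sketch below converts the percentage shares into approximate word targets per data type. It assumes that the same shares apply to both the training and the test data, which the profile does not state explicitly; the names `PROFILE` and `word_targets` are hypothetical and used only for this example.

```python
# Sizes stated in the data profile (words per language).
TRAIN_WORDS = 300_000
TEST_WORDS = 50_000

# Percentage shares copied from the table above.
PROFILE = {
    "English": {"NW": 20, "BN": 20, "BC": 15, "WL": 15, "UN": 15, "CTS": 15},
    "Chinese": {"NW": 40, "BN": 40, "BC": 0, "WL": 20, "UN": 0, "CTS": 0},
    "Arabic": {"NW": 40, "BN": 40, "BC": 0, "WL": 20, "UN": 0, "CTS": 0},
}


def word_targets(language, total_words):
    """Return approximate word counts per data type for one language.

    Assumption: the table's shares apply to whichever total is passed in
    (training or test); the source gives only one set of percentages.
    """
    shares = PROFILE[language]
    # Each language's shares should account for the whole corpus.
    assert sum(shares.values()) == 100
    return {genre: total_words * pct // 100 for genre, pct in shares.items()}


if __name__ == "__main__":
    for language in PROFILE:
        print(language, "train:", word_targets(language, TRAIN_WORDS))
        print(language, "test: ", word_targets(language, TEST_WORDS))
```

Under that assumption, English newswire would account for roughly 60,000 training words and 10,000 test words. These are targets implied by the percentages, not actual corpus counts.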
Training and Test Epochs
English
- NW: training epoch March-June 2003; test epoch July-Aug 2003
- BN: training epoch March-June 2003; test epoch July-Aug 2003
- BC: training epoch March-June 2003; test epoch July-Aug 2003
- WL: training epoch Nov 2004-Feb 2005; test epoch March-April 2005
- UN: training epoch Nov 2004-Feb 2005; test epoch March-April 2005
- CTS: training epoch Nov-Dec 2004; test epoch Nov-Dec 2004 (Note: for CTS data, test data is distinguished from training data by conversational topic rather than by calendar epoch)
Chinese
- NW: training epoch Oct-Dec 2000; test epoch Jan 2001
- BN: training epoch Oct-Dec 2000; test epoch Jan 2001
- WL: training epoch Nov 2004-Feb 2005; test epoch March-April 2005
Arabic
- NW: training epoch Oct-Dec 2000; test epoch Jan 2001
- BN: training epoch Oct-Dec 2000; test epoch Jan 2001
- WL: training epoch Nov 2004-Feb 2005; test epoch March-April 2005
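
The epochs listed above can be read as a date-based rule for assigning a document to the training or test set. The following is a minimal sketch of that rule, not part of any ACE tooling. It assumes that each epoch covers whole calendar months (the source gives months only), and the names `EPOCHS` and `assign_split` are hypothetical. CTS is left out because its split depends on conversational topic rather than date.

```python
from datetime import date

# Each tuple is (train_start, train_end, test_start, test_end).
# Whole-month boundaries are an assumption; the source lists months only.
SPRING_2003 = (date(2003, 3, 1), date(2003, 6, 30), date(2003, 7, 1), date(2003, 8, 31))
WINTER_2004 = (date(2004, 11, 1), date(2005, 2, 28), date(2005, 3, 1), date(2005, 4, 30))
LATE_2000 = (date(2000, 10, 1), date(2000, 12, 31), date(2001, 1, 1), date(2001, 1, 31))

EPOCHS = {
    "English": {
        "NW": SPRING_2003, "BN": SPRING_2003, "BC": SPRING_2003,
        "WL": WINTER_2004, "UN": WINTER_2004,
    },
    "Chinese": {"NW": LATE_2000, "BN": LATE_2000, "WL": WINTER_2004},
    "Arabic": {"NW": LATE_2000, "BN": LATE_2000, "WL": WINTER_2004},
}


def assign_split(language, genre, doc_date):
    """Return "train", "test", or None for a document date."""
    if genre == "CTS":
        # CTS training and test data share one epoch and differ by topic.
        raise ValueError("CTS is split by conversational topic, not by date")
    train_start, train_end, test_start, test_end = EPOCHS[language][genre]
    if train_start <= doc_date <= train_end:
        return "train"
    if test_start <= doc_date <= test_end:
        return "test"
    return None  # the date falls outside both epochs


if __name__ == "__main__":
    print(assign_split("English", "NW", date(2003, 4, 15)))
    print(assign_split("Chinese", "BN", date(2001, 1, 10)))
```

Keeping the training and test epochs disjoint in time means that test documents always postdate the training documents for the same data type.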