ACE 2007 Corpora

Data selection is semi-automatic. A document pool is first established for each language based on the requirements described above. Human reviewers then select individual documents from the pool that are suitable for ACE annotation, for instance documents that are representative of their genre and contain the targeted ACE entity types. Data processing consists of automatically standardizing the source document formatting to conform to the customary ACE community standard. The source data formats match those used in the ACE 2005 evaluation.
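
The two-stage selection described here can be illustrated with a minimal sketch. It is not the actual ACE tooling: the Document fields, the function names build_pool and select_for_annotation, and the language and genre filters are all hypothetical stand-ins for the pool requirements and the human review decisions.

```python
from dataclasses import dataclass


@dataclass
class Document:
    # Hypothetical record layout; the real corpus metadata may differ.
    doc_id: str
    language: str
    genre: str
    text: str


def build_pool(documents, language, genres):
    """Automatic step: keep documents that meet the assumed pool requirements."""
    return [d for d in documents if d.language == language and d.genre in genres]


def select_for_annotation(pool, approved_ids):
    """Manual step: keep only the documents a human reviewer approved."""
    return [d for d in pool if d.doc_id in approved_ids]


if __name__ == "__main__":
    docs = [
        Document("a", "English", "newswire", "first sample text"),
        Document("b", "English", "weblog", "second sample text"),
        Document("c", "Arabic", "newswire", "third sample text"),
    ]
    pool = build_pool(docs, "English", {"newswire", "weblog"})
    # The reviewer decisions are supplied by hand in this sketch.
    selected = select_for_annotation(pool, approved_ids={"b"})
    print([d.doc_id for d in selected])
```

Keeping the automatic filter and the human decision in separate functions mirrors the semi-automatic split: the pool can be rebuilt without discarding the reviewers' choices.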