Automatic Content Extraction (ACE): Previous Annotation Efforts
A summary of data and annotation guidelines for each evaluation can be found on the ACE Data Matrix.
ACE07
ACE Evaluation
New tasks for ACE 2007 included a pilot evaluation using Spanish data for
the tasks of entity detection and recognition (EDR) and temporal expression
recognition and normalization (TERN).
Description of corpora
Data selection process
Data format and DTD
ACE07 Evaluation (NIST Website)
Entity Translation Pilot Evaluation
A pilot evaluation of "Entity Translation" (ET) was conducted as part of
ACE07. Systems participating in the pilot ET task are evaluated on their
ability to take in a text document in one language (either Mandarin Chinese
or Arabic) and emit an English-language catalog of the entities mentioned in
the document. LDC created reference translations and ACE annotations to
support the ET pilot task, with support from the REFLEX program.
Data selection process
Data format and DTD
ET Pilot Evaluation (NIST Website)
ACE05
In 2005 the ACE Program expanded to include Events annotation for Arabic,
English and Chinese. For the 2005 TIDES Extraction Evaluation, LDC created
new training data of approximately 300,000 words per language, plus test
sets of approximately 50,000 words per language.
Description of corpora
Data selection process
Data format and DTD
ACE05 Annotation Toolkit
ACE05 Evaluation (NIST Website)
ACE04
In 2004 the ACE Program expanded to include Relation annotation for Arabic.
For the September 2004 TIDES Extraction Evaluation, LDC created new training
data of approximately 150,000 words per language, plus test sets of
approximately 50,000 words per language. Annotations include Entities and
Relations for English, Chinese and Arabic.
ACE03
In 2003 the ACE program expanded to include Chinese and Arabic as well as
English. For the September 2003 TIDES Extraction evaluation, LDC created new
training data of approximately 100,000 words per language, plus test sets of
up to 50,000 words for each language. Annotations included Entities and
Relations for Chinese, and Entities only for Arabic.
ACE Phase 2
To support the November 2002 Extraction evaluation, LDC created 180,000
words of English training data (from ACE Phase 1), plus a newly defined
45,000-word development set and 45,000 words of new evaluation data. This
data was annotated for both Entities and Relations, to support EDT and RDC
technology evaluations.
ACE Phase 2b Evaluation Summer 2002 (NIST Website)
ACE Phase 2 Evaluation 2001/2002 (NIST Website)
ACE Phase 1
LDC created a
180,000 word English training corpus and 45,000 words of test data to
support the February 2002 ACE evaluation.
ACE Pilot
LDC joined ACE
research sites to create an English pilot corpus of 15,000 words tagged
for Entities. This effort supported EDT evaluations in May and
November 2000.














