Project Specifications: ACE Data Overview

Corpus; Data (words); Tasks and languages; Annotation task definition; Sources; Phase; Evaluations; Availability
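
Corpus: ACE-Pilot
Data (words): 15K Pilot
Tasks and languages: Entities-Pilot: English
Annotation task definition: Entity Detection and Tracking: Phase1 v2.2
Sources: TDT-2, Newspaper
Phase: ACE Pilot
Evaluations: May 2000; November 2000
Availability: Currently ACE/TIDES only; slated for future publication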
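
Corpus: ACE-1
Data (words): 180K Train; 45K Eval
Tasks and languages: Entities: English
Annotation task definition: Entity Detection and Tracking: Phase1 v2.2
Sources: TDT-2, Newspaper
Phase: ACE PHASE1
Evaluations: February 2002
Availability: Currently ACE/TIDES only; slated for future publication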

Corpus: ACE-2
Data (words): 180K Train (from ACE-1 Train); 45K Dev/Test (from ACE-1 Eval); 45K Eval (new)
Tasks and languages: Entities (revised treatment), Relations: English
Annotation task definition:
  EDT Annotation Guidelines V2.5
  RDC Annotation Guidelines V3.6
Sources: TDT-2, Newspaper
Phase: ACE PHASE2
Evaluations: September 2002
Availability: LDC Publication LDC2003T11
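
Corpus: ACE-2 EELD Supplement
Data (words): 30K Train (new); 20K Eval (new)
Tasks and languages: Entities, Relations: English
Annotation task definition:
  EDT Annotation Guidelines v2.5
  RDC Annotation Guidelines V3.6
Sources: RCK domain
Phase: EELD
Evaluations: September 2002
Availability: EELD/ACE only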

Corpus: ACE2003 Training Data
Data (words): 100K/lang Train (new); 50K/lang Eval (new)
Tasks and languages:
  Entities: English, Chinese, Arabic
  Relations: English, Chinese
Annotation task definition:
  EDT Annotation Guidelines v2.5
  RDC Annotation Guidelines V3.6
Sources: TDT-4
Phase: ACE PHASE2
Evaluations: TIDES Extraction, September 2003
Availability: LDC Publication LDC2004T09

Corpus: ACE 2004 Pilot Data
Data (words): 25K English Pilot (new)
Tasks and languages: English: Entities, Relations, Events
Annotation task definition:
  English Entity Guidelines V4.2.6
  English Linking Guidelines V3.0
  English Relations Guidelines V4.3.2
  English Events Guidelines V2.0
Phase and evaluations: Spring 2004 Mid-Course Correction Workshop; 2004 Pilot Study, February 2004
Availability: ACE2004 Pilot Corpus: LDC2004E03; contact the LDC

Corpus: ACE2004 Training Data
Data (words): 150K Train/lang (new); 50K Eval/lang
Tasks and languages:
  Entities: English, Chinese, Arabic
  Relations: English, Chinese, Arabic
Annotation task definition:
  English Entity Guidelines V4.2.6
  English Linking Guidelines V3.0
  English Relations Guidelines V4.3.2
  Chinese Entity Guidelines V4.2.4
  Chinese Linking Guidelines V2.0
  Chinese Relations Guidelines V4.3
  Arabic Entity Guidelines V4.2.3
  Arabic Linking Guidelines V1.0
  Arabic Relations Guidelines V4.3
Sources: TDT-4; Chinese Treebank; Arabic Treebank; Switchboard; Fisher
Phase: ACE PHASE3
Evaluations: ACE Program/TIDES Extraction, September 2004
Availability: ACE/TIDES sites can obtain the following corpora by contacting LDC:
  ACE/TIDES Extraction 2004 Training Data: LDC2004E17, LDC2005T09
  ACE/TIDES Extraction 2004 Training Data - Consistency Study: LDC2004E39
  ACE/TIDES Extraction 2004 Evaluation/Test Data: LDC2004E51
  ACE/TIDES Extraction 2004 Evaluation Data - Consistency Study: LDC2004E40

Corpus: ACE 2005
Data (words):
  Training Data (new):
    English: 260K words
    Chinese: 308K characters (205K words)
    Arabic: 100K words
  Evaluation Data (new):
    English, Chinese and Arabic: 50K words
Tasks and languages: Entities, Relations, Events: English, Chinese, Arabic
Annotation task definition:
  English-Entities-Guidelines_v5.6.1.pdf
  English-Values-Guidelines_v1.2.4.pdf
  English-Relations-Guidelines_v5.8.3.pdf
  English-Events-Guidelines_v5.4.3.pdf
  English-TimestampingGuidelines_v3.pdf
  English-TIMEX2-Guidelines_v0.1.pdf

  Chinese-Entities-Guidelines_v5.5.pdf
  Chinese-Values-Guidelines_v1.1.2.pdf
  Chinese-Relations-Guidelines_v5.5.1.pdf
  Chinese-Events-Guidelines_v5.5.1.pdf
  Chinese-TIMEX2-Guideline-Summary_v1.2.pdf
  Chinese-Timestamping-Guidelines_v2.pdf

  Arabic-Entities-Guidelines_v5.3.3.pdf
  Arabic-Values-Guidelines_v1.2.3.pdf
  Arabic-Relations-Guidelines_v5.3.4.pdf
  Arabic-Events-Guidelines_v5.4.4.pdf
Sources: Newswire; Broadcast News; Broadcast Conversation; Weblogs; WebForums; English Fisher Telephone Transcripts
Phase: ACE 2005
Evaluations: November 2005
Availability: ACE sites can obtain the following corpus by contacting LDC:
  ACE 2005 Multilingual Training Data V6.0: LDC2005E18