TIDES Data for 2004

 

Updated 2/9/2005

 

Project

Project Manager

Delivery Description

Delivery Made/Due

eCorpus/Catalog Number

Notes

Detection

HARD

Stephanie Strassel

2003 Relevance Judgments

5/18/2004

LDC2004E25

 

2004 Corpus

5/21/2004

LDC2004E30

 

2004 Training Topics

6/11/2004

LDC2004E32

 

2004 Evaluation Topics

6/25/2004

LDC2004E34

 

2004 Evaluation Topics with Metadata

7/12/2004

LDC2004E34

 

Clarification Forms

7/16/2004

LDC2004E42

 

2004 Relevance Judgments

10/1/2004

LDC2004E42

 

TDT

Stephanie Strassel

TDT-4 Topic Judgments

5/12/2004

LDC2004E20

 

TDT-5 Corpus

LDC2004E41

TDT-5 2004 Annotations


LDC2004E45

 

Extraction/ACE

Stephanie Strassel

Pilot Corpus

V1.1

1/20/2004

LDC2004E03

 

V1.2

2/6/2004

LDC2004E03

 

Training Data

V1.0

4/1/2004

LDC2004E17

 

V1.1

5/12/2004

LDC2004E17

 

V1.2

7/1/2004

LDC2004E17

 

DevTest Data

 

LDC2004E38

Evaluation Data

8/3/2004

LDC2004E51

 

Machine Translation

Xiaoyi Ma

Arabic

Arabic News Translation Corpus Part 3

1/16/2004

LDC2004E07

 

Arabic English Parallel News Text Part 1 (2M words)

1/29/2004

LDC2004E08

Official publication to be released with additional QC and documentation on 9/15/2004

UN Arabic English Parallel Text Version 2 (101M words)

3/30/2004

LDC2004E13

 

Arabic News Translation Corpus Part 3 (524K words)

3/30/2004

LDC2004E11

 

Arabic News Translation Corpus Part 4 (200K words)

3/30/2004

LDC2004E11

 

Arabic News Translation Text

8/15/2004

LDC2004T08

 

Arabic Eval Data

5/2/2004

 

 

Human Assessment of Arabic to NIST

8/31/2004

 

 

Chinese

Hong Kong Hansards Parallel Text (36M en words)

3/30/2004

LDC2004E09

 

UN Chinese-English Parallel Text (147M en words)

3/30/2004

LDC2004E12

 

Multiple-Translation Chinese Part 3

7/15/2004

LDC2004T07

Ready for publication

Hong Kong Parallel Text

8/15/2004

 

 

Chinese-English News Magazine Parallel Text

12/15/2004

 

 

Chinese Eval Data (80K Chinese characters)

5/2/2004

 

 

Human Assessments of Chinese to NIST

8/31/2004

 

 

Summarization

Stephanie Strassel

50 Topic Summaries, 4 Annotators

12/21/2004

LDC2004E46

Tagged Text and X Banks

Mohamed Maamouri

Arabic Treebank

Part 2 V2.0 (144K words - newswire)

1/30/2004

LDC2004T02

 

Part 3 V1.0 (340K words - newswire)

4/19/2004

LDC2004T11

 

Part 3 (a) V1.1

7/30/2004

 

 

Part 3 V2.0

2/15/2005

 

Full Arabic Treebank Part 1 V2.3

12/20/2004

 

Will include new annotation passes for morphology, POS, gloss and added vocalization

Xiaoyi Ma

Chinese Treebank

Version 4.0 (404K words – newswire, press release)

3/15/2004

LDC2004T05

 

Lexicons

Mohamed Maamouri

Buckwalter Lexicon and Morphological Analyzer

Version 2.0

12/2004