GALE Distillation
LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task.
- Collections of raw source text that serve as input to training and test corpora for Distillation
- Selection of appropriate documents for annotation
- Manual annotation in English of English source text and of English translations of Chinese and Arabic text, in four genres: Newswire, Web data, Broadcast News and Broadcast Conversation
GALE Phase 3 Evaluation
For Phase 3, LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task. Training resources include:
Annotation Guidelines
GALE Phase 2 Evaluation
For Phase 2, LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task. Training resources include:
- relevant documents
- snippets of relevant text from those documents
- formalized "nuggets" that express core facts extracted from those snippets
- entailment judgments between pairs of nuggets
- relevance judgments of system-extracted snippets
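The Phase 2 training resources form a hierarchy: documents contain relevant snippets, snippets yield nuggets, and nuggets are linked pairwise by entailment judgments. The following is a minimal sketch of that hierarchy as an in-memory data model, assuming Python 3.7+. All class, field and function names are hypothetical illustrations and do NOT reflect the element names in the LDC training data DTD; the sketch also assumes each entailment judgment is directional (premise entails hypothesis).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model of the Phase 2 training resources.
# Names are illustrative assumptions, not the LDC DTD element names.


@dataclass
class Nugget:
    nugget_id: str
    text: str  # a formalized core fact extracted from a snippet


@dataclass
class Snippet:
    snippet_id: str
    text: str  # relevant text excerpted from a document
    nuggets: list[Nugget] = field(default_factory=list)


@dataclass
class Document:
    doc_id: str
    snippets: list[Snippet] = field(default_factory=list)


@dataclass
class EntailmentJudgment:
    premise_id: str      # nugget that may entail ...
    hypothesis_id: str   # ... this other nugget (direction is an assumption)
    entails: bool        # annotator's judgment for this ordered pair


def all_nuggets(doc: Document) -> list[Nugget]:
    """Collect every nugget from every snippet of a document, in order."""
    return [n for s in doc.snippets for n in s.nuggets]


def entailed_by(nugget_id: str, judgments: list[EntailmentJudgment]) -> list[str]:
    """Return sorted IDs of nuggets judged to be entailed by nugget_id."""
    return sorted(
        j.hypothesis_id
        for j in judgments
        if j.premise_id == nugget_id and j.entails
    )
```

For example, with toy IDs `N1` and `N2` extracted from one snippet, a positive judgment for the ordered pair (`N1`, `N2`) makes `entailed_by("N1", judgments)` return `["N2"]`. Keeping judgments as a flat list of ordered pairs mirrors how pairwise annotation is typically collected, and an index can be built over it if lookups become frequent.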
Training queries for Phase 2 Go/No Go Evaluation
- BAE training queries for unstructured text
- BAE-created queries
- LDC-created queries
- XML file for all Phase 2 queries
Training data DTD V1.5 (updated 4/2/2007)
Annotation guidelines for snippets, nuggets and entailment
- Distillation Training Data Annotation Guidelines V2.3 (updated 3/1/2007)
Timeline including data distribution schedule
GALE Phase 1 Evaluation
Training queries for Phase 1 Go/No Go Evaluation
Annotation guidelines for snippets, nuggets and nugget co-reference
- Distillation Training Data Annotation Guidelines V1.0 (updated 5/4/2006)
Timeline including data distribution schedule