GALE Distillation

LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task.

GALE Phase 3 Evaluation

For Phase 3, LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task. Training resources include:

  • Collections of raw source text that serve as input to training and test corpora for Distillation
  • Selection of appropriate documents for annotation
  • Manual annotation in English, which includes English source and English translation from Chinese and Arabic, in four genres: Newswire, Web data, Broadcast News and Broadcast Conversation

Annotation Guidelines


GALE Phase 2 Evaluation

For Phase 2, LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task. Training resources include:

  • collections of raw source text that serve as input to training and test corpora for Distillation
  • manual annotation in English, which includes English source and English translation from Chinese and Arabic, in four genres: Newswire, Web data, Broadcast News and Broadcast Conversation
  • queries that conform to designated query templates
  • annotation, which includes:
    • snippets of relevant text
    • formalized "nuggets" that express core facts extracted from those snippets
    • entailment judgments between pairs of nuggets
    • relevance judgments of system-extracted snippets
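
As an illustration only, the four annotation layers listed above could be modeled as linked records. The sketch below is hypothetical: every class name, field name and identifier is an assumption made for this example and is not taken from the LDC training data DTD or the annotation guidelines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    """A span of relevant text selected from a source document."""
    snippet_id: str   # hypothetical identifier
    doc_id: str       # hypothetical source document identifier
    text: str


@dataclass
class Nugget:
    """A core fact extracted from a snippet."""
    nugget_id: str    # hypothetical identifier
    snippet_id: str   # the snippet this nugget was extracted from
    fact: str


@dataclass
class EntailmentJudgment:
    """A judgment on whether one nugget entails another."""
    premise_id: str
    hypothesis_id: str
    entails: bool


@dataclass
class RelevanceJudgment:
    """A judgment on whether a system-extracted snippet is relevant to a query."""
    query_id: str
    snippet_id: str
    relevant: bool


def nuggets_for_snippet(nuggets: List[Nugget], snippet_id: str) -> List[Nugget]:
    """Return the nuggets that were extracted from the given snippet."""
    return [n for n in nuggets if n.snippet_id == snippet_id]
```

Keeping each layer as a separate record that refers to the others by identifier mirrors the way the list describes them: nuggets are derived from snippets, and judgments are made over pairs of nuggets or over system-extracted snippets.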

Training queries for Phase 2 Go/No Go Evaluation

Training data DTD V1.5 (updated 4/2/2007)

Annotation guidelines for snippets, nuggets and entailment

Timeline including data distribution schedule


GALE Phase 1 Evaluation


Training queries for Phase 1 Go/No Go Evaluation

Annotation guidelines for snippets, nuggets and nugget co-reference

Timeline including data distribution schedule