GALE Distillation

LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task.

GALE Phase 3 Evaluation

For Phase 3, LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task. Training resources include:

  • Collections of raw source text that serve as input to training and test corpora for Distillation
  • Selection of appropriate documents for annotation
  • Manual annotation in English, covering both English source text and English translations from Chinese and Arabic, in four genres: Newswire, Web data, Broadcast News and Broadcast Conversation

Annotation Guidelines


GALE Phase 2 Evaluation

For Phase 2, LDC creates training corpora, guidelines, annotation tools and related resources to support the Distillation task. Training resources include:

  • Collections of raw source text that serve as input to training and test corpora for Distillation
  • Queries that conform to designated query templates
  • Manual annotation in English, Chinese and Arabic for:
    • relevant documents
    • snippets of relevant text from those documents
    • formalized "nuggets" that express core facts extracted from those snippets
    • entailment judgments between pairs of nuggets
    • relevance judgments of system-extracted snippets
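
The annotation layers listed above build on one another: a query has relevant documents, snippets come from those documents, nuggets restate facts from snippets, and entailment judgments relate pairs of nuggets. A minimal Python sketch of that layering follows. It is illustrative only and does not reproduce the training data DTD or any LDC tool; every class name, field name and id format is hypothetical, all values are placeholders, and relevance judgments of system-extracted snippets are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Snippet:
    """A span of relevant text taken from a relevant document."""
    doc_id: str
    text: str


@dataclass
class Nugget:
    """A core fact restated from one snippet."""
    nugget_id: str
    fact: str
    source: Snippet


@dataclass
class Entailment:
    """A judgment on whether one nugget entails another."""
    premise_id: str
    hypothesis_id: str
    entails: bool


@dataclass
class QueryAnnotation:
    """All annotation layers collected for a single query (hypothetical model)."""
    query: str
    relevant_docs: List[str] = field(default_factory=list)
    snippets: List[Snippet] = field(default_factory=list)
    nuggets: List[Nugget] = field(default_factory=list)
    entailments: List[Entailment] = field(default_factory=list)

    def add_snippet(self, doc_id: str, text: str) -> Snippet:
        # A snippet may only come from a document already judged relevant.
        if doc_id not in self.relevant_docs:
            raise ValueError(f"{doc_id} is not marked relevant")
        snippet = Snippet(doc_id, text)
        self.snippets.append(snippet)
        return snippet

    def add_nugget(self, fact: str, source: Snippet) -> Nugget:
        # Nugget ids are assigned sequentially (N1, N2, ...); an assumed scheme.
        nugget = Nugget(f"N{len(self.nuggets) + 1}", fact, source)
        self.nuggets.append(nugget)
        return nugget


# Placeholder values only; none of this is real GALE data.
ann = QueryAnnotation(query="placeholder query", relevant_docs=["DOC-1"])
snip = ann.add_snippet("DOC-1", "placeholder snippet text")
n1 = ann.add_nugget("placeholder fact A", snip)
n2 = ann.add_nugget("placeholder fact B", snip)
ann.entailments.append(Entailment(n1.nugget_id, n2.nugget_id, entails=False))
```

Keeping a reference from each nugget back to its snippet, and from each snippet to its document id, means any extracted fact can be traced to the source text that supports it.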

Training queries for Phase 2 Go/No Go Evaluation

Training data DTD V1.5 (updated 4/2/2007)

Annotation guidelines for snippets, nuggets and entailment

Timeline including data distribution schedule


GALE Phase 1 Evaluation

Training queries for Phase 1 Go/No Go Evaluation

Annotation guidelines for snippets, nuggets and nugget co-reference

Timeline including data distribution schedule