Global Autonomous Language Exploitation (GALE)
The goal of the DARPA GALE program is to develop and apply computer software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages. Automatic processing “engines” will convert and distill the data, delivering pertinent, consolidated information in easy-to-understand forms to military personnel and monolingual English-speaking analysts in response to direct or implicit requests.
GALE will consist of three major engines: Transcription, Translation and Distillation. The output of each engine is English text. The transcription engine takes speech as input; the translation engine takes text. Engines will pass along pointers to relevant source language data that will be available to humans and downstream processes. The distillation engine integrates information of interest to its user from multiple sources and documents. Military personnel will interact with the distillation engine via interfaces that could include various forms of human-machine dialogue (not necessarily in natural language).
The Linguistic Data Consortium (LDC) supports the GALE program by providing linguistic resources (data, annotations, tools, standards and best practices) for system training, development and evaluation.
An introduction to GALE-related activities taking place at LDC.
Table of planned linguistic resources specifically targeted for the current phase of GALE.
List of existing LDC publications relevant to GALE.
- Task Specifications
Task specifications state the needs and assumptions for each task, describe the process for collecting and/or selecting data for that task, define the annotation and quality control procedures associated with the task, and describe the distribution formats for the resulting data. LDC's GALE tasks include:
- Broadcast data (news and talk shows)
- Web data (blogs and newsgroups)
- Quick Rich Transcription (QRTR)
- Careful Transcription (CTR)
- Sentence-based translation
- Full-document translation
- Word Alignment
- Arabic Word Alignment V4.0 (updated 04/08/2009)
- Chinese Word Alignment V4.0 (updated 04/16/2009)
- Chinese Tagging Guidelines V1.0 (DRAFT) (updated 04/10/2009)
- LDC's GALE Team
Contact information for core GALE staff at LDC