Global Autonomous Language Exploitation (GALE)

The goal of the DARPA GALE program is to develop and apply computer software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages. Automatic processing “engines” will convert and distill the data, delivering pertinent, consolidated information in easy-to-understand forms to military personnel and monolingual English-speaking analysts in response to direct or implicit requests.

GALE will consist of three major engines: Transcription, Translation and Distillation. The output of each engine is English text. The transcription engine takes speech as input; the translation engine takes text. Each engine will pass along pointers to the relevant source-language data, which will remain available to humans and downstream processes. The distillation engine integrates information of interest to its user from multiple sources and documents. Military personnel will interact with the distillation engine via interfaces that could include various forms of human-machine dialogue (not necessarily in natural language).
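The data flow described above can be illustrated with a minimal Python sketch. It is not GALE software: every name in it (SourcePointer, EnglishText, transcribe, translate, distill) is hypothetical, and the engine bodies are placeholders that only show how English output and source pointers might travel together from the transcription and translation engines into distillation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourcePointer:
    """Hypothetical reference back to a span of source-language data."""
    doc_id: str
    start: int
    end: int


@dataclass
class EnglishText:
    """English output of an engine, plus pointers to the source data behind it."""
    text: str
    sources: List[SourcePointer]


def transcribe(doc_id: str, speech: bytes) -> EnglishText:
    # Placeholder: a real engine would process the audio here.
    pointer = SourcePointer(doc_id, 0, len(speech))
    return EnglishText(f"[English text from speech in {doc_id}]", [pointer])


def translate(doc_id: str, text: str) -> EnglishText:
    # Placeholder: a real engine would translate the source text here.
    pointer = SourcePointer(doc_id, 0, len(text))
    return EnglishText(f"[English translation of {doc_id}]", [pointer])


def distill(request: str, items: List[EnglishText]) -> EnglishText:
    # Placeholder: merges the inputs and keeps every source pointer,
    # so a reader can trace the result back to the original data.
    merged = " ".join(item.text for item in items)
    pointers = [p for item in items for p in item.sources]
    return EnglishText(f"{request}: {merged}", pointers)


if __name__ == "__main__":
    spoken = transcribe("audio-1", b"\x00\x01\x02")
    written = translate("text-1", "bonjour")
    report = distill("example request", [spoken, written])
    print(report.text)
    print([p.doc_id for p in report.sources])
```

In this sketch the pointers are carried through unchanged, reflecting the requirement that source-language data stay reachable by humans and downstream processes.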

The Linguistic Data Consortium (LDC) supports the GALE program by providing linguistic resources -- data, annotations, tools, standards and best practices -- for system training, development and evaluation.

Sign up to receive GALE data announcements from LDC

Overview

An introduction to GALE-related activities taking place at LDC.

Data

Data Matrix
Table of planned linguistic resources specifically targeted for the current phase of GALE.

Catalog Query
List of existing LDC publications relevant to GALE.

Task Specifications

Task specifications state needs and assumptions for each task, describe the process for collecting and/or selecting data for that task, define annotation and quality control procedures associated with the task, and describe the distribution formats for the resulting data. LDC's GALE tasks include

LDC's GALE Team

Contact information for core GALE staff at LDC.