REFLEX / Low Density Languages / Surprise Language Website

This site contains links to information which we have collected for three projects:

  1. The Surprise Language exercise, held during June 2003 and sponsored by DARPA TIDES;
  2. The Low Density Languages project, supported by the National Science Foundation's
    Human Language and Communication program (IIS 99-82201-004) and by
    funding from the Intelligence Technology Innovation Center (ITIC) MT
    Research program; and
  3. The Less Commonly Taught Languages (LCTLs) portion of the REFLEX project.

In support of these projects, the LDC is conducting a survey of the 300 or so largest languages of the world, ranked by population.

Listed here are the languages for which we are conducting a more thorough survey of resources. Clicking on a language name will take you to a "card catalog" table containing summary information and pointers to web-based resources for that language, as discovered during intensive group surveys for each language. The "harvest page" link, when present, points to a more recently organized superset of the discovered resources:



Some of the tools useful for more than one language are listed on the Generic Resources page, along with documentation on file formats.

Further resources, including processed text (HTML stripping, tokenization, conversion to a standard encoding, etc.), lexicons, annotated text, and morphological parsers, are available by login (see below).