REFLEX/Low Density Languages/Surprise Language Website
This site contains links to information which we have collected for three projects:
In support of these projects, the LDC is conducting a survey of the world's 300 or so largest languages, ranked by number of speakers.
Here is a list of languages for which we are conducting a more thorough survey of resources. Clicking on a language name will take you to a "card catalog" table. That table contains summary information and pointers to web-based resources for the language, as discovered during intensive group surveys of each language. The "harvest page" link, when present, points to a more recently organized superset of the discovered resources:
Some of the tools useful for more than one language are listed on the Generic Resources page, along with documentation on file formats.
Further resources are available by login (see below). These include processed text (HTML stripping, tokenization, conversion to a standard encoding, etc.), lexicons, annotated text, and morphological parsers.