This page contains links to specifications used in LDC's work on
creating and sharing resources for Less Commonly Taught Languages.
Formats and Conventions
This specification defines the formats, data pipelines and naming
conventions for LCTL deliverables and major intermediaries used at LDC.
This specification is designed to support the outsourcing of LCTL translations.
LDC has also developed a set of specs for GALE Translation. See: The GALE Translation Webpage
These DTDs allow LCTL deliverables to be validated.
Text DTD v1.2
Lexicon DTD v1.3
Lexicon DTD v1.2
Lexicon DTD v1.1
Annotation DTD v1.1
Named Entity Annotation
Version 6.5 has
been proposed by LDC for all future work and is currently in use to
support non-English spec development and discussion regarding Second
Pass annotation.
Version 6.4.1 of the guidelines has been adapted to some of the LCTL Phase 1 languages. These documents are also currently in use by the LDC to support Second Pass annotation.
The LCTL Time Annotation addendum clarifies how time is annotated, using a simplified version of TIMEX2.
Version 6.4.1 was used for First Pass annotation in LCTL Phase 1. Time annotation was not undertaken during First Pass annotation.
Version 6.1 was used to annotate Bengali, Punjabi, Tagalog, Tamil, Tigrigna and Uzbek in 2004.
Human Translation
These specifications are used in creating translated text. Typically LDC acquires text in the LCTL, divides it into sentence-like units and then subcontracts the translations to outside translation agencies. Because translators are instructed to produce one or more sentences of translation for each sentence of source text, sentence alignments are known in advance and are one-to-one or, at worst, one-to-many. The specific guidelines given here were used for Chinese-English translation. However, the guidelines were written to be general; only the examples are language specific.
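Because every source segment receives one or more translated sentences, alignment reduces to keying translations by source segment. The sketch below illustrates that property. It assumes a hypothetical in-memory layout: source segments as a dict of segment ID to text, and translations as a dict of segment ID to a list of sentences. The names `AlignedSegment` and `align` are illustrative only. LDC's actual release format is defined in the formatting specifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlignedSegment:
    """One source segment paired with the translated sentence(s) produced for it.

    Hypothetical structure for illustration; not LDC's release format.
    """
    seg_id: str
    source: str
    translations: List[str] = field(default_factory=list)

    def alignment_type(self) -> str:
        """Return "1:1" for a single translated sentence, "1:n" for n > 1."""
        n = len(self.translations)
        if n == 0:
            # Every source segment must receive at least one translated sentence.
            raise ValueError(f"segment {self.seg_id} has no translation")
        return "1:1" if n == 1 else f"1:{n}"


def align(source_segments: Dict[str, str],
          translated_segments: Dict[str, List[str]]) -> List[AlignedSegment]:
    """Pair each source segment with its translations by shared segment ID.

    Because translators work segment by segment, no alignment search is
    needed: the segment ID alone links source and target.
    """
    aligned = []
    for seg_id, src in source_segments.items():
        seg = AlignedSegment(seg_id, src, list(translated_segments.get(seg_id, [])))
        seg.alignment_type()  # raises ValueError if the segment was left untranslated
        aligned.append(seg)
    # Translations keyed to unknown segment IDs indicate a delivery error.
    extra = set(translated_segments) - set(source_segments)
    if extra:
        raise ValueError(f"translations for unknown segments: {sorted(extra)}")
    return aligned
```

For example, if segment `s1` is translated as one sentence and `s2` as two, `align` yields alignments of type `1:1` and `1:2`. A segment with no translation is reported as an error rather than silently dropped.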
These specifications show how translated text is formatted for release: