LCTL
Specifications
This page contains links to specifications used in LDC's work on
creating and sharing resources for Less Commonly Taught Languages.
Formats and Conventions
This specification defines the formats, data pipelines and naming
conventions for LCTL deliverables and major intermediaries used at the
LDC.
LCTL-Formats-v2.5.pdf
This specification is designed to support the outsourcing of LCTL translations.
LCTL_Translation_Guidelines_v0.8.pdf
LDC has also developed a set of specs for GALE Translation. See: The GALE Translation Webpage
DTDs
These DTDs allow for the validation of LCTL Deliverables.
Text DTD v1.2
Lexicon DTD v1.3
Lexicon DTD v1.2
Lexicon DTD v1.1
Annotation DTD v1.1
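As a rough illustration of what validating a deliverable against a DTD involves, the sketch below uses the third-party lxml library (assumed to be installed). The DTD text and the element names LEXICON, ENTRY, WORD and GLOSS are hypothetical stand-ins invented for this example; they are not taken from the LCTL DTDs listed above, which should be loaded from their own files in real use.

```python
import io

from lxml import etree  # third-party library, assumed to be installed

# Hypothetical toy DTD for illustration only; NOT the real LCTL Lexicon DTD.
TOY_DTD = """
<!ELEMENT LEXICON (ENTRY+)>
<!ELEMENT ENTRY (WORD, GLOSS+)>
<!ELEMENT WORD (#PCDATA)>
<!ELEMENT GLOSS (#PCDATA)>
<!ATTLIST ENTRY id ID #REQUIRED>
"""


def validate(xml_text, dtd_text=TOY_DTD):
    """Return (is_valid, error_messages) for xml_text checked against dtd_text."""
    dtd = etree.DTD(io.StringIO(dtd_text))
    root = etree.fromstring(xml_text)
    is_valid = dtd.validate(root)
    errors = [entry.message for entry in dtd.error_log.filter_from_errors()]
    return is_valid, errors


# A document that follows the toy DTD.
good = '<LEXICON><ENTRY id="e1"><WORD>w</WORD><GLOSS>g</GLOSS></ENTRY></LEXICON>'
# The same document with the required GLOSS element missing.
bad = '<LEXICON><ENTRY id="e1"><WORD>w</WORD></ENTRY></LEXICON>'

good_ok, good_errors = validate(good)
bad_ok, bad_errors = validate(bad)
```

For a real deliverable, the same function could be given the contents of one of the DTD files above in place of the toy DTD.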
Named Entity Annotation
2005-06:
Version 6.5 has been proposed by LDC for all future work and is
currently in use to support non-English spec development and
discussion regarding Second Pass annotation.
SimpleNamedEntityGuidelinesV6.5.pdf
Version 6.4.1 of the guidelines has been adapted to some of the LCTL Phase 1 languages. These documents are also currently in use by the LDC to support Second Pass annotation.
SimpleNamedEntityGuidelinesV6.4-Thai.pdf
SimpleNamedEntityGuidelinesV6.4-Tamil.pdf
The LCTL Time Annotation addendum clarifies the annotation of time as
a simplified version of TIMEX2.
TimeAnnotationGuidelinesV1.0-Thai.pdf
Version 6.4.1 was used for First Pass annotation in LCTL Phase 1. Time annotation was not undertaken during First Pass annotation.
SimpleNamedEntityGuidelinesV6.4.pdf
2004-05:
Version 6.1 was used to annotate Bengali, Punjabi, Tagalog, Tamil, Tigrigna and Uzbek in 2004.
SimpleNamedEntityGuidelinesV6.1.pdf
Human Translation
These specifications are used in creating translated text. Typically LDC acquires text in the LCTL, divides it into sentence-like units and then subcontracts the translations to outside translation agencies. Because translators are instructed to produce one or more sentences of translation for each sentence of source text, sentence alignments are known and are one-to-one or, at worst, one-to-many. The specific guidelines given here were used for Chinese-English translation. However, the guidelines were written to be general; only the examples are language-specific.
MT2005_ChineseTranslationGuidelines.pdf
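The alignment property described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the AlignedUnit class, the check_alignment and alignment_type functions, and the sample sentences are hypothetical names and data made up for this example, not part of any LDC format or tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedUnit:
    """Hypothetical container: one source unit and its translation(s)."""
    source: str              # one sentence-like unit of source text
    translations: List[str]  # one or more translated sentences


def check_alignment(units):
    """Raise ValueError if any source unit has no translation at all."""
    for index, unit in enumerate(units):
        if not unit.translations:
            raise ValueError(f"source unit {index} has no translation")
    return units


def alignment_type(unit):
    """Classify a unit as one-to-one or one-to-many."""
    return "one-to-one" if len(unit.translations) == 1 else "one-to-many"


# Made-up sample data: the second source unit was rendered as two sentences.
units = check_alignment([
    AlignedUnit("source sentence 1", ["Translation 1."]),
    AlignedUnit("source sentence 2", ["Translation 2a.", "Translation 2b."]),
])
types = [alignment_type(unit) for unit in units]
```

Because each source unit keeps its own list of translations, no separate sentence-alignment step is needed after the translations come back.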
These specifications show how translated text is formatted for release:
MT2005_FinalDataFormatMT_Corpora.pdf














