LCTL


Specifications

This page contains links to specifications used in LDC's work on creating and sharing resources for Less Commonly Taught Languages.


Formats and Conventions

This specification defines the formats, data pipelines and naming conventions for LCTL deliverables and major intermediaries used at the LDC.

    LCTL-Formats-v2.5.pdf

This specification is designed to support the outsourcing of LCTL translations.

    LCTL_Translation_Guidelines_v0.8.pdf

LDC has also developed a set of specs for GALE Translation. See: The GALE Translation Webpage


DTDs

These DTDs allow LCTL deliverables to be validated.

    Text DTD v1.2


    Lexicon DTD v1.3

       Lexicon DTD v1.2

       Lexicon DTD v1.1


    Annotation DTD v1.1


Named Entity Annotation

2005-06:

Version 6.5 has been proposed by LDC for all future work and is currently in use to support non-English spec development and discussion regarding Second Pass annotation.

    SimpleNamedEntityGuidelinesV6.5.pdf


Version 6.4.1 of the guidelines has been adapted to some of the LCTL Phase 1 languages. These documents are also currently in use at LDC to support Second Pass annotation.

    SimpleNamedEntityGuidelinesV6.4-Thai.pdf

    SimpleNamedEntityGuidelinesV6.4-Tamil.pdf


The LCTL Time Annotation addendum clarifies the annotation of time as a simplified version of TIMEX2.

    TimeAnnotationGuidelinesV1.0.pdf

    TimeAnnotationGuidelinesV1.0-Thai.pdf


Version 6.4.1 was used for First Pass annotation in LCTL Phase 1. Time annotation was not undertaken during First Pass annotation.

    SimpleNamedEntityGuidelinesV6.4.pdf


2004-05:

Version 6.1 was used to annotate Bengali, Punjabi, Tagalog, Tamil, Tigrigna and Uzbek in 2004.


    SimpleNamedEntityGuidelinesV6.1.pdf


Human Translation

These specifications are used in creating translated text. Typically, LDC acquires text in the LCTL, divides it into sentence-like units and then subcontracts the translations to outside translation agencies. Because translators are instructed to produce one or more sentences of translation for each sentence of source text, sentence alignments are known and are one-to-one or, at worst, one-to-many. The specific guidelines given here were used for Chinese-English translation. However, the guidelines were written to be general; only the examples are language-specific.

    MT2005_ChineseTranslationGuidelines.pdf
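
The alignment property described above can be illustrated with a minimal Python sketch. It is not taken from the guidelines or from any LDC tool: the names AlignedSegment and align are hypothetical, and the sketch assumes that each source unit and its translation sentences are available as plain strings in matching order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedSegment:
    """One source sentence-like unit with its translation sentence(s).

    Hypothetical structure for illustration only; not an LDC format.
    """
    source: str
    translations: List[str]

    def kind(self) -> str:
        # One translation sentence gives a one-to-one alignment;
        # several give a one-to-many alignment.
        if not self.translations:
            raise ValueError("each source unit needs at least one translation sentence")
        return "one-to-one" if len(self.translations) == 1 else "one-to-many"


def align(sources: List[str], translations: List[List[str]]) -> List[AlignedSegment]:
    """Pair source units with their translations by position."""
    if len(sources) != len(translations):
        raise ValueError("unit counts differ, so the alignment is not known")
    segments = [AlignedSegment(s, t) for s, t in zip(sources, translations)]
    for segment in segments:
        segment.kind()  # raises if a unit has no translation
    return segments


if __name__ == "__main__":
    segments = align(
        ["source unit 1", "source unit 2"],
        [["Translation 1."], ["Translation 2a.", "Translation 2b."]],
    )
    print([segment.kind() for segment in segments])
    # prints ['one-to-one', 'one-to-many']
```

Because the translation is produced unit by unit, no automatic sentence aligner is needed in this sketch; the pairing follows directly from the order of the units.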

These specifications show how translated text is formatted for release:

    MT2005_FinalDataFormatMT_Corpora.pdf

