EARS Metadata Extraction
Overview
As part of the EARS program, LDC is creating annotated resources to support a metadata extraction (MDE) research evaluation. The goal of MDE is to enable technology that can take raw speech-to-text (STT) output and refine it into forms that are more useful to humans and to downstream automatic processes. In simple terms, this means creating automatic transcripts that are maximally readable. Readability can be improved in several ways: removing non-content words, such as filled pauses and discourse markers, from the text; removing sections of disfluent speech; and marking boundaries at natural breakpoints in the flow of speech so that each sentence or other meaningful unit of speech can be presented on a separate line of the resulting transcript. Natural capitalization, punctuation, and standardized spelling, together with sensible conventions for representing speaker turns and speaker identity, further contribute to a readable transcript. LDC has defined a SimpleMDE annotation task and is currently annotating English telephone and broadcast news data to provide training data for MDE. The links below provide additional information about the MDE annotation project.
Data and Timeline
Data annotated and annotation timeline
2003 Annotation Effort
2004 Annotation Effort
Please refer to the 2004 EARS Data Matrix for information about the 2004 MDE schedule and data content.
Annotation Guidelines
2004 version of the official SimpleMDE Annotation Guidelines --- Version 6.2 (.pdf)
2004 Web Guidelines
Created and used locally by LDC annotators to provide additional training and examples. These guidelines are a work in progress and should not be considered a replacement for the formal annotation guidelines.
2003 version of the official SimpleMDE Annotation Guidelines --- Version 5.0 (.pdf)
2003 Web Guidelines
The Web Guidelines used for the 2003 MDE annotation effort. They were created and used locally by LDC annotators to provide additional training and examples.
Administrative (password protected)
Information about work assignments, progress, and administrative details for LDC annotators
Tools
Download the latest free MDE Annotation Toolkit for Windows and *NIX
MIT-LL's MDE site (password protected)