SPINE2

SPeech in Noisy Environments - Phase 2

In-house Transcription Conventions



**Please note: These are the in-house transcription practices adopted by LDC transcribers and do not represent the final delivered product. Post-processing of transcription files will convert our in-house markup to standard Hub5 formatting.**

A description of the characteristics of Hub4 and Hub5 transcripts can be found here.



The table below summarizes the special tags employed in the transcription of SPINE2 audio. SPINE2 markup represents a subset of the markup used in Hub5. The initial SPINE2 markup was created to minimize transcriber effort and maximize consistency. Post-processing will be applied to the initial SPINE2 markup to bring it into line with Hub5 before the data is delivered.
 
Condition | SPINE - working conventions
--- | ---
Numerals | write out numerals in full: twenty-two
Acronyms pronounced as single letters | N/A
Acronyms pronounced as full words | N/A
Pronounced individual letters (spelled-out words) | ~S ~I ~M ~P ~S ~O ~N
Proper names/places | ^George ^Allen ^Burns
Partial words (speaker) | absolu-
Partial words (audio signal cuts out) | +please
Mispronounced words | word written in standard orthography: *probably
Idiosyncratic words | N/A
Speaker noise | first two letters of noise preceded by backslash: /ps
Background | N/A
Background noise (extended) | N/A
Semi-intelligible speech | ((text))
Unintelligible speech (token) | (( ))
Unintelligible speech (long span) | N/A
Repeated section of speech | N/A
Foreign language | N/A
Speaker aside | N/A
Overlapping speech (same channel) | N/A
Non-lexemes | use list of non-lexemes plus additional; no markup
Interjections | list of interjections; no markup
Punctuation | limited to question mark, period, comma
Capitalization | Standard English

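
As an illustration of how the tags above could be recognized automatically, the following Python sketch labels each token of a transcript line according to the markup in the table. It is not part of the LDC tooling: the function names (`classify`, `tag_line`) and the category labels are hypothetical, and the patterns are inferred only from the examples shown. In particular, the speaker-noise pattern follows the `/ps` example (a forward slash followed by two letters), which is an assumption.

```python
import re

# Double-parenthesis spans may contain spaces, so they are matched
# before ordinary whitespace-delimited tokens.
TOKEN_RE = re.compile(r"\(\(.*?\)\)|\S+")

# Hypothetical category labels; order matters, since the first match wins.
RULES = [
    ("unintelligible", re.compile(r"\(\(\s*\)\)")),      # (( ))
    ("semi_intelligible", re.compile(r"\(\(.+\)\)")),    # ((text))
    ("spelled_letter", re.compile(r"~[A-Za-z]")),        # ~S
    ("proper_name", re.compile(r"\^\S+")),               # ^George
    ("partial_signal_cut", re.compile(r"\+\S+")),        # +please
    ("mispronounced", re.compile(r"\*\S+")),             # *probably
    ("speaker_noise", re.compile(r"/[A-Za-z]{2}")),      # /ps (assumed form)
    ("partial_speaker", re.compile(r"\S+-")),            # absolu-
]


def classify(token):
    """Return a label for one token, ignoring trailing . , ? punctuation."""
    core = token.rstrip(".,?")
    if core == "":
        return "punctuation"
    for label, pattern in RULES:
        if pattern.fullmatch(core):
            return label
    return "word"


def tag_line(line):
    """Split a transcript line into tokens and label each one."""
    return [(tok, classify(tok)) for tok in TOKEN_RE.findall(line)]


if __name__ == "__main__":
    for tok, label in tag_line("^George said absolu- /ps ((text)) (( )) twenty-two."):
        print(f"{tok}\t{label}")
```

A hyphenated numeral such as twenty-two is labeled as an ordinary word, because only a trailing hyphen marks a partial word in this sketch.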
Questions/comments?

strassel@ldc.upenn.edu, nmartey@ldc.upenn.edu
Last modified: Fri, July 6, 2001