SPINE

SPeech in Noisy Environments

In-house Transcription Conventions



**Please note: These are the in-house transcription practices adopted by LDC transcribers and do not represent the final delivered product. Post-processing of transcription files will convert our in-house markup to standard Hub5 formatting.**
 

A description of the characteristics of Hub4 and Hub5 transcripts can be found here.



The table below summarizes the special tags used in transcribing SPINE audio and compares them with the tags used in transcribing telephone speech (Hub5). SPINE markup is a subset of the markup used in Hub5. The initial (working) SPINE markup was designed to minimize transcriber effort and maximize consistency. Before the data is delivered, post-processing will bring the working markup into line with Hub5. The final column shows the markup that will appear in the delivered SPINE data.
 
 
| Condition | Hub5 (Telephone Speech) | SPINE - working conventions | SPINE - final conventions (as delivered) |
|---|---|---|---|
| Numerals | write out numerals in full: twenty-two | write out numerals in full: twenty-two | write out numerals in full: twenty-two |
| Acronyms pronounced as single letters | ~VCR | N/A | N/A |
| Acronyms pronounced as full words | @NATO | N/A | N/A |
| Pronounced individual letters (spelled-out words) | ~S ~I ~M ~P ~S ~O ~N | ~S ~I ~M ~P ~S ~O ~N | ~S ~I ~M ~P ~S ~O ~N |
| Proper names/places | ^Homer | ^George ^Allen ^Burns | ^George ^Allen ^Burns |
| Partial words (speaker) | absolu- | absolu- | absolu- |
| Partial words (audio signal cuts out) | N/A | +please | #please |
| Mispronounced words | word written in standard orthography: +probably | word written in standard orthography: *probably | word written in standard orthography: +probably |
| Idiosyncratic words | *poodleish | N/A | N/A |
| Speaker noise | {noise written within curly brackets} | first two letters of noise preceded by forward slash: /ps | {noise written within curly brackets} |
| Background noise | [text] | N/A | N/A |
| Background noise (extended) | [text/] [/text] | N/A | N/A |
| Semi-intelligible speech | ((text)) | ((text)) | ((text)) |
| Unintelligible speech (token) | (( )) | (( )) | (( )) |
| Unintelligible speech (long span) | [[skip]] | N/A | N/A |
| Repeated section of speech | [[repeat]] | N/A | N/A |
| Foreign language | <language text> | N/A | N/A |
| Speaker aside | <as> text </as> | N/A | N/A |
| Overlapping speech (same channel) | <ov> text </ov> | N/A | N/A |
| Non-lexemes | list of non-lexemes; marked with % | use Hub5 list of non-lexemes plus additional; no markup | use Hub5 list of non-lexemes plus additional; no markup |
| Interjections | list of interjections; no markup | list of interjections; no markup | list of interjections; no markup |
| Punctuation | limited to . , ? | variable | limited to . , ? |
| Capitalization | Standard English | Standard English | Standard English |
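
As an illustration only, the differences between the working and final columns of the table above could be applied by a small script along the following lines. This is a rough sketch, not LDC's actual post-processing tool. The function name `to_final`, the `noise_names` argument, and the sample noise label are all hypothetical, since the table does not list the full noise names behind the two-letter codes. The sketch follows the table's own example (`/ps`) for the form of the working speaker-noise tag.

```python
import re

# Prefix rewrites read off the table: working marker -> delivered marker.
PREFIX_MAP = {
    "+": "#",  # partial word where the audio signal cuts out
    "*": "+",  # mispronounced word
}

# A single pass over the line, so a '*' rewritten to '+' is never
# rewritten a second time to '#'.
MARKUP = re.compile(
    r"(?<!\S)([+*])(?=\w)"     # token-initial '+' or '*' before a word
    r"|(?<!\S)/(\w{2})\b"      # token-initial two-letter noise code, e.g. /ps
)


def to_final(line, noise_names=None):
    """Convert one line of working SPINE markup to the delivered form.

    noise_names is a caller-supplied (hypothetical) mapping from two-letter
    noise codes to the full noise label; unknown codes are left untouched.
    """
    noise_names = noise_names or {}

    def convert(match):
        if match.group(1):
            return PREFIX_MAP[match.group(1)]
        code = match.group(2)
        if code in noise_names:
            return "{" + noise_names[code] + "}"
        return match.group(0)

    return MARKUP.sub(convert, line)


# The noise label below is a placeholder, not taken from the conventions.
print(to_final("*probably +please copy ^George /ps", {"ps": "noise"}))
```

Doing both prefix rewrites in one regular-expression pass avoids an ordering problem: two sequential replacements would have to convert `+` to `#` before converting `*` to `+`, or mispronounced words would end up marked as audio cut-outs. Tags that are identical in both columns, such as `^George`, `absolu-`, and `((text))`, pass through unchanged.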

 
 
 
Questions/comments?
krennert@ldc.upenn.edu

Last modified: Thu May 18 15:02:39 2000