In-house Transcription Conventions
A description of the characteristics of Hub4 and Hub5 transcripts can be found here.
| Phenomenon | In-house convention |
|---|---|
| Numerals | write out numerals in full: twenty-two |
| Acronyms pronounced as single letters | N/A |
| Acronyms pronounced as full words | N/A |
| Pronounced individual letters (spelled-out words) | ~S ~I ~M ~P ~S ~O ~N |
| Proper names/places | ^George ^Allen ^Burns |
| Partial words (speaker) | absolu- |
| Partial words (audio signal cuts out) | +please |
| Mispronounced words | word written in standard orthography, preceded by an asterisk: *probably |
| Idiosyncratic words | N/A |
| Speaker noise | first two letters of the noise, preceded by a slash: /ps |
| Background noise | N/A |
| Background noise (extended) | N/A |
| Semi-intelligible speech | ((text)) |
| Unintelligible speech (token) | (( )) |
| Unintelligible speech (long span) | N/A |
| Repeated section of speech | N/A |
| Foreign language | N/A |
| Speaker aside | N/A |
| Overlapping speech (same channel) | N/A |
| Non-lexemes | use the list of non-lexemes plus additional items; no markup |
| Interjections | use the list of interjections; no markup |
| Punctuation | limited to question mark, period, comma |
| Capitalization | Standard English |
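The markup in the table is designed to be machine-readable, so each convention maps onto a simple token pattern. The following is a minimal sketch of a token classifier for these conventions. It is not LDC tooling. The label names, the regular expressions, and the tokenizer are illustrative assumptions. It also assumes that speaker-noise codes are exactly two lowercase letters, that `(( ... ))` spans count as single tokens, and that punctuation (question mark, period, comma) appears only at the end of a token.

```python
import re
from typing import List, Tuple

# Hypothetical labels, one per markup convention in the table.
# Patterns are tried in order, so "(( ))" is checked before "((text))".
PATTERNS = [
    ("unintelligible", re.compile(r"^\(\(\s*\)\)$")),      # (( ))
    ("semi_intelligible", re.compile(r"^\(\(.+\)\)$")),    # ((text))
    ("spelled_letter", re.compile(r"^~[A-Za-z]$")),        # ~S
    ("proper_name", re.compile(r"^\^\S+$")),               # ^George
    ("cut_off", re.compile(r"^\+\S+$")),                   # +please
    ("mispronounced", re.compile(r"^\*\S+$")),             # *probably
    ("speaker_noise", re.compile(r"^/[a-z]{2}$")),         # /ps (assumed two lowercase letters)
    ("partial_word", re.compile(r"^\S+-$")),               # absolu-
]

# Keep a whole "(( ... ))" span together; otherwise split on whitespace.
TOKEN_RE = re.compile(r"\(\(.*?\)\)|\S+")


def tokenize(line: str) -> List[str]:
    """Split a transcript line into tokens, keeping (( ... )) spans intact."""
    return TOKEN_RE.findall(line)


def classify(token: str) -> str:
    """Return the convention label for one token, or 'word' if none applies."""
    bare = token.rstrip(".,?")  # only ?, . and , are allowed as punctuation
    for label, pattern in PATTERNS:
        if pattern.match(bare):
            return label
    return "word"


def annotate(line: str) -> List[Tuple[str, str]]:
    """Pair every token in a transcript line with its label."""
    return [(tok, classify(tok)) for tok in tokenize(line)]


if __name__ == "__main__":
    for tok, label in annotate("^George said absolu- /ps ((maybe)) ~S ~I ~M, (( )) *probably +please."):
        print(f"{tok!r:14} {label}")
```

Hyphenated numerals such as `twenty-two` fall through to `word`, because the partial-word pattern matches only a token that ends in a hyphen.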