Guidelines for Hub4/TDT Transcription 2000
At the Unix prompt, enter
bc-type tdt train
Double-click or highlight the desired file, and paste it in at the prompt.
The xwave window will not appear until the first attempt is made to
listen to a segment. Use the "Again" key to start. The transcription
shortcuts show the keyboard commands for listening, scrolling and changing
tags and timestamps in the editor.
word-for-word transcription, using standard orthography and standard capitalization
Remove all text that is not present in the audio - copyright information,
use end-of-sentence punctuation (periods, question marks), commas, and
hyphens for hyphenated words, but no other punctuation (please remove all
other extraneous marks, e.g. quotation marks, double hyphens)
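The punctuation rule above can be checked mechanically. The following is a minimal sketch, not part of the transcription tools: the function names are hypothetical, and it assumes that only quotation marks and double hyphens need flagging, since those are the two extraneous marks named above.

```python
import re

# Assumption: flag only the marks named in the guideline (quotation marks and
# runs of two or more hyphens). Single hyphens are left alone because they are
# allowed in hyphenated and partial words.
EXTRANEOUS = re.compile(r'--+|["“”]')


def find_extraneous_marks(text):
    """Return the extraneous marks found in a line of transcript text."""
    return EXTRANEOUS.findall(text)


def remove_extraneous_marks(text):
    """Replace extraneous marks with a space, then collapse repeated spaces."""
    return re.sub(r"\s+", " ", EXTRANEOUS.sub(" ", text)).strip()
```

For example, remove_extraneous_marks('He said "no" -- twice.') gives 'He said no twice.', while a hyphenated word such as well-known is left unchanged.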
every speaker turn has to be indicated & timestamped, e.g.
start of <section type=report> all
files will start with this
start of (non-initial) turn within
turn interval breakpoints
end of turn within section, followed
by a non-speech region
start of overlap region (speaker one
is interrupted by speaker two)
end of overlap where speaker one stops
and speaker two continues
end of overlap where speaker two stops
and speaker one continues
for interruptions, use <o> & timestamp
to indicate beginning of overlapping speech region - overlapping speech
is determined by overlapping word boundaries, rather than the exact point
in the waveform, which may sever a word in two.
The [[NS]] tag can be used when there is an area within a turn that has
no speech within it, e.g. a musical interruption, or extended background
about the same thing I
oceanography is new exploration
and we're not
that the first speaker has now stopped, and the second speaker has continued
to speak. If BOTH speakers STOP at the same point in time, the next speaker
turn indicated by a if overlap
ends with non-speech section (silence, music, etc.), mark beginning of
non-speech section with <e> & timestamp
<b 123.456 >
indicate disfluencies by using a hyphen to mark partial words; transcribe
pause fillers, e.g.
The crowd was furious.
Calm was soon restored
by the arrival of the riot police.
jus- just waiting for that uh tha- that report to to come in.
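A quick way to review the partial words marked in a transcript line is sketched below. It is an illustration only: the function name is hypothetical, and it assumes that a partial word is any run of letters ending in a hyphen that is followed by whitespace or the end of the line.

```python
import re

# Assumption: a partial word is a run of letters (or apostrophes) ending in a
# hyphen that is followed by whitespace or the end of the line. Hyphenated
# words such as "well-known" do not match, because the hyphen is followed by
# a letter.
PARTIAL_WORD = re.compile(r"[A-Za-z']+-(?=\s|$)")


def partial_words(line):
    """Return the partial words found in one line of transcript text."""
    return PARTIAL_WORD.findall(line)
```

Applied to the example line above, partial_words returns the two partial words jus- and tha-.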
transcribe standard English contractions as they're spoken: they're, won't,
isn't, don't, etc.
for non-standard contractions like "gonna" and "wanna" spell out the entire
word: going to, want to.
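The non-standard contraction rule can be applied as a simple substitution. This is a minimal sketch with a hypothetical function name; the table contains only the two forms named above, and any further entries would be assumptions.

```python
import re

# Only the two non-standard forms named in the guideline are listed here.
EXPANSIONS = {"gonna": "going to", "wanna": "want to"}
PATTERN = re.compile(r"\b(" + "|".join(EXPANSIONS) + r")\b", re.IGNORECASE)


def expand_nonstandard(text):
    """Spell out non-standard contractions, keeping an initial capital."""
    def repl(match):
        word = match.group(0)
        expanded = EXPANSIONS[word.lower()]
        return expanded.capitalize() if word[0].isupper() else expanded

    return PATTERN.sub(repl, text)
```

Standard contractions such as they're and don't are not in the table, so they pass through unchanged.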
identify extended non-speech sections (music, dead air, sound effects)
with <e> and timestamp at beginning of
section, followed by <t> and timestamp
when speech resumes, e.g.
148.57> Sounds of gunfire filled the air.
NOTE Several speakers
170.89> That sound greeted early morning visitors.
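Because every tag carries a timestamp, a transcript can be checked for times that run backwards. The sketch below is an illustration under stated assumptions, not part of the transcription tools: it assumes tags take the form of a single letter (t, b, e or o) followed by a time in seconds inside angle brackets, and that timestamps should never decrease through a file. The function names and the sample times in the usage note are hypothetical.

```python
import re

# Assumption: a tag is one of the letters t, b, e, o followed by a time in
# seconds, e.g. <t 223.456> or <b 123.456 > (a space before ">" is tolerated).
TAG = re.compile(r"<([tbeo])\s+(\d+(?:\.\d+)?)\s*>")


def read_tags(text):
    """Return (tag letter, time in seconds) pairs in order of appearance."""
    return [(m.group(1), float(m.group(2))) for m in TAG.finditer(text)]


def out_of_order(tags):
    """Return the tags whose time is earlier than the preceding tag's time."""
    return [cur for prev, cur in zip(tags, tags[1:]) if cur[1] < prev[1]]
```

With made-up times, read_tags("<t 10.0> one <b 9.5> two") yields the two tags, and out_of_order flags the second because 9.5 precedes 10.0.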
In situations when you have several people speaking at once, and it
is very difficult to make them out, insert an <e> tag at the start of
the confused section. Then start the new turn at the next available clear
<t 223.456> <<male>>
use (( )) to indicate words or passages that are hard to understand or
difficult to transcribe accurately
<t 232.563> <<female>>
spell out all numerical sequences
one oh seven
spell out all titles like "doctor" (instead of Dr.) and "junior" (instead
of Jr.), EXCEPT for Mr., Ms. and Mrs.
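The title rule can be sketched as a substitution that leaves Mr., Ms. and Mrs. alone. The function name is hypothetical, and the table holds only the two abbreviations named above; the lowercase spellings follow the guideline's own wording, so any capitalization is left to the transcriber.

```python
import re

# Only the two abbreviations named in the guideline are expanded.
# Mr., Ms. and Mrs. are deliberately absent, so they are left as written.
TITLES = {"Dr.": "doctor", "Jr.": "junior"}
PATTERN = re.compile(r"\b(Dr|Jr)\.")


def spell_out_titles(text):
    """Replace abbreviated titles with their spelled-out forms."""
    # Note: the abbreviation's period is consumed, so a sentence-final
    # "Jr." needs its end-of-sentence period restored by hand.
    return PATTERN.sub(lambda m: TITLES[m.group(0)], text)
```

For instance, "Dr. King and Mr. Smith Jr. arrived" becomes "doctor King and Mr. Smith junior arrived".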
indicate proper names with a ^ (this is being
done so that we can standardize the spelling of proper names after transcription;
tags will be stripped out before delivery)
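The two later steps mentioned above, collecting marked names for spelling review and stripping the markers before delivery, might look like the sketch below. It is an illustration only: the function names are hypothetical, and it assumes the ^ is written directly before the name, which the guideline does not state.

```python
import re
from collections import Counter

# Assumption: the caret is placed immediately before the proper name,
# e.g. ^Clinton. The names used in the usage note are illustrative only.
MARKED_NAME = re.compile(r"\^([A-Za-z][A-Za-z'\-]*)")


def marked_names(text):
    """Count each marked proper name, to help spot spelling variants."""
    return Counter(MARKED_NAME.findall(text))


def strip_markers(text):
    """Remove the ^ markers, as would be done before delivery."""
    return text.replace("^", "")
```

Counting the names first makes variant spellings of the same name easy to spot before the markers are removed.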
acronyms and spoken strings of letters will be indicated with ~,
we will use the following set of non-lexemes:
other "special" words like interjections and acronyms which were specially
tagged in some versions of hub4 transcriptions will be transcribed as normal
words, i.e. not specially marked. for instance:
uh-huh, okay, gee, hey, AIDS, NAFTA