Your goal is to
provide an accurate, verbatim (word-for-word) transcript of the entire
broadcast. The transcript will be time-aligned with the audio file.
The Transcription Process
1) Segmentation
The segmentation
process creates initial timestamps for the audio file. Timestamps
indicate when different things are happening in the audio, and so allow
us to align the transcript with the corresponding audio file. Timestamps
also make transcription of the audio easier by allowing the transcriber
to listen to small chunks of segmented speech at a time.
Segment boundaries, or timestamps, must occur at regular intervals within each audio file. At a minimum, they must mark every section boundary and every speaker turn boundary.
Some things to keep in mind while doing
segmentation:
Section and turn boundaries are easy to
detect, and timestamps must be inserted at these points.
Some things to consider when inserting
breakpoints:
End of a section
End of the file
Overlapping speech
Multiple speakers
<t 223.456> <<male, speaker_12>>
Speakers start
simultaneously
<t 1123.176> <<male, Jacques_Cousteau>>
Extended periods
of non-speech
There are three types of section boundaries:
The table below summarizes the segment
labels you will use.
If a speaker is not identified by name
within a recording, a unique numerical index is used. Unnamed speakers
are divided into Reporter and Speaker. Reporter
is used for news anchors, interviewers, or reporters on the scene of a
story. Speaker refers to anyone else who is not identified by name.
The numerical indices of Reporter and Speaker IDs cannot overlap;
each successive anonymous speaker has a unique number, regardless
of the category the speaker is assigned to. For example, the following
sequence is entirely possible:
reporter_1
Native and non-native
speakers
Examples of speaker identifications
<sr 1.402> <<male, Leon_Harris>>
4) Transcription
Transcription
Conventions
Capitalization
Orthography and
spelling
Punctuation
DO NOT use quotation marks, exclamation
marks, colons, semicolons, dashes or ellipses in transcribing. If
you encounter these symbols in an existing transcript, you must remove
them.
Abbreviations
Mr. Brown
However, when they are used in any other
context, write them out in full.
I went to the junior league game.
Hyphenated words
and compounds
an overly complicated analysis, not an overly-complicated analysis
However, in some cases, a hyphen
is required:
anti-nuclear protests, not anti nuclear protests
Compounds can be tricky. When in
doubt, consult a dictionary and talk to your team leader.
Numbers
twenty-two
Contractions and apostrophe -s
Note: Avoid the common mistakes of confusing possessive its with the contraction it's (it is), possessive your with the contraction you're (you are), and possessive their with they're (they are) and there.
Transcribe exactly what you hear using
standard orthography. If a speaker uses a contraction, transcribe
the word as contracted: they're, won't, isn't, don't and so on. If
the speaker uses a complete form, transcribe what you hear: they are, is
not and so on.
For non-standard contractions like "gonna"
and "wanna", spell out the entire word: going to, want to. If you
are unsure about whether a contraction is standard or non-standard, talk
to your team leader.
Disfluent speech
Filled Pauses/Hesitation
Sounds
If you believe a speaker uses a word or
sound as a filled pause or hesitation marker, and the word does not appear
on this list, let your team leader know. All filled pauses are indicated
with a % sign preceding the word.
Partial Words
Mispronounced
Words
Hard-to-understand
Words
If you have some idea of the speaker's
words but aren't entirely sure, type what you think you hear and surround
the stretch of uncertain transcription with double parentheses:
Idiosyncratic
words
Proper nouns
Interjections
Summary of special
symbols
Some general considerations
Don't try to imitate a speaker's non-standard
pronunciation. Use standard spelling for non-standard pronunciations.
Obviously mispronounced words (as opposed to non-standard pronunciations)
should be marked with the special + symbol. When in doubt, consult
your team leader.
-----------------------------------
Of all the sections, you should transcribe only those that are reports, "sr" (section=report), including weather, or filler material, "sf" (section=filler).
There are several things which should not be transcribed:
If you skip any portion of the broadcast, you should provide a timestamp for the skipped portion, for example:
<sn 323.08>
Furthermore, if the material is marked as "sn" because it is a repeat of material found elsewhere in the transcripts, add the notation [[repeat]] after the "sn":
<sn 323.08> [[repeat]]
For sections marked as <sn>, you should not provide any transcription.
If you have any questions
about this, please consult your language leader.
Overlapping
speech
i) Use Ctrl-c s to "send" the segment to the waveform window. Then find the section that cannot be understood and isolate it as you would if you were placing breakpoints.
Punctuation should be inside ((...)).
Breakpoints, or timestamps within
a speaker's turn, will typically coincide with breath groups or pauses
in the speech, and may coincide with the ends of sentences or phrases.
Breakpoints should typically be inserted every 3-8 seconds.
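The 3-8 second guideline is easy to check mechanically once the timestamps of a turn are known. The sketch below is a hypothetical helper, not part of the LDC tools; `breakpoint_gaps` and its `lo`/`hi` defaults are assumptions that simply restate the guideline. It assumes the turn start and its breakpoints are already parsed into sorted floats, and it reports gaps on either side of the range, since breakpoints should still follow breath groups and pauses rather than a strict clock.

```python
from typing import List, Tuple


def breakpoint_gaps(
    times: List[float], lo: float = 3.0, hi: float = 8.0
) -> List[Tuple[float, float, float]]:
    """Return (start, end, gap) for each pair of consecutive timestamps
    whose spacing falls outside the [lo, hi] range, in seconds.

    `times` is assumed to hold a turn's start time followed by its <b>
    breakpoints, already parsed to floats and sorted in ascending order.
    """
    flagged = []
    for start, end in zip(times, times[1:]):
        gap = round(end - start, 3)  # timestamps carry millisecond precision
        if gap < lo or gap > hi:
            flagged.append((start, end, gap))
    return flagged
```

For instance, `breakpoint_gaps([25.907, 31.105, 42.0])` flags only the stretch from 31.105 to 42.0, which runs nearly 11 seconds without a breakpoint.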
Beginning of a section
Because section boundaries are presumed
to begin with a speaker turn, it is not necessary to insert a turn boundary
directly after a section boundary. For instance:
<sf 21.232> <<male, Lou_Waters>>
End of a turn
The last great explorer ^Jacques ^Cousteau has died in ^Paris at age eighty-seven.
<t 25.907> <<female, Natalie_Allen>>
{breath} Part of Early Prime is being preempted so that for the next half hour we can remember one of the giants
<b 31.105>
of the twentieth century. Hello, I'm ^Natalie ^Allen.
If the end of one speaker's turn is directly
followed by the start of another speaker's turn, there is no need to specifically
label or timestamp the end of the first speaker's turn. If a speaker's
turn is followed by a period of non-speech (music, sound effects or silence),
then you must explicitly timestamp and label the end point of the speaker's
turn with <e>.
If the end of a section is directly followed
by the start of another section, there is no need to specifically label
or timestamp the end of the first section. If the section is followed
by a period of non-speech (music, sound effects or silence), then you must
explicitly timestamp and label the end point of the section with <e>.
Each file must end with a final timestamp,
indicating where the audio recording for that program concludes.
This timestamp should be labeled with <e> to indicate end.
A special subclass
of breakpoints marks the beginning and end points of overlapped speech;
that is, periods of the recording where there are multiple speakers talking
at once. Use the notation <o> to mark the beginning of the overlapping
speech section. The <e1> or <e2> label indicates the end of
the period of overlap. <e1> is used when speaker one stops talking
while speaker two continues; <e2> is used when speaker two stops talking
while speaker one continues. If both speakers stop talking at the
same time and a third person begins talking, a new <t> turn label or
<sx> section label should be used. Instructions for transcribing
overlapping speech appear below.
In situations when you have several people
speaking at once and it is very difficult to make them out, insert an <e>
tag at the start of the difficult section. Then start the new turn <t>
at the next region of clear, discernible speech.
<b 225.678>
<e 230.302>
((region of multiple speakers, impossible to transcribe))
<t 232.563> <<female, speaker_13>>
When speakers start talking simultaneously,
create start times for the speakers that are about one tenth of a second
(or less) apart, and use the <o> overlap tag for the second speaker's
turn. For instance,
You know,
<o 1123.276> <<female, speaker_2>>
SPEAKER1:I understand {laugh}
SPEAKER2:well,
<e1 1124.256>
oceanography is new exploration and we're not...
For an extended (more than 5 seconds)
period of silence, music or other non-speech, insert an <e> tag at the
start of the non-speech section. Then start the new turn <t> at the
next region of speech. For example,
<t 223.456> <<male, speaker_1>>
<b 225.678>
<e 230.302>
((region of silence, sound effects or music))
<t 236.563> <<female, speaker_3>>
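The more-than-5-seconds rule can be sketched the same way. In this hypothetical helper, `end_tag_gaps` takes speech intervals you have already identified (by listening, or from some external detector; the input format is an assumption) and returns where each <e> tag and the following <t> tag would go.

```python
from typing import List, Tuple


def end_tag_gaps(
    speech: List[Tuple[float, float]], min_gap: float = 5.0
) -> List[Tuple[float, float]]:
    """Find non-speech stretches longer than `min_gap` seconds.

    `speech` is a sorted list of (start, end) speech intervals in seconds.
    Each returned pair is (time for the <e> tag, time for the next <t> tag).
    """
    gaps = []
    for (_, end), (next_start, _) in zip(speech, speech[1:]):
        if next_start - end > min_gap:
            gaps.append((end, next_start))
    return gaps
```

With speech ending at 230.302 and resuming at 236.563, as in the example above (the 240.0 end time here is made up), `end_tag_gaps([(223.456, 230.302), (236.563, 240.0)])` returns `[(230.302, 236.563)]`, matching <e 230.302> and <t 236.563>.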
2) Section
Labels
In addition to providing timestamps, you must
also label each section, turn or breakpoint with the appropriate label.
These three section types are defined in the table of segment labels.
3) Speaker
Identification
| Label | Marks |
|---|---|
| <sr> | start of <section type=report> (news story section) |
| <sf> | start of <section type=filler> (filler section) |
| <sn> | start of <section type=non-news> (non-news section: commercials, etc.) |
| <t> | start of a (non-initial) speaker turn within a section |
| <b> | breakpoint within a turn |
| <e> | end of a turn within a section, followed by a non-speech region |
| <o> | start of an overlap region (speaker one is interrupted by speaker two) |
| <e1> | end of overlap where speaker one stops and speaker two continues |
| <e2> | end of overlap where speaker two stops and speaker one continues |
In addition to identifying segment boundaries
and timestamping them, you must also identify all of the speakers within
a broadcast. If you are unable to determine the name of a speaker,
you must assign that speaker a unique identification, and use the same
speaker ID throughout the transcript file. You must also identify
each speaker's type as one of the following:
Female
Male
Child
Other - used for speakers in unison, altered voices, etc.
Names and Identifiers
Whenever possible, include the proper
name of the speaker. Examples of proper names include Jacques_Cousteau,
William_Cohen, and Madeleine_Albright. You must use the
same spelling of proper names within and across all broadcast files.
reporter_2
speaker_3
speaker_4
reporter_2 (assuming it is the same voice as the previous reporter_2)
reporter_6 (a new reporter distinct from the two above)
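The numbering rule amounts to a single counter shared by both anonymous categories. The `SpeakerIds` class below is an illustrative sketch only; the voice keys passed to it (such as "anchor" or "field") are hypothetical labels, because in practice deciding that a voice has returned is a judgment you make by ear.

```python
from typing import Dict


class SpeakerIds:
    """Hand out anonymous IDs (reporter_N, speaker_N) from one shared counter."""

    def __init__(self) -> None:
        self._next = 1
        self._assigned: Dict[str, str] = {}  # your label for a voice -> its ID

    def get(self, voice: str, category: str) -> str:
        """Return the ID for `voice`, creating a new one the first time it is heard.

        A returning voice keeps the ID it was first given; a new voice gets
        the next number regardless of whether it is a reporter or a speaker.
        """
        if category not in ("reporter", "speaker"):
            raise ValueError(f"unknown category: {category!r}")
        if voice not in self._assigned:
            self._assigned[voice] = f"{category}_{self._next}"
            self._next += 1
        return self._assigned[voice]


ids = SpeakerIds()
print([ids.get(v, c) for v, c in [
    ("anchor", "reporter"), ("field", "reporter"),
    ("witness", "speaker"), ("official", "speaker"), ("field", "reporter"),
]])  # ['reporter_1', 'reporter_2', 'speaker_3', 'speaker_4', 'reporter_2']
```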
In addition to indicating speaker type
and name/ID, you must also indicate when a speaker is a non-native speaker.
In English broadcast news,
native is defined as a speaker of any
North American English dialect. As native is the default, you do
not need to mark this explicitly. Non-native is used for speakers
of other dialects of English, including British English or Indian English;
non-native is also used to indicate people who are not native English speakers and have
a discernible foreign accent.
<sr 158.244> <<female, Joie_Chen>>
<t 196.813> <<male, speaker_1>>
<t 498.314> <<female, non-native, speaker_3>>
<t 567.215> <<male, altered, speaker_4>>
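Because the labels follow a regular pattern, a small parser can catch malformed lines such as a stray third > or single angle brackets around a speaker block. The regular expression below was written from the examples in these guidelines rather than from a formal grammar, so treat `parse_label` and the `Label` fields as assumptions. It reads the first field of a <<...>> block as the speaker type, the last as the name or ID, and anything in between as attributes such as non-native or altered.

```python
import re
from typing import NamedTuple, Optional, Tuple

# A label line: tag and timestamp, optionally followed by a <<...>> speaker block.
# Written from the examples in these guidelines, not from a formal grammar.
LABEL_RE = re.compile(
    r"^<(sr|sf|sn|e1|e2|t|b|e|o) (\d+(?:\.\d+)?)>(?:\s+<<([^<>]+)>>)?$"
)


class Label(NamedTuple):
    tag: str
    time: float
    speaker_type: Optional[str]   # female, male, child or other
    attributes: Tuple[str, ...]   # e.g. ("non-native",) or ("altered",)
    name: Optional[str]           # proper name or anonymous ID


def parse_label(line: str) -> Label:
    match = LABEL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"malformed label line: {line!r}")
    tag, time, speaker = match.group(1), float(match.group(2)), match.group(3)
    if speaker is None:
        return Label(tag, time, None, (), None)
    fields = [field.strip() for field in speaker.split(",")]
    if len(fields) == 1:  # e.g. <<male>> with no name or ID
        return Label(tag, time, fields[0], (), None)
    return Label(tag, time, fields[0], tuple(fields[1:-1]), fields[-1])


print(parse_label("<t 498.314> <<female, non-native, speaker_3>>"))
# Label(tag='t', time=498.314, speaker_type='female', attributes=('non-native',), name='speaker_3')
```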
Once a file has
been fully segmented and the speakers identified, it must be transcribed.
Annotators must produce a verbatim (word-for-word) transcript of everything
that is said within the file. The words transcribed within each segment
boundary must correspond exactly to the timestamps that have been created,
so that the audio file is aligned with the transcript.
Capitalization in our transcripts is used
to aid human comprehension of the text. You should follow accepted standard
written capitalization patterns, and capitalize words at the beginning
of a sentence, proper names, and so on.
Transcribers should use standard orthography,
word segmentation and word spelling. All files must be spell-checked
after transcription is complete. When in doubt about the spelling
of a word or name, transcribers should consult a standard reference, like
an online or paper dictionary, world atlas or news website.
Transcribers should use standard punctuation
for ease of transcription and reading. Acceptable punctuation
is limited to periods and question marks at the end of a sentence, and
commas within a sentence. Write the punctuation as you normally
would in standard written English (with no additional spaces around the
punctuation marks).
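A line can be screened for the forbidden marks before it is submitted. `punctuation_problems` below is a hypothetical sketch that only reports what it finds; it leaves hyphens, double dashes and apostrophes alone because the conventions use them for compounds, partial words, restarts and contractions, and it assumes the SPEAKER1:/SPEAKER2: prefixes used for overlapping speech are markup.

```python
import re
from typing import List

# Punctuation the guidelines forbid in transcript text. Plain "-" and "--"
# are deliberately absent: they mark hyphenated words, partial words
# (absolu-) and speaker restarts (--).
FORBIDDEN = {
    '"': "quotation mark",
    "\u201c": "quotation mark",
    "\u201d": "quotation mark",
    "!": "exclamation mark",
    ":": "colon",
    ";": "semicolon",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2026": "ellipsis",
}


def punctuation_problems(text: str) -> List[str]:
    """Name each forbidden mark found in one line of transcript text."""
    # Assumption: the SPEAKER1:/SPEAKER2: prefixes used in overlap
    # transcription are markup, so their colon is not reported.
    body = re.sub(r"^SPEAKER[12]:", "", text)
    problems = [FORBIDDEN[ch] for ch in body if ch in FORBIDDEN]
    if "..." in body:
        problems.append("ellipsis")
    return problems


print(punctuation_problems("Wait! He said: go..."))  # ['exclamation mark', 'colon', 'ellipsis']
```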
Avoid word abbreviations whenever possible;
instead, spell out the word in full. When abbreviations are used
as part of a personal title, they can remain as abbreviations:
Mrs. Jones
Dr. Spock
I went to the doctor, and all he said was, don't worry, it's natural.
Hey mister, do you know how to get to the stadium?
In general, be conservative about use
of hyphens. For instance:
Write out all numerals as words. Hyphenate
numbers between twenty-one and ninety-nine only.
nineteen ninety-five
seven thousand two hundred seventy-five
nineteen oh nine
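The hyphenation rule for numbers can be sketched as a small cardinal-number speller. This is an illustration under stated assumptions, not a tool the guidelines provide: the hypothetical `number_to_words` only covers 0 to 999,999, never inserts "and", and knows nothing about years (nineteen ninety-five, nineteen oh nine). Since numbers are transcribed as spoken, use it as a spelling reference rather than a substitute for listening.

```python
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def below_hundred(n: int) -> str:
    """Spell out 0-99; only twenty-one through ninety-nine get a hyphen."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def number_to_words(n: int) -> str:
    """Spell out 0 <= n < 1,000,000 as a plain cardinal with no "and"."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only covers 0 to 999,999")
    if n < 100:
        return below_hundred(n)
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        words = f"{ONES[hundreds]} hundred"
    else:
        thousands, rest = divmod(n, 1000)
        words = f"{number_to_words(thousands)} thousand"
    return words if rest == 0 else f"{words} {number_to_words(rest)}"


print(number_to_words(7275))  # seven thousand two hundred seventy-five
```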
Limit your use of contractions to those
that exist in standard written English, and of course only when a contraction
is actually produced by the
speaker. Take care to transcribe
exactly what the speaker says. The table below, while not comprehensive,
shows some examples of how to transcribe common contractions.
| Complete Form | Spoken As | Transcribed As | Incorrect |
|---|---|---|---|
| I have | I've | I've | |
| cannot | can't | can't | |
| will not | won't | won't | |
| you have | you've | you've | |
| could not | couldn't | couldn't | |
| should have | should've | should've | should of, shoulda |
| it is | it's | it's | |
| Marvin (possessive) | Marvin's | Marvin's | |
| Marvin is | Marvin's | Marvin's | |
| Marvin has | Marvin's | Marvin's | |
| going to | gonna | going to | gonna |
| want to | wanna | want to | wanna |
| got to | gotta | got to | gotta |
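The last three rows of the table can be applied mechanically. `expand_nonstandard` is a hypothetical helper that spells out gonna, wanna and gotta while leaving standard contractions untouched; it assumes its input is your own draft transcript and that the speaker really produced the reduced form.

```python
import re

# Non-standard contractions from the table above, mapped to their full forms.
EXPANSIONS = {"gonna": "going to", "wanna": "want to", "gotta": "got to"}
NONSTANDARD = re.compile(r"\b(?:gonna|wanna|gotta)\b", re.IGNORECASE)


def expand_nonstandard(text: str) -> str:
    """Spell out gonna, wanna and gotta, keeping a sentence-initial capital.

    Standard contractions such as can't or should've are left untouched,
    because they are transcribed exactly as spoken.
    """
    def replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        full = EXPANSIONS[word.lower()]
        return full[0].upper() + full[1:] if word[0].isupper() else full

    return NONSTANDARD.sub(replace, text)


print(expand_nonstandard("Gotta say, we're gonna win."))  # Got to say, we're going to win.
```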
Regions of disfluent
speech are particularly difficult to transcribe. Speakers may stumble
over their words, repeat themselves, utter partial words, restart phrases
or sentences, and use lots of hesitation sounds. Take particular
care in sections of disfluency to transcribe exactly what you hear.
Filled pauses are non-lexemes (non-words)
that speakers employ to indicate hesitation or to maintain control of a
conversation while thinking of what to say next. Each language has
a limited set of filled pauses that speakers can employ.
Use the standardized spellings shown in the table below for filled pauses.
Don't alter the spelling to reflect how the speaker pronounces the word
(for example, typing AH for a loud "ah" or hmmmmmmm for a long "hmm"). For
English, this set includes ah, eh, er, uh, um.
English Filled Pauses
%ah
%eh
%er
%uh
%um
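Both halves of the filled-pause rule, the fixed spellings and the leading %, can be checked with a short script. `check_filled_pauses` is a hypothetical sketch; its pattern for unmarked pauses deliberately skips uh-huh and uh-oh, but it will also flag a word like ER (as in emergency room), so treat its output as prompts to listen again rather than automatic fixes.

```python
import re
from typing import List, Tuple

# Standard English filled-pause spellings, each written with a leading %.
FILLED_PAUSES = {"%ah", "%eh", "%er", "%uh", "%um"}
# A bare filled pause with no % in front. The lookarounds keep "uh-huh",
# "uh-oh" and words such as "her" from matching.
UNMARKED = re.compile(r"(?<![%\w-])(?:ah|eh|er|uh|um)(?![\w-])", re.IGNORECASE)


def check_filled_pauses(text: str) -> Tuple[List[str], List[str]]:
    """Return (non-standard %-tokens, filled pauses missing their % sign)."""
    marked = re.findall(r"%\w+", text)
    nonstandard = [token for token in marked if token not in FILLED_PAUSES]
    return nonstandard, UNMARKED.findall(text)


print(check_filled_pauses("%um the %hmmm uh report, uh-huh"))  # (['%hmmm'], ['uh'])
```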
Use - to indicate the point at which a word
was broken off.
Use + symbol for obviously mispronounced
words (not regional or non-standard dialect pronunciation).
Sometimes you will encounter a section
of speech that is difficult or impossible to understand. In these
cases, you should use the (( )) symbol to mark the region of difficulty.
((here you type what you think you hear but aren't sure))
If you're truly mystified and can't make out what the speaker
is saying at all, don't type anything, and use empty
double parentheses to surround the untranscribed region. If possible,
this untranscribed region should get its own timestamp.
Occasionally a speaker will make up a
new word on the spot. These are not the same as slang words; they're
words that are unique to the speaker in that conversation. If you
encounter an idiosyncratic word, transcribe it to the best of your ability
and mark it with a * symbol.
We mark all proper nouns, including personal
names, place names and the like, with a ^ symbol. If the name
contains more than one word, mark all words in the name with the symbol.
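Marking a multi-word name is mechanical once you have decided the words form a name. `mark_proper_name` is a hypothetical one-line helper that prefixes every word with ^ and leaves words that are already marked unchanged; it assumes the name is passed in as plain space-separated words.

```python
def mark_proper_name(name: str) -> str:
    """Prefix every word of a proper name with ^, leaving already-marked words alone."""
    return " ".join(w if w.startswith("^") else "^" + w for w in name.split())


print(mark_proper_name("Jacques Cousteau"))  # ^Jacques ^Cousteau
```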
Please use these standardized spellings
for interjections. Interjections *do not* require any special symbol.
If you encounter an interjection that does not appear on this list and
are unsure how to spell it, notify your team leader.
ach
eee
ew
ha
hee
huh
hm
jeez
mhm
oh
okay
ooh
uh-huh
uh-oh
whoa
whew
yeah
| Condition | Symbol | Example | Description of symbol's use |
|---|---|---|---|
| Individual letters | ~ | ~I before ~E except after ~C | Individual letters spelled out, each marked with ~ |
| Partial words | - | absolu- | Speaker-produced partial words are indicated with a dash. Transcribe as much of the word as you hear. |
| Mispronounced words | + | +probably | Mispronounced word (a speech error). NOTE: Do not use this symbol to indicate non-standard but common regional or social dialect pronunciations, such as "gonna" for "going to". Transcribe non-standard pronunciation variants using normal standard orthography. |
| Idiosyncratic words | * | *poodleish | Speaker uses a "made-up" word. NOTE: Some speakers may use non-standard dialect words, which don't constitute idiosyncratic words. If you're unsure, consult your team leader. |
| Speaker restart | -- | I thought he -- I thought he was there. | Used when the speaker stops short and then repeats themselves or abandons the utterance completely, restarting with a new sentence. |
| Speaker noise | { } | {breath} {cough} {laugh} {sneeze} {lipsmack} | Sounds made by the talker. Limited to these five sounds. |
| Semi-intelligible speech | ((text)) | ((they lived next door to us)) | The transcriber's best attempt at transcribing a difficult passage. |
| Unintelligible speech (single token) | (( )) | (( )) | Used if a single word or short phrase is completely unintelligible. |
| Foreign language | <language text> | <French merci> | Indicates foreign speech. If the foreign word is unknown, merely write the language. If the language is unknown, consult your team leader. NOTE: Do not use this convention for common foreign language borrowings into English, such as XX |
| Punctuation | , . ? | | Limited to end-of-sentence marks and commas |
| Numbers | | twenty-five, one hundred and six | Written in full |
| Non-standard contractions | | going to, want to | Spell out gonna and wanna in full |
| Proper names | ^ | ^Jacques ^Cousteau | Every word of the name is marked |
| Interjections and non-lexemes | no special markup | uh-huh, mhm, yeah, uh-oh, whoa | |
| Filled pauses | % | %ah, %eh, %er, %uh, %um | Limited to this list |
| Pronounced acronyms | @ | @NAFTA, @AIDS | |
Do not try to correct grammatical errors,
e.g. "I seen him" for "I saw him" should be transcribed as spoken.
The same goes for misused words: transcribe what is spoken, not what you
expect to hear.
Commercials
Material repeated between broadcasts
Anything too "difficult" to understand: if you have to listen to a passage more than 4 times in order to understand anything, it is probably too difficult to transcribe
Anything obscured by heavy distortion or overwhelming background noise
Provide a timestamp for any skipped speech portion (even if it is a minute long). Use the notation "sn" (section non-transcribed) to designate sections that fall into the categories above. If material marked "sn" is repeated elsewhere in the transcripts, add the notation [[repeat]] after the "sn". If you happen to know the other source for the repeated material, include that information (file ID and timestamps, if you know them) after the [[repeat]]:
[[repeat]]
<sn 156.997> [[repeat sv970613d at time 708.388 to 840.328]]
<t 1122.443> <<male, Jacques_Cousteau>>
about the same thing I
<o 1123.276> <<female, speaker_2>>
SPEAKER1:understand {laugh}
SPEAKER2:well,
<e1 1124.256>
oceanography is new exploration and we're not
The [[NS]] tag can be used when there is an area within a turn that has no speech in it, for example a musical interruption or extended background noise.
<b 123.456>
The crowd was furious.
<b 124.567>
[[NS]]
<b 128.987>
Calm was soon restored by the arrival of the riot police.
We're jus- just waiting for that %uh tha- that report to to come in.
<t 148.57>
Gunfire filled the air.
<e 154.50>
<t 170.89>
That sound greeted early morning visitors on Tuesday.
Checking and
separation of unintelligible (( )) speech
Syntax
Checking
<t 859.405> <<male>>
Not only do we methodically destroy the coastal fringe
<b 863.598>
but we also throw back our toxic *1(( ))*2 directly in the sea
<b 868.453>
or under the sea when we feel ashamed.
ii) Return to the text area and place the cursor in front of the brackets (*1).
iii) Hold Alt and the middle mouse button (M2), drag the cursor over the (( )) region, and release M2 (*2). No highlighting appears over the region to show that the selection is active, but if the operation succeeds you will receive a prompt asking you to confirm the change.
Common messages include:
- The timestamp does not contain corresponding transcript data.
- An empty line should follow each transcribed timestamp.
- Only one turn is permitted on each line.
- Foreign speech should not be contained within "guess" brackets; rather, the "guess" should be contained within the foreign language bracket.
- Self-evident.
- There may be a number of possibilities.
- There may be a number of possibilities.
- If completely necessary, punctuation should go inside these brackets; in most cases, no punctuation will be necessary.
- If there is punctuation immediately outside a <, please place it inside the bracket.
- There should be a space after punctuation.
- There should be a space after punctuation.
- Self-explanatory.
- Some characters, for instance exclamation points, are not allowed within the text. Please let your language leader know when you come across this error warning.
- There should not be any numerals in the text outside of the timestamps.
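Two of these checks, numerals and spacing after punctuation, are simple enough to approximate locally. `syntax_warnings` below is a hypothetical sketch, not the checker that produces the messages above, and its warning texts are made up. It strips <...> labels, [[...]] notes and SPEAKER1:/SPEAKER2: prefixes before checking, on the assumption that only the remaining text is transcript.

```python
import re
from typing import List

# Markup that may legitimately contain digits or punctuation: <...> labels,
# [[...]] notes such as [[repeat ...]], and SPEAKER1:/SPEAKER2: prefixes.
MARKUP = re.compile(r"<[^<>]*>|\[\[[^\]]*\]\]|^SPEAKER[12]:")


def syntax_warnings(line: str) -> List[str]:
    """Approximate two checks: numerals in the text and spacing after punctuation."""
    text = MARKUP.sub(" ", line)
    warnings = []
    if re.search(r"\d", text):
        warnings.append("numerals outside the timestamps")
    # A comma, period or question mark should be followed by a space, the end
    # of the line, or a closing bracket such as )) or }.
    if re.search(r"[,.?](?=[^\s)}\]>])", text):
        warnings.append("no space after punctuation")
    return warnings


print(syntax_warnings("He was born in 1909.It was cold."))
# ['numerals outside the timestamps', 'no space after punctuation']
```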