
I. Project scope: LDC will transcribe (5) minutes of each of 250 different
conversations (500 sides) compiled from the recently completed GSM phase of
SWB_CELLULAR. This will be done for Speaker Identification so that the
transcriptions can be used for speech-to-text systems R&D and evaluation
under conditions of vocoded speech. The following restrictions for selected
calls will apply:
A. Each transcribed call will involve a different pair of speakers.
B. No speaker may be involved in more than (3) transcribed
conversations.
C. In the event that those speakers paired during data collection do not
allow for the fulfillment of the above (A. and B.) transcription criteria,
individual sides will be transcribed to complete the quota of 500
transcribed sides.
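Restrictions A and B can be checked mechanically once a candidate list of calls exists. The following is a minimal sketch only; it assumes each call is represented as a pair of speaker IDs, and the function name and ID format are hypothetical, not part of the LDC tooling.

```python
from collections import Counter

MAX_CALLS_PER_SPEAKER = 3  # restriction B


def check_selection(calls):
    """calls: list of (speaker_a, speaker_b) tuples (hypothetical format).

    Returns a list of human-readable problems; an empty list means
    restrictions A and B are both satisfied.
    """
    problems = []
    seen_pairs = set()
    per_speaker = Counter()
    for a, b in calls:
        pair = frozenset((a, b))  # order of the two sides does not matter
        if pair in seen_pairs:
            problems.append(f"pair repeated: {sorted(pair)}")  # restriction A
        seen_pairs.add(pair)
        per_speaker[a] += 1
        per_speaker[b] += 1
    for speaker, count in sorted(per_speaker.items()):
        if count > MAX_CALLS_PER_SPEAKER:
            problems.append(f"speaker {speaker} appears in {count} calls")
    return problems
```

When the check fails for the available pairings, restriction C applies and individual sides are selected instead.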
Transcription Spec:
A: corresponding to the local channel (the lower waveform window)
B: corresponding to the remote channel (the top waveform window)
A "turn" has a speaker channel identification, and has a beginning and end timestamp.
An inserted "breakpoint" has the same appearance as a new speaker turn.
Breakpoints can be inserted wherever they seem convenient to the
transcriber, but they should occur at the natural boundaries of speech, such
as pauses, breaths, etc. Do not insert a breakpoint (timestamp) in the middle
of a word! Each timestamp has both a start and end point, and neither point
can overlap a previous timestamp of the same speaker.
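The no-overlap rule for timestamps can be verified automatically. The sketch below assumes each segment is a (channel, start, end) tuple with times in seconds; this representation and the function name are illustrative assumptions, not the format of the transcription tool.

```python
def find_timestamp_problems(segments):
    """segments: list of (channel, start, end) tuples, times in seconds.

    Returns (channel, start, end, reason) tuples for segments whose end
    does not follow their start, or that begin before an earlier segment
    on the same channel has ended.
    """
    problems = []
    last_end = {}  # latest end time seen so far, per channel
    for channel, start, end in sorted(segments):
        if end <= start:
            problems.append((channel, start, end, "end is not after start"))
        if start < last_end.get(channel, float("-inf")):
            problems.append((channel, start, end, "overlaps previous segment"))
        last_end[channel] = max(end, last_end.get(channel, float("-inf")))
    return problems
```

A segment that starts exactly where the previous one ends is accepted here, and the two channels are checked independently of each other.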
Punctuation
The following punctuation marks are to be used in the transcripts. The
punctuation marks are primarily for ease of (human) reading. Use only those
punctuation marks indicated below. Do not use marks such as single
quotation (' '), exclamation ('!') or apostrophe (') other than those given
below.
periods "." should be added at the end of declarative sentences
question marks "?" should be added at the end of interrogative sentences
commas "," should be added between clauses as is accepted in the standard
orthography of the language
Symbols
Acronyms I: are pronounced as a single word and should be written in caps
(no spaces) and preceded by an "@" symbol: @AIDS
Acronyms II: are normally written as a single word but pronounced as a
sequence of individual letters and should be written in all caps (no spaces)
and preceded by a "~" symbol: ~FBI
Individual letters: are pronounced as such
and should be written in caps and preceded by a "~" symbol:
Proper names: Both proper names and place names should be marked with a
"^" symbol. If you encounter a "proper name phrase", mark only those
words as proper names that are true proper names on their own.
Partial words: are indicated with a dash (without any spacing between
the dash and the word): week-
If a word is mispronounced (such as a slip of the tongue), provide the
correct spelling of the word, and place a "+" symbol in front of the
word.
Idiosyncratic words
If a speaker uses a "made-up" word which is not used by other
speakers (although it may be understandable), place a "*" symbol
before the word. Consult your language leader in cases where you are
uncertain whether a word fits in this category. Onomatopoeia also
fits into this category.
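The symbol conventions above can be summarized as a small token classifier. This is a sketch under stated assumptions: tokens are whitespace-separated, a partial word is recognized only by a trailing dash (following the "week-" example), and the function name and category labels are hypothetical.

```python
# Leading symbols defined in the Symbols and Idiosyncratic words sections.
PREFIX_MEANINGS = {
    "@": "acronym pronounced as a word",
    "~": "acronym or letter pronounced letter by letter",
    "^": "proper name",
    "+": "mispronounced word",
    "*": "idiosyncratic word",
}


def classify_token(token):
    """Return a label for one whitespace-separated transcript token."""
    if token[:1] in PREFIX_MEANINGS:
        return PREFIX_MEANINGS[token[0]]
    # Assumption: partial words carry the dash at the end, as in "week-".
    if len(token) > 1 and token.endswith("-"):
        return "partial word"
    return "plain word"
```

Hyphenated interjections such as uh-huh are left as plain words because the dash is internal rather than trailing.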
Interjections
Use one from a set of standardized spellings for interjections. When
it is hard to determine how to represent the interjection, ask your
language leader.
English interjections as transcribed in English.
mhm
uh-huh
uh-oh
whoa
whew
yeah
jeeze
Non-lexemes
In addition to the interjections (which are considered to be words),
we also have a set of standardized spellings for hesitation sounds
that speakers make while talking. Every such "non word" in the
transcripts is marked with the "%" symbol.
English non-lexemes (to give you an idea of the criterion for distinguishing lexemes from non-lexemes).
%ach
%ah
%eee
%eh
%ew
%ha
%hee
%huh
%hm
%um
%uh
%oh
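A transcript can be screened for "%" tokens that fall outside the standardized spellings. The sketch below treats the list above as the complete inventory, which is an assumption (the list is offered as illustrative); the function name is hypothetical.

```python
# Spellings copied from the list of English non-lexemes above.
KNOWN_NON_LEXEMES = {
    "%ach", "%ah", "%eee", "%eh", "%ew", "%ha",
    "%hee", "%huh", "%hm", "%um", "%uh", "%oh",
}


def unknown_non_lexemes(line):
    """Return the %-marked tokens in a line that are not in the known set.

    Trailing periods, commas, and question marks are ignored when
    comparing, since those marks may follow a token in the transcript.
    """
    return [
        token
        for token in line.split()
        if token.startswith("%")
        and token.rstrip(".,?") not in KNOWN_NON_LEXEMES
    ]
```

Tokens flagged this way are candidates for review with the language leader rather than definite errors.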
Noises
In order to account for sound phenomena such as distortion, coughs,
breaths, unintelligible speech, foreign words and phrases, etc., we utilize
a set of unique brackets.
{Text} Sound made by the talker.
Use only those sounds described below: {laugh} {cough} {sneeze} {breath}
{lipsmack}
[text] Sound not made by the talker (usually background or channel). This notation
should be used only in those rare cases where the background
condition is overwhelming.
Use only those descriptions provided below:
[distortion] [static] -- used for channel noise such as "buzzes", "pops",
etc.
[background] -- used for other noises such as children crying, pots being
struck, etc.
There may be many instances of a brief channel noise, such as
intermittent [static] or [background] noises. You can ignore these
occurrences. The focus of these transcriptions is areas of speech, so
there is no need to be overly concerned with small
distortions. Similarly, if a speaker is stuttering, or starts to
speak with a series of partial, hesitant words which have been
individually timestamped, include the partial speech into a larger
speech section.
[text/] [/text] Marks when sound not made by the talker is
non-instantaneous. Place this at the beginning and end of the noisy
region. These tags are channel specific, and therefore the tag can
cross turn changes if the sound is extended.
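The bracket conventions in this section lend themselves to an automatic check. The following sketch assumes the input is the text of a single channel (regions may cross turn changes, so the turns of one channel are joined before checking); the function name and message wording are hypothetical.

```python
import re

TALKER_SOUNDS = {"laugh", "cough", "sneeze", "breath", "lipsmack"}
OTHER_SOUNDS = {"distortion", "static", "background"}

# Matches {name}, [name], [name/] (region start), and [/name] (region end).
TAG_RE = re.compile(r"\{([^{}]*)\}|\[(/?)([^\[\]/]*)(/?)\]")


def check_noise_tags(text):
    """Return a list of problems found in the noise tags of one channel."""
    problems = []
    open_regions = []
    for m in TAG_RE.finditer(text):
        if m.group(1) is not None:  # curly-brace tag: sound made by the talker
            if m.group(1) not in TALKER_SOUNDS:
                problems.append(f"unknown talker sound: {m.group(0)}")
            continue
        closing, name, opening = m.group(2), m.group(3), m.group(4)
        if name not in OTHER_SOUNDS:
            problems.append(f"unknown background tag: {m.group(0)}")
        elif opening and not closing:
            open_regions.append(name)
        elif closing and not opening:
            if name in open_regions:
                open_regions.remove(name)
            else:
                problems.append(f"close without open: {m.group(0)}")
    problems.extend(f"region never closed: [{n}/]" for n in open_regions)
    return problems
```

An empty result means every tag uses a permitted description and every extended region that was opened on the channel was also closed.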
Other Conventions
((text)) Unintelligible speech. This is the transcriber's best guess.
(( )) Unintelligible speech (one or more words) that you cannot even make a
guess at (with a single space between the parentheses).
<language text> (enclosed in triangle brackets, for example <English text>)
This is used to indicate speech (one or more words) in
another language. In place of "language", write the name of the
language, if known. This can overlap with the (( )) notation
above. If the language is recognized and can be transcribed, use the
<language text> notation. If the language is recognized but cannot be
transcribed, use <language (( ))>. If the language is not even
recognized, use just the (( )) notation as above.
<as/> text </as> This is used to mark an aside made by the primary talker
where the talker is addressing someone in the background.
<ov/> text </ov> Overlapping speech is when a speaker is interrupted by
another speaker, at a roughly equal volume. In situations where
overlapping speech occurs, insert the breakpoint at the beginning of
the word in which the interruption started, in other words, at the
end of the last complete word.
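For review purposes it can help to pull the (( )) spans described under Other Conventions out of a transcript, separating best guesses from spans with no guess at all. This is a minimal sketch; the function name is hypothetical and nested parentheses are assumed not to occur.

```python
import re

# Matches ((text)) and the empty form (( )).
UNINTELLIGIBLE_RE = re.compile(r"\(\(([^()]*)\)\)")


def unintelligible_spans(text):
    """Return one entry per (( )) span, in order of appearance.

    Each entry is the transcriber's best guess, or None when the span
    contains only whitespace (no guess was possible).
    """
    return [
        m.group(1).strip() or None
        for m in UNINTELLIGIBLE_RE.finditer(text)
    ]
```

The proportion of None entries gives a rough indication of how much of a side could not be transcribed at all.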
damiller@ldc.upenn.edu
Last modified: Fri Oct 20 15:21:16 2000