SPINE2

SPeech in Noisy Environments - Phase 2

LDC Transcription Guide



The Basics

For each recording, you will be provided with an audio file and a blank text file. Your job is to produce a perfect verbatim (word-for-word) transcript of the audio, complete with accurate timestamps. This is done in two passes. The first pass creates an initial transcript and timestamps. The second pass adds further markup, corrects errors, and makes sure all conventions have been applied consistently.

SPINE2 Transcription Conventions

Some special conventions are used to indicate particular kinds of speech:

Partial Words (audio cuts out)
Partial words (words cut off at the beginning, middle or end) are marked with a plus sign (+) at the beginning of the word, with no space between the + and the word. Spell out the whole word in standard orthography; do not try to represent how it was pronounced. Use this convention only for words that have been cut off or interrupted by the audio signal. It does not apply when the speaker trails off or does not complete the word; those cases are covered under Partial Words (speaker caused).

e.g.  Say that again +please.
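If transcripts are checked with a script, audio-cut partial words can be located by looking for a + attached directly to a word, and a + followed by a space can be flagged as a violation of the no-space rule. The Python sketch below assumes one transcript line per string; `find_cut_words` and `has_detached_plus` are hypothetical helper names, not part of any LDC tool.

```python
import re

# Audio-cut partial words are written "+word", with no space after the "+".
# These helpers are hypothetical illustrations, not LDC tools.
CUT_WORD = re.compile(r"\+([A-Za-z']+)")
# A "+" followed by whitespace or the end of the line breaks the no-space rule.
DETACHED_PLUS = re.compile(r"\+(?=\s|$)")


def find_cut_words(line: str) -> list[str]:
    """Return the words in a transcript line that are marked as cut off by the audio."""
    return CUT_WORD.findall(line)


def has_detached_plus(line: str) -> bool:
    """True if a "+" in the line is separated from the word it should mark."""
    return DETACHED_PLUS.search(line) is not None


print(find_cut_words("Say that again +please."))      # ['please']
print(has_detached_plus("Say that again + please."))  # True
```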

Speaker Noise
For speaker noise that occurs within the speaker's turn, use curly brackets.  These noises are limited to the following:
{cough} {laugh} {breath} {lipsmack}
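Because only four speaker noises are allowed, a checking script can simply compare every curly-bracket tag against that set. This is a minimal Python sketch, assuming the tags are written in lowercase exactly as listed above; `invalid_speaker_noises` is a hypothetical name, and the sample line is invented for illustration.

```python
import re

# The four speaker-noise tags allowed inside a speaker's turn.
# Comparison is case-sensitive: tags are assumed to be lowercase, as listed.
ALLOWED_SPEAKER_NOISES = {"cough", "laugh", "breath", "lipsmack"}
CURLY_TAG = re.compile(r"\{([^{}]*)\}")


def invalid_speaker_noises(line: str) -> list[str]:
    """Return the contents of any {...} tag that is not one of the allowed noises."""
    return [tag for tag in CURLY_TAG.findall(line)
            if tag not in ALLOWED_SPEAKER_NOISES]


print(invalid_speaker_noises("Copy that {cough} Bravo {sneeze}"))  # ['sneeze']
```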
 

Background Non-Speaker Noise
Extended background (non-speaker) noise is marked with a start tag [text/] and an end tag [/text], where text is a label for the noise. In the SPINE files, this markup has been used to denote strong static.

e.g.  [noise/] Armed. Firing [/noise]
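Every [label/] start tag needs a matching [/label] end tag, which makes this markup easy to check mechanically. The Python sketch below is one way to do it with a stack, assuming the whole noise span is passed in as a single string (a span may continue across lines or turns). The helper name is hypothetical. It also accepts nested spans, since the guide does not say whether nesting is permitted.

```python
import re

# Start tags look like [label/], end tags like [/label].
TAG = re.compile(r"\[(/?)([^\[\]/]+)(/?)\]")


def noise_spans_balanced(text: str) -> bool:
    """True if every [label/] start tag is closed by a matching [/label] end tag."""
    open_labels = []
    for closing, label, opening in TAG.findall(text):
        if opening and not closing:      # [label/] starts a span
            open_labels.append(label)
        elif closing and not opening:    # [/label] ends the most recent span
            if not open_labels or open_labels.pop() != label:
                return False
        else:                            # malformed, e.g. [noise] or [/noise/]
            return False
    return not open_labels               # no span left open


print(noise_spans_balanced("[noise/] Armed. Firing [/noise]"))  # True
print(noise_spans_balanced("[noise/] Armed. Firing"))           # False
```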
 

Partial Words (speaker caused)
When the speaker breaks off a word, transcribe the part of the word that is heard, followed by a dash (-).

e.g.  Let's tr- Let's try that again.
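A dash at the end of a word separates a speaker-caused fragment from ordinary hyphenated words such as uh-huh, where the dash sits inside the word. The Python sketch below relies on that distinction; it assumes a fragment's dash is always followed by a space or the end of the line, and `find_broken_off` is a hypothetical helper name.

```python
import re

# A speaker-caused partial word is the heard fragment followed by a dash,
# e.g. "tr-". Requiring whitespace or end of line after the dash keeps
# hyphenated words such as "uh-huh" from being reported.
BROKEN_OFF = re.compile(r"([A-Za-z']+)-(?=\s|$)")


def find_broken_off(line: str) -> list[str]:
    """Return the fragments of words the speaker broke off."""
    return BROKEN_OFF.findall(line)


print(find_broken_off("Let's tr- Let's try that again."))  # ['tr']
```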
 

Spelled Out Words
If a speaker spells out the letters of a word, precede each letter with a tilde (~), write it as a capital letter, and separate the letters with spaces. In the example below, the speaker says the word 'fear' and then spells it out.

e.g. It's fear, ~F ~E ~A ~R.
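The spelled-out notation is regular enough to generate and check with a few lines of code. This Python sketch is illustrative only: `spell_out` renders a word in the ~X form, and `bad_spelled_letters` reports any ~token that is not a single capital letter. Both names are hypothetical, and the second assumes a spelled-out letter ends at whitespace or at one of the allowed punctuation marks.

```python
import re


def spell_out(word: str) -> str:
    """Render a spelled-out word in the ~X notation: one capital letter per token."""
    return " ".join("~" + ch.upper() for ch in word if ch.isalpha())


def bad_spelled_letters(line: str) -> list[str]:
    """Return ~tokens that are not a single capital letter (e.g. '~f' or '~FE')."""
    tokens = re.findall(r"~([^\s,.?]*)", line)
    return [tok for tok in tokens if not re.fullmatch(r"[A-Z]", tok)]


print(spell_out("fear"))                              # ~F ~E ~A ~R
print(bad_spelled_letters("It's fear, ~F ~e ~AR."))   # ['e', 'AR']
```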
 

Capitalization
Capitalization should conform to standard English usage.

Punctuation
Punctuation should be limited to the following symbols:
. (period)
, (comma)
? (question mark)
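Since only three punctuation marks are allowed, a script can flag anything else, as long as it ignores the characters that the markup conventions in this guide use (+ - ~ * { } [ ] / and the apostrophe in contractions and possessives). The Python sketch below makes that assumption; `disallowed_punctuation` is a hypothetical name, and the sketch does not check whether the markup characters themselves are used correctly.

```python
import string

ALLOWED_PUNCTUATION = set(".,?")
# Characters used by this guide's markup conventions, plus the apostrophe.
MARKUP_CHARACTERS = set("+-~*{}[]/'")


def disallowed_punctuation(line: str) -> list[str]:
    """Return punctuation characters that are neither allowed punctuation nor markup."""
    return [ch for ch in line
            if ch in string.punctuation
            and ch not in ALLOWED_PUNCTUATION
            and ch not in MARKUP_CHARACTERS]


print(disallowed_punctuation("Roger that! Moving to sector Delta; over."))  # ['!', ';']
```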

Hesitation sounds, filled pauses
Transcribers will indicate filled pauses using a standardized spelling. These hesitation words do not require any special markup.
(If you believe a speaker uses a hesitation word that does not appear on the list below, let us know.)
 
 
ach       ah        ah-ha     ay        ay-yi-yi  duh
eee       eh        er        ew        geez      ha
he-hem    hee       hm        hoo       huh       jeepers
mm        mm-hm     nah       oh        oof       ooh
oop       oops      op        ow        uh        uh-huh
uh-oh     um        whew      whoa      whoo-hoo  whoops
yay       yeah      yep       yuh       yup

Contractions and apostrophe-s
Limit your use of contractions to those that exist in standard written English, and of course only when a contraction is actually produced by the speaker.  The table below, while not comprehensive, illustrates what is considered standard written English with respect to contractions.

(Note: Avoid the common mistakes of confusing possessive its with the contraction it's (it is), and possessive your with the contraction you're (you are).)
 
Complete words        Contraction allowed     Contraction not allowed
                      (when spoken)
I have                I've
cannot                can't
will not              won't
you have              you've
could not             couldn't
we will               we'll
should have           should've
it is                 it's
she/he is             she's/he's
they are              they're
Marvin (possessive)   Marvin's
Marvin is             --                      Marvin's
Marvin has            --                      Marvin's
ship is               --                      ship's
going to              --                      gonna
want to               --                      wanna
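Of the forms in the "not allowed" column, gonna and wanna always have the same replacement, so a second-pass script could fix them automatically. The other disallowed forms (Marvin's for Marvin is or Marvin has, ship's for ship is) share their spelling with allowed possessives and still need a human decision. This is a minimal Python sketch of such a hypothetical helper, not an LDC tool.

```python
import re

# Disallowed forms that have a fixed replacement. Forms such as "Marvin's"
# for "Marvin is" cannot be caught this way, because the same spelling is
# allowed as a possessive.
REPLACEMENTS = {"gonna": "going to", "wanna": "want to"}
PATTERN = re.compile(r"\b(?:gonna|wanna)\b", re.IGNORECASE)


def fix_disallowed_contractions(line: str) -> str:
    """Replace gonna/wanna with the complete words, keeping a leading capital."""
    def replace(match):
        found = match.group(0)
        fixed = REPLACEMENTS[found.lower()]
        if found[0].isupper():
            fixed = fixed[0].upper() + fixed[1:]
        return fixed
    return PATTERN.sub(replace, line)


print(fix_disallowed_contractions("They're gonna turn left."))  # They're going to turn left.
```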

Mispronounced Words
Mispronounced words are marked with an asterisk (*). Spell the word in standard orthography; do not try to represent how it was pronounced.

Speaker Noises
Sometimes speakers make noises between words. These sounds are not "words" like the hesitation words above. Examples include sshhhhhhhhh, ssssssssssss and pssssssss. Mark these sounds with a forward slash (/) followed by the first two letters of the sound heard. Put spaces around these sounds; do not attach them to the previous or following word.

e.g.   Well, I /sh I don't know.

Likewise, ssssssssssss is written /ss and pssssssss is written /ps.

These sounds should not be confused with elongated words such as ssshoooot, which should be transcribed in standard orthography as "shoot".
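The guide gives only three examples of these codes (sshhhhhhhhh as /sh, ssssssssssss as /ss, pssssssss as /ps). The Python sketch below encodes one reading that reproduces all three: collapse runs of the same letter, take the first two letters, and write a lone repeated letter twice. That rule is an assumption inferred from the examples, not an official LDC algorithm, and `noise_code` is a hypothetical name.

```python
from itertools import groupby


def noise_code(sound: str) -> str:
    """Turn an elongated noise such as "sshhhh" into its two-letter /xx code."""
    # Collapse runs of repeated letters: "sshhhh" -> ["s", "h"], "ssss" -> ["s"].
    letters = [ch for ch, _ in groupby(sound.lower()) if ch.isalpha()]
    if not letters:
        raise ValueError(f"no letters in {sound!r}")
    if len(letters) == 1:
        # Assumption: a single repeated letter is written twice, as in "/ss".
        letters = letters * 2
    return "/" + "".join(letters[:2])


for sound in ("sshhhhhhhhh", "ssssssssssss", "pssssssss"):
    print(sound, noise_code(sound))
```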

Things to look out for/Miscellaneous

Some arbitrary spelling decisions
 
Use this        Not this
all right       alright
OK              okay, ok
alrighty        all righty
gotcha          got ya
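The spelling decisions in the table above are fixed one-to-one substitutions, so they can be applied mechanically during the second pass. The Python sketch below is a hypothetical helper, not an LDC tool. It matches whole words only, ignores case when matching, and keeps a sentence-initial capital.

```python
import re

# Preferred spelling for each "Not this" form in the table above.
PREFERRED = {
    "alright": "all right",
    "okay": "OK",
    "ok": "OK",
    "all righty": "alrighty",
    "got ya": "gotcha",
}
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(form) for form in PREFERRED) + r")\b",
    re.IGNORECASE,
)


def apply_spelling_decisions(line: str) -> str:
    """Replace each dispreferred spelling with the preferred one."""
    def replace(match):
        found = match.group(0)
        preferred = PREFERRED[found.lower()]
        # Keep a sentence-initial capital: "Alright" -> "All right".
        if found[0].isupper():
            preferred = preferred[0].upper() + preferred[1:]
        return preferred
    return PATTERN.sub(replace, line)


print(apply_spelling_decisions("Alright, okay, got ya."))  # All right, OK, gotcha.
```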

Some uncommon words
These words are frequently uttered by certain speakers.  They're not mispronunciations or made-up words; they're just uncommon.  Don't correct them, but do spell them consistently:

armorage
armage
ay-yi-yi
 

Some common mistakes

Spelling
Correct                 Incorrect
acoustic/acoustical     accoustic/accoustical

Watch Out!
In noisy files, it's sometimes easy to confuse certain words.  Be careful with pairs like:

confirm ~ confirmed
acoustic ~ acoustical

and so on.

Target Word List
The words in the list below are the "target" words used by the participants in each game. Each set of grid coordinates is a pair of these words. It is very important that these words be spelled correctly and consistently throughout all files; pay particular attention to this during the second pass. Although the words in the list are spelled in capital letters, use normal capitalization for them in the transcripts.
 
ABORT
ABOVE
AFFIRM
AFT
ALFA
ANCHOR
AWAY
BINGO
BLAST
BOGEY
BOW
BRAVO
BREAK
BROKEN
BUSTER
CHARLIE
CLEAR
CODE
COPY
CREW
DANGER
DECK
DECOY
DELTA
DITCH
DIVERT
DRIVE
ECHO
ENGINE
FOE
FOXTROT
FUEL
GOLF
GUNS
HELP
HOTEL
INDIA
JULIETT
KAYBECK
KILO
KNOTS
LAMPS
LAUNCH
LEFT
LEVEL
LIMA
LOST
LOUD
LOW
MANY
MAYDAY
MERGED
MIKE
MINUS
MIXUP
MOVING
NORMAL
NOVEMBER
ORBIT
OSCAR
PAPA
PING
PLUS
POINT
PORT
POWER
PRONTO
PUNCH
RANGE
READY
RED
RESCUE
RIGHT
ROGER
ROMEO
ROUTE
RUDDER
SALVO
SEAS
SECTOR
SECURE
SHIP
SIERRA
SINGLE
SKUNK
SPLASH
SPOT
SQUAWK
STEADY
STOP
STRIKE
SWITCH
TANGO
TARGET
TOOL
TURN
UNABLE
UNIFORM
VECTOR
VERY
VICTOR
WAVE
WHISKEY
X-RAY
YANKEE
ZULU
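To help keep target-word spelling consistent, a script can flag habitual variant spellings and report the listed form. In the standard NATO spelling alphabet the forms are Alfa and Juliett, which transcribers often type as alpha and juliet. The other entries in `VARIANTS` below (mix-up, xray) are hypothetical examples, and `flag_variant_spellings` is an illustrative name, not an LDC tool. The helper reports the word as listed; applying normal capitalization is left to the transcriber.

```python
import re

# The target words from the list above, in the list's own (upper-case) form.
TARGET_WORDS = set("""
ABORT ABOVE AFFIRM AFT ALFA ANCHOR AWAY BINGO BLAST BOGEY BOW BRAVO BREAK
BROKEN BUSTER CHARLIE CLEAR CODE COPY CREW DANGER DECK DECOY DELTA DITCH
DIVERT DRIVE ECHO ENGINE FOE FOXTROT FUEL GOLF GUNS HELP HOTEL INDIA JULIETT
KAYBECK KILO KNOTS LAMPS LAUNCH LEFT LEVEL LIMA LOST LOUD LOW MANY MAYDAY
MERGED MIKE MINUS MIXUP MOVING NORMAL NOVEMBER ORBIT OSCAR PAPA PING PLUS
POINT PORT POWER PRONTO PUNCH RANGE READY RED RESCUE RIGHT ROGER ROMEO ROUTE
RUDDER SALVO SEAS SECTOR SECURE SHIP SIERRA SINGLE SKUNK SPLASH SPOT SQUAWK
STEADY STOP STRIKE SWITCH TANGO TARGET TOOL TURN UNABLE UNIFORM VECTOR VERY
VICTOR WAVE WHISKEY X-RAY YANKEE ZULU
""".split())

# Variant spellings mapped to the listed target word. "mix-up" and "xray"
# are hypothetical examples of habits a transcriber might have.
VARIANTS = {"alpha": "ALFA", "juliet": "JULIETT", "mix-up": "MIXUP", "xray": "X-RAY"}


def flag_variant_spellings(line: str) -> list[tuple[str, str]]:
    """Return (word as typed, target word as listed) for each known variant."""
    return [(token, VARIANTS[token.lower()])
            for token in re.findall(r"[A-Za-z'-]+", line)
            if token.lower() in VARIANTS]


print(flag_variant_spellings("Alpha Juliet, copy."))  # [('Alpha', 'ALFA'), ('Juliet', 'JULIETT')]
```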


Questions/comments?
strassel@ldc.upenn.edu  nmartey@ldc.upenn.edu

Last modified: Fri, July 6, 2001