LDC Transcription Guide
For each file, you will be provided with an audio file and a blank text file. The annotator's job is to produce a perfect verbatim (word-for-word) transcription of each file, complete with accurate timestamps. This will be accomplished in two passes. The first pass is designed to create an initial transcript and timestamps. The second pass will focus on adding markup, correcting errors, and making sure all conventions have been applied consistently.
SPINE2 Transcription Conventions
Some special conventions are used to indicate particular kinds of speech:
Partial Words (audio cuts out)
Partial words (words cut off at the beginning, middle or end of the word) will be marked with + at the beginning of the word (no space separating the + from the word). The whole word will be spelled out in standard orthography; do not try to represent how the word was pronounced. This convention is used only for words that have been cut off or interrupted by the audio signal. This notation does not pertain to those examples where the speaker trails off or does not complete the word.
e.g. Say that again +please.
Speaker Noise
For speaker noise that occurs within the speaker's turn, use curly brackets. These noises are limited to the following:
{cough} {laugh} {breath} {lipsmack}
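A second-pass check for this convention can be automated. The following is a minimal sketch, not part of the LDC tooling; the function name `invalid_speaker_noises` is hypothetical, and it assumes each transcript line is available as a plain string.

```python
import re

# The four speaker-noise tags permitted by the guide.
ALLOWED_SPEAKER_NOISES = {"cough", "laugh", "breath", "lipsmack"}

def invalid_speaker_noises(line):
    """Return curly-bracket tags in `line` that are not on the allowed list."""
    tags = re.findall(r"\{([^{}]*)\}", line)
    return [tag for tag in tags if tag not in ALLOWED_SPEAKER_NOISES]

print(invalid_speaker_noises("Roger {cough} that {sneeze}"))
```

Any tag the function returns should be reviewed by hand, since the guide limits speaker noises to the four listed above.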
Background Non-Speaker Noise
Extended background noise is marked with [text/] at the start and [/text] at the end of the non-speaker noise. In the SPINE files, this convention has been used to denote strong static.
e.g. [noise/] Armed. Firing [/noise]
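Because every start tag needs a matching end tag, unpaired tags are easy to detect mechanically. The sketch below is only an illustration; the function name `unbalanced_noise_tags` is hypothetical, and it assumes that tag names are single words and that nested tags, if they occur, must close in reverse order of opening (the guide does not address nesting).

```python
import re

def unbalanced_noise_tags(text):
    """Return a list of problems with [name/] ... [/name] pairs in `text`."""
    problems = []
    open_tags = []  # stack of tag names that are currently open
    for match in re.finditer(r"\[(/?)(\w+)(/?)\]", text):
        leading, name, trailing = match.groups()
        if trailing and not leading:        # start tag, e.g. [noise/]
            open_tags.append(name)
        elif leading and not trailing:      # end tag, e.g. [/noise]
            if open_tags and open_tags[-1] == name:
                open_tags.pop()
            else:
                problems.append(f"unexpected end tag [/{name}]")
        else:                               # e.g. [noise] or [/noise/]
            problems.append(f"malformed tag {match.group(0)}")
    problems.extend(f"unclosed start tag [{name}/]" for name in open_tags)
    return problems

print(unbalanced_noise_tags("[noise/] Armed. Firing [/noise]"))
```

An empty list means every start tag in the text was closed by a matching end tag.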
Partial Words (speaker caused)
For these partial words, that part of the word that is heard should be transcribed followed by a dash.
e.g. Let's tr- Let's try that again.
Spelled Out Words
If a speaker spells out the letters of a word, each individual letter of the word should be preceded by a tilde (~) and written with a capital letter. Each spelled-out letter should be space-separated. The example below indicates that the speaker said the word 'fear' and then spelled it out.
e.g. It's fear, ~F ~E ~A ~R.
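The spelled-out format is regular enough to generate and check automatically. This is a minimal sketch with hypothetical helper names (`spell_out`, `bad_spelled_letters`); it assumes that a spelled letter may be followed directly by one of the permitted punctuation marks, as in the example above.

```python
import re

def spell_out(word):
    """Render `word` as space-separated, tilde-prefixed capital letters."""
    return " ".join("~" + letter.upper() for letter in word)

def bad_spelled_letters(line):
    """Return tilde tokens that are not a single capital letter.

    Assumption: one trailing period, comma, or question mark is allowed.
    """
    return [
        token for token in line.split()
        if token.startswith("~") and not re.fullmatch(r"~[A-Z][.,?]?", token)
    ]

print(spell_out("fear"))
print(bad_spelled_letters("It's fear, ~f ~EA ~R."))
```

The first helper produces the markup for a word; the second lists tokens such as a lowercase letter or two letters run together, which break the convention.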
Capitalization
Capitalization should conform to standard English usage.
Punctuation
Punctuation should be limited to the following symbols:
. (period)
, (comma)
? (question mark)
Hesitation sounds, filled pauses
Transcribers will indicate filled pauses using a standardized spelling. These hesitation words do not require any special markup.
(*If you believe a speaker uses a word that does not appear on this list, let us know.)
| ach | eh | hm | oh | um | yep |
| ah | er | huh | ooh | whew | yuh |
| ah-ha | ew | jeepers | oop | whoa | yup |
| ay | geez | mm | oops | whoo-hoo | |
| ay-yi-yi | ha | mm-hm | uh | whoops | hoo |
| duh | he-hem | nah | uh-huh | yay | op |
| eee | hee | oof | uh-oh | yeah | ow |
Contractions and apostrophe -s
Limit your use of contractions to those that exist in standard written English, and only when the speaker actually produces a contraction. The table below, while not comprehensive, illustrates what is considered standard written English with respect to contractions.
(Note: Avoid the common mistakes of confusing possessive its with the contraction it's (it is), and possessive your with the contraction you're (you are).)
| Complete words | Contraction allowed (when spoken) | Contraction not allowed |
| I have | I've | |
| cannot | can't | |
| will not | won't | |
| you have | you've | |
| could not | couldn't | |
| we will | we'll | |
| should have | should've | |
| it is | it's | |
| she/he is | she's/he's | |
| they are | they're | |
| Marvin - possessive | Marvin's | |
| Marvin is | -- | Marvin's |
| Marvin has | -- | Marvin's |
| ship is | -- | ship's |
| going to | -- | gonna |
| want to | -- | wanna |
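The two reduced forms in the last rows of the table can be caught with a simple search. The sketch below is illustrative only; the function name `expand_disallowed` is hypothetical, and it assumes the intended fix is to write out the complete words from the first column. Forms such as Marvin's cannot be checked this way, because the possessive is allowed while the contraction of "Marvin is" is not, so those still need a human reviewer.

```python
import re

# Disallowed forms from the table, mapped to their complete words.
DISALLOWED = {"gonna": "going to", "wanna": "want to"}

def expand_disallowed(line):
    """Replace 'gonna' and 'wanna' with their complete words."""
    def replace(match):
        word = match.group(0)
        full = DISALLOWED[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        return full.capitalize() if word[0].isupper() else full
    return re.sub(r"\b(?:gonna|wanna)\b", replace, line, flags=re.IGNORECASE)

print(expand_disallowed("I'm gonna fire."))
```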
Mispronounced Words
Mispronounced words will be marked with an asterisk (*). The word should be spelled in standard orthography. Do not try to represent how the word was pronounced.
Speaker Noises
Sometimes speakers will make noises in between words. These sounds are not "words" like our hesitation words. Examples are things like sshhhhhhhhh, ssssssssssss, pssssssss. Note these sounds with a forward slash and the first two letters of the sound heard. (Put spaces around these sounds - do not connect them to the previous/following word).
e.g. Well, I /sh I don't know.
/ss
/ps
These sounds should not be confused with elongated words, such as ssshoooot, which should be transcribed in standard orthography - "shoot".
Things to look out for/Miscellaneous
Some arbitrary spelling decisions
| Use this | Not this |
| all right | alright |
| OK | okay, ok |
| alrighty | all righty |
| gotcha | got ya |
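These preferences lend themselves to a quick search on the second pass. The following is a rough sketch, with the hypothetical function name `nonstandard_spellings`; it only reports candidates and leaves the correction to the transcriber, since a sequence like "got ya" may occasionally be two genuinely separate words.

```python
import re

# (pattern for the dispreferred form, preferred spelling) from the table.
PREFERRED = [
    (r"\ball righty\b", "alrighty"),
    (r"\balright\b", "all right"),
    (r"\b(?:okay|ok)\b", "OK"),
    (r"\bgot ya\b", "gotcha"),
]

def nonstandard_spellings(line):
    """Return (found, preferred) pairs for each dispreferred spelling."""
    hits = []
    for pattern, preferred in PREFERRED:
        for match in re.finditer(pattern, line, flags=re.IGNORECASE):
            if match.group(0) != preferred:  # skip an already correct "OK"
                hits.append((match.group(0), preferred))
    return hits

print(nonstandard_spellings("Alright, okay, got ya."))
```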
Some uncommon words
These words are frequently uttered by certain speakers. They're
not mispronunciations or made-up words; they're just uncommon. Don't
correct them, but do spell them consistently:
armorage
armage
ay-yi-yi
Some common mistakes
Spelling
| correct | incorrect |
| acoustic/acoustical | accoustic/accoustical |
Watch Out!
In noisy files, it's sometimes easy to confuse certain words.
Be careful with pairs like:
confirm ~ confirmed
acoustic ~ acoustical
and so on.
Target Word List
The words in the table below are the "target" words used by the participants in each game. Each set of grid coordinates will be a pair of these words. It is very important that these words be spelled correctly and consistently throughout all files. Please pay particular attention to this on the second pass. Although the words in the list are spelled with all capital letters, please use normal capitalization for these words in the transcripts.
| ABORT ABOVE AFFIRM AFT ALFA ANCHOR AWAY BINGO BLAST BOGEY BOW BRAVO BREAK BROKEN BUSTER CHARLIE CLEAR CODE COPY CREW DANGER DECK DECOY DELTA DITCH DIVERT DRIVE |
| ECHO ENGINE FOE FOXTROT FUEL GOLF GUNS HELP HOTEL INDIA JULIETT KAYBECK KILO KNOTS LAMPS LAUNCH LEFT LEVEL LIMA LOST LOUD LOW MANY MAYDAY MERGED MIKE MINUS |
| MIXUP MOVING NORMAL NOVEMBER ORBIT OSCAR PAPA PING PLUS POINT PORT POWER PRONTO PUNCH RANGE READY RED RESCUE RIGHT ROGER ROMEO ROUTE RUDDER SALVO SEAS SECTOR SECURE |
| SHIP SIERRA SINGLE SKUNK SPLASH SPOT SQUAWK STEADY STOP STRIKE SWITCH TANGO TARGET TOOL TURN UNABLE UNIFORM VECTOR VERY VICTOR WAVE WHISKEY X-RAY YANKEE ZULU |
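Near misses such as "Juliet" for JULIETT are easy to overlook by eye. The sketch below uses Python's standard `difflib` module to suggest the closest target word. The function name `suggest_target_word` and the similarity cutoff of 0.8 are assumptions chosen for illustration, not part of the guide.

```python
import difflib

TARGET_WORDS = """
ABORT ABOVE AFFIRM AFT ALFA ANCHOR AWAY BINGO BLAST BOGEY BOW BRAVO BREAK
BROKEN BUSTER CHARLIE CLEAR CODE COPY CREW DANGER DECK DECOY DELTA DITCH
DIVERT DRIVE ECHO ENGINE FOE FOXTROT FUEL GOLF GUNS HELP HOTEL INDIA JULIETT
KAYBECK KILO KNOTS LAMPS LAUNCH LEFT LEVEL LIMA LOST LOUD LOW MANY MAYDAY
MERGED MIKE MINUS MIXUP MOVING NORMAL NOVEMBER ORBIT OSCAR PAPA PING PLUS
POINT PORT POWER PRONTO PUNCH RANGE READY RED RESCUE RIGHT ROGER ROMEO ROUTE
RUDDER SALVO SEAS SECTOR SECURE SHIP SIERRA SINGLE SKUNK SPLASH SPOT SQUAWK
STEADY STOP STRIKE SWITCH TANGO TARGET TOOL TURN UNABLE UNIFORM VECTOR VERY
VICTOR WAVE WHISKEY X-RAY YANKEE ZULU
""".split()

def suggest_target_word(token):
    """Return the closest target word if `token` is a near miss, else None."""
    upper = token.upper()
    if upper in TARGET_WORDS:
        return None  # already spelled as on the list
    # The cutoff of 0.8 is an arbitrary illustrative threshold.
    matches = difflib.get_close_matches(upper, TARGET_WORDS, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(suggest_target_word("Juliet"))
```

The suggestion is returned in capitals because that is how the list is written; in the transcript itself the word should still follow normal capitalization.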