Data and Annotation for Sociolinguistics (DASL)
Annotation Specification: -t/d deletion
Note:
This specification represents our current practices and was adopted with
the tagging of the Switchboard Corpus. An earlier version of the annotation
spec was in place during coding of the TIMIT Corpus. TIMIT will eventually
be re-coded to bring it into line with the current spec.
For each token,
the annotator makes judgements with respect to seven factor groups: status
of the dependent variable; morphological category; preceding segment; following
segment; same preceding and following segment; stress; and cluster complexity.
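One way to picture the result of coding is as a record with one field per factor group plus a comments field. The sketch below is illustrative only: the class name, field names and the `deletion_rate` helper are hypothetical and are not part of the DASL tooling. It assumes that tokens whose status is N/A are dropped before any rate is computed, as this specification describes for N/A tokens.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TokenAnnotation:
    """One coded -t/d token; all names here are hypothetical."""
    word: str
    status: str                  # "deleted", "retained", or "N/A"
    morphological_category: str
    preceding_segment: str
    following_segment: str
    same_environment: str
    stress: str
    cluster_complexity: str
    comments: str = ""           # e.g. "unreleased", "glottalized", "flapped"


def deletion_rate(tokens: List[TokenAnnotation]) -> Optional[float]:
    """Share of deleted tokens among those coded deleted or retained."""
    coded = [t for t in tokens if t.status != "N/A"]
    if not coded:
        return None  # nothing left to analyse
    return sum(t.status == "deleted" for t in coded) / len(coded)
```

Variant realizations such as unreleased or glottalized segments would be stored with status "retained" and described in `comments`, so they count as retained in the rate.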
-
Status
of the dependent variable
-
Deleted
-t/d segment has been completely deleted
-
Retained
-t/d segment is retained. Although
a binary division of deleted/retained is usually adequate, in some cases
the final -t/d segment is phonetically altered. The segment is often
unreleased, glottalized or (less commonly) flapped. All of these
variant realizations of -t/d are coded as retained, and the annotator notes
the variation in the comments field.
-
N/A
Because the regular expression query used
to generate the list of potential -t/d tokens operates on orthographic
transcripts, even with these filters in place some words which are not
tokens of -t/d appear in the final list of words to be reviewed.
(For instance, based on its orthography, the word 'would' appears
to be a possible -t/d token, but is not.) These tokens are labeled
N/A and are excluded from the final analysis.
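To illustrate why matching on orthography over-generates, the sketch below uses a hypothetical pattern that flags any word ending in a consonant letter followed by "t" or "d", or ending in "ed". This is not the query actually used to build the token list (the specification does not reproduce it, nor the filters it mentions); the pattern and the function name are assumptions made for illustration.

```python
import re
from typing import List

# Hypothetical pattern, for illustration only: a word ending in a consonant
# letter followed by "t" or "d", or a word ending in "ed".
CANDIDATE = re.compile(r"\b[a-z']*(?:[b-df-hj-np-tv-z][td]|ed)\b", re.IGNORECASE)


def candidate_tokens(transcript: str) -> List[str]:
    """Return the words in an orthographic transcript that look like -t/d tokens."""
    return [m.group(0).lower() for m in CANDIDATE.finditer(transcript)]


words = candidate_tokens("I told him we would have walked to the old west side")
# words == ['told', 'would', 'walked', 'old', 'west']
```

The word 'would' is returned because its spelling ends in "ld", even though it is not a -t/d token; a reviewer would label it N/A.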
-
Morphological
Category (revised from TIMIT spec)
-
Monomorphemic
-t/d appears in a word constituting a
single morpheme, e.g. 'old'
-
Irregular/Semi-weak Past
Tense
-t/d appears in the past tense form of semi-weak verbs, which undergo
both stem change and -t/d suffixation, e.g. 'told'. Note: Annotation
practices in previous studies have varied with respect to the treatment
of irregular verbs. Originally in the TIMIT corpus, LDC annotators included
all irregular verbs in this category (modals like 'must', strong verbs like
'built', and the token 'went'). Further annotation excluded 'must' from
this category because of its high rate of deletion and high number of tokens
in the corpus (n=34); a final round of recoding excluded both 'went'
and strong verbs.
-
Strong Past Tense
-t/d appears in the past tense form of
strong verbs, which undergo stem change but contain a -t/d in both present
and past tense forms, and thus cannot be said to undergo -t/d suffixation,
e.g., 'build/built'.
-
Regular Past Tense Preterite
-t/d appears in a regular past tense verb,
e.g. 'walked'.
-
Regular Past Tense "Helping"
-t/d appears in a regular past tense verb,
as part of a compound verb, e.g. 'have walked'.
(Note: Discussion of functional
load argument here)
-
Went
-t/d appears in the token 'went'.
We chose to treat this verb separately because it appeared in TIMIT to
have an unusually high deletion rate, much higher than that of monomorphemes.
-
Must
-t/d appears in the modal verb 'must'
which also appeared in TIMIT to have an unusually high deletion rate.
Past -t/d studies have shown that monomorphemes
are the most likely category to undergo application of the -t/d deletion
rule, whereas regular past-tense forms are most resistant to deletion.
-
Preceding
Segment
The characteristics of the consonant immediately
preceding the -t/d token also influence the probability of deletion.
In previous studies, manner of articulation of the preceding segment was
shown to be the most relevant constraint, and our study adopts a seven-way
distinction. Some previous studies have adopted a five-way distinction,
but much of the published -t/d data indicates that preceding /s/ and other
alveolar segments favor deletion more strongly than their non-alveolar
counterparts, so we chose a finer-grained scheme that codes preceding
alveolars as separate factors. The seven categories of preceding segment are:
-
Lateral (/l/)
-
Rhotic (/r/)
-
Alveolar Nasal (/n/)
-
Non-Alveolar Nasal (/m/
or /ng/)
-
Stop or Affricate (/p,
b, k, g/). Cases of preceding affricates were noted in the comments field.
-
Alveolar Fricative (/s,
z/)
-
Non-Alveolar Fricative (/f,
v, th, sh/)
In coding this factor group, the annotator
is sometimes faced with a situation in which the preceding segment has
been reduced or vocalized. This typically occurs with the liquids
/r/ and /l/. If the preceding /l/ segment has been phonetically reduced
in this way, the annotator notes this in the comments field, but still
codes the preceding environment as /l/, because the segment is phonemically
present although phonetically altered. With vocalized preceding /r/,
no evidence of the /r/ remains in the acoustic signal, so the token is
coded as N/A, and the annotator notes the vocalized /r/ in the comments
field. Less commonly, the annotator might encounter tokens where
non-liquid preceding segments have been deleted (e.g., 'government' being
rendered as "governme_t"). As with vocalized preceding /r/, such
cases are coded as N/A, with appropriate commentary.
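The seven categories and the special cases for reduced or deleted preceding segments can be sketched as a lookup plus a few rules. The function name, the `realization` labels ("full", "reduced", "deleted") and the affricate symbols "ch" and "jh" are assumptions made for this illustration; the other phoneme labels follow the notation used in the list above. Cases the specification does not address raise an error here and are left to the annotator.

```python
from typing import Optional, Tuple

# "ch" and "jh" are assumed labels for the affricates.
AFFRICATES = {"ch", "jh"}
_GROUPS = {
    "lateral": ["l"],
    "rhotic": ["r"],
    "alveolar nasal": ["n"],
    "non-alveolar nasal": ["m", "ng"],
    "stop or affricate": ["p", "b", "k", "g", "ch", "jh"],
    "alveolar fricative": ["s", "z"],
    "non-alveolar fricative": ["f", "v", "th", "sh"],
}
PRECEDING_CATEGORY = {seg: cat for cat, segs in _GROUPS.items() for seg in segs}


def code_preceding(segment: str, realization: str = "full") -> Tuple[str, Optional[str]]:
    """Return (factor, comment) for the consonant preceding a -t/d token."""
    category = PRECEDING_CATEGORY[segment]
    if realization == "full":
        comment = "preceding affricate" if segment in AFFRICATES else None
        return category, comment
    if realization == "reduced" and segment == "l":
        # Phonemically present, so still coded as lateral.
        return category, "preceding /l/ reduced or vocalized"
    if realization == "reduced" and segment == "r":
        return "N/A", "preceding /r/ vocalized"
    if realization == "deleted" and segment not in ("l", "r"):
        return "N/A", f"preceding /{segment}/ deleted"
    raise ValueError("case not covered by this sketch")
```

For example, `code_preceding("n", "deleted")` models the "governme_t" case and returns N/A with a comment.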
-
Following
environment
This refers to which segment, if any,
follows the -t/d token. Past studies of -t/d deletion have consistently
demonstrated that following consonants favor deletion, while following
vowels inhibit deletion. Among the consonants, glides and liquids
favor deletion less than obstruents. The effect of following pause
has been shown to be variable based on geographic region. We have
adopted a seven-way distinction for following segment:
-
obstruents (stops, fricatives
and nasals)
-
lateral (/l/)
-
rhotic (/r/)
-
labio-velar glide (/w/)
-
palatal glide (/y/).
(Note: In the TIMIT corpus, glides were
annotated as 'clustering' or 'non-clustering'. This distinction proved
unimportant; the relevant distinction seems to be place of articulation
of the glide rather than the glide's ability to cluster with the -t/d segment.)
-
vowel
-
pause (silence follows
-t/d segment). Following pause usually occurs sentence-finally.
In other cases, the annotator indicates following pause if the speaker
makes a significant, noticeable break in the flow of speech, for example
to take a breath.
In coding following segments, annotators
occasionally encounter cases where the segment has been entirely deleted.
This happens most frequently with /h/, particularly when the following
word is an unstressed pronoun (her, his). In such cases, the annotator
codes the following segment as a vowel, since the segment has been completely
deleted, and makes a note in the comments field.
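The seven-way following-segment coding, including the rule for a deleted following segment, might be sketched as follows. The phoneme inventory is an assumption: ARPAbet-style labels are used for the vowels and for the members of the obstruent class, which this specification defines as stops, fricatives and nasals without listing them. The function name and the `deleted` flag are likewise hypothetical.

```python
from typing import Optional, Tuple

# Assumed ARPAbet-style inventories; the spec does not enumerate them.
OBSTRUENT_CLASS = {"p", "b", "t", "d", "k", "g",        # stops
                   "f", "v", "th", "s", "z", "sh", "h",  # fricatives
                   "m", "n", "ng"}                       # nasals (grouped here by the spec)
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}

FOLLOWING_CATEGORY = {"l": "lateral", "r": "rhotic",
                      "w": "labio-velar glide", "y": "palatal glide"}
FOLLOWING_CATEGORY.update({seg: "obstruent" for seg in OBSTRUENT_CLASS})
FOLLOWING_CATEGORY.update({seg: "vowel" for seg in VOWELS})


def code_following(segment: Optional[str], deleted: bool = False) -> Tuple[str, Optional[str]]:
    """Return (factor, comment) for whatever follows a -t/d token.

    `segment` is None when silence follows the token.
    """
    if segment is None:
        return "pause", None
    if deleted:
        # A fully deleted following segment (typically /h/) is coded as a vowel.
        return "vowel", f"following /{segment}/ deleted"
    return FOLLOWING_CATEGORY[segment], None
```

For instance, `code_following("h", deleted=True)` covers a following unstressed 'her' or 'his' whose /h/ has been dropped.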
-
Same
preceding and following environment (New factor
group)
This indicates whether or not the -t/d
token is surrounded by two identical segments, e.g., 'processed soybeans',
and if so, what these segments are. In coding TIMIT, annotators observed
that an identical segment in the preceding and following environments
significantly favored deletion of -t/d (p < 0.001). This effect
seems stronger with surrounding obstruents, especially the alveolars
/s/ and /z/, than with surrounding liquids. The current annotation
specification adopts an eight-way distinction, which may be condensed later.
(Note: Because of the way preceding and following environments are
coded, it is not possible to simply observe after the fact whether the
surrounding environments are identical. For instance, preceding /s/
would be coded as "alveolar fricative" while following /s/ would be coded
as "obstruent". This mismatch is in keeping with previous studies'
coding schemes for -t/d deletion.)
-
N/A
-
surrounding alveolar fricatives
(/s,
z/)
-
surrounding alveolar nasals
(/n/)
-
surrounding stops (/p,
b, k, g/). Also, surrounding affricates, which are noted in the comments
field.
-
surrounding other fricatives
(/f,
v, th, sh/)
-
surrounding other nasals
(/m,
ng/)
-
surrounding rhotics (/r/)
-
surrounding laterals (/l/)
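Because the preceding and following factor labels use different groupings, this factor group has to be derived from the segments themselves rather than from the two coded labels. A minimal sketch, assuming the same ASCII phoneme labels as the lists above and a hypothetical function name; affricates are omitted for brevity:

```python
from typing import Optional

_SURROUNDING_GROUPS = {
    "surrounding alveolar fricatives": ["s", "z"],
    "surrounding alveolar nasals": ["n"],
    "surrounding stops": ["p", "b", "k", "g"],
    "surrounding other fricatives": ["f", "v", "th", "sh"],
    "surrounding other nasals": ["m", "ng"],
    "surrounding rhotics": ["r"],
    "surrounding laterals": ["l"],
}
SURROUNDING_CATEGORY = {
    seg: cat for cat, segs in _SURROUNDING_GROUPS.items() for seg in segs
}


def code_surrounding(preceding: str, following: Optional[str]) -> str:
    """Return the same-environment factor for a -t/d token.

    `following` is None when a pause follows the token.
    """
    if following is None or preceding != following:
        return "N/A"  # the two environments are not the same segment
    return SURROUNDING_CATEGORY[preceding]


# 'processed soybeans': /s/ on both sides of the -t/d token.
label = code_surrounding("s", "s")
# label == 'surrounding alveolar fricatives'
```

Comparing the coded labels would not work here: the same /s/ is "alveolar fricative" as a preceding segment but "obstruent" as a following segment.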
-
Stress
(New factor group)
This refers to whether the syllable
containing the -t/d token is stressed ('passed') or unstressed
('discouraged'). Past studies have shown that unstressed syllables
are more likely to undergo -t/d deletion than stressed syllables.
-
Stressed syllable
-
Unstressed syllable
-
Cluster
complexity (New factor group)
This refers to how many elements are present
in the cluster containing -t/d. Past studies have shown that clusters
containing three or more elements ('mixed') are more likely to undergo
-t/d deletion than clusters with just two elements ('helped').
-
Cluster with two elements
-
Cluster with three elements
-
Cluster with more than three
elements
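The three-way cluster coding can be sketched as a count over the consonants of the word-final cluster. The function name and the input format (a list of phoneme labels ending in the underlying t or d) are assumptions made for this illustration; how the cluster is segmented is left to the annotator.

```python
from typing import List


def code_cluster_complexity(cluster: List[str]) -> str:
    """Return the cluster-complexity factor for a word-final cluster.

    `cluster` lists the consonants of the cluster in order, ending in the
    underlying -t/d segment, e.g. ["k", "s", "t"] for 'mixed'.
    """
    if len(cluster) < 2 or cluster[-1] not in ("t", "d"):
        raise ValueError("expected a consonant cluster ending in t or d")
    if len(cluster) == 2:
        return "cluster with two elements"
    if len(cluster) == 3:
        return "cluster with three elements"
    return "cluster with more than three elements"
```

With this input format, 'old' would be passed as `["l", "d"]` and 'mixed' as `["k", "s", "t"]`.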