Data and Annotation for Sociolinguistics (DASL)
Annotation Specification: -t/d deletion
For each token, the annotator
makes judgements with respect to four factor groups: status of the dependent
variable, morphological category, preceding segment, and following segment.
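As a rough illustration, one annotated token can be pictured as a record with one field per factor group plus the comments field mentioned throughout this specification. The class name, field names, and example values below are hypothetical; the specification names the factor groups but does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass
class TokenAnnotation:
    """One candidate -t/d token (hypothetical layout, not the project's schema)."""
    word: str
    status: str              # dependent variable, e.g. "deleted", "retained", "N/A"
    morph_category: str      # morphological category
    preceding_segment: str   # category of the preceding segment
    following_segment: str   # category of the following segment
    comments: str = ""       # free-text notes from the annotator

    def is_complete(self) -> bool:
        """True when all four factor groups have been filled in."""
        return all([self.status, self.morph_category,
                    self.preceding_segment, self.following_segment])


# Illustrative values only, not taken from the corpus.
example = TokenAnnotation(
    word="old",
    status="deleted",
    morph_category="monomorphemic",
    preceding_segment="lateral",
    following_segment="vowel",
)
```

Keeping the comments separate from the four coded fields mirrors the way the guidelines ask annotators to record phonetic detail without changing the code itself.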
-
Status of the dependent variable
-
Deleted
-t/d segment has been completely deleted
-
Retained
-t/d segment is retained. Although
a binary division of deleted/retained is usually adequate, in some cases
the final -t/d segment is phonetically altered. The segment is often
unreleased, glottalized or (less commonly) flapped. All of these
variant realizations of -t/d are coded as retained, and the annotator notes
the variation in the comments field.
-
N/A
Because the regular expression query used
to generate the list of potential -t/d tokens operates on orthographic
transcripts, even with these filters in place some words which are not
tokens of -t/d appear in the final list of words to be reviewed.
(For instance, based on its orthography, the word 'would' appears
to be a possible -t/d token, but is not.) These tokens are labeled
N/A and are excluded from the final analysis.
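The specification does not give the actual query, so the following is only a minimal sketch of how an orthography-based search over-generates. The pattern (a consonant letter followed by word-final "t", "d" or "ed") and the names `CANDIDATE`, `candidate_tokens`, `NOT_TD_TOKENS` and `initial_status` are assumptions made for illustration.

```python
import re
from typing import List, Optional

# Hypothetical pattern, NOT the project's query: any word whose spelling ends
# in a consonant letter followed by "t", "d" or "ed".
CANDIDATE = re.compile(r"[a-z']*[bcdfghjklmnpqrstvwxz](?:ed|t|d)")

# Words that look like -t/d tokens in spelling but are not;
# 'would' is the example given in the text.
NOT_TD_TOKENS = {"would"}


def candidate_tokens(transcript: str) -> List[str]:
    """Return the words of an orthographic transcript that match the pattern."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return [w for w in words if CANDIDATE.fullmatch(w)]


def initial_status(word: str) -> Optional[str]:
    """Pre-label known false positives as N/A; leave the rest for the annotator."""
    return "N/A" if word in NOT_TD_TOKENS else None
```

For the sentence "I told the old man we walked west but he would not go", this sketch returns 'told', 'old', 'walked', 'west' and 'would'; the last is then labeled N/A on review.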
-
Morphological Category
-
Monomorphemic
-t/d appears in a word constituting a
single morpheme, e.g. 'old'
-
Irregular/Ambiguous Past Tense
-t/d appears in the irregular past tense
of a semi-weak verb, which undergoes both stem change and -t/d suffixation,
e.g. 'told'. Note: Annotation practices in previous
studies have varied with respect to the treatment of irregular verbs.
Originally, LDC annotators included all irregular verbs in this category
(modals like 'must', strong verbs like 'built', and the token 'went').
Further annotation excluded 'must' from this category because of its
high rate of deletion and high number of tokens in the corpus (n=34); a
final round of recoding excluded both 'must' and strong verbs.
-
Regular Past Tense
-t/d appears in a regular past tense verb,
e.g. 'walked'.
Because we were interested in preterit/participial
distinctions for regular verbs, such cases were noted in the comments field
for possible future analysis.
Past -t/d studies have shown that monomorphemes
are the category most likely to undergo -t/d deletion,
whereas regular past tense forms are the most resistant to deletion.
-
Preceding Segment
The characteristics of the consonant immediately
preceding the -t/d token also influence the probability of deletion.
In previous studies, manner of articulation of the preceding segment was
shown to be the most relevant constraint. Some previous studies have adopted
a five-way distinction, but much of the published -t/d data indicates that
preceding /s/ and other alveolar segments favor deletion more strongly than
their non-alveolar counterparts. We therefore adopted a finer-grained,
seven-way distinction that codes preceding alveolars as separate factors.
The seven categories of preceding segment are:
-
Lateral (/l/)
-
Rhotic (/r/)
-
Alveolar Nasal (/n/)
-
Non-Alveolar Nasal (/m/ or /ng/)
-
Stop or Affricate (/p, b, k, g/). Cases of preceding affricates were noted in the comments field.
-
Alveolar Fricative (/s, z/)
-
Non-Alveolar Fricative (/f, v, th, sh/)
In coding this factor group, the annotator
is sometimes faced with a situation in which the preceding segment has
been reduced or vocalized. This typically occurs with the liquids
/r/ and /l/. If the preceding /l/ segment has been phonetically reduced
in this way, the annotator notes this in the comments field but still
codes the preceding environment as /l/, because the segment is phonemically
present although phonetically altered. With vocalized preceding /r/,
no evidence of the /r/ remains in the acoustic signal, so the token is
coded as N/A, and the annotator notes the vocalized /r/ in the comments
field. Less commonly, the annotator might encounter tokens where
non-liquid preceding segments have been deleted (e.g., government being
rendered as "governme_t"). As with vocalized preceding /r/, such
cases are coded as N/A, with appropriate commentary.
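The preceding-segment rules above can be sketched as a lookup plus a few exceptions. This is an illustration under stated assumptions, not the project's tooling: the function name, the realization labels ("full", "reduced", "vocalized", "deleted") and the affricate symbols are hypothetical, and treating every deleted segment as N/A extends a rule the text states only for non-liquids.

```python
from typing import Tuple

# Segment symbols follow the specification's own notation (/ng/, /th/, /sh/).
PRECEDING_CATEGORIES = {
    "l": "lateral",
    "r": "rhotic",
    "n": "alveolar nasal",
    "m": "non-alveolar nasal", "ng": "non-alveolar nasal",
    "p": "stop or affricate", "b": "stop or affricate",
    "k": "stop or affricate", "g": "stop or affricate",
    "s": "alveolar fricative", "z": "alveolar fricative",
    "f": "non-alveolar fricative", "v": "non-alveolar fricative",
    "th": "non-alveolar fricative", "sh": "non-alveolar fricative",
}

# Assumed symbols; the specification does not list the affricates.
AFFRICATES = {"ch", "j"}


def code_preceding(phoneme: str, realization: str = "full") -> Tuple[str, str]:
    """Return (code, comment) for the segment preceding a -t/d token."""
    # Vocalized /r/ and deleted segments leave no evidence in the signal: N/A.
    # (The text states the deletion rule for non-liquids; applying it to every
    # deleted segment is an assumption of this sketch.)
    if realization == "deleted" or (phoneme == "r" and realization == "vocalized"):
        return "N/A", f"preceding /{phoneme}/ {realization}"
    # Affricates share a category with stops but are flagged in the comments.
    if phoneme in AFFRICATES:
        return "stop or affricate", f"preceding affricate /{phoneme}/"
    category = PRECEDING_CATEGORIES[phoneme]
    # A reduced /l/ keeps its phonemic category; the reduction goes in the comment.
    comment = "" if realization == "full" else f"preceding /{phoneme}/ {realization}"
    return category, comment
```

Returning the comment alongside the code keeps the phonemic category and the phonetic detail separate, as the guidelines require.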
-
Following environment
This refers to which segment, if any,
follows the -t/d token. Past studies of -t/d deletion have consistently
demonstrated that following consonants favor deletion, while following
vowels inhibit deletion. Among the consonants, glides and liquids
favor deletion less than obstruents. The effect of a following pause
has been shown to vary by geographic region. We have
adopted a seven-way distinction for following segment:
-
obstruents (stops, fricatives and nasals)
-
lateral (/l/)
-
rhotic (/r/)
-
clustering glides (w+unrounded vowel, y+u)
-
non-clustering glides (w+rounded vowel, y elsewhere)
-
vowel
-
pause (silence follows the -t/d segment). A following pause usually occurs sentence-finally.
In other cases, the annotator indicates a following pause if the speaker
has a significant (LENGTH??) break in the flow of speech, usually to take
a breath.
In coding following segments, annotators
occasionally encounter cases where the segment has been entirely deleted.
This happens most frequently with /h/, particularly when the following
word is an unstressed pronoun ('her', 'his'). In such cases, the annotator
codes the following segment as a vowel, since the consonant has been completely
deleted, and makes a note in the comments field.
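The seven-way coding of the following segment, including the deleted-/h/ case, might be sketched as below. The function name, the "V" placeholder for any vowel, the use of None for a pause, and the boolean flags describing the vowel after a glide are all assumptions made for illustration. The sketch also assumes that a vowel follows any deleted segment, as in the 'her'/'his' examples.

```python
from typing import Optional, Tuple


def code_following(segment: Optional[str],
                   deleted: bool = False,
                   vowel_rounded: bool = False,
                   vowel_is_u: bool = False) -> Tuple[str, str]:
    """Return (code, comment) for the segment following a -t/d token.

    segment: None for a pause, "V" for any vowel, otherwise a consonant symbol.
    vowel_rounded / vowel_is_u: describe the vowel after a glide (/w/ or /y/).
    """
    if segment is None:
        return "pause", ""
    if deleted:
        # A fully deleted following segment (typically /h/) is coded as a vowel.
        return "vowel", f"following /{segment}/ deleted"
    if segment == "V":
        return "vowel", ""
    if segment == "l":
        return "lateral", ""
    if segment == "r":
        return "rhotic", ""
    if segment == "w":
        # w + unrounded vowel clusters; w + rounded vowel does not.
        return ("non-clustering glide" if vowel_rounded else "clustering glide"), ""
    if segment == "y":
        # y + u clusters; y elsewhere does not.
        return ("clustering glide" if vowel_is_u else "non-clustering glide"), ""
    # Everything else falls under the specification's "obstruents"
    # (stops, fricatives and nasals).
    return "obstruent", ""
```

For example, `code_following("h", deleted=True)` yields the vowel code with a comment recording the deleted /h/, while an audible /h/ is coded with the other fricatives.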