TIMIT Annotation Overview
Overall
Deletion Rate
Total number of tokens deleted: 518 (32.8%)
Total number of tokens retained: 1060 (67.2%)
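Both percentages follow directly from the two counts (518 + 1060 = 1578 coded tokens). A minimal check in Python; the helper name `deletion_rate` is hypothetical:

```python
def deletion_rate(deleted: int, retained: int) -> float:
    """Return the share of coded tokens in which the target segment was deleted."""
    total = deleted + retained
    if total == 0:
        raise ValueError("no tokens coded")
    return deleted / total


deleted, retained = 518, 1060  # overall counts reported above
rate = deletion_rate(deleted, retained)
print(f"N = {deleted + retained}; deleted {rate:.1%}; retained {1 - rate:.1%}")
# prints: N = 1578; deleted 32.8%; retained 67.2%
```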
Unique tokens in TIMIT
The TIMIT Corpus consists of recordings of 630 speakers, each reading 10 phonetically rich sentences (selected from a larger set). Despite the use of read speech, few tokens occur multiple times relative to the total number of tokens coded.
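One way to make the repetition claim concrete is to count how many coded word types occur more than once and what share of all tokens they account for. The sketch below assumes the coded tokens are available as a plain list of word forms; the function name and the sample list are invented for illustration and are not TIMIT data.

```python
from collections import Counter


def repetition_summary(words):
    """Count how many coded word types recur among the coded tokens."""
    counts = Counter(w.lower() for w in words)
    repeated = {w: n for w, n in counts.items() if n > 1}
    n_tokens = len(words)
    return {
        "tokens": n_tokens,
        "types": len(counts),
        "repeated_types": len(repeated),
        # Share of all tokens that belong to a type coded more than once.
        "share_in_repeated_types": sum(repeated.values()) / n_tokens if n_tokens else 0.0,
    }


# Invented sample of coded word forms, for illustration only (not TIMIT data).
sample = ["just", "and", "told", "just", "west", "kept"]
print(repetition_summary(sample))
```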
Morphological Category
| Monomorphemes |
| Irregular verbs (all-inclusive)** |
| Irregular verbs (excluding must)** |
| Irregular verbs (excluding must, went & all strong verbs)** |
| Regular verbs |
Preceding Environment
| Alveolar Nasal |
| Alveolar Fricative |
| Other Fricative |
| Stop |
| Lateral |
| Other Nasal |
| Rhotic |
Following Environment
| Obstruent |
| Rhotic |
| Clustering Glide |
| Lateral |
| Other Glide |
| Pause |
| Vowel |
Identical Preceding/Following Environment
| s_s (processed_soybeans) |
| obstruent_obstruent (stopped_passing)* |
| liquid_liquid (guard_rail, old_lady) |
| overall |
*includes s_s environment
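The footnote means these categories nest: an s_s token is also counted under obstruent_obstruent. A sketch of that labelling, assuming "identical" means the same segment on both sides of the target (as in all four examples) and assuming TIMIT-style lowercase phone labels; the sets and function name are hypothetical, not the project's coding scheme.

```python
from __future__ import annotations

# Hypothetical segment classes using TIMIT-style phone labels.
OBSTRUENTS = {"p", "b", "t", "d", "k", "g", "f", "v", "th", "dh",
              "s", "z", "sh", "zh", "ch", "jh"}
LIQUIDS = {"l", "r"}


def identical_environment_labels(preceding: str, following: str) -> set[str]:
    """Return every identical-environment category a token falls into.

    The categories nest: an s_s token is also an obstruent_obstruent token.
    """
    if preceding != following:
        return set()
    labels = set()
    if preceding in OBSTRUENTS:
        labels.add("obstruent_obstruent")
        if preceding == "s":
            labels.add("s_s")
    elif preceding in LIQUIDS:
        labels.add("liquid_liquid")
    return labels


print(sorted(identical_environment_labels("s", "s")))  # ['obstruent_obstruent', 's_s']
print(sorted(identical_environment_labels("r", "r")))  # ['liquid_liquid']
print(sorted(identical_environment_labels("n", "s")))  # []
```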
Education
Race/Ethnicity
Age
Sex
| Males |
| Females |
Region
| Southern |
| New York |
| Mixed (Army Brat) |
| New England |
| South Midland |
| North Midland |
| Northern |
| Western |
Rate of Deletion before PAUSE by Geographical Region
First Run
| Eight Factor Groups; 44 Factors | Selected? |
| Rule Application | n/a |
| Morphology | yes |
| Preceding | yes |
| Following | yes |
| Sex | no |
| Region | no |
| Age | no |
| Race | yes |
| Education | yes |
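The "Selected?" column records a stepwise choice among factor groups. Assuming this follows the usual Varbrul/GoldVarb step-up/step-down logic, a group is kept when adding it improves the model's log-likelihood significantly under a likelihood-ratio (chi-square) test, with degrees of freedom equal to the group's number of factors minus one and a conventional 0.05 threshold. A plain-Python sketch; the function names and log-likelihood values are hypothetical, not taken from these runs.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses Q(k + 2, x) = Q(k, x) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1, x) = erfc(sqrt(x/2)) or Q(2, x) = exp(-x/2).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def group_selected(ll_without: float, ll_with: float, n_factors: int,
                   alpha: float = 0.05) -> bool:
    """Likelihood-ratio test for adding one factor group with n_factors levels."""
    lr = 2 * (ll_with - ll_without)            # likelihood-ratio statistic
    return chi2_sf(lr, n_factors - 1) < alpha  # df = factors minus one


# Hypothetical log-likelihoods, for illustration only.
print(group_selected(-950.0, -940.0, n_factors=2))  # True  (LR = 20 on 1 df)
print(group_selected(-950.0, -949.5, n_factors=2))  # False (LR = 1 on 1 df)
```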
Second Run
- eliminated the factor groups not selected in the first run (Sex, Region, Age)
- recodes
RESULTS
Summary Statistics and Factor Weights
| Group | Factor | Code | Retained N (%) | Deleted N (%) | Total N (%) | Factor Weight |
| Morphology | monomorpheme | m | 630 (62) | 394 (38) | 1024 (65) | 0.535 |
| | reg. past tense | p | 397 (77) | 116 (23) | 513 (33) | 0.428 |
| | irregular | i | 33 (80) | 8 (20) | 41 (3) | 0.531 |
| Preceding Segment | lateral | l | 135 (84) | 26 (16) | 161 (10) | 0.240 |
| | alveolar fricative | f | 228 (58) | 163 (42) | 391 (25) | 0.635 |
| | stop | k | 187 (77) | 57 (23) | 244 (15) | 0.426 |
| | rhotic | r | 230 (91) | 22 (9) | 252 (16) | 0.161 |
| | alveolar nasal | n | 204 (47) | 228 (53) | 432 (27) | 0.756 |
| | other fricative | c | 55 (75) | 18 (25) | 73 (5) | 0.433 |
| | other nasal | s | 21 (84) | 4 (16) | 25 (2) | 0.390 |
| Following Segment | rhotic | h | 29 (52) | 27 (48) | 56 (4) | 0.650 |
| | vowel | v | 455 (86) | 72 (14) | 527 (33) | 0.245 |
| | obstruent | b | 286 (47) | 321 (53) | 607 (38) | 0.767 |
| | pause | q | 206 (82) | 46 (18) | 252 (16) | 0.305 |
| | lateral | a | 12 (71) | 5 (29) | 17 (1) | 0.380 |
| | cluster glide | g | 61 (58) | 44 (42) | 105 (7) | 0.645 |
| | other glide | o | 11 (79) | 3 (21) | 14 (1) | 0.330 |

| Group | Factor | Code | Retained N (%) | Deleted N (%) | Total N (%) | Factor Weight |
| Race | white | W | 991 (68) | 464 (32) | 1455 (92) | 0.489 |
| | black | L | 33 (49) | 34 (51) | 67 (4) | 0.753 |
| | unknown | U | 25 (61) | 16 (39) | 41 (3) | 0.433 |
| | other | O | 11 (73) | 4 (27) | 15 (1) | 0.552 |
| Education | Bachelors | B | 587 (67) | 289 (33) | 876 (56) | 0.514 |
| | High School | H | 133 (64) | 74 (36) | 207 (13) | 0.524 |
| | Masters | T | 250 (71) | 100 (29) | 350 (22) | 0.436 |
| | PhD | P | 45 (78) | 13 (22) | 58 (4) | 0.357 |
| | Unknown | K | 13 (42) | 18 (58) | 31 (2) | 0.752 |
| | Associates | A | 32 (57) | 24 (43) | 56 (4) | 0.616 |
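Each row's percentages can be recomputed from its raw counts, which doubles as a consistency check: every factor group should sum to the 1578 coded tokens, 518 of them deleted. The sketch below uses the preceding-segment counts copied from the table above; the helper name `summarize` is hypothetical, and Python's `round` (round-half-to-even) is assumed to match the table's rounding, which holds for these counts.

```python
# Second-run counts for the preceding-segment factor group: (retained, deleted).
PRECEDING = {
    "lateral": (135, 26),
    "alveolar fricative": (228, 163),
    "stop": (187, 57),
    "rhotic": (230, 22),
    "alveolar nasal": (204, 228),
    "other fricative": (55, 18),
    "other nasal": (21, 4),
}


def summarize(group):
    """Return (factor, % deleted, % of all tokens) rows plus the group total."""
    total = sum(r + d for r, d in group.values())
    rows = [
        (factor, round(100 * d / (r + d)), round(100 * (r + d) / total))
        for factor, (r, d) in group.items()
    ]
    return rows, total


rows, total = summarize(PRECEDING)
# List factors from most to least deletion-prone.
for factor, pct_deleted, pct_of_total in sorted(rows, key=lambda row: -row[1]):
    print(f"{factor:<20} {pct_deleted:>3}% deleted  ({pct_of_total}% of N={total})")
```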
Third Run
- split the irregular verb category into four new categories: strong verbs, semi-weak verbs, must, and went
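The recode amounts to a lookup from word form to a new category code. The codes below are those shown in the third-run table (note that `i`, the old code for all irregular verbs, is reused there for semi-weak verbs); the word lists and function name are hypothetical examples chosen for illustration, not the lists actually used in the coding.

```python
# Third-run morphology codes as shown in the table: s, i, t, w.
# The word lists are hypothetical examples, not the project's coding lists.
STRONG_VERBS = {"found", "stood", "held"}
SEMI_WEAK_VERBS = {"kept", "left", "lost", "slept"}


def recode_irregular(word: str) -> str:
    """Map a word to one of the four categories that replace the old irregular code."""
    w = word.lower()
    if w == "must":
        return "t"
    if w == "went":
        return "w"
    if w in STRONG_VERBS:
        return "s"
    if w in SEMI_WEAK_VERBS:
        return "i"
    raise KeyError(f"{word!r} is not in the hypothetical irregular-verb lists")


print([recode_irregular(w) for w in ["must", "went", "kept", "found"]])  # ['t', 'w', 'i', 's']
```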
RESULTS
Summary Statistics and Factor Weights
| Group | Factor | Code | Retained N (%) | Deleted N (%) | Total N (%) | Factor Weight |
| Morphology | monomorpheme | m | 611 (64) | 347 (36) | 958 (61) | 0.521 |
| | reg. past tense | p | 392 (76) | 121 (24) | 513 (33) | 0.436 |
| | strong verb | s | 18 (62) | 11 (38) | 29 (2) | 0.476 |
| | semi-weak verbs | i | 26 (67) | 13 (33) | 39 (2) | 0.553 |
| | must | t | 11 (32) | 23 (68) | 34 (2) | 0.747 |
| | went | w | 2 (40) | 3 (60) | 5 (0) | 0.837 |
| Preceding Segment | lateral | l | 135 (84) | 26 (16) | 161 (10) | 0.244 |
| | alveolar fricative | f | 228 (58) | 163 (42) | 391 (25) | 0.635 |
| | stop | k | 187 (77) | 57 (23) | 244 (15) | 0.395 |
| | rhotic | r | 230 (91) | 22 (9) | 252 (16) | 0.161 |
| | alveolar nasal | n | 204 (47) | 228 (53) | 432 (27) | 0.768 |
| | other fricative | c | 55 (75) | 18 (25) | 73 (5) | 0.436 |
| | other nasal | s | 21 (84) | 4 (16) | 25 (2) | 0.383 |
| Following Segment | rhotic | h | 29 (52) | 27 (48) | 56 (4) | 0.649 |
| | vowel | v | 455 (86) | 72 (14) | 527 (33) | 0.248 |
| | obstruent | b | 286 (47) | 321 (53) | 607 (38) | 0.759 |
| | pause | q | 206 (82) | 46 (18) | 252 (16) | 0.313 |
| | lateral | a | 12 (71) | 5 (29) | 17 (1) | 0.372 |
| | cluster glide | g | 61 (58) | 44 (42) | 105 (7) | 0.655 |
| | other glide | o | 11 (79) | 3 (21) | 14 (1) | 0.354 |

| Group | Factor | Code | Retained N (%) | Deleted N (%) | Total N (%) | Factor Weight |
| Race | white | W | 991 (68) | 464 (32) | 1455 (92) | 0.489 |
| | black | L | 33 (49) | 34 (51) | 67 (4) | 0.751 |
| | unknown | U | 25 (61) | 16 (39) | 41 (3) | 0.430 |
| | other | O | 11 (73) | 4 (27) | 15 (1) | 0.556 |
| Education | Bachelors | B | 587 (67) | 289 (33) | 876 (56) | 0.512 |
| | High School | H | 133 (64) | 74 (36) | 207 (13) | 0.523 |
| | Masters | T | 250 (71) | 100 (29) | 350 (22) | 0.439 |
| | PhD | P | 45 (78) | 13 (22) | 58 (4) | 0.361 |
| | Unknown | K | 13 (42) | 18 (58) | 31 (2) | 0.725 |
| | Associates | A | 32 (57) | 24 (43) | 56 (4) | 0.622 |
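A factor weight above 0.5 favors deletion and one below 0.5 disfavors it. Assuming these are standard Varbrul (GoldVarb-style) weights, they combine additively on the log-odds scale together with an input probability, which is not reported here. The sketch below uses an invented input probability of 0.30, so its outputs are illustrative only; the social factor groups are left out, which is equivalent to giving them a neutral weight of 0.5. It compares *must* with a monomorpheme in the same environment (after an alveolar fricative, before an obstruent) using third-run weights.

```python
import math
from typing import Sequence


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


def predicted_deletion(input_prob: float, weights: Sequence[float]) -> float:
    """Combine factor weights with an input probability, Varbrul style.

    Log-odds of deletion = logit(input_prob) + sum of logit(weight); a weight
    of 0.5 has logit 0 and so leaves the prediction unchanged.
    """
    log_odds = logit(input_prob) + sum(logit(w) for w in weights)
    return 1 / (1 + math.exp(-log_odds))


INPUT_PROB = 0.30  # hypothetical: the input probability is not reported above

# Third-run weights: morphology, alveolar fricative before (0.635), obstruent after (0.759).
must_token = predicted_deletion(INPUT_PROB, [0.747, 0.635, 0.759])
mono_token = predicted_deletion(INPUT_PROB, [0.521, 0.635, 0.759])
print(f"must: {must_token:.2f}  monomorpheme: {mono_token:.2f}")
# prints: must: 0.87  monomorpheme: 0.72
```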
Dual Annotation Results
Reannotation of 5% of the TIMIT corpus is complete. Details coming soon...