DASL Project: Plan and Progress


    1) Establish process team to investigate sociolinguistic annotation project and develop team charge

    • Process team established, team charge developed


    2) Identify an appropriate sociolinguistic variable for annotation and analysis

    • Sociolinguistic variable identified (-t/d deletion); 4 corpora selected for annotation (TIMIT, Switchboard, CallHome English, Hub-4 English)


    3) Develop a coding scheme (annotation specification)

    • Coding scheme developed (following Guy, 1980, with some modifications)


    4) Modify LDC-Online to allow for easy searching of the corpus/corpora, easy audio playback of examples, and coding/annotation of relevant tokens. Additional modifications must allow for easy exporting of the coding string/annotations to an external program, and the inclusion of speaker demographics within the coding string.

    • LDC-Online modified and sociolinguistic annotation interface created


    5) Coordinate with part-time annotation staff to complete annotation.  This involves training, annotation and quality assurance.

    • Annotation is currently in progress
      • TIMIT corpus - annotation complete as of 9/22/2000
      • Results from TIMIT annotation; includes VARBRUL analysis
      • Dual annotation of ~5% of TIMIT corpus underway as of 11/2000
      • Annotation of Switchboard corpus in progress as of 1/2001
      • Annotation of additional corpora (CallHome, Broadcast News) to follow


    6) Produce documentation of the corpus creation effort: annotation guide, tools & resources, QA efforts, results, comparison with other studies of the variable; create a website containing corpus documentation and results.

    • Website with project documentation established


    7) Publicize the project within the sociolinguistics community and solicit feedback from sociolinguists.

    • Members of the DASL Project will attend NWAVE 2000