NWAVE 2003 Workshop

Abstract

Robust sociolinguistic methodology: Tools, data and best practices
Chris Cieri and Stephanie Strassel, University of Pennsylvania


The methodology for sociolinguistic research, despite gradual evolution, has not fully exploited recent technological advances. Given the current state of the art in computing, one can envision a completely digital methodology for collecting, coding, analyzing and publishing linguistic data. Although progress toward this end is ongoing, there now exist tools, standards, examples of best practice and other linguistic resources that make it easier to build databases for analysis and to share them. Shared digital resources promote research that is more robust and repeatable, and allow competing analyses to be tested against a stable benchmark.

This workshop describes a rigorous, end-to-end digital methodology covering:

  • field data collection and the use of publicly available corpora
  • interview transcription, segmentation into meaningful units and alignment of audio and transcript
  • token selection and coding that are compatible with multiple analytical packages
  • publication and distribution of both the analysis and the underlying data

The workshop will combine discussion of general principles with live demonstrations of data and tools. Among the resources made available to workshop participants will be the SLX Corpus of Classic Sociolinguistic Interviews, consisting of eight sociolinguistic interviews collected by William Labov and his students in the 1960s and 1970s. The preparation of this corpus, from digitization through segmentation, transcription, coding and analysis to publication, will be discussed as an illustration of the proposed digital sociolinguistic methodology.

Workshop Slides

  • in HTML format
  • in Microsoft PowerPoint format
  • as a .zip file of the PowerPoint slides






Contact Christopher.Cieri@ldc.upenn.edu

Contact Stephanie.Strassel@ldc.upenn.edu

Last modified: Friday, December 5 16:26:30
© 2000 Linguistic Data Consortium, University of Pennsylvania. All Rights Reserved.