The LDC has many recordings (broadcasts, telephone calls,
interviews) of natural speech among multiple parties on a variety of
topics. We hope (and believe) that these recordings could serve as a
valuable resource for foreign language instructors at all levels. The
recordings have been digitized; some of them have been transcribed,
and some have even been annotated specifically to support language teaching.
We want to:
We have prepared a simple database and some delivery scripts to demonstrate one possible approach to resource access.
This mini database contains delivery information about recordings in Arabic and French. The scripts are designed to present the transcripts in language-appropriate encodings and to extract timestamped audio clips automatically. The Arabic transcripts were originally produced in transliteration; the processing scripts convert them to either ISO-8859-6 or CP-1256, depending on information included in the database files. A logical extension would be to let users choose their preferred encoding (or transliteration).
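The Arabic conversion step might look roughly like the Python sketch below. It is an illustration only, not the actual delivery scripts. The record layout (`translit` and `encoding` fields), the function names, and the small Buckwalter-style letter table are all assumptions made for the example. The text does not say which transliteration scheme the transcripts use.

```python
# Hypothetical, partial transliteration table (Buckwalter-style letters).
# The scheme actually used for the transcripts is not stated; this is an assumption.
TRANSLIT_TO_ARABIC = {
    "A": "\u0627",  # alef
    "b": "\u0628",  # beh
    "t": "\u062a",  # teh
    "k": "\u0643",  # kaf
    "l": "\u0644",  # lam
    "m": "\u0645",  # meem
}

# Map the encoding label stored in a (hypothetical) database record
# to the name of the corresponding Python standard-library codec.
CODEC_FOR_LABEL = {
    "ISO-8859-6": "iso8859_6",
    "CP-1256": "cp1256",
}


def to_arabic(translit: str) -> str:
    """Replace each transliteration symbol with its Arabic letter.

    Characters missing from the table (spaces, punctuation) pass through unchanged.
    """
    return "".join(TRANSLIT_TO_ARABIC.get(ch, ch) for ch in translit)


def render_transcript(record: dict) -> bytes:
    """Return the transcript bytes in the encoding named by the record."""
    codec = CODEC_FOR_LABEL[record["encoding"]]
    return to_arabic(record["translit"]).encode(codec)


# Example record; the field names are illustrative, not the real database schema.
record = {"translit": "ktAb", "encoding": "CP-1256"}
payload = render_transcript(record)
```

Letting users choose their preferred encoding would then amount to taking the `encoding` value from the request rather than from the database record.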