Linguistic Data Consortium  

Guidelines for RT-02 Transcription

Overview

    Labels






    Speaker Identification




      Overlapping speech NOTE
    Multiple speakers

    When several people (more than two) are speaking at once and it is very difficult to make them out, insert an <e> tag at the start of the confused section. Then start the new turn at the next available clear section. This is the same treatment that is used for extended periods of non-speech.
     
            <t 223.456> <<male, spkr_1>>
            <b 225.678>
            <e 230.302>
            <t 232.563> <<female, spkr_3>>
    Speakers start simultaneously
    Create start times for the speakers that are about one tenth of a second apart, and insert the overlap tag.
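
    As a rough illustration of the spacing described above, the following Python sketch builds turn lines whose start times are one tenth of a second apart. The function name `staggered_starts` and its `gap` parameter are hypothetical, and the line layout simply mirrors the `<t ...> <<sex, speaker>>` example earlier in this guide. The overlap tag is left out because its exact form is not shown in this section.

```python
def staggered_starts(first_start, speakers, gap=0.1):
    """Build one turn line per speaker, each starting `gap` seconds after the last.

    `speakers` is a list of (sex, speaker_id) pairs. The name, the signature,
    and the output layout are assumptions made for this illustration only.
    """
    lines = []
    for index, (sex, speaker_id) in enumerate(speakers):
        start = first_start + index * gap
        # Three decimal places, matching the timestamps shown in this guide.
        lines.append("<t %.3f> <<%s, %s>>" % (start, sex, speaker_id))
    return lines


if __name__ == "__main__":
    for line in staggered_starts(223.456, [("male", "spkr_1"), ("female", "spkr_3")]):
        print(line)
```

    The overlap tag still has to be added by hand to the turns produced this way.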

     

    Checking and separation of unintelligible (( )) speech


    Syntax Checking

    Common Messages include:

  1. time-stamp without text data?
     -The timestamp does not contain corresponding transcript data

  2. time-stamp follows non-empty line
     - an empty line should follow each transcribed timestamp.

  3. turn should be on single line
     -only one turn permitted for each line.

  4. <English ...> should not be inside (( ))
     -foreign speech should not be contained within "guess" brackets.

  5. <English (()) > has no text content
     - Rather, the "guess" brackets should be placed inside the foreign-language bracket.

  6. closing angle (`>') should be followed by space
    Self-evident.

  7. bracket error with '[]'
    There may be a number of possible causes.

  8. bracket error with '()'
    There may be a number of possible causes.

  9. punctuation should be inside `((...))'
    If absolutely necessary, punctuation should go inside these brackets. In most cases, no punctuation will be necessary.

  10. punctuation should be inside `<... >'
    If there is punctuation immediately outside a < > bracket, please move it inside the bracket.

  11. bad spacing around punctuation `.'
    There should be a space after the punctuation.

  12. bad spacing around punctuation `?'
    There should be a space after the punctuation.

  13. closing paren (`))') should be followed by space
    Self-explanatory.

  14. turn contains ILLEGAL CHARACTER `!'
    Some characters, such as exclamation points, are not allowed within the text. Please let your language leader know when you come across this warning.

  15. digits found in text
    There should not be any numerals in the text outside of the timestamps
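
    To make a few of the messages above more concrete, here is a minimal Python sketch of how such checks could be written. It is not the actual syntax checker: the function names, the timestamp and speaker-label patterns (inferred from the examples in this guide), and the exact conditions are all assumptions, and only messages 7, 8, 11, 12, 13, 14 and 15 are covered.

```python
import re

# Assumed forms, inferred from the examples in this guide:
# timestamps such as <t 223.456>, <b 225.678>, <e 230.302>
TIMESTAMP = re.compile(r"<[tbe] \d+(?:\.\d+)?>")
# speaker labels such as <<male, spkr_1>>
SPEAKER = re.compile(r"<<[^>]*>>")


def balanced(text, opener, closer):
    """Return True if every opener has a matching closer in the right order."""
    depth = 0
    for ch in text:
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def check_line(line):
    """Return the messages raised by one transcript line (hypothetical helper)."""
    # Digits are legitimate inside timestamps and speaker labels, so drop those first.
    text = SPEAKER.sub("", TIMESTAMP.sub("", line))
    messages = []
    if "!" in text:
        messages.append("turn contains ILLEGAL CHARACTER `!'")
    if re.search(r"\d", text):
        messages.append("digits found in text")
    if not balanced(text, "[", "]"):
        messages.append("bracket error with '[]'")
    if not balanced(text, "(", ")"):
        messages.append("bracket error with '()'")
    # '))' followed directly by a non-space character
    if re.search(r"\)\)(?=\S)", text):
        messages.append("closing paren (`))') should be followed by space")
    for mark in ".?":
        # Punctuation followed directly by text; a closing ')' or '>' is
        # tolerated because punctuation may sit inside those brackets.
        if re.search(re.escape(mark) + r"(?=[^\s)>])", text):
            messages.append("bad spacing around punctuation `%s'" % mark)
    return messages


if __name__ == "__main__":
    for sample in ["<t 223.456> <<male, spkr_1>>", "I have 3 dogs", "what?no"]:
        print(repr(sample), check_line(sample))
```

    Checking one line at a time keeps the sketch short; messages that depend on neighbouring lines, such as "time-stamp follows non-empty line", would need the whole file.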