Guidelines for Hub4/TDT Transcription 2000

NEW!!  Second Passing
  1.  At the Unix prompt, enter

         bc-type tdt train

  2.  Double-click or highlight the desired file, and paste it in at the prompt.

    The xwave window will not appear until the first attempt is made to listen to a segment. Use the "Again" key to start. The transcription shortcuts show the keyboard commands for listening, scrolling and changing tags and timestamps in the editor.

    Conventions


 
 
 
  • for interruptions, use <o> and a timestamp to indicate the beginning of an overlapping speech region. Overlap is determined by word boundaries rather than by the exact point in the waveform, which may split a word in two.

  • The [[NS]] tag can be used when an area within a turn contains no speech, e.g. a musical interruption or extended background noise.

    <b 123.456>
    The crowd was furious.
    <b 124.567>
    [[NS]]
    <b 128.987>
    Calm was soon restored by the arrival of the riot police.
     

  • indicate disfluencies by using a hyphen to mark partial words; transcribe pause fillers, e.g.

        We're jus- just waiting for that uh tha- that report to to come in.
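
    As a rough illustration of this convention, the Python sketch below picks out partial words (tokens ending in a hyphen) and pause fillers from one transcribed line. The function name `find_disfluencies` and the two-item filler set are assumptions made for this example; they are not part of any Hub4/TDT tool.

```python
# Assumed, deliberately small filler set for this sketch.
FILLERS = {"uh", "um"}

def find_disfluencies(line):
    """Return (partial_words, fillers) found in one transcribed line."""
    # Split on whitespace and drop surrounding punctuation, keeping hyphens.
    tokens = [tok.strip('.,?!;:"') for tok in line.split()]
    # A partial word ends in a hyphen; "twenty-five" does not, so it is ignored.
    partials = [t for t in tokens if len(t) > 1 and t.endswith("-")]
    fillers = [t for t in tokens if t.lower() in FILLERS]
    return partials, fillers

partials, fillers = find_disfluencies(
    "We're jus- just waiting for that uh tha- that report to to come in."
)
print(partials)
print(fillers)
```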
     

  • transcribe standard English contractions as they're spoken: they're, won't, isn't, don't, etc.

  • for non-standard contractions like "gonna" and "wanna" spell out the entire word: going to, want to.
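
    A minimal Python sketch of this rule, for checking a finished transcript. Only the two forms named above are mapped; the function name `expand_non_standard` is hypothetical, and any further entries would have to be agreed by the transcription team.

```python
import re

# Only the two forms named in the guideline; anything else is out of scope here.
NON_STANDARD = {"gonna": "going to", "wanna": "want to"}

PATTERN = re.compile(r"\b(" + "|".join(NON_STANDARD) + r")\b", re.IGNORECASE)

def expand_non_standard(line):
    """Spell out non-standard contractions; standard ones are left alone."""
    def repl(match):
        word = match.group(0)
        expanded = NON_STANDARD[word.lower()]
        # Keep a leading capital, e.g. at the start of a sentence.
        return expanded.capitalize() if word[0].isupper() else expanded
    return PATTERN.sub(repl, line)

print(expand_non_standard("They're gonna leave, but we wanna stay."))
```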

  • identify extended non-speech sections (music, dead air, sound effects) with <e> and timestamp at beginning of section, followed by <t> and timestamp when speech resumes, e.g.

        <t 148.57> Sounds of gunfire filled the air.
        <e 154.50>
        <t 170.89> That sound greeted early morning visitors.
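
    To make the tag layout concrete, here is a hedged Python sketch that reads the timestamped tags out of a transcript and pairs each <e> with the next <t> to recover the non-speech regions. The names `parse_tags` and `non_speech_regions` are assumptions for illustration, and the pattern only covers the single-letter tags t, e, b and o followed by a timestamp.

```python
import re

# Matches tags such as "<t 148.57>", "<e 154.50>" or "<b 124.567>".
# Optional whitespace around the timestamp is tolerated (an assumption of this sketch).
TAG_RE = re.compile(r"<([tebo])\s*(\d+(?:\.\d+)?)\s*>")

def parse_tags(text):
    """Return a list of (tag, seconds) pairs in order of appearance."""
    return [(m.group(1), float(m.group(2))) for m in TAG_RE.finditer(text)]

def non_speech_regions(text):
    """Pair each <e> timestamp with the timestamp of the next <t>."""
    regions = []
    start = None
    for tag, seconds in parse_tags(text):
        if tag == "e":
            start = seconds
        elif tag == "t" and start is not None:
            regions.append((start, seconds))
            start = None
    return regions

sample = """<t 148.57> Sounds of gunfire filled the air.
<e 154.50>
<t 170.89> That sound greeted early morning visitors."""

print(parse_tags(sample))
print(non_speech_regions(sample))
```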
     

  • NOTE: Several speakers

    When several people are speaking at once and it is very difficult to make them out, insert an <e> tag at the start of the confused section. Then start the new turn at the next available clear section.

    <t 223.456> <<male>>
    <b 225.678>
    <e 230.302>
    <t 232.563> <<female>>
     

  • use (( )) to indicate words or passages that are hard to understand or difficult to transcribe accurately

  • spell out all numerical sequences

    twenty-five, sixty-six, one oh seven
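
    The Python sketch below shows one way to check this rule: `digit_sequences` flags any digits left in a line, and `spell_two_digit` gives the hyphenated spelling for 0 to 99. Both names are hypothetical. Longer sequences are deliberately not handled, since a reading such as "one oh seven" presumably depends on what the speaker actually said.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def spell_two_digit(n):
    """Spell an integer from 0 to 99, hyphenating compounds like twenty-five."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 are handled in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"

def digit_sequences(line):
    """Return digit runs that still need to be spelled out."""
    return re.findall(r"\d+", line)

print(spell_two_digit(25), spell_two_digit(66))
print(digit_sequences("He is 25 and lives at 107 Main Street."))
```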
     

  • spell out all titles like "doctor" (instead of Dr.) and "junior" (instead of Jr.), EXCEPT for Mr., Ms. and Mrs.

  • indicate proper names with a ^ (this is being done so that we can standardize the spelling of proper names after transcription; tags will be stripped out before delivery)

  • acronyms and spoken strings of letters will be indicated with ~, e.g.

        ~FBI
        Washington ~DC
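
    Since the ^ and ~ markers are to be stripped before delivery, a post-processing step might look like the Python sketch below. It assumes the ^ is placed directly before a proper name, in the same way ~ precedes an acronym in the examples above; the function names `proper_names` and `strip_markers` are hypothetical.

```python
import re

# A marker is a ^ or ~ immediately followed by a word character.
MARKER_RE = re.compile(r"[\^~](?=\w)")

def proper_names(line):
    """List the ^-marked names, e.g. for later spelling standardization."""
    return re.findall(r"\^(\w+)", line)

def strip_markers(line):
    """Remove ^ and ~ markers, leaving the marked words in place."""
    return MARKER_RE.sub("", line)

line = "^Clinton spoke to the ~FBI in Washington ~DC."
print(proper_names(line))
print(strip_markers(line))
```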
     
     
     

  • we will use the following set of non-lexemes:

        ah
        eee
        eh
        ew
        ha
        hee
        huh
        hm
        oh
        oo
        um
        uh
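
    For checking transcripts against this list, the set can be held as a simple lookup. The Python sketch below counts occurrences of each listed non-lexeme; the name `count_non_lexemes` and the punctuation stripping are assumptions of the example.

```python
from collections import Counter

# The twelve non-lexemes listed in the guideline.
NON_LEXEMES = frozenset(
    ["ah", "eee", "eh", "ew", "ha", "hee", "huh", "hm", "oh", "oo", "um", "uh"]
)

def count_non_lexemes(lines):
    """Count how often each listed non-lexeme occurs in the given lines."""
    counts = Counter()
    for line in lines:
        for token in line.lower().split():
            token = token.strip('.,?!;:"')
            if token in NON_LEXEMES:
                counts[token] += 1
    return counts

print(count_non_lexemes(["Uh we um we're waiting.", "Oh, uh okay."]))
```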
     

  • other "special" words like interjections and acronyms, which were specially tagged in some versions of Hub4 transcriptions, will be transcribed as normal words, i.e. not specially marked, for instance:

        yeah, uh-huh, okay, gee, hey, AIDS, NAFTA