The TDT3 Corpus supplements TDT2 in two ways. First, it adds Mandarin Chinese news data from three sources (two newswires and one radio broadcast) covering the same January-June 1998 period as TDT2. Second, it adds new data from a three-month period, October through December 1998. This new collection period includes material drawn from eight English news sources (the same six used in TDT2, plus two new ones), as well as the three Mandarin Chinese sources.
The following sections describe each source in detail, with regard to its content, how it is received and processed, and any special properties it may have.