
To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: re-release of Jan-March TDT2 Mandarin data
Date: Tue, 04 May 1999 18:48:19 EDT

Folks,

I apologize for the delay in fixing the problems with last Friday's
release of the Mandarin TDT2 data. In addition to errors made in
generating the topic table, we also found that 28 data files in the
machine-translated "English" set contained a variety of formatting
errors.

Here is a correct summary of topic hits in this data set (sorted by
number of hits), where the "topicid" numbers match those that were
used in the TDT2 English table. This list shows hits on 15 topics
(three of which have fewer than 4 hits); when the next three months'
worth of data are released, there will be 20 topics in the table, all
of which will have at least 4 hits.

694 topicid=1
367 topicid=15
142 topicid=13
45 topicid=39
29 topicid=76
18 topicid=2
11 topicid=88
10 topicid=48
10 topicid=23
10 topicid=20
8 topicid=5
5 topicid=7
2 topicid=44
2 topicid=71
1 topicid=96
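
For anyone who wants to check these figures against their own copy of
the table, here is a minimal Python sketch. It assumes only the
"<hits> topicid=<id>" line layout shown above; the function name
parse_topic_hits is made up for illustration and is not part of any
TDT2 tool.

```python
import re

# The summary table exactly as listed in this message.
TABLE = """\
694 topicid=1
367 topicid=15
142 topicid=13
45 topicid=39
29 topicid=76
18 topicid=2
11 topicid=88
10 topicid=48
10 topicid=23
10 topicid=20
8 topicid=5
5 topicid=7
2 topicid=44
2 topicid=71
1 topicid=96
"""


def parse_topic_hits(text):
    """Return {topicid: hits} from lines of the form '<hits> topicid=<id>'."""
    hits = {}
    for line in text.splitlines():
        m = re.fullmatch(r"\s*(\d+)\s+topicid=(\d+)\s*", line)
        if m:  # silently skip anything that is not a table row
            hits[int(m.group(2))] = int(m.group(1))
    return hits


hits = parse_topic_hits(TABLE)
# Topics that fall below the 4-hit threshold mentioned above.
sparse = sorted(t for t, n in hits.items() if n < 4)
print(len(hits), "topics,", sum(hits.values()), "hits in total")
print("topics with fewer than 4 hits:", sparse)
```

Run on the table above, this reports 15 topics, with topics 44, 71
and 96 below the 4-hit threshold.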

The new release is 167 MB when uncompressed, and the compressed tar
file is 42588112 bytes (about 42.6 MB).

The new file name, available now via the usual "members_only" method,
is:
tdt-deliv-990503.tar.gz
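
One quick way to confirm that a download completed is to compare the
size of the local file with the byte count quoted above. The following
is only a sketch: it assumes the file was saved under its original
name in the current directory, and size_matches is a hypothetical
helper, not something shipped with the release. A matching size does
not prove the contents are intact; it only catches truncated
transfers.

```python
import os

# Byte count of the compressed tar file as quoted in this message.
EXPECTED_BYTES = 42588112


def size_matches(path, expected=EXPECTED_BYTES):
    """Return True if the file at `path` is exactly `expected` bytes long."""
    return os.path.getsize(path) == expected


if __name__ == "__main__":
    # Assumed local name and location; adjust to wherever the file was saved.
    name = "tdt-deliv-990503.tar.gz"
    if not os.path.exists(name):
        print(name, "not found in the current directory")
    elif size_matches(name):
        print("size OK")
    else:
        print("size mismatch; the download may be incomplete")
```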

-----------
David Graff			Linguistic Data Consortium
graff@ldc.upenn.edu		3615 Market St., Suite 200
voice: (215) 898-0887		University of Pennsylvania
fax:   (215) 573-2175		Philadelphia, PA 19104
		http://www.ldc.upenn.edu


Last updated Thu May 13 09:28:24 1999