To: Ira Carp <Ira@dragonsys.com>
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: Re: TDT2 materials for January
Date: Tue, 28 Apr 1998 13:07:04 EDT
Ira,
Thank you for getting back to me on all those points about
inconsistencies in the previous release of January TDT data. It
turns out that Jon Fiscus at NIST had also noticed these problems, as
well as others (mostly involving mistakes and formatting errors in
many of the boundary tables).
There are two different paths to resolving the matter. First, NIST
will be releasing an "evaluation dry run" version of the January
data, so that sites will have a chance to get acquainted with
evaluation processes. That release will incorporate corrections for
all the inconsistencies that you have reported so far.
Second, the LDC will be releasing the new increment of TDT training
data, which will supersede the release that you are struggling with
now. That is, the next release from LDC will include data from
February as well as repaired and supplemented data from January, with
topic annotations across all this data for the 37 target topics
defined so far.
Both of these upcoming releases (from NIST and LDC) will be out
within the next week. (The NIST eval-test dry run is likely to be
out a bit sooner than the LDC release of Jan/Feb data.)
I apologize for the shortcomings in the initial set of boundary
tables released earlier this month. I will be updating our web-page
corpus description to reflect new features in the upcoming release.
Dave Graff
Last updated Wed Sep 9 09:40:47 1998