To: Rich Schwartz <schwartz@bbn.com>
From: "Charles L. Wayne" <clwayne@afterlife.ncsc.mil>
Subject: Collecting All the Data at Once
Date: Wed, 10 Feb 1999 09:42:51 -0500 (EST)

Rich,

On Wed, 10 Feb 1999, Rich Schwartz wrote:

> . . .
>
> Somehow, this resulted in a large change in the measured
> performance measures. As I said in my previous message, this large
> change might actually be expected over time in the real world due to
> changing realities. But it would be nice if we could make the
> research (artificially if you like) homogeneous in order to be able to
> understand what's happening. I think this can only be done by
> artificial means, like creating all the corpora at the same time.
> So I'd like to propose that we do that for TDT-3.

Although it might be nice to do so, economic realities prevent us from
creating all the corpora at the same time. The LDC will be creating a
substantial TDT3 test set (using data from October-December 1998) but will
not be creating corresponding TDT3 training or dev test data.

For training and dev test data, sites may use the TDT2 corpus (which
contains data from January-June 1998). The LDC will be expanding the TDT2
corpus to encompass Mandarin data from January-June 1998, annotating it
with respect to the existing set of 100 topics.

Charles


Last updated Thu May 13 09:28:14 1999