To: jsmc@watson.ibm.com
From: George Doddington <doddington@email.msn.com>
Subject: Re: parallel news released by Hong Kong Government
Date: Wed, 13 Oct 1999 12:42:01 -0700

jsmc@watson.ibm.com wrote:
>
> What is fair use of this corpus for the evaluation, since it overlaps
> in time with the TDT corpus?
>
> Can we
> (a) use all of it?
> (b) use only those parts which are entirely before the evaluation data?
> (c) as we process the TDT corpus, refer to those parts of the Hong
> Kong corpus which are earlier than the TDT article currently being
> processed?
> (d) same as (c), but complicated by the deferral period?

The answer is (b). You may use generally available data that predates the
evaluation epoch. You may not use data that is contemporaneous with the
evaluation epoch. (While it may be theoretically reasonable to use
contemporaneous data to expand comprehension of the events, this will not be
allowed, because the potential benefit is judged to be outweighed by the
burden of usage rules and the lack of comparability of results.)
Thank you for bringing this issue to our attention and for your help
in resolving it.
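
For illustration only, rule (b) could be applied to a dated corpus roughly as
sketched below. This is a minimal sketch, not part of the evaluation
specification. The function names, the (doc_id, date) record layout, and the
dates in the usage lines are all hypothetical; the actual start of the
evaluation epoch is not given in this message and has to come from the
evaluation plan.

```python
from datetime import date

def usable_for_evaluation(doc_date, epoch_start):
    """Rule (b): a document is allowed only if it is dated strictly
    before the start of the evaluation epoch (hypothetical helper)."""
    return doc_date < epoch_start

def filter_corpus(docs, epoch_start):
    """Keep the (doc_id, doc_date) pairs that predate the evaluation epoch.

    `docs` is any iterable of (doc_id, datetime.date) pairs; this record
    layout is an assumption made for the sketch.
    """
    return [(doc_id, d) for doc_id, d in docs
            if usable_for_evaluation(d, epoch_start)]

if __name__ == "__main__":
    # Made-up dates, chosen only to show the boundary behaviour.
    epoch_start = date(1998, 10, 1)
    docs = [("hk-001", date(1998, 9, 30)), ("hk-002", date(1998, 10, 1))]
    for doc_id, d in filter_corpus(docs, epoch_start):
        print(doc_id, d.isoformat())
```

The comparison is strict, so a document dated on the first day of the epoch
counts as contemporaneous and is excluded. Options (c) and (d) would instead
need a per-article cutoff, which is the usage-rule burden the answer above
avoids.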
--
George Doddington in Orinda, CA: doddington@nist.gov or 925/631-6628

Last updated Tue Oct 19 10:10:09 1999