(119) previous ~ index ~ next

To: doddington@nist.gov
From: Steve Lowe <steve@dragonsys.com>
Subject: Re: Training data for dry run?
Date: Fri, 11 Jun 1999 19:16:10 -0400

George,

I just received your message after sending my reply to Jaime. Would
you agree that a "tainted" experiment is acceptable, namely one in which
we run the test with our existing background models, which were previously
trained on what is now test data? That is, our "dry run" submission
would exercise the full protocol for the 1999 evaluation, but would not
constitute a scientifically valid experiment because of the partial
overlap between training and test data.

Thanks,
Steve

--------------

Date: Fri, 11 Jun 1999 18:35:44 -0400
From: George Doddington <doddington@nist.gov>
Reply-To: doddington@nist.gov
Organization: Information Technology Laboratory, NIST
X-Mailer: Mozilla 4.51 [en] (Win95; U)
X-Accept-Language: en
MIME-Version: 1.0
CC: tdt-distrib@ldc.upenn.edu
References: <8525678D.006EB873.00@notes-mta.dragonsys.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Jon_Yamron@dragonsys.com wrote:
>
> If the test data for the dry run spans the entire 6 months of the TDT2 corpus,
> what should we use for training data for those parts of our TDT software that
> require it? For example, one of our trackers requires multiple background
> models trained from news material similar to the test stories.

There is no EvalSet for the June dry run. Furthermore, the evaluation is
over the entire TDT2 data set, and there is no separation imposed between
training data and DevSet data. Don't be alarmed. The purpose of the June
dry run is to debug the task definitions and the research support tools and
infrastructure. It is not a contest. Please feel free to choose training
data in whatever way makes most sense to you under these conditions, so as
to best support your research. We certainly don't want to limit your
research. Selection of training and DevSet data for the September dry run
(and beyond) is an important item to discuss at this month's meeting. For
now, however, each site is free to make its own choice in this matter.
--
George Doddington in McLean, VA: doddington@nist.gov or 703/556-3434



Last updated Mon Jun 21 11:18:50 1999