To: Jon_Yamron@dragonsys.com
From: Doug Oard <oard@glue.umd.edu>
Subject: Re: Clarification of tracking
Date: Thu, 5 Nov 1998 12:24:01 -0500 (EST)

On Thu, 5 Nov 1998 Jon_Yamron@Dragonsys.com wrote:

> 1) BBN has demonstrated that, due to occasional labeling errors, a few
> "off-topic" training examples may actually be on topic, and they put some
> effort into automatically finding these and eliminating them from the
> training of the background. The first question is: can these mis-labeled
> stories be used to supplement the training material for the topic model?
> (My assumption is NO, because if they had been labeled properly, the
> systems would not have been allowed to use them.)
>
> 2) If we chose to use other material to train a background model, such as
> the January-April data, are the restrictions the same? In other words, if
> we (automatically) scan for and find what we believe to be on-topic
> material in data that predates the test corpus, can we use it to help train
> the topic model? (Again, I assume the answer is NO, but that we are free
> to eliminate this material from the training of the background.)

Jon,

IMHO, if your processing is untouched by human hands once you
begin working with the evaluation material, all materials that you are
allowed to use should be allowable for any purpose. If, on the other
hand, there is even a single manual step in the process of identifying
possibly relevant documents (for example, manual formulation of a query),
then even removing those documents would not be permissible.

Just my 0.02 worth...

Doug


Last updated Fri Nov 6 15:29:22 1998