
To: tdt-distrib@unagi.cis.upenn.edu
From: Jon_Yamron@Dragonsys.com
Subject: Clarification of tracking
Date: Thu, 5 Nov 1998 11:03:09 -0500

In the tracking task, we are allowed to use the off-topic training examples
in the index file to train a background model. I need a clarification of
exactly what is allowed. Specifically, I have the following questions:

1) BBN has demonstrated that, due to occasional labeling errors, a few
"off-topic" training examples may actually be on topic, and they put some
effort into automatically finding these and eliminating them from the
training of the background model. The first question is: can these
mislabeled stories be used to supplement the training material for the
topic model?
(My assumption is NO, because if they had been labeled properly, the
systems would not have been allowed to use them.)

2) If we chose to use other material to train a background model, such as
the January-April data, are the restrictions the same? In other words, if
we (automatically) scan for and find what we believe to be on-topic
material in data that predates the test corpus, can we use it to help train
the topic model? (Again, I assume the answer is NO, but that we are free
to eliminate this material from the training of the background model.)

3) Of course, the actual test material (May-June data) encountered during
processing may be used in any way we see fit, including training the topic
model, as long as it is done in an automatic fashion.

- Jon



Last updated Fri Nov 6 15:29:22 1998