(114) previous ~ index ~ next

To: tdt-distrib@ldc.upenn.edu
From: Jon_Yamron@dragonsys.com
Subject: Training data for dry run?
Date: Fri, 11 Jun 1999 16:14:16 -0400

If the test data for the dry run spans the entire 6 months of the TDT2 corpus,
what should we use for training data for those parts of our TDT software that
require it? For example, one of our trackers requires multiple background
models trained from news material similar to the test stories.

Could we at least reserve the first two months of the corpus for training, both
for tracking and detection? (This would comprise only about 20% of the Mandarin
files, not 33% as might be expected, and very few Mandarin ASR stories fall in
this period.) Note that for tracking, it would probably still work to take the
topic training stories from the first two months, if that were found to be
necessary.

If this is not acceptable, I don't see any way to avoid using test material as
training data. Perhaps this is not a big deal, as this is a dry run, after all.

- Jon

Last updated Mon Jun 21 11:18:49 1999