(390) previous ~ index ~ next

To: TDT List <tdt-distrib@unagi.cis.upenn.edu>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: TDT4 Participants: Input Required ASAP
Date: Mon, 16 Sep 2002 21:35:44 -0400

Folks,

YY has found a problem with the machine-translated Mandarin ASR data.
31 of the CNR_MAN files had not been converted to GB encoding prior to
translating them into English. The translations are therefore useless.

We want to get the corpus right before everyone runs their systems on
the corrupted data and submits their results. Correcting the problem
now, rather than carrying it forward, will make it easier to analyze
both this year's and next year's results.

Prior to making such a decision, we'd like to poll the community to see
if it is feasible to make the correction given the current October 1
deadline. Participants would need the updated corpus files and a new
set of index files, both of which will be ready before noon EST
tomorrow. If it would help participants, moving the current deadline a
couple days would be an option.

In the absence of any objections, participants will be required to
process the TDT4 corpus with the 2 patches applied. In the meantime,
I'll assume this is acceptable to everyone and begin preparing the
index files.


Regards,
Jon
-------------------------------------------------------------
To unsubscribe from tdt-distrib, email majordomo@ldc.upenn.edu
with "unsubscribe tdt-distrib" in the body of the message.

Last updated Tue Sep 17 09:27:13 2002