(368) previous ~ index ~ next
To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: TDT3 Arabic Text Corpus With Translations
Date: Thu, 20 Jun 2002 14:17:30 -0400
Folks,
There is now an updated version of the TDT3 Arabic Text Corpus, which
includes the output of machine translation into English; this release,
which is designated as version 1.1, replaces the original version (1.0)
that I announced a few weeks ago.
The overall quality and vocabulary coverage of this MT output are likely
to fall short of the standards already established by the Chinese-to-
English MT provided in TDT2 and TDT3. For example, about 18% of the word
tokens in the MT output are untranslated Arabic.
Participants who have already requested this corpus (LDC2002E32, TDT3
Arabic Text) have been notified of the update. If you intend to
participate in the TDT-2002 Evaluation and have not yet obtained this
corpus, please:
- send email to ldc@ldc.upenn.edu
- mention that you are a participant in TDT-2002
- request corpus LDC2002E32 (TDT3 Arabic Text)
Those who have an LDC membership for 2001 or 2002, or who have already
obtained the previously published "Arabic Newswire Text" corpus
(LDC2001T55), will receive the TDT3 Arabic Text corpus without further
ado.
Participants who do not have an LDC membership, and have not previously
purchased the larger newswire text corpus, will need to submit a signed
user agreement form (Ilya Ahtaridis, the LDC Membership Coordinator,
will provide the appropriate form), and will further need to agree to
the following additional condition:
By December 31, 2002 (i.e., at the conclusion of the TDT-2002 evaluation
cycle and workshop), you must either pay for an LDC membership, OR pay
a non-member purchase price for this corpus (to be determined, but
typically less than the cost of a membership), OR DELETE the data from
all storage media at your institution.
(This is a standard condition to permit cost-free use of copyrighted
data as part of research participation in a sponsored evaluation
program.)
The data will be delivered via ftp; instructions for ftp retrieval will
be provided in response to your email request.
Dave Graff
-------------------------------------------------------------
To unsubscribe from tdt-distrib, email majordomo@ldc.upenn.edu
with "unsubscribe tdt-distrib" in the body of the message.
Last updated Mon Jun 24 18:19:30 2002