To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: TDT3 Arabic Text corpus (AGAIN!)
Date: Mon, 24 Jun 2002 18:38:45 -0400
Folks,
It was brought to my attention just a little while ago that I made a
serious mistake when preparing the "v1.1" distribution tar file for the
TDT3 Arabic Text corpus. The tar file that I announced on Thursday,
June 20 had empty symbolic links where it should have had directories
containing machine-translated text data and boundary tables
("mttkn" and "mttkn_bnd").
That faulty tar file has now been replaced with a correct one, where all
the data directories are fully populated, as originally intended. (Note
that the topic tables in this latest file are still marked internally as
"version 1.0" -- that's because the topic-table version tracking is
distinct from the text-corpus versions.)
Those of you who requested the corpus and have downloaded the tar file
via ftp prior to 6:15pm today (June 24): PLEASE DOWNLOAD IT AGAIN at
your earliest convenience (that is, follow the same ftp instructions
that were originally emailed from the LDC for retrieving the data).
The file that you retrieve will now contain all the data. Please note
that the new tar file is now 68,609,572 bytes (instead of the 44.8 MB
of the earlier, faulty version).
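One quick way to tell the two versions apart is to compare the size of the downloaded file against the byte count given above. The following is only a sketch: the function name "check_tar_size" is made up for illustration, and the file name pattern is assumed to match the one used in the unpacking commands.

```shell
# Expected size in bytes of the corrected tar file, as stated in this message.
expected_size=68609572

# Hypothetical helper: prints "ok" if the given file has the expected size,
# or "mismatch: N bytes" otherwise.
check_tar_size() {
    # wc -c reads from stdin so that only the count is printed;
    # tr strips the padding some wc versions add.
    actual_size=$(wc -c < "$1" | tr -d ' ')
    if [ "$actual_size" -eq "$expected_size" ]; then
        echo "ok"
    else
        echo "mismatch: $actual_size bytes"
    fi
}

# Usage (assumed file name pattern):
#   check_tar_size *LDC2002E32.tgz
```

A mismatch most likely means that you still have the earlier, faulty file and should retrieve it again.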
When you have the tar file, you should "cd" to the "tdt3_em" directory
that currently contains your copy of the TDT3 data distribution (as
unpacked from the tar file that was provided as part of the "TDT3
Multilanguage Text Corpus" cdrom, LDC2001T58). Place the TDT3 Arabic
Text tar file in that directory, and then execute the following command:
    tar xzf *LDC2002E32.tgz                  # (for users of gnu tar)
or
    gunzip -c *LDC2002E32.tgz | tar xf -     # (for other tar versions)
The structure of the tar file is such that its contents will be folded
into the existing data directories of the TDT3 Multilanguage corpus, so
as to encompass data from all three languages.
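If you want to confirm that your copy of the tar file really holds data files for "mttkn" and "mttkn_bnd" (rather than the empty symbolic links of the faulty version), you can count the regular-file entries in its listing. This is a rough sketch, not an official check: the function name "count_files_in" is made up for illustration, and it assumes that file names in the archive contain no spaces and that the directory names appear in the member paths as "mttkn/" and "mttkn_bnd/".

```shell
# Hypothetical helper: count regular-file entries in a gzipped tar file
# whose path contains the given directory name.
#   $1 = tar file, $2 = directory name (e.g. mttkn)
count_files_in() {
    gunzip -c "$1" | tar tvf - |
        # In a verbose listing, regular files start with "-", directories
        # with "d", and symbolic links with "l". The member path is the
        # last field (assuming no spaces in file names).
        awk -v d="$2/" '$1 ~ /^-/ && index($NF, d) { n++ } END { print n + 0 }'
}

# Usage (assumed file name pattern):
#   count_files_in *LDC2002E32.tgz mttkn
#   count_files_in *LDC2002E32.tgz mttkn_bnd
```

A count of 0 for either directory would suggest that you still have the faulty file.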
Please accept my sincerest and deepest apologies for having caused
confusion and wasted effort on your part. (Thanks to Alvaro Bolivar of
UMASS for notifying me about the problem.)
Dave Graff
-------------------------------------------------------------
To unsubscribe from tdt-distrib, email majordomo@ldc.upenn.edu
with "unsubscribe tdt-distrib" in the body of the message.
Last updated Fri Jul 5 11:24:26 2002