To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: Attachment for accessing updated Mandarin MT files
Date: Tue, 17 Sep 2002 16:08:00 -0400

Folks,

It's not clear to me that Jon's recent post to the list contained the
information that he intended to include, regarding how to obtain the
corrected versions of some "mtasr" and "mtasr_bnd" files for the TDT4
Text corpus. Those directions are provided below, just to be sure.

(Our friends at LIMSI had discovered that 31 files in the mtasr
directory, associated with the "CNR_MAN" data source, contained UTF-8
character data. The fact that UTF-8 data had been used as input to our
SYSTRAN Chinese-to-English system meant that the translations in these
31 files were entirely invalid and useless.)

The tar file referenced below should be unpacked in the same location
and manner as the original cdrom tar files, and the previous patch tar
file, in order to replace the 31 CNR_MAN.mtasr data files, and their
associated boundary tables.

Dave Graff

------------------

Use ftp to connect to hostname: ftp.ldc.upenn.edu
Enter user name: anonymous
Enter your email address as the password
Execute the following set of ftp commands:

ftp> quote site group ldc-mem
ftp> quote site gpass Enab2LDC
ftp> cd pub/ldc/members_only
ftp> binary
ftp> get tdt4_v1_0_update_man_mtasr_cnr.tgz
ftp> bye

Please note that all arguments given to the "ftp>" commands must be
entered with upper- and lower-case characters EXACTLY as shown above.
The "ls" and "dir" commands will not list the contents of the
members_only directory in this ftp session, and wild-card characters
cannot be used to replace (parts of) the file name in the "get"
command. Simply enter the commands as shown to get your file(s).
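
For anyone who would rather script the transfer, the session above can
be sketched in Python with the standard ftplib module. This is only a
sketch, not something supplied by LDC: the function name fetch_update
and its parameters are invented for illustration, and it assumes the
server treats the "site" lines the same way when they are sent with
sendcmd() as when they are sent with "quote" from a command-line
client.

```python
from ftplib import FTP

HOST = "ftp.ldc.upenn.edu"
FILENAME = "tdt4_v1_0_update_man_mtasr_cnr.tgz"


def fetch_update(ftp, email, dest_path=FILENAME):
    """Replay the ftp session above on an ftplib.FTP-style object.

    fetch_update is a hypothetical helper; `ftp` is expected to be an
    already-connected ftplib.FTP instance (or a stand-in with the same
    methods), and `email` is used as the anonymous password.
    """
    ftp.login("anonymous", email)
    # Equivalent of the two "quote site ..." lines; arguments are kept
    # in exactly the case shown in the instructions.
    ftp.sendcmd("site group ldc-mem")
    ftp.sendcmd("site gpass Enab2LDC")
    ftp.cwd("pub/ldc/members_only")
    # retrbinary switches to binary (image) mode before the transfer,
    # which covers the "binary" step of the manual session.
    with open(dest_path, "wb") as out:
        ftp.retrbinary("RETR " + FILENAME, out.write)
    return dest_path


# Usage (needs network access; the address is a placeholder):
#   with FTP(HOST) as ftp:
#       fetch_update(ftp, "you@example.org")
```

The full file name is passed to RETR, since wild-cards are not
accepted, and no directory listing is attempted.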

Please send email to <online-service@ldc.upenn.edu> if you have any
problems; if you do, it will be very helpful for us if you can include
the exact messages that appeared in the ftp session.

Here is a listing of the number of bytes in the data file(s):

 Bytes   Filename

742387   tdt4_v1_0_update_man_mtasr_cnr.tgz
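
The byte count can be used to confirm that the transfer was complete
before the tar file is unpacked over the existing corpus files. A
minimal Python sketch of that check and of the unpacking step follows;
the helper names size_matches and unpack_update are made up here, and
corpus_root stands for whatever directory the original cdrom tar files
were unpacked in, which will differ from site to site.

```python
import os
import tarfile

EXPECTED_BYTES = 742387  # byte count from the listing above


def size_matches(path, expected=EXPECTED_BYTES):
    """Return True if the file at `path` has exactly `expected` bytes."""
    return os.path.getsize(path) == expected


def unpack_update(tgz_path, corpus_root):
    """Unpack the gzipped tar file into `corpus_root`.

    Files already present under `corpus_root` with the same relative
    paths are overwritten. Returns the member names in the archive.
    """
    with tarfile.open(tgz_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(corpus_root)
    return names


# Usage (the corpus path is a placeholder):
#   tgz = "tdt4_v1_0_update_man_mtasr_cnr.tgz"
#   if size_matches(tgz):
#       unpack_update(tgz, "/path/to/tdt4")
```

A size mismatch usually means the transfer was cut short or was not
done in binary mode, in which case the file should be fetched again.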


-------------------------------------------------------------
To unsubscribe from tdt-distrib, email majordomo@ldc.upenn.edu
with "unsubscribe tdt-distrib" in the body of the message.
Last updated Mon Nov 11 14:16:26 2002