(140) previous ~ index ~ next

To: David Graff <graff@unagi.cis.upenn.edu>
From: Ralf Brown <Ralf_Brown@v.gp.cs.cmu.edu>
Subject: Re: ASR output -- and MT output
Date: Thu, 01 Jul 1999 15:38:47 -0400

>Does anyone have a strenuous objection to this format for MT output?

None, other than the knee-jerk "ugh, even bigger files" (the current
TDT2 data set almost completely fills the largest [1.9G] partition
that our computing facilities will back up).


The mention of file formats reminded me that I had intended to post an
example of the control file that our systems use. During the
discussion of directory structure, either BBN or Dragon mentioned
setting up a file listing logical-to-actual filename mappings, and I
chimed in that we use a control file that does essentially the same
thing. Here's a brief extract from one for running with English
sources only:

<DOCCOLL judgements=tables/topic_relevance.table events=tdt2.events>
<DOCFILE file=tkntext/19980104_0002_0418_APW_ENG.tkn bounds=tables/19980104_0002_0418_APW_ENG.bndtkn>
<DOCFILE file=tkntext/19980104_0720_0851_APW_ENG.tkn bounds=tables/19980104_0720_0851_APW_ENG.bndtkn>
<DOCFILE file=asrtext/19980104_1130_1200_CNN_HDL.asr bounds=tables/19980104_1130_1200_CNN_HDL.bndasr>
...many more files...
</DOCCOLL>

The control file lists each file to be loaded for the test run, along
with its boundary file, and names the judgements file for the run (the
"events" file simply maps each event number to a name of the form
EVENTnn). To do unsegmented tracking, just create a second control
file that points at boundary files generated by an automatic segmenter
instead.
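
For anyone who wants to read this format outside our systems, here is a
rough Python sketch of a reader. It is not our actual loader; it simply
assumes what the extract above shows: one tag per line and unquoted
attribute values with no embedded spaces. The names parse_control and
retarget_bounds, and the "autoseg" directory, are made up for
illustration.

```python
import posixpath
import re

# Assumption: attribute values are unquoted and contain no whitespace,
# as in the extract above.
TAG_RE = re.compile(r'<(DOCCOLL|DOCFILE)\b([^>]*)>')
ATTR_RE = re.compile(r'(\w+)=([^\s>]+)')

SAMPLE = """\
<DOCCOLL judgements=tables/topic_relevance.table events=tdt2.events>
<DOCFILE file=tkntext/19980104_0002_0418_APW_ENG.tkn bounds=tables/19980104_0002_0418_APW_ENG.bndtkn>
<DOCFILE file=tkntext/19980104_0720_0851_APW_ENG.tkn bounds=tables/19980104_0720_0851_APW_ENG.bndtkn>
<DOCFILE file=asrtext/19980104_1130_1200_CNN_HDL.asr bounds=tables/19980104_1130_1200_CNN_HDL.bndasr>
</DOCCOLL>
"""


def parse_control(text):
    """Return (collection_attrs, list_of_docfile_attrs) from a control file."""
    collection = {}
    docfiles = []
    for match in TAG_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(match.group(2)))
        if match.group(1) == "DOCCOLL":
            collection = attrs
        else:
            docfiles.append(attrs)
    return collection, docfiles


def retarget_bounds(docfiles, new_dir):
    """Copy the DOCFILE entries, pointing each 'bounds' at new_dir.

    Hypothetical helper: it assumes the automatic segmenter writes its
    boundary files under new_dir using the same base filenames.
    """
    result = []
    for entry in docfiles:
        updated = dict(entry)
        base = posixpath.basename(entry["bounds"])
        updated["bounds"] = posixpath.join(new_dir, base)
        result.append(updated)
    return result


collection, docfiles = parse_control(SAMPLE)
unsegmented = retarget_bounds(docfiles, "autoseg")
```

The closing </DOCCOLL> tag and any non-tag lines are skipped, since
only opening DOCCOLL and DOCFILE tags carry attributes.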

Ralf

Last updated Mon Jul 12 17:16:49 1999