To: tdt-distrib@unagi.cis.upenn.edu
From: J michAel schuLtz <mschultz@unagi.cis.upenn.edu>
Subject: index files for devtest
Date: Wed, 2 Sep 1998 11:48:35 -0400 (EDT)

I ran the following Perl script, called makesense, over the trk_*.ndx
files:

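#!/usr/local/bin/perl
# Print "SOURCE extension" for each file name found in the input lines.
while (<>) {
    # Capture the source (e.g. ABC_WNT) as $1 and the file extension
    # (e.g. asr) as $2; the dot before the extension is escaped.
    if (/[a-z]+\/[0-9]+_[0-9]+_[0-9]+_([A-Z]+_[A-Z]+)\.([a-z]+)/) {
        print "$1 $2\n";
    }
}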


The output follows. For nwt+asr, CNN_HDL, VOA_TDY, and VOA_WRP (i.e.
individual sources) are each split across asr and tkn. In nwt+man_fdch,
ABC_WNT comes from both ccap and fdch. How can these be generated from
flist.devtest?

indexes_devtest > ./makesense trk_nwt+asr* | sort | uniq
ABC_WNT asr
APW_ENG tkn
CNN_HDL asr
CNN_HDL tkn
NYT_NYT tkn
PRI_TWD asr
VOA_TDY asr
VOA_TDY tkn
VOA_WRP asr
VOA_WRP tkn

indexes_devtest > ./makesense trk_nwt+man_ccap* | sort | uniq
ABC_WNT ccap
APW_ENG tkn
CNN_HDL tkn
NYT_NYT tkn
PRI_TWD tkn
VOA_TDY tkn
VOA_WRP tkn

indexes_devtest > ./makesense trk_nwt+man_fdch* | sort | uniq
ABC_WNT ccap
ABC_WNT fdch
APW_ENG tkn
CNN_HDL tkn
NYT_NYT tkn
PRI_TWD tkn
VOA_TDY tkn
VOA_WRP tkn
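
The splits noted above can also be flagged mechanically. The following is
only a minimal sketch, assuming input lines in the "SOURCE ext" form that
makesense prints. The sub name split_sources is made up for illustration,
and the sample data is copied from the nwt+asr output above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: given "SOURCE ext" lines, return a hash mapping each
# source that appears with more than one extension to its sorted extensions.
sub split_sources {
    my @lines = @_;
    my %exts;
    for my $line (@lines) {
        my ($source, $ext) = split ' ', $line;
        next unless defined $ext;          # skip blank or malformed lines
        $exts{$source}{$ext} = 1;
    }
    my %split;
    for my $source (keys %exts) {
        my @found = sort keys %{ $exts{$source} };
        $split{$source} = \@found if @found > 1;
    }
    return %split;
}

# Sample input: the nwt+asr output listed above.
my @nwt_asr = (
    'ABC_WNT asr', 'APW_ENG tkn', 'CNN_HDL asr', 'CNN_HDL tkn',
    'NYT_NYT tkn', 'PRI_TWD asr', 'VOA_TDY asr', 'VOA_TDY tkn',
    'VOA_WRP asr', 'VOA_WRP tkn',
);

my %split = split_sources(@nwt_asr);
print "$_: @{ $split{$_} }\n" for sort keys %split;
```

For the nwt+asr sample this reports CNN_HDL, VOA_TDY, and VOA_WRP, each
with both asr and tkn.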

Isn't the purpose of the three tests (nwt+asr, nwt+man_ccap, and
nwt+man_fdch) to pick the type of voice data for the relevant
sources? Then you could define exactly one type for each news
source, like this:

nwt+asr
--------
ABC_WNT asr
APW_ENG tkn
CNN_HDL asr
NYT_NYT tkn
PRI_TWD asr
VOA_TDY asr
VOA_WRP asr

nwt+man_ccap
------------
ABC_WNT ccap
APW_ENG tkn
CNN_HDL tkn
NYT_NYT tkn
PRI_TWD tkn
VOA_TDY tkn
VOA_WRP tkn

nwt+man_fdch
------------
ABC_WNT fdch
APW_ENG tkn
CNN_HDL tkn
NYT_NYT tkn
PRI_TWD tkn
VOA_TDY tkn
VOA_WRP tkn
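
If the types were defined this way, picking the files for a test would
reduce to a table lookup. A rough sketch, assuming file names shaped like
those the makesense pattern matches; whether flist.devtest holds names in
that form is an assumption here. The sub name select_files and the sample
file names are invented for illustration.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Source -> extension for each test condition, taken from the lists above.
my %type_for = (
    'nwt+asr' => {
        ABC_WNT => 'asr', APW_ENG => 'tkn', CNN_HDL => 'asr', NYT_NYT => 'tkn',
        PRI_TWD => 'asr', VOA_TDY => 'asr', VOA_WRP => 'asr',
    },
    'nwt+man_ccap' => {
        ABC_WNT => 'ccap', APW_ENG => 'tkn', CNN_HDL => 'tkn', NYT_NYT => 'tkn',
        PRI_TWD => 'tkn',  VOA_TDY => 'tkn', VOA_WRP => 'tkn',
    },
    'nwt+man_fdch' => {
        ABC_WNT => 'fdch', APW_ENG => 'tkn', CNN_HDL => 'tkn', NYT_NYT => 'tkn',
        PRI_TWD => 'tkn',  VOA_TDY => 'tkn', VOA_WRP => 'tkn',
    },
);

# Hypothetical helper: keep only the files whose extension is the one the
# table assigns to their source under the given condition.
sub select_files {
    my ($condition, @files) = @_;
    my $table = $type_for{$condition}
        or die "unknown condition: $condition\n";
    my @kept;
    for my $file (@files) {
        next unless $file =~ m{_([A-Z]+_[A-Z]+)\.([a-z]+)$};
        my ($source, $ext) = ($1, $2);
        push @kept, $file
            if defined $table->{$source} && $table->{$source} eq $ext;
    }
    return @kept;
}

# Made-up file names, for illustration only.
my @files = (
    'data/19980101_1130_1200_CNN_HDL.asr',
    'data/19980101_1130_1200_CNN_HDL.tkn',
    'data/19980101_0000_0000_NYT_NYT.tkn',
);
print "$_\n" for select_files('nwt+asr', @files);
```

With one type per source, a single chronological pass over the list
yields each test collection, and no source appears twice in a test.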

Otherwise I don't see a principled way of going from a single list
like flist.devtest to a collection for testing. Do other people
see this as a problem? For someone building a system that processes
things chronologically, this is a nightmare: it means building a
separate database for each tracking test.

Mike

Last updated Wed Sep 9 09:40:55 1998