
To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: topic_relevance.table patch
Date: Fri, 10 Sep 1999 18:15:08 EDT

Folks,

When looking at the index files just now, I realized that I had neglected one
detail in the conversion script that I circulated yesterday for creating
"version2" style file paths for the latest TDT2 data.

The detail involves the identification of topics in the file
"topic_relevance.table" -- all the existing index files and software assume
that topics are identified by the numbers 1 through 100, whereas the most
recent data release has them identified by numbers 20001 through 20100.

The following shell command, executed in the "tables" directory, will create a
modified version of the topic table with the correct topicids for version-2
style usage; you can check the output file to make sure the process worked
properly before you rename it to replace the topic table that was produced by
TDT2-v3-to-v2.perl:

perl -ne \
'if(/topicid=(\d+)/) {
$tid=$1;$ntid=$tid-20000;
s/topicid=$tid/topicid=$ntid/;
} print;' < topic_relevance.table > topic_relevance.table.fixed

I've tried this myself, pasting those 5 lines directly from my email window
onto the command line, and it did work as intended. An easy way to check the
validity of the output is the following command:

grep topicid topic_relevance.table.fixed | cut -f2 "-d " | sort | uniq -c

which will yield the following 96 familiar lines, if everything has been done
correctly. Sorry about having overlooked this detail earlier.

Dave G.


3870 topicid=1
8 topicid=10
8 topicid=100
124 topicid=11
230 topicid=12
986 topicid=13
9 topicid=14
2353 topicid=15
8 topicid=16
37 topicid=17
105 topicid=18
126 topicid=19
1406 topicid=2
49 topicid=20
74 topicid=21
31 topicid=22
178 topicid=23
47 topicid=24
3 topicid=25
72 topicid=26
1 topicid=27
12 topicid=28
14 topicid=29
2 topicid=30
49 topicid=31
132 topicid=32
150 topicid=33
28 topicid=34
6 topicid=35
8 topicid=36
65 topicid=37
1 topicid=38
207 topicid=39
21 topicid=4
9 topicid=40
28 topicid=41
37 topicid=42
18 topicid=43
438 topicid=44
7 topicid=46
125 topicid=47
181 topicid=48
115 topicid=5
11 topicid=50
7 topicid=52
10 topicid=53
3 topicid=54
1 topicid=55
67 topicid=56
30 topicid=57
1 topicid=58
3 topicid=59
8 topicid=6
8 topicid=60
8 topicid=61
2 topicid=62
19 topicid=63
28 topicid=64
63 topicid=65
6 topicid=66
7 topicid=67
9 topicid=68
3 topicid=69
33 topicid=7
709 topicid=70
302 topicid=71
14 topicid=72
1 topicid=73
57 topicid=74
10 topicid=75
685 topicid=76
120 topicid=77
16 topicid=78
8 topicid=79
71 topicid=8
1 topicid=80
1 topicid=81
4 topicid=82
18 topicid=83
15 topicid=84
15 topicid=85
141 topicid=86
99 topicid=87
116 topicid=88
40 topicid=89
56 topicid=9
1 topicid=90
65 topicid=91
3 topicid=92
12 topicid=93
5 topicid=94
4 topicid=95
99 topicid=96
2 topicid=97
9 topicid=98
2 topicid=99
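
For anyone who would rather not compare the listing above by eye, here is a
minimal sketch of an automated check. It assumes only what the grep/cut
command already assumes: fields separated by single spaces, with the second
field of the form topicid=N. The function name check_topicids is made up for
illustration and is not part of any TDT2 software.

```shell
#!/bin/sh
# Sketch only: report how many distinct topic ids a table contains and
# whether any of them fall outside the 1..100 range that the version-2
# index files and software expect.
# Assumption: space-separated fields, second field is "topicid=N".

check_topicids() {
    # $1 = path to the renumbered table (e.g. topic_relevance.table.fixed)
    grep topicid "$1" | cut -f2 "-d " | sort -u |
    awk -F= '
        # $2 is the numeric part after "topicid="
        $2 < 1 || $2 > 100 { bad++; print "out of range: " $0 }
        END {
            print NR " distinct topic ids, " (bad + 0) " out of range"
            exit (bad > 0)      # non-zero exit status if any id is out of range
        }
    '
}
```

Running "check_topicids topic_relevance.table.fixed" in the "tables"
directory should then report 96 distinct topic ids and none out of range if
the renumbering worked; a table that still carries the 20000-series ids
would be flagged line by line and the function would return a non-zero
status.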



Last updated Wed Sep 22 10:26:04 1999