To: tdt-distrib@unagi.cis.upenn.edu
From: David Graff <graff@unagi.cis.upenn.edu>
Subject: Re: two questions
Date: Thu, 09 Sep 1999 09:26:34 EDT

Folks,

I received a couple of questions following my announcement of the v3.1 data
release, and thought I should distribute the answers.

First question:

> where are the index files we are to use for DRY RUN #2?

Well, those would be the ones that Jon Fiscus posted, along with the scoring
software, on Aug. 12; they are still on the NIST web site, and I believe they
are the ones intended. After inspecting the contents of that posting, however,
I realized that the software and index files are not yet geared to what I have
sent out as "version 3.1" of the TDT2 text data. (My humble apologies to Rich,
and everyone else, for the havoc I have caused.)

I believe the best solution will be to overlay the older directory and file
names onto the version 3.1 data. I have created a simple perl script (which
is attached below in uuencoded form) that will create symbolic links to
provide "version 2" style paths to all the data.
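For anyone curious about what "overlaying" the old names amounts to, here is a
minimal Python sketch of the idea. It is not the attached Perl script. It
assumes a hypothetical two-entry mapping from v3 directory names to v2
directory names, creates each v2-named proxy directory next to the v3 one, and
fills it with relative symbolic links. Unlike the real script, it keeps the v3
file names unchanged and gives no file special treatment.

```python
import os
from typing import Dict, List

# Hypothetical excerpt of a v3 -> v2 directory-name mapping, assumed for
# illustration; the attached Perl script carries its own, complete table.
REMAP_DIRS: Dict[str, str] = {"as0": "asrtext", "tkn": "tkntext"}


def make_proxy_links(base: str, remap: Dict[str, str]) -> List[str]:
    """For each v3 directory under `base`, create a v2-named proxy directory
    and fill it with relative symlinks to the v3 files. Returns new links."""
    created: List[str] = []
    for v3_dir, v2_dir in remap.items():
        src = os.path.join(base, v3_dir)
        if not os.path.isdir(src):
            print(f" -- warning: {v3_dir} directory is missing in {base}; moving on...")
            continue
        proxy = os.path.join(base, v2_dir)
        os.makedirs(proxy, mode=0o755, exist_ok=True)
        for name in sorted(os.listdir(src)):
            link = os.path.join(proxy, name)
            if os.path.lexists(link):
                continue  # already linked on an earlier run
            # Relative target ("../as0/NAME") so the corpus tree can be moved.
            os.symlink(os.path.join("..", v3_dir, name), link)
            created.append(link)
    return created
```

Because the links are relative, the whole corpus tree can be moved without
breaking them. Rerunning the sketch skips any link that already exists.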

If you run the attached script (TDT2-v3-to-v2.perl) in the base directory of
the TDT2 data after unpacking the contents of the v3.1 CD-ROM that arrives
this week, you will be able to run the existing NIST software, as well as any
of your own software that has been geared to earlier versions. Any future
software intended for the v3 format will also work, since the script leaves
the v3.1 data structure in place. The script may take up to 90 minutes to
run, and it adds about 150 MB to the total space consumed by the corpus
(altogether, everything will still fit within 2GB, but it will be snug).


Next question:

> Will the story-linking judgements finally be included? I couldn't find
> them on the "silver" August CD, on the NIST FTP site, or on the Web
> site....

The LDC has not done story-link annotation on TDT2 data. I believe the
intention, which may have been left implicit in the Evaluation Plan, is that
the existing target topic information, as provided in the main topic table,
would be adapted for training and development testing on this task.
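As one illustration of how the topic table might be adapted, a pair of stories
could be labeled TARGET when both are on-topic for the same target topic, and
NONTARGET otherwise. This rule is my own assumption; the Evaluation Plan does
not spell it out. The sketch also assumes the topic table has already been
reduced to hypothetical (topic_id, story_id) pairs for on-topic stories. The
function names and IDs are made up for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def topics_by_story(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group hypothetical (topic_id, story_id) on-topic rows by story."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for topic_id, story_id in rows:
        table[story_id].add(topic_id)
    return dict(table)


def link_label(a: str, b: str, topics: Dict[str, Set[str]]) -> str:
    """TARGET if stories a and b share at least one target topic, else NONTARGET.
    Stories missing from `topics` are treated as on no target topic."""
    shared = topics.get(a, set()) & topics.get(b, set())
    return "TARGET" if shared else "NONTARGET"


def labeled_pairs(topics: Dict[str, Set[str]]) -> Dict[Tuple[str, str], str]:
    """Label every unordered pair of on-topic stories."""
    return {(a, b): link_label(a, b, topics)
            for a, b in combinations(sorted(topics), 2)}
```

For example, rows [("T1", "s1"), ("T1", "s2"), ("T2", "s3")] yield one TARGET
pair (s1, s2) and two NONTARGET pairs. Treating unannotated stories as
off-topic is only an approximation, since stories were judged only against
the target topics.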

George Doddington or Jon Fiscus may want to comment on this point. Based on
the current evaluation plan and index files for the story-link task, it looks
like there are both an index file and a "key" file for each variant of the
story-link task. For every story pair listed in the index file, the key file
contains a label, "TARGET" or "NONTARGET", for scoring purposes.
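To make the index/key arrangement concrete, here is a minimal Python sketch of
scoring yes/no link decisions against a key. The key line format
("STORY_A STORY_B TARGET") and the function names are hypothetical, not the
actual NIST file layout. The sketch reports only raw miss and false-alarm
rates, not whatever combined measure the NIST scoring software computes; that
software remains the reference.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def read_key(lines: Iterable[str]) -> Dict[Pair, str]:
    """Parse hypothetical key lines of the form 'STORY_A STORY_B TARGET|NONTARGET'."""
    key: Dict[Pair, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        a, b, label = fields
        if label not in ("TARGET", "NONTARGET"):
            raise ValueError(f"unexpected label: {label}")
        key[(a, b)] = label
    return key


def miss_and_fa(key: Dict[Pair, str], decisions: Dict[Pair, bool]) -> Tuple[float, float]:
    """Return (P_miss, P_fa) over the pairs in `key`; a pair with no
    decision is counted as a NO decision."""
    targets = [p for p, lab in key.items() if lab == "TARGET"]
    nontargets = [p for p, lab in key.items() if lab == "NONTARGET"]
    misses = sum(1 for p in targets if not decisions.get(p, False))
    false_alarms = sum(1 for p in nontargets if decisions.get(p, False))
    p_miss = misses / len(targets) if targets else 0.0
    p_fa = false_alarms / len(nontargets) if nontargets else 0.0
    return p_miss, p_fa
```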

Dave Graff
begin 755 TDT2-v3-to-v2.perl
M(R$O=7-R+V)I;B]P97)L("UW"@HC(%!R;V=R86TZ"51$5#(M=C,M=&\M=C(N
M<&5R; HC(%=R:71T96X@8GDZ"41A=FED($=R869F+"!,1$,*"B,@4'5R<&]S
M93H)26YS=&%L;"!S>6UB;VQI8R!L:6YK<R!I;B!41%0R('9E<G-I;VX@,R!D
M:7-T<FEB=71I;VX@=&AA="!W:6QL"B,)"6-O;F9O<FT@=&\@=F5R<VEO;B R
M(&1A=&$@=7-A9V4*"B,@4W1A<G1I;F<@:6X@=&AE(&)A<V4@9&ER96-T;W)Y
M(&]F('1H92!41%0R(%1E>'0@8V]R<'5S+"!V,RXQ(&]R"B,@;&%T97(L('1H
M:7,@<V-R:7!T('=I;&P@8W)E871E('!R;WAY(&1I<F5C=&]R:65S(&9O<B!E
M86-H(&]F('1H90HC(&1I<W1I;F-T(&1A=&$@='EP97,@:6X@=&AE(&-O<G!U
M<R H;&ES=&5D(&EN("5R96UA<$1I<G,I+"!A;F0@;VYE"B,@<')O>'D@9&ER
M96-T;W)Y(&9O<B!A;&P@8F]U;F1A<GD@=&%B;&5S+B @26X@96%C:"!P<F]X
M>2!D:7)E8W1O<GDL"B,@<WEM8F]L:6,@;&EN:W,@=VEL;"!B92!C<F5A=&5D
M('1H870@<&]I;G0@=&\@=&AE(&9I;&5S(&EN('1H90HC(&-O<G)E<W!O;F1I
M;F<@=C,N,2!D871A(&1I<F5C=&]R>2X@(%1H92!P<F]X>2!D:7)E8W1O<FEE
M<R!A;F0@"B,@<WEM;&EN:R!N86UE<R!A;&P@8V]N9F]R;2!T;R!T:&4@<W!E
M8VEF:6-A=&EO;G,@=7-E9"!I;B!T:&4@96%R;&EE<@HC(')E;&5A<V5S(&]F
M(%1$5#(@*'8R(&]R(&]L9&5R*2X*"B,@26X@=&AE(&-A<V4@;V8@5D]!7T5.
M1R!F:6QE<R!C;VQL96-T960@9G)O;2!*86YU87)Y('1H<F]U9V@@36%Y+ HC
M(#$Y.3@L('1H92!P<F]X>2!D:7)E8W1O<GD@=VEL;"!H;VQD(&$@8V]P>2!O
M9B!T:&4@=C,N,2!F:6QE+"!R871H97(*(R!T:&%N(&$@<WEM;&EN:SL@=&AI
M<R!I<R!D;VYE(&)E8V%U<V4@=&AE(&9I;&4@;F%M97,@:6X@=&AI<R!S970@
M:&%D"B,@8F5E;B!E:71H97(@(E9/05]41%DB(&]R(")63T%?5U)0(B!I;B!T
M:&4@96%R;&EE<B!R96QE87-E<RP@86YD('1H90HC(&9I;&4@;F%M97,@87)E
M(&EN8VQU9&5D(&%S('!A<G0@;V8@=&AE(&1A=&$@8V]N=&5N="!I;B!E86-H
M(&9I;&4N"B,@5VAE;B!P;&%C:6YG(&$@8V]P>2!O9B!E86-H(%9/02!D871A
M(&EN=&\@82!P<F]X>2!D:7)E8W1O<GDL('1H90HC(&9I;&4@;F%M92!I;F9O
M<FUA=&EO;B!I;G-I9&4@=&AE(&9I;&4@:7,@;6]D:69I960@9F]R(&-O;G-I
M<W1E;F-Y+@H*(R!4:&4@;6%I;B!T;W!I8R!T86)L92P@(G1O<&EC<R]T9'0R
M7W1O<&EC7W)E;"YC;VUP;&5T95]A;FYO="(@:6X*(R!V,RXQ+"!I<R!C;W!I
M960@:6YT;R!T:&4@<')O>'D@(G1A8FQE<R(@9&ER96-T;W)Y(&%S('=E;&PN
M("!);B!T:&4*(R!P<F]C97-S+"!T:&4@(F9I;&5I9"(@:6YF;W)M871I;VX@
M:6X@=&AE('1A8FQE(&ES(&UO9&EF:65D(&%S(&YE961E9 HC('1O('1A:V4@
M86-C;W5N="!O9B!T:&4@86QT97)E9"!63T%?14Y'(&9I;&4@;F%M97,N"@HE
M<F5M87!$:7)S(#T@*"=A<S G(#T^("=A<W)T97AT)RP*"2 @(" @("=A<S$G
M(#T^("=A<S%T97AT)RP*"2 @(" @("=M='1K;B<@/3X@)VUT<G1E>'0G+ H)
M(" @(" @)VUT87,P)R ]/B G;71A=&5X="<L"@D@(" @(" G<V=M)R ]/B G
M<V=M;"<L"@D@(" @(" G=&MN)R ]/B G=&MN=&5X="<*"2 @(" @("D["B5R
M96UA<$5X=',@/2 H)V%S,"<@/3X@)V%S<B<L"@D@(" @(" G87,Q)R ]/B G
M87,Q)RP*"2 @(" @("=M='1K;B<@/3X@)VUT<B<L"@D@(" @(" G;71A<S G
M(#T^("=M=&$G+ H)(" @(" @)W-G;2<@/3X@)W-G;2<L"@D@(" @(" G=&MN
M)R ]/B G=&MN)PH)(" @(" @*3L*"F1I92 B66]U(&1O(&YO="!H879E('=R
M:71E('!E<FUI<W-I;VX@:6X@=&AE(&-U<G)E;G0@9&ER96-T;W)Y7&XB"B @
M("!U;FQE<W,@*" M=R B+B(@*3L*"F1I92 B56YA8FQE('1O(')E860@8V]R
M<'5S7VEN9F\O=F]A7VYA;65S+G1A8FQE("TM(')E8V]V97(@=&AI<R!F<F]M
M('1D=#)E;3,Q+G1G>EQN(@H@(" @=6YL97-S("@@+7(@(F-O<G!U<U]I;F9O
M+W9O85]N86UE<RYT86)L92(@*3L*"B,@3&]A9"!L:7-T(&]F(&-H86YG97,@
M=&\@;F%M97,@;V8@5D]!7T5.1R!F:6QE<PH*;W!E;B@@3%-4+" B/&-O<G!U
M<U]I;F9O+W9O85]N86UE<RYT86)L92(@*3L*=VAI;&4@*#Q,4U0^*0I["B @
M("!C:&]P.PH@(" @*"1O;&0L)&YE=RD@/2!S<&QI=#L*(" @("1V,G9O87LD
M;F5W?2 ]("1O;&0["GT*8VQO<V4@3%-4.PH*)&)A<V4@/2!@<'=D8#L*8VAO
M<" D8F%S93L*<')I;G0@(D%D87!T:6YG(%1$5#(@=C,@=&\@=C(@:6X@)&)A
M<V4N+BY<;B(["@HC($-R96%T92!A;'1E<FYA=&4@9&ER96-T;W)Y(&%N9"!F
M:6QE(&YA;65S(&9O<B!S9VT@86YD('1O:V5N('-T<F5A;2!D871A(&9I;&5S
M"@IF;W)E86-H("1D("@@:V5Y<R@@)7)E;6%P1&ER<R I*0I["B @("!U;FQE
M<W,@*" M9" D9" I('L*"7!R:6YT("(@+2T@=V%R;FEN9SH@)&0@9&ER96-T
M;W)Y(&ES(&UI<W-I;F<@:6X@)&)A<V4[(&UO=FEN9R!O;BXN+EQN(CL*"6YE
M>'0["B @("!]"B @("!P<FEN=" B0W)E871I;F<@86YD(&9I;&QI;F<@)')E
M;6%P1&ER<WLD9'TN+BY<;B(["B @("!M:V1I<B@@)')E;6%P1&ER<WLD9'TL
M(# W-34@*2!U;FQE<W,@*" M9" D<F5M87!$:7)S>R1D?2 I.PH@(" @8VAD
M:7(@)')E;6%P1&ER<WLD9'T["B @(" D;W)I9T1I<B ]("(N+B\D9"(["B @
M("! 9FEL97,@/2!@;',@)&]R:6=$:7)@.PH@(" @9F]R96%C:" H($!F:6QE
M<R I"B @("!["@EC:&]P.PH)*"1F:6QE:60L($!E>'0I(#T@<W!L:70H("]<
M+B\@*3L*"6EF("@@97AI<W1S*" D=C)V;V%[)&9I;&5I9'T@*2D@>PH)(" @
M("1L;FM.86UE(#T@(B1V,G9O87LD9FEL96ED?2XD<F5M87!%>'1S>R1D?2([
M"@D@(" @:68@*" D9"!E<2 B<V=M(B I('L*"0E@;&X@+7,@)&]R:6=$:7(O
M)%\@)&QN:TYA;65@.PH)(" @('T@96QS92!["@D)8'-E9" G<R\D9FEL96ED
M+R1V,G9O87LD9FEL96ED?2\G("1O<FEG1&ER+R1?(#X@)&QN:TYA;65@.PH)
M(" @('T*"7T@96QS92!["@D@(" @)&QN:TYA;64@/2 B)&9I;&5I9"XB.PH)
M(" @("1L;FM.86UE("X]('-H:69T*"! 97AT("D@+B B+B(@:68@*"! 97AT
M(#T](#(@*3L*"2 @(" D;&YK3F%M92 N/2 B)')E;6%P17AT<WLD9'TB.PH)
M(" @(&!L;B M<R D;W)I9T1I<B\D7R D;&YK3F%M96 ["@E]"B @("!]"B @
M("!C:&1I<B D8F%S93L*?0H*(R!#<F5A=&4@82 B=&%B;&5S(B!D:7)E8W1O
M<GD@86YD(&9I;&P@:70@=VET:"!A;&P@8F]U;F1A<GD@=&%B;&5S"@I 8FYD
M<R ](&!L<R M9" J7V)N9& ["F1I92 B3F\@5$14,B!V,R!B;W5N9&%R>2!T
M86)L92!D:7)E8W1O<FEE<R!I;B D8F%S95QN(@H@(" @=6YL97-S("@@0&)N
M9',@/B Q("D["@IM:V1I<B@@(G1A8FQE<R(L(# W-34@*3L*8VAD:7(@(G1A
M8FQE<R(["F9O<F5A8V@@)&0@*"!K97ES*" E<F5M87!$:7)S("DI"GL*(" @
M(&YE>'0@:68@*" D9"!E<2 B<V=M(B I.PH@(" @)&]R:6=$:7(@/2 B+BXO
M)&0B+B)?8FYD(CL*(" @('5N;&5S<R H("UD("1O<FEG1&ER("D@>PH)<')I
M;G0@(B M+2!W87)N:6YG.B D>V1]7V)N9"!D:7)E8W1O<GD@:7,@;6ES<VEN
M9R!I;B D8F%S93L@;6]V:6YG(&]N+BXN7&XB.PH);F5X=#L*(" @('T*(" @
M('!R:6YT("),:6YK:6YG(&)O=6YD87)Y('1A8FQE<R!F;W(@)&0N+BY<;B([
M"B @("! 9FEL97,@/2!@;',@)&]R:6=$:7)@.PH@(" @9F]R96%C:" H($!F
M:6QE<R I"B @("!["@EC:&]P.PH)*"1F:6QE:60L0&5X="D@/2!S<&QI="@@
M+UPN+R I.PH):68@*"!E>&ES=',H("1V,G9O87LD9FEL96ED?2 I*2!["@D@
M(" @)&QN:TYA;64@/2 B)'8R=F]A>R1F:6QE:61]+F)N9"1R96UA<$5X='-[
M)&1](CL*"2 @("!@<V5D("=S+R1F:6QE:60O)'8R=F]A>R1F:6QE:61]+R<@
M)&]R:6=$:7(O)%\@/B D;&YK3F%M96 ["@E](&5L<V4@>PH)(" @("1L;FM.
M86UE(#T@(B1F:6QE:60N(CL*"2 @(" D;&YK3F%M92 N/2!S:&EF="@@0&5X
M=" I("X@(BXB(&EF("@@0&5X=" ]/2 R("D["@D@(" @)&QN:TYA;64@+CT@
M(F)N9"1R96UA<$5X='-[)&1](CL*"2 @("!@;&X@+7,@)&]R:6=$:7(O)%\@
M)&QN:TYA;65@.PH)?0H@(" @?0I]"@HC(&QA<W0@<W1E<"!F;W(@(G1A8FQE
M<R(@9&ER96-T;W)Y.B @9F]L9"!I;B!T:&4@;6%I;B!T;W!I8R!T86)L90H*
M<')I;G0@(D%D9&EN9R!T;W!I8U]R96QE=F%N8V4N=&%B;&4N+BY<;B(["F]P
M96XH(%1"3"PB/"XN+W1O<&EC<R]T9'0R7W1O<&EC7W)E;"YC;VUP;&5T95]A
M;FYO="(@*3L*;W!E;B@@3U54+"(^=&]P:6-?<F5L979A;F-E+G1A8FQE(B I
M.PIW:&EL92 H/%1"3#XI"GL*(" @(&EF("@@+V9I;&5I9#TH6UQD7UTK5D]!
M7T5.1RDO("D@>PH))&9I;&5I9" ]("0Q.PH)<R\D9FEL96ED+R1V,G9O87LD
M9FEL96ED?2\@:68@*"!E>&ES=',H("1V,G9O87LD9FEL96ED?2 I*3L*(" @
M('T*(" @('!R:6YT($]55#L*?0IC;&]S92!40DP["F-L;W-E($]55#L*"F-H
?9&ER("1B87-E.PIP<FEN=" B86QL(&1O;F5<;B(["F-L
 
end

Last updated Wed Sep 22 10:26:04 1999