To: "doddington@nist.gov" <doddington@nist.gov>,
From: Alvin Martin <alvin.martin@nist.gov>
Subject: Re: TDT3, a variety of issues
Date: Tue, 16 Feb 1999 16:05:16 -0500
Ralf Brown has shared with me some of his performance results for the TDT
tracking task, for which I thank him. This has enabled me to look at topic
performance scores based on the defined cost function as promised by George.
A discussion of this, in Microsoft Word or PostScript form, is now available at:
ftp://jaguar.ncsl.nist.gov/tdt98/topic2.doc
ftp://jaguar.ncsl.nist.gov/tdt98/topic2.ps
The previous discussion, based only on the numbers of stories per topic, is
similarly available at:
ftp://jaguar.ncsl.nist.gov/tdt98/topic.doc
ftp://jaguar.ncsl.nist.gov/tdt98/topic.ps
Alvin Martin wrote:
> The attached document, in Microsoft Word or postscript forms, looks at the
> numbers of stories by topic for the training, development, and evaluation
> sets as promised by George.
>
> Does anyone have performance scores by topic for the same system run on both
> the development and the evaluation data? I still need this to consider
> performance based differences.
>
> > -------- Original Message --------
> > Subject: RE: TDT3, a variety of issues
> > Date: Mon, 8 Feb 1999 10:57:09 -0800
> > From: George Doddington <doddington@msn.com>
> > Reply-To: "doddington@nist.gov" <doddington@nist.gov>
> > To: "'Rich Schwartz'" <schwartz@bbn.com>
> > CC: "tdt-distrib@unagi.cis.upenn.edu" <tdt-distrib@unagi.cis.upenn.edu>
> >
> > We always have a huge problem about vast differences between data
> > sets. We tune to one and then find that the next set is completely
> > different in that the criteria used have drifted -- even though not
> > officially. So we see 4000 ontopic stories for Jan-Feb, 500 for Mar-Apr,
> > and 1800 for May-June. These differences are clearly not due to random
> > sampling, because there were plenty of topics to keep the differences
> > smaller than that. It is because the types of topics and the criteria for
> > inclusion of stories obviously drifted -- even if it was because of
> > different people.
> >
> > Alvin Martin has applied several statistical tests to the distribution
> > of the number of stories for topics and has found no statistically
> > significant difference between the TDT2 devset and eval set, even
> > at the relatively low confidence level of 90%. He is also looking
> > at topic tracking performance differences and will distribute his
> > finding to you (all) soon.
> > --
> > George Doddington in Orinda, CA. doddington@nist.gov 925/631-6628
>
> ------------------------------------------------------------------------
> Name: topic.doc
> topic.doc Type: Microsoft Word Document (application/msword)
> Encoding: base64
>
> Name: topic.ps
> topic.ps Type: Postscript Document (application/postscript)
> Encoding: 7bit
--
Alvin F. Martin
National Institute of Standards and Technology
100 Bureau Drive Stop 8940
Gaithersburg, MD 20899-8940
E-mail: alvin.martin@nist.gov
Phone: 301/975-3169
FAX: 301/670-0939
Last updated Thu May 13 09:28:15 1999