
To: Charles Wayne <clwayne@snap.org>,
From: Alvin Martin <alvin.martin@nist.gov>
Subject: [Fwd: RE: TDT3, a variety of issues]
Date: Wed, 10 Feb 1999 09:50:08 -0500

The message below, sent yesterday, proved too large to be received by
most recipients. My apologies to Rich Schwartz and others whose mail
readers were unable to handle it.

I have removed the big attached documents. They are now available at:

ftp://jaguar.ncsl.nist.gov/tdt98/topic.doc
ftp://jaguar.ncsl.nist.gov/tdt98/topic.ps



-------- Original Message --------
Subject: RE: TDT3, a variety of issues
Date: Tue, 09 Feb 1999 17:12:20 -0500
From: Alvin Martin <alvin.martin@nist.gov>
To: Alvin Martin <alvin.martin@nist.gov>,"doddington@nist.gov"
<doddington@nist.gov>,"'Rich Schwartz'"
<schwartz@bbn.com>,"tdt-distrib@unagi.cis.upenn.edu"
<tdt-distrib@unagi.cis.upenn.edu>
CC: Dave Pallett <david.pallett@nist.gov>
References: <36C08DCF.C9D292D7@nist.gov>

The attached document, in Microsoft Word and PostScript formats, looks at
the number of stories per topic for the training, development, and
evaluation sets, as promised by George.

Does anyone have performance scores by topic for the same system run on
both the development and the evaluation data? I still need this to consider
performance-based differences.

> -------- Original Message --------
> Subject: RE: TDT3, a variety of issues
> Date: Mon, 8 Feb 1999 10:57:09 -0800
> From: George Doddington <doddington@msn.com>
> Reply-To: "doddington@nist.gov" <doddington@nist.gov>
> To: "'Rich Schwartz'" <schwartz@bbn.com>
> CC: "tdt-distrib@unagi.cis.upenn.edu" <tdt-distrib@unagi.cis.upenn.edu>
>
> We always have a huge problem about vast differences between data
> sets. We tune to one and then find that the next set is completely
> different in that the criteria used have drifted -- even though not
> officially. So we see 4000 on-topic stories for Jan-Feb, 500 for Mar-Apr,
> and 1800 for May-June. These differences are clearly not due to random
> sampling, because there were plenty of topics to keep the differences
> smaller than that. It is because the types of topics and the criteria
> for inclusion of stories obviously drifted -- even if it was because of
> different people.
>
> Alvin Martin has applied several statistical tests to the distribution
> of the number of stories for topics and has found no statistically
> significant difference between the TDT2 devset and eval set, even
> at the relatively low confidence level of 90%. He is also looking
> at topic tracking performance differences and will distribute his
> findings to you (all) soon.
> --
> George Doddington in Orinda, CA. doddington@nist.gov 925/631-6628
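
The messages above do not say which statistical tests were applied to the
per-topic story counts. As a rough illustration only, the sketch below runs
one possible test, a two-sided permutation test on the difference in mean
stories per topic, and checks the result against the 90% confidence level
mentioned above. The function name permutation_test, the choice of test,
and the story counts are all assumptions made for this example; they are
not taken from the TDT2 data or from the attached document.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the fraction of random relabelings
    whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical stories-per-topic counts (NOT real TDT2 numbers).
dev_counts = [3, 12, 45, 7, 150, 22, 9, 61, 5, 30]
eval_counts = [4, 18, 38, 11, 120, 25, 6, 70, 8, 27]

p_value = permutation_test(dev_counts, eval_counts)
# A 90% confidence level corresponds to rejecting when p < 0.10.
significant_at_90 = p_value < 0.10
print(f"p = {p_value:.3f}, significant at 90%: {significant_at_90}")
```

A permutation test is used here because per-topic story counts tend to be
heavily skewed, and it makes no assumption about the shape of the
distribution. A rank-based or goodness-of-fit test would be an equally
reasonable choice.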

Last updated Thu May 13 09:28:14 1999