(321) previous ~ index ~ next

To: TDT Distrib <tdt-distrib@ldc.upenn.edu>
From: Jonathan Fiscus <jonathan.fiscus@nist.gov>
Subject: Re: 2001 TDT Evaluation Data Use Outline
Date: Tue, 29 May 2001 11:35:43 -0400

Folks,

I have not heard many remarks concerning TDT corpus usage for this
fall's TDT evaluation. Are there any objections to options 3 and 4
below? I thought they would be controversial, but I like the direction
they would take TDT. I'd like to put a definitive data usage section in
a revised evaluation plan so that all researchers are on the same page.
Feedback?

Also, at James' request, I've re-coded the tracking evaluation script to
accept story decisions that identify stories by docno (rather than by
source file/recid story coordinates). The aim of this re-coding is to
make the code more TREC-friendly, in keeping with the latest release of
the TDT corpora.
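
To illustrate the change in story identification, here is a minimal
sketch of converting old-style decisions to docno-keyed decisions. It is
not the released scoring code: the record layouts, field names, and the
(source file, recid) -> docno lookup table are all hypothetical
assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class CoordDecision(NamedTuple):
    """Hypothetical old-style decision: story located by (source file, recid)."""
    topic: str
    source_file: str
    recid: int
    decision: bool   # True means the system said YES (on-topic)
    score: float


class DocnoDecision(NamedTuple):
    """Hypothetical new-style decision: story located by its docno."""
    topic: str
    docno: str
    decision: bool
    score: float


def to_docno(decisions: List[CoordDecision],
             index: Dict[Tuple[str, int], str]) -> List[DocnoDecision]:
    """Re-key each decision by docno using an assumed (source_file, recid) -> docno index."""
    converted = []
    for d in decisions:
        key = (d.source_file, d.recid)
        if key not in index:
            # Refuse to guess: an unmapped story indicates a corpus/index mismatch.
            raise KeyError(f"no docno for source file {d.source_file!r}, recid {d.recid}")
        converted.append(DocnoDecision(d.topic, index[key], d.decision, d.score))
    return converted
```

Raising on an unmapped story, rather than silently dropping it, keeps a
mismatch between system output and corpus release from quietly changing
the scores.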

I'll be releasing the software later this week once I have done some
more testing. Later, I plan to modify the other tasks' evaluation
scripts as requested (I will probably do topic detection next).

Jon

Jonathan Fiscus wrote:
>
> TDT Evaluation Participants:
>
> Since we'll be using the TDT3 corpus for this fall's evaluation and half
> of the TDT3 topics will be released for training, we should modify the
> tasks or corpora slightly, both to make this year's TDT evaluation
> sufficiently different and to keep systems from being tuned to the TDT3
> data.
>
> Previous discussions have identified four potential ways to
> accomplish this. I believe that options 1 and 2 are going to happen;
> options 3 and 4 are certainly more controversial. Also, the training and
> development data should be specified for each evaluation task.
>
> The rest of this email outlines the corpus changes and potential rules
> for corpus usage by evaluation task. I'd like to get consensus on these
> issues so that the evaluation plan can be as specific as possible.
>
> Regards,
> Jon
>
> Corpus modifications:
> ---------------------
>
> 1: Release the newswire texts for the intervening three months between
> TDT2 and TDT3 for the evaluation. (I'll call this TDT2.5 for lack of a
> better name.) IBM agreed that this would be a good way to "confuse" :)
> the detection systems. If we did this, all TDT2.5 stories would need to
> be excluded from scoring during the evaluation since there are no topic
> annotations for that data.
>
> 2: Add additional newswire data from the TDT3 epoch. The published TDT3
> corpus contains a subsample of the available newswire data. This
> option would publish all of the newswire data.
>
> 3: Declare BRIEF documents to be scorable, as either on-topic or
> off-topic. (I'm in favor of on-topic). This will give a more
> "realistic" quality to the evaluation, since right now we're closing our
> eyes to a serious issue.
>
> 4: Declare Non-News stories to be scorable as off-topic stories. Again,
> this gives a more realistic data set.
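
Taken together, options 1, 3 and 4 amount to a per-story rule for what
the reference answer is when scoring a topic. The following is a minimal
sketch of that rule, assuming each story carries its epoch, a story type
("NEWS" or "NON-NEWS") and a per-topic annotation of "YES", "BRIEF" or
"NO"; the function and parameter names are hypothetical, and the
defaults reflect the proposals (BRIEF scored as on-topic).

```python
from typing import Optional


def reference_label(epoch: str, story_type: str, topic_label: str,
                    score_briefs: bool = True, brief_on_topic: bool = True,
                    score_non_news: bool = True) -> Optional[bool]:
    """Return True (on-topic), False (off-topic) or None (excluded from scoring).

    Assumed inputs: epoch is "TDT2.5" or "TDT3"; story_type is "NEWS" or
    "NON-NEWS"; topic_label is the story's annotation for this topic.
    """
    # Option 1: TDT2.5 has no topic annotations, so it is never scored.
    if epoch == "TDT2.5":
        return None
    # Option 4: Non-News stories become scorable, always as off-topic.
    if story_type == "NON-NEWS":
        return False if score_non_news else None
    # Option 3: BRIEF stories become scorable, as on-topic or off-topic.
    if topic_label == "BRIEF":
        return brief_on_topic if score_briefs else None
    return topic_label == "YES"
```

Setting score_briefs=False and score_non_news=False reproduces the
current practice of excluding both kinds of story.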
>
> Evaluation Task Data Usage Guidelines:
> --------------------------------------
>
> For each evaluation task, I propose the following test scenarios:
>
> Segmentation:
> Same status as last year.
>
> Training and Development Test Corpora: TDT2
> Testing Corpora: TDT3
>
> Topic Tracking:
> Release half of the TDT3 topics for system development. The TDT3 topics
> will be divided into two sets, balanced by topic year (1999 and 2000) and
> by topic size.
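
One way to produce such a split is sketched below. It is only an
illustration, not the procedure that will actually be used: within each
topic year, topics are ranked by size (assumed here to be the number of
on-topic stories) and dealt out in serpentine order, so that neither half
systematically receives the larger topic of each pair. The Topic record
and split_topics name are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Topic(NamedTuple):
    topic_id: str
    year: int   # topic set: 1999 or 2000
    size: int   # assumed size measure: number of on-topic stories


def split_topics(topics: List[Topic]) -> Tuple[List[Topic], List[Topic]]:
    """Split topics into (exposed, unexposed) halves balanced by year and size."""
    exposed: List[Topic] = []
    unexposed: List[Topic] = []
    for year in sorted({t.year for t in topics}):
        # Largest topics first; ties broken by id so the split is reproducible.
        ranked = sorted((t for t in topics if t.year == year),
                        key=lambda t: (-t.size, t.topic_id))
        for i, t in enumerate(ranked):
            # Serpentine order E, U, U, E, E, U, U, E, ... within each year.
            (exposed if i % 4 in (0, 3) else unexposed).append(t)
    return exposed, unexposed
```

For four topics of sizes 40, 30, 20 and 10 in one year, this puts 40 and
10 in one half and 30 and 20 in the other, giving equal totals.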
>
> The exposed TDT3 topics and the TDT3 corpus can only be used as a
> development test set, i.e. participants can tune systems to the
> development test topics, but not use the corpus for background
> statistics, etc.
>
> During the evaluation, test on all TDT3 topics and give systems access
> to the TDT2.5 corpus for the training epoch. Report results using the
> unexposed evaluation topics, and use the performance difference between
> the exposed and unexposed topics to gauge the "training" effect.
>
> Training Corpora: TDT2, all topics
> Development Test Corpora: TDT3, development topic subset
> Testing Corpora: TDT2.5 and the Augmented TDT3 corpus, all TDT3 topics
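
The "training" effect could be gauged along the following lines. This is
a minimal sketch, assuming per-topic miss and false-alarm probabilities
are available and assuming the detection cost parameters used in recent
TDT evaluations (C_Miss = 1.0, C_FA = 0.1, P_target = 0.02); the function
names are hypothetical. Because the normalized cost is linear in P_Miss
and P_FA, averaging per-topic costs gives the same value as computing the
cost from topic-averaged error rates.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple

# Assumed TDT detection cost parameters.
C_MISS, C_FA, P_TARGET = 1.0, 0.1, 0.02


def norm_cdet(p_miss: float, p_fa: float) -> float:
    """Normalized detection cost for one topic."""
    cdet = C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1 - P_TARGET)
    return cdet / min(C_MISS * P_TARGET, C_FA * (1 - P_TARGET))


def training_effect(per_topic: Dict[str, Tuple[float, float]],
                    exposed: Iterable[str]) -> float:
    """Mean cost on unexposed topics minus mean cost on exposed topics.

    per_topic maps topic id -> (P_Miss, P_FA). A positive value means the
    exposed (development) topics scored better, suggesting a training effect.
    """
    exposed_set = set(exposed)
    exp = [norm_cdet(*e) for t, e in per_topic.items() if t in exposed_set]
    unexp = [norm_cdet(*e) for t, e in per_topic.items() if t not in exposed_set]
    if not exp or not unexp:
        raise ValueError("need at least one exposed and one unexposed topic")
    return mean(unexp) - mean(exp)
```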
>
> Topic Detection:
> There are two possibilities here:
>
> 1) Follow the regime described above for tracking. There is a concern
> that training on the exposed TDT3 topics will distort the performance on
> the unexposed topics. Therefore, I think option 2 is better.
>
> 2) Restrict training and development to TDT2, like last year, and
> evaluate using TDT2.5 and the Augmented TDT3 corpus.
>
> First Story Detection:
>
> Same as Topic Detection.
>
> Link Detection:
>
> Same as Topic Tracking, except generate a new set of index files
> for the unexposed topics.
-------------------------------------------------------------
To unsubscribe from tdt-distrib, email majordomo@ldc.upenn.edu
with "unsubscribe tdt-distrib" in the body of the message.

Last updated Wed Aug 22 16:07:32 2001