
To: Doug Oard <oard@glue.umd.edu>
From: George Doddington <doddington@nist.gov>
Subject: Re: Proposed change to the topic tracking task
Date: Wed, 09 Aug 2000 18:20:39 -0400

Doug Oard wrote:
>
> A few additional thoughts on this:
>
> 1) We could conceivably run 1500 conditions, but getting to that scale
> would detract focus from other aspects of system tuning.
>
> 2) Of more concern, the present output is almost 2 MB per condition
> (uncompressed). At 1500 topics, we'd be in the gigabyte range for output.
> So in addition to the space optimization we would need to do internally to
> fit in our present hardware configuration, getting the results to NIST
> becomes a nontrivial task.
>
> 3) Topic-weighted scoring presently takes hours for 60 conditions, which
> is already a limiting factor on our development efforts. Has thought been
> given to whether scoring 1500 conditions is even feasible with the present
> scoring software's architecture?
>
> 4) Building on that last point, have we agreed on what kind of averaging
> makes sense if there are variable numbers of conditions per
> topic? Condition-weighted? Topic-weighted?
>
> Doug

Your note exhibits term warp. In points 1) through 3) your word
"condition" seems to translate to the TDT term "topic". In point
4) your word "condition" would seem to translate to the TDT term
"training story". That being the case, your concern about an
excessive number of topics is well taken. But not to worry:
there is no need to run all possible sets of training stories.
The number of sets may be controlled to taste, balancing the
desire to explore different values of Nt and to achieve good
statistical significance against the limited computational
facilities of some sites.
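
To make the averaging question in point 4) concrete, here is a
minimal sketch of the two options. It assumes each topic has been
run with one or more sets of training stories and that each run
yields a single cost. The function names and the cost values are
hypothetical illustrations, not part of the TDT scoring software.

```python
from statistics import mean

def topic_weighted(costs_by_topic):
    """Average within each topic first, then across topics,
    so every topic counts once regardless of how many runs it has."""
    return mean(mean(costs) for costs in costs_by_topic.values())

def condition_weighted(costs_by_topic):
    """Pool every (topic, training-set) run into one average,
    so topics with more runs carry more weight."""
    return mean(c for costs in costs_by_topic.values() for c in costs)

# Hypothetical costs: topic "A" run with 3 training sets, topic "B" with 1.
example = {"A": [0.25, 0.25, 0.25], "B": [0.75]}

print(topic_weighted(example))      # 0.5
print(condition_weighted(example))  # 0.375
```

The two averages differ whenever topics have unequal numbers of
runs, which is why the choice matters if the number of training
sets per topic is allowed to vary.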

Regarding your point 3), there should be no concern about limits
to development efforts, because the processing load that we are
discussing applies only to the evaluation, not to your research
and development.

--
George Doddington at NIST: doddington@nist.gov or 301/975-3261

Last updated Tue Sep 19 14:30:55 2000