(227) previous ~ index ~ next
To: tdt-distrib@unagi.cis.upenn.edu
From: Jon_Yamron@Dragonsys.com
Subject: Re: Clarification of tracking
Date: Thu, 5 Nov 1998 14:00:21 -0500
Indeed, I meant that any material encountered SO FAR could be used in the
current decision, as Jaime clarifies.
- Jon
Jaime Carbonell <jgc@NL.CS.CMU.EDU> on 11/05/98 01:25:19 PM
To: Jon Yamron/Dragon Systems USA
cc: tdt-distrib@unagi.cis.upenn.edu
Subject: Re: Clarification of tracking
Folks,
Just as y'all were coming to such nice agreement, and George was giving
his blessing, there is one proverbial fly in the ointment, to wit:
Jon Yamron wrote on Thu, 5 Nov 1998 11:03:09 -0500:
> 3) Of course, the actual test material (May-June data) encountered during
> processing may be used in any way we see fit, including training the topic
> model, as long as it is done in an automatic fashion.
That's probably not quite how Jon meant it. For the tracking task, we
are NOT allowed to use FUTURE information before making a decision on
the topicality of any given story -- even unlabeled future stories
untouched by human hands. In other words, we may use any test material
up to and including the story on which we must make a judgement. But
not next-day or next-month stories. The latter would allow
us to build better-tuned models, do batch full-data unsupervised learning,
etc., but it would not be very realistic. A better phrasing of the above
(if my understanding of the Tracking task is correct) would be that
we may use the test material up to the current decision point in any
way we see fit, so long as it is done in an automatic fashion.
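To make the constraint concrete, here is a minimal sketch (not any site's actual tracker) of a causal tracking loop. The function name track_causally and the decide callback are hypothetical; decide stands in for whatever automatic topic model a system uses, and it is only ever handed the stories up to and including the one being judged.

```python
from typing import Callable, List, Sequence


def track_causally(
    stories: Sequence[str],
    decide: Callable[[Sequence[str], int], bool],
) -> List[bool]:
    """Make one topicality decision per story, in arrival order.

    `decide` is a hypothetical scoring rule (an assumption, not part of
    the TDT spec). It receives only the prefix stories[: i + 1], i.e. all
    test material up to and including story i, plus the index i. It may
    use that prefix in any automatic way (e.g. adapt its topic model),
    but it can never see stories that arrive later.
    """
    decisions: List[bool] = []
    for i in range(len(stories)):
        visible = stories[: i + 1]  # test material up to the decision point
        decisions.append(decide(visible, i))
    return decisions
```

For example, a rule that asks whether story "c" has been seen yet answers False for the first two stories of ["a", "b", "c"], because at those decision points "c" is still in the future.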
A similar argument goes for detection, where the lookahead in the
test data is bounded to the current file, 10 files, or 100 files.
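The detection lookahead can be sketched the same way. This assumes (my reading, not a quoted rule) that a lookahead of D files means a system deciding on the current file may consult all earlier files, the current file, and the next D - 1 files, so D = 1 is "current file only". The name visible_files and the parameter deferral are hypothetical.

```python
from typing import List, Sequence


def visible_files(
    files: Sequence[List[str]], current: int, deferral: int
) -> List[List[str]]:
    """Return the files a detector may consult when deciding on `current`.

    Assumption: a deferral of D allows the current file plus the next
    D - 1 files (D = 1 means only the current file). Everything before
    the current file is always allowed; nothing beyond the window is.
    """
    if deferral < 1:
        raise ValueError("deferral must be at least 1 (the current file)")
    end = min(len(files), current + deferral)  # clip at end of test data
    return list(files[:end])
```

With 20 files, deciding on file 0 with D = 10 exposes files 0 through 9, while deciding on file 15 with the same D exposes all 20, since the window is clipped at the end of the test data.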
--Jaime
Last updated Fri Nov 6 15:29:23 1998