To: Mark Liberman <myl@unagi.cis.upenn.edu>
From: Rich Schwartz <schwartz@bbn.com>
Subject: Re: any tdt svm experiments [from Christine.Piatko@jhuapl.edu]
Date: Sun, 20 Oct 2002 22:41:02 -0400 (EDT)

Christine,

The reason SVMs would not work well for this task is that there are
generally very few training samples (from 1 to 4). SVMs are designed to
find very complex differences, so it is very easy for an SVM to think it
sees a distinction based on any one word, or any accidental combination of
words, that happens to appear in the on-topic training documents but not
in the off-topic ones.
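
To make that concrete, here is a minimal sketch in Python. The toy stories
and the helper name `separating_words` are made up for illustration and do
not come from any TDT system. With a single on-topic training story, every
word that appears in it but in none of the off-topic stories separates the
training set perfectly on its own, so nothing in the training data tells
the topical words apart from the accidental ones.

```python
def separating_words(on_topic, off_topic):
    """Return words found in every on-topic doc and in no off-topic doc.

    Each such word, taken alone, classifies the training set perfectly.
    """
    on_sets = [set(doc.lower().split()) for doc in on_topic]
    off_vocab = set()
    for doc in off_topic:
        off_vocab.update(doc.lower().split())
    common = set.intersection(*on_sets)
    return sorted(common - off_vocab)


# Hypothetical toy data: one on-topic story, two off-topic stories.
on_topic = ["earthquake strikes coastal city on tuesday morning"]
off_topic = [
    "city council votes on budget",
    "election results announced in the morning",
]

print(separating_words(on_topic, off_topic))
```

In this toy case "tuesday" separates the training data exactly as well as
"earthquake" does, which is the kind of accidental distinction described
above.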

A measure like TF/IDF or a simple unigram probability distribution
cannot give great weight to any particular word. It just compares the
TF/IDF scores or the distributions. Even the BBN system, which used EM
to sharpen the distribution, still ends up with a probability
distribution, so it can't do anything too extreme.
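
As a rough illustration of the TF/IDF comparison, the sketch below scores
stories against a topic by cosine similarity of TF/IDF vectors. It is not
the BBN system or any actual TDT tracker: the smoothed IDF formula, the
function names `tfidf_vector` and `cosine`, and the toy stories are all
assumptions chosen for the example. The score is bounded and every shared
word contributes to it, so no single word can dominate the way it can in a
learned discriminative model.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Map each word to tf * idf, using a smoothed idf (an assumption here)."""
    counts = Counter(tokens)
    return {
        w: c * (math.log((n_docs + 1) / (doc_freq.get(w, 0) + 1)) + 1.0)
        for w, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: one topic story and two candidate stories.
topic = "earthquake strikes coastal city".split()
story_a = "earthquake damages coastal city".split()
story_b = "city council votes on budget".split()

docs = [topic, story_a, story_b]
# Document frequency: number of docs in which each word appears.
doc_freq = Counter(w for doc in docs for w in set(doc))

topic_vec = tfidf_vector(topic, doc_freq, len(docs))
score_a = cosine(topic_vec, tfidf_vector(story_a, doc_freq, len(docs)))
score_b = cosine(topic_vec, tfidf_vector(story_b, doc_freq, len(docs)))
print(score_a, score_b)
```

Here story_a, which shares three words with the topic, scores higher than
story_b, which shares only "city".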

Perhaps if you had hundreds of training examples (as in some
filtering problems), the SVM would be better than simple unigram
distributions at learning a more complex model to distinguish those
hundreds of positive examples from the documents that merely share some
of the same words.

--Rich
========================================================
On Fri, 18 Oct 2002, Mark Liberman wrote:

> Date: Fri, 18 Oct 2002 14:05:39 -0400
> From: Mark Liberman <myl@unagi.cis.upenn.edu>
> To: David Graff <graff@unagi.cis.upenn.edu>
> Cc: tdt-distrib@unagi.cis.upenn.edu
> Subject: Re: any tdt svm experiments [from Christine.Piatko@jhuapl.edu]
>
>
> Mike Schultz and I did some experiments with SVMs for TDT tracking.
>
> The results were significantly inferior to a TF/IDF baseline.
>
> I believe that we reported on this at one of the TDT meetings several years
> ago.
>
> Of course there are many details that could have been done differently,
> and so perhaps a different version of SVMs would have done better, but
> our evaluation was that the technique itself is not particularly
> promising for this application.
>
> -Mark Liberman
>
> >------- Forwarded Message
> >
> >Date: Fri, 18 Oct 2002 11:50:29 -0400 (EDT)
> >From: piatko@aplcomm.jhuapl.edu
> >Subject: any tdt svm experiments
> >To: tdt-distrib@ldc.upenn.edu
> >Cc: Christine.Piatko@jhuapl.edu
> >
> >Message-id: <0H4600J03PC5DO@aplcomm.jhuapl.edu>
> >MIME-version: 1.0
> >Content-type: TEXT/PLAIN; CHARSET=US-ASCII
> >Content-transfer-encoding: 7BIT
> >
> >
> >I have been trying to find out who (if anyone) has investigated the use of
> >SVMs (Support Vector Machines) for TDT data and tasks. I've looked on the
> >TDT web site but didn't quite find any related papers yet.
> >
> >James Allen suggested posting to this list to ask you as
> >the TDT community if you have looked at this.
> >
> >I would appreciate any replies be sent directly to me at
> >Christine.Piatko@jhuapl.edu
> >as I am not on the tdt-distrib list at this time.
> >
> >Many thanks in advance for any information you can provide!
> >Christine
> >Christine.Piatko@jhuapl.edu
> >
> >Christine Piatko, Ph.D.
> >Research and Technology Development Center
> >The Johns Hopkins University Applied Physics Laboratory
> >11100 Johns Hopkins Road
> >Laurel, Maryland 20723-6099
> >Christine.Piatko@jhuapl.edu
> >Phone: (443)778-6584
> >Fax: (443)778-6904
> >
> >------- End of Forwarded Message
> >
> >
> >
> >-------------------------------------------------------------
> >To unsubscribe from tdt-distrib, email majordomo@ldc.upenn.edu
> >with "unsubscribe tdt-distrib" in the body of the message.
>
> --
>
> -Mark Liberman
>

Last updated Mon Nov 11 14:16:28 2002