<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Please report your SVM&#8217;s kernel!</title>
	<atom:link href="https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/feed/" rel="self" type="application/rss+xml" />
	<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/</link>
	<description>cognition, language, social systems; statistics, visualization, computation</description>
	<lastBuildDate>Tue, 25 Nov 2025 13:11:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: PadmaSree</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-70915</link>
		<dc:creator>PadmaSree</dc:creator>
		<pubDate>Thu, 14 Jul 2011 05:22:24 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-70915</guid>
		<description><![CDATA[If you have materials, could you please send them to me? :)]]></description>
		<content:encoded><![CDATA[<p>If you have materials, could you please send them to me? :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: PadmaSree</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-70913</link>
		<dc:creator>PadmaSree</dc:creator>
		<pubDate>Thu, 14 Jul 2011 05:21:27 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-70913</guid>
		<description><![CDATA[I am new to SVMs. I need to understand the purpose of a kernel, and what different types of kernels we can use for that.
Could you please help me?]]></description>
		<content:encoded><![CDATA[<p>I am new to SVMs. I need to understand the purpose of a kernel, and what different types of kernels we can use for that.<br />
Could you please help me?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brendano</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-55874</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Thu, 03 Mar 2011 23:36:24 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-55874</guid>
		<description><![CDATA[@Bob -- preaching to the choir on logistic normals :)

http://www.cs.cmu.edu/~scohen/jmlr10pgcovariance.pdf

also, jacob and i used logistic normal priors for a topic model but it was perhaps less central than shay&#039;s or blei/lafferty http://brenocon.com/eisenstein_oconnor_smith_xing.emnlp2010.geographic_lexical_variation.pdf]]></description>
		<content:encoded><![CDATA[<p>@Bob &#8212; preaching to the choir on logistic normals :)</p>
<p><a href="http://www.cs.cmu.edu/~scohen/jmlr10pgcovariance.pdf" rel="nofollow">http://www.cs.cmu.edu/~scohen/jmlr10pgcovariance.pdf</a></p>
<p>also, jacob and i used logistic normal priors for a topic model but it was perhaps less central than shay&#8217;s or blei/lafferty <a href="http://brenocon.com/eisenstein_oconnor_smith_xing.emnlp2010.geographic_lexical_variation.pdf" rel="nofollow">http://brenocon.com/eisenstein_oconnor_smith_xing.emnlp2010.geographic_lexical_variation.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-55872</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Thu, 03 Mar 2011 23:18:32 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-55872</guid>
		<description><![CDATA[@Shay:

1.  There are plenty of solid logistic regression implementations.  Start with something like R, which has it built in for small problems, or use any of the larger scale L-BFGS or stochastic gradient versions that are out there.  Both SVMs and logistic regression are almost trivial to implement with SGD.  If you&#039;re an NLP person, you may need to look for &quot;max entropy&quot; -- the field got off on a confusing footing terminologically.  

You can also use just about any neural networks package.  If you use a logistic sigmoid activation function with a single output neuron, you have standard logistic regression (this is how it&#039;s presented in MacKay&#039;s text, for instance).  Backpropagation is just stochastic gradient descent.  The so-called &quot;deep belief nets&quot; are like stacked logistic regressions.  

2.  The primary reason to use logistic regression is that it gives you probabilities out.  The error function is set up so that you&#039;re effectively minimizing probabilistic predictive loss.  I said this in my original comment above.  

A second reason to use it is that it has a natural generalization to K-way classifiers.  Last I saw, SVMs tended to estimate a bunch of binary problems and then combine classifiers, which seems rather clunky.

Another reason probabilities are convenient is that it means a logistic regression can play nicely with other models and in other model components.   For instance, Lafferty and Blei used a logistic normal distribution as a prior on document topics in an extension of LDA in order to deal with correlated topics.  You see the logistic transform all over in stats, including in many latent variable models such as the item-response model or the Bradley-Terry model for ranking contests or comparisons.

3.  The error function is complex in logistic regression.  On the latent scale (i.e., as in Albert and Chib&#039;s formulation of logistic regression), the error is normal.  But on the actual prediction scale, which is in the [0,1] range, this error gets transformed by the inverse logit.   

You can change the sigmoid curve (i.e., with the cumulative normal, you get probit regression instead of logit).  You can also change the error term in the latent regression.  

The error&#039;s not estimated as a free parameter in logistic regression.  You fit the deterministic part of the regression to minimize error, just like in SVMs, then the actual error&#039;s estimated from the residuals (difference between predictions and true values).

In SVMs, the error is the simpler hinge loss function (actually, it&#039;s not differentiable, which is more complex, but at least it&#039;s linear).

It&#039;s pretty easy to extend logistic regression to structured prediction problems like taggers or parsers.  I&#039;d imagine you could do that with SVMs, too.  

A.  Yes, you can kernelize logistic regression, too.  It&#039;s less common, though.   The problem with operating in kernelized dual space is that it doesn&#039;t scale well with number of training instances.  So there&#039;s been work on pruning the support in the context of SVMs.  

PS:  We might as well add perceptrons to the list here.  They were invented after logistic regression but before SVMs and have an even simpler error function.   And maybe voted perceptrons if you really want to tie things back to boosting.]]></description>
		<content:encoded><![CDATA[<p>@Shay:</p>
<p>1.  There are plenty of solid logistic regression implementations.  Start with something like R, which has it built in for small problems, or use any of the larger scale L-BFGS or stochastic gradient versions that are out there.  Both SVMs and logistic regression are almost trivial to implement with SGD.  If you&#8217;re an NLP person, you may need to look for &#8220;max entropy&#8221; &#8212; the field got off on a confusing footing terminologically.  </p>
<p>You can also use just about any neural networks package.  If you use a logistic sigmoid activation function with a single output neuron, you have standard logistic regression (this is how it&#8217;s presented in MacKay&#8217;s text, for instance).  Backpropagation is just stochastic gradient descent.  The so-called &#8220;deep belief nets&#8221; are like stacked logistic regressions.  </p>
<p>2.  The primary reason to use logistic regression is that it gives you probabilities out.  The error function is set up so that you&#8217;re effectively minimizing probabilistic predictive loss.  I said this in my original comment above.  </p>
<p>A second reason to use it is that it has a natural generalization to K-way classifiers.  Last I saw, SVMs tended to estimate a bunch of binary problems and then combine classifiers, which seems rather clunky.</p>
<p>Another reason probabilities are convenient is that it means a logistic regression can play nicely with other models and in other model components.   For instance, Lafferty and Blei used a logistic normal distribution as a prior on document topics in an extension of LDA in order to deal with correlated topics.  You see the logistic transform all over in stats, including in many latent variable models such as the item-response model or the Bradley-Terry model for ranking contests or comparisons.</p>
<p>3.  The error function is complex in logistic regression.  On the latent scale (i.e., as in Albert and Chib&#8217;s formulation of logistic regression), the error is normal.  But on the actual prediction scale, which is in the [0,1] range, this error gets transformed by the inverse logit.   </p>
<p>You can change the sigmoid curve (i.e., with the cumulative normal, you get probit regression instead of logit).  You can also change the error term in the latent regression.  </p>
<p>The error&#8217;s not estimated as a free parameter in logistic regression.  You fit the deterministic part of the regression to minimize error, just like in SVMs, then the actual error&#8217;s estimated from the residuals (difference between predictions and true values).</p>
<p>In SVMs, the error is the simpler hinge loss function (actually, it&#8217;s not differentiable, which is more complex, but at least it&#8217;s linear).</p>
<p>It&#8217;s pretty easy to extend logistic regression to structured prediction problems like taggers or parsers.  I&#8217;d imagine you could do that with SVMs, too.  </p>
<p>A.  Yes, you can kernelize logistic regression, too.  It&#8217;s less common, though.   The problem with operating in kernelized dual space is that it doesn&#8217;t scale well with number of training instances.  So there&#8217;s been work on pruning the support in the context of SVMs.  </p>
<p>PS:  We might as well add perceptrons to the list here.  They were invented after logistic regression but before SVMs and have an even simpler error function.   And maybe voted perceptrons if you really want to tie things back to boosting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brendano</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-51680</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Mon, 31 Jan 2011 22:06:51 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-51680</guid>
		<description><![CDATA[@Shay, nicely done.  Good point.

@anagi -- maybe the useful conceptual difference is: geometric vs. statistical views of classification.]]></description>
		<content:encoded><![CDATA[<p>@Shay, nicely done.  Good point.</p>
<p>@anagi &#8212; maybe the useful conceptual difference is: geometric vs. statistical views of classification.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anagi</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-50187</link>
		<dc:creator>anagi</dc:creator>
		<pubDate>Fri, 21 Jan 2011 23:51:50 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-50187</guid>
		<description><![CDATA[@ Brendan: agreed :)]]></description>
		<content:encoded><![CDATA[<p>@ Brendan: agreed :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shay</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-49753</link>
		<dc:creator>Shay</dc:creator>
		<pubDate>Tue, 18 Jan 2011 19:11:55 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-49753</guid>
		<description><![CDATA[[saw a link to your post on Buzz!]

You have to keep in mind that from the point of view of an end user, support vector machines may be much easier to use, because they have off-the-shelf implementations, while, at least in my impression, good implementations of logistic regression are less common.

I don&#039;t think that there is a reason to use logistic regression over SVM, just like there is perhaps no reason to use SVM over logistic regression, in most cases. It just depends on how easily you can get things done, if you are just interested in a specific application.

And, yes, it is very likely that people use a simple linear kernel when they do not report which kernel they are using. That&#039;s originally how SVMs were developed. The &quot;kernel trick&quot; came out later, I believe, in another paper. (You also have to be careful with wording here... SVMs are *always*  linear, even when using a kernel, just in a different feature space. The non-linearity is with respect to the original feature space of the problem... but I am sure you know that.)

And by the way... I believe there is also a kernelized version of logistic regression :-)]]></description>
		<content:encoded><![CDATA[<p>[saw a link to your post on Buzz!]</p>
<p>You have to keep in mind that from the point of view of an end user, support vector machines may be much easier to use, because they have off-the-shelf implementations, while, at least in my impression, good implementations of logistic regression are less common.</p>
<p>I don&#8217;t think that there is a reason to use logistic regression over SVM, just like there is perhaps no reason to use SVM over logistic regression, in most cases. It just depends on how easily you can get things done, if you are just interested in a specific application.</p>
<p>And, yes, it is very likely that people use a simple linear kernel when they do not report which kernel they are using. That&#8217;s originally how SVMs were developed. The &#8220;kernel trick&#8221; came out later, I believe, in another paper. (You also have to be careful with wording here&#8230; SVMs are *always*  linear, even when using a kernel, just in a different feature space. The non-linearity is with respect to the original feature space of the problem&#8230; but I am sure you know that.)</p>
<p>And by the way&#8230; I believe there is also a kernelized version of logistic regression :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendan O'Connor</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-49466</link>
		<dc:creator>Brendan O'Connor</dc:creator>
		<pubDate>Sun, 16 Jan 2011 17:06:20 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-49466</guid>
		<description><![CDATA[@anagi -- i still don&#039;t think that distinction could help you decide which to apply.  linear SVM&#039;s and logistic regression will give you basically identical results.

the difference does slightly change how you reason about them, and more importantly, how you think of their related families of models/algorithms.]]></description>
		<content:encoded><![CDATA[<p>@anagi &#8212; i still don&#8217;t think that distinction could help you decide which to apply.  linear SVM&#8217;s and logistic regression will give you basically identical results.</p>
<p>the difference does slightly change how you reason about them, and more importantly, how you think of their related families of models/algorithms.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: anagi</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-49072</link>
		<dc:creator>anagi</dc:creator>
		<pubDate>Thu, 13 Jan 2011 00:35:34 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-49072</guid>
		<description><![CDATA[Hi Brendan

Another difference, slightly tied to the minimization of the loss function, is the starting point of the problem...  In the generalized linear model, you assume that the error term has a distribution from the exponential family (e.g., normal, Poisson, binomial, ...), whereas in SVM you don&#039;t make any assumption about the &quot;error&quot; term, and hence the optimization function can be different (depending on the type of SVM one utilizes)... and so while both the generalized linear model and the SVM (with a linear kernel) share the linearity property, there is a fundamentally different perspective in the way you approach a problem: GLM is a probabilistic approach, while SVM is a data-driven approach.

It might be a trivial distinction, but it helps me differentiate the applicability of both methods... :)]]></description>
		<content:encoded><![CDATA[<p>Hi Brendan</p>
<p>Another difference, slightly tied to the minimization of the loss function, is the starting point of the problem&#8230;  In the generalized linear model, you assume that the error terms has a distribution that comes from the family of exponential distribution (i.e. normal, poisson, binomial,&#8230;) where in SVM you don&#8217;t make any assumption about the &#8220;error&#8221; term, and hence the optimization function can be different (depending on the type of SVM, one utilize)&#8230; and so while the both, generalized linear model, and SVM (with linear kernel) might share the linearity property, there is a fundamentally different perspective in the way you approach a problem: GLM is a probabilistic approach, while SVM is a data driven approach.. </p>
<p>It might be a trivial distinction, but it helps me differentiate the applicability of both methods&#8230; :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendan O'Connor</title>
		<link>https://brenocon.com/blog/2011/01/please-report-your-svms-kernel/#comment-49057</link>
		<dc:creator>Brendan O'Connor</dc:creator>
		<pubDate>Wed, 12 Jan 2011 21:20:42 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=894#comment-49057</guid>
		<description><![CDATA[I mentioned logistic regression because it&#039;s the usual (generalized) linear model used for classification.  Usually SVM&#039;s are used for classification.  As you said, the only difference between a linear SVM and logistic regression is the loss function that gets minimized, and they&#039;re quite similar loss functions.]]></description>
		<content:encoded><![CDATA[<p>I mentioned logistic regression because it&#8217;s the usual (generalized) linear model used for classification.  Usually SVM&#8217;s are used for classification.  As you said, the only difference between a linear SVM and logistic regression is the loss function that gets minimized, and they&#8217;re quite similar loss functions.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

