I’m tired of reading papers that use an SVM but don’t say which kernel they used. (There are tons of such papers in NLP and, I think, in other areas that do applied machine learning.) I suspect a lot of these papers are actually using a linear kernel.
An un-kernelized, linear SVM is nearly the same as logistic regression: every feature independently pushes the classifier’s output score up or down. But a quadratic-kernel SVM is much more like boosted depth-2 decision trees. It can automatically form combinations of pairs of features, which is a potentially very different thing, since you can start throwing in features that do nothing on their own but have useful interactions with others. (And of course, more complicated kernels do progressively more complicated and non-linear things.)
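Here’s a minimal sketch of that difference, assuming scikit-learn (the two-feature dataset is made up purely for illustration): the label depends only on an interaction between the two features, so the linear models sit at chance while a degree-2 polynomial kernel picks it right up.

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(1000, 2)
# Each feature is useless alone; the label is a pure pairwise interaction.
y = (X[:, 0] * X[:, 1] > 0).astype(int)

for name, clf in [("logistic regression", LogisticRegression()),
                  ("linear SVM", LinearSVC()),
                  ("quadratic-kernel SVM", SVC(kernel="poly", degree=2))]:
    clf.fit(X[:800], y[:800])
    print("%-22s test accuracy: %.2f" % (name, clf.score(X[800:], y[800:])))
```

On data like this the two linear models hover around 50% accuracy, while the quadratic kernel gets nearly everything right, because its implicit feature space includes the product term the linear models can’t express.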
I have heard people say they download an SVM package, try a bunch of different kernels, and find that the linear kernel works best. In such cases they could have just used logistic regression. (It’s way faster and simpler to train; you can implement SGD for it in a few lines of code!)
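For the record, here’s roughly what those few lines might look like, as a plain NumPy sketch (the learning rate and epoch count are arbitrary placeholder choices):

```python
import numpy as np

def logistic_sgd(X, y, lr=0.1, epochs=10):
    """SGD for logistic regression. X: (n, d) array; y: labels in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))  # predicted P(y=1 | x)
            w += lr * (y[i] - p) * X[i]          # gradient step on the log-loss
    return w
```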
A linear SVM sometimes gets a tiny bit better accuracy than logistic regression, because hinge loss is a tiny bit more like error rate than log-loss is. But I really doubt this would matter in any real-world application, where much bigger issues dominate (data cleanliness, feature engineering, and so on).
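To make that loss comparison concrete, here’s a tiny snippet evaluating all three losses at a few margins (margin = y · score, with y in {−1, +1}, the usual convention): hinge loss, like 0/1 error, goes flat once an example is safely on the correct side, while log-loss keeps rewarding larger margins forever.

```python
import numpy as np

margins = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
zero_one = (margins <= 0).astype(float)      # plain error rate
hinge = np.maximum(0.0, 1.0 - margins)       # the SVM's hinge loss
log_loss = np.log(1.0 + np.exp(-margins))    # logistic regression's log-loss
for row in zip(margins, zero_one, hinge, log_loss):
    print("margin %+.1f   0/1 %.0f   hinge %.2f   log %.3f" % row)
```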
If a linear classifier is doing better than non-linear ones, that’s saying something pretty important about your problem. Saying that you’re using an SVM is missing the point. An SVM is interesting only when it’s kernelized. Otherwise it’s just a needlessly complicated variant of logistic regression.