Shalizi’s review of NKS

I laugh out loud every time I reread Cosma Shalizi’s 2005 review of “A New Kind of Science”.  I remember reading it back in college when everyone was talking about the book, right when I was losing my naivete about popular-science treatments of complex systems and the like.  I must be getting more cynical as I get older, because I keep liking the review more.  This time my favorite line was

Wolfram even goes on to refute post-modernism on this basis; I won’t touch that except to say that I’d have paid a lot to see Wolfram and Jacques Derrida go one-on-one.

And on the issue of running your own conventions and citing yourself, he compares it to

… the way George Lakoff uses “as cognitive science shows” to mean “as I claimed in my earlier books”

These quotes are funnier in context.

Rough binomial confidence intervals

I made this table a while ago and find it handy: for example, looking at a table of percentages and trying to figure out what’s meaningful or not. Why run a test if you can estimate it in your head?
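Roughly: a 95% interval for a proportion has half-width about 1.96 * sqrt(p(1-p)/n), which is at most about 1/sqrt(n). Here’s a quick sketch of that approximation in Python (the exact version is binom.test):

```python
import math

def rough_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% CI for a binomial proportion.
    Worst-case half-width (at p_hat = 0.5) is about 1/sqrt(n)."""
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - half, p_hat + half)

# 60% observed out of n=100: roughly 60% +/- 10%
print(rough_ci(0.60, 100))  # approx (0.50, 0.70)
```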

References: Wikipedia, binom.test

Poor man’s linear algebra textbook

I keep learning new bits of linear algebra all the time, but I’m always hurting for a useful reference.  I probably should get a good book (which?), but in the meantime I’m collecting several nice online sources that ML researchers seem to often recommend: The Matrix Cookbook, plus a few more tutorial/introductory pieces, aimed at an intermediate-ish level.

Main reference: The Matrix Cookbook.

Tutorials/introductions: a few shorter pieces, including the CS229/CMU linear algebra review by Zico Kolter (see the update below).

After studying for this last stats/ML midterm, I’ve now printed them out and stuck them in a binder.  A poor man’s linear algebra textbook.

I’d love to hear about more or different stuff out there.  (There are always the linear algebra review appendices in Hastie et al.’s ESL and Boyd and Vandenberghe’s convex optimization book, but I’ve always found them a little too brief to be useful for understanding.)


Update May 2015: tweaked the credit for the CS229/CMU/Kolter review and fixed some dead links.

Move to brenocon.com

I’ve changed my website and blog URL from anyall.org to brenocon.com. The former was supposed to be a reference to first-order logic: the existential and universal quantifiers are fundamental to relational reasoning, and as testament to that, they are enshrined as “any()” and “all()” in wise programming languages like Python and R. Or something like that. It turns out this was obvious only to me :)
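For what it’s worth, the correspondence I had in mind, in Python:

```python
xs = [3, -1, 4, 1, -5]

# Existential quantifier: does some x satisfy x < 0?
print(any(x < 0 for x in xs))    # True

# Universal quantifier: do all x satisfy x > -10?
print(all(x > -10 for x in xs))  # True
```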

I tried to set up everything to automatically redirect, so no links should be broken. Hopefully.

Please report your SVM’s kernel!

I’m tired of reading papers that use an SVM but don’t say which kernel they used.  (There are tons of such papers in NLP and, I think, in other areas that do applied machine learning.)  I suspect a lot of these papers are actually using a linear kernel.

An un-kernelized, linear SVM is nearly the same as logistic regression — every feature independently increases or decreases the classifier’s output prediction.  But a quadratic kernelized SVM is much more like boosted depth-2 decision trees.  It can do automatic combinations of pairs of features — a potentially very different thing, since you can start throwing in features that don’t do anything on their own but might have useful interactions with others.  (And of course, more complicated kernels do progressively more complicated and non-linear things.)
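For concreteness, here’s a toy scikit-learn sketch (my own example, not from any particular paper) where the kernel choice is the one-word detail that matters:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy data; in a real paper this would be your task's features.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "logistic regression":   LogisticRegression(max_iter=1000),
    "SVM, linear kernel":    SVC(kernel="linear"),
    "SVM, quadratic kernel": SVC(kernel="poly", degree=2),
    "SVM, RBF kernel":       SVC(kernel="rbf"),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
}
```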

I have heard people say they download an SVM package, try a bunch of different kernels, and find the linear kernel is the best. In such cases they could have just used a logistic regression.  (Which is way faster and simpler to train!  You can implement SGD for it in a few lines of code!)
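Here’s what I mean by a few lines (a bare-bones sketch, with no regularization or learning-rate schedule):

```python
import numpy as np

def sgd_logreg(X, y, epochs=10, lr=0.1):
    """Plain SGD for binary logistic regression, y in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            p = 1.0 / (1.0 + np.exp(-x_i @ w))  # predicted probability
            w += lr * (y_i - p) * x_i           # log-likelihood gradient step
    return w
```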

A linear SVM sometimes has slightly better accuracy than logistic regression, because hinge loss is a slightly better approximation to the error rate than log-loss is.  But I really doubt this would matter in any real-world application, where much bigger issues are in play (data cleanliness, feature engineering, etc.).

If a linear classifier is doing better than non-linear ones, that’s saying something pretty important about your problem.  Saying that you’re using an SVM is missing the point.  An SVM is interesting only when it’s kernelized.  Otherwise it’s just a needlessly complicated variant of logistic regression.

Interactive visualization of Mixture of Gaussians, the Law of Total Expectation and the Law of Total Variance

I wrote an interactive visualization for Gaussian mixtures and some probability laws, using the excellent Protovis library.  It helped me build intuition for the law of total variance.
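The law in question: Var(X) = E[Var(X|Z)] + Var(E[X|Z]), where Z is the mixture component. A quick numeric sanity check, separate from the visualization itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two-component Gaussian mixture: pick a component Z, then draw X | Z.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 1.0])
sds = np.array([0.5, 1.5])

z = rng.choice(2, size=n, p=weights)
x = rng.normal(means[z], sds[z])

within = np.sum(weights * sds**2)                         # E[Var(X|Z)]
between = np.sum(weights * (means - weights @ means)**2)  # Var(E[X|Z])
print(x.var(), within + between)                          # both approx 3.54
```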

Greenspan on the Daily Show

I love this Daily Show clip with Alan Greenspan.  On emotion and economic forecasting. From 2007.

GREENSPAN:
I’ve been dealing with these big mathematical models of forecasting the economy, and I’m looking at what’s going on in the last few weeks. … If I could figure out a way to determine whether or not people are more fearful or changing to more euphoric … I don’t need any of this other stuff.  I could forecast the economy better than any way I know.

The trouble is that we can’t figure that out. I’ve been in the forecasting business for 50 years. … I’m no better than I ever was, and nobody else is. Forecasting 50 years ago was as good or as bad as it is today. And the reason is that human nature hasn’t changed. We can’t improve ourselves.

STEWART:
You just bummed the [bleep] out of me.

I’ve seen it in two separate talks now, from Peter Dodds (co-author of “Measuring the Happiness of Large-Scale Written Expression”) and Eric Gilbert (co-author of “Widespread Worry and the Stock Market”).

An ML/AI approach to P != NP

Like everyone else, I’ve just started looking at the new, tentative proof that P != NP from Vinay Deolalikar.  After reading the intro, what’s most striking is that probabilistic graphical models and mathematical logic are at the core of the proof.  This feels like a very ML/AI-centric approach to me — very different from what you usually see in mainstream CS theory.  (Maybe I should feel good that in my undergrad I basically stopped studying normal math and spent all my time on this weird stuff instead!)

He devotes several chapters to an introduction to graphical models — Ising models, conditional independence, MRF’s, Hammersley-Clifford, and all that other stuff you see in Koller and Friedman or something — and then logic and model theory!  I’m impressed.

Updates: CMU, Facebook

It’s been a good year. Last fall I started a master’s program in the Language Technologies department at CMU SCS, taking some great classes, hanging out with a cool lab, and writing two new papers (for ICWSM, involving Twitter: polls and tweetmotif; also did some coref work, financial text regression stuff, and looked at social lexicography). I also applied to CS and stats PhD programs at several universities. Next year I’ll be starting the PhD program in the Machine Learning Department here at CMU.

I’m excited! Just the other day I was looking at videos on my old hard drive and found a presentation by Tom Mitchell on “the Discipline of Machine Learning” that I downloaded back in 2007 or so. (Can’t find it online right now, but this is similar.) That might be where I heard of the department first. Maybe some day I will be smarter than the guy who wrote this rant (though I am much more pro-stats and anti-ML these days…).

Also, I was recently named a finalist for the 2010 Facebook Fellowship program. CMU did impressively well here, with the number of winners and finalists (5+22) per school being:

9 Carnegie Mellon University
3 University of Michigan
3 Stanford University
2 University of Washington
2 University of Illinois at Urbana-Champaign
2 University of California at Berkeley
1 University of California at Irvine
1 Princeton University
1 Massachusetts Institute of Technology
1 Duke University
1 Cornell University
1 Arizona State University

And separately, I’ll be an intern at the Facebook Data team this summer (back in Palo Alto) starting in June. (Yes, notwithstanding my severe annoyances at bugs in the site itself.) I’ve enjoyed reading the work they’ve been doing there in the last year or two, and it seems like a very good place, if not the best place, to try doing some newfangled computational social science (whatever that is).

Quick note: Cer et al. 2010

A quick note after reading this paper, which I found via their tweet.

Update: this reaction might be totally wrong; in particular, the CoNLL dependencies for at least some languages were done completely by hand.


Malt and MSTParser were designed for the Yamada and Matsumoto dependency formalism (the one used for the CoNLL dependency parsing shared task, from the penn2malt tool). Their feature sets, and probably many other design decisions, were created to support it. If you compare their outputs side-by-side, you will see that the Stanford Dependencies are a substantially different formalism; for example, compound verbs are handled very differently (the paper discusses a copula example).

I think the following conclusion is premature:

Notwithstanding the very large amount of research that has gone into dependency parsing algorithms in the last five years, our central conclusion is that the quality of the Charniak, Charniak-Johnson reranking, and Berkeley parsers is so high that in the vast majority of cases, dependency parse consumers are better off using them, and then converting the output to typed dependencies.

If they re-ran the experiment with Yamada and Matsumoto dependencies (or maybe Johansson and Nugues / pennconverter), that would be convincing. In the meantime, a fair test of the dependency parsing approach for the Stanford formalism would be to use a dependency parser and design richer features to support it. The paper points out cases where Malt’s and MSTParser’s algorithms aren’t powerful enough; I wonder how TurboParser’s would fare. But this requires lots of work (writing new features), so maybe no one will do it.
