Category Archives: Uncategorized

Binary classification evaluation in R via ROCR

A binary classifier makes decisions with confidence levels. Usually it’s imperfect: wherever you put the decision threshold, some items will fall on the wrong side, and those are the errors. I made this diagram a while ago for Turker voting; same principle … Continue reading
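
The threshold idea in the excerpt above can be sketched in a few lines. This is only an illustration, not the ROCR code from the post: it is plain Python, and the function names `confusion_at_threshold` and `roc_points` and the toy scores are made up for this example. It counts the items that land on each side of a threshold and sweeps the threshold to get (false positive rate, true positive rate) pairs.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count (tp, fp, tn, fn) when items scoring >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1  # error: a negative item on the positive side
        elif not predicted_positive and label:
            fn += 1  # error: a positive item on the negative side
        else:
            tn += 1
    return tp, fp, tn, fn


def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for each distinct score, highest threshold first."""
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((threshold, fpr, tpr))
    return points


# Hypothetical toy data: classifier confidences and true labels (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(confusion_at_threshold(scores, labels, 0.65))
for point in roc_points(scores, labels):
    print(point)
```

Moving the threshold trades one kind of error for the other, which is the curve a tool like ROCR plots.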

La Jetee

From here.

“Logic Bomb”

Article: Fannie Mae Logic Bomb Would Have Caused Weeklong Shutdown | Threat Level from Wired.com. I love the term “logic bomb”.  Can you pair it with a statistics bomb?  Data-driven bomb?  Or maybe the point is a connectionist bomb.

SF conference for data mining mercenaries

I got an email from a promoter for Predictive Analytics World, a very expensive conference next month in San Francisco for business applications of data mining / machine learning / predictive analytics.  I’m not going because I don’t want to … Continue reading

Love it and hate it, R has come of age

Seeing a long, lavish article about R in the NEW YORK TIMES (!) really freaks me out. replicate(100, c( “OMG OMG, R is now famous?!”, “People used to make fun of me for learning R since Splus is SO OLD!”, … Continue reading

Facebook sentiment mining predicts presidential polls

I’m a bit late blogging this, but here’s a messy, exciting — and statistically validated! — new online data source. My friend Roddy at Facebook wrote a post describing their sentiment analysis system, which can evaluate positive or negative sentiment … Continue reading

Information cost and genocide

In 1994, the Rwandan genocide claimed 800,000 lives.  This genocide was remarkable for being very low-tech — lots of non-military, average people with machetes killing their neighbors.  Romeo Dallaire, the leader of the small UN peacekeeping mission there, saw it … Continue reading

Calculating running variance in Python and C++

It’s fairly obvious that an average can be calculated online, but interestingly, there’s also a way to calculate a running variance and standard deviation. Read all about it here. I’m playing around with the Netflix Prize data of 100 million … Continue reading
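
For a rough idea of how a running variance can work, here is a minimal sketch using Welford's online update. That choice is an assumption: the linked write-up is not reproduced in this excerpt, so this may not match the post's own code, and the class name `RunningStats` is made up for illustration.

```python
import math


class RunningStats:
    """Online mean/variance via Welford's update; stores no past samples."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the deviation from the old mean times the deviation from the new mean.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance (n - 1 denominator); defined as 0.0 for fewer than 2 samples.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self):
        return math.sqrt(self.variance())


rs = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    rs.push(value)
print(rs.mean, rs.variance(), rs.stddev())
```

Each `push` is constant time and constant memory, which is what makes this kind of update attractive for a single pass over a very large data set.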

Python bindings to Google’s “AJAX” Search API

I couldn’t find this anywhere on the web, so I threw together a quick Python binding for Google’s “AJAX” Search API (or rather, JSON-over-HTTP).  (There are bindings out there for the old SOAP interface; I heard that was discontinued though.) … Continue reading

Netflix Prize

Here’s a fascinating NYT article on the Netflix Prize for a better movie recommendation system. Tons of great stuff there; here are a few highlights … First, a good unsupervised learning story: There’s a sort of unsettling, alien quality to their … Continue reading
