Category Archives: Uncategorized

Binary classification evaluation in R via ROCR

Posted on April 1, 2009

A binary classifier makes decisions with confidence levels. Usually it’s imperfect: wherever you put the decision threshold, some items will fall on the wrong side, and those are errors. I made this diagram a while ago for Turker voting; same principle … Continue reading →
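To make the threshold idea concrete, here is a minimal Python sketch (not the ROCR code from the full post). It counts the errors on each side of a threshold and sweeps the threshold to get ROC points. The function names `confusion_at_threshold` and `roc_points` and the toy scores are assumptions made up for illustration.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count outcomes when items scoring >= threshold are predicted positive.

    Returns (tp, fp, tn, fn). Labels are 1 (positive) or 0 (negative).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1  # correctly above the threshold
        elif predicted_positive and not label:
            fp += 1  # negative item on the wrong (positive) side
        elif not predicted_positive and label:
            fn += 1  # positive item on the wrong (negative) side
        else:
            tn += 1  # correctly below the threshold
    return tp, fp, tn, fn


def roc_points(scores, labels):
    """Sweep the threshold over every distinct score, highest first.

    Returns a list of (threshold, false_positive_rate, true_positive_rate).
    Assumes the labels contain at least one positive and one negative.
    """
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_at_threshold(scores, labels, threshold)
        points.append((threshold, fp / (fp + tn), tp / (tp + fn)))
    return points


# Hypothetical toy data: classifier confidences and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

for threshold, fpr, tpr in roc_points(scores, labels):
    print(f"threshold={threshold:.1f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold catches more true positives but also lets more negatives through, which is the tradeoff a ROC curve plots.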

5 Comments

La Jetee

Posted on February 14, 2009

From here.

2 Comments

“Logic Bomb”

Posted on January 30, 2009

Article: Fannie Mae Logic Bomb Would Have Caused Weeklong Shutdown | Threat Level from Wired.com. I love the term “logic bomb”. Can you pair it with a statistics bomb? Data-driven bomb? Or maybe the point is a connectionist bomb.

SF conference for data mining mercenaries

Posted on January 23, 2009

I got an email from a promoter for Predictive Analytics World, a very expensive conference next month in San Francisco for business applications of data mining / machine learning / predictive analytics. I’m not going because I don’t want to … Continue reading →

2 Comments

Love it and hate it, R has come of age

Posted on January 7, 2009

Seeing a long, lavish article about R in the NEW YORK TIMES (!) really freaks me out. replicate(100, c( “OMG OMG, R is now famous?!”, “People used to make fun of me for learning R since Splus is SO OLD!”, … Continue reading →

6 Comments

Facebook sentiment mining predicts presidential polls

Posted on December 27, 2008

I’m a bit late blogging this, but here’s a messy, exciting — and statistically validated! — new online data source. My friend Roddy at Facebook wrote a post describing their sentiment analysis system, which can evaluate positive or negative sentiment … Continue reading →

7 Comments

Information cost and genocide

Posted on December 18, 2008

In 1994, the Rwandan genocide claimed 800,000 lives. This genocide was remarkable for being very low-tech — lots of non-military, average people with machetes killing their neighbors. Romeo Dallaire, the leader of the small UN peacekeeping mission there, saw it … Continue reading →

Calculating running variance in Python and C++

Posted on November 28, 2008

It’s fairly obvious that an average can be calculated online, but interestingly, there’s also a way to calculate a running variance and standard deviation. Read all about it here. I’m playing around with the Netflix Prize data of 100 million … Continue reading →
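One well-known way to do this is Welford's online algorithm. The sketch below assumes that is the kind of method the post has in mind; it is not the post's own code, and the class name `RunningStats` is made up for illustration.

```python
import math


class RunningStats:
    """Running mean, variance and standard deviation in one pass.

    Uses Welford's update, which keeps only a count, the current mean,
    and the running sum of squared deviations from the mean (m2).
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def push(self, x):
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Note: uses the *updated* mean in the second factor.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 with fewer than 2 values."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self):
        return math.sqrt(self.variance())


# Hypothetical usage: feed values one at a time, as from a data stream.
stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.push(value)
print(stats.mean, stats.variance(), stats.stddev())
```

The point of the update is that nothing has to be stored or revisited, so it works on data too large to hold in memory, and it avoids the cancellation problems of the naive "sum of squares minus square of sum" formula.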

5 Comments

Python bindings to Google’s “AJAX” Search API

Posted on November 24, 2008

I couldn’t find this anywhere on the web, so I threw together a quick Python binding for Google’s “AJAX” Search API (or rather, JSON-over-HTTP). (There are bindings out there for the old SOAP interface; I heard that was discontinued though.) … Continue reading →

Netflix Prize

Posted on November 21, 2008

Here’s a fascinating NYT article on the Netflix Prize for a better movie recommendation system. Tons of great stuff there; here are a few highlights … First, a good unsupervised learning story: There’s a sort of unsettling, alien quality to their … Continue reading →

4 Comments