Category Archives: Uncategorized

FFT: Friedman + Fortran + Tricks

…is a tongue-in-cheek phrase from Trevor Hastie’s very fun-to-read useR-2009 presentation, from the merry trio of Hastie, Friedman, and Tibshirani, who brought us, among other things, the excellent Elements of Statistical Learning textbook. It’s a joy to read sophisticated …

1 Comment

Beta conjugate explorer

Here’s a little interactive explorer for the beta probability distribution, a conjugate prior for the Bernoulli under Bayesian inference… Ack, too much jargon. Simply press the right arrow every time you see the sun rise, the up arrow when it …

5 Comments

Michael Jackson in Persepolis

Michael Jackson just died while Iran is in turmoil. I am reminded of a passage in Marjane Satrapi’s wonderful graphic novel Persepolis, a memoir of growing up in revolutionary Iran in the ’80s. (Read the book to see how it …

2 Comments

Psychometrics quote

It is rather surprising that systematic studies of human abilities were not undertaken until the second half of the last century… An accurate method was available for measuring the circumference of the earth 2,000 years before the first systematic measures …

2 Comments

June 4

BBC News – June 4, 1989, Tiananmen Square Massacre. Also worth reading: Nicholas Kristof’s riveting firsthand account.

Where tweets get sent from

Playing around with stream.twitter.com/spritzer, ggplot2 and maps / mapdata: I think I like the top better, without the map lines, like those night satellite photos: pointwise ghosts of high-end human economic development. This data is a fairly extreme sample of …

Zipf’s law and world city populations

Will Fitzgerald just wrote about an excellent article by Steven Strogatz on Zipf’s Law for the populations of cities. If you look at the biggest city, then the next biggest city, etc., there tends to be a power-law fall-off in …

13 Comments

Performance comparison: key/value stores for language model counts

I’m doing word and bigram counts on a corpus of tweets. I want to store and rapidly retrieve them later for language model purposes. So there’s a big table of counts that get incremented many times. The easiest way to …

28 Comments

1 billion web page dataset from CMU

This is fun: Jamie Callan’s group at CMU LTI just finished a crawl of 1 billion web pages. It’s 5 terabytes compressed, big enough that they have to send it to you by mailing hard drives. Link: ClueWeb09 …

6 Comments

Pirates killed by President

A lesson in x-axis scaling, and choosing which data to compare. Two current graphs making the rounds on the internet: (about this.)
