This is a blog on artificial intelligence and "Social Science++", with an emphasis on computation and statistics. My website is brenocon.com.
What inputs do Monte Carlo algorithms need?
Monte Carlo sampling algorithms (either MCMC or not) aim to obtain samples from a distribution. They can be organized by what inputs or prior knowledge about the distribution they require. This ranges from a low amount of knowledge, … Continue reading
Rise and fall of Dirichlet process clusters
Here’s Gibbs sampling for a Dirichlet process 1d mixture of Gaussians, on 1000 data points that look like this. I gave it a fixed variance and a concentration, and over MCMC iterations it looks like this. The top is the … Continue reading
Correlation picture
Paul Moore posted a comment pointing out this great discussion of the correlation coefficient: Joseph Lee Rodgers and W. Alan Nicewander. “Thirteen Ways to Look at the Correlation Coefficient.” The American Statistician, Vol. 42, No. 1. (Feb., 1988), pp. 59-66. … Continue reading
R scan() for quick-and-dirty checks
One of my favorite R tricks is scan(). I was recently using it to check a sampler I wrote, which was supposed to output numbers uniformly between 1 and 100 into a logfile; this loads the logfile, counts the … Continue reading
Really liking whoever made @SottedReviewer, e.g. BEFORE I START TALKING ABOUT RANDOM FORESTS AND AUCS, I’LL THROW A BONE TO SOCIAL SCIENTISTS WITH A GRANOVETTER CITE. #ICWSM — Sotted Reviewer (@SottedReviewer) March 11, 2013 There’s an entire great story here … Continue reading
Wasserman on Stats vs ML, and previous comparisons
Larry Wasserman has a new position paper (forthcoming 2013) with a great comparison of the Statistics and Machine Learning research cultures, “Rise of the Machines”. He has a very conciliatory view in terms of intellectual content, and a very pro-ML take … Continue reading
Perplexity as branching factor; as Shannon diversity index
A language model’s perplexity is exponentiated negative average log-likelihood, $$\exp\left( -\frac{1}{N} \log p(x) \right)$$ where the inner term usually decomposes into a sum over individual items; for example, as \(\sum_i \log p(x_i \mid x_1..x_{i-1})\) or \(\sum_i \log p(x_i)\) depending on independence assumptions, … Continue reading
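As a quick illustration of the definition above, here is a minimal Python sketch. It assumes the per-item natural-log probabilities are already available as a list; the function name `perplexity` and the toy input are made up for this example and are not from the post. A uniform distribution over 4 outcomes should give a perplexity of about 4, which is the "branching factor" reading.

```python
import math

def perplexity(log_probs):
    """Exponentiated negative average log-likelihood.

    log_probs: natural-log probabilities of each item, e.g. log p(x_i)
    or log p(x_i | x_1..x_{i-1}) (hypothetical input for this sketch).
    """
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# Toy check: every item has probability 1/4, so the model is
# effectively choosing among 4 equally likely options each time.
uniform_log_probs = [math.log(0.25)] * 10
print(perplexity(uniform_log_probs))  # approximately 4
```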
Graphs for SANCL-2012 web parsing results
I was just looking at some papers from the SANCL-2012 workshop on web parsing from June this year, which are very interesting to those of us who wish we had good parsers for non-newspaper text. The shared task focus was … Continue reading
Powerset’s natural language search system
There’s a lot to say about Powerset, the short-lived natural language search company (2005-2008) where I worked after college. AI overhype, flying too close to the sun, the psychology of tech journalism and venture capitalism, etc. A year or two … Continue reading
CMU ARK Twitter Part-of-Speech Tagger – v0.3 released
We’re pleased to announce a new release of the CMU ARK Twitter Part-of-Speech Tagger, version 0.3. The new version is much faster (40x) and more accurate (89.2 -> 92.8) than before. We also have released new POS-annotated data, including a … Continue reading