About
This is a blog on artificial intelligence and "Social Science++", with an emphasis on computation and statistics. My website is brenocon.com.
Author Archives: brendano
What inputs do Monte Carlo algorithms need?
Monte Carlo sampling algorithms (MCMC or otherwise) aim to draw samples from a target distribution. They can be organized by what inputs or prior knowledge about the distribution they require. This ranges from a low amount of knowledge, … Continue reading
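As one point on that spectrum, rejection sampling needs only an unnormalized log-density, a proposal we can both sample and evaluate, and a bound relating the two. A minimal sketch (all names here are mine, not from the post):

```python
import math
import random

def rejection_sample(unnorm_logpdf, proposal_sample, proposal_logpdf, log_m, n):
    """Draw n samples from a density known only up to a constant.
    Inputs required: the unnormalized log-density, a proposal distribution,
    and a constant log_m with unnorm_logpdf(x) <= log_m + proposal_logpdf(x)."""
    samples = []
    while len(samples) < n:
        x = proposal_sample()
        # accept with probability unnorm(x) / (M * proposal(x))
        if math.log(random.random()) < unnorm_logpdf(x) - log_m - proposal_logpdf(x):
            samples.append(x)
    return samples

# Toy target: an unnormalized standard normal; proposal: uniform on [-5, 5].
random.seed(0)
draws = rejection_sample(
    unnorm_logpdf=lambda x: -0.5 * x * x,          # max value 1 at x=0
    proposal_sample=lambda: random.uniform(-5, 5),
    proposal_logpdf=lambda x: math.log(1 / 10),    # density 1/10 on [-5, 5]
    log_m=math.log(10),                            # so that 1 <= 10 * (1/10)
    n=2000,
)
mean = sum(draws) / len(draws)
```

Other algorithms in the post's taxonomy demand more (gradients, conditionals) or less (only a simulator) than this.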
Rise and fall of Dirichlet process clusters
Here’s Gibbs sampling for a Dirichlet process 1-d mixture of Gaussians, on 1000 data points that look like this. I gave it a fixed variance and concentration parameter, and over MCMC iterations it looks like this. The top is the … Continue reading
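For reference, a collapsed-Gibbs sketch of this setup, under the same assumptions the post names (fixed component variance, a concentration parameter) plus an assumed normal prior on cluster means and a CRP representation of the DP. This is an illustrative reconstruction, not the post's actual code:

```python
import math
import random

def logpdf_normal(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def dp_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=10.0, iters=100, seed=0):
    """Collapsed Gibbs for a 1-d DP mixture of Gaussians: fixed component
    variance sigma2, concentration alpha, N(mu0, tau2) prior on means."""
    rng = random.Random(seed)
    z = [0] * len(data)          # cluster assignment per point
    counts = {0: len(data)}      # cluster sizes
    sums = {0: sum(data)}        # per-cluster data sums
    for _ in range(iters):
        for i, x in enumerate(data):
            # remove point i from its current cluster
            k = z[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # CRP prior times posterior-predictive likelihood, per option
            options, logps = [], []
            for k2 in counts:
                prec = 1 / tau2 + counts[k2] / sigma2
                m = (mu0 / tau2 + sums[k2] / sigma2) / prec
                options.append(k2)
                logps.append(math.log(counts[k2])
                             + logpdf_normal(x, m, 1 / prec + sigma2))
            knew = max(counts, default=-1) + 1   # fresh cluster label
            options.append(knew)
            logps.append(math.log(alpha) + logpdf_normal(x, mu0, tau2 + sigma2))
            # sample a cluster from the normalized probabilities
            mx = max(logps)
            ws = [math.exp(lp - mx) for lp in logps]
            r = rng.random() * sum(ws)
            for k2, w in zip(options, ws):
                r -= w
                if r <= 0:
                    break
            z[i] = k2
            counts[k2] = counts.get(k2, 0) + 1
            sums[k2] = sums.get(k2, 0.0) + x
    return z

# Two well-separated groups should mostly recover two clusters.
data = [-5.2, -4.8, -5.0, -5.1, -4.9, 4.8, 5.2, 5.0, 4.9, 5.1]
z = dp_gibbs(data)
```

The "rise and fall" behavior in the post comes from exactly this sampler: new clusters are born via the alpha term and die when their last point is reassigned.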
Correlation picture
Paul Moore posted a comment pointing out this great discussion of the correlation coefficient: Joseph Lee Rodgers and W. Alan Nicewander. “Thirteen Ways to Look at the Correlation Coefficient.” The American Statistician, Vol. 42, No. 1. (Feb., 1988), pp. 59-66. … Continue reading
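One of the paper's thirteen views is the correlation as the average product of standardized scores; a quick sketch (helper name is mine):

```python
def pearson_r(xs, ys):
    """Correlation as the mean product of standardized scores, one of
    Rodgers & Nicewander's thirteen interpretations of r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return sum((x - mx) * (y - my) / (sx * sy) for x, y in zip(xs, ys)) / n

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # exactly linear: r = 1
r_neg = pearson_r([1, 2, 3], [3, 2, 1])        # exactly anti-linear: r = -1
```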
R scan() for quick-and-dirty checks
One of my favorite R tricks is scan(). I was using it recently to verify a sampler I wrote, which was supposed to output numbers uniformly between 1 and 100 into a logfile; this loads the logfile, counts the … Continue reading
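The same quick-and-dirty check in Python, for readers outside R (the logfile contents below are made up for illustration):

```python
import io
from collections import Counter

# Stand-in for the real logfile: one sampled integer per line.
logfile = io.StringIO("57\n3\n100\n57\n1\n42\n")

counts = Counter(int(line) for line in logfile)   # tabulate, like R's table()
in_range = all(1 <= value <= 100 for value in counts)
```

Eyeballing the resulting counts is usually enough to catch an off-by-one or a badly seeded sampler.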
Really liking whoever made @SottedReviewer, e.g. BEFORE I START TALKING ABOUT RANDOM FORESTS AND AUCS, I’LL THROW A BONE TO SOCIAL SCIENTISTS WITH A GRANOVETTER CITE. #ICWSM — Sotted Reviewer (@SottedReviewer) March 11, 2013 There’s an entire great story here … Continue reading
Wasserman on Stats vs ML, and previous comparisons
Larry Wasserman has a new position paper (forthcoming 2013) with a great comparison of the Statistics and Machine Learning research cultures, “Rise of the Machines”. He has a very conciliatory view in terms of intellectual content, and a very pro-ML take … Continue reading
Perplexity as branching factor; as Shannon diversity index
A language model’s perplexity is exponentiated negative average log-likelihood, $$\exp\left( -\frac{1}{N} \log p(x)\right)$$ where the inner term usually decomposes into a sum over individual items; for example, as \(\sum_i \log p(x_i \mid x_1..x_{i-1})\) or \(\sum_i \log p(x_i)\) depending on independence assumptions, … Continue reading
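Spelled out as a tiny sketch (the helper name is mine): a model that assigns every item probability \(1/V\) has perplexity \(V\), which is the branching-factor reading.

```python
import math

def perplexity(probs):
    """exp of the negative mean log-probability of the observed items."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

ppl_uniform = perplexity([1 / 8] * 10)  # uniform over 8 outcomes: ~8.0
ppl_certain = perplexity([1.0] * 5)     # a perfect model: 1.0
```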
Graphs for SANCL-2012 web parsing results
I was just looking at some papers from the SANCL-2012 workshop on web parsing from June this year, which are very interesting to those of us who wish we had good parsers for non-newspaper text. The shared task focus was … Continue reading
Powerset’s natural language search system
There’s a lot to say about Powerset, the short-lived natural language search company (2005-2008) where I worked after college. AI overhype, flying too close to the sun, the psychology of tech journalism and venture capitalism, etc. A year or two … Continue reading
CMU ARK Twitter Part-of-Speech Tagger – v0.3 released
We’re pleased to announce a new release of the CMU ARK Twitter Part-of-Speech Tagger, version 0.3. The new version is much faster (40x) and more accurate (89.2 -> 92.8) than before. We also have released new POS-annotated data, including a … Continue reading