About
This is a blog on artificial intelligence and "Social Science++", with an emphasis on computation and statistics. My website is brenocon.com.
Author Archives: brendano
What inputs do Monte Carlo algorithms need?
Monte Carlo sampling algorithms (MCMC or otherwise) aim to draw samples from a target distribution. They can be organized by what inputs or prior knowledge about the distribution they require. This ranges from a low amount of knowledge, … Continue reading
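As one point on that spectrum, rejection sampling needs only an unnormalized log-density, a proposal we can both sample and evaluate, and a bound relating the two. A minimal sketch (all names here are mine, not from the post):

```python
import math
import random

def rejection_sample(unnorm_logpdf, proposal_sample, proposal_logpdf, log_m, n):
    """Draw n samples from a density known only up to a constant.
    Inputs required: the unnormalized log-density, a proposal distribution,
    and a constant log_m with unnorm_logpdf(x) <= log_m + proposal_logpdf(x)."""
    samples = []
    while len(samples) < n:
        x = proposal_sample()
        # accept with probability unnorm(x) / (M * proposal(x))
        if math.log(random.random()) < unnorm_logpdf(x) - log_m - proposal_logpdf(x):
            samples.append(x)
    return samples

# Toy target: an unnormalized standard normal; proposal: uniform on [-5, 5].
random.seed(0)
draws = rejection_sample(
    unnorm_logpdf=lambda x: -0.5 * x * x,          # max value 1 at x=0
    proposal_sample=lambda: random.uniform(-5, 5),
    proposal_logpdf=lambda x: math.log(1 / 10),    # density 1/10 on [-5, 5]
    log_m=math.log(10),                            # so that 1 <= 10 * (1/10)
    n=2000,
)
mean = sum(draws) / len(draws)
```

Other algorithms in the post's taxonomy demand more (gradients, conditionals) or less (only a simulator) than this.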
Rise and fall of Dirichlet process clusters
Here’s Gibbs sampling for a Dirichlet process 1-d mixture of Gaussians, on 1000 data points that look like this. I gave it a fixed variance and concentration parameter, and over MCMC iterations it looks like this. The top is the … Continue reading
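For reference, a collapsed-Gibbs sketch of this setup, under the same assumptions the post names (fixed component variance, a concentration parameter) plus an assumed normal prior on cluster means and a CRP representation of the DP. This is an illustrative reconstruction, not the post's actual code:

```python
import math
import random

def logpdf_normal(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def dp_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=10.0, iters=100, seed=0):
    """Collapsed Gibbs for a 1-d DP mixture of Gaussians: fixed component
    variance sigma2, concentration alpha, N(mu0, tau2) prior on means."""
    rng = random.Random(seed)
    z = [0] * len(data)          # cluster assignment per point
    counts = {0: len(data)}      # cluster sizes
    sums = {0: sum(data)}        # per-cluster data sums
    for _ in range(iters):
        for i, x in enumerate(data):
            # remove point i from its current cluster
            k = z[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # CRP prior times posterior-predictive likelihood, per option
            options, logps = [], []
            for k2 in counts:
                prec = 1 / tau2 + counts[k2] / sigma2
                m = (mu0 / tau2 + sums[k2] / sigma2) / prec
                options.append(k2)
                logps.append(math.log(counts[k2])
                             + logpdf_normal(x, m, 1 / prec + sigma2))
            knew = max(counts, default=-1) + 1   # fresh cluster label
            options.append(knew)
            logps.append(math.log(alpha) + logpdf_normal(x, mu0, tau2 + sigma2))
            # sample a cluster from the normalized probabilities
            mx = max(logps)
            ws = [math.exp(lp - mx) for lp in logps]
            r = rng.random() * sum(ws)
            for k2, w in zip(options, ws):
                r -= w
                if r <= 0:
                    break
            z[i] = k2
            counts[k2] = counts.get(k2, 0) + 1
            sums[k2] = sums.get(k2, 0.0) + x
    return z

# Two well-separated groups should mostly recover two clusters.
data = [-5.2, -4.8, -5.0, -5.1, -4.9, 4.8, 5.2, 5.0, 4.9, 5.1]
z = dp_gibbs(data)
```

The "rise and fall" behavior in the post comes from exactly this sampler: new clusters are born via the alpha term and die when their last point is reassigned.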
Correlation picture
Paul Moore posted a comment pointing out this great discussion of the correlation coefficient: Joseph Lee Rodgers and W. Alan Nicewander. “Thirteen Ways to Look at the Correlation Coefficient.” The American Statistician, Vol. 42, No. 1. (Feb., 1988), pp. 59-66. … Continue reading
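One of the paper's thirteen views is the correlation as the average product of standardized scores; a quick sketch (helper name is mine):

```python
def pearson_r(xs, ys):
    """Correlation as the mean product of standardized scores, one of
    Rodgers & Nicewander's thirteen interpretations of r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return sum((x - mx) * (y - my) / (sx * sy) for x, y in zip(xs, ys)) / n

r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # exactly linear: r = 1
r_neg = pearson_r([1, 2, 3], [3, 2, 1])        # exactly anti-linear: r = -1
```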
R scan() for quick-and-dirty checks
One of my favorite R tricks is scan(). I was using it recently to verify a sampler I wrote, which was supposed to output numbers uniformly between 1 and 100 into a logfile; this loads the logfile, counts the … Continue reading
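The same quick-and-dirty check in Python, for readers outside R (the logfile contents below are made up for illustration):

```python
import io
from collections import Counter

# Stand-in for the real logfile: one sampled integer per line.
logfile = io.StringIO("57\n3\n100\n57\n1\n42\n")

counts = Counter(int(line) for line in logfile)   # tabulate, like R's table()
in_range = all(1 <= value <= 100 for value in counts)
```

Eyeballing the resulting counts is usually enough to catch an off-by-one or a badly seeded sampler.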
Really liking whoever made @SottedReviewer, e.g. BEFORE I START TALKING ABOUT RANDOM FORESTS AND AUCS, I’LL THROW A BONE TO SOCIAL SCIENTISTS WITH A GRANOVETTER CITE. #ICWSM — Sotted Reviewer (@SottedReviewer) March 11, 2013 There’s an entire great story here … Continue reading
Wasserman on Stats vs ML, and previous comparisons
Larry Wasserman has a new position paper (forthcoming 2013) with a great comparison of the Statistics and Machine Learning research cultures, “Rise of the Machines”. He has a very conciliatory view in terms of intellectual content, and a very pro-ML take … Continue reading
Perplexity as branching factor; as Shannon diversity index
A language model’s perplexity is exponentiated negative average log-likelihood, $$\exp\left( -\frac{1}{N} \log p(x)\right)$$ where the inner term usually decomposes into a sum over individual items; for example, as \(\sum_i \log p(x_i \mid x_1..x_{i-1})\) or \(\sum_i \log p(x_i)\) depending on independence assumptions, … Continue reading
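Spelled out as a tiny sketch (the helper name is mine): a model that assigns every item probability \(1/V\) has perplexity \(V\), which is the branching-factor reading.

```python
import math

def perplexity(probs):
    """exp of the negative mean log-probability of the observed items."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

ppl_uniform = perplexity([1 / 8] * 10)  # uniform over 8 outcomes: ~8.0
ppl_certain = perplexity([1.0] * 5)     # a perfect model: 1.0
```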
Graphs for SANCL-2012 web parsing results
I was just looking at some papers from the SANCL-2012 workshop on web parsing from June this year, which are very interesting to those of us who wish we had good parsers for non-newspaper text. The shared task focus was … Continue reading
Powerset’s natural language search system
There’s a lot to say about Powerset, the short-lived natural language search company (2005-2008) where I worked after college. AI overhype, flying too close to the sun, the psychology of tech journalism and venture capitalism, etc. A year or two … Continue reading
CMU ARK Twitter Part-of-Speech Tagger – v0.3 released
We’re pleased to announce a new release of the CMU ARK Twitter Part-of-Speech Tagger, version 0.3. The new version is much faster (40x) and more accurate (89.2 -> 92.8) than before. We also have released new POS-annotated data, including a … Continue reading