Category Archives: Uncategorized

Java and IDEs for the R/Python world

(Some tips on how to use Java if you’re from R or Python; some thoughts on software platforms and programming for data-science-or-whatever-we-call-it-now.) Most of my research these days uses Python, R, or Java. It’s terrific that so many people are … Continue reading

Replot: departure delays vs flight time speed-up

Here’s a re-plotting of a graph in this 538 post. It looks at whether pilots speed up the flight when there’s a delay, and finds that this appears to be the case. This is averaged data for flights on several … Continue reading

What the ACL-2014 review scores mean

I’ve had several people ask me what the numbers in ACL reviews mean — and I can’t find anywhere online where they’re described. (Can anyone point this out if it is somewhere?) So here’s the review form, below. They all … Continue reading


Scatterplot of KN/PYP language model results

I should make a blog where all I do is scatterplot results tables from papers. I do this once in a while to make them easier to understand… I think the following results are from Yee Whye Teh’s paper … Continue reading

tanh is a rescaled logistic sigmoid function

This confused me for a while when I first learned it, so in case it helps anyone else: The logistic sigmoid function, a.k.a. the inverse logit function, is \[ g(x) = \frac{ e^x }{1 + e^x} \] Its outputs range … Continue reading
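A quick numerical check of the claim in the title, as a minimal sketch. The excerpt above is cut off before it states the rescaling, so the identity used here, tanh(x) = 2 g(2x) - 1, is the standard one and is an assumption about what the full post derives. The function names `logistic` and `tanh_via_logistic` are made up for this illustration.

```python
import math


def logistic(x):
    """Logistic sigmoid g(x) = e^x / (1 + e^x), computed without overflow."""
    if x >= 0:
        # Equivalent form 1 / (1 + e^-x); avoids a huge e^x for large x.
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def tanh_via_logistic(x):
    """tanh written as a shifted and rescaled logistic (assumed identity)."""
    return 2.0 * logistic(2.0 * x) - 1.0


# Compare against the library tanh at a few points.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(x, tanh_via_logistic(x), math.tanh(x))
```

The identity follows from multiplying the numerator and denominator of tanh by e^x: (e^x - e^-x)/(e^x + e^-x) = (e^{2x} - 1)/(e^{2x} + 1) = 2 g(2x) - 1.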

Response on our movie personas paper

Update (2013-09-17): See David Bamman’s great guest post on Language Log on our latent personas paper, and the big picture of interdisciplinary collaboration. I’ve been informed that an interesting critique of my, David Bamman’s and Noah Smith’s ACL paper on … Continue reading


Probabilistic interpretation of the B3 coreference resolution metric

Here is an intuitive justification for the B3 evaluation metric often used in coreference resolution, based on whether mention pairs are coreferent. If a mention from the document is chosen at random, B3-Recall is the (expected) proportion of its actual … Continue reading
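To make the per-mention averaging concrete, here is a minimal sketch of B3 under its usual definition: for each mention, take the overlap between its gold cluster and its system cluster, divide by the gold cluster size for recall (or the system cluster size for precision), and average over mentions. The function name `b_cubed`, the dict-based input format, and the toy clusters are all assumptions for illustration. The sketch also assumes both clusterings cover exactly the same mentions.

```python
def b_cubed(gold, system):
    """Return (precision, recall) for B3.

    gold, system: dicts mapping mention id -> cluster id.
    Assumes both dicts contain exactly the same mention ids.
    """
    def group(assignment):
        # Invert mention -> cluster into cluster -> set of mentions.
        clusters = {}
        for mention, cluster in assignment.items():
            clusters.setdefault(cluster, set()).add(mention)
        return clusters

    gold_clusters = group(gold)
    system_clusters = group(system)

    precision_sum = 0.0
    recall_sum = 0.0
    for mention in gold:
        g = gold_clusters[gold[mention]]
        s = system_clusters[system[mention]]
        overlap = len(g & s)  # always >= 1: the mention itself
        precision_sum += overlap / len(s)
        recall_sum += overlap / len(g)

    n = len(gold)
    return precision_sum / n, recall_sum / n


# Toy example (hypothetical data): gold has {a,b,c} and {d,e};
# the system moves c into the second cluster.
gold = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
system = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Y"}
precision, recall = b_cubed(gold, system)
print(precision, recall)
```

Averaging uniformly over mentions is what makes the "pick a mention at random" reading possible: each term in the sum is the quantity for one randomly chosen mention.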


Some analysis of tweet shares and “predicting” election outcomes

Everyone recently seems to be talking about this newish paper by Digrazia, McKelvey, Bollen, and Rojas (pdf here) that examines the correlation of Congressional candidate name mentions on Twitter against whether the candidate won the race.  One of the coauthors also … Continue reading

Confusion matrix diagrams

I wrote a little note and diagrams on confusion matrix metrics: Precision, Recall, F, Sensitivity, Specificity, ROC, AUC, PR Curves, etc. Also, graffle source.
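As a small companion to the list of metrics, here is a sketch of how the count-based ones fall out of the four cells of a binary confusion matrix. The function name `confusion_metrics` and the example counts are invented for illustration and are not taken from the note itself. ROC, AUC and PR curves need scores at many thresholds, so they are left out.

```python
def safe_div(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion matrix counts."""
    return {
        "precision": safe_div(tp, tp + fp),    # of predicted positives, fraction correct
        "recall": safe_div(tp, tp + fn),       # a.k.a. sensitivity, true positive rate
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),  # harmonic mean of precision and recall
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(name, round(value, 4))
```

Writing F1 as 2TP / (2TP + FP + FN) is algebraically the same as the harmonic mean of precision and recall, and it sidesteps a second division by zero when both are undefined.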

Movie summary corpus and learning character personas

Here is one of our exciting just-finished ACL papers.  David and I designed an algorithm that learns different types of character personas — “Protagonist”, “Love Interest”, etc — that are used in movies. To do this we collected a brand new dataset: 42,306 … Continue reading