Author Archives: brendano

Berkeley SDA and the General Social Survey

It is worth contemplating how grand the General Social Survey is. When playing around with the Statwing YC demo (which is very cool!) I was reminded of the very old-school SDA web tool for exploratory cross-tabulation analyses… They have the … Continue reading

2 Comments

Re: “So I just wrote this hierarchical kernelized Boltzmann process in Prolog using ed on my iPhone. I can send you the RCS repository.” (ML Hipster, @ML_Hipster, July 19, 2012) The best I can do is: I once programmed … Continue reading

1 Comment

p-values, CDF’s, NLP etc.

Update Aug 10: THIS IS NOT A SUMMARY OF THE WHOLE PAPER! It’s whining about one particular method of analysis before talking about other things further down. A quick note on Berg-Kirkpatrick et al. EMNLP-2012, “An Empirical Investigation of Statistical … Continue reading

3 Comments

The $60,000 cat: deep belief networks make less sense for language than vision

There was an interesting ICML paper this year about very large-scale training of deep belief networks (a.k.a. neural networks) for unsupervised concept extraction from images. They (Quoc V. Le and colleagues at Google/Stanford) have a cute example of learning very … Continue reading

3 Comments

F-scores, Dice, and Jaccard set similarity

The Dice similarity is the same as F1-score; and they are monotonic in Jaccard similarity. I worked this out recently but couldn’t find anything about it online so here’s a writeup. Let \(A\) be the set of found items, and … Continue reading

2 Comments
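A quick numeric check of the F1/Dice/Jaccard claim in the entry above. This is only a sketch: the two sets are made up, and treating the second set as the "correct" items is an assumption, since the excerpt cuts off before defining it. The identity Dice = 2J/(1+J) is what makes Dice monotonic in Jaccard.

```python
import math

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    return len(a & b) / len(a | b)

def dice(a, b):
    """2|A ∩ B| / (|A| + |B|)"""
    return 2 * len(a & b) / (len(a) + len(b))

def f1(found, gold):
    """Harmonic mean of precision and recall, computed from the two sets."""
    tp = len(found & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(found)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example sets (not from the original post).
found = {"a", "b", "c", "d"}   # items the system returned
gold = {"c", "d", "e"}         # items assumed to be correct

j = jaccard(found, gold)
d = dice(found, gold)
f = f1(found, gold)

# Dice and F1 agree, and Dice is an increasing function of Jaccard.
print(math.isclose(d, f), math.isclose(d, 2 * j / (1 + j)))
```

Since 2J/(1+J) is strictly increasing for J in [0, 1], ranking pairs by Dice or by Jaccard gives the same order.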

Cosine similarity, Pearson correlation, and OLS coefficients

Cosine similarity, Pearson correlations, and OLS coefficients can all be viewed as variants on the inner product — tweaked in different ways for centering and magnitude (i.e. location and scale, or something like that). Details: You have two vectors \(x\) … Continue reading

23 Comments
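To make the "variants on the inner product" idea in the entry above concrete, here is a small pure-Python sketch. The function names and the data are illustrative assumptions, not taken from the post. Each quantity is a dot product, differing only in whether the vectors are centered first and what they are divided by.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def cosine(x, y):
    # Inner product normalized by both magnitudes.
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def pearson(x, y):
    # Cosine similarity after centering both vectors.
    return cosine(center(x), center(y))

def ols_slope_no_intercept(x, y):
    # Inner product normalized by the magnitude of x only.
    return dot(x, y) / dot(x, x)

def ols_slope_with_intercept(x, y):
    # Same as above, but on centered vectors.
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)

# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

print(cosine(x, y), pearson(x, y))
print(ols_slope_no_intercept(x, y), ols_slope_with_intercept(x, y))
```

One consequence that is easy to check: the with-intercept slope equals the Pearson correlation rescaled by the ratio of the centered magnitudes of y and x.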

I don’t get this web parsing shared task

The idea for a shared task on web parsing is really cool. But I don’t get this one: Shared Task – SANCL 2012 (First Workshop on Syntactic Analysis of Non-Canonical Language). They’re explicitly banning: manually annotating in-domain (web) sentences; creating … Continue reading

5 Comments

Save Zipf’s Law (new anti-credulous-power-law article)

To the delight of those of us enjoying the ride on the anti-power-law bandwagon (bandwagons are ok if it’s a backlash to another bandwagon), Cosma links to a new article in Science, “Critical Truths About Power Laws,” by Stumpf and … Continue reading

4 Comments

Histograms — matplotlib vs. R

When possible, I like to use R for its really, really good statistical visualization capabilities. I’m doing a modeling project in Python right now (R is too slow, bad at large data, bad at structured data, etc.), and in comparison … Continue reading

8 Comments

Bayes update view of pointwise mutual information

This is fun. Pointwise Mutual Information (e.g. Church and Hanks 1990) between two variable outcomes \(x\) and \(y\) is \[ PMI(x,y) = \log \frac{p(x,y)}{p(x)p(y)} \] It’s called “pointwise” because Mutual Information, between two (discrete) variables X and Y, is the … Continue reading

Leave a comment
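A small sketch of the PMI formula from the entry above, on a made-up joint distribution (the outcome names and probabilities are hypothetical). It checks one reading of the "Bayes update" view: since \(p(x,y) = p(x \mid y)\,p(y)\), the formula rearranges to \(\log p(x \mid y) = \log p(x) + PMI(x,y)\), so PMI is the amount added to the log prior of \(x\) after observing \(y\). It also computes mutual information as the expected PMI under the joint, which is the standard definition.

```python
import math

# Hypothetical joint distribution p(x, y); the numbers are invented for illustration.
joint = {
    ("rain", "umbrella"): 0.3,
    ("rain", "no_umbrella"): 0.1,
    ("dry", "umbrella"): 0.1,
    ("dry", "no_umbrella"): 0.5,
}

def p_x(x):
    """Marginal p(x), summing the joint over y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def p_y(y):
    """Marginal p(y), summing the joint over x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def pmi(x, y):
    """PMI(x, y) = log p(x, y) / (p(x) p(y))"""
    return math.log(joint[(x, y)] / (p_x(x) * p_y(y)))

# Bayes update view: log posterior = log prior + PMI.
log_prior = math.log(p_x("rain"))
log_posterior = math.log(joint[("rain", "umbrella")] / p_y("umbrella"))
update = pmi("rain", "umbrella")

# Mutual information as the expectation of PMI under the joint.
mutual_information = sum(p * pmi(x, y) for (x, y), p in joint.items())

print(math.isclose(log_posterior, log_prior + update), mutual_information)
```

A positive PMI means observing \(y\) raises the probability of \(x\); a negative one means it lowers it.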