About
This is a blog on artificial intelligence and "Social Science++", with an emphasis on computation and statistics. My website is brenocon.com.
Author Archives: brendano
Berkeley SDA and the General Social Survey
It is worth contemplating how grand the General Social Survey is. When playing around with the Statwing YC demo (which is very cool!) I was reminded of the very old-school SDA web tool for exploratory cross-tabulation analyses… They have the … Continue reading
Re So I just wrote this hierarchical kernelized Boltzmann process in Prolog using ed on my iPhone. I can send you the RCS repository. — ML Hipster (@ML_Hipster) July 19, 2012 The best I can do is: I once programmed … Continue reading
p-values, CDF’s, NLP etc.
Update Aug 10: THIS IS NOT A SUMMARY OF THE WHOLE PAPER! It’s whining about one particular method of analysis before talking about other things further down. A quick note on Berg-Kirkpatrick et al., EMNLP-2012, “An Empirical Investigation of Statistical … Continue reading
The $60,000 cat: deep belief networks make less sense for language than vision
There was an interesting ICML paper this year about very large-scale training of deep belief networks (a.k.a. neural networks) for unsupervised concept extraction from images. They (Quoc V. Le and colleagues at Google/Stanford) have a cute example of learning very … Continue reading
F-scores, Dice, and Jaccard set similarity
The Dice similarity is the same as F1-score; and they are monotonic in Jaccard similarity. I worked this out recently but couldn’t find anything about it online so here’s a writeup. Let \(A\) be the set of found items, and … Continue reading
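A minimal sketch of the claim in this excerpt, not the post's own derivation. It assumes the second set (cut off in the excerpt) is the set of true items, and the names `found`, `truth`, `dice`, `jaccard`, and `f1` are made up for illustration. It checks numerically that Dice equals F1, and that Dice is the increasing function \(2J/(1+J)\) of Jaccard \(J\).

```python
import math


def dice(a, b):
    """Dice similarity: 2|A & B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard similarity: |A & B| / |A | B|."""
    return len(a & b) / len(a | b)


def f1(found, truth):
    """F1 as the harmonic mean of precision and recall.

    Assumes both sets are non-empty and overlap, so nothing divides by zero.
    """
    tp = len(found & truth)
    precision = tp / len(found)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: items a system found vs. items that are correct.
found = {1, 2, 3, 4}
truth = {3, 4, 5}

d = dice(found, truth)
j = jaccard(found, truth)

print("Dice == F1:", math.isclose(d, f1(found, truth)))
print("Dice == 2J/(1+J):", math.isclose(d, 2 * j / (1 + j)))
```

Because \(2J/(1+J)\) is strictly increasing on \([0, 1]\), ranking candidates by Dice (or F1) gives the same order as ranking them by Jaccard.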
Cosine similarity, Pearson correlation, and OLS coefficients
Cosine similarity, Pearson correlations, and OLS coefficients can all be viewed as variants on the inner product — tweaked in different ways for centering and magnitude (i.e. location and scale, or something like that). Details: You have two vectors \(x\) … Continue reading
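One way to see the "variants on the inner product" idea, sketched in plain Python under the standard textbook definitions. The helper names (`inner`, `center`, and so on) are illustrative and do not come from the post. Each quantity is an inner product of \(x\) and \(y\), with or without centering, divided by a different normalizer.

```python
import math


def inner(x, y):
    """Plain inner product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def center(x):
    """Subtract the mean from every element."""
    m = sum(x) / len(x)
    return [a - m for a in x]


def cosine(x, y):
    """<x, y> normalized by both vector lengths."""
    return inner(x, y) / math.sqrt(inner(x, x) * inner(y, y))


def pearson(x, y):
    """Cosine similarity of the centered vectors."""
    return cosine(center(x), center(y))


def ols_slope_no_intercept(x, y):
    """Slope of y ~ b*x: <x, y> normalized by <x, x> only."""
    return inner(x, y) / inner(x, x)


def ols_slope_with_intercept(x, y):
    """Slope of y ~ a + b*x: the same ratio on centered vectors."""
    xc = center(x)
    return inner(xc, center(y)) / inner(xc, xc)


# Hypothetical data lying exactly on the line y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

print("cosine:", cosine(x, y))
print("pearson:", pearson(x, y))
print("slope, no intercept:", ols_slope_no_intercept(x, y))
print("slope, with intercept:", ols_slope_with_intercept(x, y))
```

On this toy data the centered quantities recover the exact linear relationship (correlation 1, slope 2), while the uncentered ones are pulled by the offset.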
I don’t get this web parsing shared task
The idea for a shared task on web parsing is really cool. But I don’t get this one: Shared Task – SANCL 2012 (First Workshop on Syntactic Analysis of Non-Canonical Language). They’re explicitly banning: manually annotating in-domain (web) sentences; creating … Continue reading
Save Zipf’s Law (new anti-credulous-power-law article)
To the delight of those of us enjoying the ride on the anti-power-law bandwagon (bandwagons are ok if it’s a backlash to another bandwagon), Cosma links to a new article in Science, “Critical Truths About Power Laws,” by Stumpf and … Continue reading
Histograms — matplotlib vs. R
When possible, I like to use R for its really, really good statistical visualization capabilities. I’m doing a modeling project in Python right now (R is too slow, bad at large data, bad at structured data, etc.), and in comparison … Continue reading
Bayes update view of pointwise mutual information
This is fun. Pointwise Mutual Information (e.g. Church and Hanks 1990) between two variable outcomes \(x\) and \(y\) is \[ PMI(x,y) = \log \frac{p(x,y)}{p(x)p(y)} \] It’s called “pointwise” because Mutual Information, between two (discrete) variables X and Y, is the … Continue reading