Author Archives: brendano

How Facebook privacy failed me

At some point, I put extra email addresses on Facebook because I thought it was necessary for something, but didn’t want to show them, so in the privacy settings marked their visibility as “Only Me.” It turns out that right … Continue reading

6 Comments

List of probabilistic model mini-language toolkits

There are an increasing number of systems that attempt to allow the user to specify a probabilistic model in a high-level language — for example, declare a (Bayesian) generative model as a hierarchy of various distributions — then automatically run … Continue reading

6 Comments

Seeing how “art” and “pharmaceuticals” are linguistically similar in web text

Earlier this week I asked the question, How are “art” and “pharmaceuticals” similar? People sent me lots of submissions! Some are great, some are a bit of a stretch. Overpriced by an order of magnitude. The letters of “art” are … Continue reading

3 Comments

Quiz: “art” and “pharmaceuticals”

A lexical semantics question: How are “art” and “pharmaceuticals” similar? I have a data-driven answer, but am curious how easy it is to guess it, and in what sense it’s valid. I’ll post my answer and supporting evidence on Tuesday.

1 Comment

Don’t MAWK AWK – the fastest and most elegant big data munging language!

update 2012-10-25: I’ve been informed there is a new maintainer for Mawk, who has probably fixed the bugs I’ve been seeing. From: Gert Hulselmans [The bugs you have found are] indeed true with mawk v1.3.3 which comes standard with Debian/Ubuntu. … Continue reading

48 Comments

Patches to Rainbow, the old text classifier that won’t go away

I’ve been reading several somewhat recent finance papers (Antweiler and Frank 2005, Das and Chen 2007) that use Rainbow, the text classification software originally written by Andrew McCallum back in 1996. The last version is from 2002 and the homepage … Continue reading

5 Comments

Another R flashmob today

Dan Goldstein sends word that they’re doing another Stack Overflow R flashmob today. It’s a neat trick. The R tag there is becoming pretty useful.

Beautiful Data book chapter

Today I received my copy of Beautiful Data, a just-released anthology of articles about, well, working with data. Lukas and I contributed a chapter on analyzing social perceptions in web data. See it here. After a long process of drafting, … Continue reading

4 Comments

Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

I haven’t done a paper review on this blog for a while, so here we go. Coreference resolution is an interesting NLP problem. (Examples.) It involves honest-to-goodness syntactic, semantic, and discourse phenomena, but still seems like a real cognitive task that … Continue reading

2 Comments

Blogger to WordPress migration helper

A while ago I moved my blog from Blogger (socialscienceplusplus.blogspot.com) to a custom WordPress installation here (anyall.org/blog). WordPress has a nice Blogger import feature, but I also wanted all the old URLs to redirect to their new equivalents. This is … Continue reading
