CMU Twitter Part-of-Speech tagger 0.2

Announcement: We recently released a new version (0.2) of our part-of-speech tagger for English Twitter messages, along with annotations and interface. See the link for more details.

One last thing on the Norvig vs. Chomsky thing from a little while ago (http://norvig.com/chomsky.html), which (correctly) casts the issue as Shannon vs. Chomsky.

The relevant seminal publications are:

  • Shannon, “Mathematical Theory of Communication,” 1948
  • Chomsky, “Syntactic Structures,” 1957

One of those historical figures is still around and representing himself in 2011 — he should get credit just for still showing up to the fight. Are there any historical figures from the Shannon side still around?  What I would’ve given to see a Jelinek vs. Chomsky public debate.  Though I guess Pereira vs. Chomsky would be pretty great.

Good linguistic semantics textbook?

I’m looking for recommendations for a good textbook/handbook/reference on (non-formal) linguistic semantics.  My undergrad semantics course was almost entirely focused on logical/formal semantics, which is fine, but I don’t feel familiar with the breadth of substantive issues — for example, I’d be hard-pressed to explain why something like semantic/thematic role labeling should be useful for anything at all.

I somewhat randomly stumbled upon Frawley 1992 (review) in a used bookstore and it seemed pretty good — in particular, it cleanly separates itself from the philosophical study of semantics, and thus identifies issues that seem amenable to computational modeling.

I’m wondering what else is out there?  Here’s a comparison of three textbooks.

How much text versus metadata is in a tweet?

This should have been a blog post, but I got lazy and wrote a plaintext document instead.

For Twitter, context matters: 90% of a tweet is metadata and 10% is text.  That’s measured by (an approximation of) information content; by raw data size, it’s 95/5.

iPhone autocorrection error analysis

re @andrewparker:

My iPhone auto-corrected “Harvard” to “Garbage”. Well played Apple engineers.

I was wondering how this would happen, and then noticed that each character pair is at distance 0 to 2 on the QWERTY keyboard.  Perhaps their model is eager to allow QWERTY-local character substitutions.

>>> list(zip('harvard', 'garbage'))
[('h', 'g'), ('a', 'a'), ('r', 'r'), ('v', 'b'), ('a', 'a'), ('r', 'g'), ('d', 'e')]
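
To make the "0 to 2" claim concrete, here is a rough sketch that measures key distance for each pair. It assumes an idealized keyboard: the three letter rows laid out on a square grid with no row stagger, and Manhattan distance between keys. Both are my simplifications, not anything known about Apple's model.

```python
# Idealized QWERTY layout: each key gets a (row, column) grid position.
# Real keyboards stagger the rows; that is ignored here (an assumption).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def key_distance(a, b):
    """Manhattan distance between two keys on the unstaggered grid."""
    (r1, c1), (r2, c2) = POS[a], POS[b]
    return abs(r1 - r2) + abs(c1 - c2)


pairs = list(zip("harvard", "garbage"))
distances = [key_distance(a, b) for a, b in pairs]
print(distances)
```

Under these assumptions only the r/g pair reaches distance 2; every other substitution is to an adjacent key or is no substitution at all.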

And then most any language model thinks p(“garbage”) > p(“harvard”), at the very least in a unigram model with a broad domain corpus.  So if it’s a noisy channel-style model, they’re underpenalizing the edit distance relative to the LM prior. (Reference: Norvig’s noisy channel spelling correction article.)
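
A toy version of that argument, as a sketch only: the unigram counts, the corpus size, and the per-substitution penalty below are all invented for illustration, and the channel model just charges a fixed log-probability per mismatched character. When the penalty is too mild, the more frequent word wins even though it needs four substitutions.

```python
import math

# Hypothetical unigram counts and corpus size, made up for this example.
COUNTS = {"garbage": 5000, "harvard": 500}
TOTAL = 1_000_000


def log_prior(word):
    """Unigram language model: log p(word)."""
    return math.log(COUNTS[word] / TOTAL)


def log_channel(typed, intended, sub_logp):
    """Crude channel model: a fixed log-penalty per substituted character.

    Only handles equal-length strings (substitutions, no inserts/deletes).
    """
    if len(typed) != len(intended):
        raise ValueError("this sketch only models substitutions")
    mismatches = sum(a != b for a, b in zip(typed, intended))
    return mismatches * sub_logp


def correct(typed, candidates, sub_logp):
    """Pick argmax over candidates of log p(word) + log p(typed | word)."""
    return max(
        candidates,
        key=lambda w: log_prior(w) + log_channel(typed, w, sub_logp),
    )


candidates = ["harvard", "garbage"]
mild = correct("harvard", candidates, sub_logp=-0.5)    # edits are cheap
strict = correct("harvard", candidates, sub_logp=-3.0)  # edits are costly
print(mild, strict)
```

With the mild penalty the prior dominates and "garbage" is chosen; with the strict penalty the typed word survives.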

On the other hand, given how insane iPhone autocorrections are, and from the number of times I’ve seen it delete a quite reasonable word I wrote, I’d bet “harvard” isn’t even in their LM.  (Where the LM is more like just a dictionary; call it quantizing probabilities to 1 bit if you like.)  I think Hal mentioned once he would gladly give up GB’s of storage for a better language model to make iPhone autocorrect not suck.  That sounds like the right tradeoff to me.

Language models with high coverage are important.  As illustrated in e.g. one of those Google MT papers.  Wish Apple would figure this out too.

Log-normal and logistic-normal terminology

I was cleaning my office and found a back-of-envelope diagram Shay drew me once, so I’m writing it up to not forget.  The definitions of the logistic-normal and log-normal distributions are a little confusing with regard to their relationship to the normal distribution.  If you draw samples from one, the arrows below show the transformation that turns them into samples from another.

For example, if x ~ Normal, then transforming as y=exp(x) implies y ~ LogNormal.  The adjective terminology is inverted: the logistic function goes from normal to logistic-normal, but the log function goes from log-normal to normal (other way!).  The log of the log-normal is normal, but it’s the logit of the logistic normal that’s normal.

Here are densities of these different distributions via transformations from a standard normal.

In R:  x=rnorm(1e6); hist(x); hist(exp(x)/(1+exp(x))); hist(exp(x))
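
The same transformations can be sketched in Python with only the standard library, including the inverse directions (log and logit) that take you back to the normal. The seed and sample size are arbitrary choices for the example.

```python
import math
import random


def logistic(x):
    """Maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of the logistic function."""
    return math.log(p / (1.0 - p))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # x ~ Normal(0, 1)

log_normal = [math.exp(v) for v in x]        # exp(normal) is log-normal
logistic_normal = [logistic(v) for v in x]   # logistic(normal) is logistic-normal

# Going the other way recovers the normal samples.
back_from_log = [math.log(v) for v in log_normal]
back_from_logit = [logit(v) for v in logistic_normal]
```

The log-normal samples live on the positive reals and the logistic-normal samples live in (0, 1), which is the quickest way to tell the two apart.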

Just to make things more confusing, note the logistic-normal distribution is completely different than the logistic distribution.

What are these things?  There’s lots written online about log-normals.  Neat fact: it arises from lots of multiplicative effects (by the CLT, since additive effects imply the normal).  The very nice Clauset et al. (slides, blog post) finds log-normals and stretched exponentials fit pretty well to many types of data that are often claimed to be power-law.

The logistic-normal is more obscure–it doesn’t even have a Wikipedia page, so see the original Aitchison and Shen.  Hm, on page 2 they talk about the log-normal, so they’re responsible for the very slight naming weirdness.  The logistic-normal is a useful Bayesian prior for multinomial distributions, since in the d-dimensional multivariate case it defines a probability distribution over the simplex (i.e. parameterizations of d-dim. multinomials), similar to the Dirichlet, but you can capture covariance effects and chain them together and other fun things, though inference can be trickier (typically via variational approximations).  A biased sample of text modeling examples includes Blei and Lafferty, another B&L, Cohen and Smith, Eisenstein et al.

OK, so maybe these distributions aren’t really related beyond involving transformations of the normal.

Finally, note that the diagram only writes out the logistic-normal for the one-dimensional case; in the multivariate case, there’s an additional wrinkle that the logistic-normal has one less dimension than the normal, since you don’t need a parameter for the last dimension (subtract the rest out of 1).  For example, a 3-d normal (distribution over 3-space) corresponds to a logistic-normal distribution over a simplex in 3-space, having only 2 dimensions.
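
A small sketch of the multivariate case, assuming the transformation is a softmax and, for simplicity, that the three normal components are independent standard normals (a real logistic-normal prior would usually carry a full covariance matrix). It shows that the result sums to 1, so the last coordinate is determined by the others.

```python
import math
import random


def softmax(zs):
    """Map a real vector to a point on the probability simplex."""
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(3)]  # a draw from a 3-d normal
theta = softmax(z)                           # a point on the simplex in 3-space

# The last coordinate carries no extra information: it is 1 minus the rest.
last = 1.0 - sum(theta[:-1])
```

Here `theta` has three entries but only two degrees of freedom, which is the dimension mismatch described above.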

Shalizi’s review of NKS

I laugh out loud every time I reread Cosma Shalizi’s review of “New Kind of Science” (2005).  I remember reading it back in college when everyone was talking about the book, when I was just losing my naivete about the popular science treatments of complex systems and such.  I must be getting more cynical as I get older because I keep liking the review more.  This time my favorite line was

Wolfram even goes on to refute post-modernism on this basis; I won’t touch that except to say that I’d have paid a lot to see Wolfram and Jacques Derrida go one-on-one.

And on the issue of running your own conventions and citing yourself, he compares it to

… the way George Lakoff uses “as cognitive science shows” to mean “as I claimed in my earlier books”

These quotes are funnier in context.

Rough binomial confidence intervals

I made this table a while ago and find it handy: for example, looking at a table of percentages and trying to figure out what’s meaningful or not. Why run a test if you can estimate it in your head?

References: Wikipedia, binom.test
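
For a quick check outside your head, one common back-of-envelope choice is the normal-approximation (Wald) interval. This is a sketch of that formula only; I am not claiming it is how the table was computed, and the function name is made up for the example.

```python
import math


def rough_binomial_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a binomial proportion.

    p_hat: observed proportion, n: sample size, z: normal quantile
    (1.96 gives roughly 95% coverage). Poor for small n or p near 0 or 1.
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = rough_binomial_ci(0.5, 100)
print(low, high)
```

At 50% observed out of 100 trials the interval is about plus or minus 10 percentage points, and quadrupling n halves the width.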

Poor man’s linear algebra textbook

I keep learning new bits of linear algebra all the time, but I’m always hurting for a useful reference.  I probably should get a good book (which?), but in the meantime I’m collecting several nice online sources that ML researchers seem to often recommend: The Matrix Cookbook, plus a few more tutorial/introductory pieces, aimed at an intermediate-ish level.

Main reference:

Tutorials/introductions:

After studying for this last stats/ML midterm, I’ve now printed them out and stuck them in a binder.  A poor man’s linear algebra textbook.

I’d love to learn of more or different stuff out there.  (There are always the linear algebra review appendixes in Hastie et al. ESL and Boyd+Vandenberghe CvxOpt, but I’ve always found them a little too small for usefulness+understanding.)

Move to brenocon.com

I’ve changed my website and blog URL from anyall.org to brenocon.com. The former was supposed to be a reference to first-order logic: the existential and universal quantifiers are fundamental to relational reasoning, and as testament to that, they are enshrined as “any()” and “all()” in wise programming languages like Python and R. Or something like that. It turns out this was obvious only to me :)
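
For anyone who did not catch the reference, here is the correspondence in Python: `any()` plays the existential quantifier and `all()` the universal one, with a generator expression as the predicate. The list of numbers is just an example.

```python
xs = [2, 4, 6]

# "There exists an x in xs such that x > 5"
exists_big = any(x > 5 for x in xs)

# "For all x in xs, x is even"
forall_even = all(x % 2 == 0 for x in xs)

# Over an empty domain the quantifiers behave as in logic:
# nothing exists, and every universal claim holds vacuously.
empty_any = any(x > 5 for x in [])
empty_all = all(x > 5 for x in [])

print(exists_big, forall_even, empty_any, empty_all)
```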

I tried to set up everything to automatically redirect, so no links should be broken. Hopefully.
