Author Archives: brendano

Memorizing small tables

Lately, I’ve been trying to memorize very small tables, especially for better intuitions and rule-of-thumb calculations. At the moment I have these above my desk: The first one is a few entries in a natural logarithm table. There are all … Continue reading

5 Comments

Be careful with dictionary-based text analysis

OK, everyone loves to run dictionary methods for sentiment and other text analysis — counting words from a predefined lexicon in a big corpus, in order to explore or test hypotheses about the corpus. In particular, this is often done … Continue reading

2 Comments

Information theory stuff

Actually this post is mainly to test the MathJax installation I put into WordPress via this plugin. But information theory is great, why not? The probability of a symbol is $p$. It takes $\log_2 \frac{1}{p}$ bits to encode one symbol, sometimes … Continue reading
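As a quick illustration of the bits-per-symbol idea, here is a minimal sketch. It assumes the standard Shannon code length, in which a symbol of probability p costs log2(1/p) bits; the function name `surprisal_bits` is made up for this example and does not come from the post.

```python
import math


def surprisal_bits(p: float) -> float:
    """Bits needed to encode one symbol of probability p, i.e. log2(1/p).

    Assumes the standard Shannon code length; p must lie in (0, 1].
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1.0 / p)


# Rarer symbols cost more bits: halving p adds exactly one bit.
for p in (0.5, 0.25, 0.125):
    print(f"p = {p}: {surprisal_bits(p)} bits")
```

A symbol that is certain (p = 1) costs zero bits under this definition, which is a handy sanity check.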

5 Comments

End-to-end NLP packages

What freely available end-to-end natural language processing (NLP) systems are out there that start with raw text and output parses and semantic structures? Lots of NLP research focuses on single tasks at a time, and thus produces software that does … Continue reading

19 Comments

CMU Twitter Part-of-Speech tagger 0.2

Announcement: We recently released a new version (0.2) of our part-of-speech tagger for English Twitter messages, along with annotations and interface. See the link for more details.

One last thing on the Norvig vs. Chomsky thing from a little while ago (http://norvig.com/chomsky.html), which (correctly) casts the issue as Shannon vs. Chomsky. The relevant seminal publications are: Shannon, “Mathematical Theory of Communication,” 1948; Chomsky, “Syntactic Structures,” 1957. One … Continue reading

3 Comments

Good linguistic semantics textbook?

I’m looking for recommendations for a good textbook/handbook/reference on (non-formal) linguistic semantics.  My undergrad semantics course was almost entirely focused on logical/formal semantics, which is fine, but I don’t feel familiar with the breadth of substantive issues — for example, … Continue reading

5 Comments

How much text versus metadata is in a tweet?

This should have been a blog post, but I got lazy and wrote a plaintext document instead. Link. For Twitter, context matters: 90% of a tweet is metadata and 10% is text. That’s measured by (an approximation of) information content; … Continue reading

2 Comments

iPhone autocorrection error analysis

re @andrewparker: “My iPhone auto-corrected ‘Harvard’ to ‘Garbage’. Well played Apple engineers.” I was wondering how this would happen, and then noticed that each corresponding character pair is at a distance of 0 to 2 on the QWERTY keyboard. Perhaps their model is eager … Continue reading
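The per-character keyboard distances can be checked with a short sketch. It makes two assumptions that are not from the post: the keyboard is treated as an unstaggered grid of three letter rows, and distance is Manhattan distance (row steps plus column steps). The names `key_distance` and `pairwise_distances` are hypothetical, and nothing here claims to be how the iPhone's model actually works.

```python
# Letter rows of a QWERTY keyboard, treated as an unstaggered grid (an assumption).
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def key_distance(a: str, b: str) -> int:
    """Manhattan distance between two letter keys on the grid above."""
    (r1, c1), (r2, c2) = KEY_POS[a.lower()], KEY_POS[b.lower()]
    return abs(r1 - r2) + abs(c1 - c2)


def pairwise_distances(typed: str, corrected: str) -> list[int]:
    """Distance between each aligned character pair of two equal-length words."""
    if len(typed) != len(corrected):
        raise ValueError("words must have the same length")
    return [key_distance(a, b) for a, b in zip(typed, corrected)]


print(pairwise_distances("Harvard", "Garbage"))  # [1, 0, 0, 1, 0, 2, 1]
```

Under these assumptions every pair lands between 0 and 2, with the largest gap at "r" versus "g". A staggered layout or Euclidean distance would give slightly different numbers.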

6 Comments

Log-normal and logistic-normal terminology

I was cleaning my office and found a back-of-envelope diagram Shay drew me once, so I’m writing it up to not forget.  The definitions of the logistic-normal and log-normal distributions are a little confusing with regard to their relationship to … Continue reading
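For reference, here is a minimal sketch of the two definitions in the one-dimensional case, using only Python's standard library. Both start from a normal draw: a log-normal variable is the exponential of it (so its log is normal), and a logistic-normal variable is the logistic function of it (so its logit is normal). The function names and parameter values are illustrative assumptions, and the multivariate logistic-normal, which uses a softmax, is not covered.

```python
import math
import random


def sample_log_normal(mu: float, sigma: float, rng: random.Random) -> float:
    """Draw x ~ Normal(mu, sigma) and return exp(x), a value in (0, inf)."""
    return math.exp(rng.gauss(mu, sigma))


def sample_logistic_normal(mu: float, sigma: float, rng: random.Random) -> float:
    """Draw x ~ Normal(mu, sigma) and return logistic(x), a value in (0, 1)."""
    x = rng.gauss(mu, sigma)
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(0)  # fixed seed so runs are repeatable
log_normal_draws = [sample_log_normal(0.0, 1.0, rng) for _ in range(1000)]
logistic_normal_draws = [sample_logistic_normal(0.0, 1.0, rng) for _ in range(1000)]

print(min(log_normal_draws) > 0.0)                        # positive reals
print(all(0.0 < v < 1.0 for v in logistic_normal_draws))  # unit interval
```

In both cases the name describes the transform that takes you back to the normal (log or logit), not the transform applied to the normal draw.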

4 Comments