Link: Today’s international organizations

Fascinating — a review of the current international system, focusing on international organizations (that is, organizations of states). Who runs the world? | Wrestling for influence | Economist.com

Bias correction sneak peek!

(Update 10/2008: actually this model doesn’t work in all cases.  In the final paper we use an (even) simpler model.)

I really don’t have time to write up an explanation for what this is so I’ll just post the graph instead. Each box is a scatterplot of an AMT worker’s responses versus a gold standard. Drawn are attempts to fit linear models to each worker. The idea is to correct for the biases of each worker. With a linear model y ~ ax+b, the correction is correction(y) = (y-b)/a. Arrows show such corrections. Hilariously bad “corrections” happen. *But*, there is also weighting: to get the “correct” answer (maximum likelihood) from several workers, you weight by a^2/stddev^2. Despite the sometimes odd corrections, the cross-validated results from this model correlate better with the gold than the raw averaging of workers. (Raw averaging is the maximum likelihood solution for a fixed noise model: a=1, b=0, and each worker’s variance is equal).
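Here is a minimal sketch of the correction and weighting described above, in plain Python. The function names (fit_worker, correct, combine), the ordinary least squares fit, and the n-2 residual variance estimate are all assumptions made for illustration; this is not the code behind the graph.

```python
def fit_worker(gold, responses):
    """Least-squares fit of responses ~ a*gold + b for one worker.

    Returns (a, b, var), where var is the residual variance (n - 2 denominator).
    Needs at least 3 points, and gold values that are not all equal.
    """
    n = len(gold)
    mx = sum(gold) / n
    my = sum(responses) / n
    sxx = sum((x - mx) ** 2 for x in gold)
    sxy = sum((x - mx) * (y - my) for x, y in zip(gold, responses))
    a = sxy / sxx
    b = my - a * mx
    resid_ss = sum((y - (a * x + b)) ** 2 for x, y in zip(gold, responses))
    return a, b, resid_ss / (n - 2)


def correct(y, a, b):
    """Invert the worker's linear bias: correction(y) = (y - b) / a."""
    return (y - b) / a


def combine(answers, workers):
    """Weighted average of corrected answers for a single item.

    answers[i] is worker i's raw answer; workers[i] is that worker's (a, b, var).
    Each corrected answer gets weight a^2 / var, which is the maximum
    likelihood estimate if y_i = a_i * x + b_i + normal noise.
    """
    num = den = 0.0
    for y, (a, b, var) in zip(answers, workers):
        w = a * a / var
        num += w * correct(y, a, b)
        den += w
    return num / den
```

With a = 1, b = 0 and the same variance for every worker, combine reduces to the plain mean, which is the raw-averaging special case mentioned above.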

Much better explanation is coming… will be a blog.doloreslabs.com post I think.

Picture!

Turker classifiers and binary classification threshold calibration

I wrote a big Dolores Labs blog post a few days ago. Click here to read it. I am most proud of the pictures I made for it:

1 Comment

Pairwise comparisons for relevance evaluation

Not much on this blog lately, so I’ll repost a comment I just wrote on whether to use pairwise vs. absolute judgments for relevance quality evaluation. (A fun one I know!)

From this post on the Dolores Labs blog.

The paper being talked about is Here or There: Preference Judgments for Relevance by Carterette et al.

I skimmed through the Carterette paper and it’s interesting. My concern with the pairwise setup is that, in order to get comparability among query-result pairs, you need annotators to do an O(N^2) amount of work. (Unless you do something horribly complicated with partial orders.) The absolute judgment task scales linearly, of course. Given the AMT environment and a fixed budget, if I stay with the smaller-volume absolute task, then instead of spending a lot on a quadratic taskload I can simply get a higher number of workers per result and boil out more noise. Of course, if it’s true that the pairwise judgment task is easier, as the paper claims, that might make my spending more efficient. But since the pairwise workload grows quadratically while the absolute one grows linearly, no matter the cost/benefit ratios, there has to be a tipping point: a data set size beyond which you’d always want to switch back to absolute judgments.
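To put rough numbers on that tipping point, here is a back-of-the-envelope sketch in Python. It assumes every one of the N(N-1)/2 pairs gets judged and that each judgment has a flat price; the function names, worker counts and prices are made up for illustration and come from neither the paper nor AMT.

```python
def pairwise_cost(n, workers=1, price=1.0):
    """Cost of judging every unordered pair among n results."""
    return n * (n - 1) // 2 * workers * price


def absolute_cost(n, workers=1, price=1.0):
    """Cost of giving each of n results an absolute judgment."""
    return n * workers * price


def tipping_point(abs_workers, abs_price, pair_workers, pair_price):
    """Smallest n at which all-pairs judging costs strictly more.

    Always terminates for positive prices, because the pairwise cost
    grows quadratically and the absolute cost only linearly.
    """
    n = 2
    while (pairwise_cost(n, pair_workers, pair_price)
           <= absolute_cost(n, abs_workers, abs_price)):
        n += 1
    return n
```

For example, even if a pairwise judgment costs half as much and needs one worker where the absolute task needs five, tipping_point(5, 1.0, 1, 0.5) says the pairwise route is already more expensive once there are 22 results to compare.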

Absolute judgments are just so much easier to compute with, both for analysis and to use as machine learning training data. I really don’t want to have fancy utility inference or stopping rule schemes just to know the relative ranking of my data. (And I think real-valued scores will always become a necessity. Theoretical microeconomists have proven boatloads of theorems about representing preferences by pairwise comparisons. It turns out that when you add enough rationality assumptions, e.g. the sort that are demanded of search engine ranking tasks anyway, your fancy ordering can always be mapped back to a real-valued utility function.)

I’d be most interested in a paper that compares real-valued scores derived from some sort of pairwise comparison task, versus absolute judgments, and is mindful of the cost tradeoffs in service of an actual goal, like ranking algorithm training.

4 Comments

Clinton-Obama support visualization

This interactive histogram is brilliant. The NYT data visualization folks never fail to impress.

margins.swf (application/x-shockwave-flash Object)

1 Comment

Sub-reddit for Systems Science and OR

I’ve been a big fan of Reddit’s Programming subsite for a while. Just this morning I found another sub-reddit:
SYSOR: Systems Science, Operations Research and Everything In Between, and I’m loving it. Lots of links on data mining, graph software, image recognition, learning theory, etc etc.

Not sure if the Operations Research part in the title is so big now; there’s some sort of complicated reorganization going on among a number of fields, including operations research, systems science, computational learning, and more general computer science areas. I’m a fan of the great historical overview in the introduction to Russell and Norvig’s AI book; I’m sure it’s slanted in various ways, but what else are overarching narratives for? :)

4 Comments

conplot – a console plotter

This has to be the most quick-and-dirty data visualizer out there: I wrote an ASCII-art plotter script that takes a column of numbers on stdin and throws out a plot on your console. I’ve been using it for several months to quickly look at numbers on the command line, especially from logs and such. (Back in school I would use gnuplot for this; R is good too. But sometimes you want to move really fast, especially if you have a few hideous perl -pe one-liners on your hands and mucking around with temp files would interrupt your flow.)

Link: github.com/brendano/conplot

“Demo”:

$ cat time.log | conplot
14601
                                                                         oooooooo
                                                                    oooooo
                                                            ooooooooo
                                                 oooooooooooo
11269                                     oooooooo
                                       oooo
                                     ooo
                                  oooo
                                 oo
                               ooo
7271                       ooooo
                        oooo
                     oooo
                  oooo
               oooo
            oooo
3272       oo
           o
           o
          oo
        ooo
      ooo
-726  0                                                                   76826
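For a feel of how little it takes, here is a stripped-down sketch of the same idea in Python. It is not the conplot source: the function names, the default width and height, and the missing axis labels are all simplifications made for illustration.

```python
def ascii_plot(values, width=60, height=15):
    """Return a text scatter of values against their position in the input."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant input: avoid dividing by zero
    n = len(values)
    grid = [[" "] * width for _ in range(height)]
    for i, v in enumerate(values):
        # Spread the points across the full width; row 0 is the top of the plot.
        col = i * (width - 1) // max(n - 1, 1)
        row = round((hi - v) / span * (height - 1))
        grid[row][col] = "o"
    return "\n".join("".join(r).rstrip() for r in grid)


def plot_stream(lines, width=60, height=15):
    """Parse one number per line (blank lines skipped) and plot them."""
    nums = [float(line) for line in lines if line.strip()]
    return ascii_plot(nums, width, height) if nums else ""
```

Calling print(plot_stream(sys.stdin)) at the bottom of a script (after import sys) turns it into a filter that can sit at the end of a pipe, like the demo above.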

I must say, it’s way easier to throw some code up on GitHub than onto SourceForge, which is the only other open source code hosting service I’ve used. I guess Google Code is GitHub’s biggest competitor in that respect; I haven’t tried it.

3 Comments

The best natural language search commentary on the internet

With Powerset’s launch, there’s an awful lot of hot air and crappy blog posts about natural language search being written. Instead of contributing to that mess, I prefer to direct the reader to the best writing on the topic that I’ve seen: Fernando Pereira’s posts on search.

Are women discriminated against in graduate admissions? Simpson’s paradox via R in three easy steps!

R has a fun built-in package, datasets: a whole bunch of easy-to-use, interesting tables of data. I found the famous UC Berkeley admissions data set, from a 1970s study of whether sex discrimination existed in graduate admissions. It’s famous for illustrating a particular statistical paradox. Thanks to R’s awesome mosaic plots interface, we can see this really easily.

UCBAdmissions is a three-dimensional table (like a matrix): Admit Status x Gender x Dept, with counts for each category as the matrix’s values. R’s default printing shows the basics just fine. Here’s the data for just the first of six departments:

> UCBAdmissions
, , Dept = A

          Gender
Admit      Male Female
  Admitted  512     89
  Rejected  313     19

...

Overall, women have a lower admittance rate than men:

> apply(UCBAdmissions,c(1,2),sum)

          Gender
Admit         M    F
  Admitted 1198  557
  Rejected 1493 1278

This is the phenomenon that prompted a lawsuit against Berkeley, which in turn prompted the study that collected this data.

R’s plot function is overloaded to do a mosaic plot for this sort of categorical data. Very cool. With just

> plot(UCBAdmissions)

or, playing around after reading Quick-R’s page on this:

> install.packages("vcd")
> library(vcd)
> mosaic(UCBAdmissions, condvars=c('Dept'))

we have a plot showing admittance and gender breakdowns per department:

In each department, women have admittance rates similar to men’s (in department A, visibly higher: 89 of 108 women admitted versus 512 of 825 men). This seems to be at odds with the fact that women have a lower admittance rate overall. This discrepancy is an example of Simpson’s paradox.

This mosaic also shows the explanation: Selective departments have more female applicants. It’s easy to see since the departments are ordered by selectiveness. Departments A and B let in many applicants, but they’re mostly male. The reverse is true for the rest. This means that the overall female population takes big admittance hits in departments C through F, while lots of males get in via departments A and B.
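The numbers behind the mosaic can be cross-checked with a few lines of Python. This is only a sketch: the counts are typed in by hand from the UCBAdmissions table (only department A’s block is printed above; the other five departments come from the same dataset), and the helper names are made up for illustration.

```python
# (admitted, rejected) per department and gender, from R's UCBAdmissions table
COUNTS = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}


def rate(admitted, rejected):
    """Admittance rate: admitted / all applicants."""
    return admitted / (admitted + rejected)


def overall_rate(gender):
    """Admittance rate for one gender, pooled over all departments."""
    admitted = sum(d[gender][0] for d in COUNTS.values())
    rejected = sum(d[gender][1] for d in COUNTS.values())
    return rate(admitted, rejected)


def share_applying_to(depts, gender):
    """Fraction of a gender's applicants who applied to the given departments."""
    total = sum(sum(d[gender]) for d in COUNTS.values())
    return sum(sum(COUNTS[k][gender]) for k in depts) / total


for dept, by_gender in COUNTS.items():
    print(dept, {g: round(rate(*c), 3) for g, c in by_gender.items()})
print("overall", {g: round(overall_rate(g), 3) for g in ("Male", "Female")})
print("applied to A or B",
      {g: round(share_applying_to("AB", g), 3) for g in ("Male", "Female")})
```

The last line is the whole story in two numbers: about half of the male applicants applied to the two easiest departments, against well under a tenth of the female applicants.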

I think these mosaic plots are impressive for visualizing categorical proportions for high dimensional data sets. Well, by “high” I think I mean, more than 2. I can’t think of a better way to see several cross relationships in categorical data at once. And the only tuning I needed to do was play around a bit with the order of those three dimensions.

Sources:

  • R’s UCBAdmissions help page. It comes with the standard download of R.
  • R’s vcd::mosaic function. I recommend the PDF vignette about it, which has many more pictures of cool mosaic plots.
  • I would post the original 1975 Science paper, but it’s not freely available. I hate academic publishers. Here’s the paper, at least for now:
    • Bickel, P. J., Hammel, E. A., and O’Connell, J. W. (1975) Sex bias in graduate admissions: Data from Berkeley. Science, 187, 398–403. [PDF]

8 Comments

a regression slope is a weighted average of pairs’ slopes!

Wow, this is pretty cool:

From an Andrew Gelman article on summarizing a linear regression as a simple difference between upper and lower categories. I get the impression there are lots of weird misunderstood corners of linear models… (e.g. that “least squares regression” is a maximum likelihood estimator for a linear model with normal noise… I know so many people who didn’t learn that in whatever stats course they took, and therefore find it mystifying why squared error should be used… see this other post from Gelman.)
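The claim in the title is easy to check numerically. The sketch below assumes the weights are the squared differences in x, which is the standard form of this identity; the function names and the test data are made up for illustration.

```python
from itertools import combinations


def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pairwise_weighted_slope(xs, ys):
    """Average of the slopes between all pairs of points, weighted by (x_j - x_i)^2."""
    num = den = 0.0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx = x2 - x1
        if dx == 0:
            continue  # a vertical pair has weight zero, so it drops out
        weight = dx * dx
        num += weight * (y2 - y1) / dx
        den += weight
    return num / den


xs = [1, 2, 4, 7]
ys = [2, 3, 3, 8]
print(ols_slope(xs, ys), pairwise_weighted_slope(xs, ys))
```

The two printed numbers agree up to floating point rounding. Pairs that are far apart in x get most of the weight, which fits the idea of summarizing the regression by comparing the upper and lower ends of the data.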
