<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features</title>
	<atom:link href="http://brenocon.com/blog/2009/08/haghighi-and-klein-2009-simple-coreference-resolution-with-rich-syntactic-and-semantic-features/feed/" rel="self" type="application/rss+xml" />
	<link>http://brenocon.com/blog/2009/08/haghighi-and-klein-2009-simple-coreference-resolution-with-rich-syntactic-and-semantic-features/</link>
	<description>cognition, language, social systems; statistics, visualization, computation</description>
	<lastBuildDate>Tue, 25 Nov 2025 13:11:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: Breck Baldwin</title>
		<link>http://brenocon.com/blog/2009/08/haghighi-and-klein-2009-simple-coreference-resolution-with-rich-syntactic-and-semantic-features/#comment-10307</link>
		<dc:creator>Breck Baldwin</dc:creator>
		<pubDate>Wed, 26 Aug 2009 22:06:03 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=618#comment-10307</guid>
		<description><![CDATA[Yikes! That system architecture sure looks familiar, with the exception of the semantic filtering bootstrapped from Wikipedia--only it was 1995 and I was calling Michael Collins &quot;boy&quot;. It was the UPenn MUC-6 entry that &quot;unofficially&quot; won coref, done with Jeff Reynar, Michael Collins, Jason Eisner, Adwait Ratnaparkhi, Joseph Rosenzweig, Anoop Sarkar and Srinivas: &lt;a href=&quot;http://www.aclweb.org/anthology-new/M/M95/M95-1015.pdf&quot; rel=&quot;nofollow&quot;&gt;University of Pennsylvania: Description of the University of Pennsylvania System Used for MUC-6&lt;/a&gt;
 

The performance improvement is stunning: 87.2% precision and 77.3% recall, versus our 72% precision and 63% recall. But we had to find our own mentions. Lemme tell ya, it was uphill both ways to the LINC lab, and we thought a megabyte of memory was a pretty huge thing.

But seriously, when did coreference papers get to report performance without finding their own mentions? I know ACE provided that option, but it somehow just seems wrong to run coref over gold-standard mentions alone, since there is no natural data out there containing perfect noun phrases that are known to corefer to something.

Amit Bagga and I ran experiments with the MUC-6 data, like scoring the case where all gold-standard mentions were made coreferent--you get 100% recall with around 35% precision, as I recall. Those results led us to consider other scoring metrics, which then led to B-cubed, which additionally involved Alan Bierman.

A request: Run the same system with a mention detection pass. Those numbers should be exciting.]]></description>
		<content:encoded><![CDATA[<p>Yikes! That system architecture sure looks familiar, with the exception of the semantic filtering bootstrapped from Wikipedia&#8211;only it was 1995 and I was calling Michael Collins &#8220;boy&#8221;. It was the UPenn MUC-6 entry that &#8220;unofficially&#8221; won coref, done with Jeff Reynar, Michael Collins, Jason Eisner, Adwait Ratnaparkhi, Joseph Rosenzweig, Anoop Sarkar and Srinivas: <a href="http://www.aclweb.org/anthology-new/M/M95/M95-1015.pdf" rel="nofollow">University of Pennsylvania: Description of the University of Pennsylvania System Used for MUC-6</a></p>
<p>The performance improvement is stunning: 87.2% precision and 77.3% recall, versus our 72% precision and 63% recall. But we had to find our own mentions. Lemme tell ya, it was uphill both ways to the LINC lab, and we thought a megabyte of memory was a pretty huge thing. </p>
<p>But seriously, when did coreference papers get to report performance without finding their own mentions? I know ACE provided that option, but it somehow just seems wrong to run coref over gold-standard mentions alone, since there is no natural data out there containing perfect noun phrases that are known to corefer to something. </p>
<p>Amit Bagga and I ran experiments with the MUC-6 data, like scoring the case where all gold-standard mentions were made coreferent&#8211;you get 100% recall with around 35% precision, as I recall. Those results led us to consider other scoring metrics, which then led to B-cubed, which additionally involved Alan Bierman. </p>
<p>A request: Run the same system with a mention detection pass. Those numbers should be exciting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://brenocon.com/blog/2009/08/haghighi-and-klein-2009-simple-coreference-resolution-with-rich-syntactic-and-semantic-features/#comment-9323</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Sun, 09 Aug 2009 23:23:35 +0000</pubDate>
		<guid isPermaLink="false">http://anyall.org/blog/?p=618#comment-9323</guid>
		<description><![CDATA[I also have the peeve that I can&#039;t reproduce people&#039;s papers, because they go into great detail about CRFs (which is standard fare), but then gloss over the details of their features.  I really liked Jenny Finkel&#039;s bakeoff paper on CRFs for BioCreative that described their features, and the recent Ratinov and Roth paper on named-entity recognition, for the same reason.  But even those fall short of a reproducible system description, which I still hold out as one of the hallmarks of science.

Papers that squeeze half a percent improvement out of no-longer-held-out data seem useless (one more paper evaluating on section 23 of the Treebank or the ModApte split of Reuters and I&#039;ll scream).

I also like papers that introduce statistical techniques or applications; those are important, too.  For instance, I loved Haghighi and Klein&#039;s previous approach to coref, based on the Dirichlet process, because it seems such a natural fit (other than computationally).

Of course there are non-hyponym/hypernym coref cases!  It&#039;d be a stretch to call &quot;Ronald Reagan, the U.S. president, ...&quot; a hypernym case, because &quot;president&quot; isn&#039;t in any sense more generic than &quot;Ronald Reagan&quot; (and it&#039;s temporally dependent; the relation isn&#039;t permanent).  But even eliminating names, you get things like &quot;the president, a former actor&quot;, where there&#039;s clearly no hypernym/hyponym relation.

The earliest work I know of for bootstrapping hypernyms/hyponyms was Ann Copestake&#039;s work on LKB from 1990/1991.

We&#039;ve probably both been thinking about natural tagging examples (your footnote [1]), because those are things you can get done with Mechanical Turk!  A deeper problem is that annotation schemes like the Penn Treebank (even just the POS part) are built on theories of underlying linguistic structure which may be nothing like &quot;the truth&quot;, and may not even be useful in real-world tasks.  For instance, distributional clustering can be much more effective than POS tags as features in CRFs or parsers, and requires no labeled data.  (Though you can also use both in a discriminative model like CRFs.)  But even named entities and coref have a large number of corner cases that need to be documented or just punted (e.g. &quot;erstwhile Phineas Phoggs&quot; from MUC-6 or &quot;the various Bob Dylans&quot; from recent film reviews).]]></description>
		<content:encoded><![CDATA[<p>I also have the peeve that I can&#8217;t reproduce people&#8217;s papers, because they go into great detail about CRFs (which is standard fare), but then gloss over the details of their features.  I really liked Jenny Finkel&#8217;s bakeoff paper on CRFs for BioCreative that described their features, and the recent Ratinov and Roth paper on named-entity recognition, for the same reason.  But even those fall short of a reproducible system description, which I still hold out as one of the hallmarks of science.</p>
<p>Papers that squeeze half a percent improvement out of no-longer-held-out data seem useless (one more paper evaluating on section 23 of the Treebank or the ModApte split of Reuters and I&#8217;ll scream).  </p>
<p>I also like papers that introduce statistical techniques or applications; those are important, too.  For instance, I loved Haghighi and Klein&#8217;s previous approach to coref, based on the Dirichlet process, because it seems such a natural fit (other than computationally).</p>
<p>Of course there are non-hyponym/hypernym coref cases!  It&#8217;d be a stretch to call &#8220;Ronald Reagan, the U.S. president, &#8230;&#8221; a hypernym case, because &#8220;president&#8221; isn&#8217;t in any sense more generic than &#8220;Ronald Reagan&#8221; (and it&#8217;s temporally dependent; the relation isn&#8217;t permanent).  But even eliminating names, you get things like &#8220;the president, a former actor&#8221;, where there&#8217;s clearly no hypernym/hyponym relation.  </p>
<p>The earliest work I know of for bootstrapping hypernyms/hyponyms was Ann Copestake&#8217;s work on LKB from 1990/1991.</p>
<p>We&#8217;ve probably both been thinking about natural tagging examples (your footnote [1]), because those are things you can get done with Mechanical Turk!  A deeper problem is that annotation schemes like the Penn Treebank (even just the POS part) are built on theories of underlying linguistic structure which may be nothing like &#8220;the truth&#8221;, and may not even be useful in real-world tasks.  For instance, distributional clustering can be much more effective than POS tags as features in CRFs or parsers, and requires no labeled data.  (Though you can also use both in a discriminative model like CRFs.)  But even named entities and coref have a large number of corner cases that need to be documented or just punted (e.g. &#8220;erstwhile Phineas Phoggs&#8221; from MUC-6 or &#8220;the various Bob Dylans&#8221; from recent film reviews).</p>
]]></content:encoded>
	</item>
</channel>
</rss>
