quick note: cer et al 2010

Quick note: reading this paper after seeing their tweet.

Update: this reaction might be totally wrong; in particular, the CoNLL dependencies for at least some languages were done completely by hand.


Malt and MSTParser were designed for the Yamada and Matsumoto dependency formalism (the one used for the CoNLL dependency parsing shared task, produced by the penn2malt tool). Their feature sets, and probably many other design decisions, were created to support it. If you compare their outputs side by side, you will see that Stanford Dependencies are a substantially different formalism; for example, compound verbs are handled very differently (the paper discusses the copula example).
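To make the "convert the output to typed dependencies" recipe concrete: it is just a phrase-structure-to-dependency converter run over the parser's trees. A minimal sketch of that step, assuming the converter class and command-line flags shipped with the Stanford parser around that time (check your distribution's docs for the exact names) and placeholder file paths:

    # Sketch only: the converter class and flags are assumed from the Stanford
    # parser distribution of this era; paths are placeholders.
    import subprocess

    STANFORD_JAR = "stanford-parser.jar"   # assumed local path to the jar
    TREE_FILE = "wsj_section23.mrg"        # Penn Treebank-style phrase-structure trees

    # Convert phrase-structure trees to basic Stanford Dependencies in CoNLL-X
    # format, so the result can be scored like any dependency parser's output.
    subprocess.run(
        [
            "java", "-cp", STANFORD_JAR,
            "edu.stanford.nlp.trees.EnglishGrammaticalStructure",
            "-treeFile", TREE_FILE,
            "-basic",    # basic (tree-shaped) SD, the variant used for evaluation
            "-conllx",   # emit CoNLL-X columns
        ],
        check=True,
    )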

I think the following conclusion is premature:

Notwithstanding the very large amount of research that has gone into dependency parsing algorithms in the last five years, our central conclusion is that the quality of the Charniak, Charniak-Johnson reranking, and Berkeley parsers is so high that in the vast majority of cases, dependency parse consumers are better off using them, and then converting the output to typed dependencies.

If they re-ran the experiment with Yamada and Matsumoto dependencies (or maybe Johansson and Nugues / pennconverter), that would be convincing. In the meantime, a fair test of the dependency parsing approach for the Stanford formalism would be to take a dependency parser and design richer features to support it. The paper points out cases where Malt's and MSTParser's algorithms aren't powerful enough; I wonder how TurboParser's would fare. But this requires lots of work (writing new features), so maybe no one will do it.
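Whichever formalism gets used, the re-run comparison ultimately reduces to attachment scores over CoNLL-format output. A minimal UAS/LAS scorer sketch, with placeholder file names, assuming the gold and predicted files share the same tokenization (which is part of what makes cross-formalism comparisons awkward):

    # Minimal unlabeled/labeled attachment score (UAS/LAS) over CoNLL-X files.
    # File names below are placeholders.

    def read_conll(path):
        """Yield each sentence as a list of (head, deprel) pairs."""
        sent = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    if sent:
                        yield sent
                        sent = []
                    continue
                cols = line.split("\t")
                # CoNLL-X columns: ID FORM LEMMA CPOS POS FEATS HEAD DEPREL ...
                sent.append((cols[6], cols[7]))
            if sent:
                yield sent

    def attachment_scores(gold_path, pred_path):
        total = uas = las = 0
        for gold, pred in zip(read_conll(gold_path), read_conll(pred_path)):
            assert len(gold) == len(pred), "tokenization mismatch"
            for (g_head, g_rel), (p_head, p_rel) in zip(gold, pred):
                total += 1
                if g_head == p_head:
                    uas += 1
                    if g_rel == p_rel:
                        las += 1
        return uas / total, las / total

    print(attachment_scores("gold_sd.conll", "parser_output.conll"))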


5 Responses to quick note: cer et al 2010

  1. Mihai says:

    This is a partial answer to your critique (and also a shameless pitch). Here (http://www.surdeanu.name/mihai/ensemble/) I compared dependency parser performance for Stanford dependencies and CoNLL-2008. The context is slightly different: I used only linear SVMs and I’m interpolating several shift-reduce models. But I think the main observation holds: the performance of dependency parsers is very close on CoNLL-2008 and Stanford dependencies.

    I can’t easily evaluate constituent parsers on CoNLL-2008 dependencies because the tokenization is different and the generation of syntactic dependencies for this corpus was not fully automated, but, extrapolating a bit, I don’t think we would see numbers very different from those in the Cer et al. paper on CoNLL-2008.

  2. brendano says:

    Cool.

    Your paper only talks about the CoNLL experiment. For the experiment shown on the webpage, where do the Stanford Dependencies data come from? The SD extractor run on gold Penn Treebank parses?

  3. Mihai says:

    Yes. Hence the gold POS tags in the experiment.

  4. Daniel Cer says:

    I suspect the results may reflect just how good we are at producing phrase-structure parses for English. Rather than just focusing on other dependency formalisms, I think the more interesting question might be whether the results generalize to other languages.

  5. brendano says:

    But then you need a phrase->dependency extractor for every language. They’re always rule-based. Maybe that’s not too hard? Comparing on CoNLL starts becoming more problematic, but maybe that’s not the point.