<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: I don&#8217;t get this web parsing shared task</title>
	<atom:link href="http://brenocon.com/blog/2012/03/i-dont-get-this-web-parsing-shared-task/feed/" rel="self" type="application/rss+xml" />
	<link>http://brenocon.com/blog/2012/03/i-dont-get-this-web-parsing-shared-task/</link>
	<description>cognition, language, social systems; statistics, visualization, computation</description>
	<lastBuildDate>Tue, 25 Nov 2025 13:11:20 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: Graphs for SANCL-2012 web parsing results &#124; AI and Social Science &#8211; Brendan O&#039;Connor</title>
		<link>http://brenocon.com/blog/2012/03/i-dont-get-this-web-parsing-shared-task/#comment-222160</link>
		<dc:creator>Graphs for SANCL-2012 web parsing results &#124; AI and Social Science &#8211; Brendan O&#039;Connor</dc:creator>
		<pubDate>Sat, 24 Nov 2012 01:24:30 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1195#comment-222160</guid>
		<description><![CDATA[[...] of Wall Street Journal annotated data and very little in-domain training data. (Previous discussion here; see Ryan McDonald&#8217;s detailed comment.) Here are some graphs of the results (last page in the [...]]]></description>
		<content:encoded><![CDATA[<p>[...] of Wall Street Journal annotated data and very little in-domain training data. (Previous discussion here; see Ryan McDonald&#8217;s detailed comment.) Here are some graphs of the results (last page in the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendan O'Connor</title>
		<link>http://brenocon.com/blog/2012/03/i-dont-get-this-web-parsing-shared-task/#comment-127941</link>
		<dc:creator>Brendan O'Connor</dc:creator>
		<pubDate>Fri, 09 Mar 2012 04:10:32 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1195#comment-127941</guid>
<description><![CDATA[re: getting new annotations being the standard way of doing business -- yes.  I remember when Powerset finally hired an annotation expert.  It was way more useful than quite a bit of the other work we were doing.

Thank you for the explanation of your goals; it is helpful.]]></description>
		<content:encoded><![CDATA[<p>re: getting new annotations being the standard way of doing business &#8212; yes.  I remember when Powerset finally hired an annotation expert.  It was way more useful than quite a bit of the other work we were doing.</p>
<p>Thank you for the explanation of your goals; it is helpful.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brendan O'Connor</title>
		<link>http://brenocon.com/blog/2012/03/i-dont-get-this-web-parsing-shared-task/#comment-127938</link>
		<dc:creator>Brendan O'Connor</dc:creator>
		<pubDate>Fri, 09 Mar 2012 04:04:52 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1195#comment-127938</guid>
		<description><![CDATA[re: semi-supervised learning for Twitter POS tagging -- I don&#039;t know the details, but there was one experiment that turned out negative.  It is easy to believe more feature engineering could be a better use of researcher time, though hard to prove such a negative, of course.]]></description>
		<content:encoded><![CDATA[<p>re: semi-supervised learning for Twitter POS tagging &#8212; I don&#8217;t know the details, but there was one experiment that turned out negative.  It is easy to believe more feature engineering could be a better use of researcher time, though hard to prove such a negative, of course.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ryan McDonald</title>
		<link>http://brenocon.com/blog/2012/03/i-dont-get-this-web-parsing-shared-task/#comment-127933</link>
		<dc:creator>Ryan McDonald</dc:creator>
		<pubDate>Fri, 09 Mar 2012 03:37:35 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1195#comment-127933</guid>
		<description><![CDATA[A couple of points. First, as the website points out:

&quot;It is permissible to use previously constructed lexicons, word clusters or other resources provided that they are made available for other participants.&quot;

So you can use clusters, but in the spirit of open competition, we ask that these resources be made available.

Second, I agree that taking a domain, running system X on the domain, doing an error analysis and then adding features, changing the model or annotating some more data is a very good way to adapt systems. I don&#039;t think anyone is &#039;scared&#039; of this approach. In fact, outside academia, this is the standard way of doing business, not the exception. However, this is not as easy as it sounds. First, you need the resources (human resources, that is) to do this for every domain on the web or every domain you might be interested in. Second, the annotations you wish to collect must be easily created by you or via a system like Mechanical Turk. It is one thing to annotate some short Twitter posts with 12-15 part-of-speech tags and a whole other thing to annotate consumer reviews with syntactic structure. I have tried both. They are not comparable. Even the former cannot be done reliably by Turkers, which means you will need grad students, staff research scientists or costly third-party vendors to do this every time you want to study a new domain.

So the spirit of the competition was to see, from a modeling/algorithm perspective, what are the best methods for training robust syntactic analyzers on the data currently available. By limiting the resources we are trying to make this as much an apples-to-apples comparison as we can. Even this is impossible. Some parsers require lexicons that might have been tuned for specific domains, etc.

Understanding this is still valuable in the &quot;analyze, annotate and iterate&quot; approach. Don&#039;t you want to start off with the best baseline to reduce the amount of human labor required?]]></description>
		<content:encoded><![CDATA[<p>A couple of points. First, as the website points out:</p>
<p>&#8220;It is permissible to use previously constructed lexicons, word clusters or other resources provided that they are made available for other participants.&#8221;</p>
<p>So you can use clusters, but in the spirit of open competition, we ask that these resources be made available.</p>
<p>Second, I agree that taking a domain, running system X on the domain, doing an error analysis and then adding features, changing the model or annotating some more data is a very good way to adapt systems. I don&#8217;t think anyone is &#8216;scared&#8217; of this approach. In fact, outside academia, this is the standard way of doing business, not the exception. However, this is not as easy as it sounds. First, you need the resources (human resources, that is) to do this for every domain on the web or every domain you might be interested in. Second, the annotations you wish to collect must be easily created by you or via a system like Mechanical Turk. It is one thing to annotate some short Twitter posts with 12-15 part-of-speech tags and a whole other thing to annotate consumer reviews with syntactic structure. I have tried both. They are not comparable. Even the former cannot be done reliably by Turkers, which means you will need grad students, staff research scientists or costly third-party vendors to do this every time you want to study a new domain.</p>
<p>So the spirit of the competition was to see, from a modeling/algorithm perspective, what are the best methods for training robust syntactic analyzers on the data currently available. By limiting the resources we are trying to make this as much an apples-to-apples comparison as we can. Even this is impossible. Some parsers require lexicons that might have been tuned for specific domains, etc.</p>
<p>Understanding this is still valuable in the &#8220;analyze, annotate and iterate&#8221; approach. Don&#8217;t you want to start off with the best baseline to reduce the amount of human labor required?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Fernando Pereira</title>
		<link>http://brenocon.com/blog/2012/03/i-dont-get-this-web-parsing-shared-task/#comment-127929</link>
		<dc:creator>Fernando Pereira</dc:creator>
		<pubDate>Fri, 09 Mar 2012 03:21:26 +0000</pubDate>
		<guid isPermaLink="false">http://brenocon.com/blog/?p=1195#comment-127929</guid>
		<description><![CDATA[The purpose of a shared task is to compare models and algorithms, not to compare human annotation and error analysis skill, which depends mainly on a team&#039;s supply of human labor. As for &quot;Our lack of semi-supervised learning was not a weakness,&quot; how do you know? Have you proven that such a better learning algorithm is impossible?]]></description>
		<content:encoded><![CDATA[<p>The purpose of a shared task is to compare models and algorithms, not to compare human annotation and error analysis skill, which depends mainly on a team&#8217;s supply of human labor. As for &#8220;Our lack of semi-supervised learning was not a weakness,&#8221; how do you know? Have you proven that such a better learning algorithm is impossible?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
