Update (2013-09-17): See David Bamman’s great guest post on Language Log on our latent personas paper, and the big picture of interdisciplinary collaboration.

I’ve been informed that an interesting critique of the ACL paper on movie personas that David Bamman, Noah Smith, and I wrote has appeared on the Language Log, in a guest post by Hannah Alpert-Abrams and Dan Garrette. I posted the following as a comment on LL.

Thanks everyone for the interesting comments. Scholarship is an ongoing conversation, and we hope our work might contribute to it. Responding to the concerns about our paper:

We did not try to make a contribution to contemporary literary theory. Rather, we focus on developing a computational linguistic research method for analyzing characters in stories. We hope there is a place for both the development of new research methods and actual new substantive findings. If you think about the tremendous possibilities for computer science and humanities collaboration, there is far too much to do, and we have to tackle pieces of the puzzle to move forward. Clearly, our work falls more into the first category. It was published at a computational linguistics conference, and we did a lot of work focusing on linguistic, statistical, and computational issues like:

  • how to derive useful semantic relations from current syntactic parsing and coreference technologies,
  • how to design an appropriate probabilistic model on top of this,
  • how to design a Bayesian inference algorithm for the model,

and of course, all the amazing work that David did in assembling a large and novel dataset, which we have released freely for anyone else to conduct research on, as noted in the paper. All the comments above show there is a wealth of interesting questions to further investigate. Please do!

We find that, in these multidisciplinary projects, it’s most useful to publish part of the work early and get scholarly feedback, instead of waiting for years before trying to write a “perfect” paper. Our colleagues Noah Smith, Tae Yano, and John Wilkerson did this in their research on Congressional voting; Brendan did this with Noah and Brandon Stewart on international relations events analysis; there’s great forthcoming work from Yanchuan Sim, Noah, Brice Acree and Justin Gross on analyzing political candidates’ ideologies; and at the Digital Humanities conference earlier this year, David presented his joint work with the Assyriologist Adam Anderson on analyzing social networks induced from Old Assyrian cuneiform texts. (And David’s co-teaching a cool digital humanities seminar with Christopher Warren in the English department this semester — I’m sure there will be great cross-fertilization of ideas coming out of there!)

For example, we’ve had useful feedback here already: besides comments from the computational linguistics community through the ACL paper, just in the discussion on LL there have been many interesting theories and references presented. We’ve also been in conversation with other humanists, as we stated in our acknowledgments (noted by one commenter), though apparently not the same humanists that Alpert-Abrams and Garrette would rather we had talked to. This is why it’s better to publish early and participate in the scholarly conversation.

For what it’s worth, some of these high-level debates on whether it’s appropriate to focus on progress in quantitative methods, versus directly on substantive findings, have been playing out for decades in the social sciences. (I’m thinking specifically about economics and political science, both of which are far more quantitative today than they were just 50 years ago.) And as several commenters have noted, and as we tried to acknowledge in our references, there’s certainly been plenty of computational work in literary/cultural analysis before. But I do think the quantitative approach still tends to be seen as novel in the humanities, and as the original response notes, there have been some problematic proclamations in this area recently. I just hope there’s room to try to advance things without being everyone’s punching bag for whether or not they liked the latest Steven Pinker essay.

    I appreciate that you took the time to comment on the LL post. I think it’s important for humanists and engineers to have a constructive dialogue. I am sympathetic to both sides (having lived in both worlds), but I suspect this is a case of “talking past each other.” Engineering has a culture of publishing proof-of-concept papers based on small or unbalanced data sets, just to get the ball rolling (as you point out). But this is virtually unheard of in the humanities.

    You mention that you believe “it’s most useful to publish part of the work early and get scholarly feedback, instead of waiting for years before trying to write a ‘perfect’ paper.” While I agree with the interactive feedback notion underlying your point, I have to tell you that you come across as a bit smug and arrogant by saying it in this way. You are certainly not showing much respect to the traditions within humanities by adding the snide remark about a “perfect paper.” Humanities is its own academic culture, with its own traditions of what counts as publishable. Simply declaring your own academic traditions as preferable is not particularly respectful.

    I also agree that the UT Austin team’s response posted on Language Log was somewhat condescending and disrespectful of you and your team at CMU (and some of the LL commenters called them out on it as well). This is a clash of academic cultures.

    FWIW, I think you’re doing really interesting work that’s worth continuing. I look forward to seeing what you publish next.

    Thanks for the insights. On the topic of when something is ready to publish — say we have a goal to develop an automated analysis of characters in a corpus of stories, and want to intensively use it to achieve new literary insights. I think that is a giant, enormous task. It seems much more manageable to tackle it as a series of subprojects than to invent everything at once without publishing something along the way.

    This is, of course, the style of peer-reviewed CS conference publications — in fact, like many other similar venues, ACL has an 8-page limit so you can’t even do an in-depth analysis. The question of how large a project has to be before it’s publishable isn’t just about the humanities. It comes up in many other areas related to CS — for example, when comparing statistics versus machine learning — the latter usually proceeds via peer-reviewed CS conference publishing, whereas the former is usually in a more traditional journal context. (I think you’re saying that humanities is more like the journal style.) I decided to do a CS PhD in part because I talked to multiple people in related areas (statistics and economics) who basically seemed jealous of the CS publishing style, and thought it brought about real benefits in terms of faster experimentation and more innovation.

    The great frustration I have about this style is that papers of this size naturally have less content in them than a giant journal article or book, and that to understand them you have to be pretty familiar with the context of other work they build upon. Ideally, it’s better to follow up with more work or a longer journal publication. But I do think it’s basically a much better research culture than when you have to wait for years to get one article published, and it’s part of why I’m participating in it in the first place.

    Totally agree. A big part of these constraints is simply procedural: how can a large group of scholars come together and share their work in an efficient way? There has to be a cut-off and someone will complain regardless of where you put that cut-off. I like the emerging community of project pages where updates and progress are regularly posted and discussed.

    AA and Garrette (though probably just AA) did not offer you a critique. They essentially said, “We don’t like your theory.” Worse, they implied that people outside R1 humanities departments (like yourself) had better work with the same pet theories as R1 humanities departments if they’re going to do humanities-type things. AA’s pet theory is Foucauldian “power/oppression” theory, along racial and gender lines. You drew on archetypal criticism. AA wishes you hadn’t drawn on archetypal criticism. She wishes you had chosen her favorite theory to inform your questions and methods. Will she actually redesign your study with her own pet theory in mind? No. Her point is not to move forward the research you’ve begun but to fight the use of any theory that isn’t her pet theory. You see this a lot from Foucault fans: “If you’re not taking power and oppression into consideration, you’re doing it wrong!” Very Lysenko.

    Oh, and you know of course that the literary theory you drew upon was only a minor element of your paper, providing a “jumping off” point for a demonstration of methodology. But even that drew AA’s ire! Don’t you touch that archetypal stuff, young man! Put down the Aristotle! Read some Foucault, that’s what all the cool kids are doing!