Movie summary corpus and learning character personas

Here is one of our exciting just-finished ACL papers.  David and I designed an algorithm that learns different types of character personas (“Protagonist”, “Love Interest”, etc.) that are used in movies.

To do this we collected a brand new dataset: 42,306 plot summaries of movies from Wikipedia, along with metadata like box office revenue and genre.  We ran these through parsing and coreference analysis to also create a dataset of movie characters, linked with Freebase records of the actors who portray them.  Did you see that NYT article on quantitative analysis of film scripts?  This dataset could answer all sorts of things they assert in that article — for example, do movies with bowling scenes really make less money?  We have released the data here.
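As a rough illustration of the kind of question the metadata makes it possible to ask, here is a minimal Python sketch that compares median revenue for summaries that mention a keyword against those that do not. The records, field names (`summary`, `revenue`), and numbers are all invented for the example and are not taken from the released corpus, whose actual file format is not described here.

```python
from statistics import median

# Hypothetical toy records: titles, summaries, and revenue figures are
# invented for illustration and do NOT come from the released dataset.
toy_movies = [
    {"title": "A", "summary": "Two friends meet at a bowling alley.", "revenue": 10.0},
    {"title": "B", "summary": "A detective hunts a jewel thief.", "revenue": 50.0},
    {"title": "C", "summary": "A bowling champion loses everything.", "revenue": 20.0},
    {"title": "D", "summary": "A heist goes wrong in Paris.", "revenue": 30.0},
    {"title": "E", "summary": "An orphan finds a dragon.", "revenue": None},
]


def median_revenue_by_keyword(movies, keyword):
    """Return (median revenue with keyword, median revenue without it).

    Movies with missing revenue are skipped. A side with no movies
    yields None instead of raising an error.
    """
    with_kw, without_kw = [], []
    for movie in movies:
        if movie["revenue"] is None:
            continue  # revenue metadata is missing for this movie
        has_kw = keyword in movie["summary"].lower()
        (with_kw if has_kw else without_kw).append(movie["revenue"])
    return (
        median(with_kw) if with_kw else None,
        median(without_kw) if without_kw else None,
    )


with_bowling, without_bowling = median_revenue_by_keyword(toy_movies, "bowling")
print(with_bowling, without_bowling)
```

A plain keyword match is a crude stand-in for "has a bowling scene", and a real comparison would need to control for things like genre and release year before reading anything into the gap.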

Our focus, though, is on narrative analysis.  We investigate character personas: familiar character types that are repeated over and over in stories, like “Hero” or “Villain”; maybe grand mythical archetypes like “Trickster” or “Wise Old Man”; or much more specific ones, like “Sassy Best Friend” or “Obstructionist Bureaucrat” or “Guy in Red Shirt Who Always Gets Killed”.  They are defined in part by what they do and who they are, which we can glean from their actions and descriptions in plot summaries.

Our model clusters movie characters, learning posteriors like this:

[Figure: screenshot of learned persona clusters]


Each box is one automatically learned persona cluster, along with actions and attribute words that pertain to it.  For example, characters like Dracula and The Joker are always “hatching” things (hatching plans, presumably).
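To make the idea of grouping characters by their action and attribute words concrete, here is a small Python sketch. It is not the model from the paper, which learns posteriors: it swaps in a much simpler hard-assignment, k-means-style clustering over bag-of-words counts with cosine similarity. The character names, word lists, and the `cluster` helper are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical characters with invented action/attribute words;
# these are NOT drawn from the paper's data.
toy_characters = {
    "villain_1": ["hatch", "kill", "threaten", "evil"],
    "villain_2": ["hatch", "kidnap", "kill", "evil"],
    "lover_1": ["kiss", "marry", "love", "beautiful"],
    "lover_2": ["kiss", "love", "date", "beautiful"],
}


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(characters, seeds, iterations=10):
    """Assign each character to the most similar centroid.

    Centroids start as the word counts of the seed characters and are
    recomputed as the summed counts of their members until the
    assignment stops changing.
    """
    vectors = {name: Counter(words) for name, words in characters.items()}
    centroids = [Counter(vectors[s]) for s in seeds]
    assignment = {}
    for _ in range(iterations):
        new_assignment = {
            name: max(range(len(centroids)), key=lambda k: cosine(vec, centroids[k]))
            for name, vec in vectors.items()
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [vectors[n] for n, c in assignment.items() if c == k]
            if members:
                centroids[k] = sum(members, Counter())
    return assignment


assignment = cluster(toy_characters, seeds=["villain_1", "lover_1"])
print(assignment)
```

Seeding the centroids with named characters keeps the toy run deterministic. Unlike this sketch, a probabilistic model gives each character a distribution over personas rather than a single hard label.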

One of our models takes the metadata features, like movie genre and gender and age of an actor, and associates them with different personas.  For example, we learn the types of characters in romantic comedies versus action movies.  Here are a few examples of my favorite learned personas:

[Figure: screenshot of example learned personas]

One of the best things I learned about during this project was the website TVTropes (which we compare our model against).

We’ll be at ACL this summer to present the paper.  We’ve posted it online too:

Responses to Movie summary corpus and learning character personas

  1. mike says:

    Very cool.

    Captain Jack Sparrow = Shrek

    Did you find magic pixie dream girl or magical black man/noble savage?
