Here’s a fascinating NYT article on the Netflix Prize for a better movie recommendation system. Tons of great stuff there; here are a few highlights …
First, a good unsupervised learning story:
There’s a sort of unsettling, alien quality to their computers’ results. When the teams examine the ways that singular value decomposition is slotting movies into categories, sometimes it makes sense to them — as when the computer highlights what appears to be some essence of nerdiness in a bunch of sci-fi movies. But many categorizations are now so obscure that they cannot see the reasoning behind them. Possibly the algorithms are finding connections so deep and subconscious that customers themselves wouldn’t even recognize them. At one point, Chabbert showed me a list of movies that his algorithm had discovered share some ineffable similarity; it includes a historical movie, “Joan of Arc,” a wrestling video, “W.W.E.: SummerSlam 2004,” the comedy “It Had to Be You” and a version of Charles Dickens’s “Bleak House.” For the life of me, I can’t figure out what possible connection they have, but Chabbert assures me that this singular value decomposition scored 4 percent higher than Cinematch — so it must be doing something right. As Volinsky surmised, “They’re able to tease out all of these things that we would never, ever think of ourselves.” The machine may be understanding something about us that we do not understand ourselves.
Well, I’m pretty suspicious of drawing conclusions from that single example; it could be a genuine grouping error, with different, better groupings elsewhere responsible for the 4 percent gain. That’s why I’m a fan of systematically evaluating unsupervised algorithms rather than eyeballing individual clusters; see, for example, political bias and SVD.
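To make the factorization concrete, here’s a minimal numpy sketch on a tiny invented ratings matrix. The titles come from the quote above, but the ratings and the two-dimensional latent space are made up, and a real system would handle missing ratings properly (e.g. with regularized matrix factorization) rather than running plain SVD on a dense matrix:

```python
# A toy sketch of SVD-style latent factors for movies, assuming a small,
# fully observed user-by-movie ratings matrix. All ratings here are invented.
import numpy as np

movies = ["Joan of Arc", "SummerSlam 2004", "Serenity", "Bleak House"]
# 5 users x 4 movies.
R = np.array([
    [5, 1, 4, 5],
    [4, 1, 5, 4],
    [1, 5, 2, 1],
    [2, 5, 1, 2],
    [5, 2, 4, 4],
], dtype=float)

# Truncated SVD: keep k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Reconstructed (smoothed) ratings -- this is what a prediction would use.
R_hat = U_k @ np.diag(s_k) @ Vt_k
print("reconstruction RMSE:", np.sqrt(np.mean((R - R_hat) ** 2)))

# Each column of Vt_k gives a movie's coordinates in latent space. Movies
# that land near each other are the "ineffable" groupings the article
# describes, whether or not the dimensions mean anything to a human.
for name, coords in zip(movies, Vt_k.T):
    print(f"{name:20s} {np.round(coords, 2)}")
```

The point is that each movie gets coordinates in a low-dimensional latent space, and groupings like Chabbert’s fall out of which movies land near each other, with no guarantee the dimensions correspond to anything nameable.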
Another bit: suspicions that demographics might be less useful than individual movie preferences:
Interestingly, the Netflix Prize competitors do not know anything about the demographics of the customers whose taste they’re trying to predict. The teams sometimes argue on the discussion board about whether their predictions would be better if they knew that customer No. 465 is, for example, a 23-year-old woman in Arizona. Yet most of the leading teams say that personal information is not very useful, because it’s too crude. As one team pointed out to me, the fact that I’m a 40-year-old West Village resident is not very predictive. There’s little reason to think the other 40-year-old men on my block enjoy the same movies as I do. In contrast, the Netflix data are much more rich in meaning. When I tell Netflix that I think Woody Allen’s black comedy “Match Point” deserves three stars but the Joss Whedon sci-fi film “Serenity” is a five-star masterpiece, this reveals quite a lot about my taste. Indeed, Reed Hastings told me that even though Netflix has a good deal of demographic information about its users, the company does not currently use it much to generate movie recommendations; merely knowing who people are, paradoxically, isn’t very predictive of their movie tastes.
Still, I’d like to see the results of actually throwing demographics in as features versus leaving them out. It’s a little annoying that so many of the claims in the article aren’t backed up by empirical evidence, which you’d think would be the norm for such a data-driven topic!
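Here’s roughly the ablation I have in mind, sketched with scikit-learn on synthetic data. The feature names and the data-generating process are entirely made up, so this only demonstrates the protocol (same model, with and without demographic columns, compared on held-out RMSE), not any real result:

```python
# A sketch of the demographics ablation, on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000

# Hypothetical features: two demographic columns and three taste columns
# (think: a user's mean rating on a few genre clusters).
demo = rng.normal(size=(n, 2))    # e.g. age, region -- weakly informative here
taste = rng.normal(size=(n, 3))   # past-rating signals -- informative here
y = (taste @ np.array([1.0, -0.8, 0.5])
     + 0.1 * demo[:, 0]
     + rng.normal(scale=0.5, size=n))

def rmse_for(X):
    # Same model and split either way; only the feature set changes.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)
    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))

print("taste only:          ", rmse_for(taste))
print("taste + demographics:", rmse_for(np.hstack([taste, demo])))
```

If the leading teams’ claim is right, the second number barely improves on the first on the real Netflix data; it would be nice to see that measured rather than asserted.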
Finally, an interesting question:
Hastings is even considering hiring cinephiles to watch all 100,000 movies in the Netflix library and write up, by hand, pages of adjectives describing each movie, a cloud of tags that would offer a subjective view of what makes films similar or dissimilar. It might imbue Cinematch with more unpredictable, humanlike intelligence.
At the very least, I bet that would help Cinematch by supplying a new data source that’s unlike the current ones they have — always a good move. As for “humanlike” — well, computational intelligence is a tough game to be in!
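If they did build those tag clouds, the most obvious way to use them would be as a content-similarity signal alongside the collaborative-filtering one. A toy sketch, with invented tags and nothing Netflix-specific:

```python
# Bag-of-tags cosine similarity between movies; tags here are invented.
from collections import Counter
import math

tags = {
    "Serenity":    ["sci-fi", "witty", "ensemble", "space"],
    "Match Point": ["drama", "dark", "class", "london"],
    "Bleak House": ["drama", "period", "class", "london"],
}

def cosine(a: Counter, b: Counter) -> float:
    # b[t] is 0 for tags missing from b, so summing over a's keys suffices.
    dot = sum(a[t] * b[t] for t in a)
    return dot / (math.sqrt(sum(v * v for v in a.values())) *
                  math.sqrt(sum(v * v for v in b.values())))

vecs = {m: Counter(ts) for m, ts in tags.items()}
print(cosine(vecs["Match Point"], vecs["Bleak House"]))  # shared tags -> high
print(cosine(vecs["Match Point"], vecs["Serenity"]))     # disjoint tags -> 0.0
```

A content signal like this helps most exactly where collaborative filtering is weakest: new or rarely rated movies that have no rating history to factorize.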