Part 1 of a multi-part series concluding 30 days from now, more or less
Part 1 | Part 2 | Part 3 | Part 4 | Part 5
The Netflix Prize has finally been secured. Or has it?
It took nearly three years and a conglomeration of at least four leading teams to pass the 10% improvement mark over Netflix’s in-house Cinematch recommendation system. The $1 million prize has not yet been awarded, though, because the contest rules give all other teams 30 days to submit a better algorithm in this “last call” stage.
Many of the contestants in the Netflix Prize have praised the company for funding such a competition and creating a question worthy of their engineering attention. We here at MediaUnbound want to laud Netflix as well, not so much for their pursuit of the perfect recommendation engine as for bringing the problem of recommendations into the spotlight. But while the public waits out the 30-day limbo to learn who will actually claim the prize money, MediaUnbound would like to investigate what the end of the contest actually means.

Movies in ancient Greece.
Right now, the leading team (BellKor’s Pragmatic Chaos) has produced an algorithm that takes data from a set of Netflix users and predicts how they rated movies they have already watched. Its error, measured as root mean squared error (RMSE), the metric the contest is scored on, is about .8572 against the actual ratings (integers on a scale of 1-5). Loosely speaking, a typical prediction is off by most of a star, which on a 1-5 scale still seems like a significant gap.
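To make the metric concrete, here is a minimal Python sketch of how RMSE is computed, using a handful of made-up ratings (not Netflix data). The `percent_improvement` helper and its inputs are hypothetical illustrations of how a relative improvement over a baseline such as Cinematch is expressed, assuming, as in the contest, that improvement means a relative reduction in RMSE; neither is contest code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def percent_improvement(baseline_rmse, new_rmse):
    """Relative reduction in RMSE versus a baseline, as a percentage."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up example: actual integer star ratings vs. real-valued predictions.
actual = [4, 5, 3, 1, 2]
predicted = [3.6, 4.1, 3.4, 2.2, 2.9]
print(round(rmse(predicted, actual), 3))

# Hypothetical baseline and candidate scores, not contest results.
print(round(percent_improvement(1.0, 0.9), 1))
```

Squaring each miss before averaging means one prediction that is off by two stars hurts the score more than two predictions that are each off by one.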
The underlying assumption of the contest is that Netflix can deploy and scale the algorithm to predict which movies users will like among those they haven’t seen, usually considered the most interesting problem for recommendation systems. But given the methodology of the contest and the teams, it’s not clear that this will be the case. The danger of Netflix’s narrow definition of what constitutes a “recommendation” is that you end up with a system that is decent at predicting how a user will rate a movie on a 1-5 scale, and not much more. If you think about recommendations as a dialogue between service and user, the scenario unfolds something like this:
Crito: “Socrates, I just saw The Shawshank Redemption last week.”
Socrates: “I would guess that you truly enjoyed that film, Crito. A 4 out of 5?”
Crito: “Indeed. I found the theme of escape from imprisonment most satisfying.”
Socrates: “Does that mean I got it right?”
Presumably, Socrates could then recommend a prison movie that Crito doesn’t know about (say, Lock Up). But this is exactly where the contest algorithms fall silent: they have nothing to say about movie discovery. Socrates can’t even interpret Crito’s natural response beyond a 1-5 rating.
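One naive way to use a rating predictor for discovery, assumed here purely for illustration, is to score every title the user hasn’t rated and sort by predicted rating. The sketch below takes the predictor as a plain function; `recommend_unseen`, `fake_predict` and its scores are hypothetical, and “Movie A” and “Movie B” are placeholder titles.

```python
from typing import Callable, Dict, List, Set, Tuple

def recommend_unseen(
    user: str,
    catalog: List[str],
    seen: Set[str],
    predict: Callable[[str, str], float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank titles the user has not rated by predicted rating, highest first."""
    candidates = [(title, predict(user, title)) for title in catalog if title not in seen]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:k]

# Hypothetical stand-in predictor: a fixed table of made-up scores.
FAKE_SCORES: Dict[str, float] = {
    "The Shawshank Redemption": 4.6,
    "Lock Up": 3.1,
    "Movie A": 3.9,
    "Movie B": 3.4,
}

def fake_predict(user: str, title: str) -> float:
    # Ignores the user entirely; a real predictor would not.
    return FAKE_SCORES.get(title, 3.0)

print(recommend_unseen("crito", list(FAKE_SCORES), {"The Shawshank Redemption"}, fake_predict, k=2))
```

Nothing in this loop can act on Crito’s remark about escape from imprisonment: unless the predictor already happens to score Lock Up highly, it never surfaces.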
This leaves out a whole bunch of interesting questions that would undeniably enrich any recommendation engine:
- How is the ability to predict viewer ratings (movies already seen) different from the problem of movie discovery (movies not yet seen)?
- Will a 10% improvement over Cinematch actually translate into noticeable product improvements for Netflix subscribers?
- Does the Netflix contest ask a question with direct relevance to Netflix’s overall business goals and financial bottom-line?
- Are 1-5 star ratings objective? Are there better ways of soliciting input from viewers that would yield more meaningful data with less tedium?
- Does the BellKor’s Pragmatic Chaos approach allow for fine-tuning and human curatorship of results without disturbing the black box?
- How could an algorithm from the Netflix contest handle newly released movies?
- Do TV series and movies require different types of approaches?
- Is the user data from the Netflix contest representative of the entire Netflix subscriber base? Is the contest data set static or does it take into account changed preferences over time?
- Are the conclusions from the Netflix Prize reusable for making better media recommendation systems?
Since the inception of the Netflix Prize, we at MediaUnbound have been thinking about these issues underlying the contest and figured that now was as good a time as any to start sharing them with you. Stay tuned over the coming weeks for further posts addressing many of these topics at length, all leading up to the final announcement of the Netflix Prize winner in about 27 days.
Tags: Countdown to 10%, movie recommendations, netflix, plato's cave, shawshank redemption, socratic dialogue
If you go through the forum you will find extensive discussion of some of these issues. The leading algorithms do take changing preferences over time into account. The big advance appears to be the demonstration that SVD-inspired methods, which scale better than kNN methods on large datasets, also outperform kNN on large datasets.
The solution that wins 10% is clearly undesirable in production, based on the complexity/reward curve. The gradient factorization publicized by Simon Funk in the first few weeks of the contest was essentially a small page of code netting a 4-5% improvement. An 8% solution can be found now, after years of effort, with perhaps 2-10 times the complexity. The 10% solution is at least 100-1000 times as complex (in terms of volume of code, compute time, and hand-fiddling with tuning).
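For readers curious what a “small page of code” looks like, here is a minimal Python sketch of matrix factorization trained by stochastic gradient descent, in the spirit of the approach Simon Funk published. It is a simplified variant, not Funk’s code: it updates all factors at once instead of one feature at a time, uses only the global mean as a baseline, and runs on made-up toy ratings. The names `train_mf` and `predict` and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random

def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Fit rating ~= global_mean + dot(P[u], Q[i]) by stochastic gradient descent.

    ratings: list of (user_index, item_index, rating) triples, observed ratings only.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean as a simple baseline
    P = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit ratings in a new random order each epoch
        for u, i, r in data:
            err = r - (mu + sum(p * q for p, q in zip(P[u], Q[i])))
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return mu, P, Q

def predict(model, u, i):
    mu, P, Q = model
    return mu + sum(p * q for p, q in zip(P[u], Q[i]))

# Made-up toy ratings: (user, item, stars). Unobserved pairs are simply absent.
toy = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
model = train_mf(toy, n_users=3, n_items=3)
train_rmse = math.sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in toy) / len(toy))
print(round(train_rmse, 3))
print(round(predict(model, 1, 1), 2))  # a rating user 1 never gave
```

The whole model is two small factor matrices plus a nested loop over the observed ratings.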
A basic 80/20 rule comes into play in practice.
Nice solution for large datasets
Very good text. I’m really glad to have found your blog. Keep up the great work!