MediaUnBlog

Archive for July, 2009

Countdown to 10%: And the winner is?

Part 5 and a 1/2 of a multi-part series which could go on for a while yet, more or less

Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 5.5 | Part 6

In case you had an inspired moment recently and devised a brilliant algorithm designed to predict how movie watchers would rate films, you’re too late. The Netflix Prize has officially closed its doors to further submissions as of last Friday, having exhausted the 30-day window after BellKor’s Pragmatic Chaos reached the 10% improvement required by the contest. There was some exciting jostling at the end, with team BellKor and team Ensemble vying for improvements in the fourth decimal place. This seemingly microscopic difference could be just the edge which the winner of the $1M will require.

Though certain team members are convinced of their victory, the public won’t know the final result for a few weeks, according to Netflix. They need to validate the entries and adjudicate a winner. No real announcement date has been posted; Netflix is keeping it vague. But the expectation is late August, grabbing headlines just before baseball teams expand their rosters with some late-season call-ups. We at MediaUnbound will undoubtedly keep an eye on any breaking news regarding both Netflix and our favorite minor league prospects. And we can now add computer algorithm prizes to the list of other late summer indicators, along with fresh corn in New England, thoughts of back-to-school shopping, and August swoons by the local ball club.

Fear not, though! In this interstitial period, we are still thinking about Netflix Prize topics and questions left yet uncovered, if only to get our pal Socrates out into the daylight once again. Additionally, if you the reader have any thoughts you would like discussed, please send us an email: blargh@mediaunbound.com. We welcome the interaction and the dialogue (we promise not to play the fool like Socrates, or only if you want us to).


Countdown to 10%: Socrates on the dynamics of Napoleon Dynamite

Part 5 of a multi-part series concluding 5 days from now, more or less

Part 1 | Part 2 | Part 3 | Part 4 | Part 5

Back in 1949, economist Bill Phillips built a machine which was designed to model the economic workings of the United Kingdom. It was part Rube Goldberg machine, part primitive hydraulic computer. Instead of electronics, the machine used water to power itself and to represent both inputs and outputs. The water, which passed through clear tubes in the machine, represented money in the economy, and it flowed from the treasury to other tanks which stood for all types of governmental expenditures. Valves were adjusted to set things like tax rates and investment rates. The Phillips machine (known as the MONIAC in the USA) was able to label, localize, and compartmentalize different economic stimuli and outputs. Still, though the machine was deemed quite accurate back in 1949, neither it nor its modern electronic offspring could truly model or predict the various economic complexities which affect (or afflict) us now.

Like the MONIAC, recommendation engines are themselves complex systems. As we have demonstrated in previous discussions, the Netflix Prize attempts to isolate one part of the recommendations suite: the one which examines user ratings to extrapolate how users will respond to films they haven’t seen. At the surface level, everything about its ambitions seems, if not simple, then at least manageable. But consider not only the amount of time which went into achieving the 10% increase (3 years) but also the combination and recombination of teams and their accompanying technology. The first team to reach the 10% increase (and the presumed favorite right now) is itself a hybrid of four teams who joined forces when, independently, they were unable to reach the 10% goal. By combining personnel, the teams have no doubt also combined various technologies developed internally. Complex algorithms have been sewn and soldered to one another in some Frankensteinian approach to engineering which might achieve a 10% increase in the very narrow question posed by the Netflix contest.

What was once an isolated segment of an already complex system has now itself become quite complex, perhaps even cumbersome. As the MONIAC demonstrates, the controls and inputs needed to achieve a desired outcome are many and need to be calibrated delicately. Though the Netflix Prize has isolated one component of recommendation systems, it has also potentially made it so complex that it is impossible to interact with or tune. The question then becomes: is this algorithmic chimera adaptable to Netflix’s whole recommendation system?

BellKor’s Pragmatic Chaos’s algorithm probably resembles the MONIAC in some ways: complex but well-ordered. But what happens when you need to add unforeseen inputs to such complex systems? In the case of the Phillips machine, let’s say mortgage-backed securities. You could build a tank filled with putrid and poisonous water which gets connected to the rest of the waterworks. It mucks up the computer’s inner workings and accuracy, throwing everything out of equilibrium and contaminating the system. The carefully-calibrated model cannot deal with unforeseen variables or offer solutions — appropriately, it fails.

The BellKor algorithm does away with large labeled tanks and, instead, uses statistical processes to create its own network of millions of tiny chambers and pipes through which electrons flow. It’s an attempt to capture intricacies, nuance, and breadth that would have taken years to model by hand by probing the human psyche on the details of why we like movies. Like the Phillips machine, the algorithm is an untweakable entity. The millions of connections are nameless. To modify the behavior of the algorithm, you would need to go back to your statistical crazy-water-network generating code and reprogram it; as a result, you may well lose the positive traits or accuracy of the original model. The result is take it or leave it.

Algorithms are efficient at crunching lots of data and returning accurate results, tempting stuff for recommendations systems. But there are times when the results need a human touch to massage the data and help the machine understand the nuances and idiosyncrasies of human preference. It would be laborious to hardcode data into every movie, providing the machine with a list of rules and behaviors for users. But human editors must be able to adapt a recommendation system to their own expert advice if humans and machines are to work in concert. Let’s return to Athens for a moment to see how this problem unfolds for Socrates:

Socrates: “Crito, behold this wondrous box I received yesterday at the temple of Athena! Some say it is a gift from the goddess herself, forged in the fires of Hephaestus’s realm.”

Crito: “That is a beautiful black box, my friend. The luster alone threatens to blind me. But how does it work?”

Socrates: “I need only say your name and a movie and the box will return a number of pebbles, from 1-5. The more pebbles, the more you will like the film.”

Crito: “This is truly a gift of the gods.”

Socrates: “Yes; the only thing which puzzles me is that it has returned five pebbles for Glaucon regarding the movie Napoleon Dynamite. He despised the film.”

Crito: “Glaucon lacks the necessary sense of irony[1] for that film.”

Socrates: “Precisely as I figured! But when I tried to tell the black box of Glaucon’s deficiency, it cut me off and began spitting out five pebbles for the movie I Heart Huckabees. Once again, he despised the film.”

Crito: “Most curious.”

Socrates: “I tried to open up the box to adjust it, but Hephaestus’s technology is beyond me.”

Crito: “Be careful, Socrates. You must keep in mind Hesiod’s tale of Pandora, most elegantly told in the film Hellraiser. Perhaps the box is best left unopened.”

Socrates understands the mistake which the black box is making. But he is incapable of getting inside the guts of the box in order to provide a solution. The box is not made for tinkering. It simply goes about its job, which is spitting out pebbles. Any human input or interaction is beyond the box’s capabilities and design. Socrates should not blame Hephaestus; he just needs to request some additional boxes to supplement the one he now has. Or, Socrates could use the black box as a starting point for his own recommendations and then edit or amend the black box’s pebble recs once he considers them, thus adding a human element to the process. He was successful with the human approach before; there is no reason it will not continue to hold true.

So how did Socrates proceed? Let’s find out:

Glaucon: “Socrates, I have heard about this movie Sideways. It has something to do with wine, which is of course of great interest to me. Would you please ask the black box how I will enjoy it.”

Socrates (speaking to the black box): “Glaucon. Sideways. He lacks the necessary ir-…”

The black box spits out five pebbles, and Socrates tries to quickly pocket as many of them as he can before Glaucon notices.

Glaucon: “There are only two pebbles here now, but I swear I saw the box disburse five! Do my eyes deceive me, Socrates?”

Socrates: “Silly Glaucon. There are clearly only two pebbles. Perhaps you yourself had too much wine today. Sideways is not for you.”
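Socrates’s pebble-pocketing is, in effect, an editorial override layered on top of a black box he cannot open. A minimal sketch of that idea follows. Everything in it is hypothetical: `hephaestus_box` is a stand-in for an opaque predictor, and the override table is simply a cap that a human editor places on specific user and movie pairs. It illustrates the "edit or amend" approach and is not a description of any real system.

```python
def with_editorial_overrides(black_box, overrides):
    """Wrap an opaque rating predictor so a human editor can cap its output.

    black_box: callable (user, movie) -> number of pebbles (1-5).
    overrides: dict mapping (user, movie) -> maximum pebbles the editor allows.
    """
    def predict(user, movie):
        raw = black_box(user, movie)
        cap = overrides.get((user, movie))
        # No editorial note for this pair: pass the box's answer through.
        if cap is None:
            return raw
        # Otherwise the editor's cap wins whenever the box is too generous.
        return min(raw, cap)
    return predict


def hephaestus_box(user, movie):
    # Hypothetical stand-in for the unopenable box: always five pebbles.
    return 5


# Socrates knows Glaucon lacks the necessary sense of irony for Sideways.
overrides = {("Glaucon", "Sideways"): 2}
predict = with_editorial_overrides(hephaestus_box, overrides)

glaucon_pebbles = predict("Glaucon", "Sideways")
crito_pebbles = predict("Crito", "Sideways")
```

The box itself is never modified here; the human judgment sits outside it, which is the only place Socrates can put it.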


Footnotes

  1. The original meaning of the Greek eironeia was something like feigned ignorance or purposeful dissembling, a charge often laid against Socrates himself.

Countdown to 10%: The Trinity of netSkix

Part 4 of a multi-part series concluding 9 days from now, more or less

Part 1 | Part 2 | Part 3 | Part 4 | Part 5 | Part 5.5 | Part 6

Here is the fundamental business model of Netflix: a user pays a monthly subscription fee to have DVDs mailed to them, the number of which is limited only by the swiftness of the US Postal Service (and available free time for video watching). The user watches the DVDs and mails them back in order to get more. There are no late fees; the sooner you return your movies, the more movies you are able to receive. The company makes money by selling more subscriptions. It’s pretty simple.

So, how do “better” recommendations translate into better business success for Netflix? As we have pointed out previously, judging whether a movie recommendation has been successful can be a subjective enterprise. The ultimate measure of success is not accuracy in predicting ratings but, rather, customer satisfaction which promotes increased usage and increased advocacy for the service. Learning this simple rule (“successful recommendations drive successful business”[1]) has led to important conclusions for recommendation technology providers like MediaUnbound. For example, each implementation of recommendations is unique and requires different types of technology, tuning and features. Recommendation science is not a theoretical exercise; it only makes sense as an applied and practical problem. The best holistic approach to judge the quality and accuracy of a recommendation system is to evaluate it against a service’s key business performance metrics.

In the case of the Netflix Prize, we first need to outline the business performance metrics before determining whether the strictures of the contest will actually lead to a 10% improvement. For Netflix we have:

  • Reducing churn rate. Netflix doesn’t want current customers to cancel their accounts.
  • Upgrading to higher-value plans. Netflix wants current subscribers to upgrade to more expensive plans.
  • Enticing new users to sign up. Netflix wants to acquire more subscribers.

The goal in spending $1M on the contest should be to improve user experience and positively affect one of the three metrics. But, what is the link between “better” recommendations and each metric?

A concrete example is useful here. Returning to our ancient friends from earlier episodes, imagine the Greek movie rental subscription service netSkix.[2] netSkix has exactly one current subscriber: Crito. In an attempt to turn netSkix into the most successful movie rental service in all of Athens, netSkix has hired Socrates, the most rational, knowledgeable (and doggedly persistent) movie seer of his time, to make movie recommendations for a salary of one million drachma. How can Socrates justify his cost through providing movie recommendation services?

Churn rate: In a subscription environment, the viewer is paralyzed by an abundance of choice. Almost any piece of long-form visual entertainment imaginable can be added to the queue.[3] A good recommendation system will help the viewer navigate this choice–always feeling that there are a large number of interesting items to watch but never feeling overwhelmed by the choice. Socrates’s first challenge occurs early on his very first day of work:

Crito: “Socrates, there are no more movies to watch. I have watched every single Will Ferrell movie. Having completed my education in movies, I think I shall quit netSkix.”

Socrates: “So you have seen even A Bucket of Blood (1995) featuring Will Ferrell, the bad Roger Corman remake of the already bad A Bucket of Blood (1959) also by Roger Corman?”

Crito: “Don’t play the fool. That’s not available on DVD, Socrates.”

Socrates: “Oh, oops. Well, since you like Will Ferrell so much, have you thought to watch movies of other SNL comedians in their prime? Like, Stripes with Bill Murray?”

Crito: “I know Bill Murray. He’s that disaffected businessman guy who is in all the movies with Will Ferrell’s friend Owen Wilson. I guess I will watch this Stripes movie and then afterward I can quit netSkix.”

Plan Upgrades: Once Socrates entices Crito to keep his netSkix subscription (theoretically, as long as Socrates offers new good movies for Crito to watch, Crito will remain with netSkix), the next challenge is to convince him that he needs the top level of service. This upgrade will cost money but it will also open up better features and more freedom within the service, like getting more movies at any one time.

Socrates: “Crito, since you enjoy the movies of SNL cast members, I think you might want to also watch all of Dana Carvey’s work.”


Crito: “There are only three Dana Carvey movies.”

Socrates: “You must be forgetting Clean Slate and his short-lived sketch comedy show.”

Crito: “OMG! How will I be able to watch all these great movies. I will definitely need to upgrade my subscription so I can watch more movies every day.”

New User Subscriptions: Finally, to earn his weight in drachmas, Socrates should be able to satisfy Crito to such an extent that Crito will recommend the service to all of his friends, resulting in more subscriptions for netSkix. This is one of the most powerful ways that netSkix will grow beyond its proud but paltry customer base of one. Glaucon, friend of Crito but not a netSkix user, seeks out Socrates, who is only a couple of days into his job:

Glaucon: “Socrates, I was truly impressed by the film Down By Law that you recommended to Crito. It combined a good jailbreak movie with high comedy. We watched it during our weekly movie night.”

Socrates: “Did Crito enjoy the film? It seemed like a good intersection of his interests, though I warned him it might be out of his comfort zone.”

Glaucon: “Though I am not sure he thought that Roberto Benigni was as funny as Will Ferrell, Crito liked the film, nonetheless. More to the point: I was hoping I could subscribe to netSkix in order to enjoy your recommendations further.”

All three of these examples hinge on Socrates’s ability to build trust with the user, Crito. While it is important for Socrates to deliver good quality recommendations, it is more important not to deliver bad recommendations. Bad recommendations severely undermine the trust between service and user, sometimes permanently. A history, even a short one, of decent recommendations sets the stage for more adventurous recommendations. Recommendations must be individually tuned, responsive to the shifting context of customer usage. A successful system will prolong the customer’s subscription and help them explore the service. It may even let them feel that they are expanding their tastes. Even though Socrates was far from certain that Crito would enjoy Down By Law, he still offered it to him in the hope that he would make Crito reach a little to broaden his palate. In this case, it worked. Even if it had not, Crito should be able to discern why Socrates had recommended the movie: it clearly contains elements of Crito’s movie preferences. The bad recommendations which come out of left field, with no discernible logic, are the ones which Socrates must avoid because they can nullify any previous good recommendations.

It turns out that Socrates can quantifiably benefit netSkix with some combination of the following: retaining Crito as a customer; getting him to upgrade his account; attracting new customers. Socrates does not succeed in these categories simply by his raw talent for determining user ratings. Instead, he embraces a whole suite of recommendation faculties to serve each customer, making them happy about being a netSkix customer and encouraging increased usage, engagement, and referrals.


Footnotes

  1. Chris Anderson watch out, we too can boil down complex business concepts to pithy one-liners.
  2. The “skix” root of netSkix is a reference to the ancient Greek word skia which means “shadow,” yet itself a reference to Plato’s cave allegory wherein folks who were not yet of a higher consciousness watched the shadows of objects moving behind them which were illuminated by the light of a fire. You could call this the ancient Greek version of movies, albeit very primitive ones. The whole phonetic similarity to Netflix was too much to ignore. We beg your indulgence.
  3. Except, for safety purposes of course, The Entertainment.

Countdown to 10%: There Can Be Only One (Rating)

Part 3 of a multi-part series concluding 16 days from now, more or less

Part 1 | Part 2 | Part 3 | Part 4 | Part 5

The first time we here at MediaUnbound saw the movie Highlander, we loved it. It had all the right mixture of science fiction, sword-fighting, and Sean Connery required of any quality movie. We even enjoyed the muted themes of metallurgy and Scottish highlands. Some of our appreciation must have come from the age at which we saw it, all young and impressionable, full of wonder and innocence. Some certainly came from the Queen-produced soundtrack.[1] Years later, when we saw Highlander II: The Quickening (and, later on, Highlanders 3, 4, and 5), two things happened: we found the first Highlander to shine even brighter and we lost a little faith in Sean Connery.[2]

Our Highlander story highlights the relational and comparative nature of rating movies. Ratings are hardly absolute or independent. They rely on a complex schema of appreciation and preference, history and comparison, mood and time. The debacle that was the second Highlander movie (to everyone except Michael) made us appreciate the first one even more. Similarly, movie watchers heralded The Godfather as one of the best movies ever when it was released in 1972. Two years later, when The Godfather: Part II came out, people reconsidered the first one as still very good, but perhaps not as good as the second. Even the de facto official rating system of the movie world, the Oscars, said that the second Godfather was worthy of six Oscars while the first one was worthy only of three. This is roughly the opposite trajectory of the Highlander phenomenon, but the same premise: that people change their estimation of movies over time and in relation to other movies.

The Netflix Prize asks its algorithms to predict, with as much accuracy as possible, how its users rated a set of specific movies. The ratings are a 1-5 Likert scale, with 5 being a favorite movie and 1 being a truly disliked movie; 2, 3, and 4 supply all the gradations between. As such, the integers from 1 to 5 [inclusive] are responsible for encompassing the whole range of emotional responses to a movie, from outrage and horror to ecstasy and delight.

So how reliable are 1-5 ratings when whims, preferences, and memories change with each moment passed and each film watched? The answer is: not very. In a study by Xavier Amatriain, Josep Pujol, and Nuria Oliver, the researchers quantified how much “noise” was present in ratings from users who rated the same 100 movies in three separate trials. The study found that there was significant variance in how people rated the same movies, with the largest ratings “drift” happening between 2 and 3 and 3 and 4 (across the three trials, users were even inconsistent as to whether they had seen a particular movie or not). It turns out that the antipodal ratings of 1 and 5 are the most reliable. When 2 and 3 were removed, the consistency improved significantly. The middle ground ratings are more nuanced and harder to capture with simple integers, largely because they represent a less certain appreciation of a film. Responses like good, OK, decent, enjoyable, and alright all contain elements of both satisfaction and dissatisfaction, a nebulous middle where the numbers lack real definition. The study also suggests that user ratings become less stable over time, as the memory and impact of a film evanesces. For instance, if we had rated Highlander when we first saw it, surely it would have been a solid 5. But the intervening years have dulled some of its original luster, or perhaps the infinite progression of sequels has also sullied our memory of it.
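One rough way to put a number on this kind of rating noise is to compare two rating passes by the same user over the same movies. The sketch below is our own illustration and not the method used in the study: `retest_rmse` is a hypothetical helper, and the ratings are made up for the example.

```python
import math


def retest_rmse(first_trial, second_trial):
    """Root-mean-square difference between two rating passes by one user.

    Both arguments map movie title -> rating (1-5). Only movies rated in
    both trials are compared, since users may not rate the same set twice.
    """
    shared = sorted(set(first_trial) & set(second_trial))
    if not shared:
        raise ValueError("no movies rated in both trials")
    squared = [(first_trial[m] - second_trial[m]) ** 2 for m in shared]
    return math.sqrt(sum(squared) / len(shared))


# Made-up ratings: the extremes hold steady, the middle ratings drift.
trial_1 = {"Highlander": 5, "Stripes": 3, "Sideways": 2, "Down By Law": 4}
trial_2 = {"Highlander": 5, "Stripes": 4, "Sideways": 3, "Down By Law": 4}

noise = retest_rmse(trial_1, trial_2)
```

A figure like `noise` gives a sense of scale: it is how far a user’s ratings can sit from that same user’s earlier ratings before any algorithm gets involved.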

These findings suggest that the very premise of the Netflix Prize is awash in noise obscuring the actual task of making meaningful recommendations for viewers. Consider this: if the ratings data are unreliable to begin with (as shown in the study), how can the algorithm which has been fitted to this unreliable data be expected to spit out predictions any more accurate than the noisy ratings? Algorithms, like many other things in life, follow that simple principle of “put garbage in and get garbage out.”

The history of the Netflix contest has been a persistent hill climb as teams slowly inched toward the magic 10% barrier. However, what if the bulk of this effort has been overtuning algorithms to match noisy inputs? If the subjectivity in 1-5 ratings data is, as Amatriain et al. suggest, larger than the 10% improvement sought, have the past 3 years been an inefficient use of computational resources?

Xavier’s blog post “What if there is no Million $” poses this very question. In an in-depth comment thread, participants are caught up in a discussion as to whether the rules of the competition allow teams to unfairly overfit their algorithms to the Test dataset. But this misses the larger point. We don’t really care whether the participants are overfitting their algorithms to the Quiz or Test datasets. What if the entire Prize[3] itself is an exercise in overfitting algorithms to noise present in subjective ratings data?

This is why we suggest in comments to our last post that a more holistic approach to evaluating recommendation systems is required. Evaluation metrics should be tied to actual improvements in recommendation user experience, not just blinded test datasets. Even metrics involving blinded data like the Netflix Prize should be better tailored to product goals. We think that candidate algorithms should face additional penalties for making egregious miscalculations about a movie the user hates. Making a bad recommendation can be very damaging to user trust. For instance, if your algorithm tells me that I will likely rate Highlander III 4-stars, I will look at you sternly and somewhat incredulously before I can calmly explain that Mario Van Peebles belongs nowhere near a sword or the Highlander franchise.[4] And just like that, you’ve lost my confidence in your recommendation system. Sometimes it is better to know what not to recommend.
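To make the idea of an additional penalty concrete, here is one possible sketch: an RMSE variant that weights a squared error more heavily when the algorithm predicts a high rating for a movie the user actually hated. The thresholds and the penalty factor are assumptions chosen purely for illustration, and `penalized_rmse` is a hypothetical name, not a metric used by the contest.

```python
import math


def penalized_rmse(predicted, actual, hate_threshold=2,
                   rec_threshold=4.0, penalty=3.0):
    """RMSE that up-weights egregious misses on movies the user hates.

    A miss is 'egregious' when the true rating is at or below
    hate_threshold while the prediction is at or above rec_threshold.
    All three tuning values are illustrative assumptions.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        egregious = a <= hate_threshold and p >= rec_threshold
        weight = penalty if egregious else 1.0
        total += weight * (p - a) ** 2
    return math.sqrt(total / len(actual))


# Made-up example: the first prediction is a 4-star call on a 1-star movie.
predicted = [4.0, 3.0]
actual = [1, 3]

plain = penalized_rmse(predicted, actual, penalty=1.0)  # ordinary RMSE
harsh = penalized_rmse(predicted, actual)               # with the penalty
```

With `penalty=1.0` the function reduces to ordinary RMSE, so the two scores can be compared directly to see how much of an algorithm’s error comes from the trust-destroying kind of mistake.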


The simple assignment of numbers to user preferences is problematic. It raises several questions:

  • What is a 5? Is it something we highly enjoy, or is it something which has deep meaning, or something which has been created with the utmost skill? Is it all of these things?
  • Furthermore, what is a 1? Do we save that for the worst movie we have ever seen, while other bad movies get a slightly more favorable 2?
  • Does a neutral 3 elicit no noticeable response in either direction, good or bad, or do the responses just balance each other out?

It sounds critical, but anyone who has ever filled out a survey knows that the provided responses never satisfactorily describe how you feel. And because the survey itself is not very descriptive, think about having to fill out the same survey a month later. Would you be able to replicate all of your answers? Probably not.

Future articles in this series will explore alternatives to ratings systems for gathering user feedback and soliciting preferences. In addition, we will look at more objective methods to measure user trust and satisfaction with recommendation systems.

Any subjective rating system is going to be vulnerable to fickle moods and whims. This is why a rating system alone accounts for only a fraction of what a good recommendation system should consider. Even with a 5, we are uncertain that a mere integer can capture the excitement, satisfaction, and tremble of the ineffable which we feel every time we see Highlander. Surely a 5* with a couple of swords criss-crossing the number would be more appropriate. Oh, and some lightning exploding around the 5. You need lightning.


Footnotes

  1. Yes, even the Roger Taylor-penned disco number Don’t Lose Your Head which proved the point that drummers shouldn’t write lyrics.
  2. Michael would like to offer a quick defense of Highlander II.

    While Ebert has called the film “almost awesome in its badness,” I would like to point out how far ahead of its time Highlander II really was in its original form. The movie took place in a dystopia where global warming first decimated the planet (killing Connor’s wife Brenda) and was then completely conquered by Connor and his invention of the ozone layer replacement Shield…only to have the Shield taken over by the evil Shield Corporation which enslaved Earth’s population in never-ending night. Which is where the domestic eco-terrorists come in. This plot sequence alone (covered in the first 45 seconds of the movie) already puts us years ahead of $600M-grossing The Day After Tomorrow. And, we haven’t even gotten started on the part of the movie where we find out that the immortal Highlanders are actually from the alien planet of Zeist and were sent to Earth as a punishment for engaging in rebellion, which of course triggers the arrival of several new aliens on hoverboards causing Connor to reverse-age (hello The Curious Case of…) into his badass immortal self. Seriously, there’s no way you could have come up with this stuff in 1991.

  3. No, not the Highlander one.
  4. Jim, who you see above pictured at his workstation below a Highlander poster in Italian(!), would like to point out that recommending Highlander III on the basis of enjoying Highlander I would be akin to recommending him other late 80s Queen based on his love of the A Kind of Magic album. Though, this is a bit of a sore point at MediaUnbound where the entire company once lost a perfectly good afternoon attempting to determine at exactly what point Queen became and then lost and then regained ‘cool’. Jeff, please do not restart this discussion in the comments–it will be deleted.

Countdown to 10%: Tell Me Something I Haven’t Seen

Part 2 of a multi-part series concluding 24 days from now, more or less

Part 1 | Part 2 | Part 3 | Part 4 | Part 5

Imagine this, if you will: Netflix asks you to rate 10 movies you saw in the last month. Then they give you a magic black box with a computer inside. Into that black box they feed 10 entries which include your name, names of the movies you watched, and when you rated the movies. For five of these entries, they include your rating for the movie. Then they ask the black box to predict, as best as it can, the rating you gave the other five movies. This is essentially what Netflix is paying $1 million for: a system that on average gets +/- .8558 away from the actual user rating. It doesn’t sound very exciting when abstracted that way, and perhaps we do it some injustice because it actually requires a sophisticated algorithm, but this is what the Netflix Prize is all about.
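The contest scores entries by root-mean-square error (RMSE) between predicted and actual ratings, which is the kind of figure quoted above. A minimal sketch of that calculation, along with the relative improvement over a baseline, might look like the following. The ratings and the baseline numbers are invented for illustration; they are not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def improvement(baseline_rmse, candidate_rmse):
    """Fractional reduction in RMSE relative to a baseline system."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Invented example: the five ratings held back from the black box,
# and the black box's guesses for them.
actual = [4, 2, 5, 3, 1]
predicted = [3.5, 2.5, 4.0, 3.0, 2.0]

error = rmse(predicted, actual)

# An invented baseline of 1.0 and candidate of 0.9 would be a 10% gain.
gain = improvement(1.0, 0.9)
```

Note that nothing in this calculation asks whether the user would enjoy a movie they have not yet rated; it only measures distance from ratings that already exist.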

So the question remains: does increasing a computer’s predictive accuracy by 10% on this very narrow question equate to improving a recommendation system by the same amount? We think not. Recommendations encompass a lot more than a user’s ratings about movies already seen. Most importantly, the Netflix Prize neglects to ask the question of its contestants: how well can your algorithm recommend new movies to users? This is the goal for which every recommendation system strives: the ability to surprise, astound, and delight a user by suggesting a movie yet unseen by them. And to keep them coming back for more. In contrast, it does very little to remind them of how they rated something they have seen.

People love to discover new movies in the same way they love to hear new music; and even for the most voracious consumers, there will always be undiscovered territory. But how can a machine (in this case, our “magic black box”) which only specializes in telling you how much you liked something you’ve already seen also tell you how much you will like something you haven’t yet heard of? Although these two things are sometimes related, they aren’t always so.

The assumption underlying the Netflix data exercise is that an algorithm which can accurately predict how you will rate movies you have already seen will have the same predictive accuracy on a dataset of movies you have yet to watch. In the case where a large percentage of the audience has watched the same movies, this works well. But the assumption breaks down when we try to make predictions on movies that rarely appear together on viewers’ watched lists. From a product perspective, this hampers the ability to make recommendations to adventurous viewers and find interesting choices that lie outside the obvious list of related items.

For example, in the dialogue from Part 1 of this series, Socrates finds out that Crito really liked The Shawshank Redemption; in fact, he gave it 4 out of 5 stars. From this, Socrates (our little black box) could extrapolate that Crito really likes movies about jail or jailbreaks; or movies with Morgan Freeman; or movies based on Stephen King stories. But Socrates, as he is designed, cannot interpret anything beyond the 1-5 rating given by Crito and thus his recommendation power is severely limited. By design, he misses the point when Crito mentions his love of a good jailbreak movie and can’t do anything with that very valuable information (we’ll look in more depth at ways of eliciting preference without 1-5 ratings in a future article). Socrates will evenly weight recommendations of unseen movies like The Great Escape, Driving Miss Daisy, and Children of the Corn when, really, he should give The Great Escape and other jailbreak movies much more weight than anything else. Furthermore, Socrates could lose Crito’s trust by recommending him something entirely awry. Children of the Corn is enough to scare anyone off, generally.
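As a sketch of what using that extra information could look like, the snippet below ranks unseen movies by how strongly they match attributes the viewer has said he cares about. The attribute names, the weights, and the `score_candidates` helper are all assumptions made up for this example; a real system would need far richer data and tuning.

```python
def score_candidates(liked_attributes, candidates):
    """Rank unseen movies by the summed weight of matching attributes.

    liked_attributes: dict mapping attribute -> weight for this viewer.
    candidates: dict mapping movie title -> set of attributes.
    Returns a list of (title, score) pairs, best match first.
    """
    scores = {}
    for title, attributes in candidates.items():
        scores[title] = sum(liked_attributes.get(a, 0.0) for a in attributes)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Crito said he loves a good jailbreak movie, so that attribute gets
# a heavier (hypothetical) weight than actor or author.
crito_tastes = {"jailbreak": 3.0, "Morgan Freeman": 1.0, "Stephen King": 1.0}

unseen = {
    "The Great Escape": {"jailbreak"},
    "Driving Miss Daisy": {"Morgan Freeman"},
    "Children of the Corn": {"Stephen King"},
}

ranking = score_candidates(crito_tastes, unseen)
best_title, best_score = ranking[0]
```

With equal weights on every attribute, all three movies would tie, which is exactly the even weighting described above; the stated preference is what breaks the tie in favor of the jailbreak movie.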

In a complete recommendation system (as opposed to the narrow Netflix algorithm bake-off), questions on how a technique works on both a viewer’s discovered and undiscovered items are central. By focusing all efforts on a specific algorithm to match known ratings, the contest introduces a strong bias. Undoubtedly, as Mr. Smith mentions in the comments to our last article, advances in algorithm technique have been made. But, the algorithms as presently constructed should not be directly applied to an actual live recommendation system. Instead, they are merely building blocks upon which other datasets, techniques, specific tunings, human curatorship and product choices need to be imposed. Given the unique biases of the specific dataset and question posed by the Netflix Prize, is there any proof that all the specific tuning and blending from BellKor’s Pragmatic Chaos to meet the 10% will yield tangible improvements to the final product? Or, would any of the extremely similar SVD algorithms developed for the contest perform equally well?

Of course, Netflix intentionally set up a narrow definition of “recommendations” for their contest. They never ask the teams to test out their algorithm on movies which the user hasn’t rated. And for good reason: the accuracy of such a diagnostic would be difficult to assess and verify. But this is where the true value of such an algorithm lies. As it stands, true predictive power in the scheme of the Netflix Prize consists of accurately telling you what you already know, 3 years and $1 million later.

Stay tuned for additional articles in this series coming soon…




By combining the numerical power of computers with knowledge from teams of human analysts, MediaUnbound helps people find, discover and interact with large catalogs of entertainment content to deliver an exciting entertainment experience. Every day people receive music, video, concert and image recommendations generated by MediaUnbound through customers such as eMusic, Ericsson, Napster, MTVN / Viacom, Terra Networks, NTT DoCoMo, HMV, and TransWorld Entertainment.



The Social Web Community 2.0 Network marketing gurus all agree: Every corporation needs a corporate blog. Ours gives an inside peek into the people, opinions and activities at MediaUnbound.

What to expect: our thoughts on media recommendation technology; occasional customer and product announcements; in-depth discussions on whether MilkMoneyMaffia is the best band from Greenland, or the best band ever.

What not to expect: multiple posts every day; corporate babble-speak



Please direct feedback on blog posts to blargh@mediaunbound.com. For further information about MediaUnbound or other questions please see the Contact page on our main site.