January 7, 2008

Netflix Contest has Produced Prizes for the Analytics Community

In June 2007, we posted about the "Netflix Prize" - a contest promoted by analytics-savvy movie-rental house Netflix.

The goal: improve the accuracy of the existing Cinematch movie recommendation system.

The prize: $1 million

Fifteen months along, and no model has met the victory threshold of a 10% improvement in prediction accuracy. Fortunately for everyone who doesn't work at Netflix, this contest has still produced something of value.

The discussions and attempts this contest has inspired have given those of us interested in analytics new perspectives and questions to ponder as we seek to quantify and predict preference and behavior.

This article discusses some of the most interesting insights thus far:

"Open questions" (text mining) have emerged as a theme for fine-tuning the specificity of predictive models. Giving individuals an opportunity to express themselves, instead of forcing them to conform entirely to a pre-defined format, is emerging as a more nuanced and "high-touch" approach. As I have posted previously, emerging software is making great strides toward making text mining a pragmatic tool. Discrete choice models of "ultimate" giving-destination preference (athletics, fine arts, bricks and mortar), for example, could be greatly enhanced by appropriately applied text mining.
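To make the idea concrete, here is a minimal sketch of turning an open-ended response into numeric features a choice model could consume. The vocabulary and the example sentence are hypothetical, invented for illustration; real text mining software would go far beyond simple term counts.

```python
import re
from collections import Counter

def text_features(response, vocabulary):
    """Turn a free-text response into term counts over a fixed vocabulary,
    so open-ended answers can feed a predictive model as numeric features."""
    tokens = re.findall(r"[a-z']+", response.lower())
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

# Hypothetical vocabulary for a giving-destination model.
vocab = ["athletics", "arts", "scholarship", "building"]
features = text_features("I'd like my gift to support the fine arts program", vocab)
# → [0, 1, 0, 0]: the response signals a fine-arts preference.
```

Features like these could sit alongside the usual pre-defined fields rather than replace them.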

Another model suggested that information about tastes, such as genre, language, actors, and directors, was surprisingly powerless compared to the star ranking of the movie itself. Perhaps this suggests that second-tier "affiliation" data (I love Tom Hanks, or, in the fundraising field, I was a Sociology major) may be more ambiguous than standard industry assumptions allow. At minimum, this revelation suggests that more consideration should be given to the top preference metric (for movies, it's the star rating; for fundraising, it's giving to the institution).
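One way to test a claim like this is to compare simple predictors built from each kind of signal. The sketch below is a toy illustration on invented data, not a reproduction of any contest entry: it predicts each rating as the mean rating of its group, grouped either by the item itself (the top preference metric) or by an affiliation field such as genre.

```python
import math
from collections import defaultdict

def group_mean_predictor(rows, key):
    """Predict each rating as the mean rating observed for its group,
    e.g. group by 'movie' for the item itself, or by 'genre' for
    second-tier affiliation data."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["rating"])
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    return lambda row: means[row[key]]

def rmse(predict, rows):
    """Root-mean-square error of a predictor over a set of rated rows."""
    return math.sqrt(sum((predict(r) - r["rating"]) ** 2 for r in rows) / len(rows))

# Invented toy data; a real comparison would score on held-out ratings,
# since fine-grained groups trivially fit the data they were built from.
ratings = [
    {"movie": "A", "genre": "drama", "rating": 5},
    {"movie": "A", "genre": "drama", "rating": 4},
    {"movie": "B", "genre": "drama", "rating": 1},
    {"movie": "B", "genre": "drama", "rating": 2},
]
by_movie = rmse(group_mean_predictor(ratings, "movie"), ratings)
by_genre = rmse(group_mean_predictor(ratings, "genre"), ratings)
```

The mechanics generalize directly: swap in giving history for star ratings and affiliation fields (major, club membership) for genre.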

Fifteen months in, the $1,000,000 Netflix Prize competition has produced interesting results even without a winner. Some of those results are a bit surprising; others we should have expected but didn't anticipate. So while participants haven't yet bettered the accuracy of Netflix's Cinematch recommendation algorithm by the 10% needed to win the $1 million prize, we can still take away lessons about the fundamentals of predictive analytics.
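The post doesn't spell out how "accuracy" is scored; the contest measured root-mean-square error (RMSE), with the grand prize requiring a 10% improvement over Cinematch's score. A minimal sketch of that criterion, with illustrative numbers rather than actual contest figures:

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def beats_threshold(model_rmse, baseline_rmse, improvement=0.10):
    """True if model_rmse improves on baseline_rmse by at least `improvement`
    (lower RMSE is better)."""
    return model_rmse <= baseline_rmse * (1 - improvement)

# Illustrative ratings only, not contest data.
actual    = [4, 3, 5, 2, 4]
predicted = [3.8, 3.4, 4.5, 2.6, 3.9]
score = round(rmse(predicted, actual), 4)  # → 0.405
```

Small absolute gains in RMSE are hard-won, which is part of why the 10% bar has stood for fifteen months.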

Read More
