Netflix Prize

LeaderBoard

Since late October of last year, much of my free time has been spent pursuing the Netflix Prize. Netflix is offering $1 million to anyone who can take the dataset it provides (about 100 million ratings from roughly 500 thousand customers covering roughly 20 thousand movies) and predict about 2.8 million unknown ratings (from those same customers and movies) with a root mean squared error (RMSE) 10% lower than that of Netflix's own predictions.
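RMSE is the square root of the average squared difference between predicted and actual ratings, so large misses are penalized more than small ones. A minimal Python sketch follows; the function and constant names are illustrative, and the 0.9514 baseline is the Netflix score quoted further down this page, which puts the 10% target at roughly 0.8563.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences of ratings."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("expected two non-empty sequences of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


# Netflix's own score as quoted on this page; the prize requires an RMSE 10% lower.
NETFLIX_RMSE = 0.9514
PRIZE_TARGET = NETFLIX_RMSE * 0.9  # roughly 0.8563

print(round(rmse([3.5, 4.0, 2.0], [4, 4, 1]), 4))  # 0.6455
```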

Working on this challenge has been fun, frustrating, and a great learning experience. I worked on my early ideas in LISP before switching to C++, because I couldn't get my LISP code to run fast enough. I traded the pain of slow execution for the pain of slow development: in C++ I have to recompile and reload all of the data into memory for each new test. I may eventually research LISP optimization (maybe adding type declarations for all variables?) and write a language performance comparison page. I bet that would be a hit on Programming.Reddit.

If you came here hoping to read about a new idea that can get you higher on the leaderboard, I'm afraid my best scores come from ideas already discussed by others. I don't want you to go away empty-handed, so I've compiled a list below of the web pages that helped me get my score.

Recommended Pages

Current Best Score

As of March 25th, my best score is 0.9064, a 4.73% improvement over Netflix's score and good enough for 67th place on the leaderboard. There are currently 17,151 teams: 1,702 have made a valid submission, and 438 have beaten the Netflix score of 0.9514.
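The 4.73% figure is the relative reduction in RMSE against Netflix's 0.9514, i.e. (0.9514 - 0.9064) / 0.9514 ≈ 0.0473. A one-line Python check (the function name is illustrative):

```python
def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction in percent (positive means better than the baseline)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


print(f"{percent_improvement(0.9514, 0.9064):.2f}%")  # 4.73%
```

By the same formula, the prize threshold corresponds to an improvement of exactly 10%.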


Scores by Method

  • 0.9402: kNN (ratings)
  • 0.9392: kNN (SVD)
  • 0.9344: kNN (blended)
  • 0.9153: SVD
  • 0.9101: SVD + kNN
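The "SVD + kNN" line combines two predictors. The post doesn't say how they were combined; one simple possibility, sketched here purely as an assumption, is a single linear weight fitted by least squares on held-out ratings such as the probe set: blend = w*a + (1 - w)*b, with w = sum((y - b)(a - b)) / sum((a - b)^2). Both function names are hypothetical.

```python
def fit_blend_weight(pred_a, pred_b, actual):
    """Least-squares w for blend = w*a + (1 - w)*b, fitted on held-out ratings."""
    num = sum((y - b) * (a - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    # If the two predictors agree everywhere, any weight gives the same blend.
    return num / den if den else 0.5


def blend(pred_a, pred_b, w):
    """Apply a fitted weight to a pair of prediction lists."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


w = fit_blend_weight([2.0, 4.0], [4.0, 2.0], [3.0, 3.0])
print(w, blend([2.0, 4.0], [4.0, 2.0], w))  # 0.5 [3.0, 3.0]
```

A weight fitted on the same ratings used to report a score flatters that score slightly, so it is safer to fit on one held-out set and evaluate on another.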

SVD = Singular Value Decomposition
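On the Netflix data, "SVD" was commonly used to mean not a textbook decomposition (which needs a complete matrix, while most ratings here are missing) but a low-rank factorization fitted by gradient descent on the known ratings only, as popularized by Simon Funk's write-up. The sketch below assumes that meaning. It updates all k factors together rather than one feature at a time, omits the baseline and bias terms practical versions add, and its hyperparameters (k, lr, reg, epochs) are illustrative placeholders, not values from this post.

```python
import random


def train_factors(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) approximates r.

    ratings: list of (user_index, item_index, rating) for the known entries only.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.3) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.3) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Stochastic gradient step on squared error plus L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating, clamped to the 1-5 star scale."""
    return min(5.0, max(1.0, sum(p * q for p, q in zip(P[u], Q[i]))))
```

Because only observed (user, item, rating) triples enter the loop, the missing entries never have to be filled in, which is what makes this workable on a matrix that is mostly empty.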

kNN = k Nearest Neighbors
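A minimal item-based kNN sketch, assuming the neighbors are other movies the user has already rated and the prediction is a similarity-weighted average of the user's ratings on the k most similar ones. Plain cosine similarity over common raters is used for brevity (mean-centered similarities often work better on rating data); the dict layout, function names, default k, and fallback rule are all hypothetical. Because the similarity is a separate function, it could be replaced by one computed from other inputs.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two movies over the users who rated both.

    Each argument maps user -> rating (1-5) for one movie.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dot = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = math.sqrt(sum(ratings_a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings_b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def knn_predict(by_movie, user, movie, k=20):
    """Similarity-weighted average of the user's ratings on the k most similar movies.

    by_movie maps movie -> {user: rating}.
    """
    target = by_movie.get(movie, {})
    neighbors = []
    for other, raters in by_movie.items():
        if other != movie and user in raters:
            sim = cosine_similarity(target, raters)
            if sim > 0:
                neighbors.append((sim, raters[user]))
    neighbors.sort(reverse=True)  # most similar first
    top = neighbors[:k]
    if top:
        return sum(s * r for s, r in top) / sum(s for s, _ in top)
    # No usable neighbors: fall back to the movie's mean rating, or to the
    # midpoint of the 1-5 scale if nobody has rated it (an arbitrary choice).
    known = list(target.values())
    return sum(known) / len(known) if known else 3.0
```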

The scores above were obtained by testing against the probe dataset. The scores Netflix reports for my submissions tend to be about 0.004 better (lower).