Netflix Prize
Since late October of last year, much of my free time has been spent pursuing the Netflix Prize. Netflix is offering $1 million to anyone who can take the dataset they provide (about 100 million ratings from roughly 480 thousand customers covering nearly 18 thousand movies) and make predictions for about 2.8 million unknown ratings (from those same customers and movies) that are 10% better than Netflix's own predictions, as measured by the root mean squared error (RMSE).
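For anyone who hasn't met RMSE before, here is a minimal Python sketch of how the error and the relative improvement over a baseline could be computed. The function names and the toy ratings are made up for illustration, and this is only my reading of the metric, not Netflix's actual scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def improvement(baseline_rmse, candidate_rmse):
    """Fractional improvement of a candidate over a baseline (0.10 means 10%)."""
    return (baseline_rmse - candidate_rmse) / baseline_rmse


# Toy data (hypothetical): four true ratings and two sets of predictions.
actual = [4, 3, 5, 1]
baseline = [3.5, 3.5, 3.5, 3.5]   # always predict the same value
candidate = [4.0, 3.0, 4.0, 2.0]  # a somewhat better guess

baseline_score = rmse(baseline, actual)
candidate_score = rmse(candidate, actual)
print(f"baseline RMSE:  {baseline_score:.4f}")
print(f"candidate RMSE: {candidate_score:.4f}")
print(f"improvement:    {improvement(baseline_score, candidate_score):.1%}")
```

Lower RMSE is better, so winning the prize means driving the candidate's RMSE at least 10% below the baseline's.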
Working on this challenge has been fun, frustrating, and a great learning experience. I worked on my early ideas in LISP before switching to C++ because I couldn't optimize my LISP code for performance. I exchanged the pain of slow execution for the pain of slow development and of having to compile and reload all of the data into memory for each new test. I may eventually research LISP optimization (maybe using type declarations for all variables?) and write a language performance comparison page. I bet that would be a hit on Programming.Reddit.
If you came here hoping to read about a new idea that can get you higher on the leaderboard, I'm afraid my best scores come from using ideas already discussed by others. I don't want you to go away empty-handed, so I compiled a list below of the web pages that helped me get my score.
Recommended Pages
- Simon Funk: Try This at Home
- Timely Development: Netflix Prize Results and Source Code
- Billy McCafferty: Using the Pearson Correlation Coefficient
- David Vogel / Ognian Asparouhov: Netflix Solution
- Anil Thomas: Netflix Contest
