Wednesday, June 24, 2009

If I have to say it, you wouldn't understand

This blog is degenerating a bit into my personal tastes so I thought I would amuse myself with yet another pat on the back. I was surfing the internet for ideas on my math helper program. One thing I would like to do, is be able to predict which problems a person will have trouble with. Let's say a student has trouble adding 6+7, then it logically follows they would have trouble following 16+7. One possible solution is to try to encode aspects of the problem, such as the operands and operator. We would then put the problems into a database and try to find links between them. This seems exceptionally difficult. I then turned to the web and asked, "how does NetFlix do this type of thing?" Shortening the story, they have this public prize for algorithms to predict movie ratings of people. One of the algorithms used is called Singular Value Decomposition and the basic idea is to discover the encodings I mentioned above from the results of the data. This led me to an article in the New York Times where the aforementioned pat on the back comes in.

"Interestingly, the Netflix Prize competitors do not know anything about the demographics of the customers whose taste they’re trying to predict. The teams sometimes argue on the discussion board about whether their predictions would be better if they knew that customer No. 465 is, for example, a 23-year-old woman in Arizona. Yet most of the leading teams say that personal information is not very useful, because it’s too crude. As one team pointed out to me, the fact that I’m a 40-year-old West Village resident is not very predictive. There’s little reason to think the other 40-year-old men on my block enjoy the same movies as I do. In contrast, the Netflix data are much more rich in meaning. When I tell Netflix that I think Woody Allen’s black comedy “Match Point” deserves three stars but the Joss Whedon sci-fi film “Serenity” is a five-star masterpiece, this reveals quite a lot about my taste. Indeed, Reed Hastings told me that even though Net­flix has a good deal of demographic information about its users, the company does not currently use it much to generate movie recommendations; merely knowing who people are, paradoxically, isn’t very predictive of their movie tastes." (emphasis mine)


Phillip said...

Oddly enough though, the current system for helping identify possible income tax fraud uses that sort of information. Marriage, age, income, location -- all of these things when discretized into bins can create fairly good predictions. The details of the system are unknown, however the information it uses has been established for some time now.

You may want to take a look at something called "Information Gain" as it is a fundamental block of rule based learning/prediction systems that could help you build your system -- it tells you which categories of information help you make more accurate predictions on a meaningful scale for your data.

Post a Comment