The black box that is the Netflix similarity score

Note: I'm no statistics major, so if I'm completely missing the boat here, I hope some of you stats geeks will correct me.

Netflix's Friends page changed sometime in the past few days, perhaps over the weekend. I noticed it yesterday. The most curious new feature is that all of my friends are given a % similarity score relative to me. For example, under Robert's name, I see: 86% similarity to you.

My inclination was at once to believe that Robert had pretty decent taste, but perusing the similarity scores of my friends, I found some of them to be somewhat odd. Of all my friends, Eleanor ranked lowest in similarity to me, at 54%. I may not be a fan of Grey's Anatomy, but anecdotally, that seemed low to me.

I searched the site to see if there was an explanation of how this similarity score was calculated, but I couldn't find anything, not even an explanation of how to interpret the score. If the score is 54%, does that mean that if we both watched a movie, there's a 54% chance we'd both rate the movie exactly the same? Or does that mean that 46% of the time, one of us would like the movie and the other person would dislike the movie? Or something else entirely?

If you click on the similarity score, the site displays a list of all movies you've seen in common with that friend and how you each rated the movie. Thankfully, the overlapping data between Eleanor and me was only 38 movies, so I put our ratings into a spreadsheet. Of those 38, Eleanor hadn't rated 8 of the movies yet, so I dumped those out of the data and looked at the remaining 30.

Of those, we had the exact same rating for 19 of the movies (Netflix allows you to rate a movie on a 5 point scale, from 1 through 5 stars). So of the 30 movies we'd both rated, we had the same rating for 63.3% of them. Of all 38 movies we'd seen in common, including those Eleanor had not yet rated, we had the exact same score for 50% of them.
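For anyone who wants to check my arithmetic, here's a minimal sketch in Python. The counts come straight from my spreadsheet; the variable names are just ones I made up for illustration.

```python
# Counts from the post; the variable names are hypothetical.
movies_in_common = 38      # movies Eleanor and I have both seen
unrated_by_eleanor = 8     # movies in that list Eleanor hasn't rated yet
exact_matches = 19         # movies where our star ratings are identical

rated_by_both = movies_in_common - unrated_by_eleanor

# Share of exact matches among the movies we both rated.
pct_of_rated = 100 * exact_matches / rated_by_both
# Share of exact matches among all movies we've seen in common.
pct_of_common = 100 * exact_matches / movies_in_common

print(f"{pct_of_rated:.1f}% of the {rated_by_both} movies rated by both")
print(f"{pct_of_common:.1f}% of the {movies_in_common} movies in common")
```

Neither figure lands on 54%, which is the whole puzzle.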

Of the 11 movies we differed on, Eleanor gave 1 additional star on 8 of them, while I gave 1 additional star on 2 movies and 2 additional stars on 1 movie. At any rate, that information didn't help me to understand the 54% similarity score. On the 30 movies we'd both rated, Eleanor's mean rating was 3.53 stars, mine was 3.40 stars, and the mean of the difference between our ratings was 0.13 stars.
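The breakdown of differences is enough to reconstruct the mean difference without the raw ratings. A rough sketch, assuming each difference is Eleanor's stars minus mine; the mean absolute difference at the end is my own addition, not something Netflix shows.

```python
# Map of (Eleanor's stars - my stars) -> number of movies, from the post.
diff_counts = {
    0: 19,   # identical ratings
    1: 8,    # Eleanor gave 1 more star
    -1: 2,   # I gave 1 more star
    -2: 1,   # I gave 2 more stars
}

n = sum(diff_counts.values())

# Signed mean: positive means Eleanor rates higher on average.
mean_diff = sum(d * c for d, c in diff_counts.items()) / n
# Absolute mean: how far apart we are, ignoring who rated higher.
mean_abs_diff = sum(abs(d) * c for d, c in diff_counts.items()) / n

print(f"movies rated by both: {n}")
print(f"mean difference: {mean_diff:.2f} stars")
print(f"mean absolute difference: {mean_abs_diff:.2f} stars")
```

The signed mean agrees with the gap between our two average ratings (3.53 versus 3.40), which is a decent sanity check on the spreadsheet.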

Netflix assigns a textual description to each of its 5 star rankings:

  • 1 star equals "You hated it"

  • 2 stars equals "You didn't like it"

  • 3 stars equals "You liked it"

  • 4 stars equals "You really liked it"

  • 5 stars equals "You loved it"

By that system, a rating of 1 or 2 stars is a negative review, and a rating of 3 stars or more is a positive review. If Eleanor and I differed on our ratings but both assigned a movie a negative review, or both a positive one, then in my mind our ratings were not as different as if one of us had assigned the movie a negative review while the other assigned it a positive review.

Of the 11 movies we differed on, in only 3 cases did one of us assign a positive review when the other assigned a negative review. So of the 30 movies we'd both rated, we had both given the movie a thumbs up or both given it a thumbs down on 27 of them, or 90%. This rendered the 54% similarity score even more peculiar to me.
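The thumbs up/thumbs down comparison is easy to sketch in code. This assumes my own reading of the star labels (3 stars and up counts as positive); the function names and the sample ratings are made up, since I'm not reproducing my actual spreadsheet here.

```python
def is_positive(stars: int) -> bool:
    """Treat 3+ stars as a thumbs up, per my reading of Netflix's labels."""
    return stars >= 3


def sentiment_agreement(mine: list[int], theirs: list[int]) -> float:
    """Fraction of co-rated movies where both ratings fall on the same side."""
    pairs = list(zip(mine, theirs))
    agree = sum(is_positive(a) == is_positive(b) for a, b in pairs)
    return agree / len(pairs)


# Hypothetical ratings for five movies we both rated (not my real data).
mine = [4, 2, 3, 5, 1]
theirs = [5, 3, 3, 4, 2]
print(f"sample agreement: {sentiment_agreement(mine, theirs):.0%}")

# The real counts from the post: 3 thumbs-up/thumbs-down splits out of 30.
print(f"Eleanor and me: {(30 - 3) / 30:.0%}")
```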

I looked up some collaborative filtering papers online, and it seemed that the Pearson linear correlation coefficient and cosine similarity were two popular methods for calculating user or item similarity. I couldn't do cosine similarity in Excel (at least not easily), but Excel did offer a formula for calculating the Pearson coefficient of two arrays, so I calculated that for Eleanor's ratings and mine. Our Pearson coefficient was .564 (correlation coefficients range from -1 to 1). Close, but it didn't match up to the 54% similarity score.
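For the curious, both measures are only a few lines outside of Excel. Here's a minimal sketch using the textbook definitions; the function names and the sample ratings are hypothetical, and I have no idea whether Netflix uses either measure.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length rating lists (range -1 to 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cosine(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical ratings for five co-rated movies (not my real data).
mine = [4, 2, 3, 5, 1]
theirs = [5, 3, 3, 4, 2]
print(f"Pearson: {pearson(mine, theirs):.3f}")
print(f"cosine:  {cosine(mine, theirs):.3f}")
```

One thing to keep in mind: because star ratings are all positive numbers, cosine similarity on raw ratings tends to sit close to 1 even when two people disagree a fair amount. Pearson subtracts each person's average first, so it is less forgiving.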

I decided to look at relative similarity scores to see if they meant more. Audrey had a 75% similarity score to me according to Netflix, so by any number of measures, she and I should be more similar in our movie tastes than Eleanor and I are. But a quick look at the facts didn't support that.

Of the 103 movies Audrey and I both rated, we had the same rating on 38 of them, or 36.9%. Audrey's average rating was 3.75, while mine was 3.36, and the average of the difference of our ratings was .39. Our Pearson coefficient was .454, lower than the .564 between Eleanor and me.

I don't expect Netflix to reveal its methodology for calculating similarity scores. Most companies are protective of their personalization algorithms. Even if I knew how Netflix calculated its similarity scores, I'm not sure the number would be much more than a minor curiosity. Knowing that some people are similar to me in their film ratings would help me on a movie site only if those people's ratings were used to predict which other movies I'd rate highly, and Netflix probably already does that. Still, if Netflix explained how the figure was calculated, or even how to interpret it, the score might be more meaningful.

Having used the personalization features of lots of sites, I find the most useful personalization feature to be item similarities, e.g. Amazon's "Customers who bought this item also bought" feature. Attempts to use similar people to predict my tastes have always yielded mediocre results. I haven't encountered any sites that have really cracked that nut, and that's not surprising. There's no accounting for taste, especially the taste of creatures as complex as human beings.

Still, if someone out there can explain the similarity scores, drop me an e-mail (commenting doesn't work right now; my e-mail address is on my homepage). I'm curious.

UPDATE: Eleanor wrote to tell me that I show up as 85% similar to her on her Friends page, even though she's only 54% similar to me on my Friends page. Audrey says I show up as 80% similar to her on her end, or 5 points higher than the 75% she shows up as on my end. Since the score isn't the same in both directions, I'm guessing that even movies we haven't rated must factor into the similarity equation.
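A measure like the Pearson coefficient comes out the same no matter which of the two people you start from, so a lopsided score suggests something in the formula depends on who is looking. Here's one purely hypothetical way that could happen: divide the number of matching ratings by the number of movies the viewer has rated. This is a guess to illustrate the idea, not Netflix's formula, and the names and data are invented.

```python
def directional_similarity(viewer: dict[str, int],
                           friend: dict[str, int]) -> float:
    """Fraction of the viewer's rated movies that the friend rated identically.

    Movies the friend hasn't rated count against the score, so the result
    depends on whose Friends page you are looking at.
    """
    if not viewer:
        return 0.0
    matches = sum(1 for movie, stars in viewer.items()
                  if friend.get(movie) == stars)
    return matches / len(viewer)


# Hypothetical data: I've rated four movies, my friend only two of them.
me = {"Movie A": 4, "Movie B": 3, "Movie C": 5, "Movie D": 2}
friend = {"Movie A": 4, "Movie B": 3}

print(f"friend as seen from my page: {directional_similarity(me, friend):.0%}")
print(f"me as seen from their page:  {directional_similarity(friend, me):.0%}")
```

In this toy example the person who has rated more movies sees a lower score for the friend than the friend sees for them, even though the overlapping ratings are identical.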