Sometimes I get a bit narcissistic and check how my book is doing. I log on to the seller portal to see how many copies have been sold. I go to the Amazon page and see which other books people who have bought my book are buying (on the US store it’s Ray Dalio’s Principles, as of now. On the UK and India stores, Sidin’s Bombay Fever is the beer to my book’s diapers).
And then I check if there are new reviews of my book. When friends write them, they notify me, so those are easy to track. What I look out for when I visit my Amazon page are reviews written by people I don’t know. And so far, most of them have been good.
So today was one of those narcissistic days, and I was initially a bit disappointed to see a new four-star review. I started wondering what this person found wrong with my book. And then I read through the review and found it to be wholly positive.
A quick conversation with the wife followed, and she pointed out that this reviewer perhaps reserves five stars for the exceptional. And then my mind went back to this topic that I’d blogged about way back in 2015 – about rating systems.
The “4.8” score that Amazon shows as the average of all the ratings on my book so far is a rather crude measure – since one reviewer’s 4* rating might mean something quite different from another reviewer’s.
For example, my “default rating” for a book might be 5/5, with 4/5 reserved for books I don’t like and 3/5 for atrocious books. On the other hand, you might use the “full scale” and use 3/5 as your average rating, giving 4 for books you really like and very rarely giving a 5.
By simply taking an arithmetic average of ratings, it is possible to overstate the quality of a product that has, for whatever reason, been rated mostly by people with high default ratings (such a correlation is plausible). Similarly, a low average rating for a product might mask the fact that it was rated mostly by people who inherently give low ratings.
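To make that concrete, here is a toy illustration in Python (all numbers made up, not taken from any real store): two books whose raters felt roughly the same way about them, but whose raw averages tell very different stories.

```python
# Toy illustration (made-up numbers): the same underlying sentiment,
# filtered through raters with very different default ratings.

# Book A happens to attract "generous" raters whose default is 5,
# and who drop to 4 only for books they dislike.
book_a_ratings = [5, 5, 4, 5, 5]

# Book B happens to attract "stingy" raters whose default is 3,
# and who go up to 4 only for books they really like.
book_b_ratings = [3, 4, 3, 4, 3]

avg_a = sum(book_a_ratings) / len(book_a_ratings)
avg_b = sum(book_b_ratings) / len(book_b_ratings)

print(avg_a, avg_b)  # 4.8 vs 3.4 - yet B's raters arguably liked their book more
```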
As I argue in the penultimate chapter of my book (or maybe the chapter before that – it’s been a while since I finished it), one way that platforms foster transactions is by increasing information flow between the buyer and the seller (this is one thing I’ve gotten good at – plugging my book’s name in random sentences), and one way to do this is by sharing reviews and ratings.
From this perspective, for a platform’s judgment on a product or seller (usually it’s the seller, but on platforms such as Airbnb, information about buyers also matters) to be credible, it is important that ratings be aggregated in the right manner.
One way to do this is to use some kind of Z-score (relative to the other ratings that the rater has given) and then come up with a normalised rating. But then this needs to be readjusted for the quality of the other items that this rater has rated. So you can think of some kind of Singular Value Decomposition performed on the ratings matrix to find out the “true value” of a product (ok this is an achievement – using a linear algebra reference given how badly I suck at the topic).
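For what it’s worth, here is a rough sketch of what that could look like (toy data, and a far cruder handling of missing ratings than any real platform would use – missing entries are simply treated as “no information”):

```python
import numpy as np

# Toy ratings matrix (made-up numbers): rows = raters, columns = products.
# np.nan marks products a rater has not rated.
R = np.array([
    [5.0,    5.0, 4.0, np.nan],
    [3.0,    4.0, np.nan, 3.0],
    [np.nan, 5.0, 4.0,    5.0],
])

# Step 1: per-rater z-scores, so a generous rater's 4 and a stingy rater's 4
# are no longer treated as the same signal.
row_mean = np.nanmean(R, axis=1, keepdims=True)
row_std = np.nanstd(R, axis=1, keepdims=True)
row_std[row_std == 0] = 1.0          # avoid dividing by zero for one-note raters
Z = (R - row_mean) / row_std

# Step 2: a crude low-rank (SVD-style) fit on the normalised matrix to pull out
# a single "quality" factor per product. Missing entries are filled with 0
# purely for illustration.
Z_filled = np.nan_to_num(Z, nan=0.0)
U, s, Vt = np.linalg.svd(Z_filled, full_matrices=False)
product_quality = s[0] * Vt[0]       # first right singular vector, scaled
# (the sign of this factor is arbitrary; only the relative ordering matters)

print(np.round(product_quality, 2))
```

The z-score step deals with the “different default ratings” problem; the SVD step is the (very hand-wavy) readjustment for which items each rater happened to rate.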
I mean – it need not be THAT complicated, but the basic point is that it is important that platforms aggregate ratings in the right manner in order to convey accurate information about counterparties.
Nice post. Calibrating across different users/items is a major challenge in recommender system design.
Reminded me of a talk that I attended several years ago, where the speaker put a blue dot on the screen, and asked the audience to rate “how blue” the dot was from 1 to 10. Of course, different people came up with different numerical answers.
Next, he put up two blue dots, and asked us “which dot is bluer”. This time, all of us agreed that the left one was bluer than the right one.
The message is that cardinal ratings are not particularly useful; on the other hand, ordinal (comparative) ratings are surprisingly robust. This is why Netflix has recently moved away from the 5 star system to a more robust like/dislike model.
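A minimal sketch of what aggregating such ordinal signals could look like (comparisons made up for illustration; real systems typically fit something like a Bradley-Terry or Elo-style model, but the idea is the same – rank items from comparisons rather than averaging absolute scores):

```python
from collections import defaultdict

# Toy pairwise comparisons (made up): each tuple means "the first item was
# preferred over the second". This is the ordinal signal described above.
comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]

# Simplest possible aggregation: win rate per item.
wins, games = defaultdict(int), defaultdict(int)
for winner, loser in comparisons:
    wins[winner] += 1
    games[winner] += 1
    games[loser] += 1

ranking = sorted(games, key=lambda item: wins[item] / games[item], reverse=True)
print(ranking)  # ['A', 'C', 'B'] - an ordering, not a set of star ratings
```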