Rating System Editorials - Bayesian Method

What is the Bayesian method?

In short, the Bayesian method takes into account both the average score a module has received, and the total number of votes counted.  Specifically, the more votes a module receives, the more "reliable" we may consider that average score to be (since more people agree on that score).

This doesn't mean that more votes always translate into a higher placement, however: if two mods have the same low score, the mod with more votes will actually be placed lower in the rankings.
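This counterintuitive effect is easy to check numerically. Here is a minimal Python sketch of the weighted-rank formula given in the technical details at the end of this article, using the article's own values of m = 10 and C = 7.9684:

```python
# Bayesian weighted rank: WR = (v/(v+m))*R + (m/(v+m))*C
def weighted_rank(R, v, m=10, C=7.9684):
    """R = mod's mean score, v = its vote count,
    m = minimum votes to be listed, C = site-wide mean vote."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Two mods with the same low average score of 5.0:
few_votes  = weighted_rank(5.0, v=10)   # pulled strongly toward C
many_votes = weighted_rank(5.0, v=100)  # the low score is more "certain"

print(round(few_votes, 4))   # 6.4842
print(round(many_votes, 4))  # 5.2699
# The mod with MORE votes ranks LOWER, as described above.
```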
The technical details behind the Bayesian method are at the end of this article, but first let me address the pros and cons of the method.

Pros of the Bayesian method
TRUE REPRESENTATION. Rankings accurately reflect the true popularity of the mods. The more votes a mod receives, the more "reliable" its average score is considered to be. This doesn't necessarily mean that more votes earn a higher place in the rankings, though: a mod with a large number of low-ranking votes will actually sit further down the list than a mod with the same average score but fewer votes. The "simple mean" method of calculation is problematic because a mod with 500 votes and a score of 9.0 will be ranked BELOW a mod with 10 votes and a score of 9.1 -- when clearly, the mod with 500 votes has received the lion's share of the voters' approval.

SELF CALIBRATING. The Bayesian method is "self calibrating". Other systems, such as a "trimmed mean", have been proposed: these drop the top and bottom x% of votes to account for fanboy/hater votes, which unfairly skew the rankings. The problem is choosing what percentage of top/bottom votes to drop, and the system would probably require frequent tuning to keep the rankings sane. The Bayesian method is largely unaffected by outlying votes because of how it weights the rankings; in this sense it is self-calibrating, which means less work for site maintainers.

The combination of these two benefits results in the most accurate and hands-free system available. While a better system could probably be created with the help of constant human attention and fine-tuning, that is almost certainly more work than it's worth, and even then the benefits are questionable.

Cons of the Bayesian method

FAVORS OLDER MODS. New mods face an uphill battle to appear high in the listings. This is because, between two mods with equally high scores, the one with more votes wins out in the Bayesian system. If the goal of the top mods list is to show the true popularity of all mods throughout history, then this is a decided benefit of the system. But if the goal is to promote new mods, those new mods will have a tough time getting exposure.

COMPLEXITY. The Bayesian calculation isn't as easy to understand as a simple mean or median system. On the other hand, it isn't terribly difficult to grasp for most people, at least conceptually. In plain English, under the Bayesian system:
* The more votes a mod has, the closer its weighted rank will be to its "simple" average rank.
* The fewer votes a mod has, the more it will tend towards being placed in the middle of the list.

These two facts together are what drive the Bayesian method. As a mod accumulates more votes, its "true" rank emerges and the Bayesian method reflects this. With fewer votes, a mod's average rank is considered somewhat "uncertain", and therefore drifts towards the overall average ranking (middle of the list).
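These two tendencies can be demonstrated by holding a mod's average score fixed and varying its vote count. A small Python sketch, using the formula and values (m = 10, C = 7.9684) from the technical details section of this article:

```python
def weighted_rank(R, v, m=10, C=7.9684):
    # WR = (v/(v+m))*R + (m/(v+m))*C
    return (v / (v + m)) * R + (m / (v + m)) * C

R = 9.7  # a high "simple" average score
for v in (1, 10, 100, 1000):
    print(v, round(weighted_rank(R, v), 4))
# With 1 vote the weighted rank sits near C (the middle of the list);
# as votes accumulate it converges on the mod's simple average of 9.7.
```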

A proposal

The problem is this: there are two conflicting goals behind the top mods list.
1. Show the most popular mods over all of time (true rank).
2. Promote new mods by making them appear high in the rankings.

If the only goal was to accurately show the most popular mods of all time, then the Bayesian method is the obvious choice. However, getting exposure for new mods is also very important, and in fact that early exposure helps a new mod get enough votes to place it accurately in the listings using the Bayesian ranking.

Therefore I propose making two lists: "TOP MODS OF ALL TIME" and "HOT NEW MODS".

The 'Top Mods of All Time' list would be built on the Bayesian method, and its explicit purpose would be to show the relative popularity of all mods over time.

The 'Hot New Mods' list would work just like the 'Top Mods of All Time' list (both would use Bayesian ranking), but only mods released within the last 6 months would appear on it.

Splitting the list into two would serve both purposes: to promote new mods AND to show the true representative popularity of each mod over time.

In this way, mod authors who have received their due ranking in the top-mods list will continue to enjoy their true ranking, without being constantly displaced by new mods which have only a handful of votes. This is the primary problem with using the "simple mean" calculation method.

And new mods would also enjoy increased exposure, because visitors could look at the 'Hot New Mods' list specifically to find what's both new and popular. This would give even better exposure to new mods than the current rankings do, because only new mods would show in the list.

Under the current single-list setup, only new mods which are fortunate enough to get several very high-ranking votes early on will get that "top 10" exposure in the top mods list. This also encourages artificial inflation of a mod's votes to achieve top placement in the list. By excluding older mods from the 'Hot New Mods' list, new mods would get a much stronger focus from the community in their own list.

It has also been observed that older mods tend to have high rankings because of excellent story or design work, while newer mods tend to be more technically 'flashy', since they are in a better position to use the latest Bioware-created functionality and community content. By splitting the top mods list in two, users will be better able to find mods which excel in either of these two different areas, depending on their interest.

No ranking system is perfect. Whichever method is used, there will be some people or mods who are treated unfairly by the ranking mechanism. The goal here should simply be to choose the best available method and get the best results we can.

The Bayesian method appears to be the best because the ranking best reflects the "will of the voters". And in order to guarantee exposure for new mods, creating a dedicated list for new mods only should help them get the maximum exposure possible, and from the most appropriate audience.

The technical details

The following is adapted from an earlier post found elsewhere on the Vault.

Some sites use this formula to calculate their rankings; IMDb, for instance, uses it to determine its Top 250 movies rankings.

Here is a simple explanation of the formula, adapted to the NWVault top mods list:

weighted rank (WR) = (v / (v+m)) * R + (m / (v+m)) * C

R = average for the module (mean) = (Rating)
v = number of votes for the module = (votes)
m = minimum votes required to be listed in the top modules list (currently 10 on nwvault)
C = the mean vote across the whole report
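The formula translates directly into Python. The sketch below uses the NWVault values described above (m = 10, C = 7.9684) and reproduces the two hand calculations worked through next:

```python
def weighted_rank(R, v, m=10, C=7.9684):
    """Bayesian weighted rank for the NWVault top modules list.

    R -- the module's mean rating
    v -- number of votes the module has received
    m -- minimum votes required to be listed (10 on nwvault)
    C -- mean vote across the whole report
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

print(round(weighted_rank(9.76, 10), 4))   # Subterra:  8.8642
print(round(weighted_rank(9.70, 250), 4))  # Evermeet:  9.6334
```

(The exact calculation gives 9.6334 for Evermeet; the hand calculation below arrives at 9.6333 because it rounds the weights to four decimal places first.)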

So, let's take as an example the top modules list as it stands (433 entries, from 11/21/2003). Note these figures are just EXAMPLES; since my calculation of the overall mean (C) is an average of rounded averages, it is bound to be somewhat inaccurate.

Ok, so the mean vote (C) across the top 433 mods is 7.9684, and the min votes required to appear in this top-list is 10.

To calculate the weighted rank of the current #1 mod, 'Subterra: The Last Manaquake', the formula looks like this:

WR = (10 / (10 + 10)) * 9.76 + (10 / (10 + 10)) * 7.9684
WR = (0.5 * 9.76) + (0.5 * 7.9684)
WR = 8.8642

Now remember, Subterra has gotten only 10 total votes, yet is currently listed #1 in the rankings. Is that really reflecting the "voters' favorite", when the 3rd place mod, with a mean of 9.7, has 250 votes -- 25x as many?

So let's calculate the Bayesian WR for the 3rd place mod, 'Evermeet 1.06':

WR = (250 / (250 + 10)) * 9.70 + (10 / (250 + 10)) * 7.9684
WR = (0.9615 * 9.70) + (0.0385 * 7.9684)
WR = 9.6333

As you can see, with 25x as many total votes, Evermeet's weighted rank comes out to 9.6333 -- more accurately reflecting its actual (higher) popularity relative to Subterra.

Note that under this formula, simply having more votes does NOT mean that you get a higher rank. The average of all those votes also has to be high in order to get the high ranking. This formula simply reflects the fact that the more votes a module has, the more accurate the average score is for that module.

Under this system, Evermeet comes out as the #1 mod, because a lot of people feel that it deserves a 9.7 ranking (on average). Subterra comes in at 43rd place, which is quite respectable, and accurately reflects its true popularity. To see how the current top-list looks under the Bayesian rankings, download the Excel sheet linked to above, and do sort-descending on the Weighted Rank (WR) column.

I am sure I'm not the only one who finds it annoying to visit a product-review site, choose "sort by user ranking", and see a 5-star item with just one person's vote listed above a product with 4.5 stars from 180 voters. Under the Bayesian system, this would be correctly represented.