

I want to take a moment to identify some nuances in the data that affect my analysis. For example, some games have inflated numbers due to bundle deals. One such game is Wii Sports, which is the highest-selling game in the list. What the data does not reflect is that it was packaged with the Wii console outside of Japan, which means even people who would not have bought the game by itself ended up "buying" it if they wanted a Wii.


On the other hand, some games' numbers appear lower than they should because each row in the data covers a single platform. Many games were released on multiple platforms, and each version has its own row, but the sales numbers should realistically be combined to reflect total sales. One example is Call of Duty: Black Ops II. The PS3 version is at rank 35 on the list by overall sales, while the Xbox 360 version is right behind it at rank 36. Their combined sales across all regions come to 27.78 million units, which would place the game at rank 11. For lower-selling games, I don't think the difference would be staggering, but at the top of the list these discrepancies likely led to some inaccuracies in any of my analyses that looked at a particular subset of games, e.g. the top 25 or top 100.
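To make the idea of combining per-platform rows concrete, here is a minimal sketch in plain Python. The titles, platforms, and sales figures are hypothetical placeholders, not values from the dataset, and the function names are my own for illustration. It sums sales per title across platforms and then ranks the titles by the combined total.

```python
from collections import defaultdict

# Hypothetical rows: (title, platform, global sales in millions of units).
# These are placeholder values, not figures from the actual dataset.
rows = [
    ("Game A", "Wii", 30.0),
    ("Game C", "DS", 20.0),
    ("Game B", "PS3", 14.0),
    ("Game B", "X360", 13.5),
]

def combine_platform_sales(rows):
    """Sum sales for each title across all of its platform rows."""
    totals = defaultdict(float)
    for title, _platform, sales in rows:
        totals[title] += sales
    return dict(totals)

def rank_titles(totals):
    """Return {title: rank}, where rank 1 is the highest combined total."""
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {title: rank for rank, (title, _total) in enumerate(ordered, start=1)}

combined = combine_platform_sales(rows)
ranks = rank_titles(combined)

for title, rank in sorted(ranks.items(), key=lambda item: item[1]):
    print(rank, title, combined[title])
```

In this toy data, each "Game B" row on its own sits below "Game C", but the combined total moves the title above it, which is the same kind of shift described for Black Ops II.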


These duplicate listings could have affected the analysis of games by genre, because additional entries may have been counted when they should not have been. This is partly why I thought the analysis by platform was important: it isolates these cases. That analysis showed similar trends, so I don't think the duplicates were too problematic, but I am aware of the possibility.

Outliers and Error: Text


Another thing to consider is that many publishers are not fully transparent with their sales figures. This could result in imperfect or inexact numbers in the data, or in the exclusion of many games that have sold well. I would conclude that this data is not fully representative of the gaming industry. However, I still believe everything I have done here is valid as an analysis of a data subset, which is why I framed everything through the lens of the data and hypothesized some potential plans of action for a game developer based on the data I was referencing. I know the data is not perfect, but in order to have a more cohesive analysis I chose to assume that it was. Developers seeking real action would need much more comprehensive information to draw meaningful conclusions.
