16. Restless Bandits and Competition
Many economists seem to believe that every problem in economics can be solved by removing regulation and “letting the markets decide”. Other people disagree and will use a variety of hand-waving arguments to explain why that’s sub-optimal. This chapter will consider a certain well-analysed, statistical conundrum that has a striking parallel to a problem in economics, and its study could remove the need for some of these hand-waving arguments and replace them with something closer to mathematical proof. The conundrum is a variation of the “multi-armed bandit problem” – but before we explain what it is, or its solution, we’d better explain the problem in economics that it so neatly parallels.
The problem is: Who should make new Product X? Communists might say, “Let’s have an expert government committee choose a single company (Y) and only allow that company to make X”. The free marketeers would say, “Let multiple companies (A, B, C, D and E) make X and the market will decide which is the best and let the others go bust – and for goodness sake, don’t let the government interfere with this process!”
Now we will introduce the statistical conundrum:
The casino owner and the screwdriver
Imagine you have a collection of one-armed bandits (slot machines) in a casino. Each one has a certain payout rate: the percentage of the money paid into it that it will pay out in the long run. In real casinos this is often set at something like 80–90%, but imagine that this particular model of one-armed bandit can be set to any predefined payout rate (0% to 100%) by turning a dial inside the machine that the casino owner can set with a screwdriver. Now let’s say that one night the casino owner comes in and sets each machine to a different payout rate. You arrive the following morning with a big bag of coins. You are determined to spend the whole day playing on these bandits. You have complete freedom to choose which ones you play on; you are allowed to switch from one to another at will. Now the question is: What is your strategy for selecting bandits such that you come home with the greatest winnings? Or, more likely, the smallest losses.
One possible solution: Put 20 coins in each of the bandits in order to make an estimate of their true payout rates. Then stick to the one that appears to have the highest rate for the rest of the day.
At this point it is essential to be quite clear about the difference between the estimated payout rate and the true payout rate.
The true payout rate is the rate set (with the screwdriver) by the casino owner. This is the theoretical rate given an infinite number of plays. The estimated payout rate is your estimate of the true payout rate based on the evidence of your trials so far. If you have had very few trials then your estimate can only be a rough guess. Your estimate will gradually become more accurate the more trials you have, but in practice you can never become completely certain.
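The difference between the two rates can be made concrete with a small simulation. The sketch below assumes a deliberately simplified machine that is not part of the original text: each coin either comes back (with probability equal to the true payout rate) or is lost. The function names are hypothetical, chosen only for this illustration.

```python
import math
import random

def play(true_rate, rng):
    """Insert one coin: get 1 coin back with probability true_rate, else 0.
    (Assumed payout model; real machines pay out varied prizes.)"""
    return 1 if rng.random() < true_rate else 0

def estimate_rate(true_rate, n_plays, rng):
    """Estimated payout rate: coins won divided by coins played."""
    winnings = sum(play(true_rate, rng) for _ in range(n_plays))
    return winnings / n_plays

def standard_error(true_rate, n_plays):
    """Typical size of the estimation error under the assumed model."""
    return math.sqrt(true_rate * (1 - true_rate) / n_plays)

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (20, 200, 20000):
    print(n, estimate_rate(0.85, n, rng), round(standard_error(0.85, n), 3))
```

Under this assumed model the typical error after 20 coins on a machine set to 85% is about 8 percentage points, so two machines set to 85% and 80% would be hard to tell apart after so few plays.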
This “20 coin” solution is certainly better than simply selecting the bandits at random but can be mathematically proven to be sub-optimal, i.e. there are known strategies that will lead to greater winnings. The problem with the 20 coin strategy is that if two bandits paid out fairly good but very similar amounts, then it may not be clear which is better. It may be more profitable to continue playing these two for a greater number of trials, to gain more accuracy in your estimated payout rates, before selecting the best one. The problem illustrates what is known as the exploitation–exploration dilemma. The “exploration” refers to the effort spent finding out which bandit may be the best (e.g. the 20 coin trial at the start) and the “exploitation” refers to exploiting your current knowledge, i.e. simply repeatedly playing the bandit which you estimate is the best.
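The 20 coin strategy and one alternative can be sketched in a few lines. The alternative used here is UCB1, a well-known bandit strategy swapped in purely for illustration; the text above does not name a particular strategy. The sketch again assumes a simplified machine where each coin either comes back or is lost, and all function names are hypothetical.

```python
import math
import random

def play(true_rate, rng):
    """Assumed payout model: 1 coin back with probability true_rate, else 0."""
    return 1 if rng.random() < true_rate else 0

def twenty_coin(true_rates, budget, rng, trial=20):
    """Explore first (trial coins per bandit), then exploit the apparent best."""
    k = len(true_rates)
    wins = [0] * k
    total = 0
    for t in range(budget):
        exploring = t < trial * k
        if exploring:
            arm = t % k  # take the bandits in turn
        else:
            # Every bandit had the same number of trial plays, so the most
            # wins means the highest estimated rate. Never revised again.
            arm = max(range(k), key=lambda i: wins[i])
        reward = play(true_rates[arm], rng)
        total += reward
        if exploring:
            wins[arm] += reward
    return total

def ucb1(true_rates, budget, rng):
    """UCB1: play the bandit with the best estimate plus an uncertainty bonus."""
    k = len(true_rates)
    wins = [0] * k
    plays = [0] * k
    total = 0
    for t in range(budget):
        if t < k:
            arm = t  # play every bandit once to start
        else:
            arm = max(
                range(k),
                key=lambda i: wins[i] / plays[i]
                + math.sqrt(2 * math.log(t) / plays[i]),
            )
        reward = play(true_rates[arm], rng)
        wins[arm] += reward
        plays[arm] += 1
        total += reward
    return total

rng = random.Random(0)
rates = [0.80, 0.85, 0.90]  # hypothetical dial settings
print(twenty_coin(rates, 5000, rng), ucb1(rates, 5000, rng))
```

The bonus term in `ucb1` shrinks as a bandit is played more, so close contenders keep being tried until the evidence separates them. A single run proves little either way; a fair comparison would average the winnings over many seeds.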
This conundrum is rather analogous to the process of choosing companies to make products in a free market. The bandits are like the companies, the payouts are like the goods they make and the gambler is like the public, choosing the “company” that produces the best “goods”. At the start of the process the public does not know for sure who can make the best version of Product X so they may try a variety of them. After a while the different companies start to gain different reputations. The reputations are like the estimated payout rates.
The known bad companies will cease to be tried (= “go bust”) while the still-possibly-best will get tried some more.
Restless bandits
Now there is one more complication that needs to be added to the standard multi-armed bandit problem to make it even more analogous to real-life business. There is a variation called the “restless bandit problem” where the payout rates are not fixed but rather evolve over time. This is more like a real company where the management and employees will change over time. Their manufacturing equipment may wear out, break or become redundant and a host of other things may happen that will change the ability of the company to produce good products. Now in the restless bandit problem it is essential to do more “exploration” than in the case of the standard multi-armed bandit problem. You would never want to entirely give up trying a previously poor performing bandit because it may have now evolved into a better performing bandit.
It can be mathematically proven that for the restless bandit problem the 20 coin strategy is even more sub-optimal than is the case for the fixed bandit problem. There is too little “exploration”. It is sub-optimal for two reasons:
You may be mistaken in your estimate of which one has the highest true payout rate.
The true best bandit may change over time.
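Both reasons can be seen in a small restless-bandit simulation. The sketch below makes several assumptions that are not in the original text: the true payout rates drift by a small random step after every play, each coin either comes back or is lost, and the player uses an epsilon-greedy rule with recency-weighted estimates. That rule is one simple heuristic that never stops exploring, not a proven optimal strategy, and every name and parameter value is hypothetical.

```python
import random

def drift(rates, rng, step=0.02):
    """Assumed drift model: each true rate takes a small random step,
    clipped so it stays between 0 and 1."""
    return [min(1.0, max(0.0, r + rng.uniform(-step, step))) for r in rates]

def epsilon_greedy_restless(rates, budget, rng, epsilon=0.1, alpha=0.1, step=0.02):
    """Play restless bandits, exploring a random bandit a fraction epsilon
    of the time and otherwise playing the current best estimate."""
    rates = list(rates)
    estimates = [0.5] * len(rates)  # neutral starting guess for every bandit
    total = 0
    for _ in range(budget):
        if rng.random() < epsilon:
            arm = rng.randrange(len(rates))  # explore: may retry a "bad" bandit
        else:
            arm = max(range(len(rates)), key=lambda i: estimates[i])  # exploit
        reward = 1 if rng.random() < rates[arm] else 0
        total += reward
        # Recency-weighted update: old evidence fades, so the estimate can
        # follow a payout rate that has changed.
        estimates[arm] += alpha * (reward - estimates[arm])
        rates = drift(rates, rng, step)  # the dials move after every play
    return total, estimates

rng = random.Random(0)
total, estimates = epsilon_greedy_restless([0.80, 0.85, 0.90], 5000, rng)
print(total, [round(e, 2) for e in estimates])
```

Setting `epsilon=0` after an initial trial would recover something like the 20 coin strategy: once a bandit is written off, its estimate is never refreshed, however much its true rate improves.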
This result has important implications for free marketeers. It proves that the free market is sub-optimal. A free market acts like the “too little exploration” strategy. In a free market companies that fall short of producing the best goods tend to go bust even if they only fall short by a small margin. Also companies occasionally go bust for essentially random reasons unconnected with their underlying ability to make goods cheaply. For example, a company can go bust as a consequence of a one-in-a-hundred-years hurricane that disrupts a key supplier. Obviously when a company goes bust it can never be “tried” again; it doesn’t get a second chance. The surviving companies will then tend to grow, filling the void, and dominate the market. In the absence of counteracting forces, this process is liable to be one-way, resulting in a monopoly. Once a company grows very large and dominates a market, it will naturally benefit from a variety of economies of scale. It will be able to negotiate harder with both suppliers and distributors. What’s more, it can use its market power to make exclusivity deals with various other companies in the chain from raw materials to shop window. All of these things mean that there is no longer a level playing field between the company and any new small rivals.
Monopolies: okay at first – but then…
When a monopoly first emerges it may well do so because it was actually a well-run, efficient company beating its competition into bankruptcy with superior products or services. However, once the company is armed with such advantages over any new competition that may arise later, the evolutionary process is weakened. The company’s incentive to remain good value is reduced, so over the years problems can emerge. If the managers retire or move on and are replaced by less skilled people, the company may no longer be so efficient, but by exploiting its monopolistic advantage it may well be able to hold on to its dominance. Society can end up with an inefficient supplier and little choice.
These factors make the “too little exploration” strategy even more sub-optimal in economics than it is in the restless bandit domain. It’s as if, as soon as we make up our minds and settle on the bandit that we think is best, it gradually reduces its payout rate.
In any exploration–exploitation dilemma it is, of course, quite possible to make the mistake of doing too much exploration. In the extreme that would be like playing all of the bandits equally often regardless of their observed payout rates. This can easily be proven to be sub-optimal too. So there is a balance between exploration and exploitation to be struck.
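The arithmetic behind the extreme case is short enough to check directly. This is a minimal sketch with hypothetical dial settings and function names: playing every bandit equally often earns the average of the true payout rates, which can never exceed the rate of the best bandit.

```python
def uniform_play_return(true_rates):
    """Expected coins back per coin when every bandit is played equally often."""
    return sum(true_rates) / len(true_rates)

def best_bandit_return(true_rates):
    """Expected coins back per coin when only the true best bandit is played."""
    return max(true_rates)

rates = [0.80, 0.85, 0.90]  # hypothetical dial settings
print(uniform_play_return(rates), best_bandit_return(rates))
```

The gap between the two numbers is the price of pure exploration on every coin, which is why a good strategy shifts its plays towards the better bandits as evidence accumulates.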
Recent examples (2024)
Since the “Monopolies: okay at first – but then…” section was originally written (2011), a series of very clear examples of such companies has emerged: Amazon, Google, YouTube and Facebook, all internet-based giants. These companies grew to dominate their markets by being genuinely innovative and providing better services than their competitors, but now that the competition has been essentially killed off, they have been offering worse and worse services. This downward slide has been so blatant that a new word, “enshittification”, has entered the vocabulary to describe the phenomenon.
Conclusion
In the real world there are many things that could be done to make sure that there is enough “exploration” in an economy, some of which are already in place to a greater or lesser extent in many countries. We would recommend the following:
laws to discourage or prevent the setting up of exclusive supply or distribution channels
lower levels of regulation of smaller companies compared to larger ones
lower taxes for smaller companies compared to larger ones.
You might point out that some of these suggestions are already in place in some countries, but hopefully the analogy of the restless bandit problem a) gives some mathematical support for these kinds of policies and b) proves unequivocally that free market fundamentalism is sub-optimal.