Computer Player Strategy
In his thesis, Mike Johanson describes several types of poker strategies considered by the University of Alberta Computer Poker Research Group (CPRG) and implemented by the CPRG as well as by private researchers:
"Before describing some of the poker programs that have already been developed, it is useful to consider the different types of strategies that a player could use when playing the game. One of the features of poker is that exploitation is important: the goal is to win as much money as possible from each opponent. This means that there is not a “correct way” to play poker, like there is in games that have recently been solved such as awari or checkers. Instead, the correct strategy to use should ideally depend on the opponent that is being faced. Against a weak or known opponent, this may mean using a strategy designed to exploit their faults. Through examining histories of past games or through online learning, one can build a model of the opponent, and act in such a way as to maximally exploit the model. If the model is very accurate, then this may have a high win rate. If the model is inaccurate, however, it can lose badly.
Against an unknown or stronger opponent, we may want to adopt a strategy that is very difficult to exploit. The standard way of thinking about such a strategy, in any game, is the concept of a Nash equilibrium. A Nash equilibrium is a strategy for each player of the game, with the property that no single player can do better by changing to a different strategy. There can be several different (and possibly infinitely many) equilibria for any given game, but if the game is two-player and zero-sum, every Nash equilibrium provides the same payoffs to the players. In a repeated game where the players change positions, such as heads-up poker, this is a very useful property — if both players are playing an equilibrium strategy, the expected score for both players will be zero. If one player plays the equilibrium strategy, since their opponent cannot do better by playing a strategy other than the equilibrium, they can expect to do no worse than tie the game. In poker, using such a strategy allows us to defend against any opponent, or allows us to learn an opponent’s tendencies safely for several hands before attempting to exploit them.
When trying to find a Nash equilibrium in a complex game, we can rarely arrive at the precise equilibrium. Instead, we approximate the Nash equilibrium with an ε-Nash equilibrium strategy, where ε is a measure of how far from the equilibrium the strategy is. Since a Nash equilibrium strategy should expect to get a value of no less than 0 against any opponent, ε is the value of the best response to the strategy. Other ways to say this are that the strategy is ε suboptimal or exploitable.
A common theme we will explore when considering poker strategies is the tradeoff between exploiting an opponent and one’s own capacity to be exploited. If we use a strategy that is specifically designed to beat one opponent, we are exploiting them but are also opening ourselves up to be exploited by a different strategy. If we choose to minimize our own exploitability by playing very close to an equilibrium, then we have to sacrifice our ability to exploit an opponent. It would be very valuable to have strategies along this line, and not just at these two extremes. Furthermore, we would like to obtain more than a linear tradeoff when we do this: we want to get more than we give up.
Instead of just having one well-designed strategy, we would also like to have a variety of strategies to choose from. For example, we may want to consider a set of strategies to be a team, from which we will choose one strategy at a time to play the game. One approach could be to randomly select strategies from a pool, and set a higher probability of choosing strategies that have historically been successful. A more complicated approach may be to start with an equilibrium strategy until we discover an opponent’s weakness, and then use the appropriate response to the weakness. These types of strategies are presented as examples we are interested in for the purposes of this thesis. In this thesis, we will describe methods for producing poker agents that play according to each of these strategies—specific responses to opponents, careful equilibria, exploitative-but-robust compromises, and teams of strategies with varying abilities." [1, p. 12-13]
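The ideas in the quotation (best response, exploitability, and the tradeoff between exploiting and being exploited) can be illustrated on a toy game. The sketch below is not from the thesis: it assumes rock-paper-scissors as a stand-in for poker, because it is a small two-player zero-sum game whose equilibrium value is 0, and the names `PAYOFF`, `expected_value`, `exploitability` and `mix` are hypothetical helpers introduced only for this example.

```python
# Toy illustration only: rock-paper-scissors stands in for poker (an assumption,
# not the thesis's method). Action order: rock, paper, scissors.
# PAYOFF[i][j] is the row player's payoff; the column player receives the negative.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def expected_value(row_strategy, col_strategy):
    """Expected payoff to the row player when both players use mixed strategies."""
    return sum(
        row_strategy[i] * col_strategy[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def exploitability(row_strategy):
    """Value a best-responding opponent wins against row_strategy.

    Because the equilibrium value of this game is 0, this is the quantity
    the quotation calls epsilon: 0 for an equilibrium strategy, larger for
    strategies that are easier to exploit.
    """
    # A best response can always be a pure action, so checking the three
    # pure column actions is enough.
    return max(
        -sum(row_strategy[i] * PAYOFF[i][j] for i in range(3))
        for j in range(3)
    )


def mix(strategy_a, strategy_b, weight_a):
    """Blend two strategies: play strategy_a with probability weight_a."""
    return [
        weight_a * a + (1 - weight_a) * b
        for a, b in zip(strategy_a, strategy_b)
    ]


equilibrium = [1 / 3, 1 / 3, 1 / 3]   # uniform play, the equilibrium of this game
always_rock = [1, 0, 0]               # a weak, fully predictable opponent
counter = [0, 1, 0]                   # always paper: maximally exploits always_rock

for name, strategy in [
    ("equilibrium", equilibrium),
    ("counter", counter),
    ("50/50 blend", mix(equilibrium, counter, 0.5)),
]:
    print(
        name,
        "wins", round(expected_value(strategy, always_rock), 3),
        "vs always_rock; exploitable for", round(exploitability(strategy), 3),
    )
```

In this toy game the uniform strategy wins nothing from the predictable opponent but cannot be exploited, while the counter-strategy wins 1 per game and can itself be beaten for 1 per game. The naive 50/50 blend sits exactly halfway on both measures, which is the linear tradeoff the quotation hopes to improve on.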
References
- Michael Bradley Johanson, "Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player" [1]