# Solving Poker - Toy Poker Games

We will take a look at solving a very simple toy poker game called Kuhn Poker.

## Kuhn Poker

Kuhn Poker is the most basic poker game with interesting strategic implications.

The game in its standard form is played with 3 cards {A, K, Q} and 2 players. Each player starts with $2 and places an ante (i.e., a forced bet before the hand) of $1, and therefore has $1 left to bet with. Each player is then dealt 1 card and 1 round of betting ensues. The rules in bullet form:

• 2 players
• 3 card deck {A, K, Q}
• Each player starts the hand with $2
• Each player antes (i.e., makes a forced bet of) $1 at the start of the hand
• Each player is dealt 1 card
• Each player has $1 remaining for betting
• There is 1 betting round and 1 bet size of $1
• The highest card is the best (i.e., A > K > Q)

Action starts with P1, who can Bet $1 or Check.

• If P1 bets, P2 can either Call or Fold
• If P1 checks, P2 can either Bet or Check
• If P2 bets after P1 checks, P1 can then Call or Fold

These outcomes are possible:

• If a player folds to a bet, the other player wins the pot of $2 (profit of $1)
• If both players check, the player with the highest card wins the pot of $2 (profit of $1)
• If there is a bet and a call, the player with the highest card wins the pot of $4 (profit of $2)

The following are all of the possible full sequences.

The “History full” shows the exact betting history with “k” for check, “b” for bet, “c” for call, “f” for fold.

The “History short” uses a condensed format with only “b” for betting/calling and “p” (pass) for checking/folding: “b” means putting $1 into the pot and “p” means putting no money into the pot. We reference this shorthand format since we’ll use it when putting the game into code.

| P1 | P2 | P1 | Pot size | Result | History full | History short |
| --- | --- | --- | --- | --- | --- | --- |
| Check | Check | | $2 | High card wins $1 | kk | pp |
| Check | Bet $1 | Call $1 | $4 | High card wins $2 | kbc | pbb |
| Check | Bet $1 | Fold | $2 | P2 wins $1 | kbf | pbp |
| Bet $1 | Call $1 | | $4 | High card wins $2 | bc | bb |
| Bet $1 | Fold | | $2 | P1 wins $1 | bf | bp |
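The terminal histories and payoffs above can be sketched directly in code using the short notation. This is a minimal sketch; the function and constant names are illustrative, not from any library.

```python
# All terminal histories of Kuhn Poker in the short "p"/"b" notation.
TERMINALS = ("pp", "pbb", "pbp", "bb", "bp")

def payoff(history: str, p1_card: int, p2_card: int) -> int:
    """Payoff to Player 1 at a terminal history; cards ranked Q=0 < K=1 < A=2."""
    if history == "bp":    # P2 folds to P1's bet: P1 wins the $2 pot (+$1 profit)
        return 1
    if history == "pbp":   # P1 folds to P2's bet: P1 loses his $1 ante
        return -1
    showdown = 1 if p1_card > p2_card else -1
    profit = 2 if history in ("pbb", "bb") else 1   # bet+call pot vs check-check pot
    return showdown * profit

A, K, Q = 2, 1, 0
assert payoff("pp", A, K) == 1    # both check, high card wins $1
assert payoff("bb", Q, K) == -2   # bet and call, P1's Q loses $2
assert payoff("bp", Q, A) == 1    # P2 folds, so card ranks don't matter
```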

### Writing Kuhn Poker in Normal Form

Now that we have defined information sets, we see that each player in fact has 2 information sets per card that he can be dealt, which is a total of 6 information sets per player since each can be dealt a card in {Q, K, A}. (If the game were played with a larger deck, we would have $2N$ information sets per player, where $N$ is the deck size.)

Each information set has 2 actions possible, which are essentially “do not put money in the pot” (check when acting first or facing a check, fold when facing a bet; we call this pass) and “put in $1” (bet when acting first, call when facing a bet; we call this bet).

The result is that each player has $2^6 = 64$ total combinations of strategies. Think of this as each player having a switch between pass/bet for each of the 6 information sets that can be on or off, with all of them decided in advance.

Here are a few examples of the 64 strategies for Player 1 (randomly selected):

1. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - bet
2. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - pass
3. A - bet, Apb - bet, K - pass, Kpb - bet, Q - bet, Qpb - bet
4. A - bet, Apb - pass, K - bet, Kpb - pass, Q - bet, Qpb - bet
5. A - bet, Apb - pass, K - bet, Kpb - bet, Q - bet, Qpb - bet
6. A - pass, Apb - bet, K - bet, Kpb - bet, Q - pass, Qpb - bet

We can create a $64 \text{x} 64$ payoff matrix with every possible strategy for each player on each axis and the payoffs inside.

| P1/P2 | P2 Strat 1 | P2 Strat 2 | ... | P2 Strat 64 |
| --- | --- | --- | --- | --- |
| P1 Strat 1 | EV(1,1) | EV(1,2) | ... | EV(1,64) |
| P1 Strat 2 | EV(2,1) | EV(2,2) | ... | EV(2,64) |
| ... | ... | ... | ... | ... |
| P1 Strat 64 | EV(64,1) | EV(64,2) | ... | EV(64,64) |

This matrix has 4096 entries and would be difficult to use for something like iterated elimination of dominated strategies. We turn to linear programming to find a solution.

### Solving with Linear Programming

The general way to solve a game matrix of this size is with linear programming, which is essentially a way to optimize a linear objective subject to linear constraints, which we’ll define below. This kind of setup could be used in a problem like minimizing the cost of food while still meeting requirements like a minimum number of calories and a maximum amount of carbohydrates and sugar.

We can define Player 1’s strategy as $x$, a vector of size 64 corresponding to the probability of playing each strategy. We do the same for Player 2 as $y$.
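The $64 \text{x} 64$ matrix itself can be built by brute force. The sketch below (illustrative names, not from any library) enumerates each player's $2^6$ pure strategies and averages the payoff over the 6 equally likely deals.

```python
import itertools

A, K, Q = 2, 1, 0   # card ranks

def payoff(history, c1, c2):
    """Payoff to Player 1 at a terminal history in the short notation."""
    if history == "bp":   return 1          # P2 folds to a bet
    if history == "pbp":  return -1         # P1 folds to a bet
    hi = 1 if c1 > c2 else -1
    return hi * (2 if history in ("pbb", "bb") else 1)

def playout(c1, c2, s1, s2):
    """Play one deal; s1/s2 map (card, history-so-far) -> 'b' or 'p'."""
    h = s1[(c1, "")]           # P1's first action
    h += s2[(c2, h)]           # P2 responds to a check ('p') or a bet ('b')
    if h == "pb":
        h += s1[(c1, "pb")]    # P1 responds to a bet after checking
    return payoff(h, c1, c2)

P1_INFOSETS = [(c, h) for c in (A, K, Q) for h in ("", "pb")]
P2_INFOSETS = [(c, h) for c in (A, K, Q) for h in ("b", "p")]
pure = list(itertools.product("pb", repeat=6))   # 64 pure strategies per player

deals = [(c1, c2) for c1 in (A, K, Q) for c2 in (A, K, Q) if c1 != c2]
M = [[sum(playout(c1, c2, dict(zip(P1_INFOSETS, s1)), dict(zip(P2_INFOSETS, s2)))
          for c1, c2 in deals) / 6
      for s2 in pure] for s1 in pure]

assert len(M) == len(M[0]) == 64
assert abs(M[-1][-1]) < 1e-9   # "always bet" vs "always call" breaks even
```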
We define the payoff matrix as $A$ with the payoffs written with respect to Player 1.

$A = \quad \begin{bmatrix} EV(1,1) & EV(1,2) & ... & EV(1,64) \\ EV(2,1) & EV(2,2) & ... & EV(2,64) \\ ... & ... & ... & ... \\ EV(64,1) & EV(64,2) & ... & EV(64,64) \\ \end{bmatrix}$

We could use a payoff matrix $B$ for payoffs written with respect to Player 2, but in zero-sum games like poker, $B = -A$, so it’s easiest to just use $A$.

We can also define a constraint matrix for each player:

Let P1’s constraint matrix $= E$ such that $Ex = e$

Let P2’s constraint matrix $= F$ such that $Fy = f$

The only constraint we have at this time is that each strategy vector sums to 1 since it is a probability distribution, so $E$ and $F$ are just row vectors of 1’s and $e = f = 1$. In effect, this is just saying that each player has 64 strategies, plays each of them some percentage of the time, and these percentages have to add up to 1.

In the case of Kuhn Poker, for step 1 we look at a best response for Player 2 (strategy $y$) to a fixed Player 1 (strategy $x$) and have the following. Best response means the best possible strategy for Player 2 given Player 1’s fixed strategy.

$\max_{y} (x^TB)y$

$\text{Such that: } Fy = f, y \geq 0$

We are looking for the strategy parameters $y$ that maximize the payoff for Player 2. $x^TB$ is the transpose of $x$ multiplied by $B$, so the strategy of Player 1 multiplied by the payoffs to Player 2. Player 2 then chooses $y$ to maximize his payoff.

$= \max_{y} (x^T(-A))y$

We substitute $-A$ for $B$ so we only have to work with the $A$ matrix.

$= \min_{y} (x^TA)y$

Maximizing $x^T(-A)y$ is the same as minimizing $x^TAy$, so we drop the negative sign and switch the optimization from maximizing to minimizing.

$\text{Such that: } Fy = f, y \geq 0$

In words, this is the expected value of the game from Player 2’s perspective, because the $x$ and $y$ vectors represent the probability of ending in each state of the payoff matrix and $B = -A$ represents the payoff matrix itself. So Player 2 is trying to find a strategy $y$ that maximizes the payoff of the game from his perspective against a fixed Player 1 strategy $x$.

For step 2, we look at a best response for Player 1 (strategy x) to a fixed Player 2 (strategy y) and have:

$\max_{x} x^T(Ay)$ $\text{Such that: } x^TE^T = e^T, x \geq 0$

Note that Player 1 is trying to maximize this quantity while Player 2 is trying to minimize it.

For step 3, we can combine the above 2 parts and now allow for $x$ and $y$ to no longer be fixed, which leads to the below minimax equation. In 2-player zero-sum games, the minimax solution is the same as the Nash equilibrium solution. We call this minimax because each player minimizes the maximum payoff possible for the other – since the game is zero-sum, they also minimize their own maximum loss (maximizing their minimum payoff). This is also why the Nash equilibrium strategy in poker can be thought of as a “defensive” strategy, since by minimizing the maximum loss, we aren’t trying to maximally exploit.

$\min_{y} \max_{x} [x^TAy]$ $\text{Such that: } x^TE^T = e^T, x \geq 0, Fy = f, y \geq 0$

We can solve this with linear programming, but this would involve a huge payoff matrix $A$ and length 64 strategy vectors for each player. There is a much more efficient way!
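To see the LP machinery on something small first, here is a minimal sketch (assuming `scipy` is available) that solves a tiny zero-sum game, rock-paper-scissors, in the same max-min form: maximize the game value $v$ subject to the mixed strategy guaranteeing at least $v$ against every opposing pure strategy.

```python
import numpy as np
from scipy.optimize import linprog

# Row player's payoff matrix for rock-paper-scissors.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

# Variables z = (x_1, x_2, x_3, v). Maximize v subject to (A^T x)_j >= v
# for every column j, sum(x) = 1, x >= 0.
n = 3
c = np.zeros(n + 1); c[-1] = -1                   # linprog minimizes, so use -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])         # v - (A^T x)_j <= 0
b_ub = np.zeros(n)
A_eq = np.append(np.ones(n), 0).reshape(1, -1)    # probabilities sum to 1
b_eq = [1.0]
bounds = [(0, None)] * n + [(None, None)]         # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:n], res.x[-1]
assert abs(v) < 1e-6                    # the value of RPS is 0
assert np.allclose(x, 1/3, atol=1e-6)   # the unique equilibrium mixes uniformly
```

The Kuhn Poker LP above has exactly this shape, just with 64 strategies per player instead of 3.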

## Solving by Simplifying the Matrix

Kuhn Poker is the most basic poker game possible and requires solving a $64 \text{x} 64$ matrix. While this is feasible, any reasonably sized poker game would blow up the matrix size.

We can improve on this form by considering the structure of the game tree. Rather than just saying that the constraints on the $x$ and $y$ matrices are that they must sum to 1, we can redefine these conditions according to the structure of the game tree.

### Simplified Matrices for Player 1 with Behavioral Strategies

Previously we defined $E = F = \text{Vectors of } 1$, the most basic constraint that all probabilities have to sum to 1.

However, we know that some strategic decisions can only be made after certain other decisions have already been made. For example, Player 2’s actions after a Player 1 bet can only be made after Player 1 has first bet!

Now we can redefine the $E$ constraint as follows for Player 1:

| Infoset/Strategies | 0 | A_b | A_p | A_pb | A_pp | K_b | K_p | K_pb | K_pp | Q_b | Q_p | Q_pb | Q_pp |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | | | | | | | | | | | | |
| A | -1 | 1 | 1 | | | | | | | | | | |
| Apb | | | -1 | 1 | 1 | | | | | | | | |
| K | -1 | | | | | 1 | 1 | | | | | | |
| Kpb | | | | | | | -1 | 1 | 1 | | | | |
| Q | -1 | | | | | | | | | 1 | 1 | | |
| Qpb | | | | | | | | | | | -1 | 1 | 1 |

We see that $E$ is a $7 \text{x} 13$ matrix, representing the root of the game and the 6 information sets vertically, and the root of the game and the 12 possible strategies horizontally. The difference now is that we are using behavioral strategies instead of mixed strategies. Mixed strategies specify a probability of playing each of the 64 possible pure strategies. Behavioral strategies assign a probability distribution over the actions at each information set. Kuhn’s theorem (the same Kuhn) states that in a game where players may remember all of their previous moves/states of the game available to them, for every mixed strategy there is a behavioral strategy with an equivalent payoff (i.e., the strategies are equivalent).

Within the matrix, the [0,0] entry is a dummy filled with a 1. Each row has a single $-1$, which indicates the strategy (or the root) that must precede the infoset. The $1$ entries mark the 2 strategies available at that infoset.

$E = \quad \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 \\ \end{bmatrix}$

$x$ is a $13 \text{x} 1$ matrix of probabilities to play each strategy.

$x = \quad \begin{bmatrix} 1 \\ A_b \\ A_p \\ A_{pb} \\ A_{pp} \\ K_b \\ K_p \\ K_{pb} \\ K_{pp} \\ Q_b \\ Q_p \\ Q_{pb} \\ Q_{pp} \\ \end{bmatrix}$

We have finally that $e$ is a $7 \text{x} 1$ fixed matrix.

$e = \quad \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix}$

So we have overall:

$\quad \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 \\ \end{bmatrix} \quad \begin{bmatrix} 1 \\ A_b \\ A_p \\ A_{pb} \\ A_{pp} \\ K_b \\ K_p \\ K_{pb} \\ K_{pp} \\ Q_b \\ Q_p \\ Q_{pb} \\ Q_{pp} \\ \end{bmatrix} = \quad \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix}$

### What do the matrices mean?

To understand how the matrix multiplication works and why it makes sense, let’s look at each of the 7 multiplications (i.e., each row of $E$ multiplied by the column vector $x$ equals the corresponding row of the $e$ column vector).

Row 1

We have $1 \cdot 1 = 1$. This is a “dummy” constraint.

Row 2

$-1 + A_b + A_p = 0$ $A_b + A_p = 1$

This is the simple constraint that the probability between the initial actions in the game when dealt an A must sum to 1.

Row 3

$-A_p + A_{pb} + A_{pp} = 0$ $A_{pb} + A_{pp} = A_p$

The probabilities of Player 1 taking a bet or pass option with an A after initially passing must sum up to the probability of that initial pass $A_p$.

The following are just repeats of Rows 2 and 3 with the other cards.

Row 4

$-1 + K_b + K_p = 0$ $K_b + K_p = 1$

Row 5

$-K_p + K_{pb} + K_{pp} = 0$ $K_{pb} + K_{pp} = K_p$

Row 6

$-1 + Q_b + Q_p = 0$ $Q_b + Q_p = 1$

Row 7

$-Q_p + Q_{pb} + Q_{pp} = 0$ $Q_{pb} + Q_{pp} = Q_p$
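These row constraints can be checked numerically. The sketch below (illustrative names, assuming `numpy`) builds $E$, converts an arbitrary behavioral strategy into the realization-plan vector $x$, and verifies $Ex = e$; the probabilities chosen are arbitrary, not an equilibrium.

```python
import numpy as np

# Build the 7 x 13 constraint matrix E described above.
E = np.zeros((7, 13))
E[0, 0] = 1
for i in range(3):                          # cards A, K, Q
    E[2*i + 1, 0] = -1                      # the root precedes the first action
    E[2*i + 1, 4*i + 1:4*i + 3] = 1         # _b, _p
    E[2*i + 2, 4*i + 2] = -1                # _p precedes the pb infoset
    E[2*i + 2, 4*i + 3:4*i + 5] = 1         # _pb, _pp

def realization_plan(bet_first, bet_after_check_bet):
    """Turn behavioral probabilities (one pair per card, A/K/Q order) into x."""
    x = [1.0]                               # dummy root entry
    for b1, b2 in zip(bet_first, bet_after_check_bet):
        p = 1 - b1                          # probability of the initial pass
        x += [b1, p, p * b2, p * (1 - b2)]  # _b, _p, _pb, _pp
    return np.array(x)

x = realization_plan(bet_first=[0.7, 0.2, 0.4],
                     bet_after_check_bet=[1.0, 0.5, 0.0])
e = np.zeros(7); e[0] = 1
assert np.allclose(E @ x, e)                # all 7 row constraints hold
```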

### Simplified Matrices for Player 2

And $F$ works similarly for Player 2:

| Infoset/Strategies | 0 | A_b(ab) | A_p(ab) | A_b(ap) | A_p(ap) | K_b(ab) | K_p(ab) | K_b(ap) | K_p(ap) | Q_b(ab) | Q_p(ab) | Q_b(ap) | Q_p(ap) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | | | | | | | | | | | | |
| Ab | -1 | 1 | 1 | | | | | | | | | | |
| Ap | -1 | | | 1 | 1 | | | | | | | | |
| Kb | -1 | | | | | 1 | 1 | | | | | | |
| Kp | -1 | | | | | | | 1 | 1 | | | | |
| Qb | -1 | | | | | | | | | 1 | 1 | | |
| Qp | -1 | | | | | | | | | | | 1 | 1 |

From the same analysis as above, now with $Fy = f$, we see that each pair of 1’s in the $F$ matrix sums to 1, since they are the 2 options at that information set node.

### Simplified Payoff Matrix

Now instead of the $64 \text{x} 64$ matrix we made before, we can represent the payoff matrix as only $12 \text{x} 12$, since each player has $6 \text{x} 2 = 12$ sequences (6 information sets with 2 actions each). (It’s actually $13 \text{x} 13$ because we use a dummy row and column.) These payoffs are the actual results of the game when the players’ sequences coincide, from the perspective of Player 1, where the results are in {-2, -1, 1, 2}.

| P1/P2 | 0 | A_b(ab) | A_p(ab) | A_b(ap) | A_p(ap) | K_b(ab) | K_p(ab) | K_b(ap) | K_p(ap) | Q_b(ab) | Q_p(ab) | Q_b(ap) | Q_p(ap) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | | | | | | | | | | | | | |
| A_b | | | | | | 2 | 1 | | | 2 | 1 | | |
| A_p | | | | | | | | | 1 | | | | 1 |
| A_pb | | | | | | | | 2 | | | | 2 | |
| A_pp | | | | | | | | -1 | | | | -1 | |
| K_b | | -2 | 1 | | | | | | | 2 | 1 | | |
| K_p | | | | | -1 | | | | | | | | 1 |
| K_pb | | | | -2 | | | | | | | | 2 | |
| K_pp | | | | -1 | | | | | | | | -1 | |
| Q_b | | -2 | 1 | | | -2 | 1 | | | | | | |
| Q_p | | | | | -1 | | | | -1 | | | | |
| Q_pb | | | | -2 | | | | -2 | | | | | |
| Q_pp | | | | -1 | | | | -1 | | | | | |
$A = \quad \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 2 & 1 & 0 & 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 \\ 0 & -2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & -2 & 1 & 0 & 0 & -2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -2 & 0 & 0 & 0 & -2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\ \end{bmatrix}$
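As a sanity check on this matrix, a short sketch (illustrative names, assuming `numpy`) can rebuild it from the game rules by walking the 5 terminal histories of every deal.

```python
import numpy as np

# Index layout (per card, in order A, K, Q): P1 sequences b, p, pb, pp;
# P2 sequences b(ab), p(ab), b(ap), p(ap); index 0 is the dummy root sequence.
CARDS = [2, 1, 0]                    # A, K, Q
P1_IDX = {(c, s): 1 + 4*i + j for i, c in enumerate(CARDS)
          for j, s in enumerate(["b", "p", "pb", "pp"])}
P2_IDX = {(c, s): 1 + 4*i + j for i, c in enumerate(CARDS)
          for j, s in enumerate(["b(ab)", "p(ab)", "b(ap)", "p(ap)"])}

A = np.zeros((13, 13))
for c1 in CARDS:
    for c2 in CARDS:
        if c1 == c2:
            continue
        hi = 1 if c1 > c2 else -1
        # terminal history -> (P1 sequence, P2 sequence, payoff to P1)
        leaves = [("pp",  (c1, "p"),  (c2, "p(ap)"), hi),       # check-check
                  ("pbb", (c1, "pb"), (c2, "b(ap)"), 2 * hi),   # check-bet-call
                  ("pbp", (c1, "pp"), (c2, "b(ap)"), -1),       # check-bet-fold
                  ("bb",  (c1, "b"),  (c2, "b(ab)"), 2 * hi),   # bet-call
                  ("bp",  (c1, "b"),  (c2, "p(ab)"), 1)]        # bet-fold
        for _, s1, s2, pay in leaves:
            A[P1_IDX[s1], P2_IDX[s2]] = pay

# Spot checks against the table above.
assert A[P1_IDX[(2, "b")], P2_IDX[(1, "b(ab)")]] == 2    # A_b vs K_b(ab)
assert A[P1_IDX[(0, "b")], P2_IDX[(2, "b(ab)")]] == -2   # Q_b vs A_b(ab)
assert A[P1_IDX[(0, "pp")], P2_IDX[(2, "b(ap)")]] == -1  # Q_pp vs A_b(ap)
```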

We could even further reduce this by eliminating dominated strategies:

| P1/P2 | 0 | A_b(ab) | A_b(ap) | A_p(ap) | K_b(ab) | K_p(ab) | K_b(ap) | K_p(ap) | Q_b(ab) | Q_p(ab) | Q_b(ap) | Q_p(ap) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | | | | | | | | | | | | |
| A_b | | | | | 2 | 1 | | | 2 | 1 | | |
| A_p | | | | | | | | 1 | | | | 1 |
| A_pb | | | | | | | 2 | | | | 2 | |
| K_p | | | | -1 | | | | | | | | 1 |
| K_pb | | | -2 | | | | | | | | 2 | |
| K_pp | | | -1 | | | | | | | | -1 | |
| Q_b | | -2 | | | -2 | 1 | | | | | | |
| Q_p | | | | -1 | | | | -1 | | | | |
| Q_pp | | | -1 | | | | -1 | | | | | |

For simplicity, let’s stick with the original $A$ payoff matrix and see how we can solve for the strategies and value of the game.

### Simplified Linear Program

Our linear program is now updated as follows. It is the same general form as before, but now our $E$ and $F$ matrices have constraints based on the game tree and the payoff matrix $A$ is smaller, evaluating when player strategies coincide and result in payoffs, rather than looking at every possible set of strategic options as we did before:

$\min_{y} \max_{x} [x^TAy]$ $\text{Such that: } x^TE^T = e^T, x \geq 0, Fy = f, y \geq 0$

MATLAB code is available to solve this linear program: https://www.dropbox.com/s/na9rta8sqj0zlmb/matlabkuhn.m?dl=0
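For those without MATLAB, the same linear program can be sketched in Python with `scipy.optimize.linprog` (a sketch, not the linked script). One standard way to implement the min-max program is to dualize the inner maximization: for fixed $y$, $\max_x x^T(Ay)$ subject to $Ex = e$ has dual $\min_p e^Tp$ subject to $E^Tp \geq Ay$, so the equilibrium solves $\min_{y,p} p_0$ subject to $E^Tp - Ay \geq 0$, $Fy = f$, $y \geq 0$.

```python
import numpy as np
from scipy.optimize import linprog

# The 13 x 13 sequence-form payoff matrix A from above (rows: dummy, then
# A_b, A_p, A_pb, A_pp, K_b, ..., Q_pp; columns ordered the same way for P2).
A = np.array([
    [0,  0, 0,  0,  0,  0, 0,  0,  0,  0, 0,  0, 0],
    [0,  0, 0,  0,  0,  2, 1,  0,  0,  2, 1,  0, 0],
    [0,  0, 0,  0,  0,  0, 0,  0,  1,  0, 0,  0, 1],
    [0,  0, 0,  0,  0,  0, 0,  2,  0,  0, 0,  2, 0],
    [0,  0, 0,  0,  0,  0, 0, -1,  0,  0, 0, -1, 0],
    [0, -2, 1,  0,  0,  0, 0,  0,  0,  2, 1,  0, 0],
    [0,  0, 0,  0, -1,  0, 0,  0,  0,  0, 0,  0, 1],
    [0,  0, 0, -2,  0,  0, 0,  0,  0,  0, 0,  2, 0],
    [0,  0, 0, -1,  0,  0, 0,  0,  0,  0, 0, -1, 0],
    [0, -2, 1,  0,  0, -2, 1,  0,  0,  0, 0,  0, 0],
    [0,  0, 0,  0, -1,  0, 0,  0, -1,  0, 0,  0, 0],
    [0,  0, 0, -2,  0,  0, 0, -2,  0,  0, 0,  0, 0],
    [0,  0, 0, -1,  0,  0, 0, -1,  0,  0, 0,  0, 0],
], dtype=float)

E = np.zeros((7, 13)); E[0, 0] = 1
for i in range(3):                                  # cards A, K, Q
    E[2*i + 1, 0] = -1; E[2*i + 1, 4*i + 1:4*i + 3] = 1
    E[2*i + 2, 4*i + 2] = -1; E[2*i + 2, 4*i + 3:4*i + 5] = 1

F = np.zeros((7, 13)); F[0, 0] = 1
for j in range(6):                                  # P2's 6 infosets
    F[j + 1, 0] = -1; F[j + 1, 2*j + 1:2*j + 3] = 1

f = np.zeros(7); f[0] = 1

# Variables z = (y, p): 13 realization-plan entries plus 7 dual values.
c = np.zeros(20); c[13] = 1                         # minimize p_0 = game value
A_ub = np.hstack([A, -E.T])                         # Ay - E^Tp <= 0
A_eq = np.hstack([F, np.zeros((7, 7))])             # Fy = f
bounds = [(0, None)] * 13 + [(None, None)] * 7      # y >= 0, p free

res = linprog(c, A_ub=A_ub, b_ub=np.zeros(13), A_eq=A_eq, b_eq=f, bounds=bounds)
y = res.x[:13]
# A leaves the 6 deals unweighted, so the LP value is 6x the per-hand EV;
# Kuhn Poker is worth -1/18 per hand to Player 1, so p_0 comes out to -1/3.
```

The recovered plan `y` is Player 2's (unique) equilibrium: for example, entry 5, the frequency of calling a bet while holding K, comes out to 1/3, the classic Kuhn result.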

## Iterative Algorithms

We have now shown a way to solve games more efficiently based on the structure/ordering of the decision nodes, which can be expressed in tree form.

Using behavioral strategies significantly reduces the size of the game, and solving becomes much more efficient. The tree structure also leads to algorithms that can use self-play to iterate through the game tree.

Specifically CFR (Counterfactual Regret Minimization) has become the foundation of imperfect information game solving algorithms. We will go into detail on this in section 4.1.
