# AIPT Section 3.2: Solving Poker – Toy Poker Games

## Kuhn Poker Rules

Kuhn Poker is the most basic poker game with interesting strategic implications.

The game in its standard form is played with 3 cards in {A, K, Q} and 2 players. Each player starts with $2 and places an ante (i.e., forced bet before the hand) of$1. And therefore has $1 left to bet with. Each player is then dealt 1 card and 1 round of betting ensues. The rules with dots: • 2 players, 3 card deck {A, K, Q} • Each starts the hand with$2
• Each antes (i.e., makes forced bet of) $1 at the start of the hand • Each player is dealt 1 card • Each has$1 remaining for betting
• There is one betting round and one bet size of $1 • The highest card is the best (i.e., A$>$K$>$Q) Action starts with P1, who can Bet$1 or Check

• If P1 bets, P2 can either Call or Fold
• If P1 checks, P2 can either Bet or Check
• If P2 bets after P1 checks, P1 can then Call or Fold

• If a player folds to a bet, the other player wins the pot of 2 (profit of 1)
• If both players check, the highest card player wins the pot of 2 (profit of 1)
• If there is a bet and call, the highest card player wins the pot of 4 (profit of 2)

The following sequences are possible.

The “History full” shows the exact betting history with “k” for check, “b” for bet, “c” for call, “f” for fold.

The “History short” uses a condensed format that uses only “b” for betting/calling and “p” (pass) for checking/folding, meaning that “b” is used when putting $1 into the pot and “p” when putting no money into the pot. P1 P2 P1 Pot size Result History full History short Check Check$2 High card wins $1 kk pp Check Bet$1 Call $1$4 High card wins $2 kbc pbb Check Bet$1 Fold $2 P2 wins$1 kbf pbp
Bet $1 Call$1 $4 High card wins$2 bc bb
Bet $1 Fold$2 P1 wins $1 bf bp ## Solving Kuhn Poker We’re going to solve for the GTO solution to this game using 3 methods, an analytical solution, a normal form solution, and then we will introduce game trees, which allows for solving the game using the CFR counterfactual regret algorithm, that will be detailed more in the next section. ### Analytical Solution There are 4 decision points in this game: P1’s opening action, P2 after P1 bets, P2 after P1 checks, and P1 after checking and P2 betting. Let’s first look at P1’s opening action. P1 should never bet the K card here because if he bets the K, P2 with Q will always fold (since the lowest card can never win) and P2 with A will always call (since the best card will always win). By checking the K always, P1 can try to induce a bluff from P2 when P2 has the Q. P1 initial action Therefore we assign P1’s strategy: • Bet Q: $x$ • Bet K: $0$ • Bet A: $y$ P2 after P1 bet After P1 bets, P2 should always call with the A and always fold the Q as explained above. Therefore we assign P2’s strategy after P1 bet: • Call Q: $0$ • Call K: $a$ • Call A: $1$ P2 after P1 check After P1 checks, P2 should never bet with the K for the same reason as P1 should never initially bet with the K. P2 should always bet with the A because it is the best hand and there is no bluff to induce by checking (the hand would simply end and P2 would win, but not have a chance to win more by betting). Therefore we assign P2’s strategy after P1 check: • Bet Q: $b$ • Bet K: $0$ • Bet A: $1$ P1 after P1 check and P2 bet This case is similar to P2’s actions after P1’s bet. P1 can never call here with the worst hand (Q) and must always call with the best hand (A). Therefore we assign P1’s strategy after P1 check and P2 bet: • Call Q: $0$ • Call K: $z$ • Call A: $1$ So we now have 5 different variables $x, y, z, a, b$ to represent the unknown probabilities. Solving for $x$ and $y$ For P1 opening the acxtion, $x$ is his probability of betting with Q (bluffing) and $y$ is his probability of betting with A (value betting). A key game theory principle is that we want to make P2 indifferent between calling and folding with the K (since again, Q is always a fold and A is always a call for P2). When P2 has K, P1 has $\frac{1}{2}$ of having a Q and A each. P2’s EV of folding with a K to a bet is $0$. Note that we are defining EV from the current decision point, meaning that money already put into the pot is sunk and not factored in. P2’s EV of calling with a K to a bet $= 3 * \text{P(P1 has Q and bets with Q)} + (-1) * \text{P(P1 has A and bets with A)}$ $= (3) * \frac{1}{2} * x + (-1) * \frac{1}{2} * y$ Setting the calling and folding EVs equal, we have: $0 = (3) * \frac{1}{2} * x + (-1) * \frac{1}{2} * y$ $y = 3 * x$ That is, P1 should bet the A 3 times more than bluffing with the Q. This result is parametrized, meaning that there isn’t a fixed number solution, but rather a ratio of how often P1 should value-bet compared to bluff. Solving for $a$ $a$ is how often P2 should call with a K facing a bet from P1. P2 should call $a$ to make P1 indifferent to bluffing (i.e., betting or checking) with card Q. If P1 checks with card Q, P1 will always fold afterwards if P2 bets (because it is the worst card and can never win), so his EV is 0. $\text{EV P1 check with Q} = 0$ If P1 bets with card Q, $\text{EV P1 bet with Q} = (-1) * \text{P2 has A and always calls/wins} + (-1) * \text{P2 has K and calls/wins} + 2 * \text{P2 has K and folds} = \frac{1}{2} * (-1) + \frac{1}{2} * (a) * (-1) + \frac{1}{2} * (1 - a) * (2) = -\frac{1}{2} - \frac{1}{2} * a + (1 - a) = \frac{1}{2} - \frac{3}{2} * a$ Setting the probabilities of betting with Q and checking with Q equal, we have: $$0 = \frac{1}{2} - \frac{3}{2} * a \frac{3}{2} * a = \frac{1}{2} a = \frac{1}{3}$$ Solving for $b$ Now to solve for $b$, how often P2 should bet with a Q after P1 checks. The indifference for P1 is only relevant when he has a K, since if he has a Q or A, he will always fold or call, respectively. If P1 checks a K and then folds, then $\text{EV P1 check with K and then fold to bet} = 0$ $\text{EV P1 check with K and then call a bet} = (-1) * \text{P(P2 has A and always bets) + (3) * P(P2 has Q and bets) = \frac{1}{2} * (-1) + \frac{1}{2} * b * (3)$ Setting these probabilities equal, we have: $0 = \frac{1}{2} * (-1) + \frac{1}{2} * b * (3)$ $\frac{1}{2} = \frac{1}{2} * b * (3)$ $3 * b = 1$ $b = \frac{1}{3}$ Solving for $z$ The final case is when P1 checks a K, P2 bets, and P1 must call so that P2 is indifferent to checking vs. betting (bluffing) with a Q. $\text{P(P1 has A | P1 checks A or K)} = \frac{\text{P(P1 has A and checks)}}{\text{P(P1 checks A or K)}}$ $= \frac{(1-y) * \frac{1}{2}{ {(1-y) * \{\frac{1}{2} + \frac{1}{2}}$ $= \frac{1-y}{2-y}$ $\text{P(P1 has K | P1 checks A or K)} = 1 - \text{P(P1 has A | P1 checks A or K)}$ $= 1 - \frac{1-y}{2-y}$ $= \frac{2-y}{2-y} - \frac{1-y}{2-y}$ $= \frac{1}{2-y}$ If P2 checks his Q, his EV $= 0$. If P2 bets (bluffs) with his Q, his EV is: $-1 * P(P1 check A then call A) - 1 * P(P1 check K then call K) + 2 * P(P1 check K then fold K)$ $= -1 * \frac{1-y}{2-y} + -1 * z * \frac{1}{2-y} + 2 * (1-z) * \frac{1}{2-ya}$ Setting these equal: $0 = -1 * \frac{1-y}{2-y} + -1 * z * \frac{1}{2-y} + 2 * (1-z) * \frac{1}{2-y}$ $0 = -1 * \frac{1-y}{2-y} + -1 * z * \frac{1}{2-y} + 2 * (1-z) * \frac{1}{2-y}$ $0 = -\frac{1-y}{2-y} - z * \frac{3}{2-y} + \frac{2}{2-y}$ $z * \frac{3}{2-y} = \frac{2}{2-y} - \frac{1-y}{2-y}$ $$z = \frac{2}{3} - \frac{1-y}{3} $z = \frac{y+1}{3}$ Summary We now have the following result: P1 initial actions: Bet Q: $x = \frac{y}{3}$ Bet A: $y = ??$ P2 after P1 bet: Call K: $a = \frac{1}{3}$ P2 after P1 check: Bet Q: $b = \frac{1}{3}$ P1 after P1 check and P2 bet: Call K: $z = \frac{y+1}{3}$ P2 has fixed actions, but P1’s are dependent on the $y$ parameter. We can look at every possible deal-out to evaluate the value for $y$. Case 1: P1 A, P2 K 1. Bet fold$$ y * \frac{2}{3} * 1 = \frac{2 * y}{3} 2. Bet call $$y * \frac{1}{3} * 2 = \frac{2 * y}{3} 3. Check check$$ (1 - y) * (1) * (1) = 1 - y Total = $$\frac{4 * y}{3} + 1 - y = \frac{y}{3} + 1 Case 2: P1 A, P2 Q 1. Bet fold $y * 1 * 1 = y$ 2. Check bet call$$ (1 - y) * \frac{1}{3} * 1 * 2 = \frac{2}{3} * (1 - y) 3. Check check $$(1 - y) * \frac{2}{3} * 1 = \frac{2}{3} * (1 - y) Total =$$ \frac{4}{3} * (1 - y) + y = \frac{4}{3} - \frac{1}{3} * y Case 3: P1 K, P2 A 1. Check bet call $$(1) * (1) * \frac{y+1}{3} * (-2) = \frac{-2}{3} * (y + 1) 2. Check bet fold$$ (1) * (1) * (1 - \frac{y+1}{3}) * (-1) = Case 4: P1 K, P2 Q 1. Check check $$(1) * \frac{2}{ 2. Check bet call 3. Check bet fold Case 5: P1 Q, P2 A 1. Bet call 2. Check bet fold P1 bets $x$ and P2 calls, EV = $-2 * x$ P1 checks $1 - x$ and P2 bets and P1 folds. EV =$$ -1 * (1-x) = x - 1 Case 6: P1 A, P2 K 1. Check check 2. Bet call 3. Bet fold Summing up the cases Since each case is equally likely based on the initial deal, we can multiply each by $\frac{1}{6}$ and then sum them to find the EV of the game. ### Kuhn Poker in Normal Form Information Sets Given a deal of cards in Kuhn Poker, each player has 2 fixed decision points. Player 1 acts first and also acts if P1 checks and P2 bets. P2 acts second either facing a bet or facing a check from P1. This amounts to a total of 12 decision points per player. However, each player has 2 decision points that are equivalent in different states of the game. For example, if Player 1 is dealt a K and Player 2 dealt a Q or P1 dealt K and P2 dealt A, P1 is facing the decision of having a K and not knowing what his opponent has. Likewise if Player 2 is dealt a K and is facing a bet, he must make the same action regardless of what the opponent has because from his perspective he only knows his own card. We define an information set as the set of information used to make decisions at a particular point in the game. It is equivalent to the card of the acting player and the history of actions up to that point. So for Player 1 acting first with a K, the information set is “K”. For Player 2 acting second with a K and facing a bet, the information set is “Kb”. For Player 2 acting second with a K and facing a check, the information set is “Kk”. For Player 1 with a K checking and facing a bet from Player 2, the information set is “Kkb”. We use “k” to define check, “b” for bet”, “f” for fold, and “c” for call. Writing Kuhn Poker in Normal Form Now that we have defined information sets, we see that each player in fact has 2 information sets per card that he can be dealt, which is a total of 6 information sets per player since each can be dealt a card in {Q, K, A}. Each information set has 2 actions possible, which are essentially “do not put money in the pot” (check when acting first/facing a check or fold when facing a bet) and “put in$1” (bet when acting first or call when facing a bet).

The result is that each player has $2^6 = 64$ total combinations of strategies. That is, there are $2^64$ strategy combinations.

Here are a few examples for Player 1:

1. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - bet
2. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - pass
3. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - bet
4. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - bet
5. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - bet
6. A - bet, Apb - bet, K - bet, Kpb - bet, Q - bet, Qpb - bet

Think of this as each player having a switch between pass/bet that can be on or off and showing every possible combination of these switches for each information set.

We can create a $64 \text{x} 64$ payoff matrix with every possible strategy for each player on each axis and the payoffs inside and then

Put expected values in matrix form according to chance.

Minimax theorem

P1/P2 P2 Strat 1 P2 Strat 2 P2 Strat 64
P1 Strat 1 0 0
P1 Strat 2
P1 Strat 64 0 0 0
$A = \quad \begin{bmatrix} 0 & 0 & ... & 0 & \\ 0 & 0 & ... & 0 & \\ ... & ... & ... & ... & \\ 0 & 0 & ... & 0 & \\ \end{bmatrix}$

Solving with Linear Programming

The general way to solve a game matrix of this size is with linear programming.

Define Player 1’s strategy vector as $x$ and Player 2’s strategy vector as $y$

Define the payoff matrix as $A$ (payoffs written with respect to Player 1)

We can also define payoff matrix $B$ for payoffs written with respect to Player 2

In zero-sum games like poker, $A = -B$

We can also define a constraint matrix for each player

Let P1’s constraint matrix = $E$ such that $Ex = e$

Let P2’s constraint matrix = $F$ such that $Fy = f$

The only constraint we have at this time is that the sum of the strategies is 1 since they are a probability distribution, so E and F will just be matrices of 1’s and e and f will $= 1$.

A basic linear program is set up as follows:

$\text{Maximize: } S_1 * x_1 + S_2 * x_2$ $\text{Subject to: } x_1 + x_2 \leq L$ $x_1 \geq 0, x_2 \geq 0$

In the case of poker, for step 1 we look at a best response for player 2 (strategy y) to a fixed Player 1 (strategy x) and have:

$\max_{y} (x^TB)y = \max_{y} (x^T(-A))y = \min_{y} (x^T(A))y$ $\text{Such that: } Fy = f, y \geq 0$

In words, this is the expected value of the game from Player 2’s perspective because the $x$ and $y$ matrices represent the probability of ending in each state of the payoff matrix and the $B == -A$ value represents the payoff matrix itself. So Player 2 is trying to find a strategy $y$ that maximizes the payoff of the game from his perspective against a fixed $x$ player 1 strategy.

For step 2, we look at a best response for player 1 (strategy x) to a fixed player 2 (strategy y) and have:

$\max_{x} x^T(Ay)$ $\text{Such that: } x^TE^T = e^T, x \geq 0$

For the final step, we can combine the above 2 parts and now allow for $x$ and $y$ to no longer be fixed.

$\min_{y} \max_{x} [x^TAy]$ $\text{Such that: } x^TE^T = e^T, x \geq 0, Fy = f, y \geq 0$

We can solve this with linear programming, but there is a much nicer way to do this!

Simplifying the Matrix

Kuhn Poker is the most basic poker game possible and requires solving a $64 \text{x} 64$ matrix. While this is feasible, any reasonably sized poker game would blow up the matrix size!

We can improve on this form by considering the structure of the game tree, also known as writing the game in extensive form. Rather than just saying that the constraints on the $x$ and $y$ matrices are that they must sum to 1, we can redefine these conditions according to the structure of the game tree.

Previously we defined $E = F = \text{Vectors of } 1$. However, we know that some strategic decisions can only be made after certain other decisions have already been made. For example, Player 2’s actions after a bet can only be made after Player 1 has first bet!

Now we can redefine $E$ as follows:

Infoset/Strategies 0 A_b A_p A_pb A_pp K_b K_p K_pb K_pp Q_b Q_p Q_pb Q_pp
0 1
A -1 1 1
Apb     -1 1 1
K -1         1 1
Kpb             -1 1 1
Q -1                 1 1
Qpb                     -1 1 1

We see that $E$ is a $7 \text{x} 13$ matrix.

$E = \quad \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 \\ \end{bmatrix}$

$x$ is a $13 \text{x} 1$ matrix of probabilities to play each strategy.

$x = \quad \begin{bmatrix} 1 \\ A_b \\ A_p \\ A_{pb} \\ A_{pp} \\ K_b \\ K_p \\ K_{pb} \\ K_{pp} \\ Q_b \\ Q_p \\ Q_{pb} \\ Q_{pp} \\ \end{bmatrix}$

We have finally that $e$ is a $7 \text{x} 1$ fixed matrix.

$e = \quad \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix}$

So we have:

$\quad \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 1 \\ \end{bmatrix} \quad \begin{bmatrix} 1 \\ A_b \\ A_p \\ A_{pb} \\ A_{pp} \\ K_b \\ K_p \\ K_{pb} \\ K_{pp} \\ Q_b \\ Q_p \\ Q_{pb} \\ Q_{pp} \\ \end{bmatrix} = \quad \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{bmatrix}$

To understand how the matrix multiplication works and why it makes sense, let’s look at each of the 7 multiplications (i.e., each row of $E$ multiplied by the column vector of $x$ $=$ the corresponding row in the $e$ column vector. .

Row 1

We have $1 \text{x} 1$ = 1. This is a “dummy”

Row 2

$-1 + A_b + A_p = 0$ $A_b + A_p = 1$

This is the simple constraint that the probability between the initial actions in the game must sum to 1.

Row 3 $-A_p + A_{pb} + A_{pp} = 1$ $A_{pb} + A_{pp} = A_p$

The probabilities of Player 1 taking a bet or pass option with an A after initially passing must sum up to the probability of that initial pass $A_p$.

The following are just repeats of Rows 2 and 3 with the other cards.

Row 4

$-1 + K_b + K_p = 0$ $K_b + K_p = 1$

Row 5

$-K_p + K_{pb} + K_{pp} = 1$ $K_{pb} + K_{pp} = K_p$

Row 6

$-1 + Q_b + Q_p = 0$ $Q_b + Q_p = 1$

Row 7

$-Q_p + Q_{pb} + Q_{pp} = 1$ $Q_{pb} + Q_{pp} = Q_p$

And $F$:

Infoset/Strategies 0 A_b(ab) A_p(ab) A_b(ap) A_p(ap) K_b(ab) K_p(ab) K_b(ap) K_p(ap) Q_b(ab) Q_p(ab) Q_b(ap) Q_p(ap)
0 1
Ab -1 1 1
Ap -1     1 1
Kb -1         1 1
Kp -1             1 1
Qb -1                 1 1
Qp -1                     1 1

From the equivalent analysis as we did above with $Fx = f$, we will see that each pair of 1’s in the $F$ matrix will sum to $1$ since they are the 2 options at the information set node.

Now instead of the $64 \text{x} 64$ matrix we made before, we can represent the payoff matrix as only $6 \text{x} 2 \text{ x } 6\text{x}2 = 12 \text{x} 12$.

P1/P2 0 A_b(ab) A_p(ab) A_b(ap) A_p(ap) K_b(ab) K_p(ab) K_b(ap) K_p(ap) Q_b(ab) Q_p(ab) Q_b(ap) Q_p(ap)
0
A_b           2 1     2 1
A_p                 1       1
A_pb               2       2 0
A_pp               -1       -1
K_b   -2 1             2 1
K_p         -1               1
K_pb       -2               2
K_pp       -1               -1
Q_b   -2 1     -2 1
Q_p         -1       -1
Q_pb       -2       -2
Q_pp       -1       -1
$A = \quad \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 2 & 1 & 0 & 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 \\ 0 & -2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\ 0 & -2 & 1 & 0 & 0 & -2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -2 & 0 & 0 & 0 & -2 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\ \end{bmatrix}$

We could even further reduce this by eliminating dominated strategies:

P1/P2 0 A_b(ab) A_b(ap) A_p(ap) K_b(ab) K_p(ab) K_b(ap) K_p(ap) Q_b(ab) Q_p(ab) Q_b(ap) Q_p(ap)
0
A_b         2 1     2 1
A_p               1       1
A_pb             2       2 0
K_p       -1               1
K_pb     -2               2
K_pp     -1               -1
Q_b   1     -2 1
Q_p       -1       -1
Q_pp     -1       -1

For simplicity, let’s stick with the original $A$ payoff matrix and see how we can solve for the strategies and value of the game.

Our linear program is as follows (the same as before, but now our $E$ and $F$ matrices have constraints based on the game tree and the payoff matrix $A$ is smaller, evaluating when player strategies coincide and result in payoffs, rather than looking at every possible set of strategic options as we did before:

$\min_{y} \max_{x} [x^TAy]$ $\text{Such that: } x^TE^T = e^T, x \geq 0, Fy = f, y \geq 0$

Rhode Island Hold’em Resutls with other poker agents playing worse strategies exploitable

Now we have shown a way to solve games more efficiently based on the structure/ordering of the decision nodes (which can be expressed in tree form).

Mixed vs behavioral strategies

Updated: