| B) Two players, P and T, independently select a number between 0 and 9
inclusive. If they agree to within �1, then P gains, while T loses.
If they disagree then T wins, while P loses. What is the winning strategy?
P's optimal strategy is to choose from {1,4,5,8} with equal probabilities,
while T's optimal strategy is to randomly choose from {0,3,6,9}. P's strategy
ensures P will win at least 1/4 of the time, and T's strategy ensures T a win
3/4 of the time.
To see this, form the payoff matrix, and note that P always finds 1 better
than 0, and 8 better than 9, while T prefers 0 over 1, and 9 over 8. Crossing
out these columns and rows yields a smaller payoff matrix. Continuing in
this way, the payoff matrix (for P) is eventually reduced to:
t_0 t_3 t_6 t_9
p_1 +1 -1 -1 -1
p_4 -1 +1 -1 -1
p_5 -1 -1 +1 -1
p_8 -1 -1 -1 +1
At this point, the problem yields to solution by a symmetry argument.
|
| re .2, .-1
P chooses i with probability p_i, and T chooses j with probability t_j.
Here's the payoff matrix; the value (to P) for each possible combination:
T
(0) (1) (2) (3) (4) (5) (6) (7) (8) (9)
(0) +1 +1 -1 -1 -1 -1 -1 -1 -1 -1
(1) +1 +1 +1 -1 -1 -1 -1 -1 -1 -1
(2) -1 +1 +1 +1 -1 -1 -1 -1 -1 -1
(3) -1 -1 +1 +1 +1 -1 -1 -1 -1 -1
P (4) -1 -1 -1 +1 +1 +1 -1 -1 -1 -1
(5) -1 -1 -1 -1 +1 +1 +1 -1 -1 -1
(6) -1 -1 -1 -1 -1 +1 +1 +1 -1 -1
(7) -1 -1 -1 -1 -1 -1 +1 +1 +1 -1
(8) -1 -1 -1 -1 -1 -1 -1 +1 +1 +1
(9) -1 -1 -1 -1 -1 -1 -1 -1 +1 +1
On each play, P expects to win Sum p_i * t_j * M[i,j].
i,j
Suppose P has the strategy {p_0, p_1, ..., p_9}. If p_0 is non-zero, P can
do better (or at least as well) with the strategy {0, p_0+p_1, p_2, ..., p_9}
(notice that M[1,j] >= M[0,j] for all j). Thus, we see that P's optimal
strategy has p_0 = 0. Similarly, p_9 is zero, because P should prefer playing
8 rather than playing 9. Because P (being so smart) will never play 0 or 9,
we can remove those two rows from the payoff matrix.
Also, T always prefers playing 9 rather than 8, and 0 over 1 (remember that
T wants to *minimize* the payoff, and note that M[i,0] <= M[i,1] for all i).
So we may also remove those two columns, viz:
T
(0) (2) (3) (4) (5) (6) (7) (9)
(1) +1 +1 -1 -1 -1 -1 -1 -1
(2) -1 +1 +1 -1 -1 -1 -1 -1
(3) -1 +1 +1 +1 -1 -1 -1 -1
P (4) -1 -1 +1 +1 +1 -1 -1 -1
(5) -1 -1 -1 +1 +1 +1 -1 -1
(6) -1 -1 -1 -1 +1 +1 +1 -1
(7) -1 -1 -1 -1 -1 +1 +1 -1
(8) -1 -1 -1 -1 -1 -1 +1 +1
P's optimal strategy must be prepared to compete with T's optimal strategy,
so P considers this new payoff matrix, and sees that M[3,j] >= M[2,j] and
M[6,j] >= M[7,j], so P can wisely decide to never play 2 or 7.
For T's part, he sees that M[i,0] <= M[i,2], and M[i,9] <= M[i,7], so T
wisely decides to never play 2 or 7. This reduces the payoff matrix to:
T
(0) (3) (4) (5) (6) (9)
(1) +1 -1 -1 -1 -1 -1
(3) -1 +1 +1 -1 -1 -1
P (4) -1 +1 +1 +1 -1 -1
(5) -1 -1 +1 +1 +1 -1
(6) -1 -1 -1 +1 +1 -1
(8) -1 -1 -1 -1 -1 +1
Continuing; because M[4,j] >= M[3,j], P need never consider playing 3;
because M[5,j] >= M[6,j], P never plays 6; because M[i,3] <= M[i,4],
T never plays 4; because M[i,6] <= M[i,5], T never plays 5. We now have:
T
(0) (3) (6) (9)
(1) +1 -1 -1 -1
P (4) -1 +1 -1 -1
(5) -1 -1 +1 -1
(8) -1 -1 -1 +1
P now chooses values for p_1, p_4, p_5, and p_8 (which must sum to 1).
P could choose p_1 = p_4 = p_5 = 1/3, and p_8 = 0, but that allows T
the very profitable strategy of always choosing 9. Instead, P wants to
choose a strategy that maximizes his payoff, regardless of what strategy
T may choose.
P's expectation E = Sum p_i * t_j * M[i,j], which can be rewritten as:
i,j
E = -1 + 2*( p_1 * t_0 + p_4 * t_3 + p_5 * t_6 + p_8 * t_9 )
Suppose that the above t_j values are optimal (for T). This means that
for the p_i values to be optimal, any small adjustment in the t_j values
would cause P to do better; i.e.,
E' = -1 + 2*( p_1 * (t_0+d) + p_4 * (t_3-d) + p_5 * t_6 + p_8 * t_9 ) => E
So, expanding this on both sides, and cancelling, we see that
2 * p_1 * d - 2 * p_4 * d >= 0 (or t_3 = 0, or t_0 = 1)
Now this same idea holds, even if we adjust t_0 and t_3 the other way:
E' = -1 + 2*( p_1 * (t_0-d) + p_4 * (t_3+d) + p_5 * t_6 + p_8 * t_9 ) => E
So,
-2 * p_1 * d + 2 * p_4 * d >= 0 (or t_3 = 1, or t_0 = 0)
Thus we see that (unless t_0 or t_3 are at a boundary),
2 * p_1 * d - 2 * p_4 * d = 0, so that p_1 - p_4 = 0
Proceeding in this way, we can determine that p_1 = p_4 = p_5 = p_8. Since
these sum to 1, p_1 = p_4 = p_5 = p_8 = 1/4. Similarly for the t_j values
(since T wants to protect himself from a clever P), we have t_0 = t_3 = t_6
= t_9 = 1/4.
|
| >A) Two players, P and T, independently select a number between 0 and 9
>inclusive. If they agree to within �1, they each gain, otherwise they lose.
>What is the winning strategy?
Previous discussion focussed on game B), but A) yields to a similar
approach, even though it's not a zero-sum game. The payoff matrix is:
0123456789
+----------
0|1100000000
1|1110000000
2|0111000000
3|0011100000
4|0001110000
5|0000111000
6|0000011100
7|0000001110
8|0000000111
9|0000000011
No player will say 0 or 9, since by saying 1 or 8 they cover a strict
superset of possible guesses by the other player.
12345678
+--------
1|11000000
2|11100000
3|01110000
4|00111000
5|00011100
6|00001110
7|00000111
8|00000011
But if 0 and 9 are impossible, then 1 is dominated by 2, and 8 by 7.
Continuing this process, for a couple more iterations, the matrix becomes:
45
+--
4|11
5|11
and so the two players, if playing rationally and if they believe that the
other is playing rationally, are certain to win, since they will both say 4 or
5.
C) A third game we could imagine is the opposite of A). Both players are
trying to *avoid* matching or being adjacent to the other player's guess.
Here, 0 dominates 1, and 2 dominates 8.
0123456789
+----------
0|0011111111
1|0001111111
2|1000111111
3|1100011111
4|1110001111
5|1111000111
6|1111100011
7|1111110001
8|1111111000
9|1111111100
reduces immediately to
02345679
+--------
0|01111111
2|10011111
3|10001111
4|11000111
5|11100011
6|11110001
7|11111001
9|11111110
Similarly, 2 dominates 3, and 7 dominates 6:
024579
+------
0|011111
2|101111
4|110011
5|110011
7|111101
9|111110
This is as far as the simplification goes, but now each player can
guarantee himself a win 80% of the time, by picking evenly from
{0, 2, (4or5), 7, 9}. In practice, if two players are playing this game
repeatedly, they can each stabilize on a single response, as soon as they
have won the game once, and thus approach 100% wins. However, if each player
is being told neither his opponents guess, nor indeed whether he won or not,
80% is the best 'blind' payoff one can get.
Regards,
Andrew.
|