ORIE4350 Game Theory - Incomplete CS Notes @ Cornell

Rational Decision Making¶

The Single-Person Decision Problem¶

Actions, Outcomes, and Preferences¶

Decision Problem and Preference Relation¶

A decision problem consists of three features:

Actions are all the alternatives from which the player can choose. $A$
Outcomes are the possible consequences that can result from any of the actions. $X$
Preferences describe how the player ranks the set of possible outcomes. The preference relation $\succsim$ describes the player’s preferences, and the notation x $\succsim$ y means “x is at least as good as y.” This is an equivalence relation.

Payoff Function¶

We can quantify the preference relation by introducing the payoff function $u: X \to \mathbb{R}$ . Payoff is an ordinal construct: It is only used for ranking and does not carry any practical meaning.

if $x(a)$ is the outcome resulting from action $a$ , then the payoff from action $a$ is given by $v(a) = u(x(a))$ , the payoff from $x(a)$ . Function $v: A \to \mathbb R$ .

Rational Choice Paradigm¶

Rational Choice Paradigm says that all players will choose rationally - they will choose the best strategy. A player is rational if he chooses an action $a^* \in A$ that maximizes his payoff.

a^* \in A \text{ is chosen when } \forall a \in A, v(a^∗) \ge v(a)

(1)

This paradigm needs several assumptions: The player fully understands the decision problem by knowing:

$A$ , all possible actions
$X$ , all possible outcomes
how each action leads to which outcome
his rational preferences (payoffs) over outcomes.

Uncertainty and Time¶

Evaluating Random Outcomes¶

Expected Payoff¶

Imagine a case where results are non-deterministic. That is, after we take an action $a$ , the payoff we get is drawn from a probability distribution. For example, if we choose to buy a lottery, there is a slight chance we win a big amount of money and a big chance that money is just gone. On the other hand, if we choose not to buy, our money remains unchanged.

To describe such a situation, we will need to calculate the expected payoff from action $a$ : $E[u(x) | a]$ .

The rational paradigm changes accordingly: A rational player choose an action that maximizes the expected payoff.

Risk Attitude¶

Consider two actions: one guarantees to give you 9 yuan, and the other gives you 4 yuan with probability $\frac{7}{12}$ or 16 yuan with probability $\frac{5}{12}$ .

A player is

risk neutral if he is willing to exchange any sure monetary reward with any lottery that promises the same expected reward. That is, this player has the payoff function $u(x) = x$ . In our example, $\frac{7}{12}u(4)+\frac{5}{12}u(16) = u(9)$ .
risk aversive if he prefers not to be exposed to risk for the same expected monetary reward, so $\frac{7}{12}u(4)+\frac{5}{12}u(16) \lt u(9)$ .
risk loving if he strictly prefers any lottery that promises the same expected monetary reward $\frac{7}{12}u(4)+\frac{5}{12}u(16) \gt u(9)$ .

In real world, most people are risk aversive.

Static Games of Complete Information¶

Introduction¶

A game of complete information requires that the following be common knowledge among all the players of the game:

all the possible actions of all the players
all the possible outcomes
how each combination of actions of all players affects which outcome will materialize
the preferences of each and every player over outcomes

A game is static if each player simultaneously and independently chooses an action.

Normal-Form Games with Pure Strategies¶

A normal-form game consists of three features:

A set of players - $N = \{1, 2, ..., n\}$
Collection of sets of pure strategies - $\{S_1, S_2, ..., S_n\}$ , where $S_i = \{s_1, s_2, ..., s_m\}$
A set of payoff functions for each player that give a payoff value to each combination of the players’ chosen actions - $v_i: S_1 \times S_2 \times \dots \times S_n \to \mathbb R$

Strategy is a plan of actions describing what action a player takes in every possible situation of the games. It is a set of actions not necessarily specific to a single task, but for a goal possibly containing multiple tasks and is influenced by other player’s actions.

Pure strategy indicate that the players choose actions deterministically, not stochastically.

Since the players choose actions once and for all and simultaneously, we can equate pure strategies with actions in this chapter, but this wordy definition will be useful in later chapters.

Rationality and Common Knowledge¶

Define $S_{-i}$ as the set of set of all the strategy sets of all players who are not player $i$ , so $S_{-i}\equiv S_1 \times \dots S_{i-1} \times S_{i+1} \times \dots S_n$ and $(s_i, s_{-i}) = (s_1, \dots, s_{i-1}, s_i, s_{i+1}, \dots, s_n)$ .

Dominance¶

For a player $i$ , $s'_i \in S_i$ is strictly dominated by $s_i\in S_i$ when for any possible combination of the other player’s strategy, $s_{-i} \in S_{-i}$ , payoff from $s'_i$ is strictly less than that from $s_i$ :

\forall s_{-i} \in S_{-i}, \; v_i(s_i, s_{-i}) \gt v_i(s'_i, s_{-i})

(2)

$s_i\in S_i$ is a strictly dominant strategy for $i$ if every other strategy of $i$ is strictly dominated by it:

\forall s_{-i} \in S_{-i}, \; \forall s'_i \in S_i, \; v_i(s_i, s_{-i}) \gt v_i(s'_i, s_{-i})

(3)

From the two definitions above, observe that:

A rational player will never play a strictly dominated strategy.
A rational player will always play the strictly dominant strategy if he has one.

Iterated Elimination of Strictly Dominated Pure Strategies¶

We will now on make the following assumptions that:

All players are rational.
Players have complete information about the game.
The facts above are common knowledge.

Now we introduce IESDS - iterated elimination of strictly dominated strategies. It is just what its name tells: it iteratively eliminates the strictly dominated strategies of any player. A formal description of the algorithm is as follows:

$\forall i$ , define $S^0_i = S_i$ .
for (k = 0; ; k++):
stop_iteration = True
for (i = 1; i <= N; i++):
$\hat S^{k}_i$ = filter(fun s: strictly_dominated(s), $S^{k}_i$ )
if $\hat S^{k}_i \not= \emptyset$ , stop_iteration = False
$S^{k+1}_i =$ $S^{k}_i$ - $\hat S^{k}_i$
if stop_iteration: return $S^k$

Eliminate continuous strategies by truncating the interval we can choose this strategy from.

Eliminate discrete strategy by deleting a column or row of the matrix.

Beliefs, Best Response, and Rationalizability¶

Best Response and Beliefs¶

IESDS is based on eliminating actions that players would never play. An alternative is to ask: what possible strategies might players choose to play given other players strategies? In this case, he has to choose a best strategy as a response to the strategies of his opponents.

The strategy $s_i \in S_i$ is player’s $i$ ’s best response to his opponents’ strategy $s_{-i} \in S_{-i}$ when

\forall s'_{i} \in S_{i}, \; v_i(s_i, s_{-i}) \ge v_i(s'_i, s_{-i})

(4)

By introducing this notion of best strategy in response to other’s strategies, we are in effect introducing the idea of belief: What should I play if I believe the others will play $s_{-i}$ ? More formally, a belief is a strategy profile of his opponents’ strategies $s_{-i} \in S_{-i}$ .

Strategy profile is a vector containing strategies from some players.

Note that t

There can be more than one best responses, because we used the weekly preference relation in the definition.
A rational player who believes his opponents are playing $s_{-i}$ will always choose a best response to $s_{-i}$ .
If we compare best response to strictly dominant strategy, the latter is the best in any given situation, while the former is the best in a particular situation. Therefore, a strictly dominated strategy cannot be a best response to any $s_{-i}$ . On the other hand, a strictly dominant strategy is a best response to all $s_{-i}$

The best response correspondence $BR_i(s_{-i}) \subseteq S_i$ for player $i$ to a belief $s_{-i}$ is all the best responses he has to $s_{-i}$ . That is, the set of strategies he would play if he knows that the others play $s_{-i}$ .

Rationalizability¶

More generally, the set of rational strategies for player $i$ is the set $\cup_{s_{-i} \in \text{i's beiliefs}} BR_i(s_{-i})$ . Remember rational strategies are define above: the strategies that maximize payoff.

Next, we introduce the process of iterative rationalizability. Similar to IESDS, we iteratively delete strategies, but this time instead of asking “What would a rational player not do?” we ask “What might a rational player do?” A rational player will select only strategies that are a best response to some profile of his opponents, so we eliminate those strategies that are not best response to any beliefs.

We will start with belief = $S_{-i}$ for all players (one believes that the others can play anything), and then we will update iteratively according to the above algorithm.

This strategy profiles that remain after this iterative procedure are called rationalizable strategies.

E.g. the Battle of Sex: rationality failing
$> BR_1(\{O\}) = \{O\} \\ > BR_1(\{F\}) = \{F\} >$
(5)
We see the rational strategies are all the strategies. We cannot go any further. Basically any thing can happen, 1 still can play anything. So we cannot eliminate anything.

Finally, we will emphasize that Rationalizable equilibrium $\subseteq$ IESDS equilibrium. This is a direct result of definition: recall that IESDS only removes strictly dominated strategy, which is a more strict requirement than removing a strategy that is never a best response. Therefore, we always remove more (or equal) strategies in rationalizability elimination.

Nash Equilibrium¶

The reason why our approach failed in the Battle of Sex is that we cannot determine what exactly they will play - our beliefs can be wrong. What happens if what we believe is always true?

The pure-strategy profile is $(s_1^*, s_2^*, \dots, s_n^*)$ is a Nash Equilibrium if all the players’ beliefs are correct (each player plays exactly the way others think they will play):

\forall i \in N, \; s_i^* \in BR_i(s_{-i}^*)

(6)

E.g. the Battle of Sex with Nash
So if we analyze the problem under Nash Equilibrium assumptions, (O,F) is not possible in the battle of sex: If 1 thinks 2 will play F (and 2 actually does, because 1’s belief is true), 1 will play F, not O. A similar reasoning makes (F,O) also impossible. This leaves us with two Nash Equilibrium: (O, O) and (F, F).

Find a Nash Equilibrium: simply take the intersection of all player’s Best Responses to each of their beliefs. $\bigcap (BR_i(s_{-i}), s_{-i})$ . For a continuous example, refer to Tadelis 5.2.2 The Tragedy of the Commons; for a discrete example, refer to 5.1.1 Pure-Strategy Nash Equilibrium in a Matrix.

Strictly Dominant Equilibrium $\subseteq$ Nash Equilibrium $\subseteq$ Rationalizable Strategies $\subseteq$ IESDS

In this section, we are in fact introducing some kind of a “self-fulfilling beliefs”, “norm of society”. If I believe something, people will do that way.

Mixed Strategies¶

Definitions¶

Consider discrete strategies set $S_i = \{s_{i1}, \dots, s_{im} \}$ for player $i$ and each strategy has a probability to be played.

We represent a mixed strategy using $\sigma_i \in \Delta S_i$ where $\sigma_i = \{\sigma_i(s_{i1}), \dots, \sigma_i(s_{im})\}$ meaning $\sigma_i$ is a probability distribution associated with each element in $S_i$ . The player plays $s_{ik}$ with probability $\sigma_i(s_{ik})$ .

$\Delta S_i$ is the simplex of $S_i$ It is the set of all probability distribution over $S_i$

Note: Pure strategies is a special case of mixed strategies.

A pure strategy $s_{i}$ is in the support of $\sigma_i$ when it is played with a non-zero probability, so $\sigma_i(s_{i}) \gt 0$

With a continuous strategy set, instead of the probability distribution function $\sigma_i$ , we have culmulative distribution function $F_i$ and its differentiate, the probability density function $f_i$ , over the continuous strategy set. $s_{i} \in S_i$ is in the support of $F_i$ if $f_i(s_i) \gt 0$

A belief for player $i$ is a probability distribution $\pi_i\in \Delta S_{-i}$ over the strategies of his opponents. This is generally not the same as $\sigma_{-i}$ , which describes what the other players will actually play. One is the prob dist of one’s belief, while the other is the prob dist of what one will actually play. We should be very familiar with this distinction from discussions in previous chapters. However, note $\pi_i=\sigma_{-i}$ in a Nash setting.

The expected payoff of player $i$ if he chooses pure strategies $s_i \in S_i$ and the other players play the mixed strategy $\sigma_{-i} \in \Delta S_{-i}$ , then

\begin{align} v_i(s_i, \sigma_{-i}) &= \underset{s_{-i} \sim \sigma_{-i}}{\mathbb E} v_i(s_i, s_{-i})\\ &= \sum_{s_{-i} \in S_{-i}} v_i(s_i, s_{-i}) \; \sigma_{-i}(s_{-i}) \end{align}

(7)

If player $i$ chooses a mixed strategy $\sigma_i \in S_i$ ,

\begin{align} v_i(\sigma_i, \sigma_{-i}) &= \underset{\substack{s_i \sim \sigma_i \\ s_{-i} \sim \sigma_{-i}}}{\mathbb E} v_i(s_i, s_{-i})\\ &= \sum_{s_{i} \in S_{i}} \sum_{s_{-i} \in S_{-i}} v_i(s_i, s_{-i}) \; \sigma_{i}(s_{i}) \; \sigma_{-i}(s_{-i}) \end{align}

(8)

Mixed-Strategy Nash Equilibrium¶

We now generalize the concept of Nash equilibrium to mixed strategies:

The mixed strategy profile $\sigma^* = (\sigma_1^*, \dots, \sigma_n^*)$ is a Nash equilibrium if for every player $i\in N$ , $\sigma_i^*$ is a best response to $\sigma_{-i}^*$

A strategy that is not a best response will not be in the support of $\sigma^*$ , so before looking for Mixed-Strategy Nash Equilibrium, we should first perform rationality elimination.

Proposition: Given a Nash equilibrium $\sigma^* = (\sigma_1^*, \dots, \sigma_n^*)$ , $s_i$ and $s_i'$ are in the support of $\sigma_i^*$ , then $v_i(s_i, \sigma_{-i}^*) = v_i(s_i', \sigma_{-i}^*) = v_i(\sigma_i^*, \sigma_{-i}^*)$

Proof.

Take arbitrary $s_i, s_i' \in S_i$
AFOC, $v_i(s_i, \sigma_{-i}^*) \gt v_i(\sigma_i^*, \sigma_{-i}^*)$ . We can then conclude that $\sigma_i^*$ is not a best response to $\sigma_{-i}^*$ , so not inside Nash equilibrium. Contradiction. Therefore, we can concldue that ...
AFOC, $v_i(s_i, \sigma_{-i}^*) \lt v_i(\sigma_i^*, \sigma_{-i}^*)$ . $$
$\begin{align} > v_i(\sigma_i^*, \sigma_{-i}^*) > &= \sum_{s_i \in support(\sigma_i^*)} \sigma_i^*(s_i) v_i(s_i, \sigma^*_{-i}) \\ > &\lt \sum_{s_i \in support(\sigma_i^*)} \sigma_i^*(s_i) v_i(\sigma_i^*, \sigma^*_{-i}) \\ > &= v_i(\sigma_i^*, \sigma^*_{-i}) \sum_{s_i \in support(\sigma_i^*)} \sigma_i^*(s_i) \\ > &= v_i(\sigma_i^*, \sigma^*_{-i}) > \end{align}$
(9)
$$
Therefore, $v_i(s_i, \sigma_{-i}^*) = v_i(\sigma_i^*, \sigma_{-i}^*)$ and the same holds for $s_i'$ by symmetry.

Take away: In a Nash equilibrium, a player who plays a mixed strategy does not gain any more direct payoff. The payoff of all pure strategies that she is mixing over are all the same (so playing one pure strategy is no different from playing another pure strategy and so no different from playing a mixed of these pure strategies). The player plays a mixed strategy to make sure all the other players also play a mixed strategy so they can arrive at Nash Equilibrium.

In short, player 1 plays a mixed-strategy not to directly gain more, but so that player 2 is indifferent between the pure strategies in the support of $\sigma_2^*$ so that player 1 can gain indirectly.

We can use this proposition that $v_i(s_i, \sigma_{-i}^*) = v_i(s_i', \sigma_{-i}^*)$ to find the Nash equilibrium mixed strategy. Refer to HW4 for some examples.

Nash’s Existence Theorem¶

Any n-player normal-form game with finite strategy sets $S_i$ for all players has a (Nash) equilibrium in mixed strategies.

We will not write down a proof of this theorem, but you can find proof in lecture notes 10, 11. Refer to HW4 Tadelis 6.12(e) for an application of this theorem.

Dynamic Games of Complete Information¶

We always use backward induction to find Nash Equilibrium of Dynamic Games.

Introduction¶

The Extensive-Form Game¶

In this section, we introduce sequential rationality: Earlier movers will take into account the rationality of players who move later in the game. Note it is not the chronological order of play that matters, but what players know when they make their choices.

A game tree describes this sequential property. There is $X$ , a set of nodes associated with each tree. Among them, we have

root $x_0$ and set of leaves / terminals $Z$ . Leaves denote the the final payoffs of each player. The game proceeds from the root to the leaf.
Every node $x$ that is not a leaf is assigned either to Nature or to a player $i$ with the action set $A_i(x)$ . Function $i(x)$ specifies which player (Nature is also a player) will play at moment $x$ .
Edges going from a node correspond to $A_i(x)$ - the possible actions of the player associated with that node.

We need to specify what players know when it’s their turn. Therefore, we partition the nodes that are labelled with player $i$ into information sets $h_i$ . Every player has a collection of information sets $H_i$ associated with them. Mathematically, $H_i$ is a partition of $\{v | v\in X, i(v)=i\}$ . Information Sets has the following properties:

If $h_i$ is a singleton that includes only $x$ then player $i$ who moves at $x$ knows that he is at $x$ .
A player cannot distinguish between nodes that are in the same information set: If $x\not=x'$ and both $x, x' \in h_i$ , then player $i$ who moves at $x$ does not know whether he is at $x$ or $x'$
We also assume that if $x\not=x'$ and both $x, x' \in h_i$ , then $A_i(x) = A_i(x')$

Remember we defined Game of Complete Information at the beginning of part 2. This definition is enough for a normal-form game, but not suffices our extended form of game. Therefore, we introduce:

Game of Perfect Information: a game of complete information in which every information set is a singleton and there are no moves of Nature. Therefore, at every step of the game, the player knows exactly where he is by knowing what happened before he chooses.
Game of Imperfect Information: a game in which some information sets contain several nodes or in which there are moves of Nature. Therefore, at some steps of the game, some players do not know where they are because they either don’t know other players’ (endogenous uncertainty) or nature’s (exogenous uncertainty) choice.

Strategies¶

                    x0                    P1 chooses first
                  /    \
               O /      \ F
                /        \
               x1         x2              P2 chooses second
              /  \       /  \
           o /    \ f  o/    \ f
            /      \   /      \
           2       0   0       1          P1 payoff
           1       0   0       2          P2 payoff

A pure strategy of player $i$ is a strategy profile for all information sets associated with $i$ . Intuitively, for every information set $i$ can be in, a pure strategy directs $i$ what action to paly. Formally, it is mapping $s_i: H_i \to A_i$ that to every information set $h_k \in H_i$ assigns an action $s_i(h_k) \in A_i$ . Therefore, if player $i$ can be in information set $[h_1, h_2]$ , a pure strategy $s$ says $[s(h_1), s(h_2)]$ , meaning $i$ will play $s(h_1)$ when at information set $h_1$ and play $s(h_2)$ when at information set $h_2$ . Denote $S_i$ as the set of all pure strategies.

As before, a mixed strategy for player $i$ is a probability distribution over his pure strategies. Therefore, the player selects a plan randomly before the game is played and then follows a particular pure strategy.

A behavioral strategy $\sigma_i: H_i \to \Delta A_i(h_i)$ specifies for each information set $h_i \in H_i$ an independent probability distribution over $A_i(h_i)$ , so $\sigma_i(a_i(h_i))$ is the probability of player $i$ plays $a_i \in A_i(h_i)$ when at information set $h_i$ . Therefore, the player chooses what to play independently at each state as the game unfolds.

Compare pure/mixed strategy with behavioral strategy in the context of Battle of the Sex with player 1 chooses first, then player 2. Player 2’s pure strategy is $oo, of, fo, ff$ , where $xy$ means to play $x$ after 1 plays $O$ and plays $y$ after 1 plays $F$ .

We can think of mixed strategy as joint distribution: $\sigma(ff) = Pr[\text{2 f after 1 O and 2 f after 1 F}]$
We can think of behavioral strategy as conditional distribution: $\sigma^F(o) = Pr[\text{2 o | 1 F}]$

Behavioral strategy and mixed strategy can be easily translated into each other: $\sigma(ff) + \sigma(of) = \sigma^F(f)$ . Note this only holds with the mild assumption of perfect recall (Tadelis Definition 7.7, in the example of absent minded driver, A randomized strategy will never receive the payoff of 4 because P1 will play the same actions in both worlds, while behavioral strategy flips a coin at every node. )

Also note that if a player has multiple turns, these distributions are assumed to be independent. Therefore, behavioral strategies and mixed strategies are not completely the same thing.

Nash Equilibrium¶

To find a Nash equilibrium, we just have to list all pure strategies of each player in the matrix and find it as we used to do. (Tadelis 7.2.3)

Credibility & Sequential Rationality¶

From the sequential Battle of Sex, we have 3 Nash Equilibrium if we consider it in the normal form: (O, oo) (O, of) (F, ff). However, is (F, ff) really rational? It seems like P1 will not choose O over F because he is threatened by player 2 choosing ff. But in fact, think about whether this ff threat is valid: if 1 suddenly switches to O, ff will be an irrational choice. (the first f in ff says P2 plays f when P1 plays O, but yeah think about this. This is not a rational move. $BR_2(O) = o$ is not $f$ ) Therefore, we say that this ff threat is incredible.

Here, we introduce a new concept that helps us eliminate such incredible Nash Equilibrium.

Sequential Rationality & Backward Induction¶

Sequential Rationality: a player use strategies that are optimal at every information set in the game tree - Given strategies $\sigma_{-i} \in \Delta S_{-i}$ of $i$ ’s opponents, we say that $σ_i$ is sequentially rational when $i$ is playing a best response to $\sigma_{-i}$ in each of his information sets.

In $\{x_1\}$ , $BR_2(\{x_1\}) = o$ ; In $\{x_2\}$ , $BR_2(\{x_2\}) = f$ ; so the sequentially rational best response from 2 is $of$ : Namely, if 1 plays $O$ , 2 plays $o$ , and if 1 plays $F$ , 2 plays $f$ . $oo$ is not optimal at $x_2$ and $ff$ is not optimal at $x_1$

1 does the same reasoning, so he will play $BR_1(of) = O$ because $O$ gives him a reward 2, while $F$ will give him a reward 1.

Backward Induction: We start at the leaves of the game tree and work up the tree, reasoning about best responses.

This is to model the fact that since every agent knows the entire game tree, the player that chooses early can reason about what the later players will choose based on his decisions. Once he knows what the later players choose, he can chooses to play what maximizes his own payoff.

Backward Induction only works when at each point a decision is made, we know exactly where we are in the tree. That is, if all information sets are singletons (contain exactly 1 node of the tree). That is, we are playing a game of perfect information.

Any finite game of perfect information has a backward induction solution that is sequentially rational. This is a pure strategy Nash Equilibrium. Furthermore if no two terminal nodes prescribe the same payoffs to any player then the backward induction solution is unique.

Subgame-Perfect Nash Equilibrium¶

Definition¶

In this section, we look at the games of imperfect information (we don’t know exactly where we are in the tree). In such games, we cannot perform backward induction, because we may encounter multiple parents when working up the tree.

Subgame is a subtree of the game tree s.t. no information set contains nodes both in and not in the subtree. Note the game itself is the subgame of itself.

A behavioral strategy $\sigma^*$ is a subgame-perfect Nash equilibrium when for every subgame $G$ , if we only look at this subgame, $\sigma^*$ is a Nash equilibrium in G.

The difference between a Nash Equilibrium and a Subgame-Perfect Nash Equilibrium is that what satisfies a NE but not a SGPNE is a strategy profile (for P2) that is only BR to what P1 plays when he believes P2 will play the current strategy (this is just the definition of NE) , but not a BR to all P1’s possible strategies (this is just the definition of SGPNE).

Example: Voluntary Battle of Sex¶

In voluntary battle of sex, we let P1 first choose whether it plays or not. If it doesn’t play, both players just get a payoff of 1.5 If it chooses to play, assume 2 chooses simultaneously with 1, so 2 doesn’t know what 1 chose.

                         x0               P1 chooses to play or not
                       /    \
                  Yes /      \ No
                     /        \
                    x1        1.5         If P1 wants to play, then it chooses first
                  /    \      1.5
               O /      \ F
                /        \
             -----------------
             | x2         x3 |            P2 chooses second
             -----------------
              /  \       /  \
           o /    \ f  o/    \ f
            /      \   /      \
           2       0   0       1          P1 payoff
           1       0   0       2          P2 payoff

In this example, we have two subgames: the classic battle of sex is a strict subgame of the voluntary battle of sex and this voluntary battle of sex is a subgame of itself. To solve the voluntary battle of sex, we first look at the strict subgame classic battle of sex. We know the Pure Strategy Nash equilibrium for this subgame is $(O,o)$ and $(F,f)$ . Now we only have to consider these two cases. These two cases each have payoff $(2,1)$ and $(1,2)$ . If we have another mixed strategy Nash Eq $(\sigma_1, \sigma_2)$ , we just have to calculate its corresponding expected payoff and go up one level to $x_0$ .

From there we can apply backward induction: back at $x_0$ , this is player 1’s state so he player 1 gets to decide: $BR_1(O,o) = Y, \; BR_1(F,f) = N$ . Therefore, a rational player should play $(YO,o)$ or $(NF, f)$ . These two are the only Pure Nash Equilibriums that are rational for every subgame.

Just to revisit our definition, a behavioral strategy profile specifies for each player, an action for each of its information set. In this game, player 1 has information set $\{x_0\}, \{x_1\}$ and player 2 has information set $\{x_2, x_3\}$ . For the strategy profile $(YO,o)$ , 1 plays $Y$ at $\{x_0\}$ and $O$ at $\{x_1\}$ , 2 plays $o$ at $\{x_2, x_3\}$ .

Example: Stackelberg Competition (Variant of Cournot Duopoly)¶

Say P1 announces his price, and P2 decides based on that. So when P2 makes the decision, he already knows (not believes) quantity set by P1. P2 has a continuous information set, each being a particular $q_1$ P1 announces. Based on our definition of a behavioral strategy, P2 must react to each of his information set, so a strategy profile for P2 is $\frac{90-q_1}{2}$ . P1 uses this fact and reason backward and reach a single value 45.

It is curious that the player who announces first gets a better option: he has a unique value for Subgame Perfect Nash Eq. We should note this is because by announcing first in a game of complete information, P1 can reason about what P2 will do and use that information to improve his own decision. However, P2 cannot reason about P1’s behavior, because he will be told what P1 plays, he doesn’t get to believe what P1 plays.

Multistage Game¶

Introduction¶

A multistage game has $T$ stages. At each stage $i = 1, \dots, T$ , the players play a static game of complete information. (Games can be different for different stages) with the following rules:

After playing game in stage $i$ and before moving on to the next stage $i+1$ , all actions played of this current stage are revealed to all players
These above rules about multistage game is common knowledge.

To calculate payoff, we will assume a discount factor $\delta$ , so that payoff $v$ one period in the future is worth $\delta v$ now. Higher discount factor $\delta$ means that the players are more patient and care more about future payoffs.

The total payoff of a player when he plays a strategy for this series of games is the discounted sum of payoffs this player can get at each stage: ( $v_i^t$ denotes the player $i$ ’s payoff at the game of stage $t$ )

v_i = \sum^T_{t=1} \delta^{t-1}v^t_i

(10)

Strategies & Subgame Perfect Equilibrium¶

In this section, we will use the Prisoner-Revenge Game:

Stage1: Prisoner’s Dilemma, with only 1 Pure Strategy Nash Eq.
Stage2: Revenge Game, with 2 Pure Strategy Nash Eq.

For each 2nd (last) stage subgame, its Nash Eq are the same Nash Eq when you play it as a standalone game. That is because: Given a static game $G$ , define a new static game $\tilde G$ that is exactly the same except the payoffs are modified in the following way: $\tilde{v_i}(s_i, s_{-i}) = a_iv_i(s_i, s_{-i}) + b_i$ for some $a_i \gt 0, b_i \in \mathbb R$ . Pure Nash Equilibriums of this new game $\tilde G$ are exactly the same as the Pure Strategy Nash Equilibrium of original game $G$ . This is because such modification preserves the preference relation and it’s obvious that two games with payoffs having the same preference relation should have the same Nash Eq.

In this Prisoner-Revenge Game, two players each have 2 actions at stage 1, so there are a total $2\times 2 = 4$ results from playing stage 1. Each result leads to a subgame in stage 2, so there are 4 subgames in stage 2. As we just analyzed, the PSNE in any subgame of stage 2 are the same as if we play this subgame as an independent standalone game. Therefore, each subgame of stage 2 has two PSNE: $(L, l)$ and $(G, g)$ .

To be a SGPNE, it needs to be a NE at each subgame. We first look at each 2nd-stage subgame.

Arbitrarily choose one of the NE for each 2nd-stage subgame. There are 4 2nd-stage subgames and each subgame has 2 NE, so there are a total $2^4 = 16$ combinations of strategies to consider. For any SGPNE, the actions for these 4 2nd-stage subgame, must be one of the 16 combinations (or it is not a NE at at least one of these subgames, so this profile as a whole cannot be subgame perfect). Therefore, we only have to consider these 16 profiles for SGPNE. Such a profile has the form $s_i^2 = (s_i^2(Mm), s_i^2(Mf), s_i^2(Fm), s_i^2(Ff))$ , where $s_i^2(ab)$ means what player $i$ will do in 2nd-stage if in 1st-stage, 1 plays $a$ and 2 plays $b$ . One pair $ab$ is one of the 4 possible outcomes of 1st-stage game, and therefore specifies a particular subgame in 2nd-stage. Since we have want SGPNE, $s^2(ab)$ should also be a Nash Equilibrium when looked alone, so it is always either $(L,l)$ or $(G,g)$ . A profile $s^2(ab)$ of all players at situation $ab$ specifies all players’ payoffs at a particular case of the 4 outcomes in 1st-stage game. Then we just have to find the PSNE in this 1st-stage game with this new payoff specified. (Like finding a PSNE in any game, it’s possible that we can find only one, none, or multiple) If we have a mixed-strategy $s^2(ab)$ for the 2nd-stage game, we just have to calculate each player’s expected payoff in this subgame and use that as the new payoff of situation $ab$ in 1st-stage game.

A strategy profile for the whole game has the form $s_i = (s_i^1, s_i^2) = (s_i^1, s_i^2(Mm), s_i^2(Mf), s_i^2(Fm), s_i^2(Ff))$ . For example, if we have $s_1 = (F, L, G, G, L), s_2 = (f, l, g, g, l)$ . Player 1 first chooses to fink at 1st-stage, chooses loner at 2nd-stage if 1 plays $M$ and 2 plays $m$ at 1st-stage, chooses Gangers if $Mf$ , Gangers if $Fm$ , and loner if $Ff$ . Similarly for player 2. Eventually, the game path is $((Ff), (Ll))$ . For an arbitrary strategy profile, it doesn’t have to be symmetric like this. It just happens to be in this Prisoner-Revenge Game.

Non-Nash to Nash¶

Note we get to choose which Nash Eq will be played in each of the 2nd (later) stage games and we can use this to reward good behavior in the 1st (any previous) stage game. In fact, we can persuade the players to play at this first (any previous) stage what is not a Nash equilibrium when play 1st-stage as a standalone game.

Continue with the Prisoner-Revenge Game. In the 1st-stage of the game, the prisoner’s dilemma, we only have one Nash Eq $(F,f)$ . We want these two players to be loyal to each other and achieve $(M,m)$ . Therefore, in the 2nd-stage, we will reward (carot) loyalty $(M,m)$ and punish (stick) anything else by making both players join the gangs if they don’t cooperate and be lone wolves if they do. Therefore, $s_1^2 = (L, G, G, G), s_2^2 = (l, g, g, g)$ . Draw the payoff matrix of 1st-stage based on the 2nd-stage strategies:

	m	f
M	$4,4$	$-1-3\delta, 5-3\delta$
F	$5-3\delta, -1-3\delta$	$1-3\delta, 1-3\delta$

We can see that $v_2(Mm) \ge v_2(Mf)$ when $\delta \ge \frac 1 3$ . The same holds for P1 by symmetry. Therefore, as long as we have a discount factor $\ge \frac 1 3$ , the Nash equilibrium for this 1st-stage game will be $Mm$ and $((M, L, G, G, G), (m, l, g, g, g))$ are our Subgame Perfect Nash Equilibrium.

We basically change the players’ beliefs about what will happen later. After understanding the bad result of betraying the partners (joining gangs for revenge and hurting each other), they will decide to keep the secret.

Repeated Games¶

$G(T,\delta)$ denotes the finitely repeated game in which the stage-game G is played T consecutive times, and $\delta$ is the common discount factor.

We crucially relied on the fact that by having multiple Nash Eq in the 2nd-stage, we can have different factors for $\delta$ (these factors are just different payoff in 2nd-stage game) and this gives us the power to use some Nash Eq as a carrot and use some as a stick. If we only have one Nash Eq for this static game, then the only Subgame Perfect Equilibrium will be playing this Nash Eq all the time.

However, if we have an Infinitely Repeated Game and there is only one NE in the standalone game, we can still design strategies that enables us to achieve a non-NE. Here, we introduce the “Grim-Trigger” Strategy for the prisoner’s game, where both keep playing $(M,m)$ until one deviates. Specifically, they

Play $(M,m)$ as long as both players played $(M,m)$ in all stages before this one
Play $(F,f)$ if otherwise

This turns out to also be a Subgame Perfect Nash Equilibrium for some $\delta$ value. However, since it’s an infinite game, we cannot check on every subgame. Luckily, we have a new principle to check SGPNE:

One-Step Deviation Principle: A strategy profile $(\sigma_1, \dots, \sigma_n)$ is a SGPNE when there is no subgame where a player can improve his payoff using a one-step deviation.

For a player $i$ , a One-Step Deviation from his strategy $s_i$ is a strategy $s_i'$ when for all information sets, only action for one of them is different from $s_i$ .

Take the above Infinitely Repeated Prisoner’s Game as an example. Say if at the middle of the game, P1 wants to betray (deviate from all these $(M,m)$ played) Consider this round where P1 betrays: he will gain 5 immediately, but 1 for all the subsequent rounds. On the other hand, if he sticks to cooperation, he will gain 5 in all of these rounds. To keep him from betraying, we need a discount factor that is

4 + 4\delta + 4\delta^2 + \dots \ge 5 + 1\delta + 1\delta^2 + \dots \\ 4 + 4 \frac{\delta}{1-\delta} \ge 5 + 1 \frac{\delta}{1-\delta} \\ \delta \ge \frac 1 4

(11)

The generalized version of this principle is Folk Theorem: any periodic strategy that has average payoff that is larger than the trigger strategy is subgame perfect.

Static Games of Incomplete Information¶

Bayesian Games¶

Representation¶

To capture the idea that a player’s characteristics may be unknown to other players, we now say that a player can be of different types. A same player of different type can have different payoff values over the same strategy profile.

Games of Incomplete Information are the games that incorporate the possibility that players could be of different types.

Each combination of player type gives a different game. We always have nature node first deciding which game tree we are in. We use information sets to denote that a player knows what type he is, but doesn’t know what type the other players are.

Normal Form¶

Normal Form of a static Bayesian Game of Incomplete Information has

a set of players - $N = \{1, 2, \dots, n\}$
For each player $i$ :
- a action set - $A_i$
- a type space - $\Theta_i$ , where different types of the same player gives different games
- type dependent payoff function - $v_i: A_1 \times \dots \ A_n \times \Theta_i \to \mathbb R$ , where different types of the same player playing the same action gives different payoffs
a probability distribution over the types of the players that specifies the probability of us being on a particular game tree - $\Theta_1 \times \Theta_2 \times \dots\ \times \Theta_n$

Beliefs¶

And all the information above is common knowledge. So a player surely knows this probability distribution over all the types. One’s own type is a private knowledge to oneself. If a player knows his own type, he can deduce something more about the others’ types: he can deduce the posterior belief about others’ types $\phi_i = \phi(\theta_{-i} | \theta_i)$

Formally, the game proceeds as follows:

Nature chooses a profile of types ( $\theta_1, \theta_2, \dots, \theta_n$ ) according to the probability distribution of types
Each player $i$ learns his own type $\theta_i$ and deduces a belief about the probability distribution over the other player’s types, denoted by $\phi(\theta_{-i} | \theta_i)$ .
Each player $i$ simultaneously chooses an action $a_i \in A_i$
Payoffs are realized for each player $i$ by $v_i(a_i, a_{-i}, \theta_i)$

Strategies¶

Strategies in Games of Incomplete Information are defined very similar to the strategies on information sets in Dynamic Games.

A Pure Strategy for player $i$ is a function for each of its type - $s_i : \Theta_i \to A_i$

A Mixed Strategy is a probability distribution over pure strategies.

Payoffs¶

and Bayesian Nash Equilibrium¶

Expected payoff of a player $i$ of type $\theta_i$ takes the weighted average over all possible combination of other players’ types. Note a player in the Bayesian Game always knows his own type.

\begin{align} v_i(s ; \theta_i) &= \mathbb E_{\theta_{-i}} \; v_i(s_i(\theta_i), s_{-i}(\theta_{-i}) ; \theta_i)\\ &= \sum_{\theta_{-i}} \phi(\theta_{-i} | \theta_i) \; v_i(s_i(\theta_i), s_{-i}(\theta_{-i}) ; \theta_i) \end{align}

(12)

Bayesian Nash Equilibrium is a strategy profile where every player chooses a best response for every possible type he can be. This definition is almost the same as the sequential rationality, except it was “best response for every possible information set he can be in”.

Formally, $s^*$ is a Pure-Strategy Nash Equilibrium when for every player $i$ and every type $\theta_i \in \Theta_i$ this player can be, $s^*_i(\theta_i)$ is a best response to $s^*_{-i}$

\forall i, \; \forall \theta_i \in \Theta_i, \; s_i(\theta_i) \in BR_i(s^*_{-i})\\ \forall i, \; \forall \theta_i \in \Theta_i, \; \forall s_i' \in S_i, \; v_i(s_i^*, s_{-i}^*; \theta_i) \ge v_i(s_i', s_{-i}^*; \theta_i)

(13)

Eg. Game of Chicken¶

In the game of chicken, for player 1, his strategy set is $\{CC, CD, DC, DD\}$ where $AB$ means 1 plays $A$ when he has type $L$ and plays $B$ when he has type $H$ .

Do an example calculating expected payoff for P1 at situation $(CD, dc)$ :

\begin{align} v_1(CD, dc) &= P[\theta_1=L, \theta_2 = L] v_1(C,d; L) \\ &+ P[\theta_1=L, \theta_2 = H] v_1(C,c; L) \\ &+ P[\theta_1=H, \theta_2 = L] v_1(D,d; H) \\ &+ P[\theta_1=H, \theta_2 = H] v_1(D,c; H) \\ \end{align}

(14)

Eg. Study Groups¶

Two students form a study group. A student can choose to work (W) or shirk (S). If at least one student works, the work will be done. If a student chooses to work, he will pay a cost $c$ no matter what. Type of each student $\Theta_i = [0,1]$ determines how important the assignment is for each student. Types are uniformly i.i.d drawn. A player’s strategy is $s_i: \Theta_i \to \{W,S\}$ We have payoffs (the former one represents action of the player in discussion)

$v_i(S,S; \theta_i) = 0$
$v_i(W,W;\theta_i) = v_i(W,S;\theta_i) = \theta_i^2-c$
$v_i(S,W;\theta_i) = \theta_i^2$

Compute the expected payoff if P1 plays $W$ and $S$ :

$v_1 (W,s_2(\theta_2);\theta_1) = \theta_1^2 - c$
$v_1 (S,s_2(\theta_2);\theta_1) = \theta_1^2 \; Pr[s_2(\theta_2)=W] + 0 \; Pr[s_2(\theta_2)=S]$

Therefore, we choose $W$ when

v_1(W;\theta_1) \gt v_1(S;\theta_1) \\ \theta_1^2-c \gt \theta_1^2 \; Pr[s_2(\theta_2)=W]\\ \theta_1 \gt \sqrt{\frac c {1 - Pr[s_2(\theta_2)=W]}}

(15)

Denote $Pr[s_2(\theta_2)=W]$ as $p$ , $s_1 = W \iff \theta_1 \gt \sqrt{\frac c {1-p}}$ . Since $\theta_1$ is drawn uniformly random from $[0,1]$ , $Pr[\theta_1 \gt \sqrt{\frac c {1-p}}] = 1 - \sqrt{\frac c {1-p}}$ .

By symmetry, this also holds for P2. $Pr[s_1(\theta_1)=W] = Pr[s_2(\theta_2)=W] = p$ . We therefore have $p = 1 - \sqrt{\frac c {1-p}}$ and $1 - p = \sqrt[3] c$ . $\theta_1 \gt \sqrt{\frac c {1-p}} = \sqrt[3] c$

Now we can conclude that

s_i^*(\theta_i) = \begin{cases} W \text{ if } \theta_i \gt \sqrt[3] c \\ S \text{ if otherwise} \end{cases}

(16)

Auctions¶

open auction: bidders observe some dynamic price process until winner emerges
sealed-bid auction: players write down their bids and submit these without knowing the bids of other players. Bids are collected. Then somebody wins, somebody pays according to rules.

We focus on Sealed-Bid Auction in this section

Model¶

n players
each player $i$ has a type $\theta_i$ , which corresponds to private value $i$ has for this object. $\theta_i$ is drawn independently from some distribution $\theta_i \sim F_i(\cdot)$ where $F_i$ is the cumulative probability function of this distribution.
action is a bid $b_i$
payoff to player $i$ is
- 0 if he does not win
- $\theta_i - p$ if she wins
where $p$ is a function of the bids to a price, representing the paying rule of this auction. For example, most of the time we let the bidder who wins the bid (that is who proposes the highest bid) pays his bids, but maybe we can let the winner to pay the second highest bid instead.

Second-Price Auction - Strategy¶

In second-price auction, $p$ is second highest bid. Person who wins the bid, instead of paying his own bid, pays the second highest bid. Bidding your private value is the weakly dominant strategy.

Observe for any bid $b_i$ player $i$ chooses to pay, his payoff function $v_i(b_i, b_{-i})$ is

v_i(b_i, b_{-i})= \begin{cases} \theta_i - max(b_{-i}) &\text{if $i$ wins the bid: $b_i \gt max(b_{-i})$}\\ 0& \text{if $i$ loses the bid: $b_i \lt max(b_{-i})$ } \end{cases}

(17)

Observe that if we decide to pay our private value $\theta_i$ , we have a weakly dominant strategy. (See this by replacing all $b_i$ above with $\theta_i$ )

First-Price Auction - Strategy¶

Bidding your private value is weakly dominated: you eventually just get $b_i - \theta_i = \theta_i - \theta_i = 0$ . One can get the same payoff by simply not participating in bidding at all. Next we find the Bayesian Nash Eq strategy for this problem.

Assumption: types of each player are drawn independently from some distribution with CDF, cumulative distribution function $F_i$

Strategy $b_i = s_i: \Theta_i \to \mathbb R^{\ge 0}$ is a map from private value to bidding value.

We will assume that $s_i$ is strictly increasing (higher private valuation leads to higher bid)

\begin{align} & P[\text{player $i$ wins}] \\ =\;& P[ \forall j \not=i, s_i(\theta_i) \gt s_j(\theta_j)]\\ =\;& P[\forall j \not=i, s_j(\theta_j) \lt t] &&\text{replace $s_i(\theta_i)$ with $t$}\\ =\;& P[\forall j \not=i, \theta_j \lt s^{-1}_j(t)] &&\text{since $s$ is strictly increasing, its inverse func exists}\\ =\;& \prod_{j\not=i} P[\theta_j \lt s^{-1}_j(t)] &&\text{each type is independent}\\ =\;& \prod_{j\not=i} F_j(s^{-1}_j(t)) &&\text{definition of CDF}\\ \end{align}

(18)

Payoff if $i$ wins: $P[\text{player i wins}] \times (\theta_i - x) = \prod_{j\not=i} F_j(s^{-1}_i(t)) \times (\theta_i - x)$

To simplify this question a bit more, we make some extra assumptions: players’ types are drawn from the same CDF $F$ and each player have the same strategy $s$ . Therefore, the payoff is now $[F(s^{-1}(t))]^{n-1} \times (\theta_i - x)$

To find its maximum, we take the derivative of this thing. And solve for the best response,

s(\theta_i) = \theta_i - \frac {\int^{\theta_i}_{-\infty} [F(t)]^{n-1} dt } {[F(\theta_i)]^{n-1}}

(19)

Consider the special case where $\theta_i \sim uniform[0,1]$ , where we are basically normalizing private value to the scale of [0,1]

\begin{align} s(\theta_i) &= \theta_i - \frac {\int^{\theta_i}_{0} t^{n-1} dt } {\theta_i^{n-1}} \\ &= \frac {n-1} {n} \theta_i \end{align}

(20)

First-Price Auction - Revenue¶

Expected revenue for this auction is $\mathbb E[s(\theta)] = \mathbb E[(1 - \frac 1 n) \theta_{max}]$ . We need to focus on $\theta_{max}$ because according to strategy $s(\theta) = (1-\frac 1 n)\theta$ , the one with highest $\theta$ wins the bid and therefore pays the auction. That is, the one with highest private value eventually determine how much revenue an auction can make.

This expected value will average on the distribution of this $\theta_{max}$ . Look at CDF and PDF of $\theta_{max}$ :

\begin{align} G(x) &= \mathbb P(\theta_{max} \le x)\\ &= \mathbb P(\theta_{1} \le x) \mathbb P(\theta_{2} \le x) \dots \mathbb P(\theta_{n} \le x) && \text{$\theta$ are independent} \\ &= x^n && \text{$\theta$ is uniformly distributed on [0,1]}\\ g(x) &= \mathbb P(\theta_{max} = x) = nx^{n-1} \end{align}

(21)

With PDF, we can calculate the expected revenue

\begin{align} E[(1 - \frac 1 n) \theta_{max}] &= (1 - \frac 1 n) E[\theta_{max}] \\ &= (1 - \frac 1 n) \int_0^1 x \; nx^{n-1} \; dx \\ &= \frac {n-1} {n+1} \end{align}

(22)

If we let $\theta_i \sim uniform[0,c]$ , this result will be $\frac {n-1} {n+1}c$

Second-Price Auction - Revenue¶

Recall that $s_i(\theta_i) = \theta_i$ , and the second highest price $\theta_i^{(2)}$ is paid. so the expected revenue of an auction is $\mathbb E(\theta_i^{(2)})$ Look at the CDF and PDF of $\theta_i^{(2)}$

\begin{align} G(x) &= \mathbb P(\theta_i^{(2)} \le x) \\ &= \sum_{j\in N} \mathbb P(\forall i\not=j, \theta_i \le x \text{ and } \theta_j \gt x) + \mathbb P (\forall i \in N, \theta_i \le x)\\ &= n\; x^{n-1}(1-x) + x^n\\ g(x) &= n(n-1)x^{n-2} + n(1-n)x^{n-1} \end{align}

(23)

Now we can calculate the expected revenue

\begin{align} \mathbb E(\theta_i^{(2)}) &= \int_0^1 x \; (n(n-1)x^{n-2} + n(1-n)x^{n-1}) \; dx \\ &= \frac {n-1} {n+1} \end{align}

(24)

如果 private value is [0,c] 那么最后就是 $\frac {n-1} {n+1}c$

Revenue Equivalence¶

As we introduced all these different auctions, it is natural to ask which will yield the highest expected revenue for a seller. However, it turns out that the expected revenue a seller obtains from any of auctions we considered is the same as long as we meet some very general conditions.

Appendix¶

Abbreviation¶

NE = Nash Eq = Nash Equilibrium
PSNE: Pure Strategy Nash Equilibrium
SGPNE: Subgame Perfect Nash Equilibrium

Test Tips¶

Find all Nash Equilibrium: pure strategy and mixed strategy.

当连续的时候，很多东西 (BR, payoff, ...) 都要分类讨论一般情况和极端情况 (Cournot Duopoly 中当对手正常生产的时候，你有一个 BR；以及当对手直接拉满的时候，你有另一个 BR=0 因为你不管生产多少都亏本）

Incomplete Note: Mechanism Design¶

This section is not complete and will likely never be completed.

In this section, we discuss how to design a game given a goal.

Auctions can be viewed as a solution to a social problem: We have a good and players have a private valuation of the good. Society is best off if the good is assigned to players with the highest valuation.

We want to design a game (mechanism) to find this best assignment even if all players strategically (even if players try to game the mechanism)

Formalize the problem:

players have private types $\theta_i$
if we know these types, we know what is best for society. So there will be a function $f$ from type to outcome, describing what is the best outcome given players’ types. $f:\theta\to \text{outcome}$ Note $\theta$ is a type profile here $\theta = (\theta_1, \theta_2, \dots, \theta_n)$

We say a rule is implementable if we can ???

Players know the game and play strategically (so they want to get the best outcome for themselves, not necessarily for the whole society). How can we design a mechanism so the society good is achieved?

How did we solve this?¶

Problem: since all people can calculate this result, it’s possible players will end up reporting a private value smaller than this to skip some payments.

Quasilinear Performances Assumption: we will assume that for player $i$ if the final (global) outcome is $x$ and monetary payment to player $i$ is $m_i$ , he has a payoff $v_i$ : (Note $m_i$ can be negative, representing how much money $i$ has to pay out)

v_i(x, m_i, \theta_i) = v_i(x, \theta_i) + m_i

(25)

Design this game:

choose actions $A_i$ for each player
Based on actions, decide an outcome and payment for this game
$g: A_1 \times \dots \times A_n \to \text{outcome space}$
$m_i:A_1 \times \dots \times A_n \to \mathbb R$

We want that for the equilibrium strategy profile $s^*$ , the outcome we get from each player choosing based on its own benefit is indeed our socially best outcome. $\forall \theta \in \Theta, g(s^*(\theta)) = f(\theta)$

Consider this special type of game: let all player’s actions be their types(private values), so just assign each of their action space to be their type space. i.e. $\forall i \in N, A_i = \Theta_i$

Then $g \equiv f$ will give us the desired mechanism, provided that playing their true types is an equilibrium strategy profile.

Given f, define g???

-- ting bu dong--

Games where $\forall i\in N, A_i = \Theta_i$ are called “direct revelation mechanisms”

Revelation Principle: If f is implementable (see implementable definition) in Bayesian Nash/ dominant strategies, then there exists a direct revelation mechanism.

prove. If $f$ is implementable, then there exists an equilibrium $s^*$ such that $\forall \theta \in \Theta g(s^*(\theta))$

Consider the following mechanism ??

Vickrey-Clarke-Groves Mechanism¶

We will think about implementing functions $f(\theta) = x$ where $x$ maximizes $\sum_{i \in N} v_i(x, \theta_i)$ That is the outcome that gives highest sum of payoffs is there are no monetary transfers.

In this sense, this objective is sometimes called “social welfare”

If there is o payment, then lying about your type may be beautiful if $f(\theta_i, \theta_{-i}) \not= f(\theta_i', \theta_{-i})$ and $v_i(f(\theta_i, \theta_{-i}), \theta_i) \gt v_i(f(\theta_i', \theta_{-i}), \theta_i)$

We do not want this happen, so to counteract

if he is reporting his true types, we want the monetary payment to him to be at least than that, so he is willing to report his true type

We can simply define $m_i(\theta_i,)$ to be ..., so $m_i(\theta_i',)$ is and we see the previous inequality holds

Mechanism 2¶

Goal: create a game to implement “social welfare function” $f(\theta) = argmax*x \sum*{i \in N} v_i(x, \theta_i)$ . So it is the outcome that maximizes overall payoff (sum of payoffs).

Types are private info and we use game to extract this info from the players.

We will show that we can implement social welfare functions using the revelation mechanism in weakly dominant strategies.

use differences of payments if you lie about your type - you can lie, you can get better payoff by lying, but your monetary payment will be about how much you hurt the other players because it’s a social thing.

CS4787 Principles of Large-Scale Machine Learning

CS4820 Intro to Analysis of Algorithms