Mixed nash equilibrium and its application in two-player zero-Sum games

Hongjian Wu

doi:10.54254/2753-8818/26/20241052

1. Introduction

The Colonel Blotto game represents a classic example of a two-player zero-sum game [1]. In this game, Colonel Blotto and his adversary prepare for a battle over three mountain passes, each commanding five regiments. The objective is simple: send more regiments to a pass to occupy it. However, a draw occurs in the case of an equal number of regiments on both sides. The ultimate victory goes to the side that occupies more passes.

Both Col. Blotto and his adversary independently craft strategies for this fierce battle. Their strategies involve dividing their five regiments into three groups, such as \( (0, 1, 4) \) , indicating that one pass will face an attack from four regiments, another from one regiment, and one pass will remain unchallenged. These groups are then randomly assigned to the passes.

This random assignment significantly impacts the winning probabilities for Col. Blotto and his adversary. They strive to skew the probabilities in their favor as much as possible. This intriguing game sets the stage for exploring the concept of mixed Nash equilibrium, which can be applied to various real-world scenarios in decision-making and strategy development [2, 3].

The fundamental idea of mixed Nash equilibrium is further upon in this paper. In the context of a two-player zero-sum game, it addresses the practical application of mixed Nash equilibrium and explains its significance in problem-solving and decision-making. The paper aims at the importance and broad applicability of mixed Nash equilibrium.

2. Theory and application

2.1. The Colonel Blotto Game- An Introduction to Mixed Nash Equilibrium

The representation of payoffs in an \( m×n \) payoff matrix is facilitated by sequentially numbering the strategies as \( 1,2,…,m \) for the one player and \( 1,2,…,n \) for the other player [4].

In the Colonel Blotto game, the payoff matrix is structured with rows representing Col. Blotto’s strategies and columns representing those of the adversary. For instance, consider a scenario where Col. Blotto decides to allocate his regiments as \( (0, 2, 3) \) , and his adversary simultaneously chooses to distribute their forces as \( (0, 0, 5) \) .

In this situation, Colonel Blotto will secure a victory (without the need for actual combat) if and only if his two non-empty groups successfully occupy the unattended passes left by his adversary. The probability of this favorable outcome is \( 1/3 \) . Conversely, there’s a \( 2/3 \) probability that a draw will ensue, resulting in a winning probability difference of \( 1/3 \) in Col. Blotto’s favor (Table 1).

Table 1. The payoff matrix.

	\( (0, 0, 5) \)	\( (0, 1, 4) \)	\( (0, 2, 3) \)	\( (1, 1, 3) \)	\( (1, 2, 2) \)
\( (0, 0, 5) \)	\( 0 \)	\( -1/3 \)	\( -1/3 \)	\( -1 \)	\( -1 \)
\( (0, 1, 4) \)	\( 1/3 \)	\( 0 \)	\( 0 \)	\( -1/3 \)	\( -2/3 \)
\( (0, 2, 3) \)	\( 1/3 \)	\( 0 \)	\( 0 \)	\( 0 \)	\( 1/3 \)
\( (1, 1, 3) \)	\( 1 \)	\( 1/3 \)	\( 0 \)	\( 0 \)	\( -1/3 \)
\( (1, 2, 2) \)	\( 1 \)	\( 2/3 \)	\( -1/3 \)	\( 1/3 \)	\( 0 \)

In the absence of prior knowledge about the adversary’s choices, Col. Blotto’s objective is to determine an approach that guarantees the highest possible payoff in the worst-case scenario [5]. The strategy that meets this criterion is \( (0, 2, 3) \) . Irrespective of the adversary’s decisions, Col. Blotto secures a minimum payoff of \( 0 \) when employing this approach, while all other approaches yield negative payoffs in unfavorable circumstances. Col. Blotto assesses the minimum values within each row of the payoff matrix and opt for the row with the highest minimum value.

The adversary’s objective aligns with selecting an approach that ensures the lowest worst-case payoff for Col. Blotto, just as Col. Blotto aims to secure the highest worst-case payoff [6]. It becomes evident that the choice of (0, 2, 3) remains the sole optimal option for the adversary, ensuring that Col. Blotto can receive a maximum payoff of 0. All other strategies provide opportunities for Col. Blotto to attain a positive payoff if he successfully anticipates or uncovers the adversary’s strategy. The adversary assesses the maximum values in each column of the payoff matrix and selects the column with the most modest maximum value.

Likewise, the adversary strives to choose an approach that ensures the least unfavorable outcome for Col. Blotto. It appears that \( (0, 2, 3) \) is the unique choice for the adversary, ensuring that Col. Blotto receives a maximum of \( 0 \) payoff. All other strategies allow Col. Blotto to achieve a positive payoff if he correctly anticipates or spies on the adversary’s approach. The adversary evaluates the maximum value in each column of the payoff matrix and chooses the column with the smallest maximum [7].

In a similar vein, the adversary’s objective revolves around identifying an approach that ensures the lowest possible payoff for Col. Blotto in the worst-case scenario. It becomes evident that the exclusive choice for the adversary is the strategy \( (0, 2, 3) \) . This particular strategy guarantees that Col. Blotto will receive a maximum payoff of 0, while all alternative strategies provide him with the potential for a positive payoff if he can accurately anticipate or gather intelligence about the adversary’s strategy [8]. In the context of the payoff matrix, the adversary assesses the highest value within each column and selects the one with the least maximum value.

In practice, when both Col. Blotto and his adversary employ the aforementioned strategies, they find themselves facing the worst-case scenarios they had envisioned when adopting their strategies with a cautious approach. Observing these worst-case scenarios may dampen any hopes of a more favorable outcome in the impending battle. However, it also provides a sense of relief, as it assures them that their choices have been prudent. After the battle concludes, neither Col. Blotto nor his adversary will have cause for regret. Even if both had possessed prior knowledge of each other’s strategies, they would still lack any incentive to revise their own strategies.

This intriguing aspect of the game highlights that the strategies chosen by Col. Blotto and his adversary represent optimal responses to each other. In terms of the payoff matrix, the entry \( 0 \) in both the row \( (0, 2, 3) \) and the column \( (0, 2, 3) \) is referred to as a “saddle point”; it constitutes both the row’s minimum and the column’s maximum. A pair of strategies that serve as optimal responses to each other is identified as a mixed Nash equilibrium of the game [9,10].

2.2. A Deeper Explanation of Mixed Nash Equilibrium and Linear Programming Approaches

Continuing into the exploration of mixed Nash equilibrium in two-player zero-sum games, we will now provide a more formal description of the setup. We maintain the convention of referring to the players as Elsa and Peter. Elsa possesses a set of \( m \) pure strategies, while Peter holds a set of \( n \) pure strategies (assuming \( m,n≥1 \) ).

Within this framework, we have an m × n payoff matrix, denoted as M, consisting of real numbers. The entry \( {m_{ij}} \) signifies Elsa’s gain (and Peter’s loss) when Elsa employs her \( i \) th pure strategy against Peter’s \( j \) th pure strategy. For clarity, one can envision this as Peter having to make a payment of \( {m_{ij}} \) to Elsa. It is crucial to recognize that this scenario is symmetric, meaning that if \( {m_{ij}} \) is negative, Elsa would be obligated to pay \( {-m_{ij}} \) to Peter.

The payoff matrix \( M \) serves as the foundation for assessing and understanding the outcomes and strategies within a given two-player zero-sum game. It encapsulates the consequences of their choices and is instrumental in the calculation of mixed Nash equilibria. In the subsequent sections, we will delve into the mathematical modelling process that underlies the existence and computation of mixed Nash equilibria in zero-sum games, shedding light on the optimal strategies employed by the players [11].

A player’s mixed strategy is characterized by a probability distribution across their set of pure strategies. In the context of Elsa, her mixed strategy is represented through an \( m \) -dimensional vector comprising probabilities.

\( x=({x_{1}},…,{x_{m}}),\sum _{i=1}^{m} {x_{i}}=1,x≥0 \) .(1)

And a mixed strategy of Peter by an \( n \) -dimensional vector of probabilities

\( y=({y_{1}},…,{y_{n}}),\sum _{j=1}^{n} {y_{j}}=1,y≥0 \) .(2)

A mixed strategy is distinct from a pure strategy. When considering mixed strategies x and y employed by Elsa and Peter, respectively, the expected payoff (corresponding to Elsa’s expected gain) when x is pitted against y can be determined as follows:

\( \begin{array}{c} \sum _{i,j} {m_{ij}}{Pr_{x,y}}⁡[ Elsa plays i, Peter plays j] \\ =\sum _{i,j} {m_{ij}}{Prob_{x}}⁡[ Elsa plays i]\cdot {Prob_{y}}⁡[ Peter plays j] \\ =\sum _{i,j} {m_{ij}}{x_{i}}{y_{j}} \\ ={x^{T}}My \end{array} \) .(3)

The paper now proceeds to formalize the principle akin to Col. Blotto’s approach: “Prepare for the worst.” When Elsa contemplates the implementation of a mixed strategy denoted as \( x \) , her expectation involves Peter selecting a best response against \( x \) . This best response is encapsulated within a strategy denoted as \( y \) , designed to minimize her anticipated payoff, defined as \( {x^{T}}My \) . Similarly, when y is fixed and known, Peter anticipates Elsa’s strategy as \( x \) , seeking to maximize the value of \( {x^{T}}My \) .

For a fixed matrix \( M \) , these worst-case payoffs are captured by the following two functions:

\( β(x)=\underset{y}{min}{x^{T}}My,α(y)=\underset{x}{max}{x^{T}}My \) .(4)

In the context of the paper, \( β(x) \) signifies the optimal, or smallest, expected payoff that Peter can attain when confronted with Elsa’s mixed strategy denoted as \( x \) . Similarly, \( α(y) \) represents the most advantageous, or largest, expected payoff that Elsa can achieve in response to Peter’s strategy \( y \) . It is noteworthy that \( {y_{0}} \) serves as Peter’s optimal response to a specific \( x \) if and only if \( {x^{T}}M{y_{0}}=β(x) \) . The symmetrical statement regarding Elsa’s optimal response is left for the reader’s consideration.

In the context of the paper, a mixed Nash equilibrium of the game is defined as a pair of mixed strategies \( (\overset{~}{x},\overset{~}{y}) \) where in \( \overset{~}{x} \) serves as a best response to \( \overset{~}{y} \) , and concurrently, \( \overset{~}{y} \) acts as a best response to \( \overset{~}{x} \) . It is worth noting that the adjective “mixed” is frequently omitted in scholarly contexts. In mathematical notation, this concept can be articulated as follows:

\( β(\overset{~}{x})={\overset{~}{x}^{T}}M\overset{~}{y}=α(\overset{~}{y}) \) .(5)

It can be defined within the context of the paper that Elsa’s mixed strategy, denoted as \( \overset{~}{x} \) , is considered worst-case optimal when \( β(\overset{~}{x})=\underset{x}{max}β(x) \) . In this scenario, Elsa foresees Peter’s best response to any of her mixed strategies and strategically chooses a mixed strategy, denoted as \( \overset{~}{x} \) , that maximizes her expected payoff under this cautious assumption.

Likewise, Peter’s mixed strategy, referred to as \( \overset{~}{ y} \) , is deemed worst-case optimal when \( α(\overset{~}{y})=\underset{y}{min}α(y) \) . In this case, Peter seeks to minimize his expected payoff by selecting the mixed strategy \( \overset{~}{y} \) , operating under the assumption that Elsa will strategically respond with her best strategy. This approach of worst-case optimality underscores the careful consideration of strategic choices under adverse circumstances.

It is established that within the paper’s framework, the inequality holds: \( \underset{x}{max}β(x)≤\underset{y}{min}α(y) \) . In fact, for any given pair of mixed strategies, denoted as \( x \) and \( y \) , it follows that \( β(x)≤{x^{T}}My≤α(y) \) . This inequality relationship underscores the fundamental dynamics within the context of the discussed concepts.

The Minimax theorem for zero-sum games, as outlined in the paper, posits the existence of worst-case optimal mixed strategies for both players within every zero-sum game. Furthermore, it asserts that these strategies can be effectively computed through linear programming techniques. When a mixed strategy denoted as \( \overset{~}{x} \) is the worst-case optimal choice for Elsa, and \( \overset{~}{y} \) represents the worst-case optimal mixed strategy for Peter, it follows that \( (\overset{~}{x},\overset{~}{y}) \) constitutes a mixed Nash equilibrium. Importantly, the value \( β(\overset{~}{x})={\overset{~}{x}^{T}}M\overset{~}{y}=α(\overset{~}{y}) \) remains consistent across all potential worst-case optimal mixed strategies, both for \( \overset{~}{x} \) and \( \overset{~}{y } \) [12]. This theorem elucidates the foundational principles underpinning zero-sum games and the inherent equilibrium properties they exhibit.

The terminology “minimax theorem” requires clarification. By examining the equality \( β(\overset{~}{x}) \) = \( α(\overset{~}{y}) \) and applying the definitions of \( β \) , \( α \) , and the worst-case optimality of \( \overset{~}{x} \) and \( \overset{~}{y} \) , we derive the expression:

\( \underset{x}{max}\underset{y}{min}{x^{T}}My=\underset{y}{min}\underset{x}{max}{x^{T}}My \) .(6)

This equality captures the essence of the minimax theorem and underscores its significance in zero-sum games. It highlights the interchangeability of the maximization and minimization operations, emphasizing the fundamental duality inherent in these games.

The paper proceeds by outlining the methodology for identifying worst-case optimal mixed strategies, \( \overset{~}{x} \) for Elsa and \( \overset{~}{y} \) for Peter, through linear programming. Subsequently, it establishes the validity of the desired equality, \( β(\overset{~}{x}) \) = \( α(\overset{~}{y}) \) ).

To initiate this process, it is observed that Peter’s best response to a fixed mixed strategy, x, employed by Elsa, can be effectively determined through the solution of a linear program. In this context, \( β(x) \) , with \( x \) denoting a specific vector of m numerical values, serves as the optimal solution for the ensuing linear program, featuring variables \( {y_{1}},…,{y_{n}} \) [13]. Based on the theoretical development of mixed Nash equilibrium discussed above, the objective function is defined as \( {x^{T}}My \) , and the variables \( {y_{1}},…,{y_{n}} \) are subject to the following set of linear constraints.

\( \sum _{j=1}^{n} {y_{j}}=1 \) (7)

\( y≥0 \) (8)

In conclusion, the preceding discussion can be translated into the following linear programming problem.

\( \begin{matrix} Minimize & {x^{T}}My \\ subject to & \sum _{j=1}^{n} {y_{j}}=1 \\ & y≥0 \\ \end{matrix} \) .(9)

By employing the principles of operations research duality theory [13], the dual problem for this issue can be derived.

\( Maximize \) \( {x_{0}} \) .

\( subject to {M^{T}}x-1{x_{0}}≥0 \) .(10)

3. Conclusion

In conclusion, “The Colonel Blotto game” is a quintessential case in game theory, representing the essence of two-player zero-sum games. The fundamental theorem of game theory establishes the existence of a mixed Nash equilibrium in all zero-sum games. This paper has elucidated the principles of mixed Nash equilibrium, using the Colonel Blotto game as a starting point. Furthermore, we have explored the practical applications of mixed Nash equilibrium within two-player zero-sum games, demonstrating its importance in addressing real-world issues and decision-making.

By delving into the Colonel Blotto game and its associated theories, we have highlighted mixed Nash equilibrium’s significance and broad applicability. This equilibrium concept finds its place in various real-world applications, including competitive strategies, resource allocation, risk management, etc. This paper aim has been to provide readers with a comprehensive understanding of mixed Nash equilibrium and how to employ it effectively to solve practical problems.

References

[1]. Roberson B 2006 TheColonel Blotto game Economic Theory 29 1 1-24

[2]. Mezzetti C, Renou L 2012 Implementation in mixed Nash equilibrium Journal of Economic Theory 147 6 2357-2375

[3]. Lee C Y 2018 Mixed-strategy Nash equilibrium in data envelopment analysis European Journal of Operational Research 266 3 1013-1024

[4]. Thompson E R, Iyer N, Simpson B D 2015 Enhancing listener strategies using a payoff matrix in speech-on-speech masking experiments The Journal of the Acoustical Society of America, 2015, 138 3 1297-1304

[5]. Gaspars-Wieloch H 2018 The impact of the structure of the payoff matrix on the final decision made under uncertainty Asia-Pacific Journal of Operational Research 35 01 1850001

[6]. Rivera A, Wang H, Xu H 2018 The online saddle point problem and online convex optimization with knapsacks arXiv preprint arXiv 1806 08301

[7]. Geckil I K, Anderson P L 2016 Applied game theory and strategic behavior CRC Press

[8]. Brandenburger A M, Nalebuff B J 1995 The right game: Use game theory to shape strategy Chicago: Harvard Business Review

[9]. Wu D, Lisser 2023 A Improved saddle point prediction in stochastic two-player zero-sum games with a deep learning approach Engineering Applications of Artificial Intelligence 126 106664

[10]. Markakis E, Sgouritsa A, Tsikiridis A 2021 Towards a characterization of worst case equilibria in the discriminatory price auction International Conference on Web and Internet Economics Cham: Springer International Publishing 186-204

[11]. Reny P J 1999 On the existence of pure and mixed strategy Nash equilibria in discontinuous games Econometrica 67 5 1029-1056

[12]. Matoušek J, Sharir M, Welzl E 1992 A subexponential bound for linear programming Proceedings of the eighth annual symposium on Computational geometry 1-8

[13]. Tyndall W F 1965 A duality theorem for a class of continuous linear programming problems Journal of the Society for Industrial and Applied Mathematics 13 3 644-666

Cite this article

Wu,H. (2023). Mixed nash equilibrium and its application in two-player zero-Sum games. Theoretical and Natural Science,26,158-163.

Data availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 3rd International Conference on Computing Innovation and Applied Physics

ISBN：978-1-83558-235-0(Print) / 978-1-83558-236-7(Online)

Editor：Yazeed Ghadi

Conference website: https://www.confciap.org/

Conference date: 27 January 2024

Series: Theoretical and Natural Science

Volume number: Vol.26

ISSN：2753-8818(Print) / 2753-8826(Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).