A Project Studying Cooperation
in Games Through
Modeling in Terms
of Formally
Non-Cooperative Action
in a Repeated Game
Context
In 1996, at a summer "science camp" I gave a talk on the topic of the use of the "Prisoner's Dilemma" game, in a context of repetition and evolution of strategies, by theoretical biologists who were interested in studying the natural evolution of cooperative adaptations. And after giving the talk I thought more about the concept
of studying a game by studying it as a repeated game and through this viewpoint I got an idea of how to eliminate
all
of the "verbal" complications that could become involved in the
consideration of coalitions and coalition formation.
In principle, coalitions, and specifically
coalitions as considered by Von Neumann and Morgenstern in "Theory of
Games and Economic Behavior", are things that could
be
implemented by contracts, like contracts in roman
law.
But of course a contract is quite intrinsically
a
"verbal" thing because indeed it could (or should!)
be
written down in words.
My idea was that in a repeated game context
that the players could be given the right to vote for "agencies" or
"agents" among themselves. Thus at a first step a player, say player
A, would have the option to accept player B as his agent. And the effect of
this would
be
that the coalition (A,B) would be formed (like
a
committee with B as chairman) without any verbal processes occurring between A
and B. Furthermore, this process adapts to successive steps of coalescence
since if another step of elections is held then B, as the "agent"
representing the coalition (A,B), can vote
to
accept the agency of player C and then C will thus become the agent
representing the coalition (A,B,C).
And in this manner, with generalized
"agencies" being electable, the level of a "grand
coalition" can always
be
reached (for a game of finitely many players), and
as
a consequence of that the requisites for "Pareto efficiency" will be
available.
More on Agencies
Instead of there being unlimited means by
which coalitions might actually be formed or form, dissolve,
and
reform, we must specify an election procedure that
can
form the basis for a game of non-cooperative form
to
be played by the players of the original game being translated. And in the
context of studying a repeated
game
we can afford to prescribe that this election process is such that the agent is
entirely uncommitted
and
his election is irrevocable for each specific playing of the game. (Of course
election choices are expected
to
vary as the game is repeated.)
A set of rules can be devised so that there
are election stages in each of which all of the players remaining independent
(not represented by another player as agent) have, each of them, the option of
electing another player as an accepted agent. It is natural for these rules to
require convergence so that no more than (n-1) stages of election will ever be
needed for a game of n players.
Election rules need to do something to
resolve the impasse of a situation where A votes to accept B as his agent but B
simultaneously also votes similarly for A.
It
is not exactly clear, a priori, which rule version handles these situations in
the most preferable fashion,
we
have worked with more than one variant. When we more recently found, in the
course of the use of specific model games for calculations, that it seemed to
be desirable to allow elections to be repeated when an election had failed to
result in the election of any agency power, this finding had the effect of
suggesting that election rules which had the effect that at most one agency
could be elected at any stage of the election process would be most convenient.
(And furthermore the actual results of calculations seem to indicate that
this
convention on elections is "asymptotically non-prejudicial" since the
probability of simultaneous
votes
seemed to tend towards zero while the probability of "successful"
elections tended towards +1.)
If one begins with a quite general CF game
and then
if
one introduces a game formally requiring that all coalition benefits must be
attained through the means of the action of agents who have the authority to
represent all the members of the coalition then the "agencies game"
resulting from this still has the same validly derivable characteristic
function as the original game. In essence the coalitions simply have the same
potentialities as before, but in a formal sense, for these potentialities to be
exploited, the members of a coalition would need
TO
CONSPIRE on a practical procedure for electing agents successively and finally
taking the effective action of the coalition as an action of the agent finally
elected to represent all of the members of that coalition.
Current Form of Our Work
We have found that it is possible to find
"bargaining equilibria" in games of two or three players using models
or modeling of this sort. Ideally, there could be evolved and developed an
analogy to the mathematical methods (studies of PDE descriptions for the air
flow, etc.)
that
are used in weather prediction. On a materialistic-commercial level this would
naturally apply to the details of the deals in big corporation mergers (like
of
Pfizer and Pharmacia recently). Or also we can hope to get more insight
relating to existing value or evaluation concepts such as, notably, the Shapley
value or the nucleolus.
This program of research has led, in
connection with the approach to coalitions and cooperation via the concept of
agencies, to the consideration of varied models of how the players or
participants can "react"
to
favorable or unfavorable behavior by other players in the repeated game and to
the study of varied concepts of how the players may choose "demands"
that regulate their reactive behavior. For example, in the analogous area of
the study by theoretical biologists of the possibility of the evolution of
cooperation in repeated games of PD type, there have been found various types
of "reaction instinct" that can work to favor cooperation. Besides
the simplest "tit for tat" instinct there can be more complex
variants which may require more memory (of recent past game experiences in the
sequence of repetitions). The analogous situation in terms of models with the
election of agencies is that the "demands" that players could
be
allowed to make might be more or less elaborately structured.
I have been studying a more complicated
model (for three players) than that previously studied (in 2001) because an
expedient used in that simpler model to reduce the number of equations and
variables to be solved for seemed to lead to problems. The payoff results
seemed to depend on the comparative smallnesses of two different parameters,
both of which were supposed to be decreased asymptotically to zero.
Common Features of Last Year's
Model and the
Current Model
These models are the same in having a
procedure
of
elections by means of which the agency powers are elected. In the previous
model there was a simpler procedure employed after a first stage of agency
elections had been effective in reducing the remaining set of active players to
merely two. We used a sort
of
natural bargaining scheme through which competing choices, by the remaining
players or agents, of utility allocations could be rationalized. However this
device
of
simplification did not follow straightforwardly the basic concept of the
agencies as players in the same sort of game interaction as that for all the
players at the beginning of the playing of the game.
Both versions begin with a first stage of
elections where each of the three players could vote (or "choose")
any specific one of the other two players to be his agent. In either version
these voting or choosing actions (as repeatedly taken in the repeated game
context) are described by numbers that represent, effectively, the BEHAVIOR
involved, or the probability of the action being taken when the opportunity is
presented.
Thus we have an array of six numbers (and
three "implied numbers") that describe this:
For Player 1: For Player 2: For
Player 3:
a1f2 & a1f3 a2f1 & a2f3
a3f1 & a3f2
And these represent the probabilities of
specific voting choices. Thus, for example, a2f1 is the probability that Player
2 will choose (at the first stage of the game) to elect that Player 1 should be
authorized to act as his agent (as if with "power of attorney"). So
it is the "acceptance rate of P2 for P1" in the concept underlying
the notation.
The other aifj variables have parallel
meanings. And it is sometimes convenient to use another type of symbol like n3
= 1 - a3f1 - a3f2 which is the probability that Player 3 chooses NOT to accept
either of Players 1 or 2 as his agent; or thus this is the probability that
Player 3 votes "neither" or for himself.
And since the three players make their
first votes simultaneously there can be various outcomes. We have chosen a rule
to simplify the process by which there is the proper response to the votes. If
the number of acceptance votes cast by the players at the first voting
opportunity is more than one then we apply a rule that only one of these votes,
chosen at random, is to be accepted, so that the result of the election is
either (1): that one of the players has elected some other player to be his
agent or (2): that none of the players has voted to accept any other player to
be an agent representing his interest.
We have also introduced a convention that
in case of a failure of the players to come together at all so that no agency
has been elected that, with a certain probability, the first stage of elections
can be repeated. This idea was also used in our prior model. The probability of
the players being given another chance to elect an agent is called (1-e4) or
"one minus epsilon sub 4" and the idea is that we want to study
results of calculations as e4 tends asymptotically to zero. (It was found, in
the case
of
the previous model, that as e4 would tend towards zero the probabilities like
a1f2 would also tend towards zero but that agencies would be elected with high
probability because of the continued offering of "second chances"
for
the basic action of coalescence (the election of an agency).)
The Second Stage of the
Election of
Agencies
In our prior model, after one agency had
been elected and only two active players remained, we based the payoff on the
utility assignments which had been chosen by the two remaining players, taking
into consideration two numbers chosen by the player that had been elected as
an
agent and one number chosen by the remaining solo player. In the model
currently being studied the approach is, in a sense, more "orthodox"
with respect to the idea of agencies and all possibility of general cooperation
is reduced to the idea of the final election of a "general agent". So
when one player has accepted another as his agent there then remain two freely
acting players and
the
level of cooperation corresponding to the "grand coalition" is not
realized until one of them has elected the other as his agent.
But if this final agency election fails to
be realized then we can allow the existing agency to exploit the resources of
the two player coalition formed by the
two
players involved. (And in a simple case, like we consider, this can lead simply
to the use of the resources specified by the characteristic function
for
the coalition of that pair of players.)
Similarly with the idea for the first stage
of
elections we allow the second stage to be repeated, with a probability of
(1-e5), if neither side has elected
the
other side to agency power, and the idea is that we want to study the limiting
form of the results as e5 asymptotically goes to zero.
Once a "general agent" has been
elected then he/she has the privilege of being able to allocate payoffs,
from
the total of the payoff utility resources available, to all of the players, including
himself. Our model has simply a resource total available of +1 which
corresponds also to the Pareto boundary of the game.
For each player there are four possible
ways by which he could have become chosen as final agent. Either of two players
may have initially elected him, and this leads to two cases, or the other two
players may have initially combined in either of two ways followed by his being
elected by the agent of that two player combination. And then as final agent he
has essentially to choose a point in 2 dimensions to determine his choice of
Pareto-accessible utility allocations.
This leads to 8 dimensions for each player
and there are three players so that this leads to 24 dimensions in all for the
choices made by the players when they specify allocations of utility after
being elected as "final agent". These are 24 out of a total of 39
"strategy variables" that are regarded as subject to the individual and individually optimizing choices by
the
players.
And the other 15 dimensions of strategic
choice
by
the players correspond to their choice options in relation to their reactive
behavior (in the repeated game). The behavior of the players that is affected
by
or
controlled by their reactive strategy choices is,
in
general, their "acceptance behavior".
The Third Stage of the Game;
the Allocation of
Utility
When two stages of agency election are
complete then one of the original players has become the agent for all and he
"allocates" the payoffs. These are presumed to be Pareto-efficient
and so we suppose that he/she specifies three non-negative numbers with sum =
+1. The information for this is specified by the amounts allocated to the other
players and that is two numbers. This leads to 24 strategic choices in all, of
this type, for all the players.
For example, in cases of type UjBijRk
player number i is in control and was first elected by Player j and then by
player k and player i chooses to allocate the payoff of ujbijrk to player j
(and ukbijrk to Player k, but that is another case of allocation strategy
variables).
Or for example, u1b3r21 is decided upon by
Player 3, who was elected by Player 2 after Player 2 had been
chosen
by Player 1 in the first stage. This is the amount allocated to Player 1 and
u2b3r21 would be the amount allocated to Player 2 (who has a different position
in the history of the elections). Player 3 of course also allocates u3b3r21 to
himself, but this is eliminated from our system of 42 equations in 42 variables
because of its simple relation to the other two allocations by Player 3 here.
So there are 24 "utility
allocation" variables (which correspond to strategic choices by the
players) and they group into the 4 categories of UjBijRk, UkBijRk, UjBiRjk, and
UkBiRjk.
"Demands" and Behavior in the
Second Stage of
the Game
When the "second stage" is
reached one player has
become
an agent, another player is represented through
this
agency, and a third player remains solo.
Suppose that Player 1 now represents 2 and
that
3
remains solo. We call a12, short for a12f3, the probability that 1 now chooses
to vote for 3 as the
final
agent. (This is observable behavior on the part
of
Player 1 in the repeated game.) And we call af12, short for a3f12, the complimentary
probability that Player 3 will vote to accept Player 1 (who already represents
Player 2) as the final agent. These classifications lead to 12 numbers, six of
each type.
And the 12 numbers ARE NOT
"strategic" choices by
the
players involved, rather we arrange that they are determined by "reactive
behavior" regulated by "demands" which are the actual strategic
choices for the players.
For example a12 (or a12f3) is specified to
be A12/(1+A12) where A12 will be a positive number.
This
makes a12 a positive number less than +1. And
the
quantity A12 controlling a12 is specified to be
A12
= Exp[ (u1b3r12 - d12)/e3 ] .
Here e3, or "epsilon sub 3", is
intended to be made ultimately very small, as we study the equilibria of the
model. That smallness will make A12 react sharply as d12 and u1b3r12 vary in
relation to each other. The number
"d12" is the "demand" chosen (strategically) by Player 1 in
relation to this situation where he can vote to accept Player 3 as general
(final) agent or alternatively wait and hope that Player 3 will accept him
instead (!). And what the formula takes into consideration is simply the
prospective gain or payoff to Player 1 in the case where Player 3 becomes
general agent and previously Player 1 had been elected to represent Player 2,
and this is specifically u1b3r12.
There are 6 demand strategy numbers of the
same type as d12 (which controls a12). And there are also six quite analogous
strategy choices like, for example, df23 controlling af23 (or a1f23).
In the second stage of agency elections an
acceptance can be made either by a single player accepting the coalition led by
another player (and thus that other player’s leadership as final general
agent). Or on the other hand the player leading a coalition of two may accept
the solo player as the final agent for all.
"df23" is a choice by player 1,
since it is structured to control a1f23 or the probability of the acceptance by
Player 1, as a solo player in stage 2 of a game, of the agency of Player 2 when
Player 2 is already representing Player 3.
So we have
af23 = AF23/(1+AF23) or
a1f23
= A1F23/(1+A1F23) with the relation of
AF23
= Exp[ (u1b23r1 - df23)/e3 ] being
specified
for
the control of the acceptance behavior af23 by
the
(strategic) demand choice df23. Or this could be called, in longer
notation,
A1F23
= Exp[ (u1b23r1-d1f23)/e3 ] .
Demands and Acceptance
Behavior at
Stage One
At the first stage of elections, when all
three players are "solo" we have made a choice, conventionally, of
how to relate election behavior to "demands". The choice made is not
absolutely free from any possible arbitrariness and something more complex
might also be considered appropriate.
Each player, like e.g. player 2, has the
option of voting either for Player 1 (behavior of probability a2f1) or for
Player 3 (a2f3) or for not voting for either of them (behavior described by n2
= 1 - a2f1 - a2f3 ).
The
model, in the same manner as our previous model
(of
2001), relates these behavior-describing numbers (or probabilities) to a single
demand parameter called d2 which is all of the strategic choice by Player 2
that relates to stage 1 of the game. We specify in the model that positive
numbers A2f1 and A2f3 are determined (controlled by d2) and that a2fj = A2fj/(1 + A2f1 + A2f3) where j is either 1 or 3.
A2fj is specified to be Exp[ ( q2j - d2 )/e3 ] where q2j is
specified to be the calculated expectation of payoff, to Player 2, under the
hypothesis that the game has passed to stage 2 with Player 2 having become
represented by Player j as his agent. So Player 2 chooses, strategically, the
demand d2 which is interpretable as describing what Player 2
"demands" he/she should expect in gains if either, (q21), Player 1
becomes his representing agent or, (q23), Player 3 becomes the agent
representing Player 2 in stage 2 of the game.
Then the three strategy variables d1, d2,
and d3 control the six behavior probabilities a1f2, a1f3, a2f1, a2f3, a3f1, and
a3f2 which completely describe the actual (observable) behavior of the players
in stage 1.
Variables in the Model
In all we have 39 "strategic"
variables in the model, 15 "demands" and 24 choices of "utility
allocations". But we replace all the demand variables, like d23 or d1, by
associated controlled behavior probabilities like a23 or a1f2 and a1f3. And
then we arrive at simpler equations with most of the appearance of exponential
functions being eliminated.
It is a matter of practical considerations,
how
to
find actual numerical solutions of the resulting equations. There was
experience with this problem from work on the previous simpler model and much
of the programming earlier developed for use with MATHEMATICA for the finding
of numerical solutions could be adapted to the new model with its larger number
of variables. With the help of an NSF project assistant (AK) the 42 equations
to be associated with the 42 unknown variables deriving from the model were
derived and cross-checked.
At the present time there are some areas of
example
games
for which solutions are known and some where no explicit solution has yet been
found. For example, for
a
game with two symmetrically favored players who can already gain a shared
payoff of 3/4 just by themselves, no proper solution has yet been found
although illegitimate solutions with a variable assigned a game-theoretically
impermissible value have been found.
So it is possible that for this example
that some of the behavior-describing variables should assume extreme values or
become irrelevant. And then a reformulation of the set of equations to be
solved may properly handle this example.
We can illustrate the form of numerical
results found for the case on the borderline of where the nucleolus seems to
treat the values of the two-person coalition as "irrelevant
alternatives" and where it depends strongly on their values. This is the
case where Players P1 and P2 are symmetrically situated and have a coalition
value of 1/3 by themselves (while the coalition of all three players is worth 1
and no other two-person coalition has any value).
We can also illustrate, with a graphics of
a sort
first
prepared last summer, how the Pareto efficiency
of
the game outcome (derived from the payoffs to the players) tends to improve as
e3 tends towards zero. The effect of decreasing e3 is that the
"demands" made by the players become effectively "more
precise" and that leads to the improvement of the bargaining efficiency,
as it were. These illustrative graphics were prepared (by AK and SL) on the
basis of calculated solutions for a fully symmetric example game. We were able
to find the first solutions for fully symmetric games more easily since
the
effect of that symmetry was to reduce the number
of
equations and variables from 42 to merely 7.
Comparing Results and Various Theories
The motivation for studying cooperative
games via means for modeling the actions of the players as actions possible in
a non-cooperative game is, ultimately, to shed some light on simple issues like
that of assigning values for the players of the game. This is a simple concept
but it seems to be a difficult and challenging problem.
Other researchers have also recently gotten
into studies of the process of negotiating a mode of cooperation. The work, for
example, of Armando Gomes in Philadelphia was interesting and for three person
games his model was giving either the nucleolus or the Shapley value as the
evaluation result. Of course any method which leads similarly to an
"evaluation" of games becomes a natural basis for an
"arbitration scheme", using the words of Luce and Raiffa.
Our modeling, as it was also with the
previous model, has three parameters that describe the values of each of the
two-player coalitions. When these are small positive numbers, say of size less
than 1/3, then the nucleolus evaluation effectively disregards them although
the Shapley value gives them a moderate weighting in its evaluation of the
game.
The results of the most recent
computational studies using the model (in which studies I am being assisted by
the student SL working on the NSF grant for the research project) are giving
evaluation indications that are more equitably distributed among the three
players than those of the Shapley value. But when the two person coalitions are
quite weak and the nucleolus ignores them our model does indicate some
inequality in the payoffs to the players but less what the Shapley value would
indicate.
If two favored players can gain in their
two-person coalition 2/3 of the total available to the grand coalition (of all
three players) then the computational results from our model indicate
significantly less payoff to the two favored players than would be indicated
either by the nucleolus or by the Shapley value.
It seems plausible that some refinements in
modeling may allow the players to more effectively cooperate in terms of the
virtues of the "alliance possibilities" involving subsets of the
total set of players. I have thought of some ideas here, of varying degrees of
complexity. In principle it is quite natural, returning to the analogy with
biology and evolution, that there should be some desirable behavior patterns
that encourage cooperative behavior on the part of other interacting biological
"players". (We already have "demands" in
the
modeling and it is possible to design things, effectively, so that the players
can "demand" certain specific forms of cooperative behavior from
other players. For example, they might be enabled to "demand" that
acceptance probabilities should be more in reciprocal relationship. Since any
player is competent to be an agent and thus a leader it might be natural for a
player to "demand" that he/she should be elected to lead with frequency
like that of another player.)
It is notable in the specific model
currently studied that when Player 3 seems to gain more than he/she might be
expected to gain, when in a disfavored position in relation to the two-player
coalition values, that part
of
HOW Player 3 seems to manage this is by having a rejecting rather than
accepting behavior in relation to any elected agency representing a coalition
of Players 1 and 2.
A Technical Issue
Our previous model turned out to give numerical
results for cases of weak two-player coalitions where, depending on the ratio
of two "epsilons" (comparable to "e3" as described above),
the evaluation result could
be
either "hyper-Shapleyan" (compared with the nucleolus at
(1/3,1/3,1/3)) or "sub-Shapleyan".
Ultimately I came to the conclusion that it
was needed to have a MORE UNIFORM concept of "fuzziness" to be
applied in smoothing out the sharpness of the effects
of
"demand" choices. Otherwise, and this became evident in earlier
studies of two-player models, if one player would have "sharper"
demands and the other would make "duller" demands then the player
with the sharper
demands
would indeed become like a "sharper" bargainer and would tend to gain
an advantage in the calculated game outcomes! The fuzziness is needed for
mathematical reasons, so that derivatives of smooth functions can be computed.
But under certain circumstances it seems that "unbalanced" fuzziness
can somehow "prejudice" the game evaluation objective.
Ultimately I thought of a
"neutral" source of fuzziness by means of supposing a general
uncertainty about the actual boundary of the Pareto-accessible utility
outcomes.
NSF Project Assistants
Earlier, through the Summer of 2002, AK
assisted in the work of deriving and verifying the payoff functions and the
equilibrium equations (for equilibria in pure strategies with parameters
describing probabilities of actions by the players). Also AK managed to
effectively set up usable programs for calculating the nucleolus. These
programs had been developed originally by S. Klauke in Bielefeld.
More recently ,starting in the Fall, SL has
been the student assistant and has found sets of solutions for varying values
of the adjustable parameters, for example for when b3 = 2/3.