A Project Studying Cooperation
in Games
Via Modeling in Terms of
Formally Non-Cooperative
Action in a Repeated Game Context
In 1996, at a summer "science camp", I gave a talk
on the topic of the use of the "Prisoner's Dilemma"
game, in a context of repetition
and evolution
of strategies, by theoretical biologists who were
interested in studying the natural evolution of cooperative adaptations. And after giving the talk
I thought more about the concept of studying a game
by studying it as a repeated game and
through this viewpoint I got an idea of how to eliminate all of the "verbal"
complications that could become involved in
the consideration
of coalitions and coalition formation.
In principle,
the agreements that form coalitions, and specifically the terms of those
agreements, could (or should!)
be written down in words.
My idea
was that, in a repeated game context, the players could be given the right
to vote for "agencies" or "agents"
among themselves. Thus at a
first step a player, say player A, would have the option to accept player
B as his agent. And the effect of this would be that the coalition (A,B)
would be formed (like
a committee with B as chairman) without any verbal processes occurring
between A and B. Furthermore, this process adapts to successive steps of
coalescence since if another step of elections is held then B, as the "agent"
representing the coalition (A,B), can vote
to accept the agency
of player C and then C will thus become the agent representing the coalition
(A,B,C).
And in this manner,
with generalized "agencies" being electable, the level of a "grand coalition"
can always be reached (for a game of finitely many players), and
as a consequence of
that the requisites for "Pareto efficiency" will be available.
To allow coalitions to form and reform, we must
specify an election procedure that
can form the basis
for a game of non-cooperative form
to be played by the
players of the original game being translated. And in the context of studying
a repeated
game we can afford
to prescribe that this election
process is such that
the agent is entirely uncommitted
and his election is
irrevocable for each specific playing of the game. (Of course election choices
are expected to vary as the game is repeated.)
A set
of rules can be devised so that there are election stages in each of which
all of the players remaining independent (not represented by another player
as agent) have, each of them, the option of electing another player as an
accepted agent. It is natural for these rules to require convergence so that
no more than (n-1) stages of election will ever be needed for a game
of n players.
Election
rules need to do something to resolve the impasse of a situation where A
votes to accept B as his agent but B simultaneously also votes similarly
for A. It is not exactly clear, a priori, which rule version handles
these situations in the most preferable fashion; we have worked with more
than one variant. More recently, in the course of using specific model games
for calculations, we found it desirable to
allow elections to be repeated when an election had failed to result in the
election of any agency power; this finding suggested that
election rules under which at most one agency could be elected
at any stage of the election process would be most convenient. (And furthermore
the actual results of calculations seem to indicate that this convention
on elections is "asymptotically non-prejudicial" since the probability of
simultaneous votes seemed to tend towards zero while the probability of "successful"
elections tended towards +1.)
If one
begins with a quite general CF (characteristic function) game and then
introduces a derived
game formally requiring that all coalition benefits must be attained through
the means of the action of agents who have the authority to represent all
the members of the coalition then the "agencies game" resulting from this
still has the same validly derivable characteristic function as the original
game. In essence the
coalitions simply have the same potentialities as before, but in a formal
sense, for these potentialities to be exploited, the members
of a coalition would
need TO CONSPIRE on a practical procedure for electing agents successively
and finally taking the effective action of the coalition as an action of
the agent finally elected to represent all
of the members of that
coalition.
In this fashion a theory of cooperation in games may be evolved and developed
in analogy to the mathematical methods (studies of PDE descriptions of
the air flow, etc.) that are used in weather prediction. On a materialistic-commercial
level this would naturally apply to the details of the deals in big corporation
mergers (like that of Pfizer and Pharmacia recently). Or also we can hope to
get more insight relating to existing value or evaluation concepts such
as, notably, the Shapley value or the nucleolus.
This program
of research has led, in connection with
the approach to coalitions
and cooperation via the concept of agencies, to the consideration of varied
models of how the players or participants can "react"
to favorable or unfavorable
behavior by other players in the repeated game and to the study of varied
concepts of how the players may choose "demands" that regulate their reactive
behavior. For example, in the analogous area of the study by theoretical
biologists of the possibility of the evolution of cooperation in repeated
games of PD type, there have been found various types of "reaction instinct"
that can work to favor cooperation. Besides the simplest "tit for tat" instinct
there can be
more complex variants which may require more memory
(of recent past game experiences in the sequence of repetitions). The
analogous situation in terms of models with the election of agencies is
that the "demands" that players could be allowed to make might be more or
less elaborately structured.
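As a point of comparison, the "tit for tat" reaction instinct mentioned above can be sketched in a few lines of Python. This is an illustration only; the payoff numbers and strategy names are conventional choices for the repeated Prisoner's Dilemma, not taken from the text.

```python
# A minimal sketch (illustrative assumptions throughout) of the "tit for
# tat" reaction instinct in a repeated Prisoner's Dilemma.
PAYOFF = {            # (my move, opponent's move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first, then repeat the opponent's previous move."""
    return "C" if not history else history[-1]

def always_defect(history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    hist_a, hist_b = [], []          # each list holds the OPPONENT's past moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a)
        move_b = strategy_b(hist_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_b)
        hist_b.append(move_a)
    return score_a, score_b

# Two tit-for-tat players lock into mutual cooperation:
print(play(tit_for_tat, tit_for_tat))      # (30, 30) over 10 rounds
# Against a defector, tit for tat loses only the first round:
print(play(tit_for_tat, always_defect))    # (9, 14)
```

The point of the sketch is that only one round of memory is needed for this simplest reaction instinct; the more elaborate "demand" structures discussed above would require richer state.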
I have
been studying a more complicated model (for three players) than that previously
studied (in 2001) because an expedient used in that simpler model to reduce
the number of equations and variables to be solved for seemed to lead to
problems. The payoff results seemed to depend on the comparative smallnesses
of two different parameters, both of which were supposed to be decreased asymptotically
to zero.
The Previous Model and the Current Model
The two models differ in the procedure
of elections by means of which the agency powers are elected. In the previous
model there was a simpler procedure employed after a first stage of agency
elections had been effective in reducing the remaining set of active players
to merely two. We used a sort
of natural bargaining scheme through which competing choices, by the remaining
players or agents, of utility allocations could be rationalized. However
this device of simplification did not follow straightforwardly the basic
concept of the agencies as players in the same sort of game interaction as
that for all the players at the beginning of
the playing of the game.
Both versions begin with a first stage of elections where
each of the three players could vote (or "choose") any specific one of the
other two players to be his
agent. In either version
these voting or choosing actions (as repeatedly taken in the repeated game
context) are described by numbers that represent, effectively, the BEHAVIOR
involved, or the probability of the action being taken when the opportunity
is presented.
Thus we
have an array of six numbers (and three "implied numbers") that describe
this: for example, a1f2 denotes the probability that Player 1 votes to
accept Player 2 as his agent. The other aifj variables have parallel meanings.
And it is sometimes convenient to use another type
of symbol like n3 = 1 - a3f1 - a3f2 which is the probability that Player 3 chooses NOT to accept either of Players 1 or 2 as his agent; or thus this is the probability that Player 3 votes "neither" or for himself.
And
since the three players make their first votes simultaneously there can
be various outcomes. We have chosen a rule that simplifies the proper response
to the votes. If the number of acceptance votes cast by the players at the
first voting opportunity is more than one then we apply a rule that only
one of these votes, chosen at random, is to be accepted, so that the result
of the election is either (1): that one of the players has elected some other
player to be his agent or (2): that none of the players has voted to accept
any other player to be an agent representing his interest.
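This first-stage rule can be sketched in a few lines of Python. The data representation (a dictionary mapping each player to the player he votes to accept, or None for "neither") is purely an assumption for illustration.

```python
import random

# Sketch of the first-stage election rule: if several acceptance votes
# are cast, only one, chosen at random, is honoured, so that at most one
# agency is elected per stage.  (Illustrative representation only.)
def first_stage(votes, rng=random):
    """votes: dict player -> accepted player, or None for "neither".
    Returns a single (voter, agent) pair, or None if no votes were cast."""
    cast = [(p, a) for p, a in votes.items() if a is not None]
    if not cast:
        return None                      # outcome (2): no agency elected
    return rng.choice(cast)              # outcome (1): exactly one agency

# Players 1 and 3 both cast acceptance votes; exactly one is honoured:
result = first_stage({1: 2, 2: None, 3: 1})
print(result)                            # either (1, 2) or (3, 1)
```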
We have
also introduced a convention that, in case
of a failure of the
players to come together at all (so that no agency has been elected),
with a certain probability the first stage of elections can be repeated.
This idea was also used in our prior model. The probability of the players
being given another chance to elect an agent is called (1-e4) or "one minus
epsilon sub 4" and the idea is that we want to study results of calculations
as e4 tends asymptotically to zero. (It was found, in the case of the previous
model, that as e4 would tend towards zero the probabilities like a1f2 would
also tend towards zero but that agencies would be elected with high probability
because of the continued offering of "second chances" for the basic action
of coalescence (the election of an agency).)
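The effect of such "second chances" can be made quantitative by a simple geometric-series argument (this formula is an illustrative derivation, not one appearing in the text): if a single round elects an agency with probability p, and a failed round is repeated with probability (1-e4), then the probability that some agency is eventually elected is p/(1 - (1-p)(1-e4)), which tends to 1 as e4 tends to zero for any p > 0.

```python
# Illustrative sketch of the "second chances" effect described above.
def eventual_election_prob(p, e4):
    """Probability that an agency is eventually elected when a single
    round succeeds with probability p and a failed round is repeated
    with probability (1 - e4).  Geometric-series argument: P = p + (1-p)(1-e4)P."""
    return p / (1 - (1 - p) * (1 - e4))

# Even a small per-round success probability yields near-certain
# election as e4 tends towards zero:
for e4 in (0.1, 0.01, 0.001):
    print(e4, eventual_election_prob(0.05, e4))
```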
Election of Agencies
In this modeling all coalescence is reduced
to the idea of agencies
and all possibility of general cooperation is reduced to the idea of the
final election of a "general agent". So when one player has accepted another
as his agent there then remain two freely acting players and the level of
cooperation corresponding to the "grand coalition" is not realized until
one of them has elected the other as his agent.
But if
this final agency election fails to be realized then we can allow the existing
agency to exploit the resources of the two player coalition formed by the
two players involved. (And in a simple case like the one we consider, this can
lead simply to the use of the resources specified by the characteristic
function for the coalition of that pair of players.)
In parallel
with the idea used for the first stage of
elections, we allow
the second stage to be repeated, with a probability of (1-e5), if neither
side has elected
the other side to agency
power; and the idea is again that we study the results of calculations
as e5 tends asymptotically to zero.
Once a
"general agent" has been elected then he/she has the privilege of being
able to allocate payoffs, from the total of the payoff utility resources
available, to all of the players, including himself. Our model has simply
a resource total available of +1 which corresponds also to the Pareto boundary
of the game.
For each
player there are four possible ways by which he can become the final agent:
either, as the agent already representing one other player, by being elected
by the remaining solo player, or, as a solo player, by
his being elected by the agent of a two player combination. And then
as final agent he has essentially to choose a point in 2 dimensions to determine
his choice of Pareto-accessible utility allocations.
This leads
to 8 dimensions for each player and there
are three players so
that this leads to 24 dimensions
in all for the choices
made by the players when they specify allocations of utility after being
elected
as "final agent". These
are 24 out of a total of 39 "strategy variables" that are regarded as subject
to
the individual and
individually optimizing choices
by the players.
And the
other 15 dimensions of strategic choice
by the players correspond to their choice options in relation to their reactive behavior (in the repeated game). The behavior of the players that is affected by or controlled by their reactive strategy choices is,
in
general, their "acceptance behavior".
The Third Stage of the Game;
the Allocation of Utility
When two stages of agency election are complete then one of the original
players has become the agent for all and he "allocates"
the payoffs. These are presumed
to be Pareto-efficient and so we suppose that he/she specifies three non-negative
numbers with sum = +1. The information for
this is specified by the amounts allocated to the other players and that
is two numbers. This leads to 24 strategic choices in all, of this type,
for all the players.
For example,
in cases of type UjBijRk player number i, having first become the agent
of player j and having then been elected final agent by player k,
decides the amount allocated to player j.
Or for
example, u1b3r21 is decided upon by Player 3, who was elected by Player
2 after Player 2 had been
chosen by Player 1
in the first stage. This is the amount allocated to Player 1 and u2b3r21
would be the amount allocated to Player 2 (who has a different position
in the history of the elections). Player 3 of course
also allocates u3b3r21 to himself, but this is eliminated from our
system of 42 equations in 42 variables because of its simple relation to the
other
two allocations by Player 3 here.
So there
are 24 "utility allocation" variables (which correspond to strategic choices
by the players) and they group into the 4 categories of UjBijRk, UkBijRk,
UjBiRjk, and UkBiRjk.
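The counting behind these 24 variables can be checked mechanically. The following Python enumeration is an illustration only; the history strings are formatted to imitate the notation of the text.

```python
from itertools import permutations

# Counting sketch (illustrative code, not from the original project): an
# election history ending with player i as final agent is either
#   "b{i}{j}r{k}" -- i already represents j and is elected by solo player k --
# or
#   "b{i}r{j}{k}" -- i is solo and is elected by the agent j representing k --
# and in each history the final agent allocates utility to the two others.
players = (1, 2, 3)
allocation_vars = []
for i in players:
    for j, k in permutations([p for p in players if p != i]):
        for history in (f"b{i}{j}r{k}", f"b{i}r{j}{k}"):
            for receiver in (j, k):
                allocation_vars.append(f"u{receiver}{history}")

print(len(allocation_vars))               # 24 variables in all
print("u1b3r21" in allocation_vars)       # True: the example from the text
```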
Second Stage of the Game
At the second stage one player has
become an agent, another
player is represented through
this agency, and a
third player remains solo.
Suppose
that Player 1 now represents 2 and that
3 remains solo. We
call a12, short for a12f3, the probability that 1 now chooses to vote for
3 as the
final agent. (This
is observable behavior on the part of Player 1 in the repeated game.) And
we call af12, short for a3f12, the complementary probability that Player
3 will vote to accept Player 1 (who already represents Player 2) as the final
agent. These classifications lead to 12 numbers, six of each type.
And the
12 numbers ARE NOT "strategic" choices by
the players involved,
rather we arrange that they are determined by "reactive behavior" regulated
by "demands" which are the actual strategic choices for the players.
For example a12 (or
a12f3) is specified to be A12/(1+A12) where A12 will be a positive number.
This makes a12 a positive number less than +1. And the quantity A12 controlling
a12 is specified to be
A12 = Exp[ (u1b3r12
- d12)/e3 ] .
Here e3,
or "epsilon sub 3", is intended to be made ultimately very small, as we
study the equilibria of the model. That smallness will make A12 react sharply
as d12 and u1b3r12 vary in relation to each other. The number "d12" is the
"demand" chosen (strategically) by
1 in relation to this situation where he can vote
to accept Player 3
as general (final) agent or alternatively wait and hope that Player 3 will
accept him instead (!). And what the formula takes into consideration is
simply the prospective gain or payoff to Player 1 in the case where Player
3 becomes general agent and previously Player 1 had been elected to represent
Player 2, and this is specifically u1b3r12.
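The sharpening effect of a small e3 can be seen directly from the formula a12 = A12/(1+A12) with A12 = Exp[(u1b3r12 - d12)/e3]. Here is a small Python sketch; the numerical inputs are arbitrary illustrative values.

```python
import math

# Sketch of the reactive-behavior formula quoted above:
# a12 = A12/(1+A12) with A12 = exp((u1b3r12 - d12)/e3).
def acceptance_prob(u, d, e3):
    """u: prospective payoff (e.g. u1b3r12); d: the chosen demand (d12);
    e3: the smoothing parameter.  Written to stay numerically stable
    for very small e3."""
    x = (u - d) / e3
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))     # = A/(1+A), avoiding overflow
    return math.exp(x) / (1.0 + math.exp(x))

# As e3 shrinks, the reaction sharpens toward a step at u = d:
for e3 in (0.5, 0.1, 0.01):
    print(e3, acceptance_prob(0.4, 0.3, e3), acceptance_prob(0.2, 0.3, e3))
```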
There
are 6 demand strategy numbers of the same type as d12.
In the
second stage of agency elections an acceptance can be made either by a single
player accepting the coalition led by another player (and thus that other
player becoming the final agent) or by the agent of a two-player coalition
accepting the remaining solo player. For example the demand
"df23"
is a choice by player 1, since it is structured to control a1f23 or the
probability of the acceptance by Player
1, as a solo player in stage 2 of a game, of the agency of Player 2 when
Player 2 is already representing Player 3.
So we
have af23 = AF23/(1+AF23) or
a1f23 = A1F23/(1+A1F23)
with the relation of
AF23 = Exp[ (u1b23r1
- df23)/e3 ] being specified
for the control of
the acceptance behavior af23 by
the (strategic) demand
choice df23. Or this could
be called, in longer notation,
A1F23 = Exp[ (u1b23r1-d1f23)/e3
] .
Behavior at Stage One
Each player,
like e.g. player 2, has the option
of voting either for
Player 1 (behavior of probability a2f1) or for Player 3 (a2f3) or for not
voting for either of them (behavior described by n2 = 1 - a2f1 - a2f3).
And it is arranged in the
model that positive numbers A2f1 and A2f3 control this behavior, with a2fj =
A2fj/(1 + A2f1 + A2f3) where j is either 1 or 3.
A2fj is
specified to be Exp[ ( q2j - d2 )/e3 ]
where q2j is specified to be the calculated expectation of payoff, to
Player 2, under the hypothesis that the
game has passed to
stage 2 with Player 2 having become represented by Player j as his agent.
So Player 2 chooses, strategically, the demand d2 which is interpretable
as describing what Player 2 "demands" he/she should expect in gains if either,
(q21), Player 1 becomes his representing agent or, (q23), Player 3 becomes
the agent representing Player 2 in stage 2 of
the game.
Then the
three strategy variables, d1, d2, and d3, control the six behavior probabilities
a1f2, a1f3, a2f1, a2f3, a3f1, and a3f2 which completely describe the actual
(observable) behavior of the players in stage 1.
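These stage-one formulas can be sketched for one player, say Player 2, in Python (an illustration with arbitrary numerical inputs; q21, q23, d2, and e3 are as defined above):

```python
import math

# Sketch of the stage-one behavior formulas quoted above, for Player 2:
# A2fj = exp((q2j - d2)/e3) and a2fj = A2fj/(1 + A2f1 + A2f3),
# with n2 the leftover "neither" probability.
def stage_one_behavior(q21, q23, d2, e3):
    a21 = math.exp((q21 - d2) / e3)       # A2f1
    a23 = math.exp((q23 - d2) / e3)       # A2f3
    z = 1.0 + a21 + a23
    return a21 / z, a23 / z, 1.0 / z      # (a2f1, a2f3, n2), summing to 1

probs = stage_one_behavior(q21=0.35, q23=0.30, d2=0.33, e3=0.1)
print(probs, sum(probs))                  # the three probabilities sum to 1
```

Note that the single demand d2 controls both acceptance probabilities at once, which is what makes d1, d2, and d3 sufficient to determine all six stage-one behavior numbers.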
In the calculations it was found convenient to replace the demand variables, like
d1, by associated controlled behavior probabilities like a23
or a1f2 and a1f3. And then we arrive at simpler equations with most of the
appearances of exponential functions being eliminated.
It is a matter of practical considerations how
to find actual numerical
solutions of the resulting equations. There was experience with this problem
from work on the previous simpler model and much of the programming earlier
developed for use with MATHEMATICA for the finding of numerical solutions
could be adapted to the new model with its larger number of variables. With
the help of an NSF project assistant (AK) the 42 equations to be associated
with the 42 unknown variables deriving from the model were derived and cross-checked.
At the
present time there are some areas of example
games for which solutions
are known and some where no explicit solution has yet been found. For example,
for
a game with two symmetrically
favored players who can already gain a shared payoff of 3/4 just by themselves,
no proper solution has yet been found although illegitimate solutions with
a variable assigned a
game-theoretically
impermissible value have been found.
So it
is possible that for this example that some of the behavior-describing variables
should assume extreme values or become irrelevant. And then a reformulation
of the set of equations to be solved may properly handle this example.
We can
illustrate the form of numerical results found with an example in which
Players P1 and P2 are symmetrically situated and have
a favorable two-person coalition value. [The table of calculated results
is not reproduced here.]
We can
also illustrate, with graphics of a sort
first prepared last
summer, how the Pareto efficiency
of the game outcome
(derived from the payoffs to the players) tends to improve as e3 tends towards
zero. The effect of decreasing e3 is that the "demands" made by the players
become effectively "more precise" and that leads to the improvement of the
bargaining efficiency, as it were. These illustrative graphics were prepared
(by AK and SL) on the basis of calculated solutions for a fully symmetric
example game. We were able to find the first solutions for fully symmetric
games more easily since
the effect of that symmetry was to reduce the number
of equations and variables from 42 to merely 7.
Other researchers
have also recently gotten into studies of the process of negotiating a mode
of cooperation. The work, for example, of Armando Gomes in Philadelphia was
interesting and for three person games his model was giving either the nucleolus
or the Shapley value as the evaluation result. Of course any method which
leads similarly to an "evaluation" of games becomes a natural basis for an
"arbitration scheme", using the words of Luce and Raiffa.
Our modeling,
as it was also with the previous model, has three parameters that describe
the values of each of the two-player coalitions. When these are small positive
numbers, say of size less than 1/3, then the nucleolus evaluation effectively
disregards them although the Shapley value gives them a moderate weighting
in its evaluation of the game.
The results
of the most recent computational studies using the model (in which studies
I am being assisted by the student SL working on the NSF grant for the research
project) are giving evaluation indications that are more equitably distributed
among the three players than those given by the nucleolus or the Shapley value.
If two
favored players can gain in their two-person coalition 2/3 of the total
available to the grand
coalition (of all three
players) then the computational results from our model indicate significantly
less payoff to the two favored players than would be indicated either by
the nucleolus or by the Shapley value.
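For this example game the Shapley value can be computed directly, which makes the comparison concrete. The following is an illustrative computation (not one from the project itself); the game data encode v({1,2}) = 2/3, all other proper coalitions worth 0, and the grand coalition worth 1.

```python
from itertools import permutations
from fractions import Fraction

# Characteristic function of the example: the favored pair {1,2} can gain
# 2/3 by themselves; the grand coalition is worth 1.
v = {frozenset(): Fraction(0),
     frozenset({1}): Fraction(0), frozenset({2}): Fraction(0),
     frozenset({3}): Fraction(0),
     frozenset({1, 2}): Fraction(2, 3),
     frozenset({1, 3}): Fraction(0), frozenset({2, 3}): Fraction(0),
     frozenset({1, 2, 3}): Fraction(1)}

def shapley(v, players=(1, 2, 3)):
    """Average each player's marginal contribution over all join orders."""
    phi = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            phi[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: phi[p] / len(orders) for p in players}

print(shapley(v))    # the favored players get 4/9 each, Player 3 gets 1/9
```

So the Shapley value here gives (4/9, 4/9, 1/9), which provides a definite benchmark against which the model's more equitable computational results can be compared.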
It seems
plausible that some refinements in modeling
may allow the players
to more effectively cooperate in
terms of the virtues
of the "alliance possibilities"
involving subsets of
the total set of players. I have
thought of some ideas
here, of varying degrees of complexity. In principle it is quite natural,
returning
to the analogy with
biology and evolution, that there should be some desirable behavior patterns
that encourage cooperative behavior on the part of other interacting biological
"players". We already have "demands" in
the modeling and it is possible to design things, effectively, so that the
players can "demand" certain specific forms of cooperative behavior from
other players. For example, they might be enabled to
"demand" that acceptance probabilities should be more
in reciprocal relationship.
(Since any player is competent to be an agent and thus a leader it might
be natural
for a player to "demand" that he/she should be elected
to lead with frequency like that of another player.)
It is
notable in the specific model currently studied that Player 3 gains more than he might
be expected to gain,
when in a disfavored position in relation to the two-player coalition values,
and that part
of HOW Player 3 seems to manage this is by having a rejecting rather than
accepting behavior in relation to any elected agency representing a coalition
of Players 1 and 2.
Ultimately
I came to the conclusion that it was necessary to have a MORE UNIFORM concept
of "fuzziness" to be applied in smoothing
out the sharpness of the effects of "demand" choices. Otherwise, and this
became evident in earlier studies of two-player models, if one player would
have "sharper" demands and the other would make "duller" demands then the
player with the sharper demands would indeed become like a "sharper" bargainer
and would tend to gain an advantage in the calculated game outcomes!
The fuzziness is needed for mathematical reasons, so that derivatives of
smooth functions can be computed. But under certain circumstances it seems
that "unbalanced" fuzziness can somehow "prejudice" the game evaluation objective.
And ultimately
I thought of a "neutral" source of fuzziness by means of supposing a general
uncertainty about the actual boundary of the Pareto-accessible utility outcomes.
The NSF project assistant AK was of much help in the work of deriving
and verifying the payoff functions and the equilibrium equations (for equilibria
in pure strategies with parameters describing probabilities of actions by
the players). Also AK managed to effectively
More recently,
starting in the Fall, SL has been the
student assistant and
has found sets of solutions for
varying values of the
adjustable parameters, for example