A Project Studying Cooperation in Games Through Modeling in Terms of Formally Non-Cooperative Action in a Repeated Game Context
In 1966, at a summer "science camp", I gave a talk on the use of the "Prisoner's Dilemma" game, in a context of repetition and evolution of strategies, by theoretical biologists who were interested in studying the natural evolution of cooperative adaptations. After giving the talk I thought more about the concept of studying a game as a repeated game, and through this viewpoint I got an idea of how to eliminate all of the "verbal" complications that could become involved in the consideration of coalitions and coalition formation.
In principle, coalitions, and specifically coalitions as considered by Von Neumann and Morgenstern in "Theory of Games and Economic Behavior", are things that could be implemented by contracts, like contracts in Roman law. But of course a contract is quite intrinsically a "verbal" thing, because indeed it could (or should!) be written down in words.
My idea was that in a repeated game context the players could be given the right to vote for "agencies" or "agents" among themselves. Thus at a first step a player, say player A, would have the option to accept player B as his agent. The effect of this would be that the coalition (A,B) would be formed (like a committee with B as chairman) without any verbal processes occurring between A and B. Furthermore, this process adapts to successive steps of coalescence, since if another stage of elections is held then B, as the "agent" representing the coalition (A,B), can vote to accept the agency of player C, and C will thus become the agent representing the coalition (A,B,C).
And in this manner, with generalized "agencies" being electable, the level of a "grand coalition" can always be reached (for a game of finitely many players), and as a consequence the requisites for "Pareto efficiency" will be available.
More on Agencies
Instead of there being unlimited means by which coalitions might actually be formed or form, dissolve, and reform, we must specify an election procedure that can form the basis for a game of non-cooperative form to be played by the players of the original game being translated. And in the context of studying a repeated game we can afford to prescribe that this election process is such that the agent is entirely uncommitted and his election is irrevocable for each specific playing of the game. (Of course election choices are expected to vary as the game is repeated.)
A set of rules can be devised so that there are election stages, in each of which all of the players remaining independent (not represented by another player as agent) have, each of them, the option of electing another player as an accepted agent. It is natural for these rules to require convergence, so that no more than (n-1) stages of election will ever be needed for a game of n players.
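The staged election procedure above can be sketched as a small simulation. The particular voting probabilities here are illustrative assumptions, not the model's equations, and the "honor one acceptance vote per stage, chosen at random" rule anticipates the convention adopted later in the text; the point of the sketch is that each successful stage removes exactly one independent player, so at most n-1 successful stages are ever needed.

```python
import random

def run_elections(n, accept_prob=0.6, seed=0):
    """Simulate staged agency elections for an n-player game.

    agent_of[i] is the player currently acting for player i (i itself
    while independent).  At each stage every independent player votes,
    with some probability, to accept another independent player as his
    agent; at most one acceptance vote is honored per stage, chosen at
    random.  Failed stages are simply repeated here.  Returns the number
    of SUCCESSFUL stages, which can never exceed n - 1, since each
    success removes exactly one independent player.
    """
    rng = random.Random(seed)
    agent_of = list(range(n))
    independent = list(range(n))
    successes = 0
    while len(independent) > 1:
        votes = [(i, rng.choice([p for p in independent if p != i]))
                 for i in independent if rng.random() < accept_prob]
        if not votes:
            continue          # failed stage: repeat the election
        voter, agent = rng.choice(votes)
        for p in range(n):    # the agent absorbs the voter's coalition
            if agent_of[p] == voter:
                agent_of[p] = agent
        independent.remove(voter)
        successes += 1
    return successes
```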
Election rules need to do something to resolve the impasse of a situation where A votes to accept B as his agent but B simultaneously votes similarly for A. It is not exactly clear, a priori, which rule version handles these situations in the most preferable fashion; we have worked with more than one variant. When we more recently found, in the course of using specific model games for calculations, that it seemed desirable to allow elections to be repeated when an election had failed to result in the election of any agency power, this finding suggested that election rules under which at most one agency could be elected at any stage of the election process would be most convenient. (And furthermore the actual results of calculations seem to indicate that this convention on elections is "asymptotically non-prejudicial", since the probability of simultaneous votes seemed to tend towards zero while the probability of "successful" elections tended towards +1.)
If one begins with a quite general CF game and then introduces a game formally requiring that all coalition benefits must be attained through the action of agents who have the authority to represent all the members of the coalition, then the "agencies game" resulting from this still has the same validly derivable characteristic function as the original game. In essence the coalitions simply have the same potentialities as before, but in a formal sense, for these potentialities to be exploited, the members of a coalition would need TO CONSPIRE on a practical procedure for electing agents successively and finally taking the effective action of the coalition as an action of the agent finally elected to represent all of the members of that coalition.
Current Form of Our Work
We have found that it is possible to find "bargaining equilibria" in games of two or three players using models or modeling of this sort. And some problems or details of complication have appeared which have led to consideration of refinements in the models.
Ideally, there could be evolved and developed an analogy to the mathematical methods (studies of PDE descriptions for the air flow, etc.) that are used in weather prediction. On a materialistic-commercial level this would naturally apply to the details of the deals in big corporation mergers (like that of Pfizer and Pharmacia recently). Or also we can hope to get more insight relating to existing value or evaluation concepts such as, notably, the Shapley value or the nucleolus.
This program of research has led, in connection with the approach to coalitions and cooperation via the model of agencies, to the consideration of varied models of how the players or participants can "react" to favorable or unfavorable behavior by other players in the repeated game, and to the study of varied concepts of how the players may choose "demands" that regulate their reactive behavior. For example, in the analogous area of the study by theoretical biologists of the possibility of the evolution of cooperation in repeated games of PD type, there have been found various types of "reaction instinct" that can work to favor cooperation. Besides the simplest "tit for tat" instinct there can be more complex variants which may require more memory (of recent past game experiences in the sequence of repetitions). The analogous situation in terms of models with the election of agencies is that the "demands" that players could be allowed to make might be more or less elaborately structured.
I am now working on a more complicated model (for three players) than the one studied last year, because an expedient used in that simpler model to reduce the number of equations and variables to be solved for seemed to lead to problems (on which more comments later).
Common Features of Last Year's Model and the Current Model
These models are the same in having a procedure of elections by means of which the agency powers are elected. In the previous model there was a simpler procedure employed after a first stage of agency elections had been effective in reducing the remaining set of active players to merely two. We used a sort of natural bargaining scheme through which competing choices, by the remaining players, of utility allocations could be rationalized. However, this device of simplification did not follow straightforwardly from the basic concept of the agencies as players in the same sort of game interaction as all the players at the beginning of the playing of the game.
Both versions began with a first stage of elections where each of the three players could vote (or "choose") any specific one of the other two players to be his agent. In either version these voting or choosing actions (as repeatedly taken in the repeated game context) are described by numbers that represent, effectively, the BEHAVIOR involved, or the probability of the action being taken when the opportunity is presented.
Thus we have an array of six numbers (and three "implied numbers") that describe this:

    For Player 1:  a1f2 & a1f3
    For Player 2:  a2f1 & a2f3
    For Player 3:  a3f1 & a3f2
And these represent the probabilities of specific voting choices. Thus, for example, a2f1 is the probability that Player 2 will choose (at the first stage of the game) to elect that Player 1 should be authorized to act as his agent (as if with "power of attorney"). So it is the "acceptance rate of P2 for P1" in the concept underlying the notation.
The other aifj variables have parallel meanings. And it is sometimes convenient to use another type of symbol like n3 = 1 - a3f1 - a3f2, which is the probability that Player 3 chooses NOT to accept either of Players 1 or 2 as his agent; thus this is the probability that Player 3 votes "neither" or for himself.
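As a small illustration of the bookkeeping (the numerical values here are arbitrary), the six free behavior numbers and the three implied numbers ni can be held in a simple structure:

```python
# Hypothetical container for the first-stage behavior probabilities.
# The six numbers aifj are the free quantities; each ni is implied.
first_stage = {
    (1, 2): 0.20, (1, 3): 0.10,   # a1f2, a1f3: Player 1's acceptance rates
    (2, 1): 0.15, (2, 3): 0.25,   # a2f1, a2f3
    (3, 1): 0.05, (3, 2): 0.30,   # a3f1, a3f2
}

def n_(i, a=first_stage):
    """Implied probability that player i votes 'neither' (for himself)."""
    others = [j for j in (1, 2, 3) if j != i]
    return 1.0 - sum(a[(i, j)] for j in others)

print(n_(3))  # n3 = 1 - a3f1 - a3f2
```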
And since the three players make their first votes simultaneously there can be various outcomes. We have chosen a rule to simplify the process by which there is the proper response to the votes. If the number of acceptance votes cast by the players at the first voting opportunity is more than one, then we apply a rule that only one of these votes, chosen at random, is to be accepted, so that the result of the election is either (1): that one of the players has elected some other player to be his agent, or (2): that none of the players has voted to accept any other player to be an agent representing his interest.
We have also introduced a convention that in case of a failure of the players to come together at all, so that no agency has been elected, then, with a certain probability, the first stage of elections can be repeated. This idea was also used in our prior model. The probability of the players being given another chance to elect an agent is called (1-e4), or "one minus epsilon sub 4", and the idea is that we want to study results of calculations as e4 tends asymptotically to zero. (It was found, in the case of the previous model, that as e4 would tend towards zero the probabilities like a1f2 would also tend towards zero, but that agencies would be elected with high probability because of the continued offering of "second chances" for the basic action of coalescence (the election of an agency).)
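The effect of these "second chances" can be seen in closed form: if a single attempt elects an agency with probability p, and a failed attempt is repeated with probability (1-e4), then the overall election probability P satisfies P = p + (1-p)(1-e4)P, so P = p / (1 - (1-p)(1-e4)), which tends to +1 as e4 tends to zero for any p > 0. A sketch (the value p = 0.05 is just an illustration):

```python
def eventual_election_prob(p, e4):
    """Probability that some agency is eventually elected, when each
    attempt succeeds with probability p and a failed attempt is
    repeated with probability 1 - e4 (geometric 'second chances')."""
    return p / (1.0 - (1.0 - p) * (1.0 - e4))

# Even a small per-attempt success rate yields near-certain election
# as e4 shrinks, matching the observation in the text.
for e4 in (0.1, 0.01, 0.001):
    print(e4, eventual_election_prob(0.05, e4))
```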
The Second Stage of the Election of Agencies
In our prior model, after one agency had been elected and only two active players remained, we based the payoff on the utility assignments which had been chosen by the two remaining players, taking into consideration two numbers chosen by the player that had been elected as an agent and one number chosen by the remaining solo player. In the model currently being studied the approach is, in a sense, more "orthodox" with respect to the idea of agencies, and all possibility of general cooperation is reduced to the idea of the final election of a "general agent". So when one player has accepted another as his agent there then remain two freely acting players, and the level of cooperation corresponding to the "grand coalition" is not realized until one of them has elected the other as his agent.
But if this final agency election fails to be realized then we can allow the existing agency to exploit the resources of the two-player coalition formed by the two players involved. (And in a simple case, like we consider, this can lead simply to the use of the resources specified by the characteristic function for the coalition of that pair of players.)
As with the first stage of elections, we allow the second stage to be repeated, with a probability of (1-e5), if neither side has elected the other side to agency power, and the idea is that we want to study the limiting form of the results as e5 asymptotically goes to zero.
Once a "general agent" has been elected then he/she has the privilege of being able to allocate payoffs, from the total of the payoff utility resources available, to all of the players, including himself. Our model has simply a resource total available of +1, which corresponds also to the Pareto boundary of the game.
For each player there are four possible ways by which he could have become chosen as final agent. Either of the two other players may have initially elected him, which gives two cases, or the other two players may have initially combined in either of two ways, followed by his being elected by the agent of that two-player combination. And then as final agent he has essentially to choose a point in 2 dimensions to determine his choice of Pareto-accessible utility allocations.
This leads to 8 dimensions for each player, and there are three players, so that this leads to 24 dimensions in all for the choices made by the players when they specify allocations of utility after being elected as "final agent". These are 24 out of a total of 39 "strategy variables" that are regarded as subject to the individual and individually optimizing choices by the players.
And the other 15 dimensions of strategic choice by the players correspond to their choice options in relation to their reactive behavior (in the repeated game). The behavior of the players that is affected by or controlled by their reactive strategy choices is, in general, their "acceptance behavior".
The Third Stage of the Game; the Allocation of Utility
When two stages of agency election are complete, then one of the original players has become the agent for all and he "allocates" the payoffs. These are presumed to be Pareto-efficient and so we suppose that he/she specifies three non-negative numbers with sum = +1. The information for this is specified by the amounts allocated to the other players, and that is two numbers. This leads to 24 strategic choices in all, of this type, for all the players.
For example, in cases of type UjBijRk, player number i is in control and was first elected by Player j and then by Player k, and Player i chooses to allocate the payoff of ujbijrk to Player j (and ukbijrk to Player k, but that is another case of allocation strategy variables).
Or for example, u1b3r21 is decided upon by Player 3, who was elected by Player 2 after Player 2 had been chosen by Player 1 in the first stage. This is the amount allocated to Player 1, and u2b3r21 would be the amount allocated to Player 2 (who has a different position in the history of the elections). Player 3 of course also allocates u3b3r21 to himself, but this is eliminated from our system of 42 equations in 42 variables because of its simple relation to the other two allocations by Player 3 here.
So there are 24 "utility allocation" variables (which correspond to strategic choices by the players) and they group into the 4 categories of UjBijRk, UkBijRk, UjBiRjk, and UkBiRjk.
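Under my reading of the notation (names like u1b3r21 follow the examples given in the text; the exact spelling of the direct-election histories is an assumption), the full inventory of strategic variables can be enumerated mechanically, confirming the counts of 24 allocations and 15 demands:

```python
from itertools import permutations

players = (1, 2, 3)

# 24 utility-allocation variables: for each final agent i there are four
# election histories (two where i was elected directly at stage one and
# then finally, two where i is finally elected by the agent of the other
# pair), and in each case i names the two amounts given to the others.
alloc_vars = []
for i in players:
    j, k = [p for p in players if p != i]
    histories = [f"b{i}{j}r{k}", f"b{i}{k}r{j}",   # cases UjBijRk, UkBijRk
                 f"b{i}r{j}{k}", f"b{i}r{k}{j}"]   # cases UjBiRjk, UkBiRjk
    for h in histories:
        alloc_vars += [f"u{m}{h}" for m in (j, k)]

# 15 demand variables: d1..d3 at stage one, six like d12, six like df23.
demand_vars = ([f"d{i}" for i in players]
               + [f"d{i}{j}" for i, j in permutations(players, 2)]
               + [f"df{j}{k}" for j, k in permutations(players, 2)])

print(len(alloc_vars), len(demand_vars), len(alloc_vars) + len(demand_vars))
```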
"Demands" and Behavior in the Second Stage of the Game
When the "second stage" is reached, one player has become an agent, another player is represented through this agency, and a third player remains solo.
Suppose that Player 1 now represents 2 and that 3 remains solo. We call a12, short for a12f3, the probability that 1 now chooses to vote for 3 as the final agent. (This is observable behavior on the part of Player 1 in the repeated game.) And we call af12, short for a3f12, the complementary probability that Player 3 will vote to accept Player 1 (who already represents Player 2) as the final agent. These classifications lead to 12 numbers, six of each type.
And the 12 numbers ARE NOT "strategic" choices by the players involved; rather, we arrange that they are determined by "reactive behavior" regulated by "demands", which are the actual strategic choices for the players.
For example, a12 (or a12f3) is specified to be A12/(1+A12), where A12 will be a positive number. This makes a12 a positive number less than +1. And the quantity A12 controlling a12 is specified to be

    A12 = Exp[ (u1b3r12 - d12)/e3 ] .
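This control of behavior by a demand is just a logistic ("smoothed step") response. A minimal sketch, with illustrative numbers standing in for the prospective payoff u1b3r12 and the demand d12:

```python
import math

def acceptance_prob(u, d, e3):
    """a = A/(1+A) with A = Exp[(u - d)/e3]: the acceptance probability
    rises toward 1 when the prospective payoff u exceeds the demand d,
    and falls toward 0 when it does not; e3 sets the sharpness."""
    A = math.exp((u - d) / e3)
    return A / (1.0 + A)

# As e3 is made small the reaction sharpens toward a step at u = d:
for e3 in (0.5, 0.1, 0.01):
    print(e3, acceptance_prob(0.40, 0.35, e3))
```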
Here e3, or "epsilon sub 3", is intended to be made ultimately very small, as we study the equilibria of the model. That smallness will make A12 react sharply as d12 and u1b3r12 vary in relation to each other. The number "d12" is the "demand" chosen (strategically) by Player 1 in relation to this situation, where he can vote to accept Player 3 as general (final) agent or alternatively wait and hope that Player 3 will accept him instead (!). And what the formula takes into consideration is simply the prospective gain or payoff to Player 1 in the case where Player 3 becomes general agent and previously Player 1 had been elected to represent Player 2, and this is specifically u1b3r12.
There are 6 demand strategy numbers like d12 (which controls a12). And there are also six quite analogous strategy choices like, for example, df23 controlling af23 (or a1f23). So "df23" is a choice by Player 1, since it controls a1f23, or the probability of the acceptance by Player 1, as a solo player in stage 2 of a game, of the agency of Player 2 when Player 2 is already representing Player 3.
So we have af23 = AF23/(1+AF23), or a1f23 = A1F23/(1+A1F23), with the relation

    AF23 = Exp[ (u1b23r1 - df23)/e3 ]

being specified for the control of the acceptance behavior by the (strategic) demand choice. Or this could be called, in longer notation, A1F23 = Exp[ (u1b23r1 - d1f23)/e3 ].
Demands and Acceptance Behavior at Stage One
At the first stage of elections, when all three players are "solo", we have made a choice, conventionally, of how to relate election behavior to "demands". The choice made is not absolutely free from any possible arbitrariness, and something more complex might also be considered appropriate.
Each player, like e.g. Player 2, has the option of voting either for Player 1 (behavior of probability a2f1) or for Player 3 (a2f3) or for not voting for either of them (behavior described by n2 = 1 - a2f1 - a2f3). The model, in the same manner as the previously studied model, relates these behavior-describing numbers (or probabilities) to a single demand parameter called d2, which is all of the strategic choice by Player 2 that relates to stage 1 of the game. We specify in the model that positive numbers A2f1 and A2f3 are determined (controlled by d2) and that a2fj = A2fj/(1 + A2f1 + A2f3), where j is either 1 or 3.
A2fj is specified to be Exp[ (q2j - d2)/e3 ], where q2j is specified to be the calculated expectation of payoff, to Player 2, under the hypothesis that the game has passed to stage 2 with Player 2 having become represented by Player j as his agent. So Player 2 chooses, strategically, the demand d2, which is interpretable as describing what Player 2 "demands" he/she should expect in gains if either, (q21), Player 1 becomes his representing agent or, (q23), Player 3 becomes the agent representing Player 2 in stage 2 of the game.
Then the three strategy variables, d1, d2, and d3, control the six behavior probabilities a1f2, a1f3, a2f1, a2f3, a3f1, and a3f2, which completely describe the actual (observable) behavior of the players in stage 1.
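So the stage-one behavior is a logit-style rule: each voting option carries weight Exp[ (q2j - d2)/e3 ] while "neither" carries implicit weight 1. A minimal sketch, with illustrative numbers standing in for the expectations q21, q23 and the demand d2:

```python
import math

def stage_one_behavior(q, d, e3):
    """Return ({j: a_ifj}, n_i) for one player at stage one, where
    A_ifj = Exp[(q_ij - d)/e3] and a_ifj = A_ifj/(1 + sum of A's);
    the implied n_i = 1 - sum of the a_ifj equals 1/(1 + sum of A's)."""
    A = {j: math.exp((qj - d) / e3) for j, qj in q.items()}
    denom = 1.0 + sum(A.values())
    return {j: Aj / denom for j, Aj in A.items()}, 1.0 / denom

# Player 2's behavior, given expectations q21, q23 and his demand d2:
a2, n2 = stage_one_behavior({1: 0.30, 3: 0.36}, d=0.33, e3=0.05)
```

With the higher expectation q23, Player 2's acceptance rate for Player 3 exceeds that for Player 1, and raising the demand d2 shifts weight toward "neither".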
Variables in the Model
In all we have 39 "strategic" variables in the model, 15 "demands" and 24 choices of "utility allocations". But we replace all the demand variables, like d23 or d1, by associated controlled behavior probabilities like a23 or a1f2 and a1f3. And then we arrive at simpler equations, with most of the appearance of exponential functions being eliminated.
It is a matter of practical considerations how to find actual numerical solutions of the resulting equations. There is experience with this problem from work on the previous simpler model. For the current model the work is far from complete, but it has arrived at a reliable derivation of the actual equations to be solved (in good form for study with applicable computer software), with the aid of the work of an NSF project assistant (AK). Furthermore, some numerical calculations, first for a completely symmetric game and then for general non-symmetric games, give indications that the model is working as might be hoped, at least for small values of the 2-player coalition strengths. But it is too early to really venture any comments about the asymptotic patterns of the solutions or to interpret the results of calculations.
When enough calculations have been made for a variety of non-symmetric games, then the implied values inferred, via the calculated payoffs, may give numbers that can be compared with those of various relevant concepts, like the nucleolus, etc.
So the challenge remains to be dealt with of actually finding enough of the instructively revealing results from numerical solutions for equilibria of the model.
Comparisons from Results
The model is designed so that the game can be of a type where the Shapley value and the nucleolus give quite different "evaluations" of the game. Which of these, for example as a guide usable in an "arbitration scheme", is better or worse (if such comparisons could be valid in any sense)? Naturally, any other route which leads, however it does so, to an "evaluation" can provide a basis for comparison of these, and of other approaches that give evaluations.
Our modeling, as also with the previous model, has three parameters that describe the resources available to each of the two-player coalitions. If these are small positive numbers, say of size less than 1/3, then the standard nucleolus evaluation effectively disregards them, although the Shapley value gives them a modest weighting in its evaluation of the game.
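The contrast can be checked directly. For the class of games described (each single player worth 0, the grand coalition worth 1, small two-player values b12, b13, b23), the Shapley value is straightforward to compute from its standard definition as an average of marginal contributions over player orderings; the sample numbers below are illustrative:

```python
from itertools import permutations

def shapley_3(b12, b13, b23, grand=1.0):
    """Shapley value of the 3-player game with v(i)=0, v(ij)=bij,
    v(123)=grand: average each player's marginal contribution over
    the 3! = 6 orderings in which the players can arrive."""
    v = {frozenset(): 0.0, frozenset({1}): 0.0, frozenset({2}): 0.0,
         frozenset({3}): 0.0, frozenset({1, 2}): b12,
         frozenset({1, 3}): b13, frozenset({2, 3}): b23,
         frozenset({1, 2, 3}): grand}
    phi = {1: 0.0, 2: 0.0, 3: 0.0}
    for order in permutations((1, 2, 3)):
        seen = frozenset()
        for p in order:
            phi[p] += (v[seen | {p}] - v[seen]) / 6.0
            seen = seen | {p}
    return phi

# With small two-player coalition strengths the nucleolus stays at
# (1/3, 1/3, 1/3), while the Shapley value shifts with the bij:
print(shapley_3(0.10, 0.10, 0.20))
```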
Our previous model turned out to give numerical results, in cases of this sort, where, depending on the ratio of two "epsilons" (comparable to "e3" as described above), the evaluation could be either "hyper-Shapleyan" (compared with the nucleolus at (1/3, 1/3, 1/3)) or "sub-Shapleyan".
Ultimately I came to the conclusion that a MORE UNIFORM concept of "fuzziness" was needed, to be applied in smoothing out the sharpness of the effects of "demand" choices. Otherwise, and this became evident in earlier studies of two-player models, if one player were to have "sharper" demands and the other were to make "duller" demands, then the player with the sharper demands would indeed become like a "sharper" bargainer and would tend to gain an advantage in the calculated game outcomes! The fuzziness is needed for mathematical reasons, so that derivatives of smooth functions can be computed. But under certain circumstances it seems that "unbalanced" fuzziness can somehow "prejudice" the game evaluation objective.
Associated Studies
The work on this project, which seeks to exploit the "agencies" concept to successfully study games with cooperation via a reduction to non-cooperative equilibrium considerations, has led me also to study some other topics.
One of these was the computability of the nucleolus, since if a variety of sample games are studied, and if these are of the sort where the classical nucleolus and the Shapley value are defined, then just for the comparison of numerical results it is nice to have a quick method for finding the numbers that form the nucleolus vector. It occurred to me that this should be possible by a programmable method of Monte Carlo style, in which random perturbations would be used in finding, to a high level of approximation, the numerical values of the nucleolus components.
I tried programming that for use in MATHEMATICA, knowing that often the game would be defined by rational numbers and that a close approximation to the exact nucleolus would lead to finding the exact answer, by finding the simple rational numbers for which approximations had been found.
In the method of successive approximations the actual definition of the nucleolus would be the basis for the criterion for the comparison of the merits of various random perturbations made to an approximate vector.
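A minimal sketch of this Monte-Carlo idea (in Python rather than MATHEMATICA; the perturbation schedule and starting point are my own illustrative choices): random transfers between two players are kept exactly when they lexicographically reduce the sorted vector of coalition excesses, which is the defining criterion of the nucleolus.

```python
import random
from itertools import combinations

def excess_profile(v, x, players):
    """Sorted (largest first) excesses e(S, x) = v(S) - x(S) over all
    proper nonempty coalitions S; the nucleolus lexicographically
    minimizes this vector over the imputations."""
    exc = []
    for r in range(1, len(players)):
        for S in combinations(players, r):
            exc.append(v[frozenset(S)] - sum(x[p] for p in S))
    return sorted(exc, reverse=True)

def nucleolus_mc(v, players, iters=20000, seed=0):
    """Approximate the nucleolus by random perturbations, keeping a
    perturbed imputation whenever its excess profile improves."""
    rng = random.Random(seed)
    total = v[frozenset(players)]
    x = {p: total / len(players) for p in players}   # start at equal split
    best = excess_profile(v, x, players)
    step = 0.25
    for _ in range(iters):
        i, j = rng.sample(players, 2)
        delta = rng.uniform(0.0, step)
        y = dict(x); y[i] += delta; y[j] -= delta    # keeps the sum = v(N)
        if min(y.values()) >= 0:                     # stay an imputation
            prof = excess_profile(v, y, players)
            if prof < best:                          # lexicographic test
                x, best = y, prof
        step = max(step * 0.9995, 1e-4)              # shrink perturbations
    return x

# Illustrative game: v(12) = 0.6, the other pairs and singletons 0;
# the nucleolus here is (0.4, 0.4, 0.2).
v = {frozenset(s): 0.0 for r in range(1, 3) for s in combinations((1, 2, 3), r)}
v[frozenset({1, 2})] = 0.6
v[frozenset({1, 2, 3})] = 1.0
```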
Later I learned of the work of Sven Klauke in Bielefeld, where this sort of method has been the basis of an effective program in C++, and where there has also been developed an effective program that works by the method of reducing the problem to problems of a "linear programming" type.
And after being at the Stony Brook meeting of last year I thought about the characteristic function as calculated by Harsanyi (this was around 1960) in connection with developing a general solution concept for cooperative games. And then I learned that if this approach were used to transform the interpretation of a game of three players, and if after that the nucleolus were calculated for the (constant sum) game thus obtained, this would result in THE SAME vector as that of the Shapley value for the game (which itself would not be changed by the move to the "Harsanyi characteristic function").
On the other hand, if we consider similar types of four-person games, then the nucleolus computed after transforming the coalition values information via the Harsanyi characteristic function will not, in general, match the Shapley value vector.
And now this seems to me curiously parallel to how Von Neumann, in 1928, concluded that 3-person cooperative games could be, somehow, evaluated, but that 4-person games were of a different order of difficulty, in terms of evaluation!