A Project Studying Cooperation
in Games Via Modeling in Terms of
Formally Non-Cooperative Action
in a Repeated Game Context
In 1996, at a summer "science
camp" I gave a talk on
the
topic of the use of the "Prisoner's Dilemma" game, in
a
context of repetition and evolution of strategies, by
theoretical
biologists who were interested in studying
the natural evolution of cooperative adaptations. And
after
giving the talk I thought more about the concept
of
studying a game by studying it as a repeated game and
through
this viewpoint I got an idea of how to eliminate
all
of the "verbal" complications that could become
involved
in the consideration of coalitions and coalition
formation.
In principle, coalitions, and specifically
coalitions
as
considered by Von Neumann and Morgenstern in "Theory
of
Games and Economic Behavior", are things that could
be
implemented by contracts, like contracts in roman
law.
But of course a contract is quite intrinsically
a
"verbal" thing because indeed it could (or should!)
be
written down in words.
My idea was that in a repeated game context
that the
players
could be given the right to vote for "agencies"
or
"agents" among themselves. Thus at a first step a
player,
say player A, would have the option to accept
player
B as his agent. And the effect of this would
be
that the coalition (A,B) would be formed (like
a
committee with B as chairman) without any verbal
processes
occurring between A and B. Furthermore, this
process
adapts to successive steps of coalescence since
if
another step of elections is held then B, as the
"agent"
representing the coalition (A,B), can vote
to
accept the agency of player C and then C will thus
become
the agent representing the coalition (A,B,C).
And in this manner, with generalized
"agencies" being
electable,
the level of a "grand coalition" can always
be
reached (for a game of finitely many players), and
as
a consequence of that the requisites for "Pareto
efficiency"
will be available.
More on Agencies
Instead of there being unlimited means by
which
coalitions
might actually be formed or form, dissolve,
and
reform, we must specify an election procedure that
can
form the basis for a game of non-cooperative form
to
be played by the players of the original game being
translated.
And in the context of studying a repeated
game
we can afford to prescribe that this election
process
is such that the agent is entirely uncommitted
and
his election is irrevocable for each specific playing
of
the game. (Of course election choices are expected
to
vary as the game is repeated.)
A set of rules can be devised so that there
are
election
stages in each of which all of the players
remaining
independent (not represented by another player
as
agent) have, each of them, the option of electing
another
player as an accepted agent. It is natural for
these
rules to require convergence so that no more than
(n-1)
stages of election will ever be needed for a game
of
n players.
Election rules need to do something to resolve the
impasse
of a situation where A votes to accept B as his
agent
but B simultaneously also votes similarly for A.
It
is not exactly clear, a priori, which rule version
handles
these situations in the most preferable fashion,
we
have worked with more than one variant. When we more
recently
found, in the course of the use of specific
model
games for calculations, that it seemed to be
desirable
to allow elections to be repeated when an
election
had failed to result in the election of any
agency
power, this finding had the effect of suggesting
that
election rules which had the effect that at most one
agency
could be elected at any stage of the election
process
would be most convenient. (And furthermore the
actual
results of calculations seem to indicate that
this
convention on elections is "asymptotically non-
prejudicial"
since the probability of simultaneous
votes
seemed to tend towards zero while the probability
of
"successful" elections tended towards +1.)
If one begins with a quite general CF game
and then
if
one introduces a game formally requiring that all
coalition
benefits must be attained through the means of
the
action of agents who have the authority to represent
all
the members of the coalition then the "agencies game"
resulting
from this still has the same validly derivable
characteristic
function as the original game. In essence
the
coalitions simply have the same potentialities as
before,
but in a formal sense, for these potentialities
to
be exploited, the members of a coalition would need
TO
CONSPIRE on a practical procedure for electing agents
successively
and finally taking the effective action of
the
coalition as an action of the agent finally elected
to
represent all of the members of that coalition.
Current Form of Our Work
We have found that it is possible to find
"bargaining
equilibria"
in games of two or three players using models
or
modeling of this sort. Ideally, there could be evolved
and
developed an analogy to the mathematical methods
(studies
of PDE descriptions for the air flow, etc.)
that
are used in weather prediction. On a materialistic-
commercial
level this would naturally apply to the
details
of the deals in big corporation mergers (like
of
Pfizer and Pharmacia recently). Or also we can hope to
get
more insight relating to existing value or evaluation
concepts
such as, notably, the Shapley value or the
nucleolus.
This program of research has led, in
connection with
the
approach to coalitions and cooperation via the
concept
of agencies, to the consideration of varied
models
of how the players or participants can "react"
to
favorable or unfavorable behavior by other players in
the
repeated game and to the study of varied concepts of
how
the players may choose "demands" that regulate their
reactive
behavior. For example, in the analogous area of
the
study by theoretical biologists of the possibility of
the
evolution of cooperation in repeated games of PD
type,
there have been found various types of "reaction
instinct"
that can work to favor cooperation. Besides the
simplest
"tit for tat" instinct there can be more complex
variants
which may require more memory (of recent past
game
experiences in the sequence of repetitions). The
analogous
situation in terms of models with the election
of
agencies is that the "demands" that players could
be
allowed to make might be more or less elaborately
structured.
I have been studying a more complicated
model (for
three
players) than that previously studied (in 2001)
because
an expedient used in that simpler model to reduce
the
number of equations and variables to be solved for
seemed
to lead to problems. The payoff results seemed to
depend
on the comparative smallnesses of two different
parameters,
both of which were supposed to be decreased
asymptotically
to zero.
Common Features of Last Year's
Model and the Current Model
These models are the same in having a
procedure
of
elections by means of which the agency powers are
elected.
In the previous model there was a simpler
procedure
employed after a first stage of agency
elections
had been effective in reducing the remaining
set
of active players to merely two. We used a sort
of
natural bargaining scheme through which competing
choices,
by the remaining players or agents, of utility
allocations
could be rationalized. However this device
of
simplification did not follow straightforwardly the
basic
concept of the agencies as players in the same sort
of
game interaction as that for all the players at the
beginning
of the playing of the game.
Both versions begin with a first stage of
elections
where
each of the three players could vote (or "choose")
any
specific one of the other two players to be his
agent.
In either version these voting or choosing actions
(as
repeatedly taken in the repeated game context) are
described
by numbers that represent, effectively, the
BEHAVIOR
involved, or the probability of the action being
taken
when the opportunity is presented.
Thus we have an array of six numbers (and
three
"implied
numbers") that describe this:
For Player 1: For Player 2: For
Player 3:
a1f2 & a1f3 a2f1 & a2f3
a3f1 & a3f2
And these represent the probabilities of
specific
voting
choices. Thus, for example, a2f1 is the
probability
that Player 2 will choose (at the first stage
of
the game) to elect that Player 1 should be authorized
to
act as his agent (as if with "power of attorney"). So
it
is the "acceptance rate of P2 for P1" in the concept
underlying
the notation.
The other aifj variables have parallel
meanings. And
it
is sometimes convenient to use another type of symbol
like
n3 = 1 - a3f1 - a3f2 which is the probability that
Player
3 chooses NOT to accept either of Players 1 or 2
as
his agent; or thus this is the probability that Player
3
votes "neither" or for himself.
And since the three players make their
first votes
simultaneously
there can be various outcomes. We have
chosen
a rule to simplify the process by which there is
the
proper response to the votes. If the number of
acceptance
votes cast by the players at the first voting
opportunity
is more than one then we apply a rule that
only
one of these votes, chosen at random, is to be
accepted,
so that the result of the election is either
(1):
that one of the players has elected some other
player
to be his agent or (2): that none of the players
has
voted to accept any other player to be an agent
representing
his interest.
We have also introduced a convention that
in case of a
failure
of the players to come together at all so that no
agency
has been elected that, with a certain probability,
the
first stage of elections can be repeated. This idea
was
also used in our prior model. The probability of the
players
being given another chance to elect an agent is
called
(1-e4) or "one minus epsilon sub 4" and the idea
is
that we want to study results of calculations as e4
tends
asymptotically to zero. (It was found, in the case
of
the previous model, that as e4 would tend towards zero
the
probabilities like a1f2 would also tend towards zero
but
that agencies would be elected with high probability
because
of the continued offering of "second chances"
for
the basic action of coalescence (the election of an
agency).)
The Second Stage of the
Election of Agencies
In our prior model, after one agency had
been elected
and
only two active players remained, we based the payoff
on
the utility assignments which had been chosen by the
two
remaining players, taking into consideration two
numbers
chosen by the player that had been elected as
an
agent and one number chosen by the remaining solo
player.
In the model currently being studied the approach
is,
in a sense, more "orthodox" with respect to the idea
of
agencies and all possibility of general cooperation is
reduced
to the idea of the final election of a "general
agent".
So when one player has accepted another as his
agent
there then remain two freely acting players and
the
level of cooperation corresponding to the "grand
coalition"
is not realized until one of them has elected
the
other as his agent.
But if this final agency election fails to
be realized
then
we can allow the existing agency to exploit the
resources
of the two player coalition formed by the
two
players involved. (And in a simple case, like we
consider,
this can lead simply to the use of the
resources
specified by the characteristic function
for
the coalition of that pair of players.
Similarly with the idea for the first stage
of
elections
we allow the second stage to be repeated, with
a
probability of (1-e5), if neither side has elected
the
other side to agency power, and the idea is that we
want
to study the limiting form of the results as e5
asymptotically
goes to zero.
Once a "general agent" has been
elected then he/she
has
the privilege of being able to allocate payoffs,
from
the total of the payoff utility resources available,
to
all of the players, including himself. Our model has
simply
a resource total available of +1 which corresponds
also
to the Pareto boundary of the game.
For each player there are four possible
ways by which
he
could have become chosen as final agent. Either of two
players
may have initially elected him, and this leads to
two
cases, or the other two players may have initially
combined
in either of two ways followed by his being
elected
by the agent of that two player combination. And
then
as final agent he has essentially to choose a point
in
2 dimensions to determine his choice of Pareto-
accessible
utility allocations.
This leads to 8 dimensions for each player
and there
are
three players so that this leads to 24 dimensions in
all
for the choices made by the players when they specify
allocations
of utility after being elected as "final
agent".
These are 24 out of a total of 39 "strategy
variables"
that are regarded as subject to the
individual
and individually optimizing choices by
the
players.
And the other 15 dimensions of strategic
choice
by
the players correspond to their choice options in
relation
to their reactive behavior (in the repeated
game).
The behavior of the players that is affected by
or
controlled by their reactive strategy choices is,
in
general, their "acceptance behavior".
The Third Stage of the Game;
the Allocation of Utility
When two stages of agency election are
complete then
one
of the original players has become the agent for all
and
he "allocates" the payoffs. These are presumed to be
Pareto-efficient
and so we suppose that he/she specifies
three
non-negative numbers with sum = +1. The information
for
this is specified by the amounts allocated to the
other
players and that is two numbers. This leads to 24
strategic
choices in all, of this type, for all the
players.
For example, in cases of type UjBijRk
player number i
is
in control and was first elected by Player j and then
by
player k and player i chooses to allocate the payoff
of
ujbijrk to player j (and ukbijrk to Player k, but that
is
another case of allocation strategy variables).
Or for example, u1b3r21 is decided upon by
Player 3,
who
was elected by Player 2 after Player 2 had been
chosen
by Player 1 in the first stage. This is the amount
allocated
to Player 1 and u2b3r21 would be the amount
allocated
to Player 2 (who has a different position in
the
history of the elections). Player 3 of course also
allocates
u3b3r21 to himself, but this is eliminated from
our
system of 42 equations in 42 variables because of its
simple
relation to the other two allocations by Player 3
here.
So there are 24 "utility
allocation" variables (which
correspond
to strategic choices by the players) and they
group
into the 4 categories of UjBijRk, UkBijRk, UjBiRjk,
and
UkBiRjk.
"Demands" and Behavior in
the
Second Stage of the Game
When the "second stage" is
reached one player has
become
an agent, another player is represented through
this
agency, and a third player remains solo.
Suppose that Player 1 now represents 2 and
that
3
remains solo. We call a12, short for a12f3, the
probability
that 1 now chooses to vote for 3 as the
final
agent. (This is observable behavior on the part
of
Player 1 in the repeated game.) And we call af12,
short
for a3f12, the complimentary probability that
Player
3 will vote to accept Player 1 (who already
represents
Player 2) as the final agent. These
classifications
lead to 12 numbers, six of each type.
And the 12 numbers ARE NOT
"strategic" choices by
the
players involved, rather we arrange that they are
determined
by "reactive behavior" regulated by "demands"
which
are the actual strategic choices for the players.
For example a12 (or a12f3) is specified to
be
A12/(1+A12)
where A12 will be a positive number.
This
makes a12 a positive number less than +1. And
the
quantity A12 controlling a12 is specified to be
A12
= Exp[ (u1b3r12 - d12)/e3 ] .
Here e3, or "epsilon sub 3", is
intended to be made
ultimately
very small, as we study the equilibria of the
model.
That smallness will make A12 react sharply as d12
and
u1b3r12 vary in relation to each other.
The number
"d12"
is the "demand" chosen (strategically) by Player 1
in
relation to this situation where he can vote to accept
Player
3 as general (final) agent or alternatively wait
and
hope that Player 3 will accept him instead (!). And
what
the formula takes into consideration is simply the
prospective
gain or payoff to Player 1 in the case where
Player
3 becomes general agent and previously Player 1
had
been elected to represent Player 2, and this is
specifically
u1b3r12.
There are 6 demand strategy numbers of the
same type
as
d12 (which controls a12). And there are also six quite
analogous
strategy choices like, for example, df23
controlling
af23 (or a1f23).
In the second stage of agency elections an
acceptance
can
be made either by a single player accepting the
coalition
led by another player (and thus that other
player's
leadership as final general agent). Or on the
other
hand the player leading a coalition of two may
accept
the solo player as the final agent for all.
"df23" is a choice by player 1,
since it is structured
to
control a1f23 or the probability of the acceptance by
Player
1, as a solo player in stage 2 of a game, of the
agency
of Player 2 when Player 2 is already representing
Player
3.
So we have
af23 = AF23/(1+AF23) or
a1f23
= A1F23/(1+A1F23) with the relation of
AF23
= Exp[ (u1b23r1 - df23)/e3 ] being
specified
for
the control of the acceptance behavior af23 by
the
(strategic) demand choice df23. Or this could be
called,
in longer notation,
A1F23
= Exp[ (u1b23r1-d1f23)/e3 ] .
Demands and Acceptance
Behavior at Stage One
At the first stage of elections, when all
three
players
are "solo" we have made a choice, conventionally,
of
how to relate election behavior to "demands". The
choice
made is not absolutely free from any possible
arbitrariness
and something more complex might also be
considered
appropriate.
Each player, like e.g. player 2, has the
option of
voting
either for Player 1 (behavior of probability a2f1)
or
for Player 3 (a2f3) or for not voting for either of
them
(behavior described by n2 = 1 - a2f1 - a2f3 ).
The
model, in the same manner as our previous model
(of
2001), relates these behavior-describing numbers (or
probabilities)
to a single demand parameter called d2
which
is all of the strategic choice by Player 2 that
relates
to stage 1 of the game. We specify in the model
that
positive numbers A2f1 and A2f3 are determined
(controlled
by d2) and that a2fj = A2fj/(1 + A2f1 +
A2f3) where j is either 1 or 3.
A2fj is specified to be Exp[ ( q2j - d2 )/e3 ] where
q2j
is specified to be the calculated expectation of
payoff,
to Player 2, under the hypothesis that the game
has
passed to stage 2 with Player 2 having become
represented
by Player j as his agent. So Player 2
chooses,
strategically, the demand d2 which is
interpretable
as describing what Player 2 "demands"
he/she
should expect in gains if either, (q21), Player 1
becomes
his representing agent or, (q23), Player 3
becomes
the agent representing Player 2 in stage 2 of the
game.
Then the three strategy variables d1, d2,
and d3
control
the six behavior probabilities a1f2, a1f3, a2f1,
a2f3,
a3f1, and a3f2 which completely describe the actual
(observable)
behavior of the players in stage 1.
Variables in the Model
In all we have 39 "strategic"
variables in the model,
15
"demands" and 24 choices of "utility allocations". But
we
replace all the demand variables, like d23 or d1, by
associated
controlled behavior probabilities like a23 or
a1f2
and a1f3. And then we arrive at simpler equations
with
most of the appearance of exponential functions
being
eliminated.
It is a matter of practical considerations,
how
to
find actual numerical solutions of the resulting
equations.
There was experience with this problem from
work
on the previous simpler model and much of the
programming
earlier developed for use with MATHEMATICA
for
the finding of numerical solutions could be adapted
to
the new model with its larger number of variables.
With
the help of an NSF project assistant (AK) the 42
equations
to be associated with the 42 unknown variables
deriving
from the model were derived and cross-checked.
At the present time there are some areas of
example
games
for which solutions are known and some where no
explicit
solution has yet been found. For example, for
a
game with two symmetrically favored players who can
already
gain a shared payoff of 3/4 just by themselves,
no
proper solution has yet been found although
illegitimate
solutions with a variable assigned a game-
theoretically
impermissible value have been found.
So it is possible that for this example
that some of
the
behavior-describing variables should assume extreme
values
or become irrelevant. And then a reformulation of
the
set of equations to be solved may properly handle
this
example.
We can illustrate the form of numerical
results found
for
the case on the borderline of where the nucleolus
seems
to treat the values of the two-person coalition as
"irrelevant
alternatives" and where it depends strongly
on
their values. This is the case where Players P1 and P2
are
symmetrically situated and have a coalition value of
1/3
by themselves (while the coalition of all three
players
is worth 1 and no other two-person coalition has
any
value).
We can also illustrate, with a graphics of
a sort
first
prepared last summer, how the Pareto efficiency
of
the game outcome (derived from the payoffs to the
players)
tends to improve as e3 tends towards zero. The
effect
of decreasing e3 is that the "demands" made by the
players
become effectively "more precise" and that leads
to
the improvement of the bargaining efficiency, as it
were.
These illustrative graphics were prepared (by AK
and
SL) on the basis of calculated solutions for a fully
symmetric
example game. We were able to find the first
solutions
for fully symmetric games more easily since
the
effect of that symmetry was to reduce the number
of
equations and variables from 42 to merely 7.
Comparing Results and Various Theories
The motivation for studying cooperative
games via
means
for modeling the actions of the players as actions
possible
in a non-cooperative game is, ultimately, to
shed
some light on simple issues like that of assigning
values
for the players of the game. This is a simple
concept
but it seems to be a difficult and challenging
problem.
Other researchers have also recently gotten
into
studies
of the process of negotiating a mode of
cooperation.
The work, for example, of Armando Gomes in
Philadelphia
was interesting and for three person games
his
model was giving either the nucleolus or the Shapley
value
as the evaluation result. Of course any method
which
leads similarly to an "evaluation" of games becomes
a
natural basis for an "arbitration scheme", using the
words
of Luce and Raiffa.
Our modeling, as it was also with the
previous model,
has
three parameters that describe the values of each of
the
two-player coalitions. When these are small positive
numbers,
say of size less than 1/3, then the nucleolus
evaluation
effectively disregards them although the
Shapley
value gives them a moderate weighting in its
evaluation
of the game.
The results of the most recent
computational studies
using
the model (in which studies I am being assisted by
the
student SL working on the NSF grant for the research
project)
are giving evaluation indications that are more
equitably
distributed among the three players than those
of
the Shapley value. But when the two person coalitions
are
quite weak and the nucleolus ignores them our model
does
indicate some inequality in the payoffs to the
players
but less what the Shapley value would indicate.
If two favored players can gain in their
two-person
coalition
2/3 of the total available to the grand
coalition
(of all three players) then the computational
results
from our model indicate significantly less payoff
to
the two favored players than would be indicated either
by
the nucleolus or by the Shapley value.
It seems plausible that some refinements in
modeling
may
allow the players to more effectively cooperate in
terms
of the virtues of the "alliance possibilities"
involving
subsets of the total set of players. I have
thought
of some ideas here, of varying degrees of
complexity.
In principle it is quite natural, returning
to
the analogy with biology and evolution, that there
should
be some desirable behavior patterns that encourage
cooperative
behavior on the part of other interacting
biological
"players". (We already have "demands" in
the
modeling and it is possible to design things,
effectively,
so that the players can "demand" certain
specific
forms of cooperative behavior from other
players.
For example, they might be enabled to "demand"
that
acceptance probabilities should be more in
reciprocal
relationship. Since any player is competent to
be
an agent and thus a leader it might be natural for a
player
to "demand" that he/she should be elected to lead
with
frequency like that of another player.)
It is notable in the specific model
currently studied
that
when Player 3 seems to gain more than he/she might
be
expected to gain, when in a disfavored position in
relation
to the two-player coalition values, that part
of
HOW Player 3 seems to manage this is by having a
rejecting
rather than accepting behavior in relation to
any
elected agency representing a coalition of Players 1
and
2.
A Technical Issue
Our previous model turned out to give
numerical
results
for cases of weak two-player coalitions where,
depending
on the ratio of two "epsilons" (comparable to
"e3"
as described above), the evaluation result could
be
either "hyper-Shapleyan" (compared with the nucleolus
at
(1/3,1/3,1/3)) or "sub-Shapleyan".
Ultimately I came to the conclusion that it
was needed
to
have a MORE UNIFORM concept of "fuzziness" to be
applied
in smoothing out the sharpness of the effects
of
"demand" choices. Otherwise, and this became evident
in
earlier studies of two-player models, if one player
would
have "sharper" demands and the other would make
"duller"
demands then the player with the sharper
demands
would indeed become like a "sharper" bargainer
and
would tend to gain an advantage in the calculated
game
outcomes! The fuzziness is needed for mathematical
reasons,
so that derivatives of smooth functions can be
computed.
But under certain circumstances it seems that
"unbalanced"
fuzziness can somehow "prejudice" the game
evaluation
objective.
Ultimately I thought of a
"neutral" source of
fuzziness
by means of supposing a general uncertainty
about
the actual boundary of the Pareto-accessible
utility
outcomes.
NSF Project Assistants
Earlier, through the Summer of 2002, AK
assisted in
the
work of deriving and verifying the payoff functions
and
the equilibrium equations (for equilibria in pure
strategies
with parameters describing probabilities of
actions
by the players). Also AK managed to effectively
set
up usable programs for calculating the nucleolus.
These
programs had been developed originally by S. Klauke
in
Bielefeld.
More recently, starting in the Fall, SL has
been the
student
assistant and has found sets of solutions for
varying
values of the adjustable parameters, for example
for
when b3 = 2/3.