A Project Studying Cooperation in Games    

                Through Modeling in Terms of Formally 

                 Non-Cooperative Action in a Repeated              

                  Game Context

 

   In 1996, at a summer "science camp" I gave a talk on the topic of

the use of the "Prisoner's Dilemma" game, in a context of repetition and

evolution of strategies, by theoretical biologists who were interested

in studying the natural evolution of cooperative adaptations. And after

giving the talk I thought more about the concept of studying a game by

studying it as a repeated game and through this viewpoint I got an idea

of how to eliminate all of the "verbal" complications that could become

involved in the consideration of coalitions and coalition formation.

   In principle, coalitions, and specifically coalitions as considered by

Von Neumann and Morgenstern in "Theory of Games and Economic Behavior",

are things that could be implemented by contracts, like contracts in

roman law. But of course a contract is quite intrinsically a "verbal"

thing because indeed it could (or should!) be written down in words.

   My idea was that in a repeated game context that the players could be

given the right to vote for "agencies" or "agents" among themselves. Thus

at a first step a player, say player A, would have the option to accept

player B as his agent. And the effect of this would be that the coalition

(A,B) would be formed (like a committee with B as chairman) without any

verbal processes occurring between A and B. Furthermore, this process

adapts to successive steps of coalescence since if another step of

elections is held then B, as the "agent" representing the coalition (A,B),

can vote to accept the agency of player C and then C will thus become the

agent representing the coalition (A,B,C).

   And in this manner, with generalized "agencies" being electable,

the level of a "grand coalition" can always be reached (for a game of

finitely many players), and as a consequence of that the requisites for

"Pareto efficiency" will be available.

 

 

                         More on Agencies

 

   Instead of there being unlimited means by which coalitions might

actually be formed or form, dissolve, and reform, we must specify an

election procedure that can form the basis for a game of non-cooperative

form to be played by the players of the original game being translated.

And in the context of studying a repeated game we can afford to prescribe

that this election process is such that the agent is entirely uncommitted

and his election is irrevocable for each specific playing of the game.

(Of course election choices are expected to vary as the game is repeated.)

   A set of rules can be devised so that there are election stages in

each of which all of the players remaining independent (not represented

by another player as agent) have, each of them, the option of electing

another player as an accepted agent. It is natural for these rules to

require convergence so that no more than (n-1) stages of election will

ever be needed for a game of n players.

   Election rules need to do something to resolve the impasse of a

situation where A votes to accept B as his agent but B simultaneously

also votes similarly for A. It is not exactly clear, a priori, which rule

version handles these situations in the most preferable fashion, we have

worked with more than one variant. When we more recently found, in the

course of the use of specific model games for calculations, that it seemed

to be desirable to allow elections to be repeated when an election had

failed to result in the election of any agency power, this finding had the

effect of suggesting that election rules which had the effect that at most

one agency could be elected at any stage of the election process would be

most convenient. (And furthermore the actual results of calculations seem

to indicate that this convention on elections is "asymptotically non-

prejudicial" since the probability of simultaneous votes seemed to tend

towards zero while the probability of "successful" elections tended

towards +1.)

   If one begins with a quite general CF game and then if one introduces

a game formally requiring that all coalition benefits must be attained

through the means of the action of agents who have the authority to

represent all the members of the coalition then the "agencies game"

resulting from this still has the same validly derivable characteristic

function as the original game. In essence the coalitions simply have

the same potentialities as before, but in a formal sense, for these

potentialities to be exploited, the members of a coalition would need

TO CONSPIRE on a practical procedure for electing agents successively and

finally taking the effective action of the coalition as an action of the

agent finally elected to represent all of the members of that coalition.

 

 

 

                     Current Form of Our Work

 

   We have found that it is possible to find "bargaining equilibria" in

games of two or three players using models or modeling of this sort. And

some problems or details of complication have appeared which have led to

consideration of refinements in the models.

   Ideally, there could be evolved and developed an analogy to the

mathematical methods (studies of PDE descriptions for the air flow, etc.)

that are used in weather prediction. On a materialistic-commercial level

this would naturally apply to the details of the deals in big corporation

mergers (like of Pfizer and Pharmacia recently). Or also we can hope to

get more insight relating to existing value or evaluation concepts such

as, notably, the Shapley value or the nucleolus.

   This program of research has led, in connection with the approach

to coalitions and cooperation via the model of agencies, to the

consideration of varied models of how the players or participants can

"react" to favorable or unfavorable behavior by other players in the

repeated game and to the study of varied concepts of how the players

may choose "demands" that regulate their reactive behavior. For example,

in the analogous area of the study by theoretical biologists of the

possibility of the evolution of cooperation in repeated games of PD type,

there have been found various types of "reaction instinct" that can work

to favor cooperation. Besides the simplest "tit for tat" instinct there

can be more complex variants which may require more memory (of recent past

game experiences in the sequence of repetitions). The analogous situation

in terms of models with the election of agencies is that the "demands"

that players could be allowed to make might be more or less elaborately

structured.

   I am now working on a more complicated model (for three players)

than that previously studied last year because an expedient used in that

simpler model to reduce the number of equations and variables to be solved

for seemed to lead to problems (on which more comments later).

 

 

 

                   Common Features of Last Year's

                    Model and the Current Model

 

   These models are the same in having a procedure of elections by means

of which the agency powers are elected. In the previous model there was

a simpler procedure employed after a first stage of agency elections had

been effective in reducing the remaining set of active players to merely

two. We used a sort of natural bargaining scheme through which competing

choices, by the remaining players, of utility allocations could be

rationalized. However this device of simplification did not follow

straightforwardly the basic concept of the agencies as players in the same

sort of game interaction as for all the players at the beginning of the

playing of the game. 

   Both versions began with a first stage of elections where each of

the three players could vote (or "choose") any specific one of the other

two players to be his agent. In either version these voting or choosing

actions (as repeatedly taken in the repeated game context) are described

by numbers that represent, effectively, the BEHAVIOR involved, or the

probability of the action being taken when the opportunity is presented.

   Thus we have an array of six numbers (and three "implied numbers") that

describe this:

 

      For Player 1:         For Player 2:          For Player 3:        

 

       a1f2 & a1f3           a2f1 & a2f3            a3f1 & a3f2

 

   And these represent the probabilities of specific voting choices. Thus,

for example, a2f1 is the probability that Player 2 will choose (at the

first stage of the game) to elect that Player 1 should be authorized to

act as his agent (as if with "power of attorney"). So it is "acceptance

rate of P2 for P1" in the concept underlying the notation.

   The other aifj variables have parallel meanings. And it is sometimes

convenient to use another type of symbol like  n3 = 1 - a3f1 - a3f2  which

is the probability that Player 3 chooses NOT to accept either of Players 1

or 2 as his agent; or thus this is the probability that Player 3 votes

"neither" or for himself.

   And since the three players make their first votes simultaneously there

can be various outcomes. We have chosen a rule to simplify the process

by which there is the proper response to the votes. If the number of

acceptance votes cast by the players at the first voting opportunity is

more than one then we apply a rule that only one of these votes, chosen

at random, is to be accepted, so that the result of the election is either

(1): that one of the players has elected some other player to be his agent

or (2): that none of the players has voted to accept any other player to

be an agent representing his interest. 

   We have also introduced a convention that in case of a failure of the

players to come together at all so that no agency has been elected that,

with a certain probability, the first stage of elections can be repeated.

This idea was also used in our prior model. The probability of the players

being given another chance to elect an agent is called (1-e4) or "one

minus epsilon sub 4" and the idea is that we want to study results of

calculations as e4 tends asymptotically to zero. (It was found, in the

case of the previous model, that as e4 would tend towards zero the

probabilities like a1f2 would also tend towards zero but that agencies

would be elected with high probability because of the continued offering

of "second chances" for the basic action of coalescence (the election of

an agency).) 

 

 

                      The Second Stage of the

                        Election of Agencies

 

   In our prior model, after one agency had been elected and only two

active players remained, we based the payoff on the utility assignments

which had been chosen by the two remaining players, taking into

consideration two numbers chosen by the player that had been elected

as an agent and one number chosen by the remaining solo player. In the

model currently being studied the approach is, in a sense, more "orthodox"

with respect to the idea of agencies and all possibility of general

cooperation is reduced to the idea of the final election of a "general

agent". So when one player has accepted another as his agent there

then remain two freely acting players and the level of cooperation

corresponding to the "grand coalition" is not realized until one of

them has elected the other as his agent.

   But if this final agency election fails to be realized then we can

allow the existing agency to exploit the resources of the two player

coalition formed by the two players involved. (And in a simple case,

like we consider, this can lead simply to the use of the resources

specified by the characteristic function for the coalition of that pair

of players.)

   Similarly with the idea for the first stage of elections we allow the

second stage to be repeated, with a probability of (1-e5), if neither side

has elected the other side to agency power, and the idea is that we want

to study the limiting form of the results as e5 asymptotically goes to

zero.

   Once a "general agent" has been elected then he/she has the privilege

of being able to allocate payoffs, from the total of the payoff utility

resources available, to all of the players, including himself. Our model

has simply a resource total available of +1 which corresponds also to the

Pareto boundary of the game.

   For each player there are four possible ways by which he could have

become chosen as final agent. Either of two players may have initially

elected him, and this leads to two cases, or the other two players may

have initially combined in either of two ways followed by his being

elected by the agent of that two player combination. And then as final

agent he has essentially to choose a point in 2 dimensions to determine

his choice of Pareto-accessible utility allocations.

   This leads to 8 dimensions for each player and there are three players

so that this leads to 24 dimensions in all for the choices made by the

players when they specify allocations of utility after being elected as

"final agent". These are 24 out of a total of 39 "strategy variables" that

are regarded as subject to the individual and individually optimizing

choices by the players.

    And the other 15 dimensions of strategic choice by the players

correspond to their choice options in relation to their reactive behavior

(in the repeated game). The behavior of the players that is affected by

or controlled by their reactive strategy choices is, in general, their

"acceptance behavior".

 

 

                      The Third Stage of the Game;

                       the Allocation of Utility

 

   When two stages of agency election are complete then one of the

original players has become the agent for all and he "allocates" the

payoffs. These are presumed to be Pareto-efficient and so we suppose

that he/she specifies three non-negative numbers with sum = +1. The

information for this is specified by the amounts allocated to the other

players and that is two numbers. This leads to 24 strategic choices in

all, of this type, for all the players.

   For example, in cases of type UjBijRk player number i is in control and

was first elected by Player j and then by player k and player i chooses to

allocate the payoff of ujbijrk to player j (and ukbijrk to Player k, but

that is another case of allocation strategy variables).

   Or for example, u1b3r21 is decided upon by Player 3, who was elected by

Player 2 after Player 2 had been chosen by Player 1 in the first stage.

This is the amount allocated to Player 1 and u2b3r21 would be the amount

allocated to Player 2 (who has a different position in the history of the

elections). Player 3 of course also allocates u3b3r21 to himself, but this

is eliminated from our system of 42 equations in 42 variables because of

its simple relation to the other two allocations by Player 3 here.

   So there are 24 "utility allocation" variables (which correspond to

strategic choices by the players) and they group into the 4 categories

of UjBijRk, UkBijRk, UjBiRjk, and UkBiRjk.  

 

 

 

                    "Demands" and Behavior in the

                      Second Stage of the Game

 

   When the "second stage" is reached one player has become an agent,

another player is represented through this agency, and a third player

remains solo.

   Suppose that Player 1 now represents 2 and that 3 remains solo. We call

a12, short for a12f3, the probability that 1 now chooses to vote for 3 as

the final agent. (This is observable behavior on the part of Player 1 in

the repeated game.) And we call af12, short for a3f12, the complimentary

probability that Player 3 will vote to accept Player 1 (who already

represents Player 2) as the final agent. These classifications lead to 12

numbers, six of each type.

   And the 12 numbers ARE NOT "strategic" choices by the players involved,

rather we arrange that they are determined by "reactive behavior"

regulated by "demands" which are the actual strategic choices for the

players.

   For example a12 (or a12f3) is specified to be A12/(1+A12) where

A12 will be a positive number. This makes a12 a positive number less

than +1. And the quantity A12 controlling a12 is specified to be 

A12 = Exp[ (u1b3r12 - d12)/e3 ] .

   Here e3, or "epsilon sub 3", is intended to be made ultimately very

small, as we study the equilibria of the model. That smallness will make

A12 react sharply as d12 and u1b3r12 vary in relation to each other.

The number "d12" is the "demand" chosen (strategically) by Player 1

in relation to this situation where he can vote to accept Player 3 as

general (final) agent or alternatively wait and hope that Player 3 will

accept him instead (!). And what the formula takes into consideration

is simply the prospective gain or payoff to Player 1 in the case where

Player 3 becomes general agent and previously Player 1 had been elected

to represent Player 2, and this is specifically u1b3r12.

   There are 6 demand strategy numbers like d12 (which controls a12). And

there are also six quite analogous strategy choices like, for example,

df23 controlling af23 (or a1f23). So "df23" is a choice by player 1, since

it controls a1f23 or the probability of the acceptance by Player 1, as a

solo player in stage 2 of a game, of the agency of Player 2 when Player 2

is already representing Player 3.

   So we have  af23 = AF23/(1+AF23) or  a1f23 = A1F23/(1+A1F23)  with the

relation of AF23 = Exp[ (u1b23r1 - df23)/e3 ] being specified for the

control of the acceptance behavior by the (strategic) demand choice. Or

this could be called, in longer notation, A1F23=Exp[(u1b23r1-d1f23)/e3] .

 

 

 

                       Demands and Acceptance

                        Behavior at Stage One

 

   At the first stage of elections, when all three players are "solo" we

have made a choice, conventionally, of how to relate election behavior

to "demands". The choice made is not absolutely free from any possible

arbitrariness and something more complex might also be considered

appropriate.

   Each player, like e.g. player 2, has the option of voting either for

Player 1 (behavior of probability a2f1) or for Player 3 (a2f3) or for not

voting for either of them (behavior described by n2 = 1 - a2f1 - a2f3  ).

The model, in the same manner as the previously studied model, relates

these behavior-describing numbers (or probabilities) to a single demand

parameter called d2 which is all of the strategic choice by Player 2

that relates to stage 1 of the game. We specify in the model that

positive numbers A2f1 and A2f3 are determined (controlled by d2) and

that  a2fj = A2fj/(1 + A2f1 + A2f3)  where j is either 1 or 3.

   A2fj is specified to be Exp[ ( q2j - d2 )/e3 ] where q2j is specified

to be the calculated expectation of payoff, to Player 2, under the

hypothesis that the game has passed to stage 2 with Player 2 having become

represented by Player j as his agent. So Player 2 chooses, strategically,

the demand d2 which is interpretable as describing what Player 2 "demands"

he/she should expect in gains if either, (q21), Player 1 becomes his

representing agent or, (q23), Player 3 becomes the agent representing

Player 2 in stage 2 of the game.

   Then the three strategy variables, d1, d2, and d3, control the six

behavior probabilities a1f2, a1f3, a2f1, a2f3, a3f1, and a3f2 which

completely describe the actual (observable) behavior of the players in

stage 1.

 

 

                        Variables in the Model

 

   In all we have 39 "strategic" variables in the model, 15 "demands"

and 24 choices of "utility allocations". But we replace all the

demand variables, like d23 or d1, by associated controlled behavior

probabilities like a23 or a1f2 and a1f3. And then we arrive at simpler

equations with most of the appearance of exponential functions being

eliminated.

   It is a matter of practical considerations, how to find actual

numerical solutions of the resulting equations. There is experience

with this problem from work on the previous simpler model but for the

current model the work is far from complete but has arrived at a reliable

derivation of the actual equations to be solved (in good form for study

with applicable computer software) with the aid of the work of an NSF

project assistant (AK). Furthermore some numerical calculations first for

completely symmetric game and then for general non-symmetric games give

indications that the model is working as might be hoped, at least for

small values of the 2-player coalition strengths. But it is too early

to really venture any comments about the asymptotic patterns of the

solutions or to interpret the results of calculations.

   When enough calculations have been made for a variety of non-symmetric

games then the implied values inferred, via the calculated payoffs,

may give numbers that can be compared with those of various relevant

concepts, like the nucleolus, etc.

   So the challenge remains to be dealt with of actually finding enough

of the instructively revealing results from numerical solutions for

equilibria of the model.

 

 

                     Comparisons from Results

 

   The model is designed so that the game can be of a type where the

Shapley value and the nucleolus give quite different "evaluations" of the

game. Which of these, for example as a guide usable in an "arbitration

scheme", is better or worse (if such comparisons could be valid in any

sense)? Naturally, any other route which leads, however it does so, to an

"evaluation", can lead to a basis for comparison of these, and of other

approaches that give evaluations.

   Our modeling, as also with the previous model, has three parameters

that describe the resources available to each of the two-player

coalitions. If these are small positive numbers, say of size less than

1/3, then the standard nucleolus evaluation effectively disregards them

although the Shapley value gives them a modest weighting in its evaluation

of the game.

   Our previous model turned out to give numerical results in cases of

this sort where, depending on the ratio of two "epsilons" (comparable to

"e3" as described above) the evaluation could be either "hyper-Shapleyan"

(compared with the nucleolus at 1/3,1/3,1/3) or "sub-Shapleyan".

   Ultimately I came to the conclusion that it was needed to have a MORE

UNIFORM concept of "fuzziness" to be applied in smoothing out the

sharpness of the effects of "demand" choices. Otherwise, and this became

evident in earlier studies of two-player models, if one player would have

"sharper" demands and the other would make "duller" demands then the

player with the sharper demands would indeed become like a "sharper"

bargainer and would tend to gain an advantage in the calculated game

outcomes! The fuzziness is needed for mathematical reasons, so that

derivatives of smooth functions can be computed. But under certain

circumstances it seems that "unbalanced" fuzziness can somehow "prejudice"

the game evaluation objective.

 

 

                         Associated Studies

 

   The work on this project which seeks to exploit the "agencies" concept

to successfully study games with cooperation via a reduction to non-

cooperative equilibrium considerations has led me also to study some

other topics.

   One of these was the computability of the nucleolus, since if a variety

of sample games are studied, and if these are of the sort where the

classical nucleolus and the Shapley value are defined, then just for the

comparison of numerical results it is nice to have a quick method for

finding the numbers that form the nucleolus vector. It occurred to me that

that should be possible by a programmable method of Monte Carlo style in

which random perturbations would be used in finding, to a high level of

approximation, the numerical values of the nucleolus components.

   I tried programming that for use in MATHEMATICA knowing that often the

game would be defined by rational numbers and that a close approximation

to the exact nucleolus would lead to finding the exact answer by finding

the simple rational numbers for which approximations had been found.

   In the method of successive approximations the actual definition of the

nucleolus would be the basis for the criterion for the comparison of the

merits of various random perturbations made to an approximate vector.

   Later I learned of the work of Sven Klauke in Bielefeld where this sort

of a method has been the basis of an effective program in C++ and where

there has also been developed an effective program that works by the

method of reducing the problem to problems of a "linear programming" type.

   And after being at the Stony Brook meeting of last year I thought about

the characteristic function as calculated by Harsanyi (this was around

1960) in connection with developing a general solution concept for

cooperative games. And then I learned that if this approach were used to

transform the interpretation of a game of three players and if after that

the nucleolus were calculated for the (constant sum) game thus obtained

that this would result in THE SAME vector as that of the Shapley value for

the game (which itself would not be changed by the move to the "Harsanyi

characteristic function").

   On the other hand, if we consider similar types of four person games

then the nucleolus computed after transforming the coalition values

information via the Harsanyi characteristic function will not, in general,

match the Shapley value vector. 

  And now this seems to me curiously parallel to how Von Neumann, in 1928,

concluded that 3-person cooperative games could be, somehow, evaluated,

but that 4-person games were of a different order of difficulty, in terms

of evaluation!