Image Image Image Image Image Image




Post new topic Reply to topic  [ 15 posts ] 
Author Message
 Post subject: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Fri May 14, 2010 2:37 am 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
Hi,

In hopes of sparking some theoretical discussion on fictitious play, I am sharing java code I wrote which uses fictitious play to solve an interesting toy game for HU limit poker. It was inspired by this 2p2 thread. Here is a description of the game from that thread:

SB (button) and BB each ante 2 chips. They are then each dealt a card: 1, 2, or 3. Each of them has a 1/3 chance of getting any card, and they can both be dealt the same card, in which case they obviously chop at showdown. SB acts first and has the option to fold or bet 1 more chip (call this a 2-bet, even though it's the first bet, to keep the analogy with HU limit HE). BB then has the option to fold, call, or reraise to 2 chips (3-bet). SB can now fold, call, or reraise to 3 chips (4-bet). BB must now just call or fold, and the hands are showdown. A 3 is the best hand.

And here are two slightly different equilibrium solutions of this game, as produced by gambit:

Solution 1
Solution 2

The attached Java code is my attempt to solve the game using fictitious play. The solution generated by my program matches the gambit solutions closely, but not exactly.

Questions / Discussion

There are two observations about this program that I think are interesting and relevant to using fictitious play for solving larger, real-life poker problems.

The first is that the solution remains slightly different from gambit's solutions (a few percentage points) no matter how many iterations we do, even though fictitious play is supposed to converge to the equilibrium solution with arbitrary precision. For example, compare the program's BB strategy with card 1 -- call 25.3%, raise/fold 27% -- with the strategy from Solution 2 above -- call 28.5%, raise/fold 28.5%. This problem may be related to the second point I'd like thoughts on....

The convergence depends quite heavily on the choice of smoothing function used in the algorithm, even though in theory I don't think it should. For example, on line 38 of the code, if you change the multiplier used in the smoothing function from 210 to something like 10, the final answer, especially for BB's strategy with card 1, is significantly different. This raises the question: how does one go about finding the best smoothing function, to maximize the accuracy and quickness of the convergence.

I'd love to hear any thoughts on these points.


Attachments:
SimpleLim4.zip [2.56 KB]
Downloaded 44 times
Top
 Profile  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Fri May 14, 2010 6:38 am 
Offline
Level1 member
User avatar

Posts: 40
Favourite Bot: ZZ'
I'm not into the problem but it sounds like QUAAK game it might be help full (just rolling dices :P). it uses minimax.

http://en.wikipedia.org/wiki/Minimax

for rules http://www.bewersdorff-online.de/quaak/rules.htm

Symmetric game for 2 players (publ.: Ravensburger):
¨ 2 players play several rounds,
¨ each player starts with 15 chips,
¨ in each round a player can bet 0 to 3 chips of his stock,
¨ the player with the higher bet wins the round
¨ A player who has won 3 more rounds than his opponent wins the game.
Recursive game with a normal form of size (until) 3×3 to every possible state.

Starting position: (2×15 chips):
For the first move the optimal behaviour in the sense of a minimax strategy is:
0 Chips: 0.1212
1 Chip: 0.2272
2 Chips: 0.0000 (“shadow price“: 0.0852)
3 Chips: 0.6515

http://www.bewersdorff-online.de/quaak/quaak_e.htm


Top
 Profile E-mail  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Fri May 14, 2010 10:36 am 
Offline
PokerAI fellow
User avatar

Posts: 794
Favourite Bot: my bot
- I tried smoothing a while ago and experienced similar problems. Have you tried without using smoothing? ie in getBestResponse return 1 for the most positive ev action and 0 for the others.
- I'd always assumed that you would update the average strategy at a node before updating its predecessors, but I don't think you do that.
- I'm not saying you are wrong, but I haven't convinced myself that your smoothing is correct. For action evs of {1, 1, 1} does it return {1/3, 1/3, 1/3}? Does it work equally well for negative evs as positive?
- Could you spell out the differences between the Gambit solutions please.
- If FP fails you could try CFRM.
- Nice work, you deserve to find a solution.


Top
 Profile E-mail  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Fri May 14, 2010 5:19 pm 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
Thanks for the ideas, spears.

spears wrote:
- I tried smoothing a while ago and experienced similar problems. Have you tried without using smoothing? ie in getBestResponse return 1 for the most positive ev action and 0 for the others
- I'd always assumed that you would update the average strategy at a node before updating its predecessors, but I don't think you do that. .

These two are possibly related. I must be misunderstanding your suggestion, because with the code as written, if I take away the smoothing, it means that the only possible solutions I could output are pure strategies, and since both Nash Eq are mixed, we couldn't converge to a Nash Eq.

As for the averaging code, I could definitely be doing that wrong. Currently, the algorithm works like this:
1. Take SB's current strategy, and use it to update his historical average strategy.
2. Calculate BB's best response versus that historical average strategy.
3. Take the best response we just calculated in 2., and use it to update BB's historical average strategy.
4. Find SB's best response to that.
5. Repeat.

Can you clarify how I would need to change that procedure in order to update the average strategy at a node before updating it's predecessors?

Quote:
- I'm not saying you are wrong, but I haven't convinced myself that your smoothing is correct. For action evs of {1, 1, 1} does it return {1/3, 1/3, 1/3}? Does it work equally well for negative evs as positive?


Yes, it would have to transform {1,1,1} --> {1/3, 1/3, 1/3}. The code does this:

1. {x, y, z} --> {f(x), f(y), f(z)}
2. Define sum = f(x) + f(y) + f(z)
3. {f(x), f(y), f(z)} --> {f(x)/sum, f(y)/sum, f(z)/sum}

Quote:
- Could you spell out the differences between the Gambit solutions please.

Sure. I think the best way to do that is to simply put them all into the same form:

Recall that:
SB_OPTIONS = {FOLD, RAISE_FOLD, RAISE_CALL, RAISE_RAISE};
BB_OPTIONS = {FOLD, CALL, RAISE_FOLD, RAISE_CALL};

Gambit 1
SB Strategy: [ [0, .857, 0, .143 ], [0, 0, 1, 0], [0, 0, .428, .572] ]
BB Strategy: [ [.714, 0, .286, 0], [0, 1, 0, 0], [0,0,0,1] ]

Gambit 2

SB Strategy: [ [0, .857, 0, .143 ], [0, 0, 1, 0], [0, 0, 0, 1 ]
BB Strategy: [ [.428, .286, .286, 0], [0, 1, 0, 0], [0, 0, 0, 1] ]

My Program

SB Strategy: [[0.006, 0.850, 0.000, 0.144], [0.000, 0.000, 1.000, 0.000], [0.000, 0.000, 0.375, 0.625]]
BB Strategy: [[0.470, 0.253, 0.270, 0.007], [0.000, 1.000, 0.000, 0.000], [0.000, 0.000, 0.000, 1.000]]

So the main differences are BB's strategy with card 1 (close but not exactly similar to Gambit 2), and SB's strategy with card 3 (close but not exactly like Gambit 1)

Quote:
- If FP fails you could try CFRM.

This is my next project, to solve this same toy game that way, just to give me a better feel for CFRM. However, I would really like to know why this method is not completely working, since in theory it should.

Quote:
- Nice work, you deserve to find a solution.

Thanks :)


Top
 Profile  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Fri May 14, 2010 10:02 pm 
Offline
PokerAI fellow
User avatar

Posts: 794
Favourite Bot: my bot
I read somewhere that in Kuhn Poker there is a single solution for one player, but several for the other. Maybe that is what you are seeing here. You could confirm that quite easily by computing the best response ev for both sides and both solutions.

What are the differences in assumptions between the two Gambit solutions?

If the above turns out to be true, then my comments are probably not relevant. My FP solution went something like this:
- build the game tree
- assume a strategy
- walk the game tree, alternating between players as we go, calculating the probability of opponent holding each card from his actions and his strategy on the outward trip. At the leaves calculate the evs. On the return trip, calculate the ev of proponents actions at choice nodes. So we have proponent's best response, a pure strategy. Update proponent's average response with this best response.

This seemed to work OK for single street problems, but didn't converge for bigger ones, so I tried smoothing. Instead of adding a pure strategy to the average strategy, I calculated a Gibbs/Boltzmann distribution for the actions and added that. Didn't work.


Top
 Profile E-mail  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sat May 15, 2010 2:25 am 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
spears wrote:
I read somewhere that in Kuhn Poker there is a single solution for one player, but several for the other. Maybe that is what you are seeing here. You could confirm that quite easily by computing the best response ev for both sides and both solutions.


I will test this theory tonight or tomorrow and post my results.

Quote:
What are the differences in assumptions between the two Gambit solutions?

No differences in assumptions. Gambit has a number of different solving engines, and you can choose which one to use. Those two solutions were produced by different engines, and are both NE. Solution 1 is created by solving a linear program. Solution 2 by solving a linear complementarity system. I don't know any details beyond that, but gambit has some docs online explaining them I believe.

Quote:
If the above turns out to be true, then my comments are probably not relevant. My FP solution went something like this:
- build the game tree
- assume a strategy
- walk the game tree, alternating between players as we go, calculating the probability of opponent holding each card from his actions and his strategy on the outward trip. At the leaves calculate the evs. On the return trip, calculate the ev of proponents actions at choice nodes. So we have proponent's best response, a pure strategy. Update proponent's average response with this best response.

This seemed to work OK for single street problems, but didn't converge for bigger ones, so I tried smoothing. Instead of adding a pure strategy to the average strategy, I calculated a Gibbs/Boltzmann distribution for the actions and added that. Didn't work.

This is an interesting variant; I can't tell right off if it would have more calcs per iteration than my method, fewer, or the same. Do you know?


Top
 Profile  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sat May 15, 2010 10:47 am 
Offline
PokerAI fellow
User avatar

Posts: 794
Favourite Bot: my bot
MunroeStahr wrote:
This is an interesting variant; I can't tell right off if it would have more calcs per iteration than my method, fewer, or the same. Do you know?

I don't know for sure, but I guess it's about the same: isn't it the same calculations in a slightly different order?


Top
 Profile E-mail  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sat May 15, 2010 8:16 pm 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
It looks like your theory may be right:
Code:
Time: 250
SB Strategy: [[0.006, 0.850, 0.000, 0.144], [0.000, 0.000, 1.000, 0.000], [0.000, 0.000, 0.375, 0.625]]
SB Raw EVs: [[-2.000, -1.964, -2.020, -1.993], [-2.000, -0.712, -0.397, -0.728], [-2.000, 0.288, 1.936, 1.938]]
BB Strategy: [[0.470, 0.253, 0.270, 0.007], [0.000, 1.000, 0.000, 0.000], [0.000, 0.000, 0.000, 1.000]]
BB Raw EVs: [[-1.992, -1.996, -2.005, -2.021], [-1.992, -0.002, -0.671, -0.447], [-1.992, 1.998, 1.163, 2.427]]


So the places where I am differing from the Gambit solutions are places where the EVs run really close, so it does appear that there may be many correct solutions, or at least many close to correct ones. My guess is that we could make things closer and closer by increasing the multiplier inside the smoothing function, but when I go much higher than 210 I start getting "NaN" as my answer, since we are creating numbers that are too big or too small for java double to handle, presumably. I wonder if there is any way to avoid this problem -- some other smoothing function which would be very steep without using such large numbers.


Top
 Profile  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sun May 16, 2010 2:53 am 
Offline
Level1 member
User avatar

Posts: 40
Favourite Bot: ZZ'
In your solution what is the type of game you assumed :

General m x n game (A,B)
Zerosum m x n game (A,-A)
Symmetric m x m game (A,AT)


Top
 Profile E-mail  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sun May 16, 2010 7:52 am 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
z.z. wrote:
In your solution what is the type of game you assumed :

General m x n game (A,B)
Zerosum m x n game (A,-A)
Symmetric m x m game (A,AT)

I didn't assume anything like that, I wasn't solving a matrix. If you look at the code my assumptions are clear. I am just playing the game over and over again, iteratively.


Top
 Profile  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sun May 16, 2010 11:02 am 
Offline
PokerAI fellow
User avatar

Posts: 794
Favourite Bot: my bot
MunroeStahr wrote:
I wonder if there is any way to avoid this problem
Could you drop the smoothing, and average the pure best responses as I did? It doesn't look like such a big fix from here, but maybe I'm missing something.

MunroeStahr wrote:
This is an interesting variant
and I thought your solution was the variant :)Do you have a reference for your solution?


Top
 Profile E-mail  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sun May 16, 2010 1:10 pm 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
spears wrote:
Could you drop the smoothing, and average the pure best responses as I did? It doesn't look like such a big fix from here, but maybe I'm missing something.

Will try this and post results, you may be right.

MunroeStahr wrote:
This is an interesting variant
and I thought your solution was the variant :)Do you have a reference for your solution?[/quote]

lol, no. I just made it up based on my understanding of the concept of fictitious play.


Top
 Profile  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sun May 16, 2010 10:27 pm 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
spears,

you were right. the smoothing was unnecessary and in fact counterproductive. the results of my code now match the gambit solutions exactly:
Code:
SB Strategy: [[0.000, 0.857, 0.000, 0.142], [0.000, 0.000, 1.000, 0.000], [0.000, 0.000, 0.000, 1.000]]
BB Strategy: [[0.714, 0.000, 0.286, 0.000], [0.000, 1.000, 0.000, 0.000], [0.000, 0.000, 0.000, 1.000]]

for some reason i thought the smoothing was needed to avoid cycling, but the averaging procedure protects us from that, so the smoothing was not needed at all. thanks very much for your help with this. i am now going to try to apply this method to some more complex, real-life holdem problems.


Last edited by MunroeStahr on Sun May 16, 2010 11:47 pm, edited 1 time in total.

Top
 Profile  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sun May 16, 2010 10:44 pm 
Offline
PokerAI fellow
User avatar

Posts: 794
Favourite Bot: my bot
MunroeStahr wrote:
thanks very much for your help with this.
It's been a pleasure.
MunroeStahr wrote:
i am not going to try to apply this method to some more complex, real-life holdem problems.

not or now?

Do you have lots of memory?


Top
 Profile E-mail  
 
 Post subject: Re: Fictitious Play: A Code Contribution and Theory Discussion
PostPosted: Sun May 16, 2010 11:47 pm 
Offline
Junior member
User avatar

Posts: 22
Favourite Bot: Sonia
lol, that was supposed to be NOW. post edited.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 15 posts ] 


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Jump to: