CS212 Unit 5

6 Jos Antnio Soares Augusto
1 CS212 Unit 5
Contents
1. 01 Welcome Back
2. 02 Porcine Probability
3. 03 q The State of Pig
4. 03 s The State of Pig
5. 04 l Concept Inventory
6. 05 p Hold and Roll
7. 05 s Hold and Roll
8. 06 l Named Tuples
9. 07 p Clueless
10. 07 s Clueless
11. 08 p Hold At Strategy
12. 08 s Hold At Strategy
13. 09 p Play Pig
14. 09 s Play Pig
15. 10 l Dependency Injection
16. 11 p Loading the Dice
17. 11 s Loading the Dice
18. 12 q Optimizing Strategy
19. 12 s Optimizing Strategy
20. 13 l Utility
21. 14 q Game Theory
22. 14 s Game Theory
23. 15 q Break Even Point
24. 15 s Break Even Point
25. 16 q Whats your Crossover
26. 17 l Optimal Pig
27. 18 l Pwin
28. 19 p Maxwins
29. 19 s Maxwins
30. 20 l Impressing Pig Scouts
31. 21 p Maximizing Differential
32. 21 s Maximizing Differential
33. 22 l Being Careful
34. 23 p Legal Actions
35. 23 s Legal Actions
36. 24 l Using Tools
37. 25 l Telling A Story
38. 26 q Simulation vs Enumeration
39. 26 s Simulation vs Enumeration
40. 27 l Conditional Probability
41. 28 q Tuesday
42. 28 s Tuesday
43. 29 l Summary
1.1 1. 01 Welcome Back

Hey, welcome back. As we've said, this class is all about managing complexity. Many types of software manage complexity by artificially ruling out any type of uncertainty. Say you have a checkbook-balancing program that insists you enter the exact amount: you have to say $39.27; you can't say, "Oh, I don't know, about $40." It's easier to write programs that work that way, but it constrains what you can do. So in this unit we're going to learn how the laws of probability allow you to deal with uncertainty in your programs. The truly amazing thing is that you can allow uncertainty in what you know about the world (what's true right now) and uncertainty in your actions (if the program does something, what happens next?), and even though both of those are uncertain, you can still use the laws of probability to calculate what it means to do the right thing. That is, we can have clarity of action: we can know exactly what the best thing to do is even though we're uncertain about what's going to happen. So follow along with this unit, and we'll learn how to do that.
06/05/12 11:49:53
1.2 2. 02 Porcine Probability

This unit is about probability, which is a tool for dealing with uncertainty. Once you understand probability, you'll be able to tackle a much broader range of problems than you could with programs that don't understand probability. Often when we have problems with uncertainty, we're dealing with search problems. Recall, in a search problem, we are in a current state. There are other states that we can transition into, and we're trying to achieve some goal, but we can't do it all in one step. We have to paste together a sequence of steps. In doing that, we're building up a search frontier that we're continuing to explore from. Now, uncertainty can come into play in two ways. 1) We can be uncertain about the current state. Rather than knowing exactly where we are, it may be that we start off in one of four possible states and all we know is that we're somewhere in there, but we're not sure exactly where we are. 2) The other place uncertainty can come in is when we apply an action, say this action here--action A--it may be that we don't get to one specific state but, rather, we're uncertain as to what the action will do, and we might end up in this state or this state or this state instead of the one that we were aiming at. And so we'll see techniques for dealing with both of these types of uncertainty. Now, one place where people are used to dealing with uncertainty is in playing games that employ dice. And that's what we're going to deal with. In particular, we're going to play a dice game which is called Pig. I don't know why the game is called Pig. I can guarantee no porcine creatures were harmed in the creation of this unit. Here's how the game works: There are two players, although you could play with more. The players take turns, and on his turn a player has the option to roll the dice--a single die--as often as he wants or to hold--to stop rolling. And the object of the game is to score a certain number of points. 
We're going to say 50 points; 100 is more common, but 50 will be easier on the Udacity servers in terms of the amount of computation required. And so it's my turn, and we have a score. So here's a scoreboard; we'll have players with the imaginative names of player 0 and player 1. And the score starts off 0 to 0. Now there's another part of the scoreboard that is not part of the player's score. We'll call that the pending score. Let's say it's my turn. I pick up the die, I roll it, and let's say I get a 5. Then 5 goes into the pending score, but I don't score any points yet. Now it's my turn again. Do I roll or do I hold--stop rolling? Let's say I want to roll again. This time I get a 2, so I add 2 to the pending score; I get 7. Let's say I roll again. I'm lucky. I get a 6. I add 6 to the pending; I get 13. And I'm going great, so I roll again, and this time I get a 1. And a 1 is special. A 1 is called a pig out, and when you roll a pig out it means you lose all the pending points, and for your hand you score not this total in pending, but just the 1. So my score would be just the 1. Now the other player, player number 1, goes. Let's say player number 1 says, "I'm going to roll," gets a 3. "I'm going to roll again," gets a 4. "I'm going to roll again," gets a 5. So now we have 12 in the pending, and now player number 1 says, "I think I've had enough; I'm going to hold," and that means we take these points from the pending, the 12 points, put them up on the board for player 1's score. And now player 1's turn ends, and it's player 0's turn. So your turn continues until you either hold or pig out, and your score for the turn is the sum of your rolls, if you didn't pig out, if you decided to hold, and the score is just 1 if you pigged out. And you keep on taking turns until somebody reaches the target--here, 50. So that's how the game of Pig works. Now let's go to try to describe the game in a form that we can program.
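The scoring rule for a single turn can be sketched in a few lines. This is only an illustration of the rule as described above; `turn_score` is a hypothetical helper name and is not part of the course code. It assumes the list holds every die roll made during one turn, so a 1 can only appear as the last roll.

```python
def turn_score(rolls):
    """Score for one turn, given the list of die rolls made during it.

    Hypothetical helper: a 1 anywhere means the player pigged out and
    scores just 1; otherwise the player held and scores the sum.
    """
    return 1 if 1 in rolls else sum(rolls)

# The two turns from the walkthrough above.
player0_turn = turn_score([5, 2, 6, 1])  # pigged out after building up 13
player1_turn = turn_score([3, 4, 5])     # held with 12 pending
```

The first turn is worth 1 point despite 13 pending, and the second is worth 12.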
1.3 3. 03 q The State of Pig

So as usual, we're going to make an inventory of concepts in the game. This time I'm going to try to break things out a little bit and talk about low-level concepts, high-level concepts, and mid-level concepts. As we saw in the discussion forums, there's always a question of where you want to start. Do you want to describe the low level first and build up from there? Do you want to describe the high level first and build down? I think for this case we'll take more of a middle-out approach. So, at the mid level there's the concept of the current state of the game. We're sort of inching towards a search problem, and we know that we have to represent states for a search problem. So, we want to know the current state of the game. If we're thinking of search problems, then we also have to know about the actions we can take. We know that there are two actions: roll and hold.

Here are some candidates for what's in the current state. First, the things that were on the scoreboard; the scoreboard, remember, had three things: player 0's score, player 1's score, and the pending score. Then the player whose turn it is; we might want that to be part of the state. The previous roll of the die, whether I just rolled a five or something else, might be part of the state. The previous turn score: how much did the other player just make on their turn? All of these are possibilities, and you might be able to think of others. I want you to tell me which of these are necessary to describe the state of the game. I should say here that we're assuming the goal of the game, the number of points you need to win, is constant and doesn't need to be represented in each individual state; we just represent it once for the whole game. Which of these are necessary for the current state?
1.4 4. 03 s The State of Pig

Well, we certainly have to know the score. We have to know how much is pending, because that's going to affect the score. We have to know which player is playing. Now these other things, what happened before, might be interesting, but they don't really help us define the current state. So those are unnecessary. So the state's going to end up being something like a 4-tuple. I've written it as (p, me, you, pending): the player to move, that player's score, the other player's score, and the pending score that hasn't been reaped yet.
1.5 5. 04 l Concept Inventory

At the low level--I count as low-level things like the roll of a die, the implementation of scores, the implementation of the players and of the player to move, the goal--so these are all things that we're going to have to represent. And then at the high level, I'm going to have a function play-pig, that plays a game between two players, and I have the notion of a strategy--a strategy that a player is taking in order to play the game. Now let's think about how to implement these things, and when I'm doing the implementation, I'm going to move top-down. So I started sort of middle-out saying these are the kinds of things I think I'm going to need; now I have a good enough feel for them that I feel confident in moving top-down. I don't see any difficulties in implementing any of these pieces. If I start at the top, then I'll be able to make choices later on without feeling constrained. If I thought there was something down here that was difficult to deal with, I might spend more time now, at the low level, trying to resolve what the right representation is for one of these difficult pieces, and that would inform my high-level decisions. But since I don't see any difficulty, I'm going to jump to the high level. Now, what's play-pig? Well, I think that's going to be a function, and let's just say that its input is two players, A and B, and we haven't decided yet how we're going to represent those. And its output is--let's say it's going to be the winner of the game. Maybe A is the winner. And we'll have to make a choice of how we represent these players. Now what's a strategy? Well, a strategy--people sometimes use the word "policy" for that. We can also represent that as a function. And it takes as input a state, and it returns an action or a move in the game. In this game we said that the actions are roll and hold. We're starting to move down. Let's just say now how are we going to represent these actions? 
Well, we can refer to the actions just by strings, so we use the strings "roll" and "hold", and that could be what the strategy returns. But then we'll also need something that implements these actions, so we'll have to have functions: let's say the function "roll" takes a state and returns a new state, and the function "hold" takes a state and returns a new state. But that doesn't seem quite right; there's a problem here. What about the die? Its effect means that roll by itself is not a function from state to state. Rather, roll, if we wanted to specify it, would be a function from a state to a set of states, and that represents the fundamental uncertainty. That's why we need probability to deal with it. We have an uncertain, or nondeterministic, domain because an action doesn't have a single result; rather, it can have a set of possible results. In some cases it makes sense to go ahead and implement these actions as functions that look like that, that return sets of states. I considered that as a possibility, but I ended up with an implementation where I talk about the different possibilities for the die. So the die can come up as d, one of the numbers 1 to 6, and now roll, from a particular state with a particular die roll, returns a single state rather than a set of states. I just think it's easier to deal with it this way, although in other applications you might want to do it the other way. Now the rest seems to be pretty easy. The die can be represented as an integer. Scores can be represented as integers. Likewise the goal. The player to move we can represent as an index, 0 or 1, into an array of players. And the players themselves? Well, the simplest way to do it is just to represent the player by their strategy. The strategy is a function, and that could represent the player. We could have something more complex, but it seems like we don't need anything more than that. So players will be strategy functions.
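To make these representation choices concrete, here is a minimal sketch. The two strategies `always_roll` and `always_hold` are hypothetical placeholders invented for illustration; the only things taken from the discussion above are that actions are the strings "roll" and "hold", that a state is a tuple of (p, me, you, pending), and that the player to move is an index into a list of strategy functions.

```python
def always_roll(state):
    """Hypothetical strategy: ignores the state and always rolls."""
    return 'roll'

def always_hold(state):
    """Hypothetical strategy: ignores the state and always holds."""
    return 'hold'

# Players are just strategy functions; p indexes into this list.
players = [always_roll, always_hold]

state = (0, 0, 0, 0)              # (p, me, you, pending) at the start of a game
(p, me, you, pending) = state
action = players[p](state)        # ask the player to move for an action name
```

Nothing here applies the action yet; it only shows how a strategy is looked up and asked for its move.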
1.6 6. 05 p Hold and Roll

Now you're probably itching to write some code by now, so let's get started. What I want you to do first is write the two action functions, hold and roll, each of which takes a state and returns a state: hold(state) returns the state that results from holding, and roll(state, d) returns the state that results from rolling and getting a d. A state is represented by the 4-tuple (p, me, you, pending). Here p is the player to move, either zero or one; in the subsequent state it stays the same if the player's turn continues and swaps to the other player otherwise. Me and you are two integers giving the score of the player to play and the score of the other player. And pending is the score accumulated so far in this turn but not yet put onto the scoreboard. Go ahead and write those functions.
1.7 7. 05 s Hold and Roll

Here's my solution. For hold, I first break the state up into its pieces so that I know what I'm talking about. If I hold, it becomes the other player's turn. Remember that the second place in the tuple is the score of the player whose turn it is, so that is now what was previously "you," unchanged from before. The third place is my score, to which I add the pending; I reap all of those points, and the pending gets reset to zero. For roll, again I first figure out what's in the state. If the roll is a one, that's a pig out: it becomes the other player's turn, I only get one lousy point, and pending gets reset to zero. If the roll is not a one, then it's still my turn; I don't change my score so far, but I add d onto the pending. There's also a way to map from one player to the other: if it was one it becomes zero, and if it was zero it becomes one. It's always a great idea to write some test cases.

Now, one comment on style. Here I'm taking the state, which is a tuple that has four components, and breaking it up into its four components. With four components that's probably okay, but it's starting to worry me a little that I won't be able to remember which part of the state is which. If I had five or six components, I would really start to worry about that. So there are other possibilities where we can be more explicit about what the state is, rather than have it be a set of undifferentiated elements of a tuple that we then unpack like this. We can define it ahead of time.
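The solution described above can be sketched as follows. The code shown in the lecture is not reproduced in this transcript, so treat this as a reconstruction from the spoken description rather than the exact original; the name `other` for the player-swapping map is an assumption.

```python
# Map from one player to the other (name assumed for this sketch).
other = {1: 0, 0: 1}

def hold(state):
    """Apply the hold action: reap the pending points and
    pass the turn to the other player."""
    (p, me, you, pending) = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    """Apply the roll action, given that the die came up d.
    A 1 is a pig out: score 1 point, lose pending, pass the turn."""
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)
    return (p, me, you, pending + d)

# A few test cases, as the lecture recommends.
assert hold((1, 10, 20, 7)) == (0, 20, 17, 0)
assert hold((0, 5, 15, 10)) == (1, 15, 15, 0)
assert roll((1, 10, 20, 7), 1) == (0, 20, 11, 0)
assert roll((0, 5, 15, 10), 5) == (0, 5, 15, 15)
```

Note that after the turn passes, the "me" and "you" slots swap, because "me" always means the player whose turn it is.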
1.8 8. 06 l Named Tuples

Now here's an alternative. Instead of defining a state by just creating a tuple and then getting at the fields of a state by doing an assignment, we can use something called a namedtuple, which gives a name to the tuple itself as well as to the individual elements. We can define a new data type called State (I use capitalized names for data types). We say State is equal to a namedtuple where the name of the data type is State and the fields of the data type are p, me, you, and pending. So I can just go ahead and make that definition. namedtuple lives in a module, so "from collections import namedtuple" gives me access to it. Now I can say s = State(1, 2, 3, 4), and I can ask for the components of s by name.

How would I choose between this representation for states and the normal tuple representation? Well, the namedtuple has a couple of advantages. It's explicit about the types, and it helps you catch errors: if you ask for the p field of something that's not a State, that gives you an error, whereas if you just broke up something that had four elements into these components, that would work even if it didn't happen to be a proper state. There are a few negatives as well. It's a little more verbose, although not much, and it may be unfamiliar to some programmers; it may take them a while to understand what namedtuples mean. I should say we could also do the same type of thing by defining a class. That has all the same positives, and it's certainly familiar to most Python programmers, but it would be even more verbose.

In this new notation, hold and roll explicitly create a new State, and we look at state.p, state.you, state.me, state.pending, and so on. They look fairly similar to the tuple versions. You'll notice the lines are a little longer because we're being more explicit, so it takes a little more to say that. I'm sort of up in the air about whether this representation is better than the previous representation with tuples. I could go either way.
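Here is a sketch of what the namedtuple version might look like. The lecture's code is not included in the transcript, so the exact layout is an assumption; `collections.namedtuple` is the standard-library call the lecture names, and `other` is the same assumed player-swapping map as in the tuple version.

```python
from collections import namedtuple

# A State has four named fields; instances are still ordinary tuples.
State = namedtuple('State', 'p me you pending')

other = {1: 0, 0: 1}  # map from one player to the other (name assumed)

def hold(state):
    """Apply the hold action to a State, returning a new State."""
    return State(other[state.p], state.you, state.me + state.pending, 0)

def roll(state, d):
    """Apply the roll action to a State, given die roll d."""
    if d == 1:
        # Pig out: one point, pending lost, other player's turn.
        return State(other[state.p], state.you, state.me + 1, 0)
    return State(state.p, state.me, state.you, state.pending + d)

s = State(1, 2, 3, 4)
fields = (s.p, s.me, s.you, s.pending)  # components accessed by name
```

Because a namedtuple is a subclass of tuple, `hold(State(1, 10, 20, 7))` still compares equal to the plain tuple `(0, 20, 17, 0)`, so the earlier tests keep working.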
1.9 9. 07 p Clueless
Now I'm going to talk about strategies for a minute. Remember, a strategy is a function that takes a state as input, and its output is one of the action names, roll or hold. I want you to write a strategy function, which we're calling clueless. It's a function that takes a state as input and returns one of the possible moves, roll or hold. It does that by ignoring the state and just choosing one of the possible moves at random. So go ahead and write that.
1.10 10. 07 s Clueless

Here's my solution. I gave you the hint of importing the random module. I just call the random.choice function, which takes a sequence of possible moves and picks one at random.
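A minimal sketch of that solution follows. The function name clueless and the use of random.choice come from the lecture; the name `possible_moves` for the list of action names is an assumption.

```python
import random

# The two legal action names (list name assumed for this sketch).
possible_moves = ['roll', 'hold']

def clueless(state):
    """A strategy that ignores the state and picks a move at random."""
    return random.choice(possible_moves)

move = clueless((0, 0, 0, 0))  # either 'roll' or 'hold'
```

random.choice needs an indexable sequence such as a list or tuple, which is why the moves are stored in a list rather than a set.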
1.11 11. 08 p Hold At Strategy

Now I want to describe a family of strategies that I'm calling hold at n, where n is an integer. For example, hold at 20 is a strategy that keeps on rolling until the pending score is greater than or equal to 20, and then it holds. The point of this strategy is that you get points by rolling, but you risk points by rolling as well. The higher the pending score, the more you're risking. So there should be some point at which you say that's too much of a risk: I've accumulated so much pending that I don't want to risk any more, and I'm going to hold. So hold at 10, hold at 20, hold at 30 describe that family of strategies.

There's one subtlety that we'll build in to hold at. Let's say that the goal is 50 and my score when I start my turn is 40. Then let's say I roll a 6 and a 4. According to hold at 20 I should keep on rolling because my pending score is only 10; I haven't gotten up to 20 yet. But it would be silly for me to keep rolling at that point. I would risk pigging out, scoring only one point and getting to 41, whereas if I hold now I have 40 + 6 + 4 = 50 and I've already won the game. So hold at 20 will hold when pending is greater than or equal to 20, or when you win by holding.

I want you to go ahead and implement that. Since hold at x is a whole family of strategy functions, hold at x is not going to be a strategy function itself. Rather, it's going to return a strategy function. So I've given you an outline that says we're going to define a strategy function, then fix up its name a little to describe it better, and then return it. You have to write the code within the strategy function. I should say, we're going to stick with the representation of states where a state is a 4-tuple: the player to move (zero or one), the me and you scores, and the pending score.
1.12 12. 08 s Hold At Strategy

Here's my solution. I break the state up into its components. Then I hold if the pending score is greater than or equal to x, or if I've already won, that is, if my current score plus the pending score is already greater than or equal to the goal. Otherwise, I return roll as my move.

    def hold_at(x):
        """Return a strategy that holds if and only if
        pending >= x or player reaches goal."""
        def strategy(state):
            (p, me, you, pending) = state
            # goal is the module-level target score (50 in this unit).
            return 'hold' if (pending >= x or me + pending >= goal) else 'roll'
        # Give the returned function a descriptive name, e.g. 'hold_at(20)'.
        strategy.__name__ = 'hold_at(%d)' % x
        return strategy
1.13 13. 09 p Play Pig

Now let's talk about the design of the function play_pig, which plays a single game of Pig. We decided that this is a two-player game, with players "A" and "B," and that we're going to represent it as a function. At some point in the future we might want to allow games with more than two players, but we're not going to worry about that for now. So let's make a list of what the function has to do. It has to keep score: it needs the score for player "A," for player "B," and for pending. It has to take turns: it has to figure out whose turn it is, and that turn keeps going until the player holds or pigs out. Another way to say that is that the score for "A," the score for "B," the pending, and whose turn it is are all part of managing the current state. It has to call the strategy functions, so "A" and "B" are going to be strategy functions that we pass in. It has to keep track of the current state and pass that state to the strategy function of the player whose turn it is, which gives back an action, either roll or hold. Then it has to do the action, the roll or hold, and that generates a new state, and we have to keep track of the state we're in.

But there's one more trick here. When we were doing a normal search, that was it: we had to figure out what the actions were and apply the action, and there was a single successor state for each action. But here there are multiple successors for an action. So we have to do one more thing, which is roll the die. That disambiguates the action of rolling: the action alone would generate a set of possible states, but the action plus the die generates a single state. I want you to write the function play_pig, which takes two strategies as input and plays the game, keeping track of what's going on, rolling the die as necessary, and updating the state; then, when one player wins, we return that player, either "A" or "B."

One thing I'll note is that I don't have any tests here. The reason is that it's hard to test this. It's hard to write a deterministic test because part of playing the game is rolling the die, and that won't be the same every time. We'll talk in a bit about how to test programs like this.
1.14 14. 09 s Play Pig

Here's my solution. I put the strategies into a list because we're going to be indexing into that. I start out in the start state: nothing has happened, no points have been awarded, and player number 0, that is "Player A," is the player to move. Then I repeat this loop. First I unpack everything I know about the current state. If the score of the player whose turn it is, is greater than or equal to the goal, then that player wins, "Player 0" or "Player 1," "A" or "B." Likewise, if the other player's score is greater than or equal to the goal, then that player wins. Otherwise, I pick out the strategy function for the player to play: if p is 0, then strategies[0] is "A"; if p is 1, then strategies[1] is "B." I apply that strategy function to the state, and if the result is hold, I apply the hold action to the state to get a new state. Otherwise, I assume that the strategy function is legal: if it doesn't return hold, then it returns roll. I give it the benefit of the doubt there and perform the roll action on the state with a random integer from 1 to 6, inclusive. That gives me the new state, and I continue until we find a winner.

    import random

    def play_pig(A, B):
        """Play a game of Pig between two strategies; return the winner."""
        # Relies on the module-level goal, other, hold, and roll.
        strategies = [A, B]
        state = (0, 0, 0, 0)
        while True:
            (p, me, you, pending) = state
            if me >= goal:
                return strategies[p]
            elif you >= goal:
                return strategies[other[p]]
            elif strategies[p](state) == 'hold':
                state = hold(state)
            else:
                state = roll(state, random.randint(1, 6))
1.15 15. 10 l Dependency Injection

Now, the question is, how can I test a function like this, one that includes this nondeterministic component? One thing we want to be able to do is inject some deterministic numbers, to say this is the sequence of "die rolls" I want to give you, and then from that I can check whether it's doing the right thing. This is an example of a concept called Dependency Injection, which has a rather scary and intimidating-sounding name, but it's actually a pretty simple idea. The idea is we've got a function like this, a big complicated function, and way down somewhere inside there's something that we want to affect, something we want to monitor or track or change. Dependency Injection says this function depends on this random number generator, so let's be able to inject that. How do we inject something into a function? Well, we just add it as an argument. So let's add in the argument here, and let's call it "die rolls" and say that's going to be a sequence or an iterable that will generate possible "die rolls." In the normal case, that will just be random numbers exactly like it was before. We don't care what they are. But when we want to test the function, we can inject the "die rolls" that we want. We can just pass in a list saying, what happens if the "die rolls" are a 6 and a 1 and then a 3 and a 5, and so on? Tell me what happens. So here's my implementation of the dependency-injected play_pig. I still have the regular arguments "A" and "B." There's an optional argument. If I leave that out, it should behave exactly like it did before. But if I specify it, then I have control over it. So "die rolls" should be an iterable that generates rolls. Here we go down and we ask for the next one out of those rolls and get it back. By default, "die rolls" just says we're going to generate an infinite sequence of random integers from 1 to 6. Oops, I think I misspoke there. I think I said that "die rolls" has to be an iterable. Actually, what it has to be is an iterator, such as a generator expression or something else, in order for it to have next applied to it.
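A minimal sketch of what a dependency-injected play_pig could look like. The on-screen code is not part of this transcript, so the state layout (p, me, you, pending), the helper names hold, roll, hold_at and dice, and the constant GOAL are assumptions made for illustration. The default iterator is created inside the function so that each game gets its own stream of random rolls.

```python
import itertools
import random

GOAL = 50                  # assumed target score
other = {0: 1, 1: 0}       # maps a player index to the opponent's index

def dice():
    "Infinite iterator of random die rolls from 1 to 6 (the normal case)."
    while True:
        yield random.randint(1, 6)

def hold(state):
    "Bank the pending points and pass the turn to the other player."
    (p, me, you, pending) = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    "Apply die roll d; a 1 is a pig-out (score 1 point, lose the turn)."
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)
    return (p, me, you, pending + d)

def hold_at(x):
    "Strategy that holds once pending >= x, or once holding would win."
    def strategy(state):
        (p, me, you, pending) = state
        return 'hold' if (pending >= x or me + pending >= GOAL) else 'roll'
    return strategy

def play_pig(A, B, die_rolls=None):
    """Play a game between strategies A and B and return the winner.
    die_rolls is the injected dependency: any iterator that yields rolls."""
    if die_rolls is None:
        die_rolls = dice()          # default: random rolls, as before
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= GOAL:
            return strategies[p]
        elif you >= GOAL:
            return strategies[other[p]]
        elif strategies[p](state) == 'hold':
            state = hold(state)
        else:
            state = roll(state, next(die_rolls))

# Injecting a loaded die: every roll is a 6, so the game is deterministic.
A, B = hold_at(20), hold_at(20)
winner = play_pig(A, B, die_rolls=itertools.repeat(6))
```

With the loaded die the same call always plays out the same way, which is what makes the function testable; leaving die_rolls out falls back to random rolls.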
1.16 16. 11 p Loading the Dice

So now, with this play pig, with the dependency injection, with the goal being 50, here's a test that I can actually run. "A" and "B" are going to be my two contestants. "A" is hold at 50, which is equivalent to saying never hold until you win. "B" is the clueless function, the one that acts at random, and rolls is going to be an iterator over some list of numbers, maybe 1, 2, 3 or whatever you want. I want you to write in there the shortest possible list, or one of the shortest possible lists, that allows "A" to win. Then you can check: play a game of Pig between "A" and "B" with these rolls and make sure that "A" wins.
1.17 17. 11 s Loading the Dice

Here's my answer. I've rolled eight 6s. That gives me 48 points, and then a 2--that gives me 50--and that allows "A" to win. There are other sequences that are of the same length, but none that are shorter.
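The claim that no shorter list exists follows from simple arithmetic, sketched below. The goal of 50 comes from the exercise; the variable names are only illustrative.

```python
import math

GOAL = 50
MAX_ROLL = 6

# Each roll adds at most 6 points, so at least ceil(50 / 6) rolls are needed.
min_rolls = math.ceil(GOAL / MAX_ROLL)

# Eight 6s and a 2: no 1s (so no pig-out) and exactly GOAL points pending.
rolls = [6] * 8 + [2]
print(min_rolls, len(rolls), sum(rolls))   # prints: 9 9 50
```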
1.18 18. 12 q Optimizing Strategy

So we've seen several different strategies and we've compared them and tried to find one that was better, and we could keep doing that, trying to improve and make a strategy better and better. But what if we could make a leap? Instead of incrementally coming up with a slightly better strategy, would it be possible to leap to the best strategy? To make it sound more mathematical, we can call it the optimal strategy. Can we do that, and what would it even mean? On the surface it's not exactly clear. When we did search, we didn't know what our first action was. We started out in some state and we knew there were several different states we could go to, and from there, there were other states we could go to. All we knew is that we were trying to arrive at some goal location. But we knew that if we just specified how the domain works, how you get from one state to the next, and if we specified an algorithm that found the best path, the shortest path, or the least-cost path, then eventually, by applying that algorithm to that description of the world, we would arrive at the best possible solution. So maybe we can do the same thing here. Even though we're dealing with uncertainty, maybe we can still define what the world looks like and discover the optimal solution. So when we were doing search, it was always sort of one agent doing the searching, so let's call that "me," and what am I looking for? Am I looking for the best path or the worst path? Well, obviously, we're looking for the best path, and we can describe that, and once we've got that description we've got to search it outward. Now we've gone beyond search in two ways. The most obvious is that we're dealing with probability, so we've got dice or whatever other random element there is, and then in addition to that, for the big game, we introduced another complication, which is our opponent. And now there's this question of what each of these three is trying to do, and I want you to tell me: is our opponent trying to get the best, meaning the best score for "me," or is the opponent trying to get the worst score for "me," assuming we're diametrically opposed, so that the worst score for "me" would be the best score for the opponent? Or is the opponent trying to come up with the outcome that is average? And tell me the same for the dice. Are the dice with "me," trying to get the best result for "me"? Are the dice plotting against "me," trying to get the worst result for "me"? Or are the dice going to average out? Go ahead and click the appropriate boxes there.
1.19 19. 12 s Optimizing Strategy

And the answer is, in the game of pig the opponent is trying to defeat "me," so they want the worst for "me." The dice have no intentionality. Every outcome is equal as far as the dice are concerned, so that works out to average. So now we have a way of describing the world. When we start out, it's "me" and I have options I can take--roll or hold--and I go in one direction and I get to a point where it's the dice's turn to roll, and that can have six outcomes. Rather than trying to choose the best, we'll just average over all of them. Let's say there's one here, and now it's my opponent's turn to move, and my opponent makes a choice, and let's say ends up here. And I look at all these paths that keep on going until they get to the end of the game. And then I say, if I always choose the best, and if my opponent always chooses the worst for "me," and if the dice average out, then I can describe all the paths to the end, and I can describe the value of those paths.
1.20 20. 13 l Utility

Now in economics and in game theory, the value of a state is called its utility. It's just a number. So we're at the end of the game, and if there's 1 state where we win, we'll give that a utility of 1. If there's another state where we lose, we'll give that a utility of 0. Now if I have a choice here--it's my turn to move--I have a choice to go either way. I'm going to maximize my choice, and I'm going to move there. That means the utility of this state is going to be 1 because I know I can get 1 by taking the optimal strategy. We keep backing up the tree that way. That's if it was my choice here. If it was my opponent's choice, and they could go in either way, then my opponent is going to minimize my score or maximize their score and go in this direction, forcing me to lose and allowing them to win. So we'll say the utility of this state is 0 for me. And I want to also introduce here another idea called the quality, which is the function on a state and an action and gives us a number--a utility number. So that's saying, what's the quality of this action in this particular state? So if these were the actions, hold and roll, then we'd say for my opponent the quality of rolling from this state would give us this utility, and the quality of holding would give us this utility. Finally, if it was the dice's turn--and let's say there are 6 outcomes, but 3 of them lead to this state and 3 of them lead to this state-- then from the dice we're going to average over all the possibilities, so it's half of one and half of 0, so the utility of this state is 1/2. So now we have a way--if we know the value of the end states, which is defined by the game, defined by when we win and lose and how many points each player gets for that-- now we have a way of essentially searching backwards to say, from the end state, I can go backwards and say, what's the value of one of my moves? Oh! I know that. It's the maximum. What's the value of my opponents? I know that. It's the minimum. 
What's the value of a random dice roll? I know that. It's the average. Now we can go all the way back, backing up the tree, to say, what's the value of this start state? I can collect those values, and I can have a utility for this start state, and we'll see that in the game of pig the first player has a little bit better than a 50% chance, just because they go first. For the game we defined, I think it works out to about 54%, a utility of .54 for the player who goes first. We can also work out the quality for each of these moves. We can work out, what's the quality of rolling from this state, and what's the quality of holding from this state, and then the optimal strategy is just the one that says choose the move which has the highest quality. If roll has the highest quality from this state, then that's the move that we should do. So just as we did in search, we're finding our way from the start to the goal. We can do that with a random problem, but we have to find a way not to just 1 individual outcome, but to all the outcomes that are covered by all the possibilities of the dice. So the computation is going to be more complex, but the idea is the same.
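The backing-up rule (maximum for my moves, minimum for the opponent's, average for the dice) can be sketched on a small hand-built tree. The tuple representation of nodes and the name backed_up_utility are assumptions for illustration; the lecture draws the tree on screen rather than coding it.

```python
def backed_up_utility(node):
    """Return the utility of a node from my point of view.
    A leaf is a number; an inner node is (who_moves, [children])."""
    if isinstance(node, (int, float)):
        return node                        # end state: utility set by the game
    who, children = node
    values = [backed_up_utility(child) for child in children]
    if who == 'me':
        return max(values)                 # I pick the best outcome for me
    elif who == 'opponent':
        return min(values)                 # the opponent picks the worst for me
    elif who == 'dice':
        return sum(values) / len(values)   # dice have no intent: average
    raise ValueError('unknown node type: %r' % (who,))

WIN, LOSE = 1, 0
my_choice = ('me', [WIN, LOSE])
opp_choice = ('opponent', [WIN, LOSE])
dice_roll = ('dice', [WIN, WIN, WIN, LOSE, LOSE, LOSE])

print(backed_up_utility(my_choice))    # 1
print(backed_up_utility(opp_choice))   # 0
print(backed_up_utility(dice_roll))    # 0.5
```

Nodes nest, so a whole game tree can be written as one expression and evaluated from the end states back to the start.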
1.21 21. 14 q Game Theory

Now when you have a decision under uncertainty and there's an opponent, it's called game theory. If there's no opponent, it's usually called decision under uncertainty. There are other names. So let's look at an example of that first before we get back to the game of pig. Here's the decision I'm going to give you. You're on a game show, and you won, and you get a prize of 1 million dollars or euros or whatever currency you want to use. Now you're given a decision. You can keep the $1 million, or the host will flip a coin, which you believe to be a fair coin, and if you call it correctly you get $3 million, but if you get it wrong, you get nothing. So you analyze the outcome of this and you believe that this is a choice by the coin that has a 50% probability of each outcome, and you want to say, what should I do? Should I keep the million or should I go for the 3 million? What I'm going to do is code up a model for this, and then let decision theory decide. First, I just define a variable million, because it's hard to see the number of 0's and count correctly. Now traditionally, utility is abbreviated U and quality is abbreviated Q. So I'm going to define here a quality function that says, given a state and an action, and given a utility function that says what each state is worth to me, what's the value of that state-action pair? And the actions available to me are holding and gambling. Let's go ahead and make that explicit. So in any state, the actions available are holding and gambling. We're only going to deal with 1 state, but we make this perfectly general. And the state that I start with is however many dollars I have in my pocket--could be anything. And given that state, if I hold, my state is going to be increased by $1 million, and then there's some utility on that--how much do I value having what I have now plus 1 million? And if I gamble, there's a 50% chance that I get 3 million more than I have now. There's some utility for that. And a 50% chance that I get nothing more than I have now, and some utility for that. So that describes the quality of each action in the state, but it only describes it if I have a utility function. I have to know, how much do I like money? Well, the simplest choice for a utility function is the identity function. The identity function just takes any input x and returns x. It's the input itself, and so we could say, if I start with nothing, the value of the state of having nothing is 0, and the value of the state of having a million is a million. Now here's the amazing thing: I can just write out what the optimal strategy is, what the best action is for this state. What it's going to be is the maximum over all the possible actions from the state, which was just hold and gamble, maximized by EU, which stands for Expected Utility. Expected meaning average. So what's the average utility of each of the actions? I've defined the average utility as the quality of that state-action pair under the utility function. And that means that Q had to deal with the averaging, and it did that. It said, 50% this, 50% that.
That's the value of gambling. Now this best_action function solves this particular problem. But the amazing thing is that we can completely generalize this. If we just add in parameters, now we're saying what's the best_action in a particular state: if you tell me what the available actions are, what the quality of each state-action pair is, and what the utility is over states, then I can tell you what the best_action is. That works for any possible domain that you can define. It's an amazing thing that we solved all the problems at once, similar to the way in search where we had 1 best_search algorithm that could solve all of the search problems. Now it doesn't mean that we're done, and we never have to program anything again, because programming can be difficult. There are some problems that don't fit into this type of formulation, and there are many, many problems which you can describe, but which you can't solve in a feasible amount of time. So we haven't solved everything, but it is amazing how much we can do with just this 1 short function. Let's go ahead and solve it. Let's solve this problem, and let's say I start off with $100, what's my best_action? When I run that, it tells me the best_action is gamble. Now that doesn't sound quite right to me. If you were faced with that problem, assuming you had $100 to your name, would you take the gamble and try to go for the 3 million, or would you hold with 1 million? There's no right or wrong answer to this, despite what the interface does. It has to tell you one answer is right or wrong, but you can ignore that. I just want to collect some data on how many people think that they would gamble in that situation and how many people think they would hold.
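A sketch of the model as narrated. The names million, Q, U, EU and best_action come from the lecture; the exact code shown on screen is not in the transcript, so the function bodies here are a reconstruction under that assumption, and the helper names actions and identity are illustrative.

```python
million = 1000000

def Q(state, action, U):
    "The expected value of taking action in state, according to utility U."
    if action == 'hold':
        return U(state + 1 * million)
    if action == 'gamble':
        # Fair coin: half the time win 3 million, half the time win nothing.
        return U(state + 3 * million) * 0.5 + U(state) * 0.5
    raise ValueError('unknown action: %r' % (action,))

def actions(state):
    "All the actions available in this state."
    return ['hold', 'gamble']

def identity(x):
    "The simplest utility function: a dollar is worth a dollar."
    return x

def best_action(state, actions, Q, U):
    "Return the action that maximizes expected utility in this state."
    def EU(action):
        return Q(state, action, U)
    return max(actions(state), key=EU)

print(best_action(100, actions, Q, identity))   # gamble
```

Nothing in best_action mentions money or coins; the domain lives entirely in the three functions passed to it, which is why the same function can be reused for other problems.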
1.22 22. 14 s Game Theory

So I predict that most people say they would hold, and why is that? Well, under the identity function, sort of the arithmetic function, $3 million is 3 times better than $1 million, and so half of $3 million is 1.5 times better than $1 million. So the gamble is worth more. But that's only true if $3 million really is 3 times better than $1 million. For most people, that's not true. Going from $100 to $1 million is a big, big jump. Going from $1 million to $3 million is a smaller jump than that. Even though arithmetically it's more, in terms of what you can do it seems like less, and that doesn't mean that people are irrational in any way. Instead, what it means is that for people, the value of money is not a linear function. Rather, it's something more like a logarithmic function, meaning if you have a certain amount of money and you double that money, you don't get twice as much value out of having that money; rather, you just get 1 increment more. So let's try again. I'm going to import the math module, and now I'm going to ask, what's the best_action starting from $100 in my pocket, but valuing money with a logarithmic function rather than with the identity or linear function? Now best_action tells me that what I should be doing is holding. That corresponds to my intuition that that's the right thing. I can also ask, well, what if I had $10 million already? Then would I take the bet, assuming my value of money is still logarithmic? And best_action tells me that yes, I should. If I have $10 million, now I'm starting to look at money as more closely linear again. I'm at the stage where the logarithmic function is approximately linear locally. If I've got $10 million, I could say, yeah, I'm risking my $1 million, but that's no big deal. I've already got $10 million. It's a good bet, because if I gamble I get 3 or 0--that's 1.5 on average, and that's more than 1--so I'm willing to take that bet, and I don't mind not gaining the additional $1 million.
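The effect of switching the utility function can be checked with a few lines. This is a sketch: the helper names hold_and_gamble and preferred are hypothetical, and the natural logarithm is assumed for the logarithmic utility.

```python
import math

million = 1000000

def hold_and_gamble(state, U):
    "Return (utility of holding, expected utility of gambling) from state."
    hold = U(state + 1 * million)
    gamble = 0.5 * U(state + 3 * million) + 0.5 * U(state)
    return hold, gamble

def preferred(state, U):
    "Name the action with the higher expected utility."
    hold, gamble = hold_and_gamble(state, U)
    return 'hold' if hold >= gamble else 'gamble'

print(preferred(100, lambda x: x))         # gamble (linear utility)
print(preferred(100, math.log))            # hold   (logarithmic utility)
print(preferred(10 * million, math.log))   # gamble (already wealthy)
```

With $100, losing the coin flip leaves a state whose log utility is far below that of a sure million, which drags the average down; with $10 million the three outcomes are close together on the log scale and the higher arithmetic average wins.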
1.23 23. 15 q Break Even Point

So now I want to ask you a question. Given this quality function Q, and assuming that our utility function is the log function, we saw that for some values of the state the best action--the action with the highest quality--is 'hold' and for other values it's 'gamble,' so there must be a point at which there is a crossover, where the two values are approximately equal. So what I want you to tell me is: to the nearest million, what is that crossover point C? That's the value of the state, the amount of money that I currently have, at which I'm indifferent between the 2: if I have C dollars, then my quality for gambling is the same, or approximately the same, as my quality for holding. For what value of C, to the nearest million, is that true? And just to make it easier for those who can't divide by e in your head, we'll use log10 logarithms rather than natural logarithms, so that the log10 logarithm of a million is 6. Tell me what the crossover point is to the nearest million.
1.24 24. 15 s Break Even Point

And the answer is crossover C is 1 million, and if I ask for the quality of C gambling versus hold with the log10 utility function-- and that's a two-value tuple-- and we see, in fact, they're the same.
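The crossover can be checked both algebraically and numerically. Setting the two qualities equal gives log(C + 1M) = (log(C + 3M) + log(C)) / 2, which means (C + 1M)^2 = C(C + 3M), and that simplifies to C = 1M. The sketch below confirms it with log10; the body of Q is a reconstruction of the function described earlier, not the on-screen code.

```python
import math

million = 1000000

def Q(state, action, U):
    "Quality of an action: utility of holding, or average utility of the flip."
    if action == 'hold':
        return U(state + 1 * million)
    if action == 'gamble':
        return 0.5 * U(state + 3 * million) + 0.5 * U(state)
    raise ValueError('unknown action: %r' % (action,))

c = 1 * million
q_gamble = Q(c, 'gamble', math.log10)
q_hold = Q(c, 'hold', math.log10)
print(q_gamble, q_hold)   # both about 6.301
```

Because floating-point logarithms are involved, a comparison such as math.isclose(q_gamble, q_hold) is safer than testing the two values for exact equality.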
1.25 25. 16 q Whats your Crossover

Now I want to gather a little bit more data, and here, again, there is no right or wrong answer even though the interface may tell you that your answer was right or wrong, I just want to do this sort of as a sociological experiment to see where people are. So just tell me what your crossover point is. Again, there is no right or wrong answer, but what's the number of dollars-- or euros, if you prefer that--at which you'd be indifferent between accepting this gamble and holding? And put that in as an integer number, not the number of millions. So, if your crossover point is 1 million, write 1 million here, don't write 1.
1.26 26. 17 l Optimal Pig

Now let's get to work on defining an optimal pig strategy. So we need a Q and a U function, and an actions function. Let's get started on that. The Q function we'll call Q_pig. It takes a state and an action and evaluates the quality of that against the utility function. And what should we use for the utility function? Well, I think the best thing to use is the probability of winning, because we get 1 point for winning and no points for losing. That's a good way of thinking about the game. And so our expected outcome is going to be somewhere between 0 and 1, and that's like a probability. And so the probability of winning is our score. If we win all of the time, the probability of winning is 1, and that should be worth 1 point. If we win none of the time, that should be worth 0 points. So our utility is just the probability of winning. And what is that? Well, if we hold, then we're turning it over to our opponent, and we still have our hold and roll functions that tell us what state we get to when we hold. And then it's our opponent's turn, and he's going to do his best, so our utility would be 1 minus the opponent's utility--what he can do best, his probability of winning--because either one player or the other has to win. So our probability of winning is 1 minus our opponent's probability of winning. If we roll, it's more complicated. If we roll a 1, then we pig out, and it's 1 minus our opponent's probability of winning, because it's his turn. And otherwise, we just take all the possibilities for the other die rolls, and it's our probability of winning from the result of rolling in that state. That's six probabilities altogether, so we have to average them: we add them all up and divide by 6. And if the action wasn't hold or roll, I'm going to raise a ValueError. What are the actions in this state? Well, if there are some pending points, I can roll or hold, and if there aren't, I'm just going to roll. That's the only thing that makes sense to do.
1.27 27. 18 l Pwin

Now what's the probability of winning from a state? It seems complicated. It seems like we've got a lot of work to do, but actually, we've almost solved the whole thing. All we have to do is say, "What's the end point?" So remember, we start out in the start position, and then we have some end positions where the game is over, and we have to assign utilities, which are the same as probabilities of winning, which are either 0 or 1. So this is a losing state, so it gets a Pwin of 0. This is a winning state. It gets a Pwin of 1. We assign all of those, and then all the other states that depend on these--we've already figured that out in terms of the Q function. Let's see how that works. So the probability of winning is 1 if my current score plus the pending is greater than or equal to the goal. Then I win automatically just by reaping those pending points. My probability of winning is 0 if your score has reached the goal and I haven't won. And otherwise, my probability of winning is the probability that I get by taking the best action. So among all the actions I can do, look at the Q value of each action from the current state according to the utility function, and try to maximize that, and that's going to be my probability of winning. So that's saying I can make the best choice that I can. So we said that we had 3 choice points. Here, I'm making the best choice by maximizing. Here, the die gets to roll, and we're averaging--we're summing them all up and dividing by 6, so that takes care of the averaging. And what about the worst choice that the opponent makes? Well, that's just folded in, because rather than explicitly worrying about me and my opponent, I just said, "Well, I can use 1 minus." That's the probability of the opponent winning.
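A sketch of how Q_pig, the actions function and Pwin fit together as mutually recursive definitions. The state layout (p, me, you, pending), the bodies of hold and roll, the name pig_actions, and the explicit cache dictionary are assumptions for illustration; the goal of 40 is the value the lecture uses for this game.

```python
GOAL = 40                  # the lecture plays this game to 40 points
other = {0: 1, 1: 0}       # maps a player index to the opponent's index

def hold(state):
    "Bank the pending points and pass the turn."
    (p, me, you, pending) = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    "Apply die roll d; a 1 is a pig-out (score 1 point, lose the turn)."
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)
    return (p, me, you, pending + d)

def pig_actions(state):
    "The legal actions: holding only makes sense with points pending."
    (p, me, you, pending) = state
    return ['roll', 'hold'] if pending else ['roll']

def Q_pig(state, action, Pwin):
    "The expected value of choosing action in state, under utility Pwin."
    if action == 'hold':
        return 1 - Pwin(hold(state))       # the opponent moves next
    if action == 'roll':
        return (1 - Pwin(roll(state, 1))   # pig-out: the opponent moves next
                + sum(Pwin(roll(state, d)) for d in (2, 3, 4, 5, 6))) / 6.
    raise ValueError('unknown action: %r' % (action,))

_cache = {}                # memo table: state -> probability of winning

def Pwin(state):
    "The utility of a state: the probability that the player to move wins."
    if state not in _cache:
        (p, me, you, pending) = state
        if me + pending >= GOAL:
            _cache[state] = 1
        elif you >= GOAL:
            _cache[state] = 0
        else:
            _cache[state] = max(Q_pig(state, action, Pwin)
                                for action in pig_actions(state))
    return _cache[state]

# First player's chance of winning; the lecture quotes a value of about 0.54.
print(Pwin((0, 0, 0, 0)))
```

The cache matters because the same states are reached along many different paths; without it the recursion would recompute them over and over.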
1.28 28. 19 p Maxwins

So now we're almost there. We've defined the problem. We've defined how the game works, and we're ready to write the optimal strategy--the best possible strategy for Pig--and I'll let you finish it off. We'll call this function "max_wins"--the strategy function that maximizes the number of wins, or at least the number of expected wins. Go ahead and write your code there. You can write it in terms of the functions we've defined above and in terms of a call to best_action.
1.29 29. 19 s Maxwins

And this is all it is. We just call best action from the current state using the Pig actions, using the quality function for Pig and trying to maximize the probability of winning.
1.30 30. 20 l Impressing Pig Scouts

Now let's see how we did. So I'm going to define a set of strategies: the clueless strategy, which we expect to do the worst; strategies that try to solve the problem in 4 chunks, in 3 chunks, in 2 chunks, and all in one; and then the max wins strategy. Now, we play a tournament with these strategies, and here are the results we get back. So you can see that the clueless strategy does very poorly--it only wins 23 games out of 500. The max wins strategy does the best of all--it wins 325--but there are some competitors that are pretty close. So hold at 20 wins 314--not that much worse off than the optimal strategy. And that holds up if we play a tournament with more games just to get a little bit more accuracy. You wouldn't be able to hit the run button and do this because it would time out, but if you bring it into your own development environment, you can do that. And there we see max wins comes out only a couple percent better than hold at 20, but still it's nice to know that no strategy can do better. And it turns out that if we increased the goal and made a longer game than just playing to 40 points, the advantage for max wins over any of these other strategies would only increase. In the betting game, we had different utility functions. We tried out the linear utility, and we tried out the logarithmic utility. What about here? Well, we defined our utility as the probability of winning, and the way the game is defined, that's really the only sensible one. If you're trying to win the game, you should maximize the probability of winning. But maybe your only goal isn't just to maximize the probability of winning. Maybe you're in a big Pig tournament, and you're seated at the Pig table, rolling the dice, and in the stands are lots of spectators, watching the game with excitement. And you know that somewhere in the stands, there's a scout from the NPA--the National Pig Association.
And what you want to do is not just win the game-- because lots of people are going to win the games-- but you really want to get the attention of that NPA scout so that you can move on and have a professional career. So maybe what your utility function would be would not just be to win the game, maybe your utility would be to maximize the differential, to say, "If I just won the game by a couple points, nobody is going to notice, but if I won by a lot-- if I really clobbered my opponent-- then maybe this guy would take notice, and that would be worth more to me." So you'd give up on the goal of just winning, and try to go for the maximizing your differential.
1.31 31. 21 p Maximizing Differential

So here I've written the utility function. I called this utility function "The Winning Differential." And given a state, it tells me what that differential is--the expected differential for that state. And what it says is, if we're at the end of the game, if somebody's won, then before, remember, our utility function was 0 or 1, but here the utility function is my score--which is me, and I'm going to reap the pending--minus your score. Otherwise, we just do the same thing with a Q function that we did before. And note that we're always careful to memoize these functions, because they're recursive; they're calling themselves over and over again. We don't want to repeat those computations, so we memoize them so we only have to do each computation once. Now, what I want you to do is write the strategy function. This was the utility function over states; now I want you to write the strategy function. You can do it in terms of what we've defined before.
1.32 32. 21 s Maximizing Differential

And the answer is you call best action from the state. The actions and the Q function are just like before, but the utility function we're trying to maximize is the differential.
1.33 33. 22 l Being Careful

Now, I want to say right here that I made a mistake, and I haven't talked about this very much over the course of these lectures, but I'm making mistakes all the time. I know you guys are. You type something in, it doesn't work. I have the same problem over and over again. I keep making mistakes, and that's fine as long as I know how to correct them. Here the mechanical mistake was that I messed up and typed the wrong function at one point. I think these function names are too similar, and I made a mistake in naming them max_diffs and win_diff. They sound too much alike, and when I was playing with this I put in the utility function where I meant to put in the strategy function, and that's an easy mistake to make because they sound the same. Probably I should have come up with better names. I should have called this win_diff_utility or something like that to make it clear that this is the utility function and this is the strategy function. But the annoying thing was that the mistake went unnoticed for a while, and I'll show you part of the problem. Here's the play_pig function. It takes two strategies: A and B. These have to be strategy functions, and then it comes down here, and it applies the appropriate strategy function to a state, and it says if that strategy function decides to hold, then we do the hold action, otherwise--well, there are only 2 actions you could do, so I'll do the roll action. That makes sense, right? Now here's the problem. If instead of passing in a strategy function you accidentally pass in a utility function, then nothing complains down here. You say strategies of p applied to state, and that's really applying a utility function to state. What's the utility function going to return? Well, it's going to return a number, and play_pig just says, "Well, is that number equal to hold?" No, it's not. No numbers are equal to hold. So then I'm just going to assume that you meant roll. And so the fact that I passed in a completely wrong function that's doing nothing related to strategy--it's returning a number rather than an action--went completely unnoticed, and instead the utility function that returns a number acted as if it were a strategy function that always said roll. Now, in general, that's one of the complaints that people have about Python: it's too easy to make that mistake because you don't have to declare for each function what its inputs and outputs are. In other languages, you would do that, and the program where you accidentally used a utility function where you expected a strategy function wouldn't even compile. You'd get an error message before you ran it. In Python, you don't have that protection, so you've got to build in the protection yourself.
1.34 34. 23 p Legal Actions

So what I'd like you to do is update the play_pig function so that it looks at the result that comes back from the strategy function, and make sure that it's either hold or roll and if it's not one of those, then let's decide that what we do is that that strategy automatically loses a game. So if you make an illegal move, you lose the game right there.
1.35 35. 23 s Legal Actions

Here's how I did it--makes the function just a little more complicated. I added a variable called "action" to hold the result of the strategies. If it's hold, we hold. If it's roll, we roll. Otherwise, you lose, which means that the other strategy wins.
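A sketch of play_pig with the legality check described above. The helper functions, the state layout, the goal of 40 and the two toy strategies (always_roll and bogus) are assumptions for illustration; only the action variable and the forfeit rule come from the narration.

```python
import random

GOAL = 40
other = {0: 1, 1: 0}

def dice():
    "Infinite iterator of random die rolls from 1 to 6."
    while True:
        yield random.randint(1, 6)

def hold(state):
    (p, me, you, pending) = state
    return (other[p], you, me + pending, 0)

def roll(state, d):
    (p, me, you, pending) = state
    if d == 1:
        return (other[p], you, me + 1, 0)   # pig-out
    return (p, me, you, pending + d)

def play_pig(A, B, die_rolls=None):
    "Play a game between strategies A and B; an illegal action forfeits."
    if die_rolls is None:
        die_rolls = dice()
    strategies = [A, B]
    state = (0, 0, 0, 0)
    while True:
        (p, me, you, pending) = state
        if me >= GOAL:
            return strategies[p]
        elif you >= GOAL:
            return strategies[other[p]]
        action = strategies[p](state)
        if action == 'hold':
            state = hold(state)
        elif action == 'roll':
            state = roll(state, next(die_rolls))
        else:
            return strategies[other[p]]     # illegal action: you lose

def always_roll(state):
    "A legal (if unwise) strategy."
    return 'roll'

def bogus(state):
    "Stands in for a utility function passed by mistake: returns a number."
    return 0.5

print(play_pig(bogus, always_roll) is always_roll)   # True
```

With the check in place, a function that returns a number loses on its first move instead of silently behaving like a strategy that always rolls.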
1.36 36. 24 l Using Tools

So now let's go back and analyze this maximize differential strategy versus the maximizing probability of winning strategy. The question is, how do these 2 compare? When are they different, and when are they the same? If you're trying to impress the scouts, you're not going to be making some crazy moves, so probably most of the time, you'd expect the 2 strategies to agree, but some of the time, maybe 1 of them is going to be more aggressive or taking more chances than the other. Let's see if we can analyze that. So I start off by defining a bunch of states, and I'm just going to look from 1 player's point of view. It doesn't really matter to have both since it's symmetric, so for all these values of me, you, and pending, collect all those states. It turns out that there's 35,000 of them. Then I define a variable r to be a default dictionary, which counts up integers, so it starts at 0, and then I go through all the states, and I increment the count for a result for the tuple of the action that's taken by max_wins and the action that's taken by max_diffs. I want to count up. This is going to be hold, hold, roll, roll. etc. I want to see how many of each do we have, and let's convert r back to standard dict, and there we have it. So most of the time, 29,700 out of the 35,000--both strategies agree that roll is the right thing to do. Then another 1200 times, both strategies agree that hold is the right thing to do. But in 2 cases, they differ. So sometimes, max_wins says hold and max_diffs says roll. That happened 381 times, but 10 times more often, it's max_wins that says roll and max_diff that says hold. That actually surprised me. So it's the max_wins strategy that's really more aggressive. It's rolling more often. I thought it was going to be the max_diffs strategy. I thought that was going to be more aggressive, right? So that's the one that's trying to impress the scouts. I thought it was going to be rolling trying to rack up a really big score. But no! 
So the data tells a different story. It's not trying to rack up a really big score. So what's going on? Well, first it might be nice just to quantify how different they are, since I kind of asked that question. So there are 35,301 states altogether and they differ on 3975 + 381, and that's 12% of the states that they differ on. So what's the story? Where do those 12% of the states come from? We still don't know, and we don't even quite know what questions to ask, but it's here that some of our design choices start to pay off. So remember we always start our design with an inventory of concepts, and we have things like the dice and the score, and then we got into things like the utility function and the quality function, so we built all these up, and yes, we're building from the ground up, and yes, at the top, we have a play_pig function, and we can still call that function, but at the bottom, we have all these useful tools. So now we're not just playing pig; now we're trying to analyze the situation to understand this story of why these 2 are different. Well, play_pig by itself--the top-level function we defined--is not going to help us, but all these little tools that we built down here will be helpful. We can start to put them together and explore. So we built this tower, and the tower built up to define the play_pig function, and in some languages, it's all about building the tower. When you're done, that's all you have. But in Python, and in many languages, it's common, and it's a good design strategy, to say let's just build up components along the way so that, yes, we have the tower, but we can also go out in other directions. If we're interested not just in playing pig but in figuring out this story, then we can quickly assemble pieces from down here and build something that can address that. So I've got all the pieces available. It makes it easy to explore. But I still need an idea, and here's my idea.
I expected maximize differential to be aggressive, to try to rack up the big points, and I found out that it was actually maximizing the probability of winning that was more aggressive, that rolled more often. Why could that be? I think I might know the answer. I think it might be that the maximize differential strategy is more willing to lose rather than more excited about winning by a lot. What do I mean by that? Well, if you're maximizing the probability of winning, you don't care if you lose by 1 or if you lose by 40, it's all a loss. The maximize differential player--if he's losing by a fair amount, he might say, well--say he's behind 39-0 in a game to 40, and say he's accumulated 30 points. If he's trying to maximize the probability of winning, he would keep on rolling. He says, well, I don't have that good of a chance of winning, but all that counts is winning. If I stop now, the opponent's going to win on the next move, so I've got to keep rolling. Probably I'll pig out and only get 1 point, but it's worth it for that small chance of winning. That's what the maximize win probability strategy would do. The maximize differential strategy would say, hey, if I can get 30 points rather than 1, that cuts the differential way down, so that's worth doing. I'll sacrifice winning in order to maximize the differential. Now that's a suggestion of a story, but I don't know yet. Is that the right story? Let's find out.
1.37 37. 25 l Telling A Story

So here's what I did. I wrote this little function to tell a story. The story I want to tell is, over all the states, let's group the states in terms of the number of pending points in that state, and then for each number of pending points, say all the states for which there are 10 points pending, how many times did max_wins decide to roll versus how many times did max_diffs decide to roll? And just consider the ones in which the 2 differ. So throw away all of the states in which they take the same action. Let's see what that looks like. Let me just describe briefly what I do. So I start off--I have a default dictionary, and the default is that I have 2 values, a count for each strategy. Then I go through the states and get the 2 strategy functions to apply to each state. If they're different, figure out what pending is and increment the pending count for the one who decided to roll, and then I just go and print them out. Here's the result. It does tell a story. Look what's happening. When I have a small number of points pending, most of the time, the 2 strategies agree, but when they differ, it's the max_diffs strategy that's deciding to roll. But as the number of pending points increases, we see it's max_wins who's willing to roll, and max_diffs not at all, so that's a perfect segregation between the 2, with a crossover point between 13 and 16. Past that point, max_wins is rolling all the time with very high pending amounts. But max_diffs is never willing to do that. So what must be happening here is 300 times max_wins has 24 pending points, and he's willing to roll. It must be because the opponent is just about to win and he says, "Even though I'm risking 24, I still want to win the game, so I've got to roll." Max_diffs says, "Are you crazy? I got 24 points on the board. I'm going to reap them right now. And yes, I may lose, but I'm really going to cut down that differential." So look what the story told us.
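The grouping described above might look something like the sketch below. It is an illustration under stated assumptions, not the lecture's code: story here takes the two strategies as parameters, and since the optimal strategies are not defined in this snippet, two hold_at strategies (the threshold strategy from earlier in the unit, reimplemented here) serve as hypothetical stand-ins for max_wins and max_diffs.

```python
from collections import defaultdict

goal = 40

def hold_at(x):
    "Stand-in strategy: hold once pending reaches x, or once holding would win."
    def strategy(state):
        _, me, you, pending = state
        return 'hold' if (pending >= x or me + pending >= goal) else 'roll'
    return strategy

def story(states, strategy_a, strategy_b):
    """For each pending value, count how often each strategy is the one that
    rolls, looking only at the states where the two strategies disagree."""
    r = defaultdict(lambda: [0, 0])      # pending -> [rolls by a, rolls by b]
    for s in states:
        a, b = strategy_a(s), strategy_b(s)
        if a != b:
            _, _, _, pending = s
            r[pending][0 if a == 'roll' else 1] += 1
    return dict(r)

states = [(0, me, you, pending)
          for me in range(goal + 1) for you in range(goal + 1)
          for pending in range(goal + 1) if me + pending <= goal]

result = story(states, hold_at(10), hold_at(20))
for pending in sorted(result):
    rolls_a, rolls_b = result[pending]
    print('%4d: %5d %5d' % (pending, rolls_a, rolls_b))
```

With the stand-ins, the disagreements all fall where pending is between 10 and 19, and it is always the second strategy that rolls. Passing in the real max_wins and max_diffs instead would produce the table discussed in the lecture.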
I thought that this man in the arena who was playing the max_diffs strategy was going to impress the NPA scouts by playing aggressively, and it turned out that the story was completely different. So what does that tell us? Well, first it tells us that there is an interesting distinction in how we wrote our function to maximize differential. Note that the way we wrote it is we completely separated the what, which is the rules of the game for how pig is played, from the how of how it makes decisions, and that was the perfectly general best_action function. So we didn't go in and say, we're going to write specific rules for pig to say, if you're in such and such a position, do such and such. Rather we just said, here's how pig works. Here's what I'm trying to do. Maximize differential. You do the rest. So the results that came back were surprising to me because I didn't really understand how the what and the how interacted, but I was still able to write a program that maximized the differential, even though I didn't understand what that program was actually doing, and that's the power of breaking up these aspects into what's happening versus how to maximize. The second part that's interesting here is that I was able to do exploration. So it wasn't just that I built a monolithic program that could do 1 thing. It's that I built a set of tools that could explore the area. When I wanted to understand something that was different from what I originally did, I was able to do that because I had the tools around.
1.38 38. 26 q Simulation vs Enumeration

So we've talked about some probability problems that we can handle with simulation--that is, using a random number generator to choose some samples, and hoping that they're representative of the problem. And if you choose enough, you get a close enough representation. An alternative strategy is enumeration, where we actually go over all the possibilities and we can compute an exact probability. Now, some problems are so complex that it would take forever to do that, but computers are much better at it than people are, and so it's a powerful strategy. We'll show you some simple examples, and they can scale up to more complex ones. So let's imagine all the possible families living in houses, and these houses have differing properties. This one is colored red and has an Englishman. This one has a zebra, and so on. But for now we're only interested in the children that live in those houses, and, in fact, we're only interested in the houses that have exactly 2 children. So we're going to not consider some of them, and consider the other ones that have exactly 2 children, and then we want to be able to ask questions of them. And we can ask probability questions. We're going to constrain ourselves to ask conditional probability questions. So what is the probability of Event A given Event B? And an event is just a state of affairs. So Event B might be the event of the family having exactly 2 children. And so we've crossed off these houses, and we're only taking these other ones, and then Event A might be the event of having a boy or a girl. Now, in real life, we could do simulation by going out and polling and asking people. In mathematics, we can do enumeration if we make certain assumptions, and one assumption we can make is that it's exactly 50% probable that you get a boy and 50% probable that you get a girl and that one birth is independent from another. So let's address the question.
What's the probability of having 2 boys given that there is at least 1 boy in the family? And the universe of possibilities is only the families that have exactly 2 children. So we could put that here in the condition as well--at least 1 boy and 2 children total. What do you think this probability is equal to? Put your answer here and enter it in the form of a fraction. So if you think it's one-half, put 1 and then 2. If you think it's 11-17ths, put 11 and 17.
1.39 39. 26 s Simulation vs Enumeration

Here's the way to look at it: we count up the number of equally probable sample points on this side, the ones with at least 1 boy--1, 2, 3 (BB, BG, and GB); and then we count up, out of those, how many appear on this side, 2 boys--just one; so the answer is one-third.
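To make the contrast between the two approaches concrete, here is a small sketch of both applied to this question. The function names simulate and enumerate_exact, the sample size, and the seed are illustrative choices made here, not taken from the lecture; the assumptions (boy and girl equally likely, births independent) are the ones stated above.

```python
import random
from fractions import Fraction

def simulate(n, seed=0):
    "Estimate P(2 boys | at least 1 boy) by sampling n random two-child families."
    rng = random.Random(seed)
    at_least_one_boy = both_boys = 0
    for _ in range(n):
        family = rng.choice('BG') + rng.choice('BG')
        if 'B' in family:
            at_least_one_boy += 1
            if family == 'BB':
                both_boys += 1
    return both_boys / at_least_one_boy

def enumerate_exact():
    "Compute the same probability exactly by listing every equally likely family."
    families = [first + second for first in 'BG' for second in 'BG']
    event = [f for f in families if 'B' in f]          # at least 1 boy
    return Fraction(sum(f == 'BB' for f in event), len(event))

print(simulate(100000))      # an approximation that varies with the seed
print(enumerate_exact())     # exactly 1/3
```

The simulation only approaches one-third as the number of samples grows, while the enumeration gives the exact fraction directly.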
1.40 40. 27 l Conditional Probability

Now if you could do all of these just by writing with a pen, you wouldn't be in a computer class. So let's start modeling this. And so we're going to do our concept inventory. So what do we have? Well, these individual results here come from random variables. So a random variable is like the first child born, which can be a boy or a girl. The whole universe is called the sample space, and then these individual sets of circles are called events--like the event of having 2 boys, or the event of having at least 1 boy. And an event consists, then, of a collection of sample points. So a sample point is BG, GB or BB. In terms of representation, we're going to just represent sample points as strings. So we'll have, like, the string 'GG', and we can represent events two ways: as a collection of strings or as a predicate--a function which is true of certain strings and not of others. So here's what I did: I imported itertools because we're going to need that. I searched for--and found--a class called Fraction, within the fractions module; its constructor, Fraction, produces an exact fraction. And the reason I wanted that is because when the answer is 1/3, I wanted to see that it was exactly 1/3. I didn't want to see that it's .33333. So all a fraction is is a numerator and a denominator, paired together, and they know how to do arithmetic. So here's my random variable. I represent that as a collection of possibilities--I just strung the possibilities together. I could have used the set of 'B' and 'G' or the list ['B', 'G'], but I just put them together as the string 'BG'. Then I said we can combine random variables with a cartesian product--and I used itertools.product. itertools.product produces tuples, and I want to look at the results as strings, so I join each tuple into a string. So now two_kids is the product of two children and we're looking at their sex. So if I evaluate that, I get this collection.
And one_boy is just the points in two_kids that have at least one boy in the string. So it would be this one, this one, and this one. And then I can define two_boys as a predicate that's True when the count of the number of boys is equal to 2. That would be True here. And finally, I can define my conditional probability. That says what's the probability of (p), given (e), where (e) is an event specified as a list of sample points, all equally probable, and (p) is a predicate that returns True or False for elements of that event. So the True elements are just the ones for which the predicate is True, and then I return a fraction--how many out of the event satisfy the predicate. Then I can ask, what's the conditional probability of two_boys, given one_boy--and the answer is 1/3. So that's what we expected.
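Put together, the pieces described in this section could be written roughly like this. The names two_kids, one_boy, and two_boys come from the transcript; sex, product, and condP are names assumed here for the random variable, the cartesian-product helper, and the conditional probability function, so treat the details as a sketch rather than the exact code shown in the lecture.

```python
import itertools
from fractions import Fraction

sex = 'BG'   # a random variable, written as a string of its possible values

def product(*variables):
    "The cartesian product (as strings) of the possibilities for each variable."
    return [''.join(point) for point in itertools.product(*variables)]

two_kids = product(sex, sex)                    # the sample space
one_boy = [s for s in two_kids if 'B' in s]     # event: at least one boy

def two_boys(s):
    "Predicate: True for sample points with exactly 2 boys."
    return s.count('B') == 2

def condP(predicate, event):
    """Conditional probability P(predicate(s) | s in event), where event is
    a list of equally probable sample points."""
    pred = [s for s in event if predicate(s)]
    return Fraction(len(pred), len(event))

print(two_kids)                      # ['BB', 'BG', 'GB', 'GG']
print(condP(two_boys, one_boy))      # 1/3
```

Representing the event as a list and the question as a predicate keeps condP general: any event and any predicate over the same sample points can be plugged in.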
1.41 41. 28 q Tuesday

Now let's move on to a slightly more complicated question: out of all the families with two kids--with at least 1 boy, born on a Tuesday--what's the probability of two boys? Now, you might think that the answer should be the same--it should still be 1/3 because why does Tuesday matter? After all, the kid's gotta be born sometime and if it happens to be Tuesday, why would that be any different than any other day? So is it 1/3? Well, as Gottfried Leibniz said, "Let us calculate." So we have the technology to model that. First, a random variable for day of the week--and I had to fool around with the capitalization there, to make sure that we have 7 distinct letters: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday--plus a sample space, two_kids_bday: one kid with their day of birth; the second kid, with their day of birth. What does that look like? Well, it's this huge thing of (2 x 7 x 2 x 7) entries. The first one: boy born on Sunday, boy born on Sunday; all the way through to the last one: girl born on Saturday, girl born on Saturday. Then boy_tuesday is all the elements of this where "BT" appears in the string. So "BT" will be either the first 2 characters or the last 2 characters. And now we're finally at the point where we can say: given at least 1 boy_tuesday, what's the probability of two_boys? And before I show the results, I'm going to ask you what you think it is. You could follow along, either with pencil and paper or do the computation or just think it out in your head. So enter it as a fraction. If you think it's 1/3, put a 1 here and a 3 here--or whatever.
1.42 42. 28 s Tuesday

If I go ahead and execute this and print the result, it comes out: 13/27. Wow! Where did that come from? So that's surprising--first of all, it's not 1/3, which you might have thought should be the answer if you believe the argument that Tuesday doesn't matter. And secondly, not only is it not 1/3, but it's much closer to 1/2 than it is to 1/3. So just having the birthday there really changed things a lot. How did that happen? Well, I wrote up a little function here to report my findings, and here are its arguments. You can give it a bunch of cases that you care about, the predicate that you care about, and whether you want the results to be verbose or not. And it just prints out some information--and, by the way, as part of this, I also looked at the question of what's the probability of two_boys, given that there's one boy born in December, so I threw that in as well. And here's the output I get: 2 boys, given 1 boy, is 1/3; and born on Tuesday is 13/27, and born in December is 23/47. Now, I can turn on the verbose option to report the details. In that case, here's what I see: the probability of 2 boys, given at least 1 boy born on Tuesday, is 13/27. And here's the reason--at least 1 boy, born on Tuesday, has 27 elements--and there they are--and of these, 13 are 2 boys--and there they are. And so, you can't really argue with that. You can go through and you can make sure that that's correct, and you can look at the other elements of the sample space and say no, we didn't miss any--so that's got to be the right answer. It's not quite intuitive yet, and I'd like to define my report function so that it gives me that intuition but right now, I don't have the right visualization. So I've got to do some of the work myself. And here's what I came up with: We still have the four possibilities that we showed before but now we're interested, not just in boys--we're interested in boys born on Tuesday. So there's going to be some others over here where there's, say, a boy born on Wednesday, along with some other partner--maybe a boy born on Saturday. But we're not even considering them; we're throwing all those out. We're just considering the ones that match here. And like before, we draw 2 circles: one for the event on the right-hand side of the conditional probability. And so how many of those are there? Well, there's 7 possibilities here because the boy has to be born on Tuesday--there's only 1 way to do that--but there's 7 ways for the girl to be born. So there's 7 elements of the sample space there; likewise, 7 elements over here. Now how many elements over here? Well here, either one of the 2 can be a boy born on Tuesday. So really, we should draw this state as either a boy born on Tuesday, followed by another boy, or a boy, followed by a boy born on Tuesday. And how many of those are there? Well, there's 7 of these by the same argument we used in the other case, and of these, there's also 7, but now I've double-counted because one of these 14 cases is a boy born on Tuesday, followed by a boy born on Tuesday. So I'll just count 6 here. And so now it should be clear: 7, 14, 21, plus 6 is 27.
There's 27 on the right-hand side, and then what's the probability of 2 boys, given this event of at least 1 boy born on Tuesday? Well, 2 boys--that's here--so it's 13 out of the 27. So that's the result. Seems hard to argue with. Both the drawing it out with a pen and the computing worked out to the same answer. Now why is it that we have a strong intuition that knowing the boy was born on Tuesday shouldn't make any difference? I think the answer is because we're associating that fact with an individual boy. We're, like, taking that fact and nailing it on to him--and it's true: if we did that, it wouldn't make any difference; the computation wouldn't change. But, in this situation, that's not what we're doing. We're not saying anything about any individual boy. Rather, we're making this assertion that at least one was born on Tuesday--not about boys, but about pairs. And we just don't have very good intuitions about what it means to say something about a pair of people, rather than about an individual person, and that's what we did here--and that's why the answer comes out to 13/27.
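The Tuesday and December computations can be reproduced with a short enumeration along these lines. As before, this is a sketch: two_kids_bday, boy_tuesday, and two_boys are named in the transcript, but the exact day letters, the month alphabet 'abcdefghijkl' (with 'l' standing for December), and the names product, condP, two_kids_bmonth, and boy_december are assumptions made here for illustration.

```python
import itertools
from fractions import Fraction

sex = 'BG'
day = 'SMTWtFs'          # 7 distinct letters; capitalization keeps them apart
month = 'abcdefghijkl'   # hypothetical encoding: 12 distinct letters, 'l' is December

def product(*variables):
    "The cartesian product (as strings) of the possibilities for each variable."
    return [''.join(point) for point in itertools.product(*variables)]

def two_boys(s):
    return s.count('B') == 2

def condP(predicate, event):
    "P(predicate(s) | s in event) for a list of equally probable sample points."
    pred = [s for s in event if predicate(s)]
    return Fraction(len(pred), len(event))

two_kids_bday = product(sex, day, sex, day)               # 2 x 7 x 2 x 7 points
boy_tuesday = [s for s in two_kids_bday if 'BT' in s]     # at least 1 boy born on Tuesday

two_kids_bmonth = product(sex, month, sex, month)
boy_december = [s for s in two_kids_bmonth if 'Bl' in s]  # at least 1 boy born in December

print(len(boy_tuesday), condP(two_boys, boy_tuesday))     # 27 13/27
print(condP(two_boys, boy_december))                      # 23/47
```

The 27 points match the hand count: 7 boy-Tuesday-then-girl, 7 girl-then-boy-Tuesday, and 7 + 6 = 13 with two boys once the double-counted case is removed.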
1.43 43. 29 l Summary

So let's summarize what we did in this Unit. We learned that probability is a powerful tool for tackling problems with uncertainty. We learned that we can do Search with uncertainty, like we did Search in the previous Unit over exact, certain domains. Here, we can handle uncertainty in our Search. We learned that the notion of Utility gives us a powerful and beautiful general approach to solving Search problems. It gives us the best-action function with which we can solve any problem that can be specified in the form that best-action expects, and that's a wide variety of problems. Now, some of them are so complex that they can't be computed in a feasible amount of time. And there are more advanced techniques for dealing with approximations to that. But it's incredibly powerful because it separates out the How versus the What. You only have to tell the computer what the situation is. You don't have to tell it how to find the best answer, and it automatically finds the best answer. And we learned you can deal with probability through simulation, making repeated random choices, and just counting up how many times one answer occurs versus another. And we learned that if the total number of possibilities is small, you can just enumerate them. You can count them all, and you can get an exact answer, as an exact fraction rather than an approximation. And we learned some general strategies that don't have to do with probability. When we were trying to figure out how to add printing to our game, we looked at the notion of a wrapper function. That is, how we inject functionality into an existing function, by sneaking it in on top of one of the arguments. And this is an example of aspect-oriented programming, where we take the aspect of printing out what's happening and keep that separate from the main logic of the program. We learned that you can do exploratory data analysis.
When I was looking at the two strategies for playing PIG and where they differed, that was a completely different question than what I'd designed the PIG program for. Because I had put together the right pieces, it was easy to do the exploration and come to an understanding. And we learned--or, at least, I learned because I was the one who made the mistake--that errors can pop up, particularly in the types of arguments that functions expect and the results they return--and that you have to be careful, in Python, to deal with that because Python doesn't give you the seatbelts that other languages have, to protect yourself from those types of errors. So you have to be vigilant, on your own. And finally, that was a lot to cram into one Unit. So if you followed along all of that--congratulations, for the work you've done. You've learned a lot. Have fun with the homework; we'll see you in the next Unit.