
ASSIGNMENT UNIT 7

1 Directed Questions
The STRIPS representation for an action consists of what?
Answer: Preconditions - a set of assignments of values to variables that must be true for
the action to occur. Effects - a set of resulting assignments of values to those variables
that change as the result of the action.
What is the STRIPS assumption?
Answer: All of the variables not mentioned in the description of an action stay unchanged
when the action is carried out.
What is the frame problem in planning? How does it relate to the STRIPS assumption?
Answer: The frame problem is the problem of representing all things that stay unchanged.
This is important because most actions affect only a small fraction of variables, e.g. filling
a cup with coffee changes the state of the cup and of the pot but not the location of the
robot, the layout of the building, etc. The STRIPS assumption addresses the frame problem
compactly: all variables not mentioned in the description of an action remain unchanged.
What are some key limitations of STRIPS?
Answer: States are represented simply as conjunctions of positive literals (e.g., Poor ∧
Unknown); goals are also conjunctions (no disjunction allowed); and there is no support for
equality.
2 STRIPS planning
Consider a scenario where you want to get from home (off campus) to UBC during a bus strike.
You can either drive (if you have a car) or bike (if you have a bike). How would you represent
this in STRIPS?
(a) What are the actions, preconditions and effects? What are the relevant variables?
Answer: The actions could be something like goByBike and goByCar. In a very simple
representation, there are variables loc, haveBike, and haveCar, indicating location,
whether or not you have a bike (t/f), and whether or not you have a car (t/f). The precon-
dition for goByBike is that haveBike = true, and likewise the precondition for goByCar is
that haveCar = true. The effect of each action is that loc = UBC. Figure 1 shows this
representation.
(b) If we select the action goByBike, what is the value of haveBike after the action has been
carried out?
Answer: It will equal true, as it had to be true for the action to take place, and since it is
not mentioned in the action effects its value will be unchanged.


Figure 1: Simple STRIPS commuting problem
(c) If we are at UBC and select the action goByCar, what will the value of loc be after
the action has been carried out?
Answer: After the action, loc = UBC, as this is a specified effect. Notice that there is no
loc precondition for either action, so whether you begin at UBC or at home, selecting either
action leaves you at UBC.
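A short Python sketch of this representation (action and variable names follow the answers above; the code layout itself is only an illustrative assumption) makes the STRIPS assumption concrete: applying an action overwrites only the variables named in its effects, which yields exactly the answers to (b) and (c).

# A minimal sketch of the STRIPS representation above.
# A state assigns values to variables; an action has preconditions and effects.
actions = {
    "goByBike": {"preconditions": {"haveBike": True}, "effects": {"loc": "UBC"}},
    "goByCar":  {"preconditions": {"haveCar": True},  "effects": {"loc": "UBC"}},
}

def is_applicable(state, action):
    # Preconditions: assignments that must hold for the action to occur.
    return all(state[var] == val for var, val in action["preconditions"].items())

def apply_action(state, action):
    # STRIPS assumption: variables not mentioned in the effects stay unchanged.
    next_state = dict(state)
    next_state.update(action["effects"])
    return next_state

state = {"loc": "home", "haveBike": True, "haveCar": False}
if is_applicable(state, actions["goByBike"]):
    state = apply_action(state, actions["goByBike"])
print(state)  # {'loc': 'UBC', 'haveBike': True, 'haveCar': False}

Note that haveBike remains true after the action, as in (b), and that the effect sets loc = UBC regardless of the starting location, as in (c).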
1 Directed Questions
What is meant by the horizon in a planning problem?
Answer: The number of time steps for which the problem is rolled out.
What are initial state constraints in a CSP problem?
Answer: They constrain the state variables at time 0, i.e. before any action has occurred.
What are goal constraints?
Answer: They constrain the state variables at some time k, where k is the horizon.
What are precondition constraints?
Answer: They are constraints between state variables at time t and actions at time t. In
other words, they specify what must hold for an action to take place.
What are effect constraints?
Answer: They are constraints between state variables at time t, actions at time t and
state variables at time t + 1. In other words, the state variable at time t + 1 is affected by
the actions at time t and its own previous value at time t.
2 CSP planning
There's a big football game tonight, and you can't miss it. You're trying to decide whether to
watch it in person or on TV. Watching it in person requires having some money for a ticket.
Watching it on TV is only possible if you have a TV and there isn't a local television blackout
on the game. If you need money for a ticket, you can always sell your TV.
Figure 1 shows a CSP representation for this planning problem, where the goal is to watch the
game.

Figure 1: CSP representation for viewing the game
What are the actions? Answer: watchAtPark, watchAtHome, sellTV
What are the state variables? Answer: haveMoney, haveTV, blackout, sawGame
What is the horizon shown in Figure 1? Answer: The horizon is 1.
Give the truth tables for the precondition constraint for action watchAtPark (labelled p1 s0
in the figure) and the effect constraint between blackout at step 0 and blackout at step 1
(labelled e3 s1).
Answer:
For p1 s0:
haveMoney s0   watchAtPark s0   p1 s0
true           true             true
true           false            true
false          true             false
false          false            true
For e3 s1:
blackout s0    blackout s1      e3 s1
true           true             true
true           false            false
false          true             false
false          false            true
What is the minimum horizon needed to achieve the goal, if the start constraints specify that
you have no money and that there is a TV blackout?
Answer: A horizon of 2. At step 1 you sell the TV and at step 2 you watch the game in
person.
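These two constraints can also be written as predicates over the time-indexed variables (a minimal Python sketch; names follow the figure labels), and enumerating their arguments reproduces the truth tables above.

# Precondition constraint p1_s0: watching at the park at step 0
# requires having money at step 0.
def p1_s0(haveMoney_s0, watchAtPark_s0):
    return haveMoney_s0 or not watchAtPark_s0

# Effect (frame) constraint e3_s1: no action changes a blackout,
# so blackout at step 1 must equal blackout at step 0.
def e3_s1(blackout_s0, blackout_s1):
    return blackout_s1 == blackout_s0

# Reproduce the truth tables above.
for a in (True, False):
    for b in (True, False):
        print(a, b, p1_s0(a, b), e3_s1(a, b))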
1 Directed Questions
What is meant by a one-off decision? How can this be applied in the delivery robot
example? Answer: The agent knows which actions are available, has preferences expressed
by utilities of outcomes, and makes all the decisions before any action is carried out. In
the delivery robot example, the decisions on wearing pads and taking the long or short
route are made before the robot goes anywhere. Multiple decisions can be considered as a
single macro decision.
Define utility in a decision problem. Answer: The utility is a measure of the desirability of
possible worlds to an agent, i.e. it indicates the agent's preferences. Let U be a real-valued
function such that U(w) represents an agent's degree of preference for world w. The value
of a utility is typically between 0 and 100.
How do we calculate the expected utility of a decision? Answer: The expected utility is
derived by summing over the possible worlds that select that decision, for each world w
multiplying U(w) by P(w).
How do we compute an optimal one-off decision? Answer: If we calculate the expected
utility for each decision as per the last question, we choose the decision that maximizes
the expected utility.
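In symbols (a restatement of the two answers above, where \Omega_d denotes the set of possible worlds that select decision d):

EU(d) = \sum_{w \in \Omega_d} P(w) \, U(w)

and the optimal one-off decision is d^* = \arg\max_d EU(d).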
What are the three types of nodes in a single-stage decision network? Answer: Decision
nodes, random variables (chance nodes), and utility nodes
What is a policy for a single-stage decision network? What is an optimal policy? Answer:
A policy for a single-stage decision network is an assignment of a value to each decision
variable. The optimal policy is the policy whose expected utility is maximal.
Describe the variable elimination steps for finding an optimal policy for a single-stage
decision network. Answer: Prune all the nodes that are not ancestors of the utility node.
Sum out all the chance nodes. There will be a single factor F remaining that represents
the expected utility for each combination of decision variables. If v is the maximum value
in F, return the assignment d that gives that maximum value v.
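The following Python sketch follows those steps in the general case (the function signature and the representation of the chance nodes are assumptions for illustration, not from the text): summing out the chance nodes leaves the expected utility of each decision assignment, and maximization reads off the optimal one.

from itertools import product

def optimal_one_off_decision(decision_vars, worlds, utility):
    # decision_vars: dict mapping each decision variable to its values.
    # worlds(d): for a decision assignment d, yields (probability, world)
    #            pairs, i.e. the chance nodes conditioned on d.
    # utility(d, w): utility of world w under decisions d.
    best = None
    for values in product(*decision_vars.values()):
        d = dict(zip(decision_vars, values))
        # Summing out the chance nodes gives the expected utility of d.
        eu = sum(p * utility(d, w) for p, w in worlds(d))
        if best is None or eu > best[1]:
            best = (d, eu)
    return best  # the optimal assignment and its expected utility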
2 A One-Off Decision
You are preparing to go for a bike ride and are trying to decide whether to use your thin road
tires or your thicker, knobbier tires. You know from previous experience that your road tires are
more likely to go flat during a ride. There's a 40% chance your road tires will go flat but only a
10% chance that the thicker tires will go flat.
Because of the risk of a flat, you also have to decide whether or not to bring your tools along on
the ride (a pump, tire levers and a puncture kit). These tools will weigh you down.
The advantage of the thin road tires is that you can ride much faster. The table below gives the
utilities for these variables:
bringTools flatTire bringRoadTires Satisfaction
T T T 50.0
T T F 40.0
T F T 75.0
T F F 65.0
F T T 0.0
F T F 0.0
F F T 100.0
F F F 75.0
Create the decision network representing this problem, using AISpace. Answer: An
example is given in the XML file bikeride_tires_flat_tools.xml.

Figure 1: A decision problem.
Use variable elimination to find the optimal policy.
What are the initial factors? Answer: There are two factors to begin with, one
representing P(flatTire | roadTires) and one representing the utilities.
Specify your elimination ordering and give each step of the VE algorithm. Answer:
We sum out our chance node flatTire first. This results in a new factor on the
decisions. We eliminate roadTires by maximizing that decision variable for each
value of bringTools. This leaves one factor on bringTools. We maximize bringTools
in that final factor to get our answer.
What is the optimal policy? What is the expected value of the optimal policy? Answer:
The optimal policy is to take the thicker tires and leave the tools at home. The expected
utility of this policy is 67.5.
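As a numerical check (a sketch using only the probabilities and utilities given above), summing out the chance node flatTire for each of the four decision combinations reproduces the stated policy and its value.

# P(flatTire = true | tire choice), from the problem statement.
p_flat = {"road": 0.4, "thick": 0.1}

# Satisfaction table: (bringTools, flatTire, bringRoadTires) -> utility.
utility = {
    (True, True, True): 50.0,    (True, True, False): 40.0,
    (True, False, True): 75.0,   (True, False, False): 65.0,
    (False, True, True): 0.0,    (False, True, False): 0.0,
    (False, False, True): 100.0, (False, False, False): 75.0,
}

for tools in (True, False):
    for road in (True, False):
        p = p_flat["road" if road else "thick"]
        # Sum out the chance node flatTire.
        eu = p * utility[(tools, True, road)] + (1 - p) * utility[(tools, False, road)]
        print(f"bringTools={tools}, roadTires={road}: EU = {eu}")
# Maximum: bringTools=False, roadTires=False, with EU = 0.9 * 75.0 = 67.5.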
Try changing the utilities and the probabilities in this problem, and identify which changes
result in a different optimal policy. Answer: There are many possibilities here, e.g.
changing the probability of a flat tire given the tire type, or decreasing the utilities for the
two possible worlds FFT (currently 100) and FFF (currently 75).




1 Directed Questions
How is a sequential decision problem different from a one-off decision problem? Answer:
In a one-off decision problem, even if there are multiple decisions to make, they can be
treated as a single macro decision. That macro decision is made before any action is
carried out. With a sequential decision problem, the agent makes observations, decides on
an action, carries out the action, makes more observations in the resulting world, then
makes more decisions conditioned on the new observations, and so on.
What types of variables are contained in a decision network? Answer: Chance nodes
(random variables), decision nodes, and a utility node.
What can arcs represent in a decision network? Relate this to the types of variables in the
previous question. Answer: Arcs coming into decision nodes represent the information
that will be available when the decision is made. Arcs coming into chance nodes represent
probabilistic dependence. Arcs coming into the utility node represent what the utility
depends on.
What is a no-forgetting decision network? Answer: It is a decision network where the
decision nodes are totally ordered and, if decision node D_i is before D_j in the total
ordering, then D_i is a parent of D_j, and any parent of D_i is also a parent of D_j. This
means that all the information available for the earlier decision is available for the later
decision, and the earlier decision is part of the information available for the later decision.
Define decision function and policy. Answer: A decision function for a decision variable
is a function that specifies a value for the decision variable for each assignment of values
to its parents. A policy consists of a decision function for each decision variable.
A possible world specifies a value for every random variable and decision variable. Given
a policy and a possible world, how do we know if the possible world satisfies the policy?
Answer: The possible world satisfies the policy if the value for each decision variable in
that possible world is the value selected in the decision function for that decision variable
in the policy.
To find an optimal policy, do we need to enumerate all of the policies? Why or why not?
Answer: No, we can use variable elimination instead.
2 Sequential Decisions and Variable Elimination
Miranda is an enthusiastic gamer, spending quite a bit of time playing Wii video games and a
fair amount of money buying them. She notices that her neighbourhood video store rents Wii
games for much less than the cost of buying one. She realizes that renting the games might be a
good way to test them out before she decides whether or not to buy them. Figure 1 represents
her decision problem.


Figure 1: A decision problem.
Based on prior experience, Miranda expects that about 80% of video games will be good quality
and the other 20% she won't care for. Based on her previous experiences renting video games,
she also knows the following information:
P(Outcome = likesGame|goodQuality = True) = 0.85
P(Outcome = likesGame|goodQuality = False) = 0.10
The rental period is so short that it's not always possible to get a reliable estimate of whether
the game is of good quality.
Below are the utilities for various outcomes of the decision process. You can think of the utilities
as representing a combination of gaming enjoyment and money saved (Satisfaction).
rentGame buyGame goodQuality Satisfaction
T T T 80.0
T T F -100.0
T F T 30.0
T F F -30.0
F T T 100.0
F T F -80.0
F F T 0.0
F F F 0.0
If we carry out the variable elimination algorithm, what are the initial factors? Answer:
There are 3 factors to begin with. f0(rentGame, buyGame, goodQuality) represents the
utilities. f1(rentGame, outcome, goodQuality) represents the probability of outcome given
goodQuality and rentGame. f2(goodQuality) represents the prior for goodQuality.
Which decision variable is eliminated first, and why?
Answer: The decision variable buyGame is eliminated first because it is the last decision
in the ordering.
How is that decision eliminated? Answer: It is eliminated by choosing the value that
maximizes the utility. For example, if rentGame=T and outcome=Like, then buying the
game results in a utility of 52.4 whereas not buying the game results in a utility of only
19.8. So we add a new decision function to our set of decision functions, specifying that
when the parents have those assigned values, the decision is to buy the game. This is done
for each combination of parent values.
After that decision is eliminated, which variable is eliminated next, and why? Answer: The
random variable outcome is eliminated next, because it is no longer a parent of any decision
variable (since we removed buyGame).
What is the optimal policy for this decision problem? Answer: The optimal decision for
rental is not to rent.
The optimal decision for buying is to buy the game in every case except where the game
was rented and she disliked it.
What is the expected utility of following the optimal policy? Answer: The expected
utility of following that optimal policy is 64.0.
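These numbers can be checked directly in Python (a sketch; it assumes, as the given probabilities suggest, that when the game is rented the outcome follows P(Outcome | goodQuality) above).

# Prior and observation model, from the problem statement.
p_good = 0.8
p_like_given = {True: 0.85, False: 0.10}  # P(Outcome = likesGame | goodQuality)

# Satisfaction table: (rentGame, buyGame, goodQuality) -> utility.
utility = {
    (True, True, True): 80.0,   (True, True, False): -100.0,
    (True, False, True): 30.0,  (True, False, False): -30.0,
    (False, True, True): 100.0, (False, True, False): -80.0,
    (False, False, True): 0.0,  (False, False, False): 0.0,
}

def factor(rent, like, buy):
    # Unnormalized expected utility with goodQuality summed out:
    # sum over g of P(g) * P(outcome | g) * U(rent, buy, g),
    # the factor used when eliminating buyGame.
    total = 0.0
    for g in (True, False):
        p_g = p_good if g else 1 - p_good
        p_o = p_like_given[g] if like else 1 - p_like_given[g]
        total += p_g * p_o * utility[(rent, buy, g)]
    return total

print(factor(True, True, True), factor(True, True, False))    # 52.4 vs 19.8: buy
print(factor(True, False, True), factor(True, False, False))  # -8.4 vs -1.8: don't buy
# Renting yields 52.4 + (-1.8) = 50.6 in total.
# Not renting: no outcome is observed, so compare buying vs not buying directly.
eu_buy = p_good * utility[(False, True, True)] + (1 - p_good) * utility[(False, True, False)]
print(eu_buy)  # 0.8 * 100 + 0.2 * (-80) = 64.0 > 50.6: don't rent, buy the game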
Use the AISpace decision applet to represent and solve this decision problem, and to check your
answers. The representation we have used is in file wii.xml.


Reference:
Poole, D. L., & Mackworth, A. K. (2010). Artificial intelligence: Foundations of
computational agents. Cambridge University Press. Available online at
http://artint.info/
