
Rational decisions (Chapter 16)

Rational preferences

Idea: preferences of a rational agent must obey constraints.

Rational preferences ⇒
  behavior describable as maximization of expected utility

Constraints:
  Orderability
    (A ≻ B) ∨ (B ≻ A) ∨ (A ∼ B)
  Transitivity
    (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
  Continuity
    A ≻ B ≻ C ⇒ ∃ p  [p, A; 1 − p, C] ∼ B
  Substitutability
    A ∼ B ⇒ [p, A; 1 − p, C] ∼ [p, B; 1 − p, C]
  Monotonicity
    A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1 − p, B] ≿ [q, A; 1 − q, B])
Outline

♦ Rational preferences
♦ Utilities
♦ Money
♦ Multiattribute utilities
♦ Decision networks
♦ Value of information

Rational preferences contd.

Violating the constraints leads to self-evident irrationality

For example: an agent with intransitive preferences can be induced to give
away all its money:
  If B ≻ C, then an agent who has C would pay (say) 1 cent to get B
  If A ≻ B, then an agent who has B would pay (say) 1 cent to get A
  If C ≻ A, then an agent who has A would pay (say) 1 cent to get C
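The money-pump argument above can be simulated directly. This is an illustrative sketch (not from the slides): the agent holds the cyclic, intransitive preferences A ≻ B, B ≻ C, C ≻ A, and trades up one cent at a time forever.

```python
# Money-pump simulation: an agent with cyclic (intransitive) preferences
# A > B, B > C, C > A can be drained of cash one cent at a time.
# prefers[x] is the prize the agent strictly prefers to x.
prefers = {"C": "B", "B": "A", "A": "C"}

def money_pump(start_prize, cents, rounds):
    prize = start_prize
    for _ in range(rounds):
        prize = prefers[prize]   # trade up to the preferred prize...
        cents -= 1               # ...paying 1 cent for each trade
    return prize, cents

prize, cents = money_pump("C", cents=100, rounds=100)
print(prize, cents)  # after 100 trades the agent is 100 cents poorer
```

After any multiple of three trades the agent holds its original prize, having paid three cents for the round trip.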
Preferences

An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations
with uncertain prizes

Lottery L = [p, A; (1 − p), B]
  (prize A with probability p, prize B with probability 1 − p)

Notation:
  A ≻ B    A preferred to B
  A ∼ B    indifference between A and B
  A ≿ B    B not preferred to A

Maximizing expected utility

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944):
Given preferences satisfying the constraints
there exists a real-valued function U such that
  U(A) ≥ U(B) ⇔ A ≿ B
  U([p1, S1; . . . ; pn, Sn]) = Σi pi U(Si)

MEU principle:
  Choose the action that maximizes expected utility

Note: an agent can be entirely rational (consistent with MEU)
without ever representing or manipulating utilities and probabilities
E.g., a lookup table for perfect tic-tac-toe
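The lottery-utility equation U([p1, S1; . . . ; pn, Sn]) = Σi pi U(Si) is a probability-weighted sum, which a few lines of code make concrete. The outcome utilities here are made-up illustrative values.

```python
# Expected utility of a lottery [p1, S1; ...; pn, Sn] as the
# probability-weighted sum of outcome utilities, U(L) = sum_i p_i U(S_i).

def expected_utility(lottery, U):
    """lottery: list of (probability, outcome) pairs; U: utility per outcome."""
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9  # probabilities sum to 1
    return sum(p * U[s] for p, s in lottery)

U = {"A": 10.0, "B": 4.0}          # illustrative outcome utilities
L = [(0.3, "A"), (0.7, "B")]       # lottery [0.3, A; 0.7, B]
print(expected_utility(L, U))      # 0.3*10 + 0.7*4 = 5.8
```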
Utilities

Utilities map states to real numbers. Which numbers?

Standard approach to assessment of human utilities:
  compare a given state A to a standard lottery Lp that has
    “best possible prize” u⊤ with probability p
    “worst possible catastrophe” u⊥ with probability (1 − p)
  adjust lottery probability p until A ∼ Lp

E.g., pay $30 ∼ L = [0.999999, continue as before; 0.000001, instant death]

Student group utility

For each x, adjust p until half the class votes for the lottery (M = 10,000)
[Figure: elicited utility curve, with probability p (0.0 to 1.0) on the
vertical axis and prize $x (0 to 10,000) on the horizontal axis]
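The "adjust p until A ∼ Lp" procedure can be sketched as a binary search. The subject's hidden utility function below is an assumption purely for the demo; with normalized utilities (u⊤ = 1, u⊥ = 0) the lottery's expected utility is just p, so the search converges to p = U(x).

```python
# Standard-lottery utility assessment, sketched: binary-search the
# probability p until a simulated subject is indifferent between prize x
# and the lottery [p, best prize; 1-p, worst catastrophe].

def hidden_utility(x, M=10_000):
    # assumed risk-averse subject: a concave utility, illustrative only
    return (x / M) ** 0.5

def assess_p(x, iters=50):
    """Find p with U(x) = p*1.0 + (1-p)*0.0, i.e., p = U(x)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        p = (lo + hi) / 2
        if p < hidden_utility(x):   # subject still prefers the sure prize
            lo = p
        else:                       # subject prefers (or ties with) the lottery
            hi = p
    return (lo + hi) / 2

print(round(assess_p(2500), 3))  # sqrt(2500/10000) = 0.5
```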
Utility scales

Normalized utilities: u⊤ = 1.0, u⊥ = 0.0

Micromorts: one-millionth chance of death
  useful for Russian roulette, paying to reduce product risks, etc.

QALYs: quality-adjusted life years
  useful for medical decisions involving substantial risk

Note: behavior is invariant w.r.t. positive linear transformation
  U′(x) = k1 U(x) + k2   where k1 > 0

With deterministic prizes only (no lottery choices), only
ordinal utility can be determined, i.e., total order on prizes

Decision networks

Add action nodes and utility nodes to belief networks
to enable rational decision making

[Figure: airport-siting decision network with decision node Airport Site;
chance nodes Air Traffic, Litigation, Construction, Deaths, Noise, Cost;
and utility node U]

Algorithm:
  For each value of action node
    compute expected value of utility node given action, evidence
  Return MEU action
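The evaluation algorithm above, enumerating actions and returning the MEU one, can be sketched over a tiny made-up decision problem; the probabilities and utilities are illustrative assumptions, not values from the airport example.

```python
# MEU action selection: for each setting of the action node, compute the
# expected value of the utility node, then return the best action.

# P(outcome | action) for a toy siting decision (illustrative numbers)
P = {
    "site_A": {"low_noise": 0.7, "high_noise": 0.3},
    "site_B": {"low_noise": 0.4, "high_noise": 0.6},
}
U = {"low_noise": 10.0, "high_noise": 2.0}   # utility of each outcome

def meu_action(P, U):
    def eu(action):  # expected utility of one action
        return sum(p * U[o] for o, p in P[action].items())
    return max(P, key=eu)

print(meu_action(P, U))  # site_A (EU 7.6 beats EU 5.2)
```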
Money

Money does not behave as a utility function

Given a lottery L with expected monetary value EMV(L),
usually U(L) < U(EMV(L)), i.e., people are risk-averse

Utility curve: for what probability p am I indifferent between a prize x and
a lottery [p, $M; (1 − p), $0] for large M?

Typical empirical data, extrapolated with risk-prone behavior:
[Figure: empirical utility curve for money, concave for gains, plotted
over roughly −$150,000 to +$800,000]

Multiattribute utility

How can we handle utility functions of many variables X1 . . . Xn?
E.g., what is U(Deaths, Noise, Cost)?

How can complex utility functions be assessed from
preference behaviour?

Idea 1: identify conditions under which decisions can be made without
complete identification of U(x1, . . . , xn)

Idea 2: identify various types of independence in preferences
and derive consequent canonical forms for U(x1, . . . , xn)
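Risk aversion follows from any concave utility of money: the expected utility of a 50/50 lottery is below the utility of its expected monetary value. The log1p curve below is just one illustrative concave choice, not a claim about real preferences.

```python
# Risk aversion demo: for the lottery [0.5, $M; 0.5, $0] and a strictly
# concave U, U(lottery) < U(EMV(lottery)).

import math

def U(x):
    return math.log1p(x)          # an assumed concave utility of money

M = 1_000_000
lottery_eu = 0.5 * U(M) + 0.5 * U(0)   # expected utility of the lottery
emv_u = U(0.5 * M + 0.5 * 0)           # utility of the sure EMV ($500,000)

print(lottery_eu < emv_u)  # True: the agent prefers the sure amount
```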
Strict dominance

Typically define attributes such that U is monotonic in each

Strict dominance: choice B strictly dominates choice A iff
  ∀ i  Xi(B) ≥ Xi(A)   (and hence U(B) ≥ U(A))

[Figure: attribute space X1 versus X2; with deterministic attributes, the
region above and to the right of A dominates A, so B dominates A while C
does not; with uncertain attributes, outcome regions for choices A, B, C, D
may overlap]

Strict dominance seldom holds in practice

Label the arcs + or –

[Figure: car-insurance belief network with nodes Age, SocioEcon,
GoodStudent, ExtraCar, Mileage, RiskAversion, VehicleYear, SeniorTrain,
DrivingSkill, MakeModel, DrivingHist, Antilock, DrivQuality, Airbag,
CarValue, HomeBase, AntiTheft, Ruggedness, Accident, Theft, OwnDamage,
Cushioning, OwnCost, OtherCost, MedicalCost, LiabilityCost, PropertyCost;
this exercise is repeated over several slides, each labeling more arcs
+ or − to mark positive or negative qualitative influences]
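The dominance test ∀i Xi(B) ≥ Xi(A) is a direct componentwise comparison; the attribute vectors below are illustrative, with every attribute oriented so that larger is better.

```python
# Dominance check over attribute vectors: B dominates A iff B is at
# least as good on every attribute.

def dominates(B, A):
    """Attributes are scaled so that larger is better on each axis."""
    return all(b >= a for b, a in zip(B, A))

# illustrative choices: (negative cost, safety, quietness)
A = (-5.0, 0.6, 0.3)
B = (-4.0, 0.7, 0.3)
C = (-6.0, 0.9, 0.5)
print(dominates(B, A))  # True: B at least matches A on every attribute
print(dominates(C, A))  # False: C is worse on (negative) cost
```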
Stochastic dominance

[Figure: distributions of negative cost for sites S1 and S2, shown both as
densities and as cumulative distributions; S1 dominates S2]

Distribution p1 stochastically dominates distribution p2 iff
  ∀ t  ∫_{−∞}^{t} p1(x) dx ≤ ∫_{−∞}^{t} p2(x) dx

If U is monotonic in x, then A1 with outcome distribution p1
stochastically dominates A2 with outcome distribution p2:
  ∫_{−∞}^{∞} p1(x) U(x) dx ≥ ∫_{−∞}^{∞} p2(x) U(x) dx

Multiattribute case: stochastic dominance on all attributes ⇒ optimal
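For discrete distributions the dominance condition compares cumulative sums instead of integrals: p1 dominates p2 iff the CDF of p1 is everywhere below or equal to the CDF of p2. The two distributions below are illustrative.

```python
# First-order stochastic dominance for discrete distributions over the
# same increasing support: compare cumulative distribution functions.

from itertools import accumulate

def stochastically_dominates(p1, p2):
    """p1, p2: probability lists over a shared support, worst to best."""
    F1 = list(accumulate(p1))
    F2 = list(accumulate(p2))
    return all(f1 <= f2 + 1e-12 for f1, f2 in zip(F1, F2))

# negative construction cost for sites S1 (closer) and S2 (farther)
p1 = [0.1, 0.2, 0.3, 0.4]   # S1: mass shifted toward better outcomes
p2 = [0.3, 0.3, 0.2, 0.2]   # S2
print(stochastically_dominates(p1, p2))  # True
print(stochastically_dominates(p2, p1))  # False
```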
Stochastic dominance contd.

Stochastic dominance can often be determined without
exact distributions using qualitative reasoning

E.g., construction cost increases with distance from city:
  S1 is closer to the city than S2
  ⇒ S1 stochastically dominates S2 on cost
E.g., injury increases with collision speed

Can annotate belief networks with stochastic dominance information:
  X →+ Y (X positively influences Y) means that
  for every value z of Y’s other parents Z
    ∀ x1, x2  x1 ≥ x2 ⇒ P(Y |x1, z) stochastically dominates P(Y |x2, z)
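The definition of a "+" influence can be checked mechanically against a conditional probability table: for every value of the other parents, the distribution given the higher x must stochastically dominate the one given the lower x. The CPT below is an illustrative toy, not from the slides.

```python
# Checking a qualitative "+" influence X -> Y against a toy CPT.

from itertools import accumulate

def dominates(p_hi, p_lo):
    """First-order stochastic dominance via CDFs (support worst to best)."""
    F_hi = list(accumulate(p_hi))
    F_lo = list(accumulate(p_lo))
    return all(a <= b + 1e-12 for a, b in zip(F_hi, F_lo))

# P(Y | x, z) with Y's values ordered worst -> best (illustrative numbers)
cpt = {
    ("x_lo", "z0"): [0.6, 0.3, 0.1],
    ("x_hi", "z0"): [0.2, 0.3, 0.5],
    ("x_lo", "z1"): [0.5, 0.4, 0.1],
    ("x_hi", "z1"): [0.3, 0.4, 0.3],
}

# X positively influences Y iff dominance holds for every value z
positive = all(dominates(cpt[("x_hi", z)], cpt[("x_lo", z)]) for z in ("z0", "z1"))
print(positive)  # True
```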
Preference structure: Deterministic

X1 and X2 preferentially independent of X3 iff
preference between ⟨x1, x2, x3⟩ and ⟨x1′, x2′, x3⟩
does not depend on x3

E.g., ⟨Noise, Cost, Safety⟩:
  ⟨20,000 suffer, $4.6 billion, 0.06 deaths/mpm⟩ vs.
  ⟨70,000 suffer, $4.2 billion, 0.06 deaths/mpm⟩

Theorem (Leontief, 1947): if every pair of attributes is P.I. of its
complement, then every subset of attributes is P.I. of its complement:
mutual P.I.

Theorem (Debreu, 1960): mutual P.I. ⇒ ∃ additive value function:
  V(S) = Σi Vi(Xi(S))

Hence assess n single-attribute functions; often a good approximation
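Debreu's additive form V(S) = Σi Vi(Xi(S)) can be sketched for the airport attributes. The single-attribute value functions below are illustrative assumptions chosen only to make the arithmetic concrete.

```python
# Additive value function under mutual preferential independence:
# V(S) = sum over attributes of a single-attribute value function.

V_i = {
    "noise":  lambda people: -people / 10_000.0,   # assumed scaling
    "cost":   lambda billions: -billions,          # assumed scaling
    "deaths": lambda rate: -100.0 * rate,          # assumed scaling
}

def V(site):
    return sum(V_i[attr](x) for attr, x in site.items())

site_1 = {"noise": 20_000, "cost": 4.6, "deaths": 0.06}
site_2 = {"noise": 70_000, "cost": 4.2, "deaths": 0.06}
print(V(site_1), V(site_2))  # -12.6 vs -17.2: site_1 preferred here
```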
Preference structure: Stochastic

Need to consider preferences over lotteries:
  X is utility-independent of Y iff
  preferences over lotteries in X do not depend on y

Mutual U.I.: each subset is U.I. of its complement
⇒ ∃ multiplicative utility function:
  U = k1U1 + k2U2 + k3U3
      + k1k2U1U2 + k2k3U2U3 + k3k1U3U1
      + k1k2k3U1U2U3

Routine procedures and software packages for generating preference tests to
identify various canonical families of utility functions
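The three-attribute multiplicative form is a straightforward polynomial in the single-attribute utilities; the scaling constants k1, k2, k3 below are made-up illustrative values.

```python
# Three-attribute multiplicative utility function
# U = k1 U1 + k2 U2 + k3 U3
#     + k1 k2 U1 U2 + k2 k3 U2 U3 + k3 k1 U3 U1
#     + k1 k2 k3 U1 U2 U3

def multiplicative_U(u, k):
    u1, u2, u3 = u   # single-attribute utilities, each in [0, 1]
    k1, k2, k3 = k   # scaling constants elicited from the decision maker
    return (k1*u1 + k2*u2 + k3*u3
            + k1*k2*u1*u2 + k2*k3*u2*u3 + k3*k1*u3*u1
            + k1*k2*k3*u1*u2*u3)

print(multiplicative_U((1.0, 1.0, 1.0), (0.4, 0.3, 0.2)))  # 1.184
```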
Value of information

Idea: compute value of acquiring each possible piece of evidence

Can be done directly from decision network

Example: buying oil drilling rights
  Two blocks A and B, exactly one has oil, worth k
  Prior probabilities 0.5 each, mutually exclusive
  Current price of each block is k/2
  “Consultant” offers accurate survey of A. Fair price?

Solution: compute expected value of information
  = expected value of best action given the information
    minus expected value of best action without information

Survey may say “oil in A” or “no oil in A”, prob. 0.5 each (given!)
  = [0.5 × value of “buy A” given “oil in A”
     + 0.5 × value of “buy B” given “no oil in A”] − 0
  = (0.5 × k/2) + (0.5 × k/2) − 0 = k/2
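The oil-drilling calculation above can be checked in a few lines: with no information, buying either block has expected net value 0, while after a perfect survey of A the best purchase always nets k − k/2.

```python
# Value of the consultant's survey in the oil-drilling example.

def vpi_oil(k):
    price = k / 2
    # no information: either purchase nets 0.5*k - price = 0 in expectation
    baseline = 0.5 * k - price
    # perfect survey outcomes, each with probability 0.5
    value_if_oil_in_A = k - price      # buy A, which has the oil
    value_if_no_oil_in_A = k - price   # buy B, which must have the oil
    return 0.5 * value_if_oil_in_A + 0.5 * value_if_no_oil_in_A - baseline

print(vpi_oil(1_000_000))  # k/2 = 500000.0: the fair price of the survey
```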
General formula

Current evidence E, current best action α
Possible action outcomes Si, potential new evidence Ej

  EU(α|E) = max_a Σi U(Si) P(Si|E, a)

Suppose we knew Ej = ejk; then we would choose αejk s.t.

  EU(αejk|E, Ej = ejk) = max_a Σi U(Si) P(Si|E, a, Ej = ejk)

Ej is a random variable whose value is currently unknown
⇒ must compute expected gain over all possible values:

  VPIE(Ej) = (Σk P(Ej = ejk|E) EU(αejk|E, Ej = ejk)) − EU(α|E)

(VPI = value of perfect information)
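The VPI formula can be implemented generically: sum over possible observations the probability-weighted value of the best action given each observation, then subtract the value of the current best action. The toy numbers below mirror the oil example with k = 100 and are illustrative.

```python
# Generic VPI computation following the slide's formula.

def best_eu(eus_by_action):
    """Expected utility of the best action, max over actions."""
    return max(eus_by_action.values())

def vpi(p_ej, eu_given_ej, eu_now):
    """p_ej[e]: P(Ej = e | E);
    eu_given_ej[e]: {action: EU given E and Ej = e};
    eu_now: {action: EU given E}."""
    gain = sum(p * best_eu(eu_given_ej[e]) for e, p in p_ej.items())
    return gain - best_eu(eu_now)

p_ej = {"oil_in_A": 0.5, "no_oil_in_A": 0.5}
eu_given_ej = {
    "oil_in_A":    {"buy_A": 50.0, "buy_B": -50.0},
    "no_oil_in_A": {"buy_A": -50.0, "buy_B": 50.0},
}
eu_now = {"buy_A": 0.0, "buy_B": 0.0}
print(vpi(p_ej, eu_given_ej, eu_now))  # 50.0, i.e., k/2 for k = 100
```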
Properties of VPI

Nonnegative (in expectation, not post hoc):
  ∀ j, E  VPIE(Ej) ≥ 0

Nonadditive (consider, e.g., obtaining Ej twice):
  VPIE(Ej, Ek) ≠ VPIE(Ej) + VPIE(Ek)

Order-independent:
  VPIE(Ej, Ek) = VPIE(Ej) + VPIE,Ej(Ek) = VPIE(Ek) + VPIE,Ek(Ej)

Note: when more than one piece of evidence can be gathered,
maximizing VPI for each to select one is not always optimal
⇒ evidence-gathering becomes a sequential decision problem
Qualitative behaviors

a) Choice is obvious, information worth little
b) Choice is nonobvious, information worth a lot
c) Choice is nonobvious, information worth little

[Figure: three sketches of P(U | Ej) for two actions with utilities U1 and
U2, illustrating cases (a), (b), and (c)]
