to cause constant, beaming grins. Another example: suppose we give the A.I. the goal of
solving a difficult mathematical problem. When the A.I. becomes superintelligent, it
realizes that the most effective way to get the solution to this problem is by
transforming the planet into a giant computer, so as to increase its thinking capacity.
And notice that this gives the A.I. an instrumental reason to do things to us that we
might not approve of. Human beings in this model are threats; we could prevent the
mathematical problem from being solved.
Of course, conceivably things won't go wrong in these particular ways; these are
cartoon examples. But the general point here is important: if you create a really
powerful optimization process to maximize for objective x, you had better make sure
that your definition of x incorporates everything you care about. This is a lesson that's
also taught in many a myth. King Midas wishes that everything he touches be turned
into gold. He touches his daughter, she turns into gold. He touches his food, it turns
into gold. This could become practically relevant, not just as a metaphor for greed,
but as an illustration of what happens if you create a powerful optimization process
and give it misconceived or poorly specified goals.
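
To make that point concrete, here is a minimal toy sketch (not from the talk; all action names and scores are hypothetical) of how an optimizer that maximizes a proxy objective x can score perfectly on x while destroying what we actually care about:

```python
# Toy illustration of objective misspecification. All names and numbers
# are made up; the point is only that the optimizer maximizes x as
# defined, not x as intended.

actions = {
    # action: (proxy score: "smiles produced", true score: well-being)
    "tell_jokes":       (5, 5),
    "improve_medicine": (3, 9),
    "paralyze_faces":   (10, -10),  # degenerate solution the proxy rewards most
}

def proxy_score(action):
    return actions[action][0]

def true_score(action):
    return actions[action][1]

best = max(actions, key=proxy_score)
print(best, proxy_score(best), true_score(best))
# -> paralyze_faces 10 -10: the proxy is maximized, the value destroyed
```

The fix King Midas needed is not a better optimizer but a better specification of x.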
Now you might say, if a computer starts sticking electrodes into people's faces, we'd
just shut it off. A, this is not necessarily so easy to do if we've grown dependent on
the system -- like, where is the off switch to the Internet? B, why haven't the
chimpanzees flicked the off switch to humanity, or the Neanderthals? They certainly
had reasons. We have an off switch, for example, right here. (Choking) The reason is
that we are an intelligent adversary; we can anticipate threats and plan around
them. But so could a superintelligent agent, and it would be much better at that than
we are. The point is, we should not be confident that we have this under control
here.
And we could try to make our job a little bit easier by, say, putting the A.I. in a box,
like a secure software environment, a virtual reality simulation from which it cannot
escape. But how confident can we be that the A.I. couldn't find a bug? Given that
merely human hackers find bugs all the time, I'd say, probably not very confident. So
we disconnect the ethernet cable to create an air gap, but again, merely human
hackers routinely transgress air gaps using social engineering. Right now, as I speak,
I'm sure there is some employee out there somewhere who has been talked into
handing out her account details by somebody claiming to be from the I.T.
department.
More creative scenarios are also possible, like if you're the A.I., you can imagine
wiggling electrodes around in your internal circuitry to create radio waves that you
can use to communicate. Or maybe you could pretend to malfunction, and then
when the programmers open you up to see what went wrong with you, they look at
the source code -- Bam! -- the manipulation can take place. Or it could output the
blueprint to a really nifty technology, and when we implement it, it has some
surreptitious side effect that the A.I. had planned. The point here is that we should
not be confident in our ability to keep a superintelligent genie locked up in its bottle
forever. Sooner or later, it will out.
I believe that the answer here is to figure out how to create superintelligent A.I. such
that even if -- when -- it escapes, it is still safe because it is fundamentally on our
side because it shares our values. I see no way around this difficult problem.
Now, I'm actually fairly optimistic that this problem can be solved. We wouldn't have
to write down a long list of everything we care about, or, worse yet, spell it out in
some computer language like C++ or Python; that would be a task beyond hopeless.
Instead, we would create an A.I. that uses its intelligence to learn what we value,
and its motivation system is constructed in such a way that it is motivated to pursue
our values or to perform actions that it predicts we would approve of. We would thus
leverage its intelligence as much as possible to solve the problem of value-loading.
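
As a rough illustration of that value-loading idea (a minimal sketch under strong assumptions, not an actual proposal from the talk: real value learning would not reduce to two candidate hypotheses and a handful of actions), the agent can hold uncertainty over candidate value functions and update it from human approval:

```python
# Sketch of value learning via approval feedback. Everything here is
# hypothetical and drastically simplified; it only shows the shape of
# the approach: the agent is motivated by its current best estimate of
# our values, and that estimate is revised by what we approve of.

candidate_values = {
    "maximize_smiles":    lambda a: 1.0 if a == "paralyze_faces" else 0.2,
    "maximize_wellbeing": lambda a: 1.0 if a == "improve_medicine" else 0.1,
}
belief = {name: 0.5 for name in candidate_values}  # uniform prior

def update(belief, action, human_approved):
    # Bayesian-style update: hypotheses that predicted the human's
    # reaction gain weight, the others lose it.
    for name, value in candidate_values.items():
        p_approve = value(action)
        belief[name] *= p_approve if human_approved else 1 - p_approve
    total = sum(belief.values())
    return {name: w / total for name, w in belief.items()}

def choose(belief, options):
    # Act to maximize expected value under the current uncertainty.
    return max(options, key=lambda a: sum(
        w * candidate_values[name](a) for name, w in belief.items()))

# The human disapproves of the degenerate action, so belief shifts
# toward the value function we actually intended.
belief = update(belief, "paralyze_faces", human_approved=False)
print(choose(belief, ["paralyze_faces", "improve_medicine"]))
# -> improve_medicine
```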
This can happen, and the outcome could be very good for humanity. But it doesn't
happen automatically. The initial conditions for the intelligence explosion might need
to be set up in just the right way if we are to have a controlled detonation. The
values that the A.I. has need to match ours, not just in familiar contexts, where
we can easily check how the A.I. behaves, but also in all the novel contexts that
the A.I. might encounter in the indefinite future.
And there are also some esoteric issues that would need to be sorted out:
the exact details of its decision theory, how to deal with logical uncertainty, and so
forth. So the technical problems that need to be solved to make this work look quite
difficult -- not as difficult as making a superintelligent A.I., but fairly difficult. Here is
the worry: Making superintelligent A.I. is a really hard challenge. Making
superintelligent A.I. that is safe involves some additional challenge on top of that.
The risk is that somebody figures out how to crack the first challenge without also
having cracked the additional challenge of ensuring perfect safety.
So I think that we should work out a solution to the control problem in advance, so
that we have it available by the time it is needed. Now it might be that we cannot
solve the entire control problem in advance because maybe some elements can only
be put in place once you know the details of the architecture where it will be
implemented. But the more of the control problem that we solve in advance, the
better the odds that the transition to the machine intelligence era will go well.
This, to me, looks like a thing that is well worth doing, and I can imagine that if
things turn out okay, people a million years from now will look back at this century
and say that the one thing we did that really mattered was to get this thing right.
Thank you.
(Applause)