
BrandZ: A LINUX ENVIRONMENT IN A SOLARIS ZONE
Nils Nieuwejaar
Russ Blaine
Solaris Kernel Technology
Sun Microsystems
Agenda

• BrandZ Overview
• The 'lx' Brand
• OpenSolaris
• Demo
• Q&A
Server Virtualization Categories
• Hard Partitions: Dynamic System Domains
• Virtual Machines: Logical Domains, Xen
• OS Virtualization: Zones
• Resource Mgmt.: Solaris Resource Manager (SRM)
• The categories range from multiple OSes to a single OS
• Diagram labels: Trend to flexibility, Trend to isolation
• Example workloads shown: app, identity, database, file, web, mail,
calendar, and SunRay servers
Solaris Zones
• Basic concept: isolated execution environment
within a Solaris instance
• Virtualizes OS layer: file system, devices,
network, processes
• Provides:
> Privacy: can't see outside zone
> Security: can't affect activity outside zone
> Failure isolation: application failure in one zone
doesn't affect others
• Lightweight, granular, efficient
Zones Block Diagram
• global zone (serviceprovider.com)
> zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...)
> core services (LDAP, inetd, rpcbind, automountd, snmpd, dtlogin,
Sendmail 8.13.1, Sun SSH)
> remote administration (SNMP, SunMC, WBEM, ...)
> platform administration (syseventd, devfsadm, ...)
• twilight zone (twilight.com), zone root: /zone/twilight
> web services (WS 6.1, J2SE 5.0)
> enterprise services (Oracle 10g, AS 8.1EE)
> core services (NIS, inetd, automountd)
• drop zone (drop.net), zone root: /aux0/drop
> login services (OpenSSH 3.4)
> network services (BIND 8.3, Sendmail 8.13.1)
> core services (NIS+, inetd, rpcbind)
• fracture zone (fracture.org), zone root: /export/fracture
> web services (Apache 2.0.52, J2SE 1.4)
> network services (BIND 9.2.4, Postfix 2.1)
> core services (DNS, inetd, automountd)
• Each zone: an Application Environment on top of a Virtual Platform
> Virtual Platform: zcons, logical network interfaces (ce0:1, ce0:2,
ge0:1, ge0:2), /usr, /opt, and a zoneadmd per zone
• Hardware: storage complex, network devices (ce0) and (ge0)
One Step Back, Two Steps Forward
• A zone is:
> A collection of processes with limited privileges
> A limited device tree
> An alternate root directory
• Why should a zone have to look like the system
that hosts it?
BrandZ: Branded Zones
• Simple extension of the zones model
• Supports zones that don't resemble the global
zone
> Only supports user-space environments
> If you need a different kernel, see Xen
• Each distinct zone type is called a Brand
• The Brand defines the content, structure, and
behavior of the zone
BrandZ Uses
• Available today:
> Linux zones on Solaris
> x86 only
> Possible OpenSolaris project: add support for SPARC
Linux
• Other possibilities
> Alternate Solaris zones
> Nexenta/ShilliX/BeleniX
> Replace Solaris tools in /usr/bin with GNU equivalents
> BrandZ + QEMU = SPARC zones on x86
> A MacOS X zone (you've gotta have dreams).
(Branded) Zones Block Diagram
• global zone (My desktop)
> zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...)
> core services (LDAP, inetd, rpcbind, automountd, snmpd, dtlogin,
Sendmail 8.13.1, Sun SSH)
> remote administration (SNMP, SunMC, WBEM, ...)
> platform administration (syseventd, devfsadm, ...)
• linux zone (brandz.east), zone root: /zone/brandz
> End-user apps (OpenSSH 3.6, acroread, MATLAB, yum, pandora)
> Linux core services (NIS, xinetd, autofs)
• twiki zone (muskoka.east), zone root: /zone/twiki
> web services (Apache 2.0.52, TWiki)
> network services (BIND 9.2.41)
> core services (NIS, inetd, automountd)
• Mac OSX zone, zone root: /zone/mac
> Quicktime, iTunes
> Finder, Spotlight
> core services (NIS, quartz-wm)
• Each zone: an Application Environment on top of a Virtual Platform
> Virtual Platform: zcons, logical network interfaces (bge0:1, bge0:2),
/usr, /opt, and a zoneadmd per zone
• Hardware: local hard drives, network device (bge0)
BrandZ Interface Changes
global# zonecfg -z newzone
newzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:newzone> create -t SUNWlx               <- For a Linux zone
zonecfg:newzone> set zonepath=/zones/newzone
zonecfg:newzone> commit
zonecfg:newzone> info
zonename: newzone
zonepath: /zones/newzone
brand: lx                                       <- New Zone Attribute
autoboot: false
bootargs:
pool:
limitpriv:
zonecfg:newzone> exit
global# zoneadm list -icv
ID  NAME     STATUS      PATH            BRAND  <- New Column
0   global   running     /               native
-   newzone  configured  /zones/newzone  lx
Brand Components
• A Brand is composed of the following items:
> Required:
> An XML brand configuration file and an XML brand
platform definition file.
> Scripts used to install a branded zone.
> Optional:
> Kernel brand support module.
> Scripts that can be invoked at zone boot and shutdown.
> Linker libraries that can be used to help debug branded
applications.
> Userland brand support library.
Userspace Infrastructure
• Zone utilities call per-brand scripts
> At zone installation, zone boot, zone shutdown
> Allows brand to customize software installed,
execution environment, etc.
• Debugging support
> Brand-specific library plugins aid with:
> Application and library segment mappings
> Address and symbol mappings
> Enables tools like mdb, ptools, and the DTrace pid
provider, to work without modification
Kernel Infrastructure
• BrandZ adds interposition points to the Solaris
kernel:
> syscall path, process loading, fork, exit, etc.
• Control is transferred to brand's kernel support
module.
• Allows a brand to replace or modify basic Solaris
behaviors.
• Only applied to branded processes.
• Fundamentally different brands may require new
interposition points
brand_ops Vector
• Shows all interposition points
struct brand_ops {
    int (*b_brandsys)(int, int64_t *, uintptr_t, uintptr_t,
        uintptr_t, uintptr_t, uintptr_t, uintptr_t);
    void (*b_setbrand)(struct proc *);
    int (*b_getattr)(zone_t *, int, void *, size_t *);
    int (*b_setattr)(zone_t *, int, void *, size_t);
    void (*b_copy_procdata)(struct proc *, struct proc *);
    void (*b_proc_exit)(struct proc *, klwp_t *);
    void (*b_exec)();
    void (*b_lwp_setrval)(klwp_t *, int, int);
    int (*b_initlwp)(klwp_t *);
    void (*b_forklwp)(klwp_t *, klwp_t *);
    void (*b_freelwp)(klwp_t *);
    void (*b_lwpexit)(klwp_t *);
    int (*b_elfexec)(struct vnode *vp, struct execa *uap,
        struct uarg *args, struct intpdata *idata, int level,
        long *execsz, int setid, caddr_t exec_file,
        struct cred *cred, int brand_action);
};
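How a vector like this gets consulted can be modelled in ordinary user-space C. The sketch below is not the Solaris kernel code: toy_brand_ops, toy_proc, and toy_syscall are hypothetical names invented for illustration, and only the general shape (a per-brand table of function pointers checked at an interposition point, with native processes taking the normal path) is taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a brand's ops vector. */
struct toy_brand_ops {
    long (*b_syscall)(int sysnum, long arg);  /* syscall interposition */
    void (*b_proc_exit)(int pid);             /* exit interposition */
};

/* Hypothetical process: p_brand is NULL for a native process. */
struct toy_proc {
    int p_pid;
    const struct toy_brand_ops *p_brand;
};

/* Native path: pretend every syscall returns its argument unchanged. */
static long
toy_native_syscall(int sysnum, long arg)
{
    (void) sysnum;
    return (arg);
}

/* A brand handler that changes the behavior (here it negates the result). */
static long
toy_lx_syscall(int sysnum, long arg)
{
    (void) sysnum;
    return (-arg);
}

/* Hooks a brand does not need are simply left NULL. */
const struct toy_brand_ops toy_lx_ops = {
    .b_syscall = toy_lx_syscall,
    .b_proc_exit = NULL,
};

/* Interposition point: only branded processes take the brand path. */
long
toy_syscall(const struct toy_proc *p, int sysnum, long arg)
{
    assert(p != NULL);
    if (p->p_brand != NULL && p->p_brand->b_syscall != NULL)
        return (p->p_brand->b_syscall(sysnum, arg));
    return (toy_native_syscall(sysnum, arg));
}
```

With a native process (p_brand set to NULL) toy_syscall falls through to the native path; with p_brand pointing at toy_lx_ops the brand's handler runs instead.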
The lx Brand
• Official marketing name: Solaris Containers for
Linux Applications (SCLA)
• Allows Linux binaries to run on Solaris 10
• Creates a zone for Linux application execution
> Zone is populated only with Linux software
> At boot, it runs the Linux init(1M), configuration
scripts, and applications
> It all runs on a Solaris kernel.
• There is no Linux software delivered with BrandZ
> We install and run standard Linux distributions
> Initially RHEL/CentOS version 3
What it's not
• Not a full system emulator or virtualization layer
> No non-Solaris kernel code is ever executed.
> You can't run any random Linux distribution.
• Doesn't support all Linux kernel functionality.
> No support for Linux file systems, kernel modules,
or device drivers.
> Not all system calls are fully supported.
• Not simply binary emulation (like lxrun, wine,
etc.)
> You can't just run the Linux version of acroread
from your Solaris shell prompt.
Linux Zone Uses
• Transition tool: reducing Linux “barrier to exit”
> Customer would like to move to Solaris, but has
legacy Linux applications
• Best of both worlds
> Users familiar with Linux environment
> Administrators want Solaris' enterprise-class
features: SRM, FMA, DTrace
• Developer/ISV workload
> Solaris has strong development tools, let Linux
developers leverage them.
Emulating Linux
• Loading processes
• System calls
• Signals
• Devices
• /proc
Loading Linux Processes
1. Kernel jumps into Solaris linker
2. Loads our libc (and a few others)
3. Resolves symbols in our libraries
4. Runs _init() in lx_brand.so.1
• Pass lx_handler() address to kernel
5. Builds aux vector for Linux app
6. Jumps to Linux linker
7. Linux linker loads Linux glibc, etc.
8. Resolves Linux symbols
9. Jumps to Linux main()
Address Space Map
(artist's rendering, not drawn to scale, etc.)
• ld.so.1
• lx_brand.so.1
• libc.so.1
• ld-linux.so.2
• glibc-2.3.2.so
• Heap
• Linux Binary
• Stack
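Step 5 above, building the aux vector, can be illustrated with a small sketch. The AT_* numbers are the standard ELF auxiliary vector types; everything else (toy_auxv_t, toy_elf_info_t, toy_build_auxv, and the choice of which entries to emit) is a hypothetical illustration, not the layout the brand library actually produces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard ELF auxiliary vector types. */
#define TOY_AT_NULL   0   /* end of vector */
#define TOY_AT_PHDR   3   /* address of the program headers */
#define TOY_AT_PHENT  4   /* size of one program header entry */
#define TOY_AT_PHNUM  5   /* number of program headers */
#define TOY_AT_BASE   7   /* base address of the interpreter (linker) */
#define TOY_AT_ENTRY  9   /* entry point of the executable */

typedef struct {
    uintptr_t a_type;
    uintptr_t a_val;
} toy_auxv_t;

/* Hypothetical summary of what the loader learned about the binary. */
typedef struct {
    uintptr_t phdr;
    uintptr_t phent;
    uintptr_t phnum;
    uintptr_t interp_base;
    uintptr_t entry;
} toy_elf_info_t;

/*
 * Fill 'auxv' (capacity 'n' entries) from 'ei'. Returns the number of
 * entries written, including the AT_NULL terminator, or 0 if 'n' is
 * too small.
 */
size_t
toy_build_auxv(const toy_elf_info_t *ei, toy_auxv_t *auxv, size_t n)
{
    assert(ei != NULL && auxv != NULL);

    const toy_auxv_t entries[] = {
        { TOY_AT_PHDR,  ei->phdr },
        { TOY_AT_PHENT, ei->phent },
        { TOY_AT_PHNUM, ei->phnum },
        { TOY_AT_BASE,  ei->interp_base },
        { TOY_AT_ENTRY, ei->entry },
        { TOY_AT_NULL,  0 },
    };
    const size_t count = sizeof (entries) / sizeof (entries[0]);

    if (n < count)
        return (0);
    for (size_t i = 0; i < count; i++)
        auxv[i] = entries[i];
    return (count);
}
```

The Linux linker reads entries like these to find the program headers and entry point of the binary it is about to start, which is why the vector has to be in place before step 6.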
System Call Emulation
• Userland, Global Zone: Solaris Process
    { ... open() ... }
• Userland, Linux Zone: Linux Process
    { ... open() ... }
• Userland, Linux Zone: LX emulation library
    lx_open(args) {
        fd = open(lx_to_solaris(args));
        if (fd < 0)
            return (solaris_to_lx(errno));
        else
            return (fd);
    }
• Kernel: Solaris Kernel, Syscall handler
    if (p->p_brand)
        p->p_brand->br_syscall();
    else
        rval = do_syscall();
    return to userspace
• Kernel: LX brand module
    struct lx_brand_ops {
        lx_syscall()
        lx_proc_exit()
        lx_pid_assign()
        lx_pid_release()
        lx_setregs()
        ...
    }
    lx_syscall {
        return to userland
    }
• Kernel: Solaris Kernel
    open() {
        ...
        return(fd);
    }
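The lx_to_solaris() and solaris_to_lx() steps in the lx_open() pseudocode can be sketched as two small translation functions. This is a minimal illustration, not the brand library's code: the toy_* names are hypothetical, the LX_O_* values are the Linux x86 open(2) flag values, and the host's own <fcntl.h> and <errno.h> constants stand in for the Solaris side so the example builds on any POSIX system. Only a handful of flags and error numbers are covered.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* Linux x86 open(2) flag values (octal). */
#define LX_O_ACCMODE   00003
#define LX_O_CREAT     00100
#define LX_O_EXCL      00200
#define LX_O_NOCTTY    00400
#define LX_O_TRUNC     01000
#define LX_O_APPEND    02000
#define LX_O_NONBLOCK  04000

static const struct {
    int lx;
    int host;
} toy_flag_map[] = {
    { LX_O_CREAT,    O_CREAT },
    { LX_O_EXCL,     O_EXCL },
    { LX_O_NOCTTY,   O_NOCTTY },
    { LX_O_TRUNC,    O_TRUNC },
    { LX_O_APPEND,   O_APPEND },
    { LX_O_NONBLOCK, O_NONBLOCK },
};

/* Translate Linux open flags to host flags; -1 if a flag is unknown. */
int
toy_lx_to_host_oflags(int lxflags)
{
    int host;
    int left = lxflags & ~LX_O_ACCMODE;

    switch (lxflags & LX_O_ACCMODE) {
    case 0: host = O_RDONLY; break;
    case 1: host = O_WRONLY; break;
    case 2: host = O_RDWR; break;
    default: return (-1);
    }
    for (size_t i = 0; i < sizeof (toy_flag_map) / sizeof (toy_flag_map[0]); i++) {
        if (left & toy_flag_map[i].lx) {
            host |= toy_flag_map[i].host;
            left &= ~toy_flag_map[i].lx;
        }
    }
    return (left == 0 ? host : -1);
}

/*
 * Translate a host errno to the Linux x86 number. Assumption: values
 * not listed are passed through unchanged, which is only safe for the
 * classic low-numbered errors; a real translation needs a full table.
 */
int
toy_host_to_lx_errno(int err)
{
    assert(err > 0);
    switch (err) {
    case ENAMETOOLONG: return (36);
    case ENOSYS:       return (38);
    case ENOTEMPTY:    return (39);
    case ELOOP:        return (40);
    default:           return (err);
    }
}
```

An emulated open would call toy_lx_to_host_oflags() on the way in and toy_host_to_lx_errno() on the failure path, mirroring the two branches of lx_open() above.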
Signals
• Solaris never handles or processes untranslated
signals
• Sending: Linux process calls kill(pid, signal) -> Linux->Solaris
signal translation -> Solaris process calls kill(pid, signal) ->
kernel signal delivery subsystem
• Receiving: kernel signal delivery subsystem -> Solaris process
signal handler -> Solaris->Linux signal translation -> Linux process
signal handler
Signal Delivery
• Brand library provides a signal handler that
processes all signals sent to the process
• lx signal handler
> Translates Solaris signal number and siginfo_t to
Linux
> Builds a Linux stack:
    Linux siginfo_t
    Linux fpu state
    Linux ucontext_t
    Pointer to Linux ucontext_t
    Pointer to Linux siginfo_t
    Linux signal number   <- Stack pointer
> Jumps to the Linux application's signal handler
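The number-translation part of this handler can be sketched with a lookup table. The left-hand values are the Linux x86 signal numbers; the right-hand side uses the host's <signal.h> constants as a stand-in for Solaris so the example builds anywhere. The toy_* names are hypothetical, and the sketch covers only signal numbers, not siginfo_t or the stack frame described above.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Linux x86 signal number paired with the host's constant. */
static const struct {
    int lx;
    int host;
} toy_sigmap[] = {
    { 1, SIGHUP },   { 2, SIGINT },   { 3, SIGQUIT },  { 4, SIGILL },
    { 5, SIGTRAP },  { 6, SIGABRT },  { 7, SIGBUS },   { 8, SIGFPE },
    { 9, SIGKILL },  { 10, SIGUSR1 }, { 11, SIGSEGV }, { 12, SIGUSR2 },
    { 13, SIGPIPE }, { 14, SIGALRM }, { 15, SIGTERM }, { 17, SIGCHLD },
    { 18, SIGCONT }, { 19, SIGSTOP }, { 20, SIGTSTP },
};

#define TOY_NSIGMAP (sizeof (toy_sigmap) / sizeof (toy_sigmap[0]))

/* Linux -> host, used when a Linux process sends a signal. */
int
toy_lx_to_host_sig(int lxsig)
{
    for (size_t i = 0; i < TOY_NSIGMAP; i++) {
        if (toy_sigmap[i].lx == lxsig)
            return (toy_sigmap[i].host);
    }
    return (-1);    /* no equivalent in this table */
}

/* Host -> Linux, used before calling the Linux application's handler. */
int
toy_host_to_lx_sig(int hostsig)
{
    for (size_t i = 0; i < TOY_NSIGMAP; i++) {
        if (toy_sigmap[i].host == hostsig)
            return (toy_sigmap[i].lx);
    }
    return (-1);    /* no equivalent in this table */
}
```

Signals with no counterpart in the table, such as Linux signal 16 (SIGSTKFLT), come back as -1 here; how the real brand handles such cases is not covered by the slides.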


/proc
• Linux /proc is a Superfund site
> Starts getting cleaned up in 2.6 kernel
• We support a subset of its functionality
> /proc/<PID>/*
> /proc/meminfo
> /proc/mounts
> /proc/stat
> /proc/uptime
• Implemented as a new file system: lx_procfs
• Most process info maps directly to our /proc
> Mount Solaris /proc in the zone under /native/proc
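As an illustration of what emulating one of these files involves, the sketch below formats a Linux-style /proc/uptime line, which holds the uptime and the idle time in seconds with two decimal places. The function name and the idea of passing both values in as hundredths of a second are assumptions made for the example; this is not the lx_procfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "<uptime> <idle>\n" into 'buf', both in seconds with two
 * decimals. Inputs are in hundredths of a second. Returns the number
 * of characters written, or -1 if 'buf' is too small.
 */
int
toy_format_uptime(char *buf, size_t len, unsigned long uptime_cs,
    unsigned long idle_cs)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%lu.%02lu %lu.%02lu\n",
        uptime_cs / 100, uptime_cs % 100,
        idle_cs / 100, idle_cs % 100);
    if (n < 0 || (size_t)n >= len)
        return (-1);
    return (n);
}
```

A file system like lx_procfs would obtain the underlying numbers from the host kernel and only has to present them in the format Linux tools expect.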
Devices
• Zones approach makes it easy to control which
devices are accessible to Linux apps
• Initially only supporting minimum needed:
> /dev/null, /dev/zero, /dev/ptmx, /dev/pts/*, /dev/tty,
/dev/console, /dev/random, /dev/urandom
> OSS audio devices (good for Quake)
• No network or disk devices
> Network plumbing done by the global zone
> Status comes from /proc/net and ioctl()s on
sockets
> Filesystems mounted by the global zone
• No support for framebuffers (bad for Quake)
Observability
• BrandZ supports both Solaris and Linux tools
• In the Linux zone:
> strace – syscall tracer
> gdb – GNU debugger
• From the Global zone:
> DTrace
> New Linux syscall provider
> PID provider can manipulate Linux processes
> mdb: can handle both live Linux processes and
core files
• Goal: to be a better Linux development platform
than Linux
Performance
• Original target: average no more than 10% slower
than Linux or Solaris on same hardware
> Will vary widely from application to application

Benchmark                             BrandZ vs. Native Linux
                                      (higher is better)
MySQL (create)                        +26%
MySQL (alter-table)                   +23%
MySQL (“wisconsin”)                   -55%
MySQL (total)                         -3%
Tibco Rendezvous (1KB msg size)       +10%
Netperf TCP                           0%
SpecJBB2005                           -4%
SCLA Limitations
• All the current Zones limitations still apply.
> Linux zones can't modify network configuration
> Linux zones can't act as NFS servers
> Linux zones can't directly access physical devices
• Linux applications will not run in the global zone
• No 64-bit Linux application support.
• On a crash, applications will leave Solaris core
files
• Cannot support Linux kernel modules
• Cannot access Linux filesystems
BrandZ vs Xen
• BrandZ
> Legacy Linuxes
> Userspace support only
> Solaris kernel stability
> Lighter weight
> Subject to Resource Management
> Better visibility
> DTrace
> Best suited for transitional, legacy environments
• Xen
> Modern / future Linuxes
> Higher Linux fidelity
> Supports Linux kernel modules
> Heavier weight
> Harder to manage
> Best suited for long-term, production environments
Status
• Integrated into Nevada build 49
• Will be in Solaris Express in a month or two
• On track for Solaris 10 update 4
Demo
Q&A
• Project and Contact Information:
> brandz-discuss@opensolaris.org
> http://www.opensolaris.org/os/community/brandz
BrandZ: A LINUX ENVIRONMENT IN A SOLARIS ZONE
Nils Nieuwejaar (nils.nieuwejaar@sun.com)
Russ Blaine (russell.blaine@sun.com)
