
Sun Solaris OS

Glenn Barney
gb2174@columbia.edu
COMS E6998.002 : Advanced Computer Design

Metrics

Sun focused on 5 major design areas:

- Performance
- Security
  - Prevent
  - Detect
  - Respond
- Availability
- Utilization
- Platform Choice
  - Hardware Compatibility List: 716 x86/x64 systems, 75 SPARC systems

The major metric successes are Security and Availability. Performance
and Utilization are a bit more questionable but still very good,
as we'll see.

History of Solaris

It's a Unix OS that is an amalgam of earlier Unix-based OSs, mainly Sun's first
OS, SunOS (based on BSD), and AT&T's Unix, System V.

The general timeline:

1970 to 1979: Unix is first written in assembly
and then in C by Ritchie and Thompson.
1982: Bill Joy leaves Berkeley, co-founds Sun, and
develops SunOS based on BSD.
1984 to 1987: AT&T develops and releases System V,
which competes with BSD until the mid 90s.
1988: AT&T purchases a large stake in Sun.
1993: Sun announces the first version of Solaris,
which is no longer based on BSD but mainly
on System V Release 4, a mix of other Unix
distributions. The competing Unix standards group,
OSF, begins a GUI war with Sun, backing its own
Motif/X against Sun's OPEN LOOK.
1994: Sun creates the Common Desktop
Environment to support both Motif and OPEN
LOOK; by Solaris 5 it's officially supported.

The Solaris Gestalt


Pulled in from BSD

Virtual memory system


Fast file system with symbolic links
TCP/IP networking system with
Kerberos, Telnet, FTP, sendmail.
Alternate shells to Bourne shell (C
shell)
Vendor products like NFS (from SUN)
and symmetric multiprocessing
support, thread management and
shared libraries

Pulled in from System V


Interprocess Communication
Bourne shell enhancements
STREAMS and TLI networking
libraries
Remote File Sharing
Improved memory paging
Application Binary Interface

Created by Sun for the SunOS

SunOS 4.x
NFS
OpenWindows 2.0 GUI
OpenBoot monitor
DeskSet Utilities
Multiprocessing Support

SunOS 5.x (i.e., Solaris)

SMP for more than 100 processors in a
single server
CDE (Motif, PostScript, Open Look)
Gnome 2.0 to support Linux integration
Network Information Service (NIS)
Clustering
Java
Ever growing list of new features

Some General Solaris Tidbits


Solaris 10 does not support old Sun hardware. Chipsets it does
support: UltraSPARC II, III, IV and newer, 32-bit Intel x86, and
64-bit AMD Opteron.
Of course, old 32-bit SPARC programs are still supported
Sun does support batch jobs like JCL: Sun MBM, which
preserves batch step constructs on Sun systems
Load balancing seems to require a third-party application
Sun Network Cache and Accelerator (SNCA), available since Solaris 8,
helps cache and serve web pages, but doesn't do load balancing
per se

Solaris Overview

Processor/platform-specific code is less than
5% of the kernel, developed to adapt to different
hardware platforms
Device drivers are dynamically loaded and
use a common published interface
File system and volume management
treat a large number of disks as a single volume
Virtual File System supports unlimited file
system extensions: UFS, NFS, Sun
StorEdge file systems, PC file systems, etc.
New Zettabyte File System (ZFS)
Unified TCP/IP stack

Linux system call handler is in-kernel; it catches Linux system calls and dispatches the
equivalent Solaris kernel functions
DTrace debugging system, new for Solaris 10: a clean, modular, pre-deployed global
debugging solution at minimal runtime cost

Solaris Modular Kernel


Seven types of loadable modules:
Scheduling classes
File systems
Loadable system calls
Loaders for executable file formats
STREAMS modules
Bus or device drivers
Miscellaneous

Solaris Kernel

Kernel thread: the core unit of execution that is
scheduled and executed on a processor.
  Has an execution state and context that includes a
  global priority and scheduling class
  Is the unit that gets scheduled, executed, and
  context switched on and off processors
User thread: user-level thread state maintained
within a user process
Process: the executable form of a program
Lightweight process (LWP): kernel-visible
execution context for a user thread
Solaris 2 to 8 had a two-level threads model in
which many user threads could be assigned to a
smaller group of LWPs.

However, the two-level model was replaced with a one-to-one
model. Why? Basically, the two-level model was too complicated.
The one-to-one model brings:
Improved performance, scalability, and reliability
Reliable signal behavior
Improved adaptive mutex lock implementation
User-level sleep queues for synchronization objects

Kernel Thread Scheduling

The dispatcher uses a priority model to select
which kernel thread to execute next.
Supports preemption, and the kernel
itself is preemptable.
170 global priorities, partitioned by
scheduling class.
The three main classes are TS, SYS, and RT.

Timeshare (TS): default for all processes
and the kernel threads in the process.
Interactive (IA): enhanced TS used by
the windowing system to boost threads
under the window focus
Fair Share Scheduling (FSS): share
based, not priority based.
Fixed Priority (FX): fixed priority
System (SYS): used for kernel threads;
they are bound and run until they block or
complete
Real Time (RT): fixed-priority, fixed-time-quantum scheduling.
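
A rough sketch of priority-based dispatch across scheduling classes, in Python. This is a toy model, not the Solaris dispatcher: the numeric ranges in CLASS_RANGES (TS 0-59, SYS 60-99, RT 100-159) are an assumption not stated on the slides, and the names Dispatcher, enqueue and pick_next are made up for illustration.

```python
import heapq
from itertools import count

# ASSUMPTION: global priority ranges per class; the slides only say that
# 170 global priorities are partitioned by scheduling class.
CLASS_RANGES = {"TS": (0, 59), "SYS": (60, 99), "RT": (100, 159)}


def global_priority(sched_class, class_prio):
    """Map a class-relative priority onto the global priority scale."""
    lo, hi = CLASS_RANGES[sched_class]
    return min(lo + class_prio, hi)


class Dispatcher:
    """Toy dispatcher: always runs the highest global priority thread."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # FIFO tie-break among equal priorities

    def enqueue(self, name, sched_class, class_prio):
        prio = global_priority(sched_class, class_prio)
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-prio, next(self._seq), name))
        return prio

    def pick_next(self):
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


def demo():
    d = Dispatcher()
    d.enqueue("shell", "TS", 30)
    d.enqueue("pageout", "SYS", 38)
    d.enqueue("audio", "RT", 0)
    d.enqueue("editor", "TS", 30)
    return [d.pick_next() for _ in range(4)]
```

In demo(), the RT thread is picked first even with the lowest priority in its class, then the SYS thread, then the two TS threads in arrival order. Time quanta, preemption and priority aging are left out.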

Interprocess Communication and Signals

Traditional Unix IPC


Pipes: channel data directly between related processes through a file-like object
Named pipes (FIFOs): pipes implemented as files in the file system namespace
Sockets: can be over a network or local (Unix domain)
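
A minimal sketch of the pipe case above, using Python's os module on a Unix-like system. The parent and child are "related" through fork(); the function name pipe_roundtrip is made up for this example.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Send message from a forked child to its parent through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: writer. Close the unused read end, write, then exit
        # without running the parent's cleanup handlers.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: reader. Close the unused write end so read() sees EOF
    # once the child has closed its copy.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

The file-like nature of the pipe shows in the plain read/write/close calls on descriptors; a named pipe would differ only in being opened by path.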

System V IPC
Shared memory: processes create a segment of memory that is shared among them
Message queue: each message contains a 32-bit type value and a data payload
Semaphores: processes can sleep on them; used for synchronization, but any process can
increment them

Solaris doors: a door server contains a thread that sleeps waiting for a client. The
client calls the server through a door, and scheduling control passes directly from the
requesting thread to the door server thread. Very low latency turnaround.

Signals can interrupt a process after an event occurs. Signals can be ignored,
caught and handled, or treated with a default action.

Memory
64-bit kernel and process address space

Optimizes memory use by sharing program binaries and application data among
processes
VM system manages most objects related to I/O and memory: kernel and user
applications, shared libraries, and file systems

Manages virtual-to-physical mapping of memory.
Manages swapping memory between primary and secondary storage to optimize performance.
Handles requests for shared images between multiple users and processes.
Acts as an integrated file cache.

Newer features in the VM implementation include:

During I/O, uses the 64-bit address space to keep a permanent mapping of all physical pages in
SEGKPM, eliminating the need to map/unmap for each I/O.
Variable page sizes; the largest available now is 256 Mbytes
Generic framework: Multiple Page Size Support (MPSS) for various page sizes
Support for nonuniform (NUMA) memory architectures
Dynamic reconfiguration: new pages can be added to the free list on the fly while the kernel
is kept in a safe kernel cage
Modern memory allocators support slabs

Virtual Memory

Pages can vary in size; a common size is 8 Kbytes.
The Solaris kernel uses a combined demand-paged and swapping model.
Abstract memory objects are called segments,
vnodes, and pages

Physical memory, in chunks called pages
Virtual file object called a vnode
File system is a hierarchy of vnodes
Process and kernel address spaces are
segments of mapped vnodes
Mapped hardware devices (e.g., frame buffers)
are segments of hardware-mapped pages

Virtual-to-physical translation is done by the
Hardware Address Translation (HAT) layer, which keeps
the rest of the implementation machine independent

Virtual Memory Continued

A process's virtual address space skeleton is created by the kernel when the fork()
system call creates the process
Memory is allocated on the heap; malloc() by itself doesn't create physical memory
The heap can be allocated in 32-bit or 64-bit mode, and is much larger in 64-bit mode.

The picture on the right shows how memory mapping can share data among processes
Several options govern how a file is shared when it is mapped between processes
MAP_SHARED can be combined with PROT_READ|PROT_WRITE
MAP_PRIVATE can be combined with PROT_READ|PROT_WRITE

Each segment has a protection mode: Read, Write, or Execute.
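
To make the MAP_SHARED / MAP_PRIVATE difference concrete, here is a small sketch using Python's mmap module on a Unix-like system (it wraps the same mmap() call). The helper names map_and_write and demo are made up; the temporary file stands in for any mapped file.

```python
import mmap
import os
import tempfile


def map_and_write(path, flags, payload):
    """Map path read/write with the given flags, overwrite the first bytes,
    and return (bytes seen through the mapping, bytes seen in the file)."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:len(payload)] = payload
            seen = bytes(m[:len(payload)])
            if flags == mmap.MAP_SHARED:
                m.flush()  # push shared changes back to the file
    with open(path, "rb") as f:
        on_disk = f.read(len(payload))
    return seen, on_disk


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        os.close(fd)
        # Private mapping: copy-on-write, the file is left untouched.
        private = map_and_write(path, mmap.MAP_PRIVATE, b"PRIVATE!")
        # Shared mapping: the change reaches the underlying file.
        shared = map_and_write(path, mmap.MAP_SHARED, b"SHARED!!")
        return {"private": private, "shared": shared}
    finally:
        os.remove(path)
```

With MAP_PRIVATE the process sees its own modified copy while the file keeps its old contents; with MAP_SHARED the write is visible to anyone else who maps or reads the file.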

Page Faults and Anonymous Memory

A major page fault occurs when the
physical page does not exist
A minor page fault occurs when the page
is in physical memory but no
MMU translation exists
(the page is attached)
A protection fault occurs when an access
violates memory permissions

There can also be anonymous memory: pages that are not associated with a vnode.
They are used for new heap space and are allocated by a zero-fill-on-demand
(ZFOD) operation.

Intimate Shared Memory

System V shared memory (IPC) option
Shared memory optimization:
  Additionally shares low-level kernel data
  Reduces redundant mapping info (V-to-P)
Shared memory is locked, never paged
No swap space is allocated
Use the SHM_SHARE_MMU flag in shmat()

Physical Memory

Memory is managed by the page scanner
daemon (except kernel memory)
When the system is booted, memory
is placed on the free list in page-size
chunks.
Anonymous memory is used for
most of a process's memory
allocation (heap and stack).
Pages are taken from the free list as
data is read into memory and then reside
in the segmap cache, a process's address
space, or the cache list.
page_create_va() allocates pages,
taking into account the virtual
address to calculate page coloring.
The page scanner uses global page
replacement.
Two bits are kept per page to
indicate whether the page has been
referenced or modified since the bits
were last cleared.

Page swapping: two-handed clock algorithm

In addition to this page-out process, the dispatcher can swap out entire processes to
conserve memory. It does this rarely, only in extreme circumstances.
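
A toy simulation of the two-handed clock idea, as a sketch only: the front hand clears each page's reference bit, and the back hand, trailing a fixed distance behind, picks the page as a victim if the bit is still clear when it arrives. The class name TwoHandedClock and the handspread parameter are illustrative; handling of the modified bit (writing dirty pages out before freeing) is omitted.

```python
class TwoHandedClock:
    """Simplified two-handed clock over a circular list of pages."""

    def __init__(self, num_pages, handspread):
        if not 0 < handspread < num_pages:
            raise ValueError("handspread must be between 1 and num_pages - 1")
        # Assume every page starts out recently referenced.
        self.referenced = [True] * num_pages
        self.front = 0
        # The back hand trails the front hand; a negative value means it
        # has not started scanning yet.
        self.back = -handspread

    def touch(self, page):
        """Simulate a memory access that sets the page's reference bit."""
        self.referenced[page] = True

    def step(self):
        """Advance both hands by one page; return a victim page or None."""
        n = len(self.referenced)
        self.referenced[self.front % n] = False  # front hand clears the bit
        victim = None
        if self.back >= 0 and not self.referenced[self.back % n]:
            victim = self.back % n  # untouched since the front hand passed
        self.front += 1
        self.back += 1
        return victim


def demo():
    clock = TwoHandedClock(num_pages=8, handspread=3)
    for _ in range(3):
        clock.step()      # front hand clears pages 0, 1, 2
    clock.touch(1)        # page 1 is used again before the back hand arrives
    return [clock.step() for _ in range(3)]
```

In demo(), pages 0 and 2 are selected because nothing touched them between the two hands, while page 1 survives. A wider handspread gives pages more time to be re-referenced.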

Slab and HAT

Solaris has a general-purpose memory allocator known as the slab allocator.
It is used for memory requests that are:
Smaller than a page size
Not an even multiple of a page size
Frequently allocated and freed, which causes fragmentation

It solves fragmentation issues by grouping different-sized memory objects into
separate caches, where each object cache has its own size and characteristics
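
The per-cache grouping can be sketched as follows. This models only the reuse behavior of one object cache (freed objects go back to their own cache and are handed out again); it does not carve page-sized slabs, and the ObjectCache class is a made-up illustration rather than the kernel interface.

```python
class ObjectCache:
    """Toy object cache: one cache per object size or type."""

    def __init__(self, name, constructor):
        self.name = name
        self._constructor = constructor
        self._free = []        # previously freed objects, ready for reuse
        self.constructed = 0   # how many objects were built from scratch

    def alloc(self):
        """Reuse a freed object if one exists, else construct a new one."""
        if self._free:
            return self._free.pop()
        self.constructed += 1
        return self._constructor()

    def free(self, obj):
        """Return an object to this cache instead of a general heap."""
        self._free.append(obj)


def demo():
    cache = ObjectCache("buf_256", lambda: bytearray(256))
    first = cache.alloc()
    cache.free(first)
    second = cache.alloc()   # served from the free list, no new construction
    return second is first, cache.constructed
```

Because every object in a cache has the same size, a freed slot always fits the next request exactly, which is why this layout avoids the fragmentation described above.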

The HAT layer programs the TLB with entries identifying the relationship
between virtual and physical addresses.
If the TLB lookup fails, the UltraSPARC falls back to a translation storage
buffer (TSB), while most other architectures use a hardware page table.
This is a big difference because the TSB is a software lookup, but Solaris supports both.
See the slide titled Virtual Memory for a picture of the HAT
layer; it is on the right

Virtual File System VFS

Created to abstract away file systems so
NFS and UFS could co-exist
Made of the vnode, the virtual node interface
that implements file-related functions, and
the vfs, the virtual file system that directs
functions to specific file systems
File descriptors in a per-process list point
to file table entries. Each entry references a
vnode, which eventually points to a physical node
depending on the file system implementation.
New in Solaris 10: Zettabyte File System (ZFS)
Endian neutral: move files between
SPARC and x86 based systems
ZFS protects all data with 64-bit checksums
128-bit file system!
Built on top of virtual storage pools
All operations are transactional and copy-on-write

Unix File System (UFS)

The UFS we know and love: the default file
system for Solaris, in development for over 20
years.
Based around disk geometry: the number of
sectors in a track, the location of the head, and
the number of tracks.
Supports hard and soft links.
The inode (index node) is the internal descriptor for
a file
Access scheme: user, group, world.

I/O

Two distinct methods perform
file system I/O:
read(), write(), and related
system calls
Memory-mapping of a
file into the process's
address space

Both are in the picture here
to the right.

Performance: NUMA systems

Non-Uniform Memory Access (NUMA) machines:
machines in which some memory is closer to
some CPUs than others
Addressed by the Memory Placement
Optimization framework (MPO)
Locality awareness
Balancing
Dynamic topology support

Latency groups (lgroups): sets of CPUs and
equidistant memory defined in the kernel.
A home lgroup is chosen for each thread upon
creation, and the thread prefers this lgroup.
For memory allocation, prefer the home lgroup, but if you
know you have multithreaded, spread-out code,
random placement may be better

CMT support and Parallel System architectures

Chip Multithreading (CMT): CPUs share various processor
components and caches
The three different parallel
architectures:
SMP. Symmetric multiprocessor
with a shared memory model;
single kernel image
MPP. Message-based model;
multiple kernel images
NUMA/ccNUMA. Shared
memory model; single kernel
image

So the Solaris kernel has several semaphores and mutex locks to help address concurrent
thread memory access. SMP (like Intel and AMD chips) and CMT (the UltraSPARC T1)
are a lot more complicated than just a NUMA system, and much research goes on in this field.
Sun's attitude is to try to make things as simple as possible while still providing
necessary synchronization.

Networking : The TCP/IP Stack

Was two STREAMS layers with
packet queueing and locks between
layers and 1 processor thread per
connection
Now merges the TCP and IP layers and
allocates a single thread per CPU.
Streamlined to process a packet
through both layers
Binds connections to a CPU for
their entire life

Uses a per-CPU vertical perimeter
mechanism to protect the
connection. It is implemented with
an IP classifier, serialization queue,
and worker thread so only one
CPU processes a specific packet.
Integrated support for TCP offload
engines lets hardware do the work
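
The classifier / serialization queue / worker arrangement can be sketched like this. It is a simplified model, not the Solaris implementation: the class name VerticalPerimeter is made up, and hashing the connection 4-tuple with CRC32 is an assumed stand-in for however the real classifier binds a connection to a CPU.

```python
import zlib
from collections import deque


class VerticalPerimeter:
    """Toy model: one serialization queue per CPU, one worker per queue."""

    def __init__(self, num_cpus):
        self.squeues = [deque() for _ in range(num_cpus)]

    def classify(self, conn):
        """Map a connection (src, sport, dst, dport) to a CPU index.
        ASSUMPTION: a stable hash stands in for the real binding policy."""
        key = "|".join(map(str, conn)).encode()
        return zlib.crc32(key) % len(self.squeues)

    def enqueue(self, conn, payload):
        """Classifier step: every packet of a connection lands on one queue."""
        cpu = self.classify(conn)
        self.squeues[cpu].append((conn, payload))
        return cpu

    def drain(self, cpu):
        """Worker step: process one CPU's queue in arrival order, so no
        lock is needed between the TCP and IP processing of a packet."""
        processed = []
        queue = self.squeues[cpu]
        while queue:
            processed.append(queue.popleft())
        return processed
```

For example, enqueueing five packets for ("10.0.0.1", 40000, "10.0.0.2", 80) always returns the same CPU index, and drain() on that index yields them in order. Since only one worker ever touches a given connection, per-connection state needs no extra locking.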

Security
For user permissions
UFS and file system permissions
Role Based Access Control since Solaris 8
New in Solaris 10: least privilege model
Access Control Lists let you make arbitrary security permissions
Kernel level permissions, the privileged kernel thread and modules
run the whole system and control Solaris containers.
Automated Patch Tool
Solaris Cryptographic framework
Full network traffic control, for example TCP packet monitoring,
disable redirecting of packets and answering system pings.

Solaris Containers/Zones
Containers provide the complete
virtualized environment; zones are
the component that provides the
isolation between environments.
Up to 8192 virtualized
environments per Solaris OS
instance.
Provides a secure sandbox that has
unique root, user and file systems.
Also network interfaces, devices,
hardware, I/O all virtualized.
The kernel makes sure that the
zones are isolated.
If a zone fails, it can reboot in a
few seconds.

Process rights management

The Solaris 10 OS least privilege model includes nearly 50 fine-grained privileges as well as
the basic privilege set.
Evolved from Trusted Solaris.
The basic privilege set includes all privileges given to unprivileged processes in the
traditional security model

Each process has four sets in its kernel credentials:

The Inheritable set (I): The privileges inherited on exec.
The Permitted set (P): The maximum set of privileges for the process.
The Effective set (E): The privileges currently in effect, a subset of P.
The Limit set (L): The upper bound of the privileges a process and its children may obtain

Once launched, a process uses privilege manipulation functions to add or remove
privileges from the privilege sets
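
The relationship between the four sets can be modeled with plain Python sets. This is an illustrative model, not the Solaris API: the Cred class and its methods are made up, and the exec rule used here (I' = I ∩ L, then P' = E' = I') is a simplifying assumption that ignores set-uid programs. The privilege names are meant to resemble a basic set.

```python
from dataclasses import dataclass

# ASSUMPTION: an example basic set; the slides do not list its members.
BASIC = frozenset({"file_link_any", "proc_exec", "proc_fork",
                   "proc_info", "proc_session"})


@dataclass(frozen=True)
class Cred:
    inheritable: frozenset
    permitted: frozenset
    effective: frozenset
    limit: frozenset

    def __post_init__(self):
        # E must always be a subset of P.
        if not self.effective <= self.permitted:
            raise ValueError("effective set must be a subset of permitted set")

    def without(self, priv):
        """Give up a privilege for this process and for what it execs."""
        return Cred(self.inheritable - {priv}, self.permitted - {priv},
                    self.effective - {priv}, self.limit)

    def after_exec(self):
        """Simplified exec rule: inherit I, bounded by the limit set."""
        new_i = self.inheritable & self.limit
        return Cred(new_i, new_i, new_i, self.limit)


def demo():
    cred = Cred(BASIC, BASIC, BASIC, BASIC | {"net_privaddr"})
    child = cred.without("proc_fork").after_exec()
    return sorted(child.effective)
```

In demo(), a process drops proc_fork before exec, so the new program image cannot fork even though an unprivileged process normally could. The limit set never grows, so nothing outside L can reappear in a child.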

Cryptography
Two Basic Types
User level Framework
Exists Outside the Kernel
Uses the PKCS 11 interface
Applications use it
Kernel Level Framework
Operating System modules
use it
Can interface with hardware
and software plug-ins
Neither provides actual encryption
algorithms; plug-ins do all the
work!
Both are verified by the Module
Verification Daemon

Cryptography Continued

Each plug-in must be verified (signed) by the Module Verification Daemon, which:
First, sets up a thread pool that lives in the kCF to service requests
Second, answers requests for verification of user-level and kernel-level provider
signatures

User level crypto algorithms supported

Kernel level crypto algorithms supported

The cryptoadm tool is provided for administration of the uCF and kCF.
/dev/crypto drivers allow communication between user-level and kernel-level plug-ins
/dev/cryptoadm runs the Module Verification Daemon
For user level, provides the digest and mac commands for calculating the digest and MAC of files.
Provides the encrypt and decrypt commands for encrypting and decrypting files
Solaris IPsec/IKE and Kerberos, user-level and kernel-level, have been ported to use
the Solaris Cryptographic Framework in the Solaris 10 OS.
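
For readers unfamiliar with what a file digest and a file MAC compute, here is a sketch using Python's standard hashlib and hmac modules. It is an analogue of the operation only and does not use the Solaris Cryptographic Framework; the function names file_digest and file_mac and the SHA-256 default are choices made for this example.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks; anyone can recompute and compare the result."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def file_mac(path, key, algorithm="sha256", chunk_size=65536):
    """Keyed hash (HMAC) of a file; only holders of key can produce it."""
    m = hmac.new(key, digestmod=algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            m.update(chunk)
    return m.hexdigest()


def same_mac(expected_hex, actual_hex):
    """Compare MACs in constant time to avoid timing side channels."""
    return hmac.compare_digest(expected_hex, actual_hex)
```

A digest detects accidental or unauthenticated changes; a MAC additionally proves the file was produced by someone who knows the key.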

DTrace Debugging System

Dynamically records data at
points of interest (probes) in the
user and kernel areas.
Records stack trace, timestamp,
arguments.
Kernel modules called providers
know how to activate probes
Has its own D language: a
compiler looks up probes and
providers, using the provider
information to find which probes
should be logged when fired.
DTrace won the top prize in the
Wall Street Journal's 2006
Technology Innovation Awards
competition

30,000 published probes
within the Solaris kernel

Recovery: Predictive Self-Healing

The self-diagnosing system is constantly
gathering data. Error reports are encoded
as a set of name-value pairs and form an
error event. Diagnosis engines run in the
background consuming error events.
Diagnosis engines output a fault event,
broadcast to all agents who can respond.
Enter the Solaris Fault Manager:

Manages the diagnosis engines and agents
Provides a programming model for clients
Compiles logs
Manages multiplexing of events between
producers and consumers

The Sun message identifier maps an
error message to an online
knowledge base article or link
Diagnoses have a universal link identifier
so that solutions can be cross-referenced

Why Solaris beats Linux

Solaris is more secure: it has ACLs, RBAC, PRM, and containers vs. ACLs and Xen
in Linux
Solaris is more stable: Linux has rapid change and multiple centers of
control, while Sun has a predictable lifecycle and the Solaris Application
Guarantee.

Solaris has a better price/performance :


SPECjAppServer2002 results

Solaris has a lower cost for high-level support

Why Linux Beats Solaris

Novell points out Solaris's higher cost for multiple-CPU
machines

Novell points out Solaris's poor performance

But Sun has put out a lot of technology to fight these criticisms, like ZFS to address big-
endian/little-endian compatibility between SPARC and x86, and the Linux binary API
to increase software options on Solaris.

Where Solaris is Headed

Once the most popular Unix-based OS in the world, Solaris has
lost a lot of market share.
Microsoft Windows took the low-end market away from most
Unix systems
Linux came in to pull away the remainder
Solaris was left with the high-end space, with sales based on its
stability, performance, and support
Now with Solaris 10 and OpenSolaris, Sun is trying to regain the
low-end market
Trying to work with AMD/Linux, not against it:
Linux Application Environment
Specific designs for AMD multiprocessor systems
Free OS with competitive support options

Trusted Solaris features in Solaris 10 are a huge selling point

References

Solaris 10: In a Class By Itself
http://www.sun.com/software/whitepapers/solaris10/classbyitself.pdf
Solaris and Linux: SealRock research comparison whitepaper
http://www.sun.com/software/whitepapers/solaris10/sealrock.pdf
Solaris 10 The Complete Reference
http://books.mcgrawhill.com/downloads/products/0072229985/0072229985_ch01.pdf
Solaris 8 Administrator Certification Training Guide Appendix C
http://unixed.com/Resources/history_of_solaris.pdf
Solaris Internals Core Kernel Components
http://www.phptr.com/content/images/0130224960/samplechapter/0130224960.pdf
Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture
http://www.sun.com/books/catalog/solaris_internals.xml
The Solaris Cryptographic Framework
http://www.sun.com/bigadmin/features/articles/crypt_framework.pdf
The least privilege model in the Solaris OS
http://www.sun.com/bigadmin/features/articles/least_privilege.html
Solaris and Linux Seal Rock Research Paper
http://www.novell.com/collateral/4621445/4621445.pdf
SUSE Linux Enterprise Server 9 and Solaris 10 on x86
http://www.novell.com/collateral/4621445/4621445.pdf
