Operating Systems, Virtualization, And The Machine


February 1, 2016 Mark Funk

I recall teaching a college class in operating systems when, well into the class and thinking I was getting the points across, a few of the students stopped the class and asked: "Yes, but what really is an operating system?" I was momentarily taken aback, but the question happened to be absolutely fair and not really all that easy to answer.

OK folks, it's your turn. Given what you have seen so far of The Machine from the first four articles in this six-part series (see the bottom of this story for links), what is an operating system? And in answering, how does such an operating system guarantee The Machine's security and integrity? I'll pause here and hum the Jeopardy! theme song while you mull this over and I decide how to proceed.

Recapping a few things we know about The Machine, all of which need managing:

DRAM memory on a node is only accessible from the processors on that node.
DRAM memory is more quickly accessed and is volatile.
Processor cache coherence is only maintained within a node, the cache lines having been
tagged with local-scope real addresses.
Processor cache can hold data blocks from both local DRAM and Fabric Memory.
Processor cache is volatile.
All non-volatile fabric memory can be accessed by any processor in The Machine. Fabric
memory can, therefore, be used for either persistence, internode sharing, or both.
Processors on a node can only access those portions of fabric memory over which a portion of those processors' real address space has been mapped (sketched in code just below).
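
As a purely conceptual sketch of those access rules (the C types and names below are our own invention, not anything HPE has published), a node's reachable real addresses might be modeled like this:

    /* Conceptual sketch only: models the access rules recapped above.
     * Every type and name here is hypothetical, not HPE's. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_FABRIC_WINDOWS 16

    struct fabric_window {            /* fabric memory mapped into this node's RA space */
        uint64_t ra_base, length;
    };

    struct node {
        uint64_t dram_base, dram_size;                    /* local, volatile, node-private DRAM */
        struct fabric_window fabric[MAX_FABRIC_WINDOWS];  /* persistent, potentially shared     */
        size_t nwindows;
    };

    /* A processor on node 'n' can reach a real address only if it falls in the
     * node's own DRAM or inside one of the fabric-memory windows mapped for it. */
    static bool node_can_access(const struct node *n, uint64_t ra)
    {
        if (ra >= n->dram_base && ra < n->dram_base + n->dram_size)
            return true;                                  /* local DRAM             */
        for (size_t i = 0; i < n->nwindows; i++)
            if (ra >= n->fabric[i].ra_base &&
                ra < n->fabric[i].ra_base + n->fabric[i].length)
                return true;                              /* mapped fabric memory   */
        return false;                                     /* physically unreachable */
    }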

We know of and can picture variations on what we are going to describe next, but:

OS design typically assumes full cache coherence amongst all of the processors being managed.
All memory is accessible from all processors, even those residing on different ccNUMA nodes.
In virtualized systems, those supporting multiple OS instances, there typically exists a trusted
hypervisor which also assumes the previous two points.
Each OS instance, within my background, is called a partition, a term we will be using.
The hypervisor does not trust the OSes it hosts.
Hypervisors work to isolate the partitions under their control from each other.

So what of The Machine?

At some level, all that an OS for The Machine needs to be is a package of code licensed by Hewlett Packard Enterprise that manages and abstracts all of the lower-level aspects of the hardware while also providing all that you have come to expect from an OS. HPE could provide a single product, and it would do its best to make the hardware of The Machine in its entirety appear as the single-system image that you would want it to be. Something like that does seem to be in the future plans for The Machine. Indeed, if I understood it correctly, this presentation at HPE Discover in London by Martin Fink seems to suggest that such a single-system OS, called Carbon, may best be managed using something he called Container OS, with containers used for efficiently using and managing resources within a single OS. (If unfamiliar with containers, a good place to start is here, and to be fair, here.) But The Machine is nonetheless different, for all of its wonderful attributes; prudence suggests that the architectural development of such a wonderful tool should take its time in determining what all is expected at the general availability of such an all-encompassing OS. (More on containers later.)

It happens, though, that there is a nearer-term OS approach which would allow The Machine's hardware to be made available sooner.

Recall that one attribute assumed by OSes is full cache coherency as well as physical, if not actual, accessibility to all of the memory. Using an existing OS as we understand it (say, a Linux derivative) would require that we limit the scope of such an OS instance (a partition) to a single node of The Machine. Each of The Machine's nodes would have at least one such Linux-like partition, but those partitions would not span the processors or DRAM of multiple nodes.

Processor virtualization requires that the OSes residing on the same system be assured isolation from each other by default, just as though every OS were on its own hardware. The data owned by one partition and residing in DRAM memory may not be accessed by another, even one with physical access to the same DRAM. Serendipitously, The Machine does not allow a partition in one node any physical means of accessing the DRAM owned by a partition in another. In this sense, The Machine is like a distributed-memory cluster, which, because its memory is distributed, also ensures isolation. Of course, it remains a hypervisor's responsibility to assure this isolation for partitions sharing the same node.

Virtualization, as managed by a hypervisor, normally allows a system's partitions to share the processors of the system, this done via hypervisor-controlled placement and time slicing. But, again, the hardware-enforced local access by each node's processors to only local memory, and the lack of cache coherency across nodal boundaries, tend to make cross-node processor sharing impractical. The Machine could move partitions between nodes, but such movement is more than just temporarily having a partition use another node's processors; it also requires that the partition's entire state in memory (and cache) be moved to that new node.

As a quick outline, a few of the (we are sure there are many more) additional hypervisor responsibilities in The Machine would include:

Providing to each partition a set of real address ranges which are backed by fabric memory.
Allowing partitions access to only those portions of fabric memory to which the partition has access rights.
Managing the physical location of objects (objects which can also be files) within fabric memory, given some token representing the object.
Securely managing both the inter-node and external networks; not only are the nodes interconnected via the memory fabric, but inter-node communications are also possible via a more traditional communications network, with the same hardware being used to communicate outside of The Machine.

As a side observation, you should be asking: Where does the hypervisor really reside? In more traditional (read that, cache-coherent shared-memory) virtualized systems, the hypervisor carves out some of the system's memory for its own use, and it steals processor cycles from any of the available processors on an as-needed basis. We assume, but do not know, that The Machine's hypervisor would follow suit. Notice again, though, that The Machine is not a fully cache-coherent shared-memory system; communication between each node's portion of The Machine's hypervisor is possible, with some effort, via the fabric memory, atomic storage, and, of course, more traditional communications mechanisms. I can imagine that the design of such partially distributed management is not trivial.

Yes, the hypervisor could be completely distributed, sharing parts of processors and memory, persistent and otherwise, on every node. But I am getting some hints from a number of sources that at least the global file manager is not. The files' data can potentially reside anywhere in Fabric Memory, but the metadata, the description of the files and their locations, appears to be in a separate system, one at the top of the rack, quickly accessible to and from every other node of the system.

Memory Addressing And Security In HPE's Future Machine


So there we have it, as detailed in the first four parts of this series: The Machine from Hewlett Packard Enterprise, with scads of compute power and petabytes of persistent memory, all physically accessible by any of the processors, all that data ripe for rapid picking. The folks at HPE have it right when they speak of the need for integrated, tight security throughout the system. Much of the enablement for such security arises from the various forms of addressing found in such a system.

We have already seen how the lowest level of addressing, real addressing, is protected on The Machine in ways not found on most systems. Each node's processors can have their real address space extend well outside of its own node's memory, well into the fabric memory of any number of other nodes. But system integrity and data security require that each node's real address space be allowed to extend to only that fabric memory containing data which that node has the right to access. Given an OS instance (a partition) per node, this also means that each partition has access to only allowed portions of memory. It is like a new level of hypervisor-managed security; only the hypervisor is trusted enough to provide the needed real-to-physical location mapping.

Still, most code doesn't ever work directly with real addresses. Far and away, most code works with virtual addresses, which the processor hardware securely maps onto these real addresses. That notion has been around forever, but it is because of that mapping that the security understood from both processor virtualization and process isolation happens to work. We will talk about that in more detail shortly, but with many forms of processor virtualization, each operating system and each partition is allowed by the hypervisor and the hardware to access only that partition's own part of the physical memory. The hypervisor owns the real address space(s) and only allows a partition's virtual address space to be mapped onto portions of the real address space reserved for that partition. System security and integrity are enabled by such address mapping.
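
As a conceptual sketch of that division of trust (the interface below is invented for illustration and is not any real hypervisor's API), a partition asks for a virtual-to-real mapping and the hypervisor installs it only when the requested real page lies within that partition's allotment:

    /* Conceptual only: not a real hypervisor interface. */
    #include <stdbool.h>
    #include <stdint.h>

    struct partition {
        int id;
        uint64_t ra_base_page, ra_page_count;   /* real pages reserved for this partition */
    };

    /* Stand-in for the real work of writing a hardware translation entry. */
    static void install_translation(int part_id, uint64_t va_page, uint64_t ra_page)
    {
        (void)part_id; (void)va_page; (void)ra_page;
    }

    /* Map va_page -> ra_page for partition 'p' only if ra_page lies inside the
     * range the hypervisor reserved for that partition; otherwise refuse.      */
    static bool hv_map_page(const struct partition *p, uint64_t va_page, uint64_t ra_page)
    {
        if (ra_page < p->ra_base_page || ra_page >= p->ra_base_page + p->ra_page_count)
            return false;                       /* outside the partition's allotment: denied */
        install_translation(p->id, va_page, ra_page);
        return true;
    }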

Processes within each operating system are similarly isolated, and therefore secure, because of higher-level addressing. Normally, even for the many processes within the same operating system, the data owned by one process is not accessible by another. Such addresses are effectively process-local: a range of addresses associated only with the program code executing on behalf of a process. When a user-level program is using an address, it is these types of addresses which are being used (and certainly not real addresses).

We intend to speak to this notion of addressing as it relates to The Machine in a moment. But, just to be sure that we are first all on the same page, and partly because we have not found a sufficient explanation elsewhere, allow us to quickly outline the relationship between this process-local addressing, real addressing, and physical memory. Largely because we find the terminology cleaner, we are going to use terms found in the Power processor architecture from IBM, but the concepts are applicable almost everywhere.

As mentioned, user-level programs do not use real addresses. Instead, to ensure isolation between processes, each process is given its own process-local address space. The Power architecture calls this entire address space an effective address space. Again, each process (if you like, each program) is given its own effective address (or EA) space. For instance, you have undoubtedly heard of 32-bit or 64-bit systems. These typically also mean that the size of the effective address space is 32 bits (2^32 bytes = 4 gibibytes) or 64 bits (2^64 bytes = 16 exbibytes, a lot bigger than physical memory). This is what the instructions of a program perceive; it is not an address into physical memory nor a real address. Being process-local, if a Process B attempts to use an EA value produced by a Process A, that EA value will typically mean something completely different (if it means anything at all). In effect, one process cannot normally address into the memory used by another. This is process isolation, whether those processes (those programs) reside in the same operating system partition or in different partitions.
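
Nothing Machine-specific is needed to see this isolation. On any POSIX system today, a fork() leaves parent and child holding the very same effective address, yet a store by one is invisible to the other:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int *value = malloc(sizeof *value);   /* an address in this process's EA space */
        *value = 1;

        pid_t pid = fork();                   /* child receives a copy of the EA space */
        if (pid == 0) {
            *value = 2;                       /* same pointer value, but the write lands
                                                 in the child's own memory */
            printf("child : %p -> %d\n", (void *)value, *value);
            exit(0);
        }
        wait(NULL);
        printf("parent: %p -> %d\n", (void *)value, *value);   /* still 1: isolation held */
        return 0;
    }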

These effective addresses should ultimately represent some data residing in memory; somehow, when the processor's instructions use an EA, it means that memory is to be accessed. Even though EAs do isolate processes from each other, there is often good reason for multiple processes to access the same data at the same memory locations. This is certainly true for processes sharing the same partition, but it can also be true for processes in different partitions sharing the same physical memory. And this last, this shared physical memory, happens to be fabric memory in The Machine. (Even though different nodes can map portions of their real address space onto the same physical location in fabric memory, no user-level program uses real addresses; we have yet to describe a way for such programs to use their EAs to share fabric memory.)

To allow such inter-process sharing, the Power architecture includes a notion called a virtual address (VA). For two processes, each with its own process-local EAs, to share the same memory location, the operating system typically, and securely, arranges for the hardware to translate each process's EAs (which are different values) to the same VA value. In the same way that an effective address space is a contiguous address space for a process, the virtual address space tends to be a single contiguous address space for the whole of an OS, one scoped to that OS only.
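
On today's POSIX systems this arrangement shows up as ordinary shared memory: two processes map the same object, possibly at different effective addresses, and their loads and stores reach the same bytes. A minimal sketch (error handling omitted; older Linux systems need -lrt at link time):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* One shared object, mapped by both parent and child; the two mappings
         * may land at different EAs but resolve to the same underlying memory. */
        int fd = shm_open("/demo_shared", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, 4096);

        if (fork() == 0) {                    /* child: writer */
            char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
            strcpy(p, "hello from another process");
            return 0;
        }
        wait(NULL);                           /* parent: reader */
        char *q = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        printf("%s\n", q);                    /* sees the child's write */
        shm_unlink("/demo_shared");
        return 0;
    }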

Whether for inter-process sharing or not, if a process attempts to use an EA value which was not mapped to a VA, this is often considered an addressing exception, perhaps even a violation of security. Upon such exceptions, the OS either securely determines what the legal mapping should have been, or blows the process away for attempting to violate the system's security.

It is these virtual addresses that the processor hardware maps to real addresses (RAs), the OS having decided which virtual address pages are mapped to which real address pages. Although there are variations on this theme, for now think of a page as being 4096 (2^12) contiguous bytes, each page starting on a 4096-byte boundary. This mapping allows pages in an OS's contiguous virtual address space, an address space which is typically far bigger than a system's physical memory, to be mapped onto arbitrary pages in a real address space. It is useful to think, although not sufficiently true, of each byte of the real space as representing a unique byte in physical memory. This also tends to mean that there is a one-to-one relationship between each RA and a byte in physical memory. You have seen that this is not always the case for The Machine.
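
The page granularity is just arithmetic: with 4,096-byte pages, the low 12 bits of an address are the offset within its page and the remaining bits select the page. A quick illustration:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12                        /* 2^12 = 4096-byte pages */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)

    int main(void)
    {
        uint64_t va     = 0x7f3a12345678;        /* any address           */
        uint64_t page   = va >> PAGE_SHIFT;      /* which 4 KiB page      */
        uint64_t offset = va & (PAGE_SIZE - 1);  /* byte within that page */
        printf("page 0x%llx, offset 0x%llx\n",
               (unsigned long long)page, (unsigned long long)offset);
        return 0;
    }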

On The Machine, real address spaces are scoped to a node:

Part of a node's RA space refers to the node's local DRAM.
Part of a node's RA space refers to mapped locations in Fabric Memory, where that Fabric Memory can potentially reside on any node.

Where an unmapped EA results in an exception when used, an unmapped VA can identify an access violation, but it more likely, on today's systems, represents a virtual page which is not at this moment in physical memory. It is a page fault; the OS goes out and finds the needed page, often in backing persistent storage, and brings that page into DRAM. Recall that on most of today's systems, the smaller DRAM is also a cache of data out on disk. What's cool about The Machine in this context is that that very same data is already in memory, fabric memory. Certainly the contents of fabric memory could be paged into (i.e., directly copied into) a node's DRAM and accessed from there, but often it is just as straightforward simply to map a VA page onto an RA page of something already in fabric memory. That being the case, the delay associated with what today's systems consider completely normal page faults would seem, on The Machine, to become non-existent. Too cool. And these mappings only occur if all parties essentially agree that this process has the right to access this data.

Each memory-accessing instruction of a program, and this includes the program's instruction stream as well, completes a memory access by having the processor hardware translate each EA to a VA and then to an RA and, ultimately, to a physical memory location. All this has been set up and managed to enable the access and sharing of only that which the process and its OS are allowed to access. And, for reasons that we won't get into here, the hardware handles this entire address translation process, once set up, very rapidly. And it is all an integrated part of system security. Level upon level of security.
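
As a toy model of that translation chain (real processors use segment tables, TLBs, and radix or hashed page tables; the flat tables and names below are purely illustrative), the EA-to-VA-to-RA walk looks roughly like this:

    /* Toy model of the EA -> VA -> RA walk; not any real processor's format. */
    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT  12
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    struct mapping { uint64_t from_page, to_page; bool valid; };

    /* Look up 'page' in a tiny linear table; a miss is what the OS would see
     * as a page fault or an access violation.                                */
    static bool lookup(const struct mapping *tbl, int n, uint64_t page, uint64_t *out)
    {
        for (int i = 0; i < n; i++)
            if (tbl[i].valid && tbl[i].from_page == page) {
                *out = tbl[i].to_page;
                return true;
            }
        return false;
    }

    /* EA -> VA (set up by the OS for the process), then VA -> RA (set up by the
     * OS or hypervisor); the byte offset within the page is carried through.   */
    static bool translate(const struct mapping *ea_to_va, int n1,
                          const struct mapping *va_to_ra, int n2,
                          uint64_t ea, uint64_t *ra)
    {
        uint64_t va_page, ra_page;
        if (!lookup(ea_to_va, n1, ea >> PAGE_SHIFT, &va_page)) return false; /* EA not mapped */
        if (!lookup(va_to_ra, n2, va_page, &ra_page))          return false; /* VA not backed */
        *ra = (ra_page << PAGE_SHIFT) | (ea & OFFSET_MASK);
        return true;
    }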

Let's pause here for a moment and consider two forms of virtualization and how these notions get used. In both, there is a requirement for isolation; how that isolation is provided, based on the above, is what we will be covering.

First: One form of virtualization, one now generally well understood, allows multiple OS instances, partitions, to all share an SMP's processors and memory (at least). Each partition perceives, functionally, that it is alone in executing on this system. The partition is guaranteed by a common hypervisor that it has some amount of physical memory that belongs exclusively to that partition; no other partition perceives, and so cannot change, that memory. This is typically accomplished by having the hypervisor take control of the system's real address space. Each partition, with its own VA space, requests the hypervisor to map the partition's VA pages onto RA pages owned only by that partition.

Because of this, the partition does what the partition does, whether it is sharing the physical memory with other partitions or whether it is executing alone on a system, say as part of a distributed cluster. Either way, its data is isolated. It is secure either because of a trusted hypervisor managing the one SMP's physical memory or because, as in distributed-memory clusters, the memory really is separate and isolated.

Notice that The Machine hardware has aspects of both of these. Given one OS instance, a partition, per node, the DRAM really is physically isolated. It is the fabric memory which is physically shareable. But because The Machine can carve up the fabric memory on a nodal basis, by mapping each node's real address space uniquely over segments of fabric memory, even fabric memory can be perceived as isolated at this low level. Still, multiple partitions could share the same node, in which case the isolation would be provided by such virtual-to-real address mapping.

Second: A more recent form of virtualization is that provided by containers. I'll assume here the form of containers which share a single OS. Again, aside from the programming development advantages, much of the intent of containers is to provide an environment which the program(s) can perceive as isolated from any other container, even if they do share the same OS. They do this with the starting knowledge that ordinarily each process has its own private process-local address space (an EA space). As long as the OS does its part in assuring that each process maps only to private portions of an OS VA space (and that mapping goes to unique RA pages), there can be no sharing between the processes. Sets of processes, in many cases programs, reside in containers. If the processes residing in one container are never allowed to share data with the processes of another container, the containers too remain isolated from each other.


But it is not quite that simple, and this is good. Because containers share the same OS, and so potentially parts of the operating system's total name space, there is the potential for, indeed the desire for, sharing. Read-only sharing of the common operating system programs and other files may often be good precisely because that content is common and sharable. Unlike the previous form of virtualization, where each partition is absolutely isolated and each partition has a complete OS image, sharing, where desirable and well managed, is possible. The common data and programs reside only once each in physical memory and so potentially at a single real address. Notice also that containers may have portions of their name space that are common between them, but it is also true that other portions of each container's name space will be different. If a container, and so all of its processes, has no name for something, those processes will also not be provided an address (of any form) to that something.

So, for The Machine, let's assume a single OS image residing there within fabric memory. This OS is somehow capable of distributed management of all of the processors and node-private volatile memory throughout the system. For performance reasons, let's say that read-only copies, or even node-private versions, of portions of this OS are copied into each node's DRAM. Additionally, though, this OS decides where, amongst The Machine's many nodes, each container is to reside, providing the container's processes resources from there. As before, a single container per node is guaranteed isolation of any data in its own local DRAM, because there is no physical means by which another container (on a different node) could access another node's DRAM. Potentially undesirable sharing, though, is possible for containers sharing the same node or in fabric memory. Because of The Machine's real-to-physical mapping of fabric memory, containers on different nodes could only have shared data in fabric memory if the trusted global hypervisor (within the OS) had allowed such shared access; if something in fabric memory were private to a container, no other node would have been provided a real address mapping over that private data.

We could go on for quite a while, but we are sure you are seeing the essence of this. Addressing,
appropriately controlled at the right level, can provide both the desired isolation and secure
sharing.

Let's look at another case in point, memory-mapped files, a concept used to speed access of a file's contents. (We will also be looking in the next section at non-volatile heaps, a rather new concept.)

In typical file I/O, a file first resides in persistent storage, say out on disk. A program could, and often does, open a file, access on disk those portions of that file needed by the program, and in doing so map those accessed bytes into the process's EA space, the OS's VA space, and, after reading the file from disk, into the DRAM's RA space. This is done repeatedly, with such repeated remapping taking time.
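
In today's terms, that repeated path is the familiar read() loop below (ordinary POSIX, nothing specific to The Machine): each call copies bytes from the DRAM page cache, itself filled from disk, into the process's own buffer.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[4096];
        int fd = open("data.bin", O_RDONLY);    /* file lives out on persistent storage */
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            /* Each read() copies bytes from disk (via the DRAM page cache) into
             * this process's buffer; re-reading repeats the copy.              */
            fwrite(buf, 1, (size_t)n, stdout);
        }
        close(fd);
        return 0;
    }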

An often-used performance enhancement on that notion, memory-mapped files, maps all or portions of the file into a process's address space, whether accessed or not, whether in physical DRAM or not. Each byte of the file is known by a unique EA. As each EA is used, the accessed portion of the file is read off of disk and into the DRAM, or, if already address-mapped into DRAM, accessed directly from the DRAM itself. As long as the file remains open in the process, that unique mapping can continue. The file's bytes can be repeatedly accessed using the same EAs, whether those bytes are at that time in memory or not. If not, the disk space where the needed file bytes reside is accessed and brought into memory, under the corresponding VAs, remaining still under the relatively persistent EAs.
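
The memory-mapped alternative, again ordinary POSIX: mmap() gives every byte of the file an EA up front, and pages are brought in from disk only as they are first touched.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);
        struct stat st;
        fstat(fd, &st);

        /* Every byte of the file now has a fixed EA; a page is faulted in from
         * disk only when first touched. On The Machine the backing bytes would
         * already sit in fabric memory, so no copy into DRAM would be required. */
        const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

        long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += data[i];                     /* plain loads; no read() calls */
        printf("checksum: %ld\n", sum);

        munmap((void *)data, (size_t)st.st_size);
        close(fd);
        return 0;
    }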

That is the situation today. What of The Machine in the future? You likely know where we are leading with this. In today's systems, the memory that we are talking about is the DRAM; the file was read into DRAM because the processor can only access the DRAM (and EAs are translated to DRAM-based RAs). In The Machine, that same file can be directly accessed from a processor without additional copying. All that is required is that the process doing the accessing have the right to do so. With that right, a memory-mapped file is enabled for access merely by setting up the addressing to it. Done and done.

But why stop there? In IBM i, Big Blue's proprietary operating system for Power-based machines, which has a single-level storage architecture that dates back to the late 1970s, every byte of the persistently held data on disk is known by a persistent virtual address. It does not matter whether the OS is active (powered on) or not; each byte on disk has its own virtual address, and this is the same virtual address which is used to access that byte when it happens to also reside in DRAM and gets accessed by a processor. Although every byte on disk is known by its unique virtual address, you know that the processor can't access that byte until it has been read into the DRAM. In this concept, the memory, the DRAM, is essentially a volatile cache of the persistently addressed data which resides on disk. The virtually addressed bytes that sit on disk within IBM i are, instead, in The Machine, residing in fabric memory. DRAM need not act as a cache, although it could, since the fabric memory can already be accessed by the processor, given the appropriate address mapping. How such a persistent and globally shared virtual address space is securely managed in the IBM i OS is a subject for another article; we will note, though, that it is done well via a form of capability addressing.

Related Items
Drilling Down Into The Machine From HPE

The Intertwining Of Memory And Performance Of HPE's Machine

Weaving Together The Machine's Fabric Memory

The Bits And Bytes Of The Machine's Storage

Future Systems: How HP Will Adapt The Machine To HPC


Categories: Compute, Enterprise, Store

Tags: HPE, The Machine
