US 8,522,253 B1
Aug. 27, 2013
References Cited — U.S. Patent Documents:
4,816,991 A 3/1989 Watanabe et al.
4,920,477 A 4/1990 Colwell et al.
5,294,897 A 3/1994 Notani et al.
5,574,878 A 11/1996 Onodera et al.
5,754,818 A * 5/1998 Mohamed .................. 711/207
5,872,985 A *
6,075,938 A
6,167,490 A
6,260,131 B1
6,604,187 B1 8/2003 McGrath et al.
6,907,600 B2
7,111,145 B1
7,278,030 B1
2004/0064668 A1
2004/0117593 A1
OTHER PUBLICATIONS
Barham et al., "Xen and the Art of Virtualization," ACM Symposium on Operating Systems Principles.
(51) Int. Cl.
G06F 9/46 (2006.01)
G06F 9/26 (2006.01)
(52) U.S. Cl.
* cited by examiner
Primary Examiner: Camquy Truong
(57) ABSTRACT

A method for tagging cache entries to support context switching for virtual machines and for operating systems. The method includes storing a plurality of entries within a cache of a CPU of a computer system, wherein each of the entries includes a context ID, handling a first portion of the entries as local entries when the respective context IDs indicate a local status, and handling a second portion of the entries as global entries when the respective context IDs indicate a global status.
4,524,415 A 6/1985 Mills, Jr. et al.
4,527,237 A 7/1985 Frieder et al.
4,577,273 A
[Front-page drawing: flowchart excerpt: execute a plurality of applications within each virtual machine; perform virtual address to physical address translation for each of the virtual machines using host machine hardware support; store a plurality of TLB entries for each of the virtual machines, wherein one or more of the entries are global across the applications of a virtual machine; store a plurality of TLB entries, wherein one or more of the entries are global across multiple virtual machines.]
US. Patent
Sheet10f8
US 8,522,253 B1
_"I
_________1
VM1
VM 2
l l l I
' APP
I l l l
APP I
I l I l
l APP
| l I l
APP I
:
l I I l
1_2
os1
I i
I | | I
132
osz
I
l l l I I
12_1
l | l I
_________ ___..|
I___________
HYPERVISOR
1_42
HOST MACHINE
FIGURE 1
[Sheet 2 of 8, FIG. 2: entries 200, each comprising a virtual address tag, payload, global indicator, CM global indicator, context ID, and valid indicator.]
[Sheet 3 of 8, FIG. 3: TLB or cache match detection hardware 300.]
[Sheet 4 of 8, FIG. 4: computer system 400: CPU coupled to a North bridge (system memory, graphics, display) and a South bridge (disk, user I/O).]
[Sheet 5 of 8, FIG. 5: flowchart of TLB-based process 500: perform virtual address to physical address translation for each of the virtual machines using host machine hardware support; store a plurality of TLB entries for each of the virtual machines, wherein one or more of the entries are global across the applications of a virtual machine (505); store a plurality of TLB entries, wherein one or more of the entries are global across multiple virtual machines (506).]
[Sheet 6 of 8, FIG. 6: flowchart of process 600: provide virtual address to physical address translation for each of the applications and for the operating system using host machine hardware support (603); store a plurality of cache entries within a virtually tagged cache for each of the applications and for the operating system using host machine hardware support (604).]
[Sheet 7 of 8, FIG. 7: TLB or cache match detection hardware 700.]
[Sheet 8 of 8, FIG. 8: TLB or cache match detection hardware 800.]
HARDWARE SUPPORT FOR VIRTUAL MACHINE AND OPERATING SYSTEM CONTEXT SWITCHING IN TRANSLATION LOOKASIDE BUFFERS AND VIRTUALLY TAGGED CACHES

This application is a Continuation in Part of U.S. application Ser. No. 11/096,922, now U.S. Pat. No. 7,734,892, filed on Mar. 31, 2005, to Rozas et al., entitled "MEMORY PROTECTION AND ADDRESS TRANSLATION HARDWARE SUPPORT FOR VIRTUAL MACHINES," which is incorporated herein.

The present invention relates generally to digital computer systems. More specifically, the present invention pertains to efficiently implementing hardware support for virtual machine and operating system context switching in TLBs and virtually tagged caches.

Context switching is the process of switching a CPU among multiple processes (e.g., storing and restoring each process's execution context) such that multiple processes can share a single CPU resource. World switching is the process of switching between two or more worlds of a virtual machine architecture. With respect to TLBs and virtually tagged caches, with both context switching and world switching, the computer system needs to flush or tag the TLB/cache. In each case, what is required is a solution for efficiently supporting global pages, both operating system global pages and virtual machine global pages.

DISCLOSURE OF THE INVENTION

Embodiments of the present invention provide a method and system for implementing hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a diagram of a virtual machine based computer system in accordance with one embodiment of the present invention.
FIG. 2 shows a diagram of a plurality of entries of a TLB (translation lookaside buffer) in accordance with one embodiment of the present invention.
FIG. 3 shows a diagram depicting TLB or cache matching hardware in accordance with one embodiment of the present invention.
FIG. 4 shows a diagram of a computer system platform in accordance with one embodiment of the present invention.
FIG. 5 shows a flowchart of the steps of an exemplary TLB-based process in accordance with one embodiment of the present invention.
FIG. 6 shows a flowchart of the steps of an exemplary virtually tagged cache-based process in accordance with one embodiment of the present invention.
FIG. 7 shows a diagram depicting TLB or cache matching hardware in accordance with one embodiment of the present invention.
FIG. 8 shows a diagram depicting TLB or cache matching hardware in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with the preferred embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. It will be recognized by one of ordinary skill in the art that the present invention may be practiced without the specific details set forth herein. In other instances, well-known methods, procedures, and components have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.

Embodiments of the present invention implement a method and system for providing hardware support for virtual machine and operating system context switching in translation lookaside buffers and virtually tagged caches. For example, in one embodiment, the present invention is implemented as a process for providing hardware support for memory protection and virtual memory address translation for a virtual machine, implemented by executing a host machine application on a host computer system, executing a first operating system within a first virtual machine, and executing a second operating system within a second virtual machine. The first and second operating systems support applications executing on the computer system. The TLB or cache entries each include respective context IDs. This unique identification prevents virtual address aliasing problems between the applications of the different operating systems of the different virtual machines, and prevents unnecessary flushes of the TLB or cache entries. Embodiments of the present invention and their benefits are further described below.
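The tagging idea summarized above can be sketched in software as follows. This is a hypothetical illustration only: the patent describes hardware structures, and every name in this sketch is invented.

```python
from dataclasses import dataclass

# Invented software stand-in for a tagged TLB/cache entry.
@dataclass
class TaggedEntry:
    vtag: int        # virtual address tag
    payload: int     # e.g., a translated physical page number
    context_id: int  # identifies the owning context
    is_global: bool  # global entries are shared across contexts
    valid: bool = True

def visible_entries(entries, current_context):
    """On a context switch nothing is flushed: global entries stay
    visible to every context, local entries only to their owner."""
    return [e for e in entries if e.valid and
            (e.is_global or e.context_id == current_context)]
```

After switching from one context to another, the local entries of the old context simply stop matching; they are not invalidated, so they are still there if that context runs again.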
Notation and Nomenclature

Some portions of the detailed descriptions which follow are presented in terms of procedures, steps, logic blocks, processes, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "storing" or "accessing" or "recognizing" or "retrieving" or "translating" or the like, refer to the action and processes of a computer system (e.g., system 400 of FIG. 4), or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

FIG. 1 shows a diagram of a virtual machine based computer system 100 in accordance with one embodiment of the present invention. As depicted in FIG. 1, the virtual machines 101-102 execute on top of a hypervisor 140, which in turn executes on the host computer system 150, often referred to as the bare metal machine. It should be noted that embodiments of the present invention are also applicable to non-virtual machine related architectures.

The hypervisor 140 functions as a hardware abstraction layer that controls physical resources and firmware resources (e.g., BIOS) of the real computer system hardware (e.g., motherboard, IO devices, peripherals, etc.). Only the hypervisor 140 is exposed to the bare machine interface of the computer system 100 (e.g., host machine 150). Thus, only the hypervisor 140 comprises the true kernel having final control of the host machine 150. Alternatively, in one embodiment, the hypervisor 140 can be implemented as an application that itself runs on top of a host operating system and uses the host operating system (e.g., and its kernel) to interface with the host machine 150.

The hypervisor presents programs executing within the virtual machines 101-102 with the illusion that they are in fact executing upon a real physical computer system (e.g., bare metal). The programs executing within the virtual machines 101-102 can themselves be operating systems which further support other application programs, such as the operating systems 121 and 131 and the application programs 125-126 and 135-136. In general, all of these programs need to be provided, or otherwise negotiate, the consent of the hypervisor 140 in order to access the functionality of the host machine 150 to perform, for example, user IO, and the like.

The computer system embodiment 100 shows a special case where the virtual machines 101-102 each execute their own operating system under the illusion of executing upon a real physical computer system. Accordingly, the operating systems 121 and 131 will function most efficiently when provided support for their own respective virtual memory management systems. These systems assume the presence of a dedicated host machine TLB and cache. Embodiments of the present invention function in part by reducing the expense of flushing the TLB/cache on world switches (e.g., between virtual machines) and context switches (e.g., between applications on an OS).

FIG. 2 shows a diagram of a plurality of entries 200 of a TLB or cache in accordance with one embodiment of the present invention. The entries 200 are typical entries that are cached within the CPU of the host machine 150. Each of the entries comprises a virtual address tag, a payload, an optional global indicator, a context identifier, and a validity indicator, as shown. An optional cross-machine global indicator can also be included as shown.

In accordance with embodiments of the present invention, the entries 200 are tagged such that the TLB/cache of the host machine 150 can include entries for different applications executing within different contexts. In the FIG. 2 embodiment, the entries 200 are tagged with respect to their context ID so that upon a switch from one context to another, the TLB/cache does not have to be completely invalidated (e.g., flushed). In this manner, the context ID allows entries from different contexts to coexist. Thus, for a valid entry match, each TLB/cache lookup must match not only the virtual address and valid bits, but also match the context-ID, in order to be successful. In addition to using context identifiers, the entries 200 can also be tagged with the global indicators, as described below.
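The lookup rule just described (match on virtual address tag, valid bit, and context-ID) can be rendered as a small sketch. The dictionary representation and names are invented; this is a software stand-in for the hardware comparison.

```python
def tlb_lookup(entries, vtag, current_context_id):
    """Sketch of the FIG. 2 lookup rule: a hit requires a valid
    entry whose virtual address tag AND context ID both match.
    Entries from other contexts simply miss, so a context switch
    needs no flush. Entries are dicts with 'vtag', 'valid',
    'context_id', and 'payload' keys (invented representation)."""
    for e in entries:
        if (e["valid"] and e["vtag"] == vtag
                and e["context_id"] == current_context_id):
            return e["payload"]
    return None  # miss: the entry would be filled from the page tables
```

The global and cross-machine global indicators shown in FIG. 2 would relax the context-ID comparison; that refinement is discussed with FIG. 3.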
FIG. 3 shows TLB or cache matching hardware 300 in accordance with one embodiment of the present invention. The hardware 300 shows the logical operation of match detection for tagged entries (e.g., if only visible micro-architecturally). In the present embodiment, the context-ID register is extended to have two fields, namely the per-process context-ID field (e.g., App Context ID 301), and the machine-instantiation/virtual machine context-ID field (e.g., VM ID 302). The virtual address tag 303 functions conventionally.

As the logic depicted in FIG. 3 shows, a match 310 occurs only when the virtual address tags match and the TLB or cache entry is valid. If the TLB or cache entry has the global bit clear (e.g., marked not global), the entry's context-ID field must match the per-process application context-ID field 301 of the context-ID register. If the TLB or cache entry has the global bit set, the per-process context-ID comparison is not required. The context ID field of a TLB or cache entry is filled when the entry is inserted into the TLB or cache.

As described above, in one embodiment, TLB or cache entries can be marked as global across multiple virtual machines. In such an embodiment, an additional indicator/bit is used, managed through interaction with the virtualization layer/hypervisor (e.g., hypervisor 140). The logic depicted in FIG. 3 can be modified to incorporate the CM global ID 320 as shown by the dotted lines. Fast hardware based context switching support would thus be provided to all virtual machines needing access to the entries that are global across the virtual machines.

With reference now to FIG. 4, a computer system 400 in accordance with one embodiment of the present invention is shown. Computer system 400 provides the execution platform for implementing certain software-based functionality of the present invention. As described above, certain processes and steps of the present invention are realized, in one embodiment, as a series of instructions (e.g., a software program) that reside within computer readable memory units of a computer system (e.g., system 400) and are executed by the CPU 401 of system 400. When executed, the instructions cause the system 400 to implement the functionality of the present invention as described above. In general, system 400 comprises at least one CPU 401 coupled to a North bridge 402 and a South bridge 403. The North bridge 402 provides access to system memory 415 and a graphics unit 410 that drives a display 411. The South bridge 403 provides access to a coupled disk drive 431 and various user I/O devices 433 (e.g., keyboard, mouse, etc.) as shown.

Referring now to FIG. 5 and FIG. 6, FIG. 5 shows a flowchart of a TLB-based process 500 supporting a plurality of virtual machines, and FIG. 6 shows a flowchart of a virtually tagged cache-based process 600 supporting a plurality of applications executing upon a common operating system, each in accordance with one embodiment of the present invention.

As depicted in FIG. 5, process 500 begins in step 501, where a virtualization layer (e.g., hypervisor 140) is instantiated on the host machine. In step 502, multiple virtual machines (e.g., virtual machines 101 and 102) are instantiated on top of the virtualization layer. In step 503, a plurality of applications are executed within each virtual machine (e.g., applications 125 and 126). In step 504, virtual address to physical address translation is performed for each of the virtual machines using host machine hardware support. In step 505, a plurality of TLB entries for each of the virtual machines are stored using host machine hardware support, wherein one or more of the entries are global across the applications of one of the virtual machines. As described above, embodiments of the present invention can support global access to a TLB entry by all applications of one of the virtual machines. In step 506, a plurality of TLB entries for each of the virtual machines are stored using host machine hardware support, wherein one or more of the entries are global across multiple virtual machines.
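The FIG. 3 match logic described above can be summarized as a predicate. This is a software sketch with invented names; in particular, the exact way the global and CM global indicators gate the two comparisons is my reading of the damaged scan and should be treated as an assumption.

```python
def match_310(entry, vtag, app_context_id, vm_id):
    """Sketch of match 310: the tags must match and the entry must
    be valid; a non-global entry must also match the per-process
    context-ID field (301), and the VM ID field (302) must match
    unless the entry is marked cross-machine (CM) global.
    Entries are dicts (invented representation)."""
    if not entry["valid"] or entry["vtag"] != vtag:
        return False
    ctx_ok = entry["global"] or entry["context_id"] == app_context_id
    vm_ok = entry["cm_global"] or entry["vm_id"] == vm_id
    return ctx_ok and vm_ok
```

With this predicate, an OS-global entry (global bit set) hits for every application of its virtual machine, while a CM-global entry hits across virtual machines as well.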
As described above, embodiments of the present invention can support global access to TLB entries across all virtual machines, and thus all virtual machine operating systems/applications.

As depicted in FIG. 6, process 600 begins in step 601, where an operating system (e.g., operating system 131) executes directly on a host machine (e.g., directly on the bare metal host machine 150). In step 602, a plurality of applications (e.g., applications 135-136) are executed on top of the operating system. In step 603, virtual address to physical address translation is provided for each of the applications and for the operating system using host machine hardware support. In step 604, a plurality of cache entries are stored within a virtually tagged cache for each of the applications and for the operating system using host machine hardware support. One or more of these entries are tagged such that they are global across the applications and across the operating system, and one or more of these entries are tagged such that they are local, specific to one of the applications.

FIG. 7 shows TLB or cache matching hardware 700 in accordance with one embodiment of the present invention. As depicted in FIG. 7, the hardware 700 is substantially similar to the hardware 300 described above in the discussion of FIG. 3. However, the hardware 700 embodiment utilizes an OR gate 701 to logically combine the comparisons of the virtual machine ID and the context ID, as shown, and also deletes the use of the global bit (e.g., shown in FIG. 3). In the hardware 700 embodiment, virtual address tags can be invalidated on either a local scale or a global scale, even though the global bit is removed. This results in a significant simplification of the constituent logic while still providing the same functionality as the hardware 300 embodiment of FIG. 3.

The hardware 700 embodiment functions through the use of a management process that allocates space for the virtual machine IDs 302 and the context IDs 301 in such a manner that they do not overlap. For example, in a case where a total of N bits comprise both the virtual machine ID 302 and the context ID 301, the N bits define a numerical range of 2^N values (e.g., the integers 0 through 2^N-1), and the management process can allocate up to 2^N IDs, either global or local, such that they do not overlap. Thus, in an exemplary case where a total of 8 bits comprise both the virtual machine ID 302 and the context ID 301, the 8 bits define a range of 256 values within which the local IDs and the global IDs (e.g., for global processes) are tracked. Continuing the above example, in one embodiment, the range of numbers can be tracked such that some portion of the range (e.g., the numbers 0 through 200) are for local IDs and a remaining portion of the range (e.g., the numbers 201 through 255) are for global IDs, or the like.

It should be noted that the local IDs and the global IDs do not need to be allocated in the same proportion. For example, if a larger proportion of local IDs are needed in comparison to global IDs, the 2^N range can be allocated such that a larger proportion of local IDs are allocated in comparison to global IDs. In one embodiment, a separate data structure (e.g., a table, etc.) can be maintained to keep track of which numbers of the range are global IDs and which numbers are local IDs. In such an embodiment, global IDs and local IDs can be allocated and changed dynamically depending upon the changing needs of a given application (e.g., on-the-fly). This process will work so long as the local IDs and the global IDs do not overlap.

FIG. 8 shows TLB or cache matching hardware 800 in accordance with one embodiment of the present invention. As depicted in FIG. 8, the hardware 800 is substantially similar to the hardware 700 described above in the discussion of FIG. 7. However, the hardware 800 embodiment supports machines that use different virtual address to physical address translations for code versus data. In such machines, virtual machine IDs and context IDs can be tracked with respect to both code and data. In the present embodiment, multiplexers 811-812 are used to select between code and data references. In general, embodiments of the present invention can implement X number of choices for each of VM ID and App context ID, where each choice represents, for example, different translation domains. In the FIG. 8 embodiment, there are two choices, one for code and one for data. The selection as to whether a reference is for code or data is made earlier in the pipeline, and the multiplexers select the virtual machine ID 302 and the context ID 301 for a code access, or the virtual machine ID 802 and the context ID 801 for a data access. In other respects, the hardware 800 embodiment functions substantially the same as the hardware 700 embodiment described above.

The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

What is claimed is:
1. A method comprising:
storing a selecting value, a first context ID, and a second context ID in at least one cache entry of cache entries in
a cache;
storing a virtual address tag in the cache entries; and
searching the cache entries, wherein said searching comprises:
comparing a first context ID of a cache entry with a first provided context ID and generating a first match signal;
comparing a second context ID of the cache entry with a second provided context ID and generating a second match signal;
selecting a match signal from the first match signal and the second match signal based on the selecting value of the cache entry;
comparing the virtual address tag of the cache entry with a provided virtual address tag; and
generating a third match signal.
2. The method of claim 1, wherein said searching further comprises:
performing a logical operation with the match signal, the third match signal, and a valid bit of the entry; and
generating an output match signal.
3. A method comprising:
storing a first context ID and a second context ID in at least one cache entry of cache entries in a cache;
storing a virtual address tag in the cache entries; and
searching the cache entries, wherein said searching comprises:
comparing a first context ID of a cache entry with a first provided context ID and generating a first match signal;
comparing a second context ID of the cache entry with a second provided context ID and generating a second match signal;
setting a match signal to a match state if at least one of the first match signal and the second match signal is in a match state;
comparing the virtual address tag of the cache entry with a provided virtual address tag; and
generating a third match signal.
4. The method of claim 3, wherein said searching further comprises:
performing a logical operation with the match signal, the third match signal, and a valid bit of the cache entry; and
generating an output match signal.
5. A method comprising:
storing a first context ID and a second context ID in at least one entry of entries in a translation lookaside buffer (TLB);
storing a virtual address tag in the entries; and
searching the entries, wherein said searching comprises:
comparing a first context ID of an entry with a first provided context ID and generating a first match signal;
comparing a second context ID of the entry with a second provided context ID and generating a second match signal; and
setting a match signal to a match state if at least one of the first match signal and the second match signal is in a match state.
6. The method of claim 5, wherein said searching further comprises:
using a signal to select the first provided context ID and the second provided context ID from a plurality of context ID registers.
7. The method of claim 6, wherein at least one of the plurality of context ID registers stores a data context.
8. The method of claim 6, wherein at least one of the plurality of context ID registers stores a code context.
9. The method of claim 5, wherein said searching further comprises:
comparing the virtual address tag of the entry with a provided virtual address tag; and
generating a third match signal.
10. A circuit comprising:
a first comparator operable to compare a first context ID of a cache entry with a first provided context ID and operable to output a first match signal;
a second comparator operable to compare a second context ID of the cache entry with a second provided context ID and operable to output a second match signal;
a third comparator operable to compare a virtual address tag of the cache entry with a provided virtual address tag and operable to output a third match signal;
a multiplexer operable to receive the first and second match signals and operable to select a match signal based on a selecting value of the cache entry; and
an AND gate operable to receive the match signal, the third match signal, and a valid bit of the cache entry and operable to generate an output match signal.
11. A circuit comprising:
a first comparator configured to compare a first context ID of a cache entry with a first provided context ID and configured to output a first match signal;
a second comparator configured to compare a second context ID of the cache entry with a second provided context ID and configured to output a second match signal;
an OR gate configured to receive the first and second match signals and configured to output a match signal; and
a third comparator configured to compare a virtual address tag of the cache entry with a provided virtual address tag and configured to output a third match signal.
12. The circuit of claim 11 further comprising:
a first multiplexer configured to receive outputs from a plurality of context ID registers and configured to select the first provided context ID in response to a signal; and
a second multiplexer configured to receive outputs from a plurality of context ID registers and configured to select the second provided context ID in response to the signal.
13. The circuit of claim 12 further comprising:
an AND gate configured to receive the match signal, the third match signal, and a valid bit of the cache entry and configured to generate an output match signal.
* * * * *
* * * * *