
xVP: Re-architecture of VMware internal VASA Provider with pluggable backends


Sujay Godbole, Vladimir Senkov, Andrew Davenport
{godboles, vsenkov, adavenport}@vmware.com

Abstract

In the last couple of years, VMware has engaged with storage vendors to come up with solutions like VAAI, VASA and VVols. These solutions require close engagement with core partners at both the design and implementation phases, increasing VMware's reliance on partner code drops to release a feature. The quality of VMware code and the ability of QE to test the entire feature depend heavily on the quality of partner code drops and the speed with which partners can fix issues found in testing. Release timeline differences between partners and vSphere often make things harder. Thus, based on the VAAI experience, VMware developed an internal VASA provider (also known as SampleVP) for VASA and VVols, which enabled developers to test features before providing periodic drops to partners during development phases. Initially, SampleVP was used as a model VP, which helped partners write their VASA 1.0 providers and also helped VMware QE with VASA 1.0 testing. For VVols it was extended with SCSI and NFS backends using the Linux Volume Manager. SampleVP was primarily designed for functional correctness rather than high performance for large-scale testing. SampleVP being a basic implementation, VMware still has to rely on partner drops for large-scale testing, which remains a serious hurdle.

This paper discusses xVP, a major re-architecture of the existing internal VP that provides a more scalable architecture and can satisfy the performance requirements of large-scale testing. The main changes involve using ZFS as the storage backend and a MySQL database to store VP state rather than XML files. This provides high performance for data services such as the snapshot, clone and diff APIs, and easy synchronization between the various components of the VP. It also allows us to easily simulate various VP HA scenarios. Another key feature is a clean interface for backends, making it possible to support multiple storage backend implementations in the future. This allows third-party storage test targets like SanBlaze, which can support the next version of the VASA APIs under testing, to be integrated with xVP, and gives us the additional ability to use features of these test targets for internal testing. A few other features of xVP include uniform logging, more configuration control to emulate parameters returned by real VPs, event generation, fault injection, etc. Currently xVP is the primary test target for the VVol replication feature for both the FVT and System Test teams.

Categories and Subject Descriptors
[Storage Data Management]: performance and scalability, partner enablement, cloud, resource management, object storage.

General Terms
Management, Measurement, Performance, Reliability

Keywords
Virtual Volumes, Object Storage, LUNs, NFS, SCSI, Storage, VASA, SPBM

1. Introduction

VMware developed SampleVP as an internal VASA provider for functional testing. It is used for testing vSphere features like VASA and VVols. With the introduction of VVols, key VM life cycle operations such as snapshot, clone, etc. are offloaded to the VASA provider. From a scalability point of view, it is necessary to perform these operations fast even for larger disk sizes. The existing architecture of SampleVP performed poorly in both the data path and the control path. This impacts VMware's ability to deliver newer features faster. xVP is a major re-architecture of the existing SampleVP, aimed at providing performance, scalability and ease of development. xVP primarily uses ZFS in its backend to provide instantaneous snapshot and fast clone operations. Apart from supporting ZFS, it allows backend plugins so that third-party storage test targets like SanBlaze can be integrated with xVP.

Fig 1 shows the basic architecture of xVP. It includes 4 main components: the Java frontend, the generic Python backend, backend plugins and the MySQL database.

Fig 1: xVP basic architecture

The Java frontend is responsible for receiving incoming VASA requests. The generic Python backend is the main engine, which implements all VASA APIs using the attached backend plugins. The built-in ZFS-based backend plugin implements the data path APIs exported by the generic Python backend. The individual plugin is responsible for data path provisioning and management of protocol-specific VVol functionality. The Java frontend passes data path and provisioning operations to the generic backend. The MySQL database keeps track of the xVP state, including VP configuration (storage arrays, storage containers, replication settings, etc.) and operational data (VVols, VASA session info, protocol endpoints, binding information, SPBM policies, etc.). We currently access the database only from the backend.

We also have a Python-based VASA client for unit testing individual APIs and various test scenarios.

2. Background

In the last couple of years, VMware has engaged with storage vendors to come up with solutions like VAAI, VASA and VVols. These solutions require close engagement with core partners at both the design and implementation phases, thus increasing VMware's reliance on partner code drops to release a feature. The quality of VMware code and the ability of QE to test the entire feature depend heavily on the quality of partner code drops and the speed with which partners can fix issues found in the testing cycles. Sometimes release timeline differences between partners and vSphere make things harder. Thus, based on the VAAI experience, VMware developed an internal VASA provider (SampleVP) for VASA 1.0 and VVols, which enabled developers to test features before providing periodic drops to partners during development phases.

VASA 1.0 introduced an out-of-band communication channel between vSphere and the storage array using VASA providers. This allows vSphere to get more details about array-side settings and information. Initially, SampleVP was used as a model VP, which helped partners write their VASA 1.0 providers and also helped VMware QE with VASA testing. SampleVP is based on a SLES VM with an Apache Tomcat server. Tomcat hosts a SOAP web service implementing the VASA 1.0 APIs using a simple database. Some of the values in the database can be populated using various XML files or a local VASA client.

VASA 1.0 did not include any data path operations. With VVols (supported in VASA 2.0), vSphere introduced offloading of data path operations such as create, snapshot, clone, fastClone, etc. to the storage arrays using VASA. For VVol support, SampleVP was extended with SCSI and NFS backends using the Linux Volume Manager (LVM) to implement data path operations. Each Logical Volume (LV) represents a VVol, and LVs are exported over SCST SCSI and Linux NFS servers. VVol management is controlled using a simple XML database. The following diagram shows the architecture of SampleVP.

Fig 2: SampleVP architecture overview

2.1 Motivation

One of the main drawbacks of the existing implementation of SampleVP is its scalability. Being implemented on top of LVM, it does not scale well under load tests, especially for operations like snapshot and clone. Some of the recent improvements made in SampleVP helped QE integrate VVol tests with the CAT infrastructure, but depending on the load on the testing infrastructure, some time-critical tests still fail intermittently. This means that VMware still has to rely on partner VPs for running system tests.

Some of the main limitations of SampleVP include:
1. The LVM backend means snapshot and clone are slow. In particular, hot snapshot (with a 5-second timeout) does not scale beyond a size of 2 GB.
2. All VVols are thick provisioned. There is no support for thin provisioning in LVM.
3. The bitmap API implementation needs to read and compare blocks, which makes migration very slow and causes timeouts.
4. The entire VVol state is in a single XML database. Backend operations are almost serialized due to a poor locking implementation.
5. Information "split brain": VASA 1.0 and SPBM state is with the Java frontend, while VVol information is with the Perl backend.
6. Communication between the frontend and backend is slow and expensive.
7. No uniform logging.
8. Poor infrastructure for event and alarm generation.
9. Support for upcoming features like replication, VAIO and the VMODL VP is difficult with the LVM backend.

The nature of the VASA and VVol features is such that there is a cyclic dependency between VMware and core partners. Partners need the latest vSphere bits to test their side of the changes, and VMware relies on partners for scale and performance testing. This means that VMware will always need an internal VP implementation like SampleVP for doing basic feature testing before providing code drops to partners. Considering this, it is necessary for VMware to invest in SampleVP development and make sure that partner reliance is reduced to some extent. xVP is a first attempt at a next-generation VP, which can satisfy existing QE requirements and ease development of upcoming features.

3. Related Work

SampleVP is the first implementation of a VASA provider by VMware. In the VASA 1.0 time frame, VMware shared the SampleVP code with vendors as an example provider. Later, the vCenter team developed iVP (internal VP) to integrate with automated pre-checkin tests for vCenter (SMS). iVP mainly focuses on VASA 1.0 and SPBM-related features. It does not have any support for the data path, so iVP was not suitable for VVol testing.
4. Design

As described earlier, xVP is a major re-architecture of SampleVP, aimed primarily at performance and scalability. This section covers the detailed design, along with the major design decisions we made.

4.1 Goals
To keep the impact on existing systems minimal and facilitate a smooth transition, we set out with a few design goals.
• A major design goal was to not rewrite everything; we reuse as much of the existing code base as possible.
• Provide a backend that scales well for the various data service operations needed for VVols. This is very important for VVol integration with CAT using VMs with OS images. It should also help get System Test on board early for any VASA feature and help developers find performance issues earlier in the release process.
• Improve communication between the frontend and backend by avoiding polling.
• Improve performance by reducing lock contention.
• Provide a framework for the integration of different backends, which can help in testing different types of load and scale.

4.2 Architecture
Fig 3 shows the detailed xVP architecture. It describes the various xVP frontend and backend components and the interactions between them. Subsequent sections describe the xVP frontend and backend components in more detail.

Fig 3: xVP detailed architecture

5. xVP Frontend

Similar to SampleVP, the xVP Java frontend is a web service running inside Tomcat that is responsible for receiving VASA requests. The xVP frontend has three major components.

5.1 VASA API Frontend:
This is the entry point of each VASA API (SOAP) call into xVP. It does basic API validation and certificate management. Parts of the SampleVP Java frontend were retained in the VASA API frontend.

Request Filtering:
Flexibility and rapid development are key goals of xVP. With VVol 1.0 released, partner VPs are maturing, and hence for some VASA 2.0 APIs it will be easier to pass control to a partner VP and emulate only the newer APIs inside xVP. For each API, the request filter will look at the FilterDB setting and forward the VASA request to the configured partner VP if necessary. Otherwise it will pass the request to the xVP common frontend to implement or emulate it. Currently xVP does not implement request filtering.
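As a rough illustration of the planned behavior, the sketch below shows the kind of per-API routing decision such a filter could make. This is not xVP code; the FilterDB shape, the FilterRule/decide_route names, and the example API names are hypothetical.

```python
# Minimal sketch of a request-filter decision helper (hypothetical names).
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FilterRule:
    api_name: str                  # e.g. a VASA API name such as "snapshotVirtualVolume"
    partner_vp_url: Optional[str]  # None means "handle/emulate locally in xVP"


def decide_route(api_name: str, rules: Dict[str, FilterRule]) -> str:
    """Return the partner VP URL to forward to, or 'local' to emulate in xVP."""
    rule = rules.get(api_name)
    if rule and rule.partner_vp_url:
        return rule.partner_vp_url
    return "local"


rules = {"snapshotVirtualVolume": FilterRule("snapshotVirtualVolume",
                                             "https://partner-vp.example.com/vasa")}
print(decide_route("snapshotVirtualVolume", rules))  # forwarded to the partner VP
print(decide_route("createVirtualVolume", rules))    # handled locally by xVP
```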
5.2 VMODL API Frontend:
In an upcoming vSphere release, VMware is planning to support some of the IOFilter VPs using a VMODL interface. Since the xVP architecture is flexible, it was easy to add a VMODL frontend, which receives VASA APIs over VMODL and translates them into the appropriate backend interface APIs.

5.3 Backend Interface:
The backend interface passes VASA/VMODL API calls to the backend for actual processing. The backend interface is defined in Thrift, and there is a one-to-one mapping between VASA APIs and backend interface APIs. Data returned by the backend is a JSON blob matching the VASA data structures, so both the VMODL and VASA frontends need to map the JSON response to the appropriate data structure and return it. Since JSON does not handle class inheritance well, for some responses related to the replication APIs the frontend needs to use custom JSON mapper routines.
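The sketch below illustrates one way such a custom mapper can restore class inheritance that a plain JSON blob loses, assuming the backend tags each blob with a type discriminator. It is shown in Python for brevity (the real mapping lives in the Java/VMODL frontends), and the "_type" field and the response class names are hypothetical.

```python
# Illustrative custom JSON mapper: pick the concrete class from a discriminator.
import json
from dataclasses import dataclass


@dataclass
class ReplicationGroupInfo:          # hypothetical base response type
    group_id: str


@dataclass
class SourceGroupInfo(ReplicationGroupInfo):   # hypothetical subtype
    target_fault_domains: list


TYPE_REGISTRY = {
    "ReplicationGroupInfo": ReplicationGroupInfo,
    "SourceGroupInfo": SourceGroupInfo,
}


def map_response(blob: str) -> ReplicationGroupInfo:
    """Choose the concrete class from the '_type' field, then build the object."""
    data = json.loads(blob)
    cls = TYPE_REGISTRY[data.pop("_type")]
    return cls(**data)


resp = map_response('{"_type": "SourceGroupInfo", "group_id": "rg-1", '
                    '"target_fault_domains": ["fd-A"]}')
print(type(resp).__name__, resp.group_id)   # SourceGroupInfo rg-1
```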

6. ConfigDB: MySQL database service

The current SampleVP Java frontend maintains information about VASA 1.0 arrays, SPBM capabilities, storage alarms and events. This information is stored in a simple database. A Perl backend maintains the information about the VVol array, storage containers and the VVols provisioned on these containers. This results in a "split-brain" situation and extra calls from the frontend to the backend. Also, the backend stores this information in a simple XML file, which results in contention. xVP uses a MySQL database to ensure that configuration and state information is stored in a single place and is accessible to the various backend threads. Currently, this information is accessed only by the backend. The backend uses SQL transactions for atomic updates across tables. Fig 4 shows the xVP ConfigDB schemas for VVol 1.0 and replication support.

Fig 4: ConfigDB schemas for VVol 1.0 and VVol replication
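The sketch below shows the kind of atomic multi-table ConfigDB update the backend performs with SQL transactions. The driver (mysql-connector-python) and the table/column names (vvol, binding) are assumptions for illustration, not the actual xVP schema.

```python
# Minimal sketch of a transactional ConfigDB update (hypothetical schema).
import mysql.connector


def bind_vvol(vvol_id: str, pe_id: str) -> None:
    conn = mysql.connector.connect(host="localhost", user="xvp",
                                   password="secret", database="configdb")
    try:
        conn.start_transaction()
        cur = conn.cursor()
        # Both statements commit together or not at all.
        cur.execute("UPDATE vvol SET state = %s WHERE id = %s",
                    ("BOUND", vvol_id))
        cur.execute("INSERT INTO binding (vvol_id, pe_id) VALUES (%s, %s)",
                    (vvol_id, pe_id))
        conn.commit()
    except mysql.connector.Error:
        conn.rollback()   # leave the ConfigDB consistent on any failure
        raise
    finally:
        conn.close()
```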
7. xVP backend

The primary feature of the xVP backend is support for data path operations using the ZFS backend.

Why ZFS:
The xVP backend is where API processing happens. SampleVP used LVM for its storage backend. One of the major drawbacks of the LVM-based backend is its integration with the VVol snapshot model. A VVol snapshot is taken in two phases. In the first phase, ESX asks the array to prepare for the snapshot so that in the second phase, the actual snapshot operation finishes very quickly. This is important because the VM is stunned during this second phase to ensure data consistency across multiple disks. LVM doesn't support two-phase snapshots. Also, its snapshots are not instantaneous, so LVM snapshots do not work well for larger VVol sizes.

VVols also have bitmap APIs, which require the VASA provider to diff two VVols or provide a block allocation map of a VVol. With LVM, all VVols were thick, and diffing two VVols involved reading and comparing the data from two LVs. Similarly, finding whether a given block is allocated involved reading the block and comparing it with an empty block. LVM also doesn't allow efficient cloning at the LV level. SampleVP created clones by copying the data, so it performed very poorly during snapshot, clone and migration workflows.

ZFS, being a combination of a file system and a logical volume manager, fits these requirements much better. It provides instantaneous snapshot and clone operations for large files. It also provides advanced capabilities such as encryption and de-duplication, which can be useful to simulate SPBM capabilities.
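The two-phase snapshot model described above maps naturally onto a ZFS-style backend, because the commit step is a constant-time metadata operation. The sketch below is only conceptual; the function names, the pool/zvol naming, and the ConfigDB bookkeeping hinted at in the comments are assumptions, not the xVP implementation.

```python
# Conceptual sketch of the two-phase VVol snapshot contract on a ZFS-like backend.
import subprocess
import uuid


def prepare_snapshot(zvol: str) -> str:
    """Phase 1 (VM still running): do all slow work up front and return a token,
    so phase 2 only has to issue one instantaneous command."""
    snap_name = f"{zvol}@{uuid.uuid4().hex}"
    # e.g. record the pending snapshot in the ConfigDB, pre-check pool space, ...
    return snap_name


def commit_snapshot(snap_name: str) -> None:
    """Phase 2 (VM stunned, ~5 s budget): must be effectively instantaneous."""
    subprocess.run(["zfs", "snapshot", snap_name], check=True)


token = prepare_snapshot("xvp-pool/vvol-1234")
commit_snapshot(token)   # copy-on-write snapshot, independent of VVol size
```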
The xVP backend consists of 2 components:
7.1 Generic Python backend
This is the main engine, where every VASA call lands. It implements the Thrift interface defined for communication between the xVP frontend and backend, and uses the ConfigDB and the backend plugin to perform the various VASA operations. Upon receiving an API call from the frontend, it validates the VASA session and other parameters. It uses MySQL transactions to implement complex database updates.
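The sketch below illustrates the dispatch pattern described here: validate the VASA session, then delegate the data path work to the attached backend plugin and record the result. Class and method names are hypothetical, not the actual Thrift handler.

```python
# Hypothetical sketch of the generic backend's validate-then-delegate pattern.
class InvalidSession(Exception):
    pass


class GenericBackend:
    def __init__(self, config_db, plugin):
        self.config_db = config_db   # ConfigDB accessor (assumed interface)
        self.plugin = plugin         # e.g. the ZFS backend plugin

    def _validate_session(self, session_id: str) -> None:
        if not self.config_db.session_exists(session_id):
            raise InvalidSession(session_id)

    def create_virtual_volume(self, session_id: str, container_id: str,
                              size_mb: int) -> str:
        self._validate_session(session_id)
        vvol_id = self.plugin.create(container_id, size_mb)   # data path work
        self.config_db.record_vvol(vvol_id, container_id, size_mb)
        return vvol_id
```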
7.2 Backend plugin interface
We have defined a backend plugin interface for some of the data
path operations like create, delete, bind/unbind, clone, snapshot,
and the informational operations like getting the allocation bitmap
or differences between VVols. The backend plugin can
implement these APIs, which allows xVP to support multiple
types of storage technologies in the future without a major
rewrite. Currently, xVP comes with a default ZFS-based backend
plugin. Being an internal target, there will always be a possibility that xVP will not be the preferred target for some newer VASA features. For example, third-party storage test targets like SanBlaze can help with testing various flavors of data path functionality by emulating the behavior of various storage arrays. Allowing
pluggable backends, which can talk with other data path
technologies, helps QE to test a wider spectrum of test cases using
xVP.
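The sketch below shows what a plugin interface of this shape could look like in Python. Method names and signatures are hypothetical; the real xVP interface is defined in Thrift and differs in detail.

```python
# Hypothetical backend plugin interface covering the operations listed above.
from abc import ABC, abstractmethod
from typing import Iterable, Tuple


class BackendPlugin(ABC):
    """Data path and informational operations a storage plugin must provide."""

    @abstractmethod
    def create(self, container_id: str, size_mb: int) -> str: ...

    @abstractmethod
    def delete(self, vvol_id: str) -> None: ...

    @abstractmethod
    def bind(self, vvol_id: str, pe_id: str) -> None: ...

    @abstractmethod
    def unbind(self, vvol_id: str, pe_id: str) -> None: ...

    @abstractmethod
    def snapshot(self, vvol_id: str) -> str: ...

    @abstractmethod
    def clone(self, src_vvol_id: str) -> str: ...

    @abstractmethod
    def allocated_bitmap(self, vvol_id: str) -> bytes: ...

    @abstractmethod
    def diff(self, base_vvol_id: str,
             other_vvol_id: str) -> Iterable[Tuple[int, int]]:
        """Yield (offset, length) extents that differ between two VVols."""
```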

7.3 ZFS Backend:
ZFS is a copy-on-write transactional file system that also supports logical volumes and has snapshot and clone capabilities. We utilize ZFS logical volumes (zvols) to store VVol objects. ZFS supports taking snapshots of multiple volumes at the same time, which is important for many VASA workflows, especially when creating points in time for VASA 3.0 replication support. xVP being a test target, it utilizes a single ZFS pool for the entire xVP instance, including all containers and sites (in replication configurations), which allows the use of native clones across containers and for replication purposes.

We had to add a few low-level operations to ZFS to support the VASA bitmap operations on VVols. In particular, we added support for getting the allocated bitmap of a zvol, diffing arbitrary zvols and copying differences between arbitrary zvols. These features were added by accessing ZFS internal block metadata rather than reading the data itself, which improves performance significantly, especially for sparse disks, which are very common in vSphere environments in general and in test environments in particular. Certain VASA workflows, such as reverting to an arbitrary snapshot, are not natively supported by ZFS, but we were able to overcome these limitations by utilizing native ZFS clones and keeping track of the relationships within the VVol object hierarchy in the xVP database. The overall goal, which we were able to achieve completely in our first implementation, was to implement all VASA workflows, including replication, in a metadata-driven manner, without the need to read, compare and copy volume data, which SampleVP had to do for many workflows.
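The sketch below shows how a ZFS-backed plugin can map VVol operations onto standard ZFS commands. The pool and dataset names are hypothetical, and xVP's custom low-level extensions (bitmap and diff against block metadata) are not shown here.

```python
# Illustrative mapping of VVol operations onto standard ZFS commands.
import subprocess

POOL = "xvp-pool"   # single pool for the whole xVP instance (assumed name)


def zfs(*args: str) -> None:
    subprocess.run(["zfs", *args], check=True)


def create_vvol(vvol_id: str, size_mb: int) -> None:
    # A zvol backs each VVol; -s makes it sparse (thin provisioned).
    zfs("create", "-s", "-V", f"{size_mb}M", f"{POOL}/{vvol_id}")


def snapshot_group(vvol_ids: list, snap: str) -> None:
    # One 'zfs snapshot' invocation with several targets is atomic, which suits
    # multi-disk, crash-consistent points in time for replication workflows.
    zfs("snapshot", *[f"{POOL}/{v}@{snap}" for v in vvol_ids])


def fast_clone(src_vvol_id: str, snap: str, new_vvol_id: str) -> None:
    # Clones share blocks with the snapshot, so they are instant regardless of size.
    zfs("clone", f"{POOL}/{src_vvol_id}@{snap}", f"{POOL}/{new_vvol_id}")
```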
7.4 VVolFS:
We added a kernel module, VVolFS, which exports a ZFS device as a separate file system with a single file representing that VVol. This is used in mounting and exporting NFS non-config VVols.

The rest of the data path stack remains the same. Files on ZFS are exported over SCST/iSCSI and NFS for IO. Information about the binding of VVols to protocol endpoints is stored in the ConfigDB.
8. xVP Packaging

Apart from other performance limitations, one of the limitations of SampleVP was slow OVF deployment. This is because the SampleVP base image was oversized and the SampleVP code was compiled during the installation process. In xVP, we use a VMware Studio-generated VA to create a thin Ubuntu 14.04 OVF. Since xVP will not be shared with any partner, there is no need to compile a stripped version of the code on the xVP VM, so all xVP bits are packaged as Debian packages. It is also possible to pass OVF parameters to the VA, which makes it easier to control the package installation process without regenerating the OVF image. With these optimizations, a standard xVP VM can be deployed in approximately 70 seconds, which is 5-10 times faster than SampleVP.

Apart from this, xVP comes with a CLI tool (xvp-mgmt) to manage the xVP setup. It also comes with pre-canned configurations as required by various QE teams.
9. Current Status

Currently, xVP 1.0 implements all VASA 2.0 (VVol 1.0) APIs and is actively used by all QE teams in CAT runs. Due to limitations of the SampleVP snapshot implementation, CAT runs for snapshot, clone and migration workflows used to fail frequently with timeout errors caused by slow storage. With xVP 1.0 we don't see any of these intermittent timeout errors, and the overall full test cycle time has also improved with respect to SampleVP.

VASA3 testing: For the 2016 release, xVP 1.1 implements all of the VASA 3.0 (VVol replication API) functionality and provides a solid platform for testing VVol-based replication solutions. VVol replication Layer 0 and Layer 1 testing is currently performed on xVP.

System Test: In recent runs, where the xVP VM was deployed on an SSD-backed VMFS5 datastore on an ESX host with 128 GB of memory, the System Test team was able to power on 400 VMs using 4 ESX nodes. The VMs are a combination of linked-clone, full-clone and non-persistent VMs. This is a tremendous improvement over SampleVP, which used to fail to power on even a couple of VMs simultaneously. To put these numbers in perspective, for the VVol 1.0 release the design partners DELL and HP support 1024 VVols/200 VMs. Apart from this, System Test is able to test 1024 VASA endpoints using a handful of xVP VMs. The team is currently fixing issues found in other System Test workflows. Based on the above results, the team is well on course to achieve the promised goal of passing System Test on xVP for the VVol 2.0 replication feature with 500 VMs.

10. Future Work

Future tasks once VASA 3 is released:
- Support multiple VASA versions in a single xVP VM.
- Support VASA 1.0 APIs and other related functionality so that SampleVP can be fully deprecated.
- Support the VAIO-based VASA APIs.
- Implement the Request Filter and the SanBlaze plugin.

Acknowledgements
We would like to thank Patrick Dirks, Derek Uluki, Naga Ullas Vankayala Harinathagupta, Deepak Babarjung and all of the other folks in VASA3 development who have contributed to the xVP project with design input, reviews, and implementation.
