
Huawei FusionSphere 5.1
Technical White Paper on Reliability (Cloud Data Center)

Issue 1.0

Date 2015-04-15

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2015. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: http://e.huawei.com


About This Document

Purpose
This document describes the system reliability of FusionSphere.

Intended Audience
This document is intended for:
 Marketing engineers
 Sales engineers
 Distributors

Symbol Conventions
The symbols that may be found in this document are defined as follows:

Symbol Description

DANGER: Indicates an imminently hazardous situation which, if not avoided, will result in death or serious injury.

WARNING: Indicates a potentially hazardous situation which, if not avoided, could result in death or serious injury.

NOTICE: Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or unanticipated results.

TIP: Indicates a tip that may help you resolve a problem or save time.

NOTE: Provides additional information to emphasize or supplement the main text.

Issue 1.0 (2015-04-15) Huawei Proprietary and Confidential ii


Copyright © Huawei Technologies Co.,
Ltd.
Huawei FusionSphere 5.1
Technical White Paper on Reliability (Cloud Data
Center) About This Document

Change History
Changes between document issues are cumulative. The latest document issue contains all the
changes in earlier issues.

Issue 1.0 (2015-04-15)
This issue is the first official release.


Contents

About This Document

1 Overview of the FusionSphere Solution
1.1 Logical Architecture of the FusionSphere System

2 System Reliability
2.1 System Reliability Requirements
2.2 OpenStack HA
2.3 Service Deployment Redundancy
2.4 Traffic Control
2.5 Data Consistency Audit
2.6 Black Box
2.7 Protection from Zombies
2.8 Redundant Network Paths
2.9 Plane-based Network Communication
2.10 Management Data Backup
2.11 Global Time Synchronization

3 FusionCompute Reliability
3.1 VM Live Migration
3.2 Storage Cold and Live Migration
3.3 VM HA
3.4 VM Fault Isolation
3.5 Virtualized Deployment of Management Nodes
3.6 Host Fault Recovery

4 FusionStorage Reliability
4.1 Data Store Redundancy Design
4.2 Multi-Failure Domain Design
4.3 Data Security Design
4.4 Strong Data Consistency
4.5 NVDIMM Power Failure Protection
4.6 I/O Traffic Control
4.7 Disk Reliability
4.8 Metadata Reliability

5 FusionManager Reliability
5.1 Active and Standby Management Nodes Architecture
5.2 System Monitoring
5.3 Data Consistency Between the Active and Standby Nodes

6 Network Reliability
6.1 Multipathing Storage Access
6.2 NIC Load Balancing
6.3 Switch Stacking
6.4 VRRP


1 Overview of the FusionSphere Solution

1.1 Logical Architecture of the FusionSphere System


Figure 1.1 outlines the logical architecture of the FusionSphere system.

Figure 1.1 FusionSphere logical architecture

The Huawei FusionSphere solution consolidates multiple applications in the service system, thereby improving server utilization and system reliability, reducing purchase costs, and increasing maintenance efficiency. Using elastic hosts, the solution provides high-quality, pay-per-use elastic services. It virtualizes computing, network, and storage hardware resources in carrier and enterprise data centers and provides unified resource management and scheduling.


2 System Reliability

2.1 System Reliability Requirements


To deliver highly reliable services, a cloud computing environment must provide:
 High service availability to ensure service continuity
 High data durability to prevent data loss
FusionSphere uses diverse mechanisms to improve service availability and data durability.
Details are as follows:
 FusionSphere uses an architecture that provides redundancy for the entire system, eliminating single points of failure (SPOFs). The system supports scheduled upgrades and capacity expansion without interrupting services. By quickly detecting failures, FusionSphere can automatically isolate them and recover the system, minimizing service downtime. FusionSphere also provides black box, logging, and alarm reporting functions to help O&M personnel quickly locate and rectify faults.
 FusionSphere stores multiple identical copies of metadata and service data and supports
data scan and automatic data restoration. The FusionSphere system provides multiple
backup mechanisms for securing management and service data.

2.2 OpenStack HA
OpenStack reliability is determined by the reliability of the following services:
 Representational State Transfer (REST) API service reliability: provides continuous API
services for users.
 Database service reliability: ensures user configuration data security and integrity and service continuity.
 Communication service reliability: ensures that different components properly interact
with one another.
All OpenStack services are deployed in active/active or active/standby mode for redundancy.
Figure 1.2 illustrates an example of OpenStack HA deployment.


Figure 1.2 OpenStack HA deployment

Details about OpenStack HA deployment include:


 HAProxy servers forward REST API service requests and are deployed in active/active
or active/standby mode.
 API Servers and Schedulers provide stateless services in load balancing mode and are
deployed in active/active mode.
 The GaussDB databases are deployed in active/standby mode.
 RabbitMQ servers are deployed in active/standby mode.
 Network nodes are deployed in active/active mode.
 Compute nodes are deployed in active/active mode.
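The following minimal Python sketch (the endpoint address and retry policy are illustrative assumptions, not part of the FusionSphere API) shows how this redundancy appears to a REST client: requests go to the floating HAProxy address, and a failover of a backend API server surfaces only as a retried request.

```python
import time
import urllib.error
import urllib.request

# Hypothetical floating address owned by the HAProxy pair (illustrative).
API_URL = "http://192.0.2.10:8774/v2.1/servers"

def get_with_retry(url, attempts=5, backoff=2.0):
    """Issue a GET and retry on transient errors; HAProxy keeps routing
    the floating address to a healthy API server, so failover of one
    backend is invisible to the caller apart from a retried request."""
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            if i == attempts - 1:
                raise
            time.sleep(backoff * (i + 1))  # back off before the next try

if __name__ == "__main__":
    print(get_with_retry(API_URL)[:200])
```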

2.3 Service Deployment Redundancy


BPS is used for system installation and deployment and consists of two components: Cloud
Boot Service (CBS) and Cloud Provisioning Service (CPS).
Both CBS and CPS use the client/server architecture. CBS manages host OS deployment, and
CPS manages host service deployment and upgrades (including host OS upgrades), and
monitors host status.
Figure 1.3 shows the BPS deployment in a cloud data center.


Figure 1.3 BPS deployment

A CPS client is deployed on each server to monitor services deployed on the server.
If the CPS client detects that a service is faulty, it automatically restarts the service.
If the CPS server does not receive a heartbeat message from a CPS client within a specified period, it considers the server on which the client runs faulty. The loss of heartbeat then automatically triggers the VM HA mechanism, which restarts all affected VMs on other servers.
CPS servers are deployed in cluster mode, usually on three to seven servers, and the ZooKeeper mechanism is used to elect the active CPS server. In addition to monitoring server running status, the CPS server also provides the active/standby arbitration service.
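As an illustration only, the following Python sketch models the client-side behavior described above; the service names, commands, and intervals are assumptions rather than the actual CPS implementation.

```python
import subprocess
import time

MANAGED_SERVICES = ["nova-compute", "neutron-agent"]  # illustrative names
HEARTBEAT_INTERVAL = 5  # seconds, illustrative

def is_running(service):
    # 'systemctl is-active' exits with 0 when the unit is active.
    return subprocess.run(
        ["systemctl", "is-active", "--quiet", service]).returncode == 0

def send_heartbeat():
    # Placeholder: the real client reports liveness to the active CPS server;
    # losing these heartbeats is what triggers VM HA for the whole host.
    pass

while True:
    for svc in MANAGED_SERVICES:
        if not is_running(svc):
            # A faulty service is restarted automatically by the client.
            subprocess.run(["systemctl", "restart", svc], check=False)
    send_heartbeat()
    time.sleep(HEARTBEAT_INTERVAL)
```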

2.4 Traffic Control


The traffic control mechanism enables the management node to provide highly available concurrent services without collapsing due to excessive traffic. Traffic control is enabled for the Virtualization Resource Management (VRM) access points to prevent excessive loads on the front end and enhance system stability. To prevent service failures caused by excessive traffic, this function is also enabled for each key internal process in the system, for example, traffic control on image download, authentication, VM services (including VM migration, VM high availability, VM creation, hibernation, wake-up, and stopping), and operation and maintenance (O&M).
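The white paper does not specify the traffic control algorithm; a token bucket is one common way to implement this kind of admission control, sketched below with illustrative rates:

```python
import threading
import time

class TokenBucket:
    """Admit at most `rate` requests/second with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self):
        with self.lock:
            now = time.monotonic()
            # Refill tokens in proportion to the elapsed time.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False  # caller rejects or queues the request

# e.g., limit image downloads to ~10 per second with bursts of 20
image_download_limiter = TokenBucket(rate=10, capacity=20)
```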


2.5 Data Consistency Audit


FusionSphere automatically audits and restores key resource data and supports scheduled and manual audits of key resources, such as VMs and volumes, to remove residual data and ensure data consistency. When detecting an exception, the system automatically generates a report and an alarm and provides maintenance instructions.
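As a simplified illustration of such an audit (the data sources and field names are assumptions, not the actual FusionSphere schema), the following check reconciles the VM list recorded by the management system against what the compute nodes report and flags residual entries:

```python
def audit_vms(db_vms, hypervisor_vms):
    """Compare VM IDs recorded in the management database against the
    IDs reported by compute nodes; return discrepancies for the report."""
    recorded = set(db_vms)        # VMs the management plane believes exist
    actual = set(hypervisor_vms)  # VMs the hosts actually run
    return {
        "residual": recorded - actual,  # stale records to clean up
        "orphaned": actual - recorded,  # unmanaged VMs needing attention
    }

report = audit_vms(db_vms={"vm-01", "vm-02", "vm-03"},
                   hypervisor_vms={"vm-01", "vm-03", "vm-99"})
print(report)  # {'residual': {'vm-02'}, 'orphaned': {'vm-99'}}
```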

2.6 Black Box


FusionSphere virtualization software and virtualization management software support the black box function. The black box applies to management nodes and compute nodes with UVP installed. It collects the last system information right before a breakdown, kernel panic, or abnormal reset and backs the information up to a local directory for later fault locating and data analysis. The last information includes kernel logs and diagnosis information provided by diagnosis tools.

2.7 Protection from Zombies


FusionSphere provides a protection mechanism against zombie processes. A zombie process is a process that is still running but no longer provides services. The mechanism allows FusionSphere to detect processes in the zombie state and restart them automatically, enabling them to provide services properly.
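One way such detection can work is sketched below, purely for illustration: a process that exists at the OS level but fails an application-level liveness check (here, a TCP connection to an assumed health port) is treated as a zombie and restarted.

```python
import socket
import subprocess

def is_responsive(host, port, timeout=3):
    """Application-level liveness: the process must accept a connection,
    not merely exist in the process table."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_zombie(service, host="127.0.0.1", port=8080):
    # OS-level check: the unit is active (the process exists).
    alive = subprocess.run(
        ["systemctl", "is-active", "--quiet", service]).returncode == 0
    if alive and not is_responsive(host, port):
        # Running but not serving: restart so the service recovers.
        subprocess.run(["systemctl", "restart", service], check=False)

check_zombie("image-service")  # hypothetical service name and health port
```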

2.8 Redundant Network Paths


The FusionSphere network is divided into the core, aggregation, access, and virtual network
layers.
 Located at the core layer, core switches enable communication between different data
centers and connect FusionSphere to external networks. The core switch cluster is used
to provide redundant connections to aggregation switches in internal and external data
centers. Core switches interconnect with upper-layer devices using the Open Shortest
Path First (OSPF) protocol or static routing.
 Located at the aggregation layer, aggregation switches are deployed in the equipment
room of data centers to converge traffic from access switches in the same data center.
Aggregation switches implement layer 3 communication with core switches and layer 2
communication with access switches. Aggregation switches are stacked to provide
redundant connections to access switches in the data center and external core switches.
 Located at the access layer, access switches are installed in a cabinet and connected to
servers within the cabinet. Access switches can also be stacked to provide redundant
connections to aggregation switches and to the virtual network layer.
 The virtual network layer resides within a server and enables communication among
VMs running on the server and between VMs and external networks. At the virtual
network layer, two or more network interface cards (NICs) for a server are bound as a
logical NIC to prevent service interruptions caused by the failure of a single NIC.
Figure 1.4 outlines the configuration of redundant network paths.


Figure 1.4 Network path redundancy configuration

2.9 Plane-based Network Communication


The FusionSphere communication network is divided into the management, storage, service, and Intelligent Platform Management Interface (IPMI) planes.
 The management plane transmits API and RPC messages between different service
nodes.
 The service plane transmits VM service data.
 The storage plane transmits data from storage devices.
 The IPMI plane transmits server management messages.
FusionSphere divides the network into different planes to ensure data reliability and security.
Different planes are isolated using virtual local area networks (VLANs) so that the failure of a
single plane exerts no impact on other planes. For example, if the management plane becomes
faulty, the service plane can still be used to access VMs.
In addition, the FusionSphere system supports VLAN-based priority settings. Internal management and control packets are assigned the highest priority so that the administrator and other users can manage and control the system at any time.
Figure 1.5 outlines the plane-based network communication among servers, access switches,
and aggregation switches.


Figure 1.5 Plane-based network communication

Physical NICs of servers can be bound and categorized to allow the management, service, and
storage planes to use different logical NICs and to connect to different access switches,
thereby implementing physical network isolation.

2.10 Management Data Backup


Management data is stored in the FusionSphere system as multiple identical copies to ensure
high data reliability. Based on the redundancy mode and application scenarios, management
data can be backed up by:
 Data synchronization
Management nodes are deployed in active/standby mode. Changes to the database and
configuration files on the active node are synchronized to the standby node in real time.
If the active node is faulty, the standby node becomes the active node and takes over
services. Data synchronization prevents data loss caused by active/standby database
switchover.
 Data backup
Before performing an important operation, such as a system upgrade or critical data modification, you should manually back up management data in the FusionSphere system. The backups can be used to restore the FusionSphere system if an exception occurs or the operation does not achieve the expected results.
Data backups can be stored on both the control nodes and third-party storage devices,
including File Transfer Protocol (FTP) and File Transfer Protocol over SSL (FTPS)
servers and Universal Distributed Storage (UDS) devices.
Scheduled and manual backups are both supported. You can configure the maximum
number of data backups that can be stored, the time when a scheduled backup is
performed, and the directory on a third-party server where data backups are to be stored
based on service requirements.
You can use a data backup to restore FusionSphere control nodes to the state they were in when the backup was created.
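The following Python sketch outlines a scheduled backup job along these lines; the paths, FTP host, and retention count are illustrative assumptions rather than FusionSphere defaults.

```python
import ftplib
import os
import time

BACKUP_DIR = "/var/backups/mgmt"              # local staging dir (assumed)
FTP_HOST, FTP_DIR = "backup.example.com", "/fusionsphere"
MAX_BACKUPS = 7                               # configurable retention limit

def create_backup():
    name = time.strftime("mgmt-%Y%m%d-%H%M%S.tar.gz")
    path = os.path.join(BACKUP_DIR, name)
    # Placeholder: dump the database and configuration files into `path`.
    open(path, "wb").close()
    return path

def upload_and_prune(path):
    with ftplib.FTP(FTP_HOST) as ftp:
        ftp.login()                      # anonymous login for the sketch
        ftp.cwd(FTP_DIR)
        with open(path, "rb") as f:
            ftp.storbinary(f"STOR {os.path.basename(path)}", f)
        # Timestamped names sort chronologically; keep only the newest N.
        for old in sorted(ftp.nlst())[:-MAX_BACKUPS]:
            ftp.delete(old)

upload_and_prune(create_backup())
```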


Figure 1.6 outlines the mechanism of management data backup.

Figure 1.6 Management data backup

2.11 Global Time Synchronization


FusionSphere provides an internal time synchronization mechanism to ensure time
consistency among internal components, such as management nodes, compute nodes,
FusionManager, and FusionStorage.
The FusionSphere system also supports an external Network Time Protocol (NTP) clock
source to ensure that the time of the entire system is precise and consistent. Global time
synchronization facilitates system maintenance and communication between different
components.
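To make the mechanism concrete, the sketch below queries an NTP server with a raw SNTP request and reports the local clock offset. The server address is illustrative, and a production system would use a full NTP client rather than this minimal probe.

```python
import socket
import struct
import time

NTP_SERVER = "pool.ntp.org"        # illustrative external clock source
NTP_EPOCH_OFFSET = 2208988800      # seconds between the 1900 and 1970 epochs

def ntp_offset(server=NTP_SERVER):
    # Minimal SNTP request: LI=0, VN=3, Mode=3 (client) in the first byte.
    packet = b"\x1b" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(5)
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(512)
    # Transmit timestamp: seconds field at bytes 40-43 of the response.
    secs = struct.unpack("!I", data[40:44])[0] - NTP_EPOCH_OFFSET
    return secs - time.time()

print(f"local clock offset: {ntp_offset():+.1f} s")
```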


3 FusionCompute Reliability

3.1 VM Live Migration


The system supports VM live migration without interrupting services. The cloud management system creates an image of the VM on the destination server and synchronizes the VM's data with the source server. The data to be synchronized includes the status of the memory, registers, stack, vCPUs, and storage devices, as well as dynamic information about virtual hardware. The hypervisor rapidly duplicates memory data to keep memory synchronized and prevent service interruption during migration. In addition, shared storage ensures persistent data consistency before and after the migration.
Figure 1.7 shows the mechanism of VM live migration.

Figure 1.7 VM live migration


When traffic is light, live migration allows VMs scattered across different servers to be consolidated onto a few servers, or even one server, so that idle servers can be powered off. This reduces costs for customers and saves energy.
VM live migration also improves the reliability of customer systems: if a fault occurs on a running physical machine, its services can be migrated to healthy machines before the situation worsens.
Hardware can likewise be upgraded online without interrupting services. Before upgrading a physical machine, users migrate all its VMs to other machines; after the upgrade is complete, they migrate the VMs back.
Typical application scenarios for VM live migration are as follows:
 Manual VM migration to any idle physical server as required
 Batch VM migration to any idle physical server based on the resource utilization
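The following toy model illustrates the iterative pre-copy approach commonly used for live migration; the page model and thresholds are illustrative and do not represent the hypervisor's actual implementation.

```python
class VMNode:
    """Toy model of a host: memory is a dict of page_id -> bytes."""
    def __init__(self, memory=None):
        self.memory = dict(memory or {})
        self.dirty = set(self.memory)        # initially nothing is copied yet

    def touch(self, page_id, data):          # a guest write dirties a page
        self.memory[page_id] = data
        self.dirty.add(page_id)

    def take_dirty(self):
        pages = {p: self.memory[p] for p in self.dirty}
        self.dirty.clear()
        return pages

def live_migrate(src, dst, threshold=2, max_rounds=10):
    dst.memory.update(src.take_dirty())      # round 0: copy all memory
    for _ in range(max_rounds):
        if len(src.dirty) <= threshold:
            break                            # few enough pages to stop-and-copy
        dst.memory.update(src.take_dirty())  # re-copy pages dirtied meanwhile
    # Brief pause phase: final dirty pages plus vCPU/device state (omitted).
    dst.memory.update(src.take_dirty())

src = VMNode({i: b"page" for i in range(8)})
dst = VMNode()
src.touch(3, b"updated")                     # guest writes while migrating
live_migrate(src, dst)
assert dst.memory == src.memory              # destination is fully in sync
```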

3.2 Storage Cold and Live Migration


FusionSphere offers cold and live migration for VM disks. Cold migration moves VM disks from one data store to another while the VM is stopped. Live migration moves VM disks from one data store to another without service interruption.

Figure 1.8 Mechanism of storage cold migration


Figure 1.9 Mechanism of storage live migration

3.3 VM HA
If a physical CNA server breaks down or restarts abnormally, the system migrates the VMs configured with high availability (HA) to other computing servers, ensuring rapid restoration of those VMs.
A cluster can house thousands of VMs. Therefore, if a computing server breaks down, the system distributes its VMs across different servers based on network traffic and destination server load to prevent network congestion and overload on any destination server.


Figure 1.10 VM HA

After detecting that a computing node or a VM is faulty, the VRM restarts the VM on a healthy computing node based on the recorded VM information.
VM HA is triggered if the heartbeat connection between the VRM and a CNA is lost for 30 seconds or if a VM suddenly works abnormally.
The lockout mechanism at the storage layer prevents one VM instance from being started concurrently on multiple CNAs.
CNA nodes can recover from power-off failures: service processes resume automatically after the restart, and the VMs that were running on the CNAs are migrated to other computing nodes.
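A condensed sketch of the restart-with-lockout logic follows; the lock object stands in for the storage-layer lock, and the timings and names are illustrative.

```python
import time

HEARTBEAT_TIMEOUT = 30  # seconds, as described above

class StorageLock:
    """Toy stand-in for the storage-layer lockout: at most one holder."""
    def __init__(self):
        self.holder = None

    def acquire(self, node):
        if self.holder is None:
            self.holder = node
            return True
        return False        # another CNA already started this VM instance

def handle_heartbeat_loss(vm_id, last_seen, healthy_nodes, lock):
    if time.time() - last_seen < HEARTBEAT_TIMEOUT:
        return None                        # still within the grace period
    for node in healthy_nodes:             # pick a healthy node with capacity
        if lock.acquire(node):             # lockout prevents a double start
            print(f"restarting {vm_id} on {node}")
            return node
    return None

lock = StorageLock()
handle_heartbeat_loss("vm-42", time.time() - 60, ["cna-2", "cna-3"], lock)
```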

3.4 VM Fault Isolation


Virtualization allows one physical server to be divided into multiple VMs that are isolated from one another. If one VM fails, the other VMs continue to work properly. The user experience on a VM is the same as on a physical machine.


Figure 1.11 Protocol stack in a hypervisor

In short, any operation performed on a VM exerts no impact on the other VMs on the same physical server or on the virtualization platform.

3.5 Virtualized Deployment of Management Nodes


The management software of the FusionSphere solution can be deployed on VMs; that is, management nodes support virtualized deployment. In addition to redundancy, live migration, and HA, management node VMs support the following functions:
 IP SAN storage plus local storage, improving system reliability.
 Automatic startup upon host power-on. After a host is powered on, its VRM management node VMs start automatically. If both the active and standby management node VMs fail to start, FusionManager provides VRM heartbeat detection and alarm sending, and FusionCompute provides a tool for restoring the management nodes.

3.6 Host Fault Recovery


If an entire server or a computing node OS becomes faulty and cannot be recovered by restarting the system or handling related alarms, users can replace the entire server or the node's hard disks, mainboard, NICs, or RAID cards, and then restore services and configurations using commands or a one-click method.


4 FusionStorage Reliability

Huawei-developed distributed storage software, FusionStorage, deeply integrates computing and storage to deliver optimal performance, sound reliability, and high cost-effectiveness. It can be deployed on servers to consolidate all their local disks into a virtual storage resource pool, allowing FusionStorage to completely replace external storage area network (SAN) devices in some scenarios.

4.1 Data Store Redundancy Design


User data can be stored in two or three copies, ensuring data reliability. As shown in Figure 1.12, three nodes form a resource pool in which data is stored in two copies: the active copy is saved on one node, and the standby copies are evenly distributed across the other two nodes. In this case, no data is lost in the event of a single point of failure.

Figure 1.12 Data saved as two copies in the FusionStorage system

 In the two-copy scenario, if one disk in a FusionStorage resource pool becomes faulty, no data in the system is lost, and services still run properly.
 In the three-copy scenario, if two disks in a FusionStorage resource pool become faulty concurrently, no data in the system is lost, and services still run properly.


 The system data persistence reaches 99.99% in the two-copy scenario and 99.99999% in the three-copy scenario.
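A toy sketch of this placement rule: the primary copy goes to one node and each standby copy to a different node, so a single node failure never removes all copies. The hashing scheme is illustrative, not FusionStorage's actual data routing.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]   # the three-node pool in Figure 1.12

def place_copies(block_id, copies=2):
    """Deterministically spread the copies of a block across distinct nodes."""
    digest = int(hashlib.md5(block_id.encode()).hexdigest(), 16)
    start = digest % len(NODES)
    # Primary on one node; standby copies on the following nodes in order.
    return [NODES[(start + i) % len(NODES)] for i in range(copies)]

for blk in ["blk-1", "blk-2", "blk-3"]:
    primary, *standby = place_copies(blk)
    print(blk, "primary:", primary, "standby:", standby)
```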

4.2 Multi-Failure Domain Design


A FusionStorage resource pool is a failure domain by default, and the active and standby data copies reside in the same failure domain.
When multiple failure domains are used, a disk failure in one failure domain has no adverse impact on data copies in other failure domains. As shown in Figure 1.13, if two resource pools are created in the FusionStorage system, two independent failure domains are available. Even if one disk in each of the two resource pools (failure domains) becomes faulty, two or three concurrent points of failure cannot occur within a single resource pool, so no data in the system is lost.

Figure 1.13 Multi-failure domains in the FusionStorage system

4.3 Data Security Design


In a FusionStorage resource pool, data store DR can be configured at the server or rack level, which effectively reduces the probability of concurrent failures across the disks holding two or three copies.
 Server-based security: Standby data copies are distributed only on other servers, so disk faults on a single server do not cause data loss, ensuring service continuity. Server-based security is the default level. Figure 1.14 shows the copy distribution by server.


Figure 1.14 Server-based security of FusionStorage

 Rack-based security: Standby data copies are distributed only on nodes in other racks, so blade or disk faults within a single rack do not cause data loss, ensuring service continuity. Figure 1.15 shows the copy distribution by rack.

Figure 1.15 Rack-based security of FusionStorage

4.4 Strong Data Consistency


The strong-consistency replication protocol is used to keep multiple data copies consistent: the system returns a write success only after all copies have been successfully written to disk. As a result, data read from any copy is the same. If a disk fails temporarily, FusionStorage stops writing to the copies on that disk until the fault is rectified; it then restores the data in those copies and resumes writing to them. If a disk can no longer be used, FusionStorage removes it from the cluster and selects another available disk to hold the copies. Using its rebalancing mechanism, FusionStorage keeps data evenly distributed across all disks.
Figure 1.16 outlines how strong data consistency is implemented.

Figure 1.16 Strong data consistency
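The following minimal model captures the protocol's rule that a write succeeds only when every copy has been written; replica transport and failure handling are deliberately simplified.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.healthy = True

    def write(self, key, value):
        if not self.healthy:
            raise IOError("replica unavailable")
        self.data[key] = value

def replicated_write(replicas, key, value):
    """Strong consistency: report success only if every copy is written."""
    for r in replicas:
        r.write(key, value)  # any failure aborts; no partial success returned
    return "success"

replicas = [Replica(), Replica(), Replica()]
print(replicated_write(replicas, "blk-7", b"payload"))
# A read from any copy now returns the same data:
assert all(r.data["blk-7"] == b"payload" for r in replicas)
```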

4.5 NVDIMM Power Failure Protection


To improve performance, a storage system first stores data in memory and then writes it to disks. To ensure the reliability of data stored in memory and to prevent data loss caused by power failures, FusionStorage supports the non-volatile dual in-line memory module (NVDIMM). NVDIMM provides fast data access and preserves data integrity in the event of a power failure.

4.6 I/O Traffic Control


FusionStorage provides various I/O services, including:
 Read and write operations from upper-layer applications
 Snapshot and remote data replication services
 Data reconstruction service in the event of disk failures
 Scheduled disk scan service
Disk I/O congestion occurs and I/O latency increases if I/O service traffic exceeds system capacity, if a large amount of data must be migrated during capacity expansion, or during data reconstruction after a disk failure.
FusionStorage supports I/O traffic control. If I/O traffic is overloaded, FusionStorage preferentially guarantees the processing of high-priority services by temporarily throttling low-priority services according to the traffic control algorithm and policies. Based on the volume of overloaded traffic, the FusionStorage traffic control module adjusts the amount of I/O traffic processed in each cycle to keep system workloads within the normal range.
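One hedged sketch of priority-based throttling follows; the queue structure and quota are illustrative. Under overload, low-priority reconstruction and scan I/O is dispatched only on a small fixed quota, so application I/O is served first without starving background work entirely.

```python
import collections

class PriorityIOScheduler:
    """Dispatch application I/O first; background I/O (rebuild, scan) passes
    only on a small fixed quota so it is throttled but never fully starved."""
    def __init__(self, background_quota=1, app_batch=8):
        self.app = collections.deque()
        self.background = collections.deque()
        self.background_quota = background_quota
        self.app_batch = app_batch

    def submit(self, op, priority):
        (self.app if priority == "high" else self.background).append(op)

    def dispatch(self):
        batch = [self.app.popleft()
                 for _ in range(min(self.app_batch, len(self.app)))]
        for _ in range(min(self.background_quota, len(self.background))):
            batch.append(self.background.popleft())
        return batch

sched = PriorityIOScheduler()
sched.submit("app-write-1", "high")
sched.submit("rebuild-chunk-9", "low")
sched.submit("app-read-2", "high")
print(sched.dispatch())  # ['app-write-1', 'app-read-2', 'rebuild-chunk-9']
```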

4.7 Disk Reliability


FusionStorage supports SMART-based disk health detection, slow-disk detection, SCSI fault handling, disk hot-swap handling, and disk scanning. Based on the I/O errors and SMART data reported for each disk and on the disk status information, FusionStorage allows upper-layer services to perform read repair, remove or reconstruct a disk, mark bad blocks, scan valid data on disks, handle disks whose SMART values exceed thresholds, and handle slow disks (removing them after pre-reconstruction).
 Read Repair
When a read fails, FusionStorage automatically determines the failure type. If data cannot be read from a disk sector, FusionStorage retrieves the data from another copy and writes it back to the original sector; read repair fixes most sector read failures (see the sketch after this list). If the failure persists, the system selects another disk to hold the data copy and removes the failed disk from the cluster.
 Block Status Table (BST)
If a bad track is found while the system scans disks or reads data, an I/O error (EIO) is reported. The system then attempts to read the data from another copy and write it back to the original track. If the other copy is unavailable, the system marks the bad block in the BST and relies on upper-layer applications to restore the lost data.
 Disk removal and reconstruction
SMART data can reveal disk errors, such as WP, ABRT, and DF errors, and a special EIO is reported to FusionStorage to trigger disk removal. If only one copy remains, FusionStorage refuses to remove the disk and instead performs the dual-disk-failure handling procedure. If two copies are available, FusionStorage removes the disk and reconstructs the data.


 Disk scan for valid data
FusionStorage scans disks to prevent silent data corruption. If a scan fails due to a bad track (an extended EIO is reported), FusionStorage performs a fine-grained scan to locate the bad sector and then performs read repair. If the read repair fails, FusionStorage marks the bad block in the BST.
 Handling disks with exceeded SMART thresholds and slow disks (reconstruction and disk removal)
When detecting that a disk's SMART value exceeds the threshold or that a disk rotates too slowly, FusionStorage first migrates the primary partition away and constructs another copy. (If the system already has two data copies, it then temporarily has three.) After the copy is constructed, FusionStorage removes the disk whose SMART value exceeds the threshold or that rotates slowly.
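The sketch referenced in the read repair item above models the mechanism in miniature; the sector and copy objects are toy stand-ins.

```python
class Sector:
    def __init__(self, data, bad=False):
        self.data, self.bad = data, bad

    def read(self):
        if self.bad:
            raise IOError("EIO: unreadable sector")
        return self.data

def read_with_repair(primary, replica):
    """Serve the read from a healthy copy and write it back to repair the
    original sector when possible."""
    try:
        return primary.read()
    except IOError:
        data = replica.read()                     # fetch from another copy
        primary.data, primary.bad = data, False   # write-back repairs sector
        return data

good = Sector(b"payload")
damaged = Sector(None, bad=True)
assert read_with_repair(damaged, good) == b"payload"
assert damaged.read() == b"payload"   # subsequent local reads now succeed
```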

4.8 Metadata Reliability


The metadata of volumes and snapshots is stored in two metadata volumes, each of which has two copies. The system therefore contains four copies in total, ensuring high metadata reliability.


5 FusionManager Reliability

5.1 Active and Standby Management Nodes Architecture


The FusionSphere management nodes work in active/standby mode, and the active node provides services through a floating IP address.
The active and standby management nodes use a heartbeat detection mechanism: the standby node monitors the health of the active node in real time. If the standby node detects that a process on the active node is faulty, or that the OS on the active node or its host has broken down, the standby node takes over service processing.
During the switchover, the floating IP address is configured on the standby node and the MAC address mapping is updated on the gateway. All processes monitored on the original active node start on the standby node and provide services.
Figure 1.17 outlines the active and standby management nodes architecture.

Figure 1.17 Active and standby management nodes

Management nodes work in active/standby mode to manage services in the system. If both the active and standby management nodes become faulty, new service operations are adversely affected; for example, VMs cannot be created or deleted. However, VMs already running in the system can still be used, and VM users will not be aware of the failure.
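A hedged sketch of the takeover step is shown below; the interface name, addresses, and commands are illustrative, and real deployments typically delegate this to an HA framework.

```python
import subprocess

FLOATING_IP = "192.0.2.50/24"   # illustrative floating management address
IFACE = "eth0"                  # illustrative interface name

def take_over_floating_ip():
    # Bind the floating IP to this node's interface (new active node).
    subprocess.run(["ip", "addr", "add", FLOATING_IP, "dev", IFACE],
                   check=True)
    # Gratuitous ARP updates the IP-to-MAC mapping on the gateway and peers.
    subprocess.run(["arping", "-U", "-c", "3", "-I", IFACE,
                    FLOATING_IP.split("/")[0]], check=False)

def on_heartbeat_lost():
    take_over_floating_ip()
    # Next step: start the monitored management processes on this node.
```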


5.2 System Monitoring


The FusionSphere system provides fault detection and alarm reporting functions and displays detected faults on its portals.
While a cluster is running, FusionSphere provides a portal that gives visibility for cluster management. The portal can be used to check whether system loads are unbalanced, whether a process is out of control, or whether hardware performance has deteriorated.
Users can then adjust and allocate system resources based on the detected problems to improve overall system performance. Users can also view historical records to obtain daily, weekly, and even annual hardware resource consumption of clusters.
By running a probe program on each monitored node and on customized VMs, the FusionSphere system collects key indicators of the monitored node or VM, such as CPU usage, network traffic, and memory usage.
FusionSphere can also detect system exceptions, such as process breakdown, management
and storage link faults, node breakdown, and system resource overload.
In addition, FusionSphere provides a set of tools for technical support and maintenance
engineers to monitor the general health of the system. The tools can be used to perform health
checks for system components and generate health check reports. The tools apply to site
deployment, preventive maintenance, and system upgrades.
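As a minimal example of such a probe (Linux /proc interfaces; the reporting step to the management node is omitted), the snippet below samples the load and memory figures a monitoring agent would forward:

```python
def read_loadavg():
    # /proc/loadavg starts with the 1-, 5-, and 15-minute load averages.
    with open("/proc/loadavg") as f:
        one, five, fifteen = f.read().split()[:3]
    return float(one), float(five), float(fifteen)

def read_mem_available_kb():
    # /proc/meminfo lines look like: "MemAvailable:  123456 kB".
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    return None

if __name__ == "__main__":
    print("load:", read_loadavg(),
          "mem available (kB):", read_mem_available_kb())
```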

5.3 Data Consistency Between the Active and Standby Nodes

Control nodes use GaussDB databases working in active/standby mode. The active database performs data reads and writes while the FusionSphere system is running properly. When data in the active database changes, the changes are synchronized to the standby database; to preserve database performance, the synchronization is asynchronous. This synchronization mechanism prevents data loss caused by active/standby database switchovers.


6 Network Reliability

6.1 Multipathing Storage Access


Computing nodes support redundant deployment of storage initiator modules, and VMs on these nodes can access storage over multiple paths using standard protocols, such as iSCSI. The NIC load balancing, switch stacking, and clustering technologies provide physically redundant storage paths.
Figure 1.18 illustrates the multipathing storage access.

Figure 1.18 Multipathing storage access

Figure 1.18 shows the multipathing access process between computing nodes and storage nodes. Each VM has at least two paths to its attached virtual volumes. Multipathing software controls path selection and switches services to a surviving path upon a failure. This prevents service interruptions that could be caused by single points of failure and ensures system reliability.
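The following toy illustration shows path failover; the path objects stand in for iSCSI sessions. I/O is issued on the preferred path and transparently retried on an alternate path if the preferred one fails.

```python
class Path:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy

    def send(self, io):
        if not self.healthy:
            raise IOError(f"{self.name} is down")
        return f"{io} via {self.name}"

def multipath_send(paths, io):
    """Try each available path in order; fail only if every path is down."""
    for path in paths:
        try:
            return path.send(io)
        except IOError:
            continue               # transparent failover to the next path
    raise IOError("all storage paths failed")

paths = [Path("iscsi-path-A", healthy=False), Path("iscsi-path-B")]
print(multipath_send(paths, "write blk-12"))  # write blk-12 via iscsi-path-B
```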


6.2 NIC Load Balancing


NICs on physical servers work in bonding mode for higher reliability and load balancing. Two
or more NICs are bound as one logical NIC to balance traffic to servers across the NICs.
NIC bonding prevents a NIC from being overloaded, improves network performance in the
event of traffic burst, and ensures stable and smooth access to servers.
If one NIC becomes faulty, other NICs in the bonding team immediately take over its service
processing. NIC bonding avoids service interruptions caused by the failures of a single NIC
and link failures.
NIC bonding provides the following benefits:
 Bandwidth aggregation for load sharing and high inbound and outbound bandwidth
 Traffic failover to prevent connectivity loss in the event of a network component failure

6.3 Switch Stacking


Two or more stackable switches in the same location can be stacked using stacking cables or
high-speed uplink ports to form a reliable logical switch. S5300 access switches are stacked
through stacking ports.
Switch stacking improves switch reliability, enables centralized switch management and
maintenance, and reduces maintenance costs.
The stacking technology allows two stacked physical switches to act as one switch, with no trunk interface configured between them. The two physical switches in a stacking group work in active/standby mode: if one switch is faulty, the other takes over service processing.
Before being stacked, each switch is a standalone entity with its own IP address and needs to
be separately managed.
The stacking technology enables two or more switches to work as a logical switch with only
one IP address. The IP address can be used to manage and maintain all switches in the
stacking group.
The stacking protocol is used to elect the active, standby, and slave switches to implement
data backup and switchover between the active and standby switches.
Switches can be connected in a ring or chain topology. The stacking management protocol elects the active switch, which manages the stacking system: it allocates IDs to stack members, collects stacking topology information, and sends the topology information to stack members.
The active switch also designates its standby switch, which can serve as the active switch to
manage the stacking group if the original active switch becomes faulty.

6.4 VRRP
The Virtual Router Redundancy Protocol (VRRP) is a fault tolerance protocol. Using VRRP, several routers are grouped into one virtual router. If the next-hop switch of a host becomes faulty, another switch in the group rapidly takes over, ensuring service continuity and reliability.
VRRP combines a group of routers in a LAN into a VRRP backup group, which is equivalent to a virtual router.
A virtual IP address is then configured for the virtual router. Hosts in the LAN need to know only this virtual IP address, not the IP addresses of the individual routers. Once a host's default gateway is set to the virtual IP address, the host communicates with external networks through the virtual gateway.
VRRP dynamically associates the virtual router with a physical device that transmits the service packets. If that device becomes faulty, VRRP selects a new device to take over service transmission. The entire process is invisible to users, enabling continuous communication between internal and external networks.
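A schematic of the priority-based election at the heart of VRRP is sketched below; real VRRP uses multicast advertisements and skew timers, so this is a simplification.

```python
def elect_master(routers):
    """Highest priority wins; ties break toward the higher IP address
    (a simplification of the rule in RFC 5798)."""
    return max(routers, key=lambda r: (r["priority"], r["ip"]))

group = [
    {"name": "router-1", "priority": 120, "ip": "192.0.2.1"},
    {"name": "router-2", "priority": 100, "ip": "192.0.2.2"},
]
master = elect_master(group)
print("master:", master["name"])    # router-1 answers for the virtual IP

# If the master's advertisements stop, the backups re-elect:
group = [r for r in group if r["name"] != master["name"]]
print("new master:", elect_master(group)["name"])   # router-2 takes over
```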
