In 1992, NetApp introduced Data ONTAP and ushered in the network-attached storage industry. Since then, NetApp has continued to add features and solutions to its product portfolio to meet the needs of its customers. In 2004, NetApp acquired Spinnaker Networks® in order to fold its scalable Clustered file system technology into Data ONTAP. That plan came to fruition in 2006 as NetApp released Data ONTAP GX, the first Clustered product from NetApp. NetApp also continued to enhance and sell Data ONTAP 7G.

Having two products provided a way to meet the needs of the NetApp customers who were happy with the classic Data ONTAP, while allowing customers with certain application requirements to use Data ONTAP GX to achieve even higher levels of performance, and with the flexibility and transparency afforded by its scale-out architecture.

Although the goal was always to merge the two products into one, the migration path for Data ONTAP 7G customers to get to Clustered storage would eventually require a big leap. Enter Data ONTAP 8.0. The goal for Data ONTAP 8.0 was to create one code line that allows Data ONTAP 7G customers to operate a Data ONTAP 8.0 7-Mode system in the manner in which they’re accustomed, while also providing a first step in the eventual move to a Clustered environment. Data ONTAP 8.0 Cluster-Mode allows Data ONTAP GX customers to upgrade and continue to operate their Clusters as they’re accustomed.

The direct link to the “What is a cluster” VOD is available at: http://netappenged.vportal.net/?auid=1000

Vserver - A vserver is an object that provides network access through unique network addresses, that may serve data out of a distinct namespace, and that is separately administrable from the rest of the cluster. There are three types of vservers: cluster, admin, and node.

Cluster Vserver - A cluster vserver is the standard data-serving vserver in Cluster-Mode. It is the successor to the vserver of GX. It has both data and (optional) admin LIFs, and also owns a namespace with a single root. It has separate administrative domains, such as Kerberos realms, NIS domains, and so on, and can live on separate virtual networks from other vservers.

Admin Vserver - Previously called the "C-server", the admin vserver is a special vserver that does not provide data access to clients or hosts. However, it has overall administrative access to all objects in the cluster, including all objects owned by other vservers.

Node Vserver - A node vserver is restricted to operation in a single node of the cluster at any one time, and provides administrative and data access to 7-Mode objects owned by that node. The objects owned by a node vserver will fail over to a partner node when takeover occurs. The node vserver is equivalent to the pfiler, also known as vfiler0 on a particular node. In 7G systems, it is commonly called the "filer".

This example shows many of the key resources in a cluster. There are three types of virtual servers, plus nodes, aggregates, volumes, and namespaces.

Notice the types of vservers. Each node in the Cluster automatically has a node vserver created to represent it. The administration vserver is automatically created when the Cluster is created. The Cluster vservers are created by the administrator to build global namespaces.
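As a quick illustration (a sketch only; the vserver names below are placeholders borrowed from examples later in this module, and the output columns are simplified and vary by release), the vserver types can be listed from the clustershell:

node::> vserver show
Vserver  Type
-------- -------
hydra    admin
node1    node
node2    node
vs1      cluster

The admin vserver and one node vserver per node are created automatically; the cluster vservers are the ones the administrator creates to serve data.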

Physical things can be touched and seen, like nodes, disks, and ports on those nodes.

Logical things cannot be touched, but they do exist and take up space. Aggregates are logical groupings of disks. Volumes, Snapshot copies, and mirrors are areas of storage carved out of aggregates. Clusters are groupings of physical nodes. A virtual server is a virtual representation of a resource or group of resources. A logical interface is an IP address that is associated with a single network port.

A cluster, which is a physical entity, is made up of other physical and logical pieces. For example, a cluster is made up of nodes, and each node is made up of a controller, disks, disk shelves, NVRAM, etc. On the disks are RAID groups and aggregates. Also, each node has a certain number of physical network ports, each with its own MAC address.

Please refer to your Exercise Guide for more instructions.

Cluster-Mode supports V-Series systems. As such, the setup will be a little different when using V-Series.

Each controller should have a console connection, which is needed to get to the firmware and to get to the boot menu (for the setup, install, and init options, for example). A Remote LAN Module (RLM) connection, although not required, is very helpful in the event that you cannot get to the UI or console. It allows for remote rebooting and forcing core dumps, among other things.

Each node must have at least one connection (ideally, two connections) to the dedicated cluster network. Each node should have at least one data connection, although these data connections are only necessary for client access. Because the nodes will be clustered together, it’s possible to have a node that participates in the cluster with its storage and other resources, but doesn’t actually field client requests. Typically, however, each node will have data connections.

The cluster connections must be on a network dedicated to cluster traffic. The data and management connections must be on a network that is distinct from the cluster network.

There is a large amount of cabling to be done with a Data ONTAP 8.0 cluster. Each node has NVRAM interconnections to its HA partner, and each node has Fibre Channel connections to its disk shelves and to those of its HA partner.

This is standard cabling, and is the same as Data ONTAP GX and 7-Mode.

For cabling the network connections, the following things must be taken into account:

•Each node is connected to at least two distinct networks; one for management (UI) and data access (clients), and one for intra-cluster communication. Ideally, there would be at least two cluster connections to each node in order to create redundancy and improve cluster traffic flow.

•The cluster can be created without data network connections but not without a cluster network connection.
•Having more than one data network connection to each node creates redundancy and improves client traffic flow.

To copy flash0a to flash0b, run flash flash0a flash0b. To “flash” (put) a new image onto the primary flash, you must first configure the management interface. The -auto option of ifconfig can be used if the management network has a DHCP/BOOTP server. If it doesn’t, you’ll need to run ifconfig <interface> -addr=<ip> -mask=<netmask> -gw=<gateway>. After the network is configured, make sure you can ping the IP address of the TFTP server that contains the new flash image. To then flash the new image, run flash tftp://<tftp_server>/<path_to_image> flash0a.
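Pulling those steps together, a typical sequence at the firmware prompt might look like the following (a sketch only; the interface name e0M, the IP addresses, and the image path are placeholders for your environment):

LOADER> flash flash0a flash0b
LOADER> ifconfig e0M -addr=192.168.1.50 -mask=255.255.255.0 -gw=192.168.1.1
LOADER> ping <tftp_server_ip>
LOADER> flash tftp://<tftp_server_ip>/<path_to_image> flash0a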

The environment variables for Cluster-Mode can be set as follows:

•set-defaults
•setenv ONTAP_NG true
•setenv bootarg.init.usebootp false
•setenv bootarg.init.boot_clustered true

ONTAP 8.0 uses an environment variable to determine which mode of operation to boot with. For Cluster-Mode the correct setting is:

LOADER> setenv bootarg.init.boot_clustered true

If the environment variable is unset, the controller will boot up in 7-Mode.

The time it takes to initialize the disks is based on the size of one of the disks, not on the sum capacity of the disks, because all disks are initialized in parallel with each other. Once the disks are initialized, the node’s first aggregate and its vol0 volume will be automatically created.

After the reboot, if the node stops at the firmware prompt by itself (which will happen if the firmware environment variable AUTOBOOT is set to false), type boot_primary to allow it to continue to the boot menu. If AUTOBOOT is set to true, the node will go straight to the boot menu.

When using TFTP, beware of older TFTP servers that have limited capabilities and may cause installation failures.

The setup option on the boot menu configures the local information about this node, such as the host name, management IP address, netmask, default gateway, DNS domain and servers, and so on.

Autoconfig is still somewhat inflexible: it doesn’t allow you to choose the host names of the nodes, only two cluster ports can be configured, the cluster ports are fixed (always the same), and the cluster IPs are out of sequence. As such, NetApp recommends that cluster joins be done manually.

The first node in the cluster will perform the "cluster create" operation. All other nodes will perform a "cluster join" operation. Creating the cluster also defines the cluster-management LIF. The cluster-management LIF is an administrative interface used for UI access and general administration of the cluster. This interface can fail over to data-role ports across all the nodes in the cluster, using pre-defined failover rules (clusterwide).

The cluster network is an isolated, non-routed subnet or VLAN, separate from the data or management networks, so using non-routable IP address ranges is common and recommended.

Using 9000 MTU on the cluster network is highly recommended, for performance and reliability reasons. The cluster switch or VLAN should be modified to accept 9000 byte payload frames prior to attempting the cluster join/create.

After a cluster has been created with one node, the administrator must invoke the cluster join command on each node that is going to join the cluster. To join a cluster, you need to know a cluster IP address of one of the nodes in the cluster, and you need some information that is specific to this joining node.

The cluster join operation ensures that the root aggregates are uniquely named. During this process, the first root aggregate will remain named "aggr0" and subsequent node root aggregates will have the hostname appended to the aggregate name as "aggr0_node01". For consistency, the original aggregate should also be renamed to match the naming convention, or renamed as per customer requirements.
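For example (a sketch; the aggregate and node names are placeholders), renaming the first node's root aggregate to match the convention could look like this:

node::> storage aggregate rename -aggregate aggr0 -newname aggr0_node01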

When the storage controller(s) that were unjoined from the cluster are powered back on, they will display information about the cluster to which they previously belonged.

The base Cluster-Mode license is fixed and cannot be installed as a temporary/expiring license. The base license determines the cluster serial number and is generated for a specific node count (as are the protocol licenses). The base license can also be installed on top of an existing base license as additional node counts are purchased. If a customer purchases a 2-node upgrade for their current 2-node cluster, they will need a 4-node base license for the given cluster serial number. The licenses are indexed on the NOW site by the *cluster* serial number, not the node serial number.

By default, there are no feature licenses installed on an 8.0 Cluster-Mode system as shipped from the factory. The cluster create process installs the base license, and all additional purchased licenses can be found on the NOW site.
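As a hedged sketch (the license key is a placeholder, and the exact command directory and parameter names can differ between Data ONTAP releases), licenses can be listed and added from the clustershell with something like:

node::> system license show
node::> system license add -license-code <license_key>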

The controllers will default to GMT timezone. Modify the date, time and timezone using the system date command.

While configuring NTP is not a hard requirement for NFS-only environments, it is required for a cluster with the CIFS protocol enabled, and it is a good idea in most environments. If there are time servers available in the customer environment, the cluster should be configured to sync up to them.

Time synchronization can take some time, depending on the skew between the node time and the reference clock time.
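A hedged example (the time zone, node name, and server address are placeholders, and the parameter names are assumptions that vary somewhat by release) of setting the time zone and pointing a node at a time server:

node::> system date modify -node * -timezone America/New_York
node::> system services ntp server create -node node01 -server <ntp_server_ip>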

Please refer to your Exercise Guide for more instructions.

Although the CLI and GUI interfaces are different, they both provide access to the same information, and both have the ability to manage the same resources within the cluster. All commands are available in both interfaces. This will always be the case because both interfaces are generated from the same source code that defines the command hierarchy.

The hierarchical command structure is made up of command directories and commands. A command directory may contain commands and/or more command directories. Similar to a typical file system directory and file structure, the command directories provide the groupings of similar commands. For example, all commands for storage-related things fall somewhere within the storage command directory. Within that directory, there are directories for disk commands and aggregate commands. The command directories provide the context that allows similar commands to be used for different objects. For example, all objects/resources are created using a create command, and removed using a delete command, but the commands are unique because of the context (command directory) in which they’re used. So, storage aggregate create is different from network interface create.

There is a cluster login, by way of the cluster management LIF. There is also a login capability for each node, by way of the node management LIF for each node.

The preferred way to manage the cluster is to log in to the clustershell by way of the cluster management LIF IP address, using ssh. If a node is experiencing difficulties and cannot communicate with the rest of the cluster, the node management LIF of a node can be used. And if the node management LIF cannot be used, then the Remote LAN Module (RLM) interface can be used.
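For example (a sketch; the account name and address are placeholders), from an administration host:

ssh admin@<cluster_mgmt_ip>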

This diagram shows the software stack making up Data ONTAP 8.0 Cluster-Mode. The most obvious difference between this stack and the 7-Mode stack is the addition of a networking component called the N-blade, and more logical interfaces (LIFs). Also, notice that Cluster-Mode does not yet support the SAN protocols (FC and iSCSI).

The N-blade is the network blade. It translates between the NAS protocols (NFS and CIFS) and the SpinNP protocol that the D-blade uses. SpinNP is the protocol used within a cluster to communicate between N-blades and D-blades. In Cluster-Mode, the D-blade does not service NAS or SAN protocol requests.

Data ONTAP GX had one management virtual interface on each node. Cluster-Mode still has that concept, but it’s called a “node management” LIF. Like the management interfaces of Data ONTAP GX, the node management LIFs do not fail over to other nodes.

Cluster-Mode introduces a new management LIF, called the “cluster management” LIF, that has failover and migration capabilities. The reason for this is so that regardless of the state of each individual node (rebooting after an upgrade, halted for hardware maintenance), there is a LIF address that can always be used to manage the cluster, and the current node location of that LIF is transparent.

The two “mgmt1” LIFs that are shown here are the node management LIFs, and are each associated with their respective node virtual servers (vservers).

The one cluster management LIF, named “clusmgmt” in this example, is not associated with any one node vserver, but rather is associated with the admin vserver, called “hydra,” which represents the entire physical cluster.

The nodeshell is accessible only via run -node from within the clustershell. It has visibility to only those objects that are attached to the given controller, such as hardware, disks, aggregates, volumes, and things inside volumes like Snapshot copies and qtrees. Both 7-Mode and Cluster-Mode volumes on that controller are visible.
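As a brief illustration (the node name is a placeholder), a single nodeshell command can be run from the clustershell, or an interactive nodeshell session can be opened:

node::> run -node node01 -command hostname
node::> run -node node01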

In these examples, the hostname command was invoked from the UI of one node, but actually executed on the other node. In the first example, the command was invoked from the clustershell. In the second example, the administrator entered the nodeshell of the other node, and then ran the command interactively.

The FreeBSD shell is only to be used internally for ONTAP development, and in the field for emergency purposes (e.g., system diagnostics by trained NetApp personnel). All system administration and maintenance commands must be made available to customers via the cluster shell.

Access to the systemshell is not needed as much as it was in Data ONTAP GX because many of the utilities that only ran in the BSD shell have now been incorporated into the clustershell.

But there are still some reasons why the systemshell may need to be accessed. No longer can you log in to a node or the cluster as “root” and be placed directly into the systemshell. Access to the systemshell is limited to a user named “diag,” and the systemshell can only be entered from within the clustershell.

The FreeBSD shell is accessible via the diag user account.

FreeBSD access will not have a default password and the diag account is disabled by default. The account can only be enabled by the customer by explicitly setting a password from a privileged ONTAP account.

Diag passwords have two states:

blocked: which means there is no password and no one can log in as diag
enabled: which means there is a password and the diag user can log in

By default, diag is blocked. This default applies both on standalone nodes and clusters.
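A hedged sketch of enabling the diag account and entering the systemshell from a privileged clustershell session (the node name is a placeholder, and command details may differ between releases):

node::> security login password -username diag
node::> security login unlock -username diag
node::> set -privilege diag
node::*> systemshell -node node01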

Element Manager is the web-based user interface for administration of the cluster. All of the operations that can be performed using the CLI, ZAPI, and so on can also be performed using this interface.

To use Element Manager, point a web browser to the URL http://<cluster_management_ip>/

SMF and RDB provide the basis for single system image administration of a cluster in the M-host. SMF provides the basic command framework and the ability to route commands to different nodes within the cluster. RDB provides the mechanism for maintaining cluster-wide data.

Please refer to your Exercise Guide for more instructions.

The clustershell has features similar to the tcsh shell that is popular on UNIX® machines, such as the ability to pull previous commands out of a command history buffer, then optionally edit those commands and reissue them. The command editing is very similar to tcsh and Emacs editing, with key combinations like Ctrl-a and Ctrl-e to move the cursor to the beginning and end of a command, respectively. The up and down arrows allow for cycling through the command history.

Simple online help also is available. The question mark (?) can be used almost anywhere to get help within whatever context you may find yourself. Also, the Tab key can be used in many of the same contexts to complete a command or parameter in order to reduce the amount of typing you have to do.

The clustershell uses named parameters for every command.

Every command directory, command, and parameter can be abbreviated to the extent that it remains unambiguous within that context. For example, from the top level, the storage aggregate show command can be abbreviated to be as short as sto a s. On the other hand, the network interface show command can be abbreviated as n i s.

Commands can be run out of context. If we’re at the top level of the command hierarchy and type disk show, the shell will run the storage disk show command, because it was able to resolve the disk command as being unique within the whole command hierarchy. Likewise, if you simply type disk and hit ENTER, you’ll be put into the storage disk command directory. This will work even if you’re in an unrelated command directory, say in the network interface directory.

The clustershell supports queries and UNIX-style patterns and wildcards to enable you to match multiple values of particular parameters. A simple example: if you have a naming convention for volumes, such that every volume owned by the Accounting department is named with a prefix of “acct_”, you could show only those volumes using volume show -vserver * -volume acct_*. This will show you all volumes beginning with “acct_” on all vservers. If you want to further limit your query to volumes that have more than 500 GB of data, you could do something like: volume show -vserver * -volume acct_* -used >500gb.

These are the command directories and commands available at the top level of the command hierarchy.

This demonstrates how the question mark is used to show the available commands and command directories at any level.

This demonstrates how the question mark is used to show the required and optional parameters. It can also be used to show the valid keyword values that are allowed for parameters that accept keywords.

The Tab key can be used to show other directories, commands, and parameters that are available, and can complete a command (or a portion of a command) for you.

This is the initial page that comes up when logging into the Element Manager. It’s a dashboard view of the performance statistics of the entire cluster. The left pane of the page contains the command directories and commands. When there is a “+” beside a word, it can be expanded to show more choices. Not until you click an object at the lowest level will the main pane switch to show the desired details.

Notice the expansion of the STORAGE directory in the left pane.

This shows the further expansion of the aggregate directory within the STORAGE directory. The main pane continues to show the Performance Dashboard.

After selecting “manage” on the left pane, all the aggregates are listed. Notice the double arrow to the left of each aggregate. Clicking that will reveal a list of actions (commands) that can be performed on that aggregate.

This shows what you see when you click the arrow for an aggregate to reveal the storage aggregate commands. The “modify” command for this particular aggregate is being selected.

The “modify” action for an aggregate brings up this page. You can change the state, the RAID type, the maximum RAID size, or the high-availability policy. Also, from the “Aggregate” drop-down menu, you can select a different aggregate to work on without going back to the previous list of all the aggregates.

This shows the set adv command (short for set -privilege advanced) in the clustershell. Notice the options available for the storage directory before (using the admin privilege) and after (using the advanced privilege), where firmware is available.

Note that the presence of an asterisk in the command prompt indicates that you are not currently using the admin privilege.
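For example (a sketch; the confirmation prompt text is abbreviated), switching privilege levels and watching the prompt change:

node::> set -privilege advanced
(a confirmation warning is displayed; note the asterisk in the prompt that follows)
node::*> set -privilege admin
node::>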

This page, selected by clicking PREFERENCES on the left pane, is how you would change the privilege level from within the GUI.

The privilege level is changed only for the user and interface in which this change is made, that is, if another admin user is using the clustershell, that admin user’s privilege level is independent of the level in use here, even if both interfaces are accessing the same node.

Please refer to your Exercise Guide for more instructions.

Here is an example of a FAS3040 or FAS3070 controller. Use this as a reference, but keep in mind that as new cards are supported, some of this could change.

Here is an example of a FAS6070 or FAS6080 controller. Use this as a reference, but keep in mind that as new cards are supported, some of this could change.

What are the characteristics of a cluster?
-- A collection of nodes consisting of one or more HA pairs
-- Each node connected to other nodes via redundant 10GbE cluster network
-- The cluster as a whole offering NetApp unified storage in a single namespace
-- Administered as a single unit, with delegation of virtual servers

This is the back of a typical disk shelf. Here, we’re highlighting the in and out ports of loop A (top) and loop B (bottom).

The following example shows what the storage disk show -port command output looks like for an SFO configuration that does not use redundant paths:

node::> storage disk show -port
Primary         Port Secondary       Port Type   Shelf Bay
--------------- ---- --------------- ---- ------ ----- ---
node2a:0a.16    A    -               -    FCAL   1     0
node2a:0a.17    A    -               -    FCAL   1     1
node2a:0a.18    A    -               -    FCAL   1     2
node2a:0a.19    A    -               -    FCAL   1     3
node2a:0a.20    A    -               -    FCAL   1     4
node2a:0a.21    A    -               -    FCAL   1     5
.
.
.
node2a:0b.21    B    -               -    FCAL   1     5
node2a:0b.22    B    -               -    FCAL   1     6
node2a:0b.23    B    -               -    FCAL   1     7

Multipath HA Storage enhances data availability and performance for active/active system configurations. It is highly recommended for customers who want to avoid unnecessary failovers resulting from storage-related faults. By providing redundant paths, Multipath HA Storage avoids controller failover due to storage faults from shelf I/O modules, cables, and disk HBA failures.

Multipathing is supported with ESH2, ESH4, and AT-FCX shelf modules. If the shelf modules are not of these types, then upgrade them before proceeding. If there are no free HBAs on the node, then add additional HBAs.

Use the following procedure to dual-path each loop. This can be done while the node is online.

1. Insert optical connectors into the out connection on both the A and B modules on the last shelf in the loop. Determine whether the node head is plugged into the A or B module of the first shelf.
2. Connect a cable from a different host adapter on the node to the opposite module on the last shelf. For example, if the node is attached, via adapter 1, to the in port of module A on the first shelf, then it should be attached, via adapter 2, to the out port of module B on the last shelf, and vice versa.
3. Repeat step 2 for all the loops on the node.
4. Repeat steps 2 and 3 for the other node in the SFO pair.
5. Use the storage disk show -port command to verify that all disks have two paths.

As a best practice, cable shelf loops symmetrically for ease of administration: use the same node FC port for owner and partner.

Consult the appropriate ISI (Installation and Setup Instructions) for graphical cabling instructions.

The types of traffic that flow over the InfiniBand links are:

•Failover: The directives related to performing storage failover (SFO) between the two nodes, regardless of whether the failover is:

negotiated (planned and as a response to administrator request)
non-negotiated (unplanned in response to a dirty system shutdown or reboot)

•Disk firmware: Nodes in an HA pair coordinate the update of disk firmware. While one node is updating the firmware, the other node must not do any I/O to that disk

•Heartbeats: Regular messages to demonstrate availability

•Version information: The two nodes in an HA pair must be kept at the same major/minor revision levels for all software components

Each node of an HA pair designates two disks in the first RAID group in the root aggregate as the mailbox disks. The first mailbox disk is always the first data disk in RAID group RG0. The second mailbox disk is always the first parity disk in RG0. The mroot disks are generally the mailbox disks.

Each disk, and hence each aggregate and volume built upon them, can be owned by exactly one of the two nodes in the HA pair at any given time. This form of software ownership is made persistent by writing the information onto the disk itself. The ability to write disk ownership information is protected by the use of persistent reservations. Persistent reservations can be removed from disks by power-cycling the shelves, or by selecting Maintenance Mode while in Boot Mode and issuing manual commands there. If the node that owns the disks is running in normal mode, it reasserts its persistent reservations every 30 seconds. Changes in disk ownership are handled automatically by normal SFO operations, although there are commands to manipulate them manually if necessary.

Both nodes in an HA pair can perform reads from any disk to which they are connected, even if they aren't that disk's owner. However, only the node marked as that disk's current owner is allowed to write to it.

Persistent reservations can be removed from disks by power-cycling the shelves, or by selecting Maintenance Mode while in Boot Mode and issuing manual commands there. If the node that owns the disks is running in normal mode, it reasserts its persistent reservations every 30 seconds.

A disk's data contents are not destroyed when it is marked as unowned, only its ownership information is erased. Unowned disks residing on an FC-AL loop, where owned disks exist, will have ownership information automatically applied to guarantee all disks on the same loop have the same owner.

To enable SFO within an HA pair, the nodes must have the Data ONTAP 7G “cf” license installed on them, and they must both be rebooted after the license is installed. Only then can SFO be enabled on them.

Enabling SFO is done within pairs regardless of how many nodes are in the cluster. For SFO, the HA pairs must be of the same model, for example, two FAS3050s, two FAS6070s, and so on. The cluster itself can contain a mixture of models but each HA pair must be homogenous. The version of Data ONTAP must be the same on both nodes of the HA pair, except for the short period of time during which the pair is being upgraded. During that time, one of the nodes will be rebooted with a newer release than its partner, with the partner to follow shortly thereafter. The NVRAM cards must be installed in the nodes, and two interconnect cables are needed to connect the NVRAM cards to each other.

Remember, this cluster is not simply the pairing of machines for failover; it’s the Data ONTAP cluster.

In SFO, interface failover is separated out from storage failover. During giveback, the aggregate that contains the mroot volume of the partner node is returned first, and then the rest of the aggregates are returned one by one.

Multiple controllers are connected together to provide a high-level of hardware redundancy and resilience against single points of failure.

All controllers in an HA array can access the same shared storage backend.

All controllers in an HA array can distribute their NVRAM contents, including the NVLog, to facilitate takeover without data loss in the event of failure.

In the future, the HA array will likely expand to include more than two controllers.

If the node-local licenses are not installed on each node, enabling storage failover will result in an error. Verify and/or install the appropriate node licenses, then reboot each node.

For clusters of more than two nodes, enable SFO on one node per HA pair (a reboot is required later).
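A hedged sketch (node names are placeholders) of enabling and then verifying storage failover from the clustershell:

node::> storage failover modify -node node01 -enabled true
node::> storage failover show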

CFO used to stand for “cluster failover,” but the term “cluster” is no longer being used in relation to Data ONTAP 7G or Data ONTAP 8.0 7-Mode.

This example shows a 2-node cluster, which is also an HA pair. Notice that SFO is enabled on both nodes.

When the aggregates of one node fail over to the SFO partner node, the aggregate that contains the mroot of that node goes too. Each node needs its mroot to boot, so when the rebooted node begins to boot, the first thing that happens is that it signals the partner to do a sendhome of that one aggregate and then it waits for that to happen. If SFO is working properly, sendhome will happen quickly, the node will have its mroot and be able to boot, and then when it gets far enough in its boot process, the rest of the aggregates will be sent home (serially). If there are problems, you’ll probably see the rebooted node go into a “waiting for sendhome” state. If this happens, it’s possible that its aggregates are stuck in a transition state between the two nodes and may not be owned by either node. If this happens, contact NetApp Technical Support.

The EMS log will show why the sendhome was vetoed.

Note: Changing epsilon can be run from any node in the cluster. The steps to move epsilon are as follows (see the command sketch after this list):

1. Mark all nodes in the cluster as -epsilon false
2. Mark one node as -epsilon true
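A hedged sketch of those two steps in the clustershell (the node name is a placeholder; the -epsilon parameter of cluster modify is available at the advanced privilege level):

node::> set -privilege advanced
node::*> cluster modify -node * -epsilon false
node::*> cluster modify -node node01 -epsilon true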

In a 2-node cluster, the choices for RDB are:

Both sites required for online service (a 2/2 quorum)
Master/slave configuration, where one designated site is required for online operation (a (1+e)/2 quorum)

Without the epsilon node, only 1 out of 2 nodes is available, and the quorum requirement is the bare majority, (1+e)/2. That represents a single point of failure.

Both these options suffer from some combination of availability issues, potential for data loss, and lack of full automation. The goal must be availability, complete data integrity, and no need for human intervention – just as for clusters of all other sizes.

Every node acts as an RDB replication site, and nodes are always sold in SFO pairs. So, the 2-node configurations are going to be quite common, and the technical issue represents a practical concern; if the wrong node crashes, all the RDB-applications on the other will stay offline until it recovers.

The problem is to provide a highly available version of the RDB data replication service for the 2-node cluster – staying online when either one of the two nodes crashes.

For clusters of only two nodes, the replicated database (RDB) units rely on the disks to help maintain quorum within the cluster in the case of a node being rebooted or going down. This is enabled by configuring this 2-node HA mechanism. Because of this reliance on the disks, SFO enablement and auto-giveback is also required by 2-node HA and will be configured automatically when 2-node HA is enabled. For clusters larger than two nodes, quorum can be maintained without using the disks. Do not enable 2-node HA for clusters that are larger than two nodes.
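A hedged sketch of enabling the 2-node HA mechanism from the clustershell (this should only be done on a 2-node cluster):

node::> cluster ha modify -configured true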

Note: 2-node HA mode should be disabled on an existing 2-node cluster prior to joining the third and subsequent nodes.

Please refer to your Exercise Guide for more instructions.

The HA policy determines the takeover and giveback behavior and is set to either CFO or SFO.

CFO HA Policy: CFO policy aggregates (or CFO aggregates for short) can contain 7-Mode volumes. When these aggregates are taken over, they are available in partner mode. During giveback, all CFO aggregates are given back in one step. This is the same as what happens during takeover and giveback on 7G. CFO aggregates can also contain Cluster-Mode volumes, but this is not recommended, because such Cluster-Mode volumes could experience longer outages during giveback while waiting for applications like VLDB to stabilize and restore access to these volumes. Cluster-Mode volumes are supported in CFO aggregates because Tricky allowed data volumes in a root aggregate.

SFO HA Policy: SFO policy aggregates (or SFO aggregates for short) can contain only Cluster-Mode volumes. They cannot contain 7-Mode volumes. When these aggregates are taken over, they are available in local mode. This is the same as what happens during takeover on GX. During giveback, the CFO aggregates are given back first, the partner boots, and then the SFO aggregates are given back one aggregate at a time. This SFO aggregate giveback behavior is the same as the non-root aggregate giveback behavior on GX.

The root aggregate has an HA policy of CFO in Cluster-Mode. In BR.0 Cluster-Mode, only the root aggregate can have the CFO policy. All other aggregates will have the SFO policy.
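A hedged sketch (the -fields syntax and field name are assumptions for this release) of checking the HA policy of the aggregates:

node::> storage aggregate show -fields aggregate,ha-policy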

Here we see that each of our nodes contains three aggregates.

Cluster-Mode volumes can be flexible volumes. The flexible volumes are functionally equivalent to flexible volumes in 7-Mode and Data ONTAP 7G. The difference is in how they’re used. Because of the flexibility inherent in Data ONTAP clusters (specifically, the volume move capability), volumes are deployed as freely as UNIX® directories and Windows® folders to separate logical groups of data. Volumes are created and deleted, mounted and unmounted, and moved around as needed. To take advantage of this flexibility, cluster deployments typically use many more volumes than traditional 7G deployments.

Volumes can be moved around, copied, mirrored, and backed up.
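For instance (a sketch; the vserver, volume, and aggregate names are placeholders, and the exact volume move syntax varies by release), a volume can be relocated to another aggregate nondisruptively:

node::> volume move start -vserver vs1 -volume acct -destination-aggregate aggr1_node02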

This example shows some volumes. The name for the vserver root volume was chosen by the administrator to indicate clearly that the volume is a root volume.

You can see that the Type values are all “RW,” which shows that these are read/write volumes, as opposed to load-sharing (LS) mirrors or data protection (DP) mirrors. We’ll learn more about mirrors later.

Also, the difference between the Size and Available values is the amount of the volume that is used, but also reflects some administrative space used by the WAFL® (Write Anywhere File Layout) file system, as well as Snapshot reserve space.

For example, an explicit NFS license is required (it was not previously required with GX). Mirroring requires a new license.

ONTAP 8.0 Cluster-Mode supports a limited subset of the 7-Mode qtree functionality. In Cluster-Mode, qtrees are basically quota containers, not a unit of storage management.

Qtrees can be created within flexvols and can be configured with a security style and default or specific tree quotas. User quotas are not supported in the 8.0.0 release, and backup functionality remains targeted at the volume level.
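A hedged sketch (all names and the disk limit are placeholders, and the quota command syntax is an assumption that may differ in this release) of creating a qtree with a security style and a tree quota:

node::> volume qtree create -vserver vs1 -volume acct -qtree q1 -security-style unix
node::> volume quota policy rule create -vserver vs1 -policy-name default -volume acct -type tree -target q1 -disk-limit 10GB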

Cluster virtual servers are an integral part of the cluster architecture and the means for achieving secure multi-tenancy and delegated administration. They serve data out of their own namespaces and have their own network identities and administrative domains.

A cluster virtual server (vserver) ties together volumes, logical interfaces, and other things for a namespace. No volumes can be created until there is a cluster vserver with which to associate them.

Think of the cluster as a bunch of hardware (nodes, disk shelves, and so on). A vserver is a logical piece of that cluster, but it is not a subset or partitioning of the nodes. It’s more flexible and dynamic than that. Every vserver can use all the hardware in the cluster, and all at the same time.

Here is a simple example: A storage provider has one cluster, and two customers, ABC Company and XYZ Company. A vserver can be created for each company. The attributes that are related to specific vservers (volumes, LIFs, mirrors, and so on) can be managed separately, while the same hardware resources can be used for both. One company can have its own NFS server, while the other can have its own NFS and CIFS servers, for example.
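A hedged sketch (the names and aggregate are placeholders, and parameter names such as -ns-switch are assumptions that vary between releases) of creating a cluster vserver for one of those companies:

node::> vserver create -vserver ABC -rootvolume abc_root -aggregate aggr1_node01 -rootvolume-security-style unix -ns-switch file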

There is a one-to-many relationship between a vserver and its volumes. The same is true for a vserver and its data LIFs. Cluster vservers can have many volumes and many data LIFs, but those volumes and LIFs are associated only with this one cluster vserver.

Please note that this slide is a representation of logical concepts and is not meant to show any physical relationships. For example, all of the objects shown as part of a vserver are not necessarily on the same physical node of the cluster. In fact, that would be very unlikely.

This slide shows four distinct vservers (and namespaces). Although the hardware is not shown, these four vservers could be living within a single cluster. These are not actually separate entities of the vservers, but are shown merely to indicate that each vserver has a namespace. The volumes, however, are separate entities. Each volume is associated with exactly one vserver. Each vserver has one root volume, and some have additional volumes. Although a vserver may only have one volume (its root volume), in real life it is more likely that a vserver would be made up of a number of volumes, possibly thousands. Typically, a new volume is created for every distinct area of storage. For example, every department and/or employee may have its own volume in a vserver.

A namespace is simply a file system. It is the external (client-facing) representation of a vserver. It is made up of volumes that are joined together through junctions. Each vserver has exactly one namespace, and the volumes in one vserver cannot be seen by clients that are accessing the namespace of another vserver. The namespace provides the logical arrangement of the NAS data available in the vserver.

These nine volumes are mounted together via junctions. All volumes must have a junction path (mount point) to be accessible within the vserver’s namespace.

Volume R is the root volume of a vserver. Volumes A, B, C, and F are mounted to R through junctions. Volumes D and E are mounted to C through junctions. Likewise, volumes G and H are mounted to F.

Every vserver has its own root volume, and all non-root volumes are created within a vserver. All non-root volumes are mounted into the namespace, relative to the vserver root.

This is a detailed volume show command. Typing this will show a summary view of all volumes. If you do a show of a specific virtual server and volume, you’ll see the instance (detailed) view of the volume rather than the summary list of volumes.

Junctions are conceptually similar to UNIX mountpoints. In UNIX, a hard disk can be carved up into partitions and then those partitions can be mounted at various places relative to the root of the local file system, including in a hierarchical manner. Likewise, the flexible volumes in a Data ONTAP cluster can be mounted at junction points within other volumes, forming a single namespace that is actually distributed throughout the cluster. Although junctions appear as directories, they have the basic functionality of symbolic links.

A volume is not visible in its vserver’s namespace until it is mounted within the namespace.

Typically, when volumes are created by way of the volume create command, a junction path is specified at that time. That is optional; a volume can be created and not mounted into the namespace. When it’s time to put that volume into use, the volume mount command is the way to assign the junction path to the volume. The volume also can be unmounted, which takes it out of the namespace. As such, it is not accessible by NFS or CIFS clients, but it is still online, and can be mirrored, backed up, moved and so on. It then can be mounted again to the same or different place in the namespace and in relation to other volumes (for example, it can be unmounted from one parent volume and mounted to another parent volume).
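For example (a sketch; the vserver, volume, aggregate, size, and junction path are placeholders), a volume can be created with a junction path, or unmounted and mounted again later:

node::> volume create -vserver vs1 -volume acct -aggregate aggr1_node01 -size 100GB -junction-path /acct
node::> volume unmount -vserver vs1 -volume acct
node::> volume mount -vserver vs1 -volume acct -junction-path /acct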

This is a representation of the volume hierarchy of a namespace. These five volumes are connected by way of junctions, with the root volume of the namespace at the “top” of the hierarchy. From an NFS or CIFS client, this namespace will look like a single file system.

It’s very important to know the differences between what the volume hierarchy looks like to the administrator (internally) as compared to what the namespace looks like from an NFS or CIFS client (externally).

The name of the root volume of a vserver (and hence, the root of this namespace) can be chosen by the administrator, but the junction path of the root volume is always /. Notice that the junction path for (the mountpoint of) a volume is not tied to the name of the volume. In this example, we’ve prefixed the name of the volume smith_mp3 to associate it with volume smith, but that’s just a convention to make the relationship between the smith volume and its mp3 volume more obvious to the cluster administrator.

Here again is the representation of the volumes of this namespace. The volume names are shown inside the circles and the junction paths are listed outside of them. Notice that there is no volume called “user.” The “user” entity is simply a directory within the root volume, and the junction for the smith volume is located in that directory. The acct volume is mounted directly at the /acct junction path in the root volume.

Please refer to your Exercise Guide for more instructions.

Kernel modules are loaded into the FreeBSD kernel. This gives them special privileges that are not available to user space processes. There are great advantages to being in the kernel; there are downsides too. For one, it’s more difficult to write kernel code, and the penalty for a coding error is great. User space processes can be swapped out by the operating system, but on the plus side, user space processes can fail without taking the whole system down, and can be easily restarted on the fly.

This diagram shows the software stack making up Data ONTAP 8.0 Cluster-Mode. The most obvious difference between this stack and the 7-Mode stack is the addition of a networking component called the N-blade, and more logical interfaces (LIFs). Also, notice that Cluster-Mode does not yet support the SAN protocols (FC and iSCSI).

The N-blade is the network blade. It translates between the NAS protocols (NFS and CIFS) and the SpinNP protocol that the D-blade uses. SpinNP is the protocol used within a cluster to communicate between N-blades and D-blades. In Cluster-Mode, the D-blade does not service NAS or SAN protocol requests.

All nodes in a cluster have these kernel modules:

•common_kmod.ko: The Common module is the first kernel module to load. It contains common services, which are shared by modules that load after it.

•nvram5.ko: A low-level hardware driver for NVRAM5.

•nvram_mgr.ko: This segments NVRAM for various users, provides some common access functions that NVRAM5 doesn't provide, and provides centralized power management.

•nvr.ko: A character device driver for interfacing with individual regions of NVRAM (for example, /var).

•maytag.ko: This is the D-blade. It is a stripped-down and modified Data ONTAP 7G, which includes the WAFL® file system, RAID, and storage components, and the SpinNP translation layers.

•nbladekmod.ko: The N-blade contains the network stack, protocols, and SpinNP translation layers.

•spinvfs.ko: SpinVFS enables the user space components to access volumes in the cluster.

All nodes in a cluster have these user space processes:

•mgwd: This process runs the M-host (the Management component) on each node. The mgwd process of each node talks to the mgwd processes on the other nodes.

•vldb: This process manages the Volume Location Database (VLDB). It provides the mappings from volume and container identifiers presented in file handles to those containers (or replicas of them) on a particular D-blade.

•vifmgr: This process manages virtual interfaces, including their ability to migrate, fail over, and revert.

•ngsh: This process manages the clustershell on each node.

•ndmpd: This process services all NDMP calls from third-party data management applications.

•secd: This process manages protocol authentication from NFS and CIFS clients.

•spmd: This process controls the starting, stopping, and restarting of the other processes. If a process hangs, spmd kills and restarts it.

•N-blade (network, protocols)
•CSM (and SpinNP)
•D-blade (WAFL, NVRAM, RAID, storage)
•Management (sometimes called M-host)

The term “blade” refers to separate software state machines, accessed only by well-defined application program interfaces, or APIs. Every node contains an N-blade, a D-blade, and Management. Any N-blade in the cluster can talk to any D-blade in the cluster.

The N-blade translates client requests into Spin Network Protocol (SpinNP) requests (and vice versa). The D-blade, which contains the WAFL® (Write Anywhere File Layout) file system, handles SpinNP requests. CSM is the SpinNP layer between the N-blade and D-blade.

The members of each RDB unit, on every node in the cluster, are in constant communication with each other to remain in sync. The RDB communication is like the heartbeat of each node. If the heartbeat cannot be detected by the other members of the unit, the unit will correct itself in a manner to be discussed later. The three RDB units on each node are: VLDB, VifMgr, and Management. There will be more information about these RDB units later.

This graphic is very simplistic, but each node contains the following: N-blade, CSM, D-blade, M-host, RDB units (3), and the node’s vol0 volume.

An NFS or CIFS client sends a write request to a data logical interface, or LIF. The N-blade that is currently associated with that LIF translates the NFS/CIFS request to a SpinNP request. The SpinNP request goes through CSM to the local D-blade. The D-blade sends the data to nonvolatile RAM (NVRAM) and to the disks. The response works its way back to the client.

This path is mostly the same as the local write request, except that when the SpinNP request goes through CSM, it goes to a remote D-blade elsewhere in the cluster; the response then follows the same path back in reverse.

The N-blade architecture comprises a variety of functional areas, interfaces, and components. The N-blade itself resides as a loadable module within the FreeBSD kernel. It relies heavily on services provided by SK (within the D-blade).

The N-blade supports a variety of protocols. Interaction with these protocols is mediated by the PCP (protocol and connection processing) module. It handles all connection and packet management between the stream protocols and the network protocol stack/device drivers.

•Transports requests from any N-blade to any D-blade and vice versa (even on the same node)

•The protocol is called SpinNP (Spinnaker network protocol) and is the language that the N-blade speaks to the D-blade

•Uses UDP/IP

SpinNP is the protocol family used within a cluster or between clusters to carry high frequency/high bandwidth messages between blades or between an m-host and a blade.

Cluster Session Manager (CSM) is the communication layer that manages connections using the SpinNP protocol between two blades. The blades can be either both local or one local and one remote. Clients of CSM use it because it provides blade-to-blade communication without the client needing to know where the remote blade is located.

The D-blade is basically a wrapper around Data ONTAP 7G that translates SpinNP for WAFL. The Spinnaker D-blade (SpinFS file system, storage pools, VFS, Fibre Channel driver, N+1 storage failover) was replaced by Data ONTAP (encapsulated into a FreeBSD kernel module).

•Certain parts of the “old” Data ONTAP aren’t used (UI, network, protocols)
•It “speaks” SpinNP on the front end
•The current D-blade is mostly made up of WAFL

The D-blade is the disk-facing kernel module and is derived from Data ONTAP. It contains WAFL, RAID, and storage.

SpinHI is part of the D-blade and sits directly above WAFL. It processes all incoming SpinNP fileop messages. Most of these are translated into WAFL messages.

•Also known as “Management”
•Based on code called “Simple Management Framework” (SMF)
•Cluster, nodes, and virtual servers can be managed by any node in the cluster

The M-host is a user space environment on a node, along with the entire collection of software services:

•Command shells and API servers
•Service processes for upcalls from the kernel
•User space implementations of network services, such as DNS, and file access services such as HTTP and FTP
•Underlying cluster services such as RDB, cluster membership services, and quorum
•Logging services such as EMS
•Environmental monitors
•Higher-level cluster services, such as VLDB, job manager, and LIF manager
•Processes that interact with external servers, such as Kerberos and LDAP
•Processes that perform operational functions such as NDMP control and auditing
•Services that operate on data, such as antivirus and indexing

SMF currently supports two types of persistent data storage via table level attributes: persistent and replicated. The replicated tables are identical copies of the same set of tables stored on every node in the cluster. Persistent tables are node specific and stored locally on each node in the cluster.

Colloquially these table attributes are referred to as RDB (replicated) and CDB (persistent).

The volume location database (VLDB) is a replicated database. It stores tables used by N-blades and the system management processes to find the D-blade to which to send requests for a particular volume. Note that all of these mappings are composed and cached in the N-blade’s memory, so the results of all lookups are typically available after a single hash table lookup.

SecD relies on these external servers:

•DNS servers
- Used for name-to-IP-address lookups
- Used by server discovery to get the domain controllers for a Windows domain; Microsoft stores this information in DNS.

•Windows domain controllers
- Used by SecD to create a CIFS server's Windows machine account
- Used to perform CIFS authentication
- Retrieved from DNS using the CIFS domain name included in the 'cifs create' command
- Preferred DCs can be specified in the configuration

•NIS servers
- If configured, can be used to obtain credentials for UNIX users
- NIS must be included in the vserver's ns-switch option

•LDAP servers
- If configured, can be used to obtain credentials for UNIX users, and as a source for UNIX accounts during name mapping
- LDAP must be included in the vserver's ns-switch and/or nm-switch options.

In some cases these servers can be automatically detected, and in others the servers must be defined in the configuration.

Manages all Cluster-Mode network connections on the data, cluster, and management networks. Uses RDB to store network configuration information. Uses RDB to know when to migrate a LIF to another node.

The vol0 volume of a node is analogous to the root volume of a Data ONTAP® 7G system. It contains the data needed for the node to function.

The vol0 volume does not contain any user data, nor is it part of the namespace of a vserver. It lives (permanently) on the initial aggregate that is created when each node is initialized.

The vol0 volume is not protected by mirrors or tape backups, but that’s OK. Although it is a very important volume (a node cannot boot without its vol0 volume), the data contained on vol0 is (largely) re-creatable. If it were lost, the log files would indeed be gone. But because the RDB data is replicated on every node in the cluster, that data can be automatically re-created onto this node.

Each vserver has one namespace and, therefore, one root volume. This is separate from the vol0 volume of each node.

The RDB units do not contain user data, but rather they contain data that helps manage the cluster. These databases are replicated, that is, each node has its own “copy” of the database, and that database is always in sync with the databases on the other nodes in the cluster. RDB database reads are performed locally on each node, but an RDB write is performed to one “master” RDB database, and then those changes are replicated to the other databases throughout the cluster. When reads are done of an RDB database, they can be fulfilled locally, without the need to send any requests over the cluster networks.

The RDB is transactional in that it guarantees that when something is being written to a database, either it all gets written successfully or it all gets rolled back. No partial/inconsistent database writes are committed.

There are three RDB units (VLDB, Management, VifMgr) in every cluster, which means that there are three RDB unit databases on every node in the cluster.

Replicated Database (RDB)
•Currently three RDB units: VLDB, VifMgr, Management
•Maintains the data that manages the cluster
•Each unit has its own replication ring
•A unit is made up of one master (read/write) and other secondaries (read-only)
•One node contains the master of an RDB unit; other nodes contain the secondaries
•Writes go to the master, then get propagated to the others in the unit (via the cluster network)
•Consistency of the units is enabled through voting and quorum
•The user space processes for each RDB unit vote to determine which node (process) will be the master
•Each unit has a master, which could be a different node for each unit
•The master can change as quorum is lost and regained
•An RDB unit is considered healthy only when it is “in quorum” (that is, a master is able to be elected)
•A simple majority of online nodes is required to have a quorum
•One node is designated as “epsilon” (can break a tie) for all RDB units

An RDB replication ring stays “online” as long as a bare majority of the application instances are healthy and in communication (a quorum). When an instance is online (part of the quorum), it enjoys full read/write capability on up-to-date replicated data. When offline, it is limited to read-only access to the potentially out-of-date data offered by the local replica. The individual applications all require online RDB state to provide full service.

Each RDB unit has its own ring. If n is the number of nodes in the cluster, then each unit/ring is made up of n databases and n processes. At any given time, one of those databases is designated as the master and the others are designated as secondary databases. Each RDB unit’s ring is independent of the other RDB units. If nodeX has the master database for the VLDB unit, nodeY may have the master for the VifMgr unit and nodeZ may have the master for the Management unit.

The master of a given unit can change. For example, when the node that is the master for the Management unit gets rebooted, a new Management master needs to be elected by the remaining members of the Management unit. It’s important to note that a secondary can become a master and vice versa. There isn’t anything special about the database itself, but rather the role of the process that manages it (master versus secondary).

When data has to be written to a unit, the data is written to the database on the master and then the master takes care of immediately replicating the changes to the secondary databases on the other nodes. If a change cannot be replicated to a certain secondary, then the entire change is rolled back everywhere. This is what we mean by no partial writes. Either all databases of an RDB unit get the change, or none get the change.

Quorum requirements are based on a straight majority calculation. To promote easier quorum formation given an even number of replication sites, one of the sites is assigned an extra partial weight (epsilon). So, for a cluster of 2n sites, quorum can be formed by the n-site partition that includes the epsilon site.

Let’s define some RDB terminology. A master can be elected only when there is a quorum of members available (and healthy) for a particular RDB unit. Each member votes for the node that it thinks should be the master for this RDB unit. One node in the cluster has a special tie-breaking ability called “epsilon.” Unlike the master, which may be different for each RDB unit, epsilon is a single node that applies to all RDB units.

Quorum means that a simple majority of nodes are healthy enough to elect a master for the unit. The epsilon power is only used in the case of a voting tie. If a simple majority does not exist, the epsilon node (process) chooses the master for a given RDB unit.

A unit goes out of quorum when cluster communication is interrupted, for example, due to a reboot, or perhaps a cluster network hiccup that lasts for a few seconds. It comes back into quorum automatically when the cluster communication is restored.

In normal operation, cluster-wide quorum is required to elect the master.

For quorum, a simple majority of connected, healthy, active nodes is required:
•For N = 2n or 2n+1 nodes, quorum requires at least n+1 nodes.
Alternatively, an artificial majority of half the nodes (including the configuration epsilon) suffices:
•For N = 2n with epsilon, quorum requires at least n nodes plus epsilon.
•For N = 2n+1 with epsilon, quorum requires at least n+1 nodes.
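
As a worked example (illustrative numbers only): in a 6-node cluster (N = 2n, n = 3), a simple majority requires 4 healthy nodes, but a 3-node partition that includes the epsilon node also satisfies quorum (n nodes plus epsilon). In a 5-node cluster (N = 2n+1, n = 2), any 3 healthy, connected nodes form quorum, with or without epsilon.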

A master can be elected only when there is a majority of local RDB units connected (and healthy) for a particular RDB unit. A master is elected when each local unit agrees on the first reachable healthy node in the RDB site list. A “healthy” node would be one that is connected, able to communicate with the other nodes, has CPU cycles, and has reasonable I/O.

The master of a given unit can change. For example, when the node that is the master for the Management unit gets rebooted, a new Management master needs to be elected by the remaining members of the Management unit.

A local unit goes out of quorum when cluster communication is interrupted for a few seconds, for example, due to a reboot, or perhaps a cluster network hiccup that lasts for a few seconds. It comes back in quorum automatically as the RDB units are always working to monitor and maintain a good state. When a local unit goes out of quorum and then comes back into quorum, the RDB unit is re-synchronized. It’s important to note that the VLDB process on a node could go out of quorum for some reason, while the VifMgr process on that same node has no problem at all.

When a unit goes out of quorum, reads from that unit can be done, but writes to that unit cannot. That restriction is enforced so that no changes to that unit happen during the time that a master is not agreed upon. Besides the VLDB example above, if the VifMgr goes out of quorum, access to LIFs is not affected, but no LIF failover can occur.

Marking a node as ineligible (by way of the cluster modify command) means that it no longer affects RDB quorum or voting. If the epsilon node is marked as ineligible, epsilon will be automatically given to another node.
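
A sketch of that operation from the clustershell (the cluster and node names are invented, and the exact parameter name is an assumption that may vary by release):

cluster1::> cluster modify -node node4 -eligibility false

Marking the node eligible again later allows it to rejoin RDB voting.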

The cluster ring show command is available only at the advanced privilege level or higher.

The DB Epoch values of the members of a given RDB unit should be the same. For example, as shown, the DB Epoch for the mgmt unit is “8,” and it’s “8” on both node5 and node8. But that is different from the DB Epoch for the vldb unit, which is “6.” This is fine. The DB Epoch needs to be consistent across nodes for an individual unit; not all units have to have the same DB Epoch.
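
A minimal sketch of checking this from the clustershell (the cluster name is invented, and the -unitname parameter name is an assumption):

cluster1::> set -privilege advanced
cluster1::*> cluster ring show -unitname mgmt

Compare the DB Epoch and DB Trnxs values reported for each node of the unit; healthy members of the same unit report matching values and agree on the master.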

Whenever an RDB ring forms a new quorum and elects the RDB master, the master starts a new epoch. The combination of epoch number and transaction number <epoch,tnum> is used to construct RDB versioning.

The transaction number is incremented with each RW transaction. All RDB copies that have the same <epoch,tnum> combination contain exactly the same information.

When a majority of the instances in the RDB ring are available, they elect one of these instances the master, with the others becoming secondaries. The RDB master is responsible for controlling updates to the data within the replication ring.

When one of the nodes wishes to make an update, it must first obtain a write transaction from the master. Under this transaction, the node is free to make whatever changes it wants; however, none of these changes are seen externally until the node commits the transaction. On commit, the master attempts to propagate the new data to the other nodes in the ring.

If a quorum’s worth of nodes is updated, the changes are made permanent; if not, the changes are rolled back.

One node in the cluster has a special voting weight called epsilon. Unlike the masters of each RDB unit, which may be different for each unit, the epsilon node is the same for all RDB units. This epsilon vote is only used in the case of an even partitioning of a cluster, where, for example, four nodes of an eight-node cluster cannot talk to the other four nodes. This is very rare, but should it happen, a simple majority would not exist and the epsilon node would sway the vote for the masters of the RDB units.

From Ron Kownacki, author of the RDB:

“Basically, quorum majority doesn't work well when down to two nodes and there's a failure, so RDB is essentially locking the fact that quorum is no longer being used, and enabling a single replica to be artificially writable during that outage.

“The reason we require a quorum (a majority) is so that all committed data is durable - if you successfully write to a majority, you know that any future majority will contain at least one instance that has seen the change, so the update is durable. If we didn't always require a majority, we could silently lose committed data. So in two nodes, the node with epsilon is a majority, and the other is a minority - so you would only have one directional failover (need the majority). So epsilon gives you a way to get majorities where you normally wouldn't have them, but it only gives unidirectional failover because it's static.

“In two-node [high availability mode], we try to get bidirectional failover. To do this, we remove the configuration epsilon, and make both nodes equal - and form majorities artificially in the failover cases. So quorum is 2/2 (no epsilon involved), but if there's a failover, you artificially designate the survivor as the majority (and lock that fact). However, that means you can't failover the other way until both nodes are available, they sync up, and drop the lock - otherwise you would be discarding data.”

This diagram shows that each node contains the following: N-blade, CSM, D-blade, M-host, RDB units (3), and vol0.

Please refer to your Exercise Guide for more instructions.

The bundled cluster and management switch infrastructure consists of:

•Cluster: Cisco NX5010/NX5020 (20/40 port, 10GbE)
•Management: Cisco 2960 (24 port 10/100)

Switch cabling:
•Cable ISL ports (8x copper ISL ports): NX5010 ports 13-20; NX5020 ports 33-40
•Cable management switch ISL and customer uplink
•Cable NX50x0 to management switch
•Cable controller cluster ports: cluster port1 -> switch A, cluster port2 -> switch B
•Cable management ports: odd node numbers: node-mgmt to switch A, RLM to switch B; even node numbers: node-mgmt to switch B, RLM to switch A

The key change in Boilermaker from 7G is that we now have a dual-stack architecture. However, when we say "dual-stack" it tends to imply that both stacks have equal prominence. In our case, the stack inherited from 7G, referred to as the SK stack, owns the network interfaces in normal operation and, for the most part, runs the show for 7-Mode and C-Mode apps. The FreeBSD stack inherited from GX runs as a surrogate to the SK stack and provides the programmatic interface (BSD sockets) that the M-host apps use to communicate with the network. The FreeBSD stack itself does not directly talk to the network in normal operation, because it does not own any of the physical network interfaces. The FreeBSD stack maintains the protocol (TCP and UDP) state for all M-host connections and sets up the TCP/IP frames over M-host data. It sends the created TCP/IP frames to the SK stack for delivery to the network. On the ingress side, the SK stack delivers all packets destined for the M-host to the FreeBSD stack.

Data ONTAP 8.0 makes a distinction between physical network ports and logical interfaces, or LIFs. Each port has a role associated with it by default, although that can be changed through the UI. The role of each network port should line up with the network to which it is connected.

Management ports are for administrators to connect to the node/cluster, for example, through SSH or a Web browser.

Cluster ports are strictly for intra-cluster traffic.

Data ports are for NFS and CIFS client access, as well as the cluster management LIF.

Using a FAS30x0 as an example, the e0a and e0b ports are defined as having a role of cluster, while the e0c and e0d ports are defined for data. The e1a port would be on a network interface card in one of the four horizontal slots at the top of the controller. The e1a port is, by default, defined with a role of mgmt.

The network port show command shows the summary view of the ports of this 4-node cluster. All the ports are grouped by node, and you can see the roles assigned to them, as well as their status and Maximum Transmission Unit (MTU) size. Notice the e1b data ports that are on the nodes, but not connected to anything.
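
For reference, a minimal sketch of inspecting ports from the clustershell (the cluster name is invented, and the field list is just an example of narrowing the output):

cluster1::> network port show
cluster1::> network port show -fields role,mtu,link

These make it easy to confirm that each port's assigned role matches the network it is physically cabled to.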

A LIF in Cluster-Mode terminology refers to an IP and netmask associated with a data port.

Each node can have multiple data LIFs, and multiple data LIFs can reside on a single data port, or optional interface group.

The default LIF creation command will also create default failover rules. If manual/custom failover rule creation is desired, or if multiple data subnets will be used, add the "use-failover-groups disabled" or specific "-failover-group" options to the "network interface create" command.
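
A hedged sketch of creating a data LIF (the cluster name, vserver, LIF name, node, port, and addresses are invented for the example):

cluster1::> network interface create -vserver vs1 -lif data1 -role data -home-node node1 -home-port e0c -address 192.168.10.11 -netmask 255.255.255.0

Adding the failover-group options mentioned above to this command is how default rule creation is suppressed or a custom failover group is assigned.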

Data ONTAP connects with networks through physical interfaces (or links). The most common interface is an Ethernet port, such as e0a, e0b, e0c, and e0d.

Data ONTAP has supported IEEE 802.3ad link aggregation for some time now. This standard allows multiple network interfaces to be combined into one interface group. After being created, this group is indistinguishable from a physical network interface.

Multiple ports in a single controller can be combined into a trunked port via the interface group (ifgrp) feature. An interface group supports three distinct modes: multimode, multimode-lacp, and singlemode, with load distribution selectable among mac, ip, and sequential. Using interface groups requires a matching configuration on the connected client Ethernet switch, depending on the configuration selected.
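
As an illustrative sketch (the node, ifgrp, and port names are invented, and exact option names may vary by release):

cluster1::> network port ifgrp create -node node1 -ifgrp a0a -mode multimode -distr-func ip
cluster1::> network port ifgrp add-port -node node1 -ifgrp a0a -port e0c
cluster1::> network port ifgrp add-port -node node1 -ifgrp a0a -port e0d

The switch ports that e0c and e0d connect to would need a matching channel/aggregation configuration on the client Ethernet switch.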

Ports are either physical ports (NICs), or virtualized ports such as ifgrps or vlans. Ifgrps treat several physical ports as a single port, while vlans subdivide a physical port into multiple separate ports. A LIF communicates over the network through the port it is currently bound to.

Using 9000 MTU on the cluster network is highly recommended, for performance and reliability reasons. The cluster switch or VLAN should be modified to accept 9000-byte payload frames prior to attempting the cluster join/create. Standard 1500 MTU cluster ports should only be used in non-production lab or evaluation situations, where performance is not a consideration.

The LIF names need to be unique within their scope. For data LIFs, the scope is a cluster virtual server, or vserver. For the cluster and management LIFs the scopes are limited to their nodes. Thus, the same name, like mgmt1, can be used for all the nodes, if desired.

A routing group is automatically created when the first interface on a unique subnet is created. The routing group is role-specific and allows the use of the same set of static and default routes across many logical interfaces. The default naming convention for a routing group is representative of the interface role and the subnet it is created for.

The first interface created on a subnet will trigger the automatic creation of the appropriate routing-group. Subsequent LIFs created on the same subnet will inherit the existing routing group.

Routing groups cannot be renamed. If a naming convention other than the default is required, the routing group can be pre-created with the desired name, then applied to an interface during LIF creation or as a modify operation to the LIF.
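
A rough sketch of pre-creating a routing group and then applying it to a LIF (the names and subnet are invented, and the parameter names shown are assumptions rather than confirmed 8.0 syntax):

cluster1::> network routing-groups create -vserver vs1 -routing-group data_192.168.10.0 -subnet 192.168.10.0/24 -role data
cluster1::> network interface modify -vserver vs1 -lif data1 -routing-group data_192.168.10.0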

Routing groups are created automatically as new LIFs are created, unless an existing routing group already covers that port role/network combination. Besides the node management LIF routing groups, other routing groups have no routes defined by default.

The node management LIFs on each node have static routes automatically set up for them, using the same default gateway.

There is a “metric” value for each static route, which is how the administrator can configure which route would be preferred over another (the lower the metric, the more preferred the route) in the case where there is more than one static route defined for a particular LIF. The metric values for the node management LIFs are 10. When routes are created for data LIFs, if no metric is defined, the default will be 20.

As with the network interface show output, the node management LIFs have a Server that is the node itself. The data LIFs are associated with a cluster vserver, so they’re grouped under that.

Why migrate a LIF? It may be needed for troubleshooting a faulty port, or perhaps to offload a node whose data network ports are being saturated with other traffic. A LIF will also fail over if its current node is rebooted.

Unlike storage failover (SFO), LIF failover or migration does not cause a reboot of the node from which the LIF is migrating. Also unlike SFO, LIFs can migrate to any node in the cluster, not just within the high-availability pair. Once a LIF is migrated, it can remain on the new node for as long as the administrator wants it to.
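
As a brief sketch (the LIF, node, and port names are invented, and parameter names may be abbreviated or differ slightly by release):

cluster1::> network interface migrate -vserver vs1 -lif data1 -destination-node node2 -destination-port e0c
cluster1::> network interface revert -vserver vs1 -lif data1

The migrate command moves the LIF to the specified port on another node; revert sends it back to its home node and port when desired.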

We’ll cover failover policies and rules in more detail later.

•Data LIFs can migrate or fail over from one node and/or port to any other node and/or port within the cluster
•LIF migration is generally for load balancing; LIF failover is for node failure
•Data LIF migration/failover is NOT limited to an HA pair

•Nodes in a cluster are paired as “high-availability” (HA) pairs (these are called “pairs,” not “clusters”)
•Each member of an HA pair is responsible for the storage failover (SFO) of its partner
•Each node of the pair is a fully functioning node in the greater cluster

•Clusters can be heterogeneous (in terms of hardware and Cluster-Mode versions), but an HA pair must be the same controller model

•First, we show a simple LIF migration
•Next, we show what happens when a node goes down:

•Both data LIFs that reside on that node fail over to other ports in the cluster
•The storage owned by that node fails over to its HA partner
•The failed node is “gone” (i.e., its partner does not assume its identity like in 7G and 7-Mode)
•The data LIF IP addresses remain the same, but are associated with different NICs

Remember that data LIFs aren’t permanently tied to their nodes. However, the port to which a LIF is migrating is tied to a node. This is another example of the line between physical and logical. Also, ports have a node vserver scope, whereas data LIFs have a cluster vserver scope.

All data and cluster-mgmt LIFs can be configured to automatically fail over to other ports/nodes in the event of failure. This can also be used for load balancing if an N-blade is overloaded. The TCP state is not carried over during failover to another node.

Best practice is to fail LIFs from “even” nodes over to other “even” nodes, and LIFs from “odd” nodes to other “odd” nodes.

The default policy that gets set when a LIF is created is nextavail, but priority can be chosen if desired.

In a 2-node cluster, the nextavail failover-group policy creates rules to fail over between interfaces on the 2 nodes. In clusters with 4 or more nodes, the system-defined group will create rules between alternating nodes, to prevent the storage failover partner from receiving the data LIFs as well in the event of a node failure. For example, in a 4-node cluster, the default failover rules are created so that node1 -> node3, node2 -> node4, node3 -> node1, and node4 -> node2.

Priority rules can be set by the administrator. The default rule (priority 0, which is the highest priority) for each LIF is its home port and node. Additional rules that are added will further control the failover, but only if the failover policy for that LIF is set to priority. Otherwise, rules can be created but won’t be used if the failover policy is nextavail. Rules are attempted in priority order (lowest to highest) until the port/node combination for a rule is able to be used for the LIF. Once a rule is applied, the failover is complete.

Manual failover rules can also be created in instances where explicit control is desired, by using the ‘disabled’ option.

As the cluster receives different amounts of traffic, the traffic on all of the LIFs of a virtual server can become unbalanced. DNS load balancing aims to dynamically choose a LIF based on load, instead of using the round-robin way of providing IP addresses.

With DNS load balancing enabled, a storage administrator can choose to allow the new built-in load balancer to balance client logical interface (LIF) network access based on the load of the cluster. This DNS server resolves names to LIFs based on the weight of a LIF. A vserver can be associated with a DNS load-balancing zone and LIFs can be either created or modified in order to be associated with a particular DNS zone. A fully-qualified domain name can be added to a LIF in order to create a DNS load-balancing zone by specifying a “dns-zone” parameter on the network interface create command.

There are two methods that can be used to specify the weight of a LIF: the storage administrator can specify a LIF weight, or the LIF weight can be generated based on the load of the cluster. Ultimately, this feature helps to balance the overall utilization of the cluster. It does not increase the performance of any one individual node, rather it makes sure that each node is more evenly used. The result is better performance utilization from the entire cluster.

DNS load balancing also improves the simplicity of maintaining the cluster. Instead of manually deciding which LIFs are used when mounting a particular global namespace, the administrator can let the system dynamically decide which LIF is the most appropriate. And once a LIF is chosen, that LIF may be automatically migrated to a different node to ensure that the network load remains balanced throughout the cluster.

The -allow-lb-migrate true option will allow the LIF to be migrated based on failover rules to an underutilized port on another head. Pay close attention to the failover rules because an incorrect port may cause a problem. A good practice would be to leave the value false unless you're very certain about your load distribution.

The -lb-weight load option takes the system load into account. CPU, throughput and number of open connections are measured when determining load. These currently cannot be changed.

The -lb-weight 1..100 value for the LIF is like a priority. If you assign a value of 1 to LIF1 and a value of 10 to LIF2, LIF1 will be returned 10 times more often than LIF2. An equal numeric value will round-robin each LIF to the client.

This would be equivalent to DNS Load Balancing on a traditional DNS Server.
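
A hedged sketch of tying a LIF into a load-balancing zone using the parameters described above (the zone name, addresses, LIF, node, and port are invented for the example):

cluster1::> network interface create -vserver vs1 -lif data3 -role data -home-node node2 -home-port e0c -address 192.168.10.13 -netmask 255.255.255.0 -dns-zone vs1.example.com -lb-weight load
cluster1::> network interface modify -vserver vs1 -lif data3 -allow-lb-migrate true

Clients then mount using the zone name (vs1.example.com in this sketch), and the built-in DNS server answers with whichever LIF the weights favor at that moment.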

The weights of the LIFs are calculated on the basis of CPU utilization and throughput (the average of both is taken):

1. LIF_weight_CPU = ((Max CPU on node - Used CPU on node) / (Number of LIFs on node)) * 100

2. LIF_weight_throughput = ((Max throughput on port - Used throughput on port) / (Number of LIFs on port)) * 100

The higher the weight, the lower the probability of returning the associated LIF.

Please refer to your Exercise Guide for more instructions.

NFS is the standard network file system protocol for UNIX clients, while CIFS is the standard network file system for Windows clients. Macintosh® clients can use either NFS or CIFS.

The terminology is slightly different between the two protocols. NFS servers are said to “export” their data, and the NFS clients “mount” the exports. CIFS servers are said to “share” their data, and the CIFS clients are said to “use” or “map” the shares.

•NFS is the de facto standard for UNIX and Linux; CIFS is the standard for Windows
•N-blade does the protocol “translation” between {NFS and CIFS} and SpinNP
•NFS and CIFS have a virtual server scope (so, there can be multiples of each “running” in a cluster)

NFS is a licensed protocol, and is enabled per vserver by creating an NFS server associated with the vserver.

Similarly, CIFS is a licensed protocol and is enabled per vserver by creating a CIFS server associated with the vserver.
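
As an illustrative sketch (the vserver, CIFS server, and domain names are invented, and exact command paths and options may differ by release; the course's 'cifs create' shorthand refers to the same operation):

cluster1::> vserver nfs create -vserver vs1
cluster1::> vserver cifs create -vserver vs1 -cifs-server VS1 -domain example.com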

The name-service switch is assigned at a virtual server level and, thus, Network Information Service (NIS) and Lightweight Directory Access Protocol (LDAP) domain configurations are likewise associated at the virtual server level.

A note about virtual servers: although a number of virtual servers can be created within a cluster, with each one containing its own set of volumes, vifs, NFS, and CIFS configurations (among other things), most customers use only one virtual server. This provides the most flexibility, because virtual servers cannot, for example, share volumes.

The Kerberos realm is not created within a Data ONTAP cluster. It must already exist, and then configurations can be created to associate the realm for use within the cluster.

Multiple configurations can be created. Each of those configurations must use a unique Kerberos realm.

The NIS domain is not created within a Data ONTAP cluster. It must already exist, and then configurations can be created to associate the domain with cluster vservers within Data ONTAP 8.0.

Multiple configurations can be created within a vserver and for multiple vservers. Any or all of those configurations can use the same NIS domain or different ones. Only one NIS domain configuration can be active for a vserver at one time.

Multiple NIS servers can be specified for an NIS domain configuration when it is created, or additional servers can be added to it later.

The LDAP domain is not created within a Data ONTAP cluster. It must already exist, and then configurations can be created to associate the domain with cluster vservers within Data ONTAP 8.0.

LDAP can be used for netgroup and UID/GID lookups in environments where it is implemented.

Multiple configurations can be created within a vserver and for multiple vservers. Any or all of those configurations can use the same LDAP domain or different ones. Only one LDAP domain configuration can be active for a vserver at one time.

Each volume will have an export policy associated with it. Each policy can have rules that govern the access to the volume based on criteria such as a client’s IP address or network, the protocol used (NFS, NFSv2, NFSv3, CIFS, any), and many other things. By default, there is an export policy called default that contains no rules.

Each export policy is associated with one cluster vserver. An export policy name need only be unique within a vserver. When a vserver is created, the default export policy is created for it.

Changing the export rules within an export policy changes the access for every volume using that export policy. Be careful.
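
A minimal sketch of creating a policy, adding a rule, and assigning it to a volume (the policy, vserver, volume, and client subnet are invented, and option names may vary slightly by release):

cluster1::> vserver export-policy create -vserver vs1 -policyname eng
cluster1::> vserver export-policy rule create -vserver vs1 -policyname eng -clientmatch 192.168.10.0/24 -protocol nfs -rorule any -rwrule any
cluster1::> volume modify -vserver vs1 -volume smith -policy eng

Remember that every volume already using a given policy is affected as soon as that policy's rules change.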

Export policies control the clients that can access the NAS data in a vserver. They apply to both CIFS and NFS access. Each export policy consists of a set of export rules that define the mapping of a client, its permissions, and the access protocol (CIFS, NFS). Export policies are associated with volumes, which, by virtue of being associated with the namespace by a junction, control the access to the data in the volume.

Export policies serve as access controls for the volumes. During configuration and testing, a permissive export policy should be implemented, and tightened up prior to production by adding additional export policies and rules to limit access as desired.

If you’re familiar with NFS in Data ONTAP 7G (or on UNIX NFS servers), then you’ll wonder about how things are tagged to be exported. In Data ONTAP clusters, all volumes are exported as long as they’re mounted (through junctions) into the namespace of their cluster vservers. The volume and export information is kept in the Management RDB unit so there is no /etc/exports file. This data in the RDB is persistent across reboots and, as such, there are no temporary exports.

The vserver root volume is exported and, because all the other volumes for that vserver are mounted within the namespace of the vserver, there is no need to export anything else. After the NFS client does a mount of the namespace, the client has NFS access to every volume in this namespace. NFS mounts can also be done for specific volumes other than the root volume, but then the client is limited to only being able to see this volume and its “descendant” volumes in the namespace hierarchy.

Exporting a non-volume directory within a volume is permitted but not recommended. NetApp® recommends that a separate volume be set up for that directory, followed by an NFS mount of that volume.

If a volume is created without being mounted into the namespace, or if it gets unmounted, it is not visible within the namespace.
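
As a brief client-side illustration (the LIF hostname and mount points are invented), mounting the vserver root exposes the whole namespace, while mounting a deeper junction limits the view:

client# mount -t nfs vs1-data1:/ /mnt/vs1
client# mount -t nfs vs1-data1:/user/smith /mnt/smith

The first mount can traverse into /mnt/vs1/acct, /mnt/vs1/user/smith, and so on; the second only sees the smith volume and any volumes junctioned beneath it.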

Please refer to your Exercise Guide for more instructions.
