
KB Article: 3035
Updated: 8/11/2015
Category: Usage
Affected versions: NSS 9.0, NSS 9.5

How to: Troubleshoot EMC Celerra/VNX Integration


Summary
The purpose of this document is to serve as an information bank for EMC-related problems. It will cover the most common problems and the
recommended steps and tools needed in order to solve them.
The document is divided into three parts:
Quotas not locking
Quotas not updating
Error 1450

Quotas Not Locking


If the EMC quotas aren't locking, first check whether the NSS server receives heartbeats from the CEPA-server. A failure to
receive heartbeats will result in quotas not locking. There are three main ways to check for heartbeat errors:
1. In the Notice field in Quota Server.
2. In the System\History-tab in Quota Server.
3. In Application Event log in Windows.
This is the most common message if no heartbeats are received:
"Failed to receive heartbeats from EMC CEPA. Locking on EMC quotas will not be fully operational. Please check if it is installed and
configured."
The three most common reasons behind the failure to receive heartbeats:
1. Failed communication from the CEPA-server, such as a stopped or crashed EMC CAVA-service.
2. Missing or incorrectly configured endpoint at HKEY_LOCAL_MACHINE\SOFTWARE\EMC\Celerra Event Enabler\CEPP\CQM\Configuration.
3. Endpoint still claimed after an NSS Quota Server service stop.
It can take several seconds for the QS Service instance to de-register as the endpoint for CEPA; this is carried out during service
shutdown. If the service is restarted before de-registration is completed, its attempt to connect to CEPA is refused (as the endpoint
is still claimed), so no heartbeats reach QS. To reset the connection, restart the QS Service again, this time ensuring that the original
shutdown has actually completed.
Note: Although it can appear in the service manager that services are stopped, it is more reliable to monitor the process in task manager
as applications can send completed messages before actions are actually completed (QS is sometimes guilty of this).
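The wait-before-restart advice above can be sketched as a small polling helper. This is only an illustration: the is_running callable is a hypothetical hook that you would implement yourself (for example by checking the process list for the Quota Server process), and the timeout values are assumptions, not Northern recommendations.

```python
import time


def wait_until_stopped(is_running, timeout_s=60.0, poll_s=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until is_running() returns False or the timeout expires.

    is_running is a caller-supplied (hypothetical) check that reports
    whether the service process still exists. Returns True if the
    process was seen to stop, False if the timeout was reached first.
    """
    deadline = clock() + timeout_s
    while is_running():
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True


# Demo with a fake check: the process "exists" for two polls, then is gone.
fake_states = iter([True, True, False])
print(wait_until_stopped(lambda: next(fake_states), sleep=lambda s: None))
```

Only start the service again once the helper reports that the process is gone.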
Endpoint check:
Make sure that the endpoint is correctly configured on the CEPA-server(s). If NSS and CEPA run on the same machine, the endpoint can be
set to only Northern. If the CEPA-server is external, the endpoint needs to point to the IP address where all the information should be
sent - in this case the NSS server(s).
A single NSS server: Northern@<IP>
Multiple NSS servers: Northern@<IP1>;Northern@<IP2> etc.
Disregard the brackets when setting the endpoint. It should look like this: Northern@xx.xx.xx.xx
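As a minimal sketch of the format described above, the following helper assembles the endpoint value from a list of NSS server IP addresses. The function name and the example addresses are illustrative assumptions; only the Northern@<IP> format and the semicolon separator come from this article.

```python
import ipaddress


def build_endpoint(nss_ips, app_name="Northern"):
    """Return the endpoint string for zero or more NSS server IP addresses."""
    if not nss_ips:
        # Co-resident case: NSS and CEPA on the same machine.
        return app_name
    # Validate each address so a typo is caught before it is configured.
    checked = [str(ipaddress.ip_address(ip.strip())) for ip in nss_ips]
    return ";".join(f"{app_name}@{ip}" for ip in checked)


# Example addresses are placeholders from the documentation range.
print(build_endpoint(["192.0.2.10"]))
print(build_endpoint(["192.0.2.10", "192.0.2.11"]))
```

An invalid address raises ValueError, which is usually preferable to silently writing a malformed endpoint.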
If you are receiving heartbeats, the problem is either:
1. Account & permissions-related.
2. A communication problem between CEPP and CEPA.
1 a) EMC CAVA Service running with the wrong account
No errors are displayed in this case, which makes it difficult to troubleshoot. The only obvious symptom is that the files cannot be
blocked. Make sure that the EMC CAVA service runs with an account that has administrative rights on the CIFS servers managed by
Quota Server.
1 b) EMC CAVA Service running with the SYSTEM account
If the CQM application is co-resident (i.e. the NSS services and the CEPA server run on the same host), the EMC CAVA service can run with the
Local System account. However, this configuration is strongly discouraged. The Local System account can easily be affected by
security policies enforced on the server, for example policies that prevent connections from the network.
1 c) NSS Services running with the wrong account
The NSS Quota Server service account should belong at least to both Backup Operators and Power Users groups in the VNX /
Celerra CIFS server. If not, quotas may not be locked, without any errors logged in the NSS trace files or in the Data Mover log files.
2. Communication problem between CEPP and CEPA

Enter the EMC Control Station and make a CEPA pool check. This will provide a status report of the CEPA pool. If there are any problems
with the communication between CEPP and CEPA, it will be displayed in the pool information. This is the command for a pool check:
$ server_cepp <server name> -pool -info (where <server name> is the Data Mover name, e.g. server_2)
The command will produce a result similar to this:
server name :
pool_name = Northern
server_required = No
access_checks_ignored = 0
req_timeout = 5000ms
retry_timeout = 1000ms
pre_events = OpenFileWrite, CreateFile, RenameFile, DeleteFile, CloseModified, CreateDir, RenameDir, DeleteDir, SetAclFile
post_events =
post_err_events =
CEPP Servers:
IP = xx.xx.xx.xx, state = ONLINE, rpc = MS-RPC over SMB, cava version = 6.0.4.0, nt status = SUCCESS, server name = server.domain.com
If there are any problems on this end, they will appear in the bottom row. Check the 'state' and the 'nt status' fields.
Common state errors:
ERROR_CEPP_NOT_FOUND - Insufficient account permissions.
OFFLINE - NSS Quota Server not running or not registered as a CQM application.
Common status errors:
OBJECT_NAME_NOT_FOUND - CEPP is unable to communicate with the EMC CAVA-service on the CEPA-server.
CONNECTION_DISCONNECTED - Connection rejected. Possibly by closed ports, a firewall or insufficient account permissions. This
error could occur if the cepp.conf-file is pointing to the wrong server (e.g. to a server that does not have the EMC CEE Framework
installed).
INVALID_PARAMETER - Account problems of a more complex nature. The MS RPC account is incorrectly mapped and configured in the
domain.
If the problem persists on this end (CEPP & CEPA), contact EMC support for further assistance.
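The state and status checks above can be sketched as a small parser for saved pool-check output. This is an illustration only: it assumes the "CEPP Servers" row has the comma-separated key = value layout shown in the sample output, the sample row in the code is invented, and the hint texts are shortened from the lists in this article.

```python
# Hints taken from the state and status lists in this article.
STATE_HINTS = {
    "ERROR_CEPP_NOT_FOUND": "Insufficient account permissions.",
    "OFFLINE": "NSS Quota Server not running or not registered as a CQM application.",
}
STATUS_HINTS = {
    "OBJECT_NAME_NOT_FOUND": "CEPP cannot communicate with the EMC CAVA-service.",
    "CONNECTION_DISCONNECTED": "Connection rejected (ports, firewall, permissions or cepp.conf target).",
    "INVALID_PARAMETER": "MS RPC account incorrectly mapped or configured in the domain.",
}


def parse_cepp_servers(output):
    """Extract IP, state and nt status from each 'IP = ...' row."""
    results = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("IP ="):
            continue
        fields = {}
        for part in line.split(","):
            key, sep, value = part.partition("=")
            if sep:
                fields[key.strip()] = value.strip()
        state = fields.get("state", "")
        status = fields.get("nt status", "")
        hints = [h for h in (STATE_HINTS.get(state), STATUS_HINTS.get(status)) if h]
        results.append({"ip": fields.get("IP", ""), "state": state,
                        "nt_status": status, "hints": hints})
    return results


# Invented sample row, laid out like the output shown above.
sample = ("CEPP Servers:\n"
          "IP = 192.0.2.10, state = OFFLINE, rpc = MS-RPC over SMB, "
          "cava version = 6.0.4.0, nt status = CONNECTION_DISCONNECTED, "
          "server name = server.domain.com\n")
for server in parse_cepp_servers(sample):
    print(server["ip"], server["state"], server["nt_status"], server["hints"])
```

A row with state ONLINE and nt status SUCCESS produces an empty hint list, which corresponds to a healthy pool entry.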

Quotas Not Updating


Disabled CIFS Notifications
The most common reason behind quotas not updating synchronously on EMC is the absence of CIFS notifications. NSS 8.x, 9.0 and 9.5 rely
on CIFS notifications in order to update quotas. No CIFS notifications means no usage level update.
A quick way to verify that the server receives CIFS notifications is to open the trace file named ncl_trace_qsserver_statistics.txt and
search for the term "CIFS notifications". How big is this number? If it's zero, no CIFS notifications are being received. If it's larger
than zero, how big is it? Does the number change over time or does it remain unchanged? Does the number of CIFS notifications really
match the size and activity of the environment?
One way to see if the number of CIFS notifications is plausible is to compare it with the number of CheckEvents in the same
statistics log. These two numbers should be fairly close to each other. If the difference is large, it is usually a sign that CIFS notifications
are turned off for a majority of the CIFS servers.
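The comparison above can be expressed as a simple check once the two counters have been read from the statistics log by hand. This is a sketch under stated assumptions: the article gives no numeric definition of "fairly close", so the 0.5 ratio threshold below is purely illustrative, and the function name is hypothetical.

```python
def compare_counters(cifs_notifications, check_events, min_ratio=0.5):
    """Classify the two counters read from the statistics trace file.

    Returns "none" when no CIFS notifications were received, "gap" when
    the smaller counter is less than min_ratio of the larger one, and
    "close" otherwise. min_ratio is an assumed, adjustable threshold.
    """
    if cifs_notifications == 0:
        return "none"
    larger = max(cifs_notifications, check_events)
    smaller = min(cifs_notifications, check_events)
    if smaller / larger < min_ratio:
        return "gap"
    return "close"


print(compare_counters(0, 1200))
print(compare_counters(150, 1200))
print(compare_counters(1100, 1200))
```

A "gap" result suggests checking whether notifyonwrite is enabled on every CIFS server in use.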
CIFS notifications need to be enabled for ALL CIFS servers used. The setting responsible for this is called 'notifyonwrite' and it's disabled
by default.
This command enables CIFS notifications on the CIFS server:
$ server_mount server_2 -option notifyonwrite ufs1 /ufs1 (where ufs1 is the file system name and /ufs1 is its mount point)

Consult with your EMC technical account manager if you are unsure of the implications of enabling CIFS notifications in your environment.
Empty CIFS Notifications
Another common reason behind quotas not updating is empty CIFS notifications. An empty CIFS Notification is a notification that one or
several changes have occurred within the file system, but the CIFS server is unable to deliver a complete message of these changes due to
an overflowed command buffer. An empty notification can be likened to an error message "changes occurred in a share, but no details can
be provided". NSS responds to this error by re-scanning the quota path, or the entire share where multiple quotas are configured on the
share, in order to calculate current usage levels.
An abnormal rate of Empty Notifications could potentially lead to a state of constant rescanning. In this scenario, the file change
notifications will get stuck in the scan queue and a significant delay in processing can be witnessed. In a worst case scenario, this could
continuously and negatively affect major Quota Server features such as quota locking.
Read more about empty CIFS notifications in KB1785 (About: Handling of Empty CIFS Notifications in NSS).

Error 1450
For versions 9.5 or earlier, this is a problem that shows up as Error 1450 in the Windows Application Event Log. Error 1450 means that
"Insufficient system resources exist to complete the requested service". The error message refers to resource exhaustion on the EMC
CIFS server: all available CIFS/SMB threads on the CIFS server are consumed.
Due to the insufficient resources on the CIFS server, Quota Server will not be able to perform operations on the target storage and
process quota usage level updates. This could potentially cause serious harm to Quota Server functionality (e.g. quotas not updating,
miscalculated quotas and failed locking).

Illustration:

Description: Failed to queue for notification on drive root: \\device\fs1$ Error:1450.


The entries of Error 1450 in the Windows Application Event log can be matched to a specific message in the EMC Control Station:
2013-09-26 09:26:36: VC: 3:[vdm_002v] Too many access from CAVA server xx.xx.xx.xx:
2013-09-26 09:26:36: VC: 3:[vdm_002v] without the EMC VirusChecking privilege:
The IP address mentioned in this message is the IP address of the NSS server (and the CEPA/CAVA-server if everything runs on the same
machine). Through cooperation with EMC engineers, it has been discovered that the combination of these two error messages is a reliable
indicator that all available CIFS/SMB threads are consumed at the time the error is reported. The error messages are printed as soon
as NSS tries to spawn a thread to perform a required action, but is denied by the EMC CIFS server.
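To match the messages at scale, a saved copy of the log can be scanned for the "Too many access from CAVA server" line and the hits counted per IP address. This is a rough sketch: the regular expression is an assumption based only on the two sample lines shown above, and the addresses in the example are placeholders.

```python
import re
from collections import Counter

# Assumed pattern, derived from the sample message in this article.
CAVA_PATTERN = re.compile(r"Too many access from CAVA server\s+([0-9A-Za-z.]+)")


def count_cava_hits(log_text):
    """Count 'Too many access from CAVA server' lines per reported IP."""
    hits = Counter()
    for line in log_text.splitlines():
        match = CAVA_PATTERN.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Invented sample using the line layout shown above.
sample_log = (
    "2013-09-26 09:26:36: VC: 3:[vdm_002v] Too many access from CAVA server 192.0.2.10:\n"
    "2013-09-26 09:26:36: VC: 3:[vdm_002v] without the EMC VirusChecking privilege:\n"
    "2013-09-26 09:27:02: VC: 3:[vdm_002v] Too many access from CAVA server 192.0.2.10:\n"
)
for ip, count in count_cava_hits(sample_log).items():
    print(ip, count)
```

If the reported address is that of the NSS server and the timestamps line up with Error 1450 entries in the Windows Application Event log, the thread exhaustion described above is the likely cause.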
EMC's default maximum number of threads, in both EMC Celerra and EMC VNX OE for File environments, is 256 for systems with more than
1GB of memory. In a highly active environment this can become a bottleneck. It is possible to increase the number of threads by
changing a specific EMC parameter. Northern's experience shows that the resource exhaustion can be greatly mitigated (or in some
cases even resolved) by increasing the maximum number of threads available.
IMPORTANT:
EMC customers should always consult with EMC technical personnel to get expert advice on the effect that a change of this
setting may have on the EMC Datamover and the specific environment in question. This is an EMC setting within EMC technology;
Northern is providing this information to assist customers in investigating, together with EMC personnel, the most appropriate
action to resolve resource exhaustion. Northern makes no claim as to the applicability of these settings in a specific customer's
environment, and shall not be held responsible for any ill effect from the use of these settings.
How to increase the number of threads:
$ server_setup server_X -P cifs -o start=XXX (Where XXX decides the number of available CIFS threads. Default is 256)
A more detailed explanation is given in EMC's document Configuring and Managing CIFS on VNX (P/N 300-013-429 Rev 02, page 65).

Please note once again that EMC personnel must be consulted prior to changing this parameter!
Other considerations:
Error 1450 is directly linked to the amount of activity that NSS must monitor: a combination of system activity and the scope of the quota
policies configured. If the number of available threads cannot be successfully increased, it may be possible to address these two
parameters instead: reducing the rate of activity on the device, or reducing the scope of the quota policies.
NSS subscribes to receive notification of file system changes. When a change notification is received, NSS scans the individual folder where
the change occurred in order to establish the new quota usage level. These operations (notification and scan) require system resources. As
such, it is always wise to review quota policies and ensure no unnecessary quotas are configured. Additionally, it may be possible to reduce
the number of quotas configured and to prioritize specific file shares, avoiding high-level 'general monitoring' quotas (this monitoring can be
achieved with NSS' reporting capabilities). Note that hard and soft quotas require the same level of access to CIFS threads in order to
perform monitoring operations.
Northern has seen excessive load being generated by the constant writing of temporary internet files to remote storage devices in Virtual
Desktop environments. Non-business-related streaming media has been seen to generate huge amounts of traffic to remote Internet
Explorer temporary file caches, tying up resources and destroying system performance. Reducing this traffic is a possible way to avoid resource
exhaustion.
For advanced troubleshooting, please contact the Technical Support team at Northern (support@northern.net).

ADDITIONAL RESOURCES
KB2884 How to: Configure EMC & NSS
KB1785 About: Handling of Empty CIFS Notifications in NSS
