What Does Log Message - Entering GATHER State - Mean in Red Hat High Availability Add-On - Red Hat Customer Portal

What does log message "entering GATHER state" mean in Red Hat
High Availability Add-on?
Updated 20/05/2014
Issue
In the event of a cluster membership change, the cluster enters into a GATHER state. The logs will report messages similar to the following:
Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9.

Dec 7 06:30:10 hostX openais[5555]: [TOTEM] entering GATHER state from 0.
What do these messages mean in a Red Hat High Availability cluster?
Environment
Red Hat Enterprise Linux Server (RHEL) 5 with the High Availability or Resilient Storage Add-On
Red Hat Enterprise Linux Server (RHEL) 6 with the High Availability or Resilient Storage Add-On
Resolution
When nodes in a cluster enter the GATHER state, they send join messages to the rest of the cluster in order to form a consensus about the cluster
membership. The number at the end of the log message ("from N") identifies what triggered the transition and can be interpreted as follows:
0: Consensus timeout expired

The consensus timer expired. This timer is set on entry to the GATHER state and is reset when the COMMIT state is entered.
It means the nodes took too long to agree on the membership list.
2: Token timeout in OPERATIONAL (normal) state
3: Token timeout in GATHER state
4: Token timeout in COMMIT state
5: Token timeout in RECOVERY state
NOTE: These four reasons are all related. The token timer is set when the token is transmitted; if it expires
before another message is received, it triggers one of these reasons, depending on the state of
the protocol at the time.
6: Token failed to receive (ARU count > fail_to_recv_const)

We failed to receive a copy of our own token.
This will always be accompanied by a "FAILED TO RECEIVE" message.
7: mcast (data) message received from unknown node while in OPERATIONAL state
8: mcast (data) message received from unknown node while in GATHER state
A multicast data message arrived from a node that is not in the current membership. This can be
caused by a brief network split in which a node is forced to leave the cluster but is not fenced
before the network heals.
9: Merge detection message received while OPERATIONAL

When nodes are missing from the membership and there are no naturally-occurring multicast messages
being sent, the messaging layer will send a periodic merge-detection message to see if any other
partitions are operating without being part of this configuration. This usually just means there
are nodes missing, but doesn't otherwise signify a problem.
10: Merge detected in GATHER

As above but while the cluster was already in transition from another node joining or leaving.
11: JOIN received while OPERATIONAL
12: JOIN received while in GATHER
13: JOIN received while in COMMIT
14: JOIN received while in RECOVERY

A JOIN message is sent by a node if GATHER times out, to bring
a new node into the cluster. These codes indicate
receipt of such a message while in the state named in the entry.
15: Interface changed state

Often seen at startup, but it can also happen if an interface is taken down unexpectedly.
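The reason codes listed above can be looked up mechanically when reading syslog. The following is a minimal sketch, not a Red Hat tool: the names GATHER_REASONS and explain_gather are hypothetical, the descriptions are copied from the list in this article, and the regular expression assumes the log format shown in the Issue section (optionally with padding inside the "[TOTEM]" tag).

```python
import re

# Reason codes exactly as listed in this article (code 1 is not listed there).
GATHER_REASONS = {
    0: "Consensus timeout expired",
    2: "Token timeout in OPERATIONAL (normal) state",
    3: "Token timeout in GATHER state",
    4: "Token timeout in COMMIT state",
    5: "Token timeout in RECOVERY state",
    6: "Token failed to receive (ARU count > fail_to_recv_const)",
    7: "mcast (data) message received from unknown node while in OPERATIONAL state",
    8: "mcast (data) message received from unknown node while in GATHER state",
    9: "Merge detection message received while OPERATIONAL",
    10: "Merge detected in GATHER",
    11: "JOIN received while OPERATIONAL",
    12: "JOIN received while in GATHER",
    13: "JOIN received while in COMMIT",
    14: "JOIN received while in RECOVERY",
    15: "Interface changed state",
}

# Assumption: the line looks like the samples in this article; whitespace
# inside the [TOTEM] tag is tolerated in case the tag is padded.
GATHER_RE = re.compile(r"\[TOTEM\s*\] entering GATHER state from (\d+)")


def explain_gather(line):
    """Return (code, description) for a GATHER log line, or None if no match."""
    match = GATHER_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, GATHER_REASONS.get(code, "Not listed in this article")


if __name__ == "__main__":
    sample = "Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9."
    print(explain_gather(sample))
```

Lines that do not contain a GATHER transition return None, so the function can be applied to every line of a log file and the None results discarded.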
Root Cause
The GATHER state message is normally caused by a network or communication issue within the cluster, but the GATHER state can be entered for a number of
reasons. The number at the end of the message ("from X") indicates why the node entered the GATHER state. The state change is called by
"message_handler_memb_merge_detect" when the cluster is attempting to see whether other nodes are out on the network.
The GATHER state is entered every time a node receives its own token back (meaning it is the only node in the ring). During this time, the node starts a timer
to form and agree on a membership list of nodes in the cluster. If this timer expires, the node enters the GATHER state to see if there is another node out
there and attempts to merge with it. After the node has received its own token back a certain number of times, it stops sending the token, and these
state changes stop as well. They are therefore a side effect of the earlier communication problem and subsequent fencing that left this node alone
in the cluster.
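Because the reason code says why each transition happened, counting the codes over a stretch of log can help show whether the transitions come mostly from one cause. The sketch below is an illustration only: tally_gather_reasons is a hypothetical helper name, and it assumes the log lines contain the phrase "entering GATHER state from N" as in the samples in this article.

```python
import re
from collections import Counter

# Assumption: every transition is logged with this exact phrase.
GATHER_RE = re.compile(r"entering GATHER state from (\d+)")


def tally_gather_reasons(lines):
    """Count GATHER transitions per reason code across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = GATHER_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    log = [
        "Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9.",
        "Dec 7 06:30:10 hostX openais[5555]: [TOTEM] entering GATHER state from 0.",
        "Dec 7 06:30:12 hostX openais[5555]: [TOTEM] entering GATHER state from 9.",
    ]
    # Print the codes from most to least frequent.
    for code, count in tally_gather_reasons(log).most_common():
        print(f"from {code}: {count} time(s)")
```

In practice the input could be an open file object for the syslog file, since iterating a file yields one line at a time.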
Product(s): Red Hat Enterprise Linux
Component: cluster cman openais
Category: Learn more
Tags: cluster cluster ha high availability cluster_suite high availability add-on syslog
Copyright 2014 Red Hat, Inc.
