
What does log message "entering GATHER state" mean in Red Hat
High Availability Add-on?
Updated 20/05/2014

Issue
In the event of a cluster membership change, the cluster enters into a GATHER state. The logs will report messages similar to the following:

Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9.


Dec 7 06:30:10 hostX openais[5555]: [TOTEM] entering GATHER state from 0.

What do these messages mean in a Red Hat High Availability cluster?

Environment
Red Hat Enterprise Linux Server (RHEL) 5 with High Availability or Resilient Storage Add-on
Red Hat Enterprise Linux Server (RHEL) 6 with High Availability or Resilient Storage Add-on

Resolution
When nodes in a cluster enter the GATHER state, they send join messages out to the rest of the cluster in order to form a consensus about the cluster
membership. The number at the end of the log message ("from X") can be interpreted as follows:

0: Consensus timeout expired


The consensus timer expired. This timer is set on entry to the GATHER state and is reset when the COMMIT state is entered.
It means the nodes took too long to agree on the membership list.

2: Token timeout in OPERATIONAL (normal) state

3: Token timeout in GATHER state

4: Token timeout in COMMIT state

5: Token timeout in RECOVERY state

NOTE: These codes are all related. The token timer is set when the token is transmitted; if it expires
before another message is received, it triggers one of these codes, depending on the state of
the protocol at the time.

6: Token failed to receive (ARU count > fail_to_recv_const)


We failed to receive a copy of our own token.
This will always be accompanied by a "FAILED TO RECEIVE" message.

7: mcast (data) message received from unknown node while in OPERATIONAL state

8: mcast (data) message received from unknown node while in GATHER state
A data message arrived from a node that is not in the current membership. This can be caused by
a brief network split where a node is forced to leave the cluster but doesn't get fenced before
the network heals again.

9: Merge detection message received while OPERATIONAL


When nodes are missing from the membership and there are no naturally-occurring multicast messages
being sent, the messaging layer will send a periodic merge-detection message to see if any other
partitions are operating without being part of this configuration. This usually just means there
are nodes missing, but doesn't otherwise signify a problem.

10: Merge detected in GATHER


As above, but while the cluster was already in transition from another node joining or leaving.

11: JOIN received while OPERATIONAL

12: JOIN received while in GATHER

13: JOIN received while in COMMIT

14: JOIN received while in RECOVERY


A JOIN message is sent by a node if GATHER times out, in order to bring
a new node into the cluster. These codes indicate
receipt of a JOIN message while in the state named in the code.

15: Interface changed state


Often seen at startup, but can also happen if an interface is taken down unexpectedly.
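
The reason codes listed above can be kept in a small lookup table for reading logs. The following is a minimal Python sketch, not part of any Red Hat tooling: the names GATHER_REASONS and explain_gather are hypothetical, and the regular expression assumes the log line contains the wording shown in the Issue section. Code 1 is not described in this article, so the sketch reports it as not listed rather than guessing.

```python
import re

# Reason codes exactly as listed in this article (code 1 is not documented here).
GATHER_REASONS = {
    0: "Consensus timeout expired",
    2: "Token timeout in OPERATIONAL state",
    3: "Token timeout in GATHER state",
    4: "Token timeout in COMMIT state",
    5: "Token timeout in RECOVERY state",
    6: "Token failed to receive",
    7: "mcast message received from unknown node while OPERATIONAL",
    8: "mcast message received from unknown node while in GATHER",
    9: "Merge detection message received while OPERATIONAL",
    10: "Merge detected in GATHER",
    11: "JOIN received while OPERATIONAL",
    12: "JOIN received while in GATHER",
    13: "JOIN received while in COMMIT",
    14: "JOIN received while in RECOVERY",
    15: "Interface changed state",
}

# Assumed message wording, taken from the sample log lines in this article.
GATHER_RE = re.compile(r"entering GATHER state from (\d+)")


def explain_gather(line):
    """Return (code, meaning) for a GATHER log line, or None if it does not match."""
    match = GATHER_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, GATHER_REASONS.get(code, "not listed in this article")


line = "Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9."
print(explain_gather(line))
```

For the first sample line from the Issue section, this reports code 9, the merge-detection case.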

Root Cause
The GATHER state message is normally caused by a network/communication issue within the cluster, but the GATHER state can be entered for a number of
reasons. The number at the end of the message ("from X") indicates why the node entered the GATHER state. In the merge-detection case (code 9 in the
example above), the transition is triggered by "message_handler_memb_merge_detect" when the cluster is checking whether other nodes are out on the network.

The GATHER state is entered every time a node receives its own token back (meaning it is the only node in the ring). During this time, the node starts a
timer to form and agree on a membership list of nodes in the cluster. If this timer expires, the node enters the GATHER state to see whether there is
another node out there, and attempts to merge with it. After the node has received its own token back a certain number of times, it stops sending the
token, at which point these state changes also stop. The messages are therefore a side effect of the earlier communication problem and subsequent
fencing that left this node alone in the cluster.
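
Because a node left alone in the cluster can log these transitions repeatedly, it may help to tally the reason codes over a log excerpt before reading individual lines. The following is a minimal Python sketch under the same assumption about message wording as the sample lines in the Issue section; the function name tally_gather_codes is hypothetical and the sample input is simply the two lines quoted in this article.

```python
import re
from collections import Counter

# Assumed message wording, taken from the sample log lines in this article.
TOTEM_GATHER_RE = re.compile(r"\[TOTEM\] entering GATHER state from (\d+)")


def tally_gather_codes(lines):
    """Count how often each GATHER reason code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TOTEM_GATHER_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "Dec 7 06:30:08 hostX openais[5555]: [TOTEM] entering GATHER state from 9.",
    "Dec 7 06:30:10 hostX openais[5555]: [TOTEM] entering GATHER state from 0.",
]

# Print the codes from most to least frequent.
for code, count in tally_gather_codes(sample).most_common():
    print(f"from {code}: {count} occurrence(s)")
```

A tally dominated by codes 9 and 0 would be consistent with the pattern described above, but it is only a hint and should be read alongside the earlier membership and fencing messages.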

Product(s): Red Hat Enterprise Linux
Component: cluster, cman, openais
Category: Learn more

Tags: cluster cluster ha high availability cluster_suite high availability add-on syslog

Copyright 2014 Red Hat, Inc.