  1. #1
    Just Joined!
    Join Date
    Nov 2009
    Posts
    2

    Red Hat Cluster Suite: service is not migrated when a node is powered off

    I am using the Red Hat Cluster Suite (luci and ricci) on CentOS 5.4, with 2 nodes in the cluster.
    I have clustered an Apache server.
    The service is up and running, and I can stop, start and switch it on both nodes.
    The problem appears when I try to simulate a fault on one node.
    For example:

    The Apache resource is running on the first cluster node.

    If I power off the first cluster node (not halt or init 0, but actually cutting the electrical power), the second cluster node does not take over the resource.
    According to the clustat command, the service is still running on the first node. But the service is down; the first node is dead.
    Only once the first node rejoins the cluster does the resource come up on the second node.

    Is this normal?

    For me this is a big problem.

    Best Regards

  2. #2
    Linux Guru
    Join Date
    Nov 2007
    Posts
    1,695

  3. #3
    Just Joined!
    Join Date
    Nov 2009
    Posts
    2

    RedHat Cluster Problem

    I have read all the documents, but I did not find an answer to my problem.

    Please help me.

    Best Regards

  4. #4
    Just Joined!
    Join Date
    Aug 2005
    Posts
    2
    I would suspect split-brain in a 2-node setup, but if the 2nd node didn't notice that the 1st went down, it would be good to look at the logs on the 2nd node and see what they show.

    Then power up the 1st node without connecting the network, and see what happens, checking the logs there too.

    That would be a good first step.

  5. #5
    Just Joined!
    Join Date
    Jan 2010
    Posts
    1

    I had the same problem...

    What is your fencing solution?
    Does it still work when the system is powered off?

    If it does not, then fencing fails and the other node can't take over the service.

  6. #6
    Just Joined!
    Join Date
    Jan 2010
    Location
    Montreal
    Posts
    10
    You can write a script on the second node that pings the first node at a fixed interval, say every 200 ms:
    if node1 is alive, do nothing;
    if node1 is dead, start the services in question on node2.

    Mohammed Al-Mehdar
    Systems & Telecommunications Engineer
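
    A minimal sketch of the ping check described above, assuming a Linux `ping` that accepts `-c` and `-W`. The function names and the three-miss threshold are illustrative and not from the post. A failed ping cannot tell a dead node from a broken network link, so this is not a substitute for fencing.

```python
import subprocess


def ping_once(host, timeout_s=1):
    """Return True if a single ICMP echo to `host` succeeds (system ping)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def peer_looks_dead(recent_results, threshold=3):
    """Return True when the last `threshold` probes all failed.

    `recent_results` is a list of booleans, oldest first. Requiring
    several consecutive misses avoids reacting to one lost packet.
    """
    if len(recent_results) < threshold:
        return False
    return not any(recent_results[-threshold:])
```

    A loop on node2 would append `ping_once("node1")` to a list at each interval and act only when `peer_looks_dead(...)` becomes true; the hostname `node1` here is a placeholder.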

  7. #7
    Just Joined!
    Join Date
    Aug 2010
    Posts
    1
    Quote Originally Posted by mabombo View Post
    I am using the Redhat Cluster Suite (luci and ricci) on my centos 5.4. i have 2 nodes in a cluster.
    I had clustered an apache server.
    The service is up end running and i can stop,start and switch on all two node.
    The problem is when i try to simulate a fault for one node.
    For example:

    The apache resource stay on the first cluster node.

    If i power off the first cluster node (not halt or init 0 but take off the eletric power off), the second cluster node not take the resource.
    With the clustat command, the service still running on the first node. But the service is down. The first node is dead.
    Only one the first node is join again the cluster the resource goes up on the second node.

    Is it normaly ??

    For me this it is a big problem

    Best Regards
    Hi,

    Did you find the cause of this behaviour? I have the same problem!

    Thanks in advance
    M

  8. #8
    Just Joined!
    Join Date
    Aug 2010
    Posts
    1
    denicfr has it correct in his post.

    If your cluster is configured such that the fencing device is an integrated power fence (like iLO, DRAC, etc.) and you completely remove power from the cluster node (which in turn also removes power from the fence device), then there is no way to actively fence the node, and the cluster will wait for fencing confirmation. So this is behaving as expected, to prevent data corruption or split brain from occurring.
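
    The reasoning can be written down as a toy model. This is only an illustration of the rule described here, not the cluster software's actual code, and every name in it is made up for the example.

```python
def fence_confirmed(node_has_power, fence_shares_node_power):
    """Can the surviving node get a positive answer from the fence device?

    An integrated device (iLO, DRAC) loses power together with the node,
    so it cannot confirm anything once the plug is pulled. A device with
    its own power supply can still answer.
    """
    fence_device_alive = node_has_power or not fence_shares_node_power
    return fence_device_alive


def failover_allowed(node_has_power, fence_shares_node_power, manual_ack=False):
    """The service moves only after fencing is confirmed or an admin
    manually acknowledges that the node is really dead."""
    return manual_ack or fence_confirmed(node_has_power, fence_shares_node_power)
```

    With the plug pulled and an integrated fence device, `failover_allowed(False, True)` is false, which matches the "cluster waits" behaviour in this thread.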

    If you want to literally pull the power from a node and have services fail over, you have to do it in a controlled manner. Either:
    1. Shut down the cluster services gracefully before pulling the power. This will fail over the services before the node loses power.
    2. Run fence_ack_manual after pulling the power. fence_ack_manual is the way of telling the cluster "yes, I have confirmed that this node is dead; override the fencing operation, please."
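
    For option 2, a small wrapper sketch. It assumes the `fence_ack_manual -n <nodename>` syntax from the RHEL/CentOS 5 era; check `man fence_ack_manual` on your release, since the syntax differs between versions. The node name is a placeholder, and the default is a dry run that only returns the command.

```python
import subprocess


def manual_ack_command(node_name):
    """Build the override command for a node you have verified is powered off.

    Assumption: CentOS 5 style syntax, `fence_ack_manual -n <nodename>`.
    """
    return ["fence_ack_manual", "-n", node_name]


def acknowledge_dead_node(node_name, dry_run=True):
    """Return the command; run it on a surviving node only if dry_run is False."""
    cmd = manual_ack_command(node_name)
    if not dry_run:
        # The tool may ask for confirmation on the terminal.
        subprocess.run(cmd, check=True)
    return cmd
```

    Only acknowledge after physically confirming the node is off; telling the cluster a live node is dead is how data gets corrupted.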

    If you're worried about a node losing power in a non-graceful situation, there are ways to mitigate this:
    1. Use redundant power on the nodes to prevent the integrated fencing device from losing power.
    2. Use an external power fence device that has redundant power.
    3. Use a blade enclosure that has a centralized management interface for controlling the power status of the blades.
    4. Use a fencing method that does not depend on the node's power status, like SAN fencing or fence_scsi.

    Hope this helps.
