I've recently joined a shop whose prior admin set up a six-node production GFS cluster. There is no quorum disk, and it appears only manual fencing is configured.

I have to power down everything in the racks to move it all into a cage, and I only discovered this issue in the past 24 hours.

I'm seriously concerned that when all six machines come back online tomorrow morning, I'll hit the infinite-loop problem of the cluster trying to reach a quorate state before everything is up and playing happily (a quorum of 4 is configured).
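
For what it's worth, here's my understanding of where that quorum figure comes from - a sketch assuming cman's usual floor(expected_votes / 2) + 1 calculation. (I also notice the node votes in the config below sum to 8 while expected_votes is 6, which doesn't reassure me.)

```shell
# Sketch of cman's quorum arithmetic (assuming the usual
# floor(expected_votes / 2) + 1 formula).
expected_votes=6                 # from <cman expected_votes="6"/>
quorum=$(( expected_votes / 2 + 1 ))
echo "quorum needed: $quorum"    # quorum needed: 4

# Votes the six nodes actually carry: 2+2+1+1+1+1
total=$(( 2 + 2 + 1 + 1 + 1 + 1 ))
echo "votes available: $total"   # votes available: 8

# On a live node the running values can be checked with:
#   cman_tool status
```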

Moreover, the only reason he used GFS was to avoid keeping multiple copies of the shared content - logical at first glance, but everything I read says manual fencing is a very bad idea in production. (At least the write activity is almost nil.)
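
As I understand it, the reason it's considered so bad is that fence_manual blocks cluster recovery after a failure until an operator verifies the dead node by hand and acknowledges the fence. A sketch of the acknowledgement step (the exact flags of fence_ack_manual vary a little between cluster-suite versions, so check the local man page; the node name here is just one of mine as an example):

```shell
# With fence_manual, recovery stalls after a node failure until someone
# confirms the node is really down and acknowledges the fence manually.
# The acknowledgement tool shipped with the stock cluster suite:
ack_cmd="fence_ack_manual -n prodweb03-internal"
echo "after verifying the node is down, run: $ack_cmd"
```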

Any help here is appreciated - not having an answer to this could mean postponing our work.

Thanks in advance!

-Joe

cluster.conf is below:


<?xml version="1.0"?>
<cluster config_version="9" name="digitalimages">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="prodweb01-internal" votes="2">
      <fence>
        <method name="1">
          <device name="manfence" nodename="prodweb01-internal"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prodweb02-internal" votes="2">
      <fence>
        <method name="1">
          <device name="manfence" nodename="prodweb02-internal"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prodweb03-internal" votes="1">
      <fence>
        <method name="1">
          <device name="manfence" nodename="prodweb03-internal"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prodweb04-internal" votes="1">
      <fence>
        <method name="1">
          <device name="manfence" nodename="prodweb04-internal"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prodweb05-internal" votes="1">
      <fence>
        <method name="1">
          <device name="manfence" nodename="prodweb05-internal"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="prodweb06-internal" votes="1">
      <fence>
        <method name="1">
          <device name="manfence" nodename="prodweb06-internal"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="6"/>
  <fencedevices>
    <fencedevice agent="fence_manual" name="manfence"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
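
Longer term, if the boxes have IPMI BMCs (or managed PDUs), I gather the manual agent could be replaced with a real power-fencing one. A sketch only, assuming IPMI - the agent name fence_ipmilan and its ipaddr/login/passwd attributes are from the stock Red Hat cluster suite, but the device name, address, and credentials here are placeholders I made up:

<!-- Hypothetical fencedevices section using IPMI power fencing
     instead of fence_manual; address and credentials are placeholders. -->
<fencedevices>
  <fencedevice agent="fence_ipmilan" name="ipmi-pw01" ipaddr="10.0.0.101" login="admin" passwd="secret"/>
</fencedevices>

Each clusternode's fence method would then reference the matching device, e.g. <device name="ipmi-pw01"/> instead of the manfence entry.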