- 12-19-2007 #1
Power failure management
I am building a system which has 3 Fedora boxes along with some other networked hardware. It will be deployed to a remote location where there will be frequent power failures. I presently have two APC UPSs with network cards and two networked power strips. Here's the issue:
If there's a power failure, it's easy to shut the servers down automatically using apcupsd (or any similar program). After the shutdown, once power has been physically removed and then restored, the systems boot. BUT what if there is a power failure, the systems shut themselves down, and power returns before the outlets the servers are plugged into are turned off? The servers would remain halted even though power was restored.
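To make the failure case concrete, here is a tiny model of the timing. It is only a sketch: the function name and the two time parameters are made up for illustration, and it assumes the shutdown has already been started and that the outlets simply stay live if mains comes back before they are cut.

```python
def server_outcome(outlets_cut_at: float, mains_back_at: float) -> str:
    """Return the end state of a server whose shutdown has already started.

    Both times are seconds after the shutdown was triggered (hypothetical
    parameters, not anything apcupsd reports in this form).
    """
    if mains_back_at < outlets_cut_at:
        # Assumption: the outlets never lose power, so the halted server
        # never sees a power cycle and nothing tells it to boot.
        return "halted"
    # The outlets went dead and came back, so the server powers up again.
    return "booted"


if __name__ == "__main__":
    # Long outage: outlets are cut at 120 s, mains returns after 10 minutes.
    print(server_outcome(outlets_cut_at=120, mains_back_at=600))
    # Short outage: mains returns at 90 s, before the outlets are cut.
    print(server_outcome(outlets_cut_at=120, mains_back_at=90))
```

The second call is the case I'm worried about: the servers are down and nothing will bring them back without someone on site.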
I'm interested in whether anybody has any thoughts on how to handle such a situation.
I have two ideas:
1) Use a very simple computer (PC104 or something like that) which can tolerate non-graceful shutdowns to control the networked power strips. That computer would be "on" if there's no power failure and "off" if there is a failure. It would never shut itself down; its only job would be to manage the bigger servers, which have writable file systems.
2) Rather than shut the servers down when the UPSs switch to battery, take them to some state where the file systems are cleanly unmounted but the server stays alive until either power is restored or power is physically removed (when the UPSs drain). Since there would be no disk access at this point, a sudden halt would not damage the file systems, and the computer would boot when power is restored with no memory of the event. And if power were restored before the batteries drained, the systems could restore themselves.
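Idea 2 can be written down as a small state machine. This is a rough sketch of the logic only, not something that talks to a UPS: the state and event names are invented for illustration, and the OFF to RUNNING transition assumes the servers are set to power on automatically when AC returns.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"    # normal operation, file systems mounted read-write
    QUIESCED = "quiesced"  # file systems cleanly unmounted, machine still powered
    OFF = "off"            # UPS drained, power physically removed


class Event(Enum):
    ON_BATTERY = "on_battery"            # UPS reports mains failure
    MAINS_RESTORED = "mains_restored"    # mains power is back
    BATTERY_DRAINED = "battery_drained"  # UPS output drops


# Hypothetical transition table for idea 2.
TRANSITIONS = {
    (State.RUNNING, Event.ON_BATTERY): State.QUIESCED,
    (State.QUIESCED, Event.MAINS_RESTORED): State.RUNNING,
    (State.QUIESCED, Event.BATTERY_DRAINED): State.OFF,
    # Assumption: the machine boots by itself once AC power returns.
    (State.OFF, Event.MAINS_RESTORED): State.RUNNING,
}


def next_state(state: State, event: Event) -> State:
    """Apply one event; pairs not in the table leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def replay(events, state: State = State.RUNNING) -> State:
    """Run a sequence of events from a starting state and return the result."""
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    # Short outage: power comes back before the batteries drain.
    print(replay([Event.ON_BATTERY, Event.MAINS_RESTORED]))
    # Long outage: batteries drain, then power returns.
    print(replay([Event.ON_BATTERY, Event.BATTERY_DRAINED, Event.MAINS_RESTORED]))
```

Both sequences end in RUNNING, which is the point: there is no path that leaves a server halted while its outlet is live.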
Does anybody have any recommendations?
PS: I wasn't sure what category to put this in, but since all detection and communication would be over the network, I figured it was a networking problem.