Multi-WAN load balancing/failover woes (not out to the internet)
I'm on a tight deadline at work to get this new network set up (as I'm sure a lot of people posting here are). It consists of one fiber network connecting 5 LANs (each LAN can talk to the others across the fiber) and one wireless point-to-point connection linking 2 of those LANs (more of these to come in the future).
I've set up a lab to test the routing before I implement it across all the gateways. It is essentially this:
Gateway A - eth0 (10.1.1.1/16) Internal network
          - eth1 (192.168.1.1/29) Fiber network
          - eth2 (172.16.1.1/30) Wireless p2p link
Gateway B - eth0 (10.2.1.1/16) Internal network
          - eth1 (192.168.1.2/29) Fiber network
          - eth2 (172.16.1.2/30) Wireless p2p link
Gateway C - eth0 (10.5.1.1/16) Internal network
          - eth1 (192.168.1.5/29) Fiber network
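
As a quick sanity check on the addressing above, here is a minimal Python sketch (assuming a Python 3 interpreter with the standard `ipaddress` module, which stock CentOS 5 does not ship). The `GATEWAYS` table and the `networks_by_interface` helper are illustrative names, not part of any tool. It confirms that the fiber interfaces share one /29, the wireless interfaces share one /30, and the internal LANs do not overlap.

```python
import ipaddress

# Lab addressing copied from the gateway list above.
GATEWAYS = {
    "A": {"eth0": "10.1.1.1/16", "eth1": "192.168.1.1/29", "eth2": "172.16.1.1/30"},
    "B": {"eth0": "10.2.1.1/16", "eth1": "192.168.1.2/29", "eth2": "172.16.1.2/30"},
    "C": {"eth0": "10.5.1.1/16", "eth1": "192.168.1.5/29"},
}

def networks_by_interface(gateways):
    """Map each interface name to the set of networks the gateways place on it."""
    result = {}
    for ifaces in gateways.values():
        for name, cidr in ifaces.items():
            net = ipaddress.ip_interface(cidr).network
            result.setdefault(name, set()).add(net)
    return result

nets = networks_by_interface(GATEWAYS)
# Shared segments (eth1 fiber, eth2 wireless) should resolve to one network each;
# the internal LANs on eth0 should all be different networks.
print(sorted(str(n) for n in nets["eth1"]))  # ['192.168.1.0/29']
print(sorted(str(n) for n in nets["eth2"]))  # ['172.16.1.0/30']
print(sorted(str(n) for n in nets["eth0"]))  # ['10.1.0.0/16', '10.2.0.0/16', '10.5.0.0/16']
```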
All three gateways are plugged into one switch that emulates the fiber network, a second switch stands in for the wireless link, and the local LANs are on their own switch.
I want to end up with load balancing with failover running across the board. If the fiber connection at either Gateway A or Gateway B goes down, traffic needs to route over the wireless p2p connection, and vice versa. For example:
Gateway A's fiber interface (eth1) goes down while it is trying to reach Gateway C (10.5.1.1), which is only on the fiber network. The traffic needs to route over the wireless connection (eth2) to Gateway B, and from there over B's fiber connection (192.168.1.2) to its destination, Gateway C.
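
To make the expected behaviour in that example concrete, here is a small Python 3 sketch that models the lab as gateways attached to shared segments and searches for a path when an interface is marked down. It only illustrates the desired outcome and is not a routing implementation; `ATTACHMENTS` and `find_path` are made-up names.

```python
from collections import deque

# Each gateway lists the segments it is attached to (from the lab layout).
ATTACHMENTS = {
    "A": {"fiber", "wireless"},
    "B": {"fiber", "wireless"},
    "C": {"fiber"},
}

def find_path(src, dst, attachments, down=frozenset()):
    """Breadth-first search over gateways and segments.

    `down` holds (gateway, segment) pairs whose interface has failed.
    Returns the alternating gateway/segment path, or None if unreachable.
    """
    def live(gw):
        return {s for s in attachments[gw] if (gw, s) not in down}

    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        gw = path[-1]  # the last element of a path is always a gateway
        if gw == dst:
            return path
        for seg in sorted(live(gw)):
            for nxt in sorted(attachments):
                if nxt not in seen and seg in live(nxt):
                    seen.add(nxt)
                    queue.append(path + [seg, nxt])
    return None

print(find_path("A", "C", ATTACHMENTS))
# ['A', 'fiber', 'C']
print(find_path("A", "C", ATTACHMENTS, down={("A", "fiber")}))
# ['A', 'wireless', 'B', 'fiber', 'C']
```

The second call is the failure case described above: with A's fiber interface down, the only remaining path is over the wireless link to B and then across B's fiber interface.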
I've spent a lot of time googling and researching. I attempted to set up Quagga using OSPF, which worked to an extent, but once my link went down and then came back up, OSPF didn't always re-create the route, so I put this on hold.
I then found the nano-howto.txt file, which seemed very promising, but I got several errors applying the patches (despite many attempts to fix them; some of the files they want to patch I just don't have), and I would need to recompile my kernel to disable IP_MULTIPATH_ROUTE_CACHE, which I have no clue how to do (time crunch, remember). A multipath route did work to an extent: with one link down, about half the time it still tries to use the dead connection, and the rest of the time it works correctly. I couldn't find any way of setting up failover with multipath routing without the patches suggested in nano-howto.txt.
I just recently tried creating two routing tables on a gateway, each with a single route (one via wireless, one via fiber), and set the priority of one slightly above the other, hoping that when one connection failed, traffic would pick up the other route once the first timed out. This didn't seem to work either.
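
One workaround sometimes used when the kernel itself does not react to a dead next hop is a small userspace monitor that probes each next hop and rewrites the route. The sketch below is an untested illustration under several assumptions: Python 3 is available, `ping` is the Linux iputils version (`-c` count, `-W` timeout), the script runs as root, and Gateway B's LAN is reached as 10.2.0.0/16 (derived from its eth0 address). All function names are hypothetical.

```python
import subprocess

def next_hop_alive(ip, count=1, timeout_s=1):
    """Return True if `ip` answers a ping (Linux iputils `ping` assumed)."""
    cmd = ["ping", "-c", str(count), "-W", str(timeout_s), ip]
    return subprocess.call(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL) == 0

def choose_route(candidates, is_alive):
    """Pick the first candidate (in preference order) whose next hop answers."""
    for route in candidates:
        if is_alive(route["via"]):
            return route
    return None

def route_command(dest, route):
    """Build the `ip route replace` argument list for the chosen route."""
    return ["ip", "route", "replace", dest,
            "via", route["via"], "dev", route["dev"]]

# Gateway A's candidate routes to Gateway B's LAN, fiber preferred (assumption).
CANDIDATES_TO_B = [
    {"via": "192.168.1.2", "dev": "eth1"},  # fiber
    {"via": "172.16.1.2", "dev": "eth2"},   # wireless p2p
]

def failover_once(dest, candidates, is_alive=next_hop_alive, run=subprocess.call):
    """One monitoring pass: probe next hops and install the best live route."""
    route = choose_route(candidates, is_alive)
    if route is None:
        return None  # nothing reachable; leave the routing table untouched
    run(route_command(dest, route))
    return route
```

In practice `failover_once("10.2.0.0/16", CANDIDATES_TO_B)` would be called in a loop with a short sleep between passes. The probe and the command runner are passed in as parameters so the selection logic can be exercised without touching the real routing table.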
I'm by no means an expert in Linux or networking; I wouldn't even call myself an experienced user. If I've missed details that you need, please let me know and I'll post them ASAP. I'm just out of ideas for what to try next.
An alternative to full load balancing would be sending high-priority traffic over one link and everything else over the other, but when one link fails, all traffic needs to shift to the working link fairly quickly and automatically.
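
The alternative policy described above can be written down as a tiny decision function. This Python 3 sketch is only illustrative: which traffic class prefers which link is an assumption (the post does not say), and `PREFERRED_LINK` and `link_for` are made-up names.

```python
# Assumed preference: high-priority traffic on fiber, the rest on wireless.
PREFERRED_LINK = {"high": "fiber", "normal": "wireless"}

def link_for(traffic_class, link_up):
    """Return the link a traffic class should use given current link state.

    `link_up` maps link name -> bool. Falls back to any link that is up
    when the preferred one is down, and returns None if all are down.
    """
    preferred = PREFERRED_LINK[traffic_class]
    if link_up.get(preferred):
        return preferred
    for link, up in sorted(link_up.items()):
        if up:
            return link
    return None

both_up = {"fiber": True, "wireless": True}
fiber_down = {"fiber": False, "wireless": True}
print(link_for("high", both_up))       # fiber
print(link_for("normal", both_up))     # wireless
print(link_for("high", fiber_down))    # wireless
```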
I'm running CentOS release 5.2 with a 2.6.18-92.1.10.el5 kernel.
Any suggestions are greatly appreciated.
Also, about the only thing I can think of that I haven't tried is bonding the two interfaces/connections (the fiber connection going to the fiber network, and the wireless p2p link). I'm not sure if this would even work, but if it did, I could set up traffic shaping on the bonded interface and see how it runs. If I could get comments on this, that would be great as well.