- Join Date
- May 2012
Some hosts are unreachable until I add / remove a host route for them
I've got a really strange problem that I can hardly come to grips with, but let me describe my overall situation first:
I've got a couple of hosts. Some are physical machines and some are virtual machines running e.g. on Linux-KVM or VirtualBox based servers. These hosts are spread over two networks, let's call them n1 and n2, which are connected via OpenVPN (which breaks down pretty often, but that's another story). To be honest, I doubt that the OpenVPN and virtualization (and related bridging) stuff is really the cause; I'm about 98% sure it isn't...
- n1 is part of my university's subnet and therefore uses public IP addresses (141.13.*.*). The (dedicated) OpenVPN "gateway server" (call it 'U') runs inside a virtual machine. There's also a separate virtual machine for the firewall/routing/DNS/DHCP services in that network.
- n2 is a private network and uses private IP addresses (192.168.*.*). The routing and OpenVPN services are also provided by a virtual machine in this network (call it 'H').
- Routes are set up on each network (i.e. on the routers/VPN servers) so that every host in n1 can reach every host in n2 and vice versa. Everything works fine... well, nearly fine...
- A, B and C are hosts in n1,
- X, Y and Z are hosts in n2.
Sometimes, in fact quite regularly, Z can reach A and B (e.g. via ping, ssh, etc.), but no connection to C can be established; it simply times out. Interestingly, X or Y, for example, can still connect to or ping C. So this is probably not related to any of the routers or VPN servers involved. Even more interesting is that running the following on Z
route add -host C gw H; route del -host C
makes C reachable from Z again.
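A minimal sketch for spotting the symptom from Z: ping each n1 host once and report which ones answer. `check_hosts` and the `A_ADDR`/`B_ADDR`/`C_ADDR` variables are hypothetical placeholders for the real addresses; `-W` is the reply timeout (in seconds) of the iputils `ping`.

```shell
#!/usr/bin/env bash
# Sketch: check from Z which hosts currently answer a single ping.
# check_hosts is a hypothetical helper; pass it the real addresses
# of A, B and C (A_ADDR etc. below are placeholders).

check_hosts() {
    local host
    for host in "$@"; do
        # One echo request, wait at most 2 seconds for the reply.
        if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
            echo "$host reachable"
        else
            echo "$host UNREACHABLE"
        fi
    done
}

# Usage on Z (placeholders):
# check_hosts "$A_ADDR" "$B_ADDR" "$C_ADDR"
```

Running it once before and once after the route add/del workaround would show whether only C flips from UNREACHABLE to reachable.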
It doesn't really matter whether the connection source or target is a physical or a virtual host. Maybe it's just in my head, but I have a feeling that this mainly happens after the OpenVPN connection was down and I had to restart it (manually), so that n1 was unreachable for a while. It looks as if Z (i.e. a service running on it) tried to connect to C during that period and was, of course, unsuccessful.
Is there some feature (or bug) in the kernel that makes it remember such unsuccessful connection attempts? At least, it looks that way...
(The system on Z is Debian Wheezy running on a 3.2.0-2-amd64 kernel.)
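Regarding the question above: the 3.2 kernel on Z still has the IPv4 routing cache, which was removed in Linux 3.6. Whether that cache is what "remembers" the failure is only an assumption, not something this thread establishes. A minimal sketch for looking at it with iproute2 while C is unreachable, where `inspect_route` and `C_ADDR` are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: show how Z's kernel currently resolves the route to one host.
# ASSUMPTION (not established in this thread): a cached routing entry
# for C on the 3.2 kernel could explain the "remembered" failure.
# inspect_route and C_ADDR are hypothetical names. Requires iproute2.

inspect_route() {
    # The route the kernel would pick for $1 right now.
    echo "## ip route get $1"
    ip route get "$1"
    # Cached routes matching $1 (kernels before 3.6 keep an IPv4
    # routing cache; it was removed in 3.6).
    echo "## ip route show cache $1"
    ip route show cache "$1"
}

# Usage on Z while C is unreachable:
# inspect_route "$C_ADDR"
# Flushing only the cache (needs root), instead of adding/removing a
# host route:
# ip route flush cache
```

If `ip route flush cache` alone made C reachable again, that would support the assumption; if it did not, the cause probably lies elsewhere.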
This problem is driving me crazy. Is there anyone who could shed some light on it? What can I do to get rid of it?
Thanks in advance!
Are you certain that Z knows that the route to n1 goes through the VPN? I.e. does OpenVPN add the correct kernel routes? Did you check if the ARP table became corrupted?
It has happened to me that a PPPoE module sometimes did not set up the correct default route, so that suddenly the internet connection was only half working. But I'm talking about an "old" 2.4 kernel here; things should have improved since then.
Yes, I'm 100% sure, because:
- The VPN server and the router are the very same machine ('H'), and the default route of every host in the n2 network (including 'Z') points to H.
- Z can connect to / ping A and B, which are only reachable via 'H' (that is, via the VPN).
- Otherwise, why would adding and immediately REMOVING a host route for 'C' (via 'H') on 'Z' help?
Checking the ARP table was the first thing I tried, to no avail.
Maybe I should mention that I use a bridge on Z (because I sometimes need to run virtual-machine experiments on it):
Code:
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.10.3    0.0.0.0         UG    0      0        0 br0
192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 br0
Code:
# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.e0cb4ec04125       no              eth0
Code:
# ifconfig
br0       Link encap:Ethernet  HWaddr e0:cb:4e:c0:41:25
          inet addr:192.168.10.142  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::e2cb:4eff:fec0:4125/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1492  Metric:1
          RX packets:955521 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1824734 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:878250684 (837.5 MiB)  TX bytes:1956786222 (1.8 GiB)

eth0      Link encap:Ethernet  HWaddr e0:cb:4e:c0:41:25
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1212629 errors:0 dropped:446 overruns:0 frame:0
          TX packets:1824518 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:908980236 (866.8 MiB)  TX bytes:1956767726 (1.8 GiB)
          Interrupt:45 Base address:0x2000
[...]
Code:
# arp -an
? (192.168.10.40) at 52:54:00:c2:2c:93 [ether] on br0
? (192.168.10.3) at 52:54:00:30:e9:4d [ether] on br0
? (192.168.10.5) at 00:13:46:54:9a:83 [ether] on br0
? (192.168.10.4) at 52:54:00:82:ba:a9 [ether] on br0
? (192.168.10.6) at 52:54:00:66:d8:07 [ether] on br0
So when it happens again and you run:
# route -n
# arp -an
# route add -host C gw H; route del -host C
# route -n
# arp -an
is the output of the first two commands the same as that of the last two?
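The comparison asked for above could be scripted roughly as follows. This is a sketch assuming root on Z; `snapshot`, `apply_workaround` and `compare_around_workaround` are hypothetical helper names, and `C_ADDR`/`H_ADDR` stand in for the real addresses of C and H.

```shell
#!/usr/bin/env bash
# Sketch: record `route -n` and `arp -an` on Z, apply the add/remove
# workaround, record again, and diff the two snapshots.
# Function names are hypothetical; C_ADDR and H_ADDR are placeholders
# for the real addresses of C and H. Run as root on Z.

snapshot() {
    # Write the routing table and the ARP cache to the file named in $1.
    {
        echo "## route -n"
        route -n
        echo "## arp -an"
        arp -an
    } > "$1"
}

apply_workaround() {
    # The workaround from the original post: add a host route for $1
    # via gateway $2, then delete it again right away.
    route add -host "$1" gw "$2" && route del -host "$1"
}

compare_around_workaround() {
    local before after
    before=$(mktemp)
    after=$(mktemp)
    snapshot "$before"
    apply_workaround "$1" "$2"
    snapshot "$after"
    # diff exits 0 only if both snapshots are identical; otherwise it
    # prints the changed lines in unified format.
    if diff -u "$before" "$after"; then
        echo "route and ARP tables identical before and after"
    fi
    rm -f "$before" "$after"
}

# Usage (placeholders, as root on Z):
# compare_around_workaround "$C_ADDR" "$H_ADDR"
```

Identical snapshots would mean that whatever the workaround changes is not visible in `route -n` or `arp -an`.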
Yes, they're exactly the same.
Have you checked the firewall, SELinux, etc.?