  1. #1
    Just Joined!
    Join Date
    May 2012
    Posts
    3

    Some hosts are unreachable until I add/remove a host route for them


    Hi,
    I've got a really strange problem that I can hardly get to grips with, but let me describe my overall situation first:

    I've got a couple of hosts. Some are physical and some are virtual machines running e.g. on Linux-KVM or VirtualBox based servers, etc. Those hosts are spread over two networks, let's call them n1 and n2, which are connected via OpenVPN (which breaks down pretty often, but that's another story). To be honest, I doubt (in fact I'm 98% sure it is not the case) that the OpenVPN and virtualization (and related bridging) stuff is really the cause...
    • n1 is part of my university's subnet and thus it has public IP addresses (141.13.*.*). The (dedicated) OpenVPN "gateway server" (call it 'U') is running inside a virtual machine. There's also a separate virtual machine for the firewall/routing/DNS/DHCP services in that network.
    • n2 is a private network and uses private IP addresses (192.168.*.*). The routing and OpenVPN services are also handled by a virtual machine in this network (call it 'H').
    • Routes are set on each network (i.e. on the routers / VPN servers) so that every host in n1 can reach every host in n2 and vice versa. Everything works fine ... nearly fine...


    Problem:
    (Assumption:
    • A, B and C are hosts from n1,
    • X, Y and Z are hosts within n2.)


    Sometimes, in fact quite regularly, Z is able to reach (e.g. ping, ssh, etc.) A and B, but no connection to C can be established, i.e. it times out. Interestingly, X or Y can still connect to/ping C at the same time, so this is probably not related to any of the routers or VPN servers involved. Even more interesting is the fact that running
    Code:
    route add -host C gw H; route del -host C
    on Z (while leaving all other hosts untouched) instantly enables communication from Z to C, i.e. it gets ping/ssh/etc. working. (I found this by pure accident.)

    It doesn't really matter whether the connection source or target is a physical or a virtual host. Maybe it's just in my mind, but I've got a feeling that this mainly happens after the OpenVPN connection was down and I had to restart it (manually), so that n1 was unreachable for a while. It looks as if Z (i.e. a service running on it) was trying to connect to C within that period of time and was, of course, unsuccessful.
    Is there any feature or bug in the kernel which makes it remember such unsuccessful connection requests? It certainly looks that way, at least...

    (The system on Z is Debian Wheezy running on a 3.2.0-2-amd64 kernel.)
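
    One possible explanation, offered here as an assumption rather than a confirmed diagnosis: kernels older than 3.6 keep an IPv4 routing cache, and a change to the routing table (such as the route add / route del pair) is expected to flush it, which would clear a stale cached entry for C. The sketch below shows how this could be checked on Z with iproute2. The helper names has_route_cache and inspect_route_cache are made up for illustration, and the address of C has to be passed in.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any tool.

# Succeeds if the kernel release string $1 (e.g. the output of `uname -r`)
# is older than 3.6, i.e. the kernel still has an IPv4 routing cache.
has_route_cache() {
    major=${1%%.*}              # "3.2.0-2-amd64" -> "3"
    rest=${1#*.}                # "3.2.0-2-amd64" -> "2.0-2-amd64"
    minor=${rest%%[!0-9]*}      # "2.0-2-amd64"   -> "2"
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 6 ]; }
}

# Shows how the kernel currently routes packets to host $1, lists the
# cached entries, then flushes the cache (needs root).
# Meant to be run on Z while C is unreachable.
inspect_route_cache() {
    ip route get "$1"       # route the kernel selects for this destination
    ip route show cache     # cached entries (nothing on kernels >= 3.6)
    ip route flush cache    # assumed side effect of route add/del
}
```

    On Z this could be used as has_route_cache "$(uname -r)" && inspect_route_cache <address of C>. If C becomes reachable right after the flush, that would support the cache theory; an unexpected gateway or mtu in the cached entry for C would be worth a closer look.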

    This problem is driving me crazy. Is there anyone who could shed some light on it? What can I do to get rid of it?

    Thanks in advance!
    tom

  2. #2
    Linux Engineer Kloschüssel
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    Hi

    Are you certain that Z knows that the route to n1 goes through the VPN? I.e. does OpenVPN add the correct kernel routes? Did you check if the ARP table became corrupted?

    It once happened to me that a PPPoE module did not set up the correct default route, so that the internet connection was suddenly halfway down, but that was on an "old" 2.4 kernel. Things should have improved since then.

    Cheers

  3. #3
    Just Joined!
    Join Date
    May 2012
    Posts
    3
    Quote Originally Posted by Kloschüssel
    Are you certain that Z knows that the route to n1 goes through the VPN? I.e. does OpenVPN add the correct kernel routes? Did you check if the ARP table became corrupted?
    First of all, thanks a lot for your answer!

    Yes, I'm 100% sure because:
    • The VPN server and the router are the very same machine ('H'). The default route of every host in the n2 network (including 'Z') is set to H.
    • Z can connect to/ping/whatever A and B, which are only reachable via 'H' (that is, via the VPN).
    • Why would adding and immediately REMOVING a host route for 'C' (via 'H') on 'Z' help then?


    Checking the ARP table was the first thing I tried, to no avail.
    Code:
    arp -an
    only prints MAC/IP mappings for the n2 network just as expected.

    Maybe I should mention that I use a bridge on Z (because sometimes I need to run some virtual-machine experiments on it).
    Code:
    # route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         192.168.10.3    0.0.0.0         UG    0      0        0 br0
    192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 br0
    Code:
    # brctl show
    bridge name     bridge id               STP enabled     interfaces
    br0             8000.e0cb4ec04125       no              eth0
    Code:
    # ifconfig
    br0       Link encap:Ethernet  HWaddr e0:cb:4e:c0:41:25  
              inet addr:192.168.10.142  Bcast:192.168.10.255  Mask:255.255.255.0
              inet6 addr: fe80::e2cb:4eff:fec0:4125/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1492  Metric:1
              RX packets:955521 errors:0 dropped:0 overruns:0 frame:0
              TX packets:1824734 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:878250684 (837.5 MiB)  TX bytes:1956786222 (1.8 GiB)
    
    eth0      Link encap:Ethernet  HWaddr e0:cb:4e:c0:41:25  
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:1212629 errors:0 dropped:446 overruns:0 frame:0
              TX packets:1824518 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000 
              RX bytes:908980236 (866.8 MiB)  TX bytes:1956767726 (1.8 GiB)
              Interrupt:45 Base address:0x2000
    
    [...]
    Code:
    # arp -an
    ? (192.168.10.40) at 52:54:00:c2:2c:93 [ether] on br0
    ? (192.168.10.3) at 52:54:00:30:e9:4d [ether] on br0
    ? (192.168.10.5) at 00:13:46:54:9a:83 [ether] on br0
    ? (192.168.10.4) at 52:54:00:82:ba:a9 [ether] on br0
    ? (192.168.10.6) at 52:54:00:66:d8:07 [ether] on br0
    (192.168.10.3 is the IP of 'H' of course)

  4. #4
    Linux Engineer Kloschüssel
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    So when the connection to C is broken and you do:

    Code:
    # route -n
    # arp -an
    # route add -host C gw H; route del -host C
    # route -n
    # arp -an
    is the output of route and arp the same before and after the route add/del?
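
    The comparison above is easier to trust if both snapshots are saved to files and compared byte for byte instead of by eye. A minimal sketch, assuming route and arp from net-tools are installed on Z; the function names and file paths are made up for illustration, and C and H stand for the real addresses.

```shell
#!/bin/sh
# Sketch only: function names and paths are illustrative.

# Writes the current routing table and ARP table to file $1.
snapshot() {
    { route -n; arp -an; } > "$1"
}

# Succeeds if the snapshot files $1 and $2 are identical;
# otherwise prints the differences and fails.
same_snapshot() {
    cmp -s "$1" "$2" && return 0
    diff "$1" "$2"
    return 1
}

# Intended use on Z (as root), while C is unreachable:
#   snapshot /tmp/before
#   route add -host C gw H; route del -host C
#   snapshot /tmp/after
#   same_snapshot /tmp/before /tmp/after && echo "no visible change"
```

    ARP entries can appear or expire on their own between the two snapshots, so a difference in that part is not necessarily caused by the route commands.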

  5. #5
    Just Joined!
    Join Date
    May 2012
    Posts
    3
    Yes, they're exactly the same.

  6. #6
    Linux Engineer Kloschüssel
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    Have you checked the firewall, SELinux, ...?
