  1. #1
    Just Joined!
    Join Date
    Dec 2008
    Posts
    6

    Network Hangs - Leopard/Debian


    Dear All,

    I've come here as a last resort (while I've still got some hair left) - I've been looking high and low for a solution to this problem, but there are simply too many possible sources for me to comb through - I'm just above newbie level.

    What happens is that I periodically lose my connection to the Debian (amd64 Etch) servers that I am using for NFS/(Mac)FUSE mounts. The problem persists across both types of mounts. My setup: an OS X "head" server controlling (through 1000BASE-T Ethernet) four Debian Etch sub-servers, each with its own SATA storage space.

    After much testing and troubleshooting I've managed to eliminate our switches from the problem - and in doing this I found that, in fact, data transfer has nothing to do with the problem. The sub-servers will disconnect whether there is data being transferred or the server is completely idle. The disconnect (hang) can happen anytime - while logging in (connect timing out) and at varying periods (10min ~ 2hrs).

    I'm not asking anyone to solve this problem for me, but could one of you be so kind as to point me in the right direction about which logs to consult to find the source of the problem, or which test commands to run?

    Thank you, Best,

    Sefu.

  2. #2
    Just Joined!
    Join Date
    Dec 2008
    Location
    Canberra, Australia
    Posts
    8
    I really have no idea, but first I'd have a look at the data going through the network with "tcpdump -tni <interface> 'host <ip>'". For more detail, try "tcpdump -tni <interface> -s0 -X -vvv 'host <ip>'".

    You might see something that would lead you in the right direction if you can capture the packets at the time of disconnection...

  3. #3
    Just Joined!
    Join Date
    Dec 2008
    Posts
    6
    tcpdump looks like a very powerful tool. Thanks! I'll be back for a follow-up.

  4. #4
    mhanan
    Just Joined!
    Join Date
    Dec 2008
    Location
    San Diego CA
    Posts
    60

    another option

    Another option would be Wireshark... same idea, but with a GUI front end.

    The nice thing about Wireshark is that you can set it and forget it... the color coding by packet type makes it easy to scroll back through later and check things out. You can also save the capture files... collect a few of these over a period of time and you may begin to see a pattern.

  5. #5
    Just Joined!
    Join Date
    Dec 2008
    Posts
    6
    I found a few problems - my network clock was out of sync, for example - but still nothing about the hangs. Still looking...

    One more question though: shouldn't this info be in a log somewhere? Also, I get the feeling that this problem may have something to do with sessions - would activating PAM help (and do NFS/MacFUSE even use it (via sshd_config))?

    Thanks, more later.

  6. #6
    Just Joined!
    Join Date
    Dec 2008
    Posts
    6
    Okay, I've got tcpdump working through two terminals (one for each host - localhost (head Leopard server) and the Debian fileserver sub-server) - everything seems to be fine, as I can see the ports talking to each other through tcp & icmp. But from time to time this comes up:

    IP (tos 0x0, ttl 64, id 55690, offset 0, flags [DF], proto TCP (6), length 52) 192.168.1.1.1020 > 192.168.1.6.6996: ., cksum 0x837e (incorrect (-> 0x0d29), 107900:107900(0) ack 125128 win 65535 <nop,nop,timestamp 939229875 106320798>
    0x0000: 4500 0034 d98a 4000 4006 dde1 c0a8 0101 E..4..@.@.......
    0x0010: c0a8 0106 03fc 1b54 73ff 8faf cfbb dede .......Ts.......
    ...etc...

    ...so from this I can see that something's out of sync in the packets sent from the head server to the sub-server.

    I modified the above server's sshd_config file to not send clientAlive messages - bad idea? I seem to detect a faster query rate with that one than the other four servers (that do have a clientAlive set). To tell you the truth, I'm still deciding whether tcp connections depend on the sshd_config file... it would seem so.

    (added) I see that any packet sent to the sub-servers from the head server gets an "incorrect" reply similar to the above. Is this indeed an error? Anyhow, the connection is still not hanging. More later...
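A note on the hex dump above: the "cksum ... incorrect" in the tcpdump line refers to the TCP checksum, which cannot be recomputed here because the dump is cut off at "...etc...". The first 20 bytes are the complete IPv4 header, though, and its checksum can be checked by hand. The sketch below (the function name is just illustrative) applies the standard Internet checksum to those bytes. When a capture runs on the sending host, a commonly reported cause of "incorrect" TCP checksums is checksum offloading to the network card, so the packet is captured before the hardware fills the field in; that is an assumption here, not something this thread confirms.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used for the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# First 20 bytes of the hex dump quoted above: the complete IPv4 header.
header = bytes.fromhex("4500 0034 d98a 4000 4006 dde1 c0a8 0101 c0a8 0106")

stored = int.from_bytes(header[10:12], "big")     # checksum field as captured
zeroed = header[:10] + b"\x00\x00" + header[12:]  # same header, checksum field cleared
computed = internet_checksum(zeroed)

print(f"stored=0x{stored:04x} computed=0x{computed:04x}")
```

Both values come out as 0xdde1, so the IP header in the capture is intact; only the TCP checksum is being flagged.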

  7. #7
    Just Joined!
    Join Date
    Dec 2008
    Posts
    6
    Okay, I now have a better understanding of why outgoing packets sent with a [DF] flag are not seen by tcpdump as 'correct'. It became clearer after running the above command (thanks!) in both directions from both computers.

    I'm still waiting for one of them to hang, but nothing doing! Could the tcpdump be somehow keeping the connection alive? I dislike just as much not knowing why something IS working as not knowing why it isn't.

    One last addition: I notice that ssh on the Debian sub-server is listening on tcp6, whereas ssh on the Leopard server is listening only on tcp4. Possible conflict?
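On the tcp6/tcp4 question: on Linux, a socket listening on the IPv6 wildcard address usually also accepts IPv4 clients unless IPV6_V6ONLY is set, and netstat lists such a listener only as tcp6. A tcp6-only entry therefore does not by itself mean the server refuses IPv4 connections. The sketch below demonstrates this on the local machine; it assumes IPv6 is enabled on the host, the function name is hypothetical, and whether sshd on these particular servers is configured this way is not confirmed by the thread.

```python
import socket


def dual_stack_accepts_ipv4(timeout=2.0):
    """Return True if an IPv6 wildcard listener accepted an IPv4 client.

    Returns None if the check could not be run (for example, no IPv6 support).
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as srv:
            # Explicitly allow IPv4-mapped connections on this IPv6 socket.
            srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            srv.bind(("::", 0))  # port 0: let the OS pick a free port
            srv.listen(1)
            srv.settimeout(timeout)
            port = srv.getsockname()[1]
            # Connect over plain IPv4 to the IPv6 listener.
            with socket.create_connection(("127.0.0.1", port), timeout=timeout):
                conn, peer = srv.accept()
                conn.close()
                # The IPv4 client shows up as an IPv4-mapped IPv6 address.
                return peer[0].startswith("::ffff:")
    except OSError:
        return None


print(dual_stack_accepts_ipv4())
```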

    Thank you for any help or advice at all.

  8. #8
    Just Joined!
    Join Date
    Dec 2008
    Posts
    6
    I am having a hard time pinning the source of the hangs down. Switch, host server or client server? I've combed all the logs and tried all the network diagnostics tools, but have yet to find any sign of a slow/dropped connection. I can only see (through constant monitoring - Nagios) that at times my (host) sub-servers are not responding to any service save 'ping'.

    The connection drops during file transfers, while I am consulting 'man' pages through the terminal, and even when the host/client connection is completely idle.
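Since the sub-servers still answer ping while every other service stops responding, it may help to record exactly when each TCP service goes away, so that logs and captures can be searched around those timestamps. A minimal sketch follows; the `probe` and `watch` names, the interval, and the port list in the usage note are assumptions for illustration, not part of the setup described in this thread.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch(host, ports, interval=30.0, rounds=None, log=print):
    """Probe each port repeatedly and log a timestamped line on every state change.

    rounds=None runs until interrupted; an integer limits the number of passes.
    Returns the last observed state per port.
    """
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for port in ports:
            up = probe(host, port)
            if last.get(port) != up:
                stamp = datetime.now().isoformat(timespec="seconds")
                log(f"{stamp} {host}:{port} {'up' if up else 'DOWN'}")
                last[port] = up
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
    return last
```

For example, `watch("192.168.1.6", [22, 111, 2049])` would watch the usual SSH, portmapper and NFS ports on the sub-server address seen in the capture above. If all ports go down at the same moment while ping still answers, that points at the host itself rather than at one service.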

    I'm wondering if there is some sort of timed authentication process that is failing and thus dropping the connection. Yet I see no sign of anything of the kind in the logs.

    If anyone can tell me better where to look, I would be much obliged.

  9. #9
    Linux Enthusiast
    Join Date
    Jul 2005
    Location
    Maryland
    Posts
    522
    This may not be what's causing it, but I would still look into it.
    I have seen "green" drives that go into sleep mode, becoming unavailable after a while.
    Also, maybe it's the SATA controller (if it's an external card) or its driver that is dropping the connection to the drives. So, I would check the drives first.

  10. #10
    jep
    Just Joined!
    Join Date
    Oct 2009
    Posts
    1

    random network hangs (OS X)

    I am facing exactly the same problems as Sefu (no hair left on my side).
    I am using a mixture of OS X clients (Tiger, Leopard, Snow Leopard) with NFS and MacFUSE (SSH) against a 64-bit Ubuntu 9.04 Server (Jaunty).

    Like Sefu, I have tried everything (down to checking cables, switches, and tcpdump output) and have come to no conclusion. I noticed that things got worse with Snow Leopard.
    On the same network I run a bunch of Ubuntu Linux clients with no problems at all.

    Has anyone solved this?
