  1. #1
    Just Joined!
    Join Date
    Mar 2003
    Posts
    12

    RH: system hangs, network connections work, services don't


    I have a serious problem with RedHat 7.3 and 8.0:
    - Server crashes after 1 to 21 days
    - it's not possible to foresee these events
    - mainly occurred during peak hours, but today it didn't
    - server still accepts TCP/IP connections on ports where services were running before, but (most) services don't respond. E.g. you can telnet to port 110 (pop3) and get a connection, but you can't use pop3 because the service doesn't respond.
    - DNS still works - I guess because it runs completely in memory and, afaik, doesn't fork any subprocesses or read from the harddisk
    - ssh etc. don't work either
    - console doesn't work (not even Ctrl-Alt-Del etc.)

    I exchanged the hardware completely and moved from RH 7.3 to RH 8.0 in the same step. I always used the latest kernels from Red Hat Network.

    I'm lost!

    Has anybody had similar problems, or can anybody imagine a solution?

    I even changed the harddisks: the only thing common to my two servers was that both used Maxtor harddisks. I replaced them in the new server with Seagate ones and it seemed to work. But today it crashed again - so that didn't help :-(


    Any suggestions? PLEASE

  2. #2
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    That seems really strange. It really sounds like a hardware problem. I had a friend whose computer was crashing because the electrical connection gave voltage peaks. Could you try another outlet?
    Is there any way you can log in to this machine (with terminal, ssh, telnet...)?
    In what way doesn't the console work? Is it just blank?
    Do you get a SYN-ACK when you telnet to e.g. pop3? (I.e., does the TCP connection reach ESTABLISHED if you look at it with netstat?)
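The question above (does the TCP connection get established while the service itself stays silent?) can also be checked automatically from a second machine. The following is a minimal Python sketch and not something posted in the thread: `probe_service` and its three result labels are hypothetical names. It only makes sense for services that greet the client first, such as pop3 or FTP, and it tells apart a port that refuses the connection from one that accepts it but never sends a greeting.

```python
import socket


def probe_service(host, port, timeout=5.0):
    """Classify a TCP service as 'unreachable', 'silent' or 'responding'.

    'silent' means the TCP handshake completed but no greeting arrived
    within `timeout` seconds (or the peer closed without sending).
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, timed out during the handshake, no route, etc.
        return "unreachable", b""

    with conn:
        conn.settimeout(timeout)
        try:
            banner = conn.recv(256)
        except socket.timeout:
            return "silent", b""
        except OSError:
            # Connection was reset after it had been established.
            return "unreachable", b""

    if banner:
        return "responding", banner
    return "silent", b""
```

A result of `"silent"` on port 110 would be consistent with the symptom described in the first post: the kernel still completes the handshake, but the pop3 daemon never gets to answer.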

  3. #3
    Just Joined!
    Join Date
    Mar 2003
    Posts
    12

    It's not a hardware problem - definitely!

    It's not a hardware problem because I have the same issue on two completely different machines with completely different hardware. Until two weeks ago the only thing common to both servers was the Maxtor harddisks (although the new PC had brand new harddisks!), but I replaced them with Seagate ones. That wasn't the problem either :-((

    On the old server I used Redhat 7.3 with all updates from RHN (Redhat Network), on the new I use 8.0 with all updates. So even software is mostly different.

    I don't know what else to do about it or what else to try. I can't even say when (or for what reason) the error occurs. It recurs unexpectedly after anything from 1 day to 4 weeks - and not because of high load or anything like that.

    Anybody got an idea how to track these problems down?

    There are no suspicious entries in the log-files either :-(

  5. #4
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    Like I said, could you try connecting it to another outlet, preferably (if possible) in another house, and/or installing a ground protector (or whatever it's called in English) in your outlet? Are you using grounded outlets?
    Could you answer the other questions I asked, too? It will be necessary to track down the problems.

  6. #5
    Linux Engineer
    Join Date
    Jan 2003
    Location
    Lebanon, pa
    Posts
    994
    Had this exact same problem with a server and it ended up being a failing IDE controller. Though you said you swapped all hardware, which makes this really strange. So what you did was install 7.3, then it crashed, so you swapped all the hardware out and installed 8.0, which started doing the same thing?

  7. #6
    Just Joined!
    Join Date
    Mar 2003
    Posts
    12
    Not exactly this way: I had the problem on my "old" server (which had been running for about half a year at that time) with RH 7.3. After the problem had occurred several times we decided to move to a new server-housing location and therefore bought completely new hardware:
    - Old system was an Athlon-based system with RH 7.3.
    - New system was an Intel-based system with RH 8.0.
    Therefore I would believe that both systems have almost completely different chipsets etc.

    The only thing common to both was the Maxtor harddrives ... although the ones in the new system were also brand new - but they were of the same type as the old ones. These were recently replaced with Seagate harddisks in the new server, but that also led to problems.

    Hmm - I don't know what to think, I don't know what else to try.
    And worst of all: I don't know how to provoke the error. Since we have now moved completely to the new system, in a few days I will have the "old" system here and can run tests on it. I strongly believe that memory tests won't show anything (why should they?) - maybe I need some kind of stress test for the Linux system. But do I need to run harddisk benchmarks for 24 hours to stress the hdds? Or do I need to run an Apache benchmark with massive access to the MySQL database? If I could reproduce the crash I would certainly be happy.

    Any ideas on how to try to provoke the error? And with what tools?

    The errors occur unexpectedly after 1 to 21 days (it varies a lot) - you can't even say it's because of "high load" or something like that :-((
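One of the options raised above, putting sustained load on the harddisks, could be sketched roughly as follows. This is only an illustration in Python under stated assumptions: `disk_stress` is a hypothetical helper, the directory and sizes are placeholders, and nothing in the thread suggests that this kind of load actually triggers the hang. It writes random data, forces it to disk, reads it back and counts checksum mismatches.

```python
import hashlib
import os
import tempfile


def disk_stress(directory, rounds=10, size_bytes=1 << 20):
    """Write, sync, re-read and verify `rounds` files of random data.

    Returns the number of files whose content did not match on re-read.
    """
    mismatches = 0
    for _ in range(rounds):
        data = os.urandom(size_bytes)
        expected = hashlib.sha256(data).hexdigest()
        fd, path = tempfile.mkstemp(dir=directory, prefix="stress-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # push the data out of the page cache
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
            if actual != expected:
                mismatches += 1
        finally:
            os.unlink(path)
    return mismatches
```

Note that the read-back is usually served from the page cache, so this mainly exercises the write path. For a long run one would call it in a loop with much larger `rounds` and `size_bytes` on the old machine and watch whether the system survives.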

  8. #7
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    Are you using any third-party or experimental kernel modules?

  9. #8
    Just Joined!
    Join Date
    Mar 2003
    Posts
    12
    Nothing out of the ordinary - not even any nonstandard kernel modules, no. I'm using the RedHat kernel with all their updates etc. for RH 7.3 and RH 8.0 ... so everything is "RedHat stable".

  10. #9
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    That's what I thought... I still felt I had to check, though.
    What was that you said in your first post about the console? Are both the video card and keyboard input disabled, or only one of them? Are they always disabled, even from boot, or do they fail later? Also, the issue about the services not responding; does that only happen after the server crashes, or is it always like that?

  11. #10
    Just Joined!
    Join Date
    Mar 2003
    Posts
    12
    No no ... stop! You got me completely wrong :-)

    Normally everything works perfectly. After the server has crashed, ports opened by services which were running up to that time still accept incoming TCP/IP connections. But the services don't say "hello" (FTP etc.) to the client as they usually do - they just take the connection but don't answer. The funny thing is that DNS works (I guess because it doesn't need to access the harddisk and runs entirely in memory).
    And about the console: normally I can log in and work on the console as always. But after the crash I can't even switch between the text consoles, can't log in, can't do anything. So it's not just the network part failing. I strongly believe the system crashes in the kernel or in some driver component. But no error gets logged anywhere - not even on the screen :-(

    Any debugging-ideas?
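Since nothing is logged when the hang happens, one possible aid is a heartbeat file. It at least narrows down when the machine stopped and what the load looked like shortly before. The following is a minimal Python sketch under stated assumptions: `write_heartbeat`, `last_heartbeat` and the line format are hypothetical, not something suggested in the thread. Each line is synced to disk, so if disk I/O is what stalls, the last complete line marks roughly the moment it stopped.

```python
import os
import time


def write_heartbeat(path, now=None):
    """Append one line '<unix time> load1=<1-minute load>' and sync it."""
    stamp = time.time() if now is None else now
    try:
        load = os.getloadavg()[0]
    except (OSError, AttributeError):
        load = float("nan")  # load average not available on this platform
    line = "%.0f load1=%.2f\n" % (stamp, load)
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # make sure the line survives a hard reset
    return line


def last_heartbeat(path):
    """Return the timestamp of the last complete line, or None if empty."""
    last = None
    with open(path) as f:
        for line in f:
            if line.endswith("\n"):  # ignore a partially written last line
                last = float(line.split()[0])
    return last
```

In use, `write_heartbeat` would be called once a minute from a small loop or from cron, with a log path of your choice, and `last_heartbeat` read after the reboot to see when the entries stopped.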
