  1. #1
    Just Joined!
    Join Date
    Sep 2003
    Posts
    46

    Random segfaults.


    System components(IIRC):
    Asus A7V8X-X motherboard
    AMD Athlon-XP 2800+
    Maxtor 160GB 7500RPM ATA/133 HDD
    Radeon 7500 (or 7xxx something) -OR- "Powered by ATI" Sapphire Radeon 9000 (PRO? It's OEM, so I don't have anything more than the board). I have swapped between these cards, and both have the problem.
    2x512MB 333MHz Corsair "Value select" RAM
    "Turbolink" 420W switching P/S "Pentium 4 rated" <-- Is this a noisy P/S or something? I don't have a scope to check it with, although I do have a spectanal to check RFI with if it might help.

    This system was built a month ago.

    I have been running Debian without a problem on it until 2 days ago. I was using an X11 SSH tunnel, and suddenly KDE segfaulted. I had not installed or modified ANY hardware or software when this happened. Ever since that first occurrence, things have been segfaulting randomly. I tried everything I could to fix the problem, and then decided to install a different distro to test it.

    I tried a Knoppix live CD first, and it segfaults. I KNOW that Knoppix USED to work perfectly on this box. I mkfsed the root and usr partitions of /dev/hda and tried to install Gentoo. Gentoo kept segfaulting during compilation. I finally got it to work. It's now installed, but it keeps segfaulting. I can't compile software effectively because of this.

    I dusted off the mobo and checked for blown caps or any other obvious failures, but I didn't find anything. I swapped out a whole bunch of HW, and that seemed to help for a while, but the segfaults came back. I disabled all the peripherals except video and network, and it STILL segfaults.

    Any ideas on what to do next? It sounds like it might be a memory failure or maybe noise, but I can't really tell. I'm still swapping things to see if I can find the problem.

    I can post the contents of /proc somewhere if it would help.

    I am running Kernel 2.6, but it also affects other versions.

    UPDATE:

    Cool! I got a kernel panic! This is progress...
    Here's a picture of the screen.

    Error in interrupt handler - not syncing.

  2. #2
    Linux Engineer
    Join Date
    Sep 2003
    Location
    Knoxhell, TN
    Posts
    1,078
    i would try testing the ram.. dolda2000 keeps recommending this prog, so i'll just post the link to it.
    http://www.memtest86.com

    run that and see what happens
    Their code will be beautiful, even if their desks are buried in 3 feet of crap. - esr

  3. #3
    Just Joined!
    Join Date
    Sep 2003
    Posts
    46
    Okay. I'm testing the memory with that. I'll post back in the morning. I need to get some sleep.

  5. #4
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    If it doesn't find anything, remember to turn on all tests. It just runs a few fast ones by default.

  6. #5
    Just Joined!
    Join Date
    Sep 2003
    Posts
    46
    I've tested the first board so far. When I checked it this morning, it had been running for 10 hours and had several pages of errors, AFAIK all were test 4 errors, and there was at least one error per pass. The failures weren't at specific areas each time, they were just kind of random and all over the place.

    I'm testing the second board right now, and I'll update this post after it does a few passes.

  7. #6
    Just Joined!
    Join Date
    Sep 2003
    Posts
    46
    I swapped out the memory with some from this computer, and I'm still having errors. A while ago, I got something saying that my C++ libraries were not sane. This time gcc got a "No space left on device" error. I have plenty of space on all the writable filesystems.

    The thing is... I'm getting different errors every time I try to reinstall. This is really weird... Sometimes something will work, and then sometimes the same thing will fail.

    EDIT: Just to clarify, the error for "No space left on device" was:
    lib_clear.o: No space left on device
    {standard input}: Assembler messages:
    {standard input}:1407: FATAL: can't close lib_clear.o
    : Illegal seek
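
One way to confirm that a "No space left on device" error like the one above is spurious is to check both free bytes and free inodes, since running out of either produces the same message. The following is only a minimal sketch: the function name `space_report` and the mount points listed are assumptions for illustration, not something taken from this thread.

```python
import os
import shutil


def space_report(path):
    """Return free bytes and free inodes for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    vfs = os.statvfs(path)           # Unix only; f_favail = inodes free for non-root
    return {
        "path": path,
        "free_bytes": usage.free,
        "free_inodes": vfs.f_favail,
        "total_inodes": vfs.f_files,
    }


if __name__ == "__main__":
    # Hypothetical list of writable mount points; adjust to the actual layout.
    for mount in ("/", "/usr", "/tmp"):
        if os.path.isdir(mount):
            print(space_report(mount))
```

If both numbers are comfortably above zero on the filesystem being written to, the error is more likely a symptom of the underlying instability than a real shortage of space.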

  8. #7
    Just Joined!
    Join Date
    Sep 2003
    Posts
    46
    This is really weird. I dropped the CPU speed down to about half, and I'm running Knoppix right now. I have CPU-Burn running in the background, Mozilla open with three tabs, GAIM open, a game of Frozen Bubble open, and everything is still stable. I'm using the original memory I had. I wonder if the motherboard or the CPU has a problem?

    I'll try compiling Gentoo at this clock speed and see if it works.

  9. #8
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    If you turn your clock speed back up, you might want to check that you have sufficient CPU cooling. If the CPU gets too hot, things like that can easily happen.

  10. #9
    Just Joined!
    Join Date
    Sep 2003
    Posts
    46
    I upped the multiplier to the maximum while leaving the bus speeds down. I'm actually overclocking, but I can run CPU-Burn indefinitely without any instability. I've got Gentoo compiled and running. All the programs compile without any errors now, and everything just seems to work.

    So my guess is that some component has died or gone out of spec, and now there is too much noise on the bus to run at that frequency reliably. I guess I'll RMA the mobo after I do a few more tests to make sure. I think I'll also try a different power supply first just to make sure that isn't the problem, since it could be causing a lot of noise.

    I don't think it's a heat problem, but I'll test some more to make sure. I have a big CPU cooler and overheat shutdown enabled in BIOS, so it shouldn't be the problem.

    Actually, I just realized that the problem was causing many more very subtle errors, like a few corrupt files, some plainly incorrect calculations, etc...
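
Subtle errors like the incorrect calculations mentioned above can be made visible by repeating a deterministic computation and checking that the answer never changes. The sketch below hashes the same in-memory buffer many times and counts disagreements; the function name `check_repeatable` and its default parameters are assumptions chosen for illustration. It is a rough user-space sanity check, not a replacement for memtest86 or a CPU burn-in tool.

```python
import hashlib
import os


def check_repeatable(rounds=50, size=4 * 1024 * 1024):
    """Hash one random buffer `rounds` times; return how many digests differ."""
    data = os.urandom(size)                        # fixed test buffer
    reference = hashlib.sha256(data).hexdigest()   # first result is the baseline
    mismatches = 0
    for _ in range(rounds):
        # On stable hardware the same input must always give the same digest.
        if hashlib.sha256(data).hexdigest() != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    bad = check_repeatable()
    print(f"{bad} mismatching digests out of 50 rounds")
```

Any nonzero count means the machine produced different results for identical input, which points at hardware (memory, CPU, or bus) and not at the software being run.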

  11. #10
    Linux Guru
    Join Date
    Oct 2001
    Location
    Täby, Sweden
    Posts
    7,578
    If lowering the bus speeds fixed it, I have to agree that it does indeed seem as if something's wrong with the mobo. You seem to be knowledgeable about hardware, but just in case you haven't thought of it: have you played with the spread spectrum option, if your BIOS has it?
