    [SOLVED] Gentoo or Hardware - Help deciding

    Hi Everyone-

    For the past two days I've been having serious but difficult-to-pin-down problems with my Gentoo box. It boots up, but at intermittent times after boot my system completely locks up. No mouse, no keyboard, no dumping into a terminal, nothing, nada. I have to use Ctrl-Alt-SysRq RSEIUB just to reboot.
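
    A side note on the SysRq escape hatch: it only works if the kernel allows those keys. Below is a minimal sketch of a check, assuming the kernel exposes /proc/sys/kernel/sysrq and uses the documented bitmask values. The name sysrq_allows_rseiub is a made-up helper, not part of any package.

```shell
# Hypothetical helper: report whether a kernel.sysrq value permits the
# R, S, E, I, U and B keys. With no argument it reads the live setting.
sysrq_allows_rseiub() {
    val="${1:-$(cat /proc/sys/kernel/sysrq)}"
    # Documented bits: 4 = keyboard control (R), 16 = sync (S),
    # 32 = remount read-only (U), 64 = signal processes (E, I),
    # 128 = reboot (B). Sum: 244.
    need=244
    if [ "$val" -eq 1 ] || [ $(( val & need )) -eq "$need" ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

    A value of 1 enables every SysRq function and 0 disables them all. Anything else is treated as a bitmask.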

    These problems started not long after an emerge -upDN world && revdep-rebuild so I spent quite a bit of time running through different versions of packages to see if it helped, to no avail. Has anyone heard of any recent bug reports that might do this (especially on amd64 arch)?

    So, I started thinking that it was hardware. The processor temperature isn't getting high, so I did a RAM test. After running multiple passes of the extended RAM tests off of a CD: NO ERRORS. My next course of action, when I have some time available, is to run off of a boot CD and see if the problems are reproducible there...

    I am really at a loss here, anyone have any other ideas? I'm used to solving Linux-related problems, but I am not used to a complete system lock-up like this (in fact, come to think of it, I don't think I've ever had a Linux system do that in a semi-regular fashion).

    Assuming that you are using stable settings and kernel and no external modules that could introduce bugs, then random lockups are a symptom of faulty hardware.

    The fact that you never observed it before with any other OS might have a simple explanation, for example:

    • a RAM stick just broke; they are sensitive, and any voltage spike can send them to hell
    • in the other distros you used, the CPU and RAM usage wasn't that high (and I am speaking about emerge here, because compiling is one of the heaviest tasks in all regards: CPU, RAM, I/O, etc.)

    Note that memtest isn't infallible either.

    First off, we need to go through the regular drill of posting your emerge --info.

    Do you use ~arch or stable?

    Most of the time when I get lockups like that it is a kernel or video driver problem. Usually hardware problems will show up during compiling as random errors.

    Although, I have had a problem before where loading a certain module locked the whole system. I had to push the reset button on the front. Maybe a module problem? Did your updates include a kernel update?

    As you can see, I am stabbing in the dark here, and you have probably considered most of this, but hey, it will be great if we can get it resolved.

    Also, which DE are you using? Window Manager? Compositing? Does it just lock up in X, or can you be in VT1-6 when it happens? Any log files?
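
    On the log question, something along these lines might narrow things down. It is only a sketch: scan_logs is a hypothetical helper, and the default paths (/var/log/Xorg.0.log and /var/log/messages) are assumptions that depend on the X setup and the system logger in use.

```shell
# Hypothetical helper: print X server error lines and NVIDIA kernel
# messages from the given log files (defaults are assumed paths).
scan_logs() {
    xorg_log="${1:-/var/log/Xorg.0.log}"
    kern_log="${2:-/var/log/messages}"

    echo "== X errors in $xorg_log =="
    # Match lines that start with (EE), optionally after a [timestamp];
    # this skips the marker legend near the top of the log.
    grep -E '^(\[[^]]*\] )?\(EE\)' "$xorg_log" || echo "(none)"

    echo "== NVRM / Xid lines in $kern_log =="
    # The NVIDIA kernel module prefixes its messages with NVRM.
    grep -E 'NVRM|Xid' "$kern_log" || echo "(none)"
}
```

    After a hard lockup and reboot, the log from the frozen session is normally /var/log/Xorg.0.log.old, so that is the file to pass as the first argument.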

    First of all, thanks for the fast responses. I went through my package.keywords and pruned out the things that I don't actually need to be unstable. So now I'm trying an emerge -eDN world; we'll see if that helps anything.

    Now to go through the suggestions:

    No compositing, KDE (but I also tried to run under XFCE4 just to see if it was something kde related... still locked). DE?

    I'm actually doing my system rebuild from a terminal without X running. So far that has not locked up, so I guess that sort of qualifies as evidence of a problem with my video drivers or X itself.

    I was able to try running a boot cd for a couple hours this morning without the problem cropping up, but I'm still not excluding hardware error. i92goboj, you are right about the complete randomness making it sound like a hardware thing, and I think this weekend I'll be able to dig a little deeper into the hardware possibilities.

    I did like the comment about the video drivers, but I have no proof that they are the culprit. I didn't do a kernel update, but I did emerge the nvidia-drivers (kernel module). I've tried removing, eselecting, then reloading it all before starting an X session, but to no avail. I should also try switching back to no acceleration (vga or nv driver) to see if it could be specifically the nvidia drivers.
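
    To test the unaccelerated route without hand-editing the live config, one option is to generate a modified copy first. A sketch only: preview_nv_driver is a hypothetical helper, /etc/X11/xorg.conf is the assumed config location, and the open nv driver (x11-drivers/xf86-video-nv) would have to be installed, which VIDEO_CARDS="nvidia" alone probably does not pull in.

```shell
# Hypothetical helper: print a copy of an xorg.conf with the Device
# section's Driver changed from "nvidia" to "nv". The original file
# is not modified.
preview_nv_driver() {
    conf="${1:-/etc/X11/xorg.conf}"
    sed 's/^\([[:space:]]*Driver[[:space:]]*\)"nvidia"/\1"nv"/' "$conf"
}
```

    Redirecting the output to a scratch file and reviewing it before replacing the real config (after backing that up) keeps the way back to the nvidia driver simple.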

    Emerge information. I've got about 30 packages on ~amd64; the rest are stable.

    emerge --info
    Portage (default/linux/amd64/2008.0, gcc-4.1.2, glibc-2.9_p20081201-r2, 2.6.27-gentoo-r8 x86_64)
    System uname: Linux-2.6.27-gentoo-r8-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_3800+-with-glibc2.2.5
    Timestamp of tree: Fri, 27 Mar 2009 15:30:17 +0000
    distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
    ccache version 2.4 [disabled]
    app-shells/bash:     3.2_p39
    dev-java/java-config: 1.3.7-r1, 2.1.7
    dev-lang/python:     2.4.4-r13, 2.5.2-r7
    dev-python/pycrypto: 2.0.1-r6
    dev-util/cmake:      2.6.2-r1
    sys-devel/autoconf:  2.13, 2.63
    sys-devel/automake:  1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2
    sys-devel/binutils:  2.18-r3
    sys-devel/gcc-config: 1.4.0-r4
    sys-devel/libtool:   1.5.26
    virtual/os-headers:  2.6.27-r2
    CFLAGS="-march=athlon64 -O2 -pipe"
    CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config"
    CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
    CXXFLAGS="-O2 -pipe"
    FEATURES="distlocks fixpackages parallel-fetch protect-owned sandbox sfperms strict unmerge-orphans userfetch"
    PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
    USE="X acc acl alsa amd64 arts berkdb bzip2 cdr cli cracklib crypt cups dri dvd fortran gdbm gif gimpprint gpm gtk iconv ipv6 isdnlog jpeg kde midi mmx mp3 mpeg mplayer mudflap multilib ncurses nls nptl nptlonly opengl openmp pam pcre perl png ppds pppd python readline reflection samba scanner session spl sse sse2 ssl sysfs tcpd tiff truetype unicode usb v4l vorbis xine xorg zlib" ALSA_CARDS="intel8x0" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en" USERLAND="GNU" VIDEO_CARDS="nvidia"
    Thanks for all the suggestions. I might not have much more information until I'm done rebuilding, but I'll be sure to post any new developments.

    Problem solved, sort of. Let's compromise and say diagnosed.

    After a weekend of too much testing, and increasing numbers of problems, I've determined that my hard drive is going bad. Now for the fun of trying to get all my things off of there without too much corruption... then starting a new install.

    Very disheartening...

    However, I want to thank everyone for all the help. It helped me really narrow in on some of the important issues.

    For the sake of completeness I thought I would fill in a few more gaps before completely closing this thread as solved:

    I did indeed find unrecoverable hard drive errors. I confirmed these by putting the drive into another machine and trying to repair it. I thought for a while that this was the only problem. However, after buying two new drives and performing a full reinstall, I still encountered the same lockups that I had seen previously.
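
    For anyone who hits the same thing: the drive's own SMART counters can hint at this kind of failure before the disk gets pulled. A rough sketch, assuming smartmontools is installed and the disk is /dev/sda (both assumptions, as is the helper name smart_sector_counts). It picks the reallocated, pending, and offline-uncorrectable sector counts (attribute IDs 5, 197 and 198) out of the attribute table printed by smartctl -A.

```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print the
# attribute name and raw value for IDs 5, 197 and 198.
# Assumes the usual 10-column table with RAW_VALUE in the last column.
smart_sector_counts() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2, $10 }'
}

# Example use (needs root; /dev/sda is an assumed device name):
#   smartctl -A /dev/sda | smart_sector_counts
```

    Non-zero raw values here are often a warning sign, though a clean result does not prove the drive is healthy.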

    After doing some more testing, I think I've finally pinned down the source of the lockups.

    1.) Rebuilding without X running never caused the system to lock up but did result in more problems -- Reason: Hard drive problems

    2.) X locking up -- Reason: Failing video card

    I bought a new video card and things seem to be working well. I am still a little worried that after all of this there could be something up with my motherboard, but I'll just be crossing my fingers.
