Page 1 of 2
Results 1 to 10 of 12
  1. #1
    Just Joined!
    Join Date: Jul 2009
    Posts: 7

    Boot Loader Mayhem - After Power Failure (SuSE 10.3)



    Yeah. I spent 10 hours on this already myself, so I'm posting a long first post.

    I had a power failure that lasted about 5 seconds yesterday, and when the computer came back up, it would not boot. First, I had to clear a message that said that the BIOS Checksum failed, and defaults had to be loaded. I checked everything out, and am pretty sure I got everything back to normal there.

    When it tries to boot, it hangs completely at:
    Code:
    Grub Loading, please wait...
    I've tried everything imaginable. I was able to get into the system using a RIPLinuX live CD: I could run fsck and mount the partition. Then comes the baffling part. When I select the option in its boot menu to boot a specific root device, it bootstraps from its own /boot/kernel and then goes straight into my OS. My Apache, MySQL, Postfix, and Dovecot are all working perfectly.

    I installed LILO from within my installation, matched the configuration, and got almost identical results. The menu came up, but choosing an option caused it to freeze at:
    Code:
    BIOS Data Check Successful
    I noticed that my partition layout is backwards: there is a 512 MB swap partition and then a 78 GB ext3 partition. It worked like this until yesterday, but just to be safe, I created a /boot partition at the beginning of the drive and made the swap partition smaller.

    This didn't change a thing. I copied fresh GRUB stage1 and stage2 files over to /boot, and I reinstalled my kernel from an RPM just in case any of those files were corrupted. None of it made any difference.

    After all these changes, I can still boot the system with the live CD. I don't know whether its /boot/kernel is completely bypassing my kernel or not. I also noticed that my drive is being detected in CHS mode; switching to LBA made no difference either.
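    Some background on why the CHS/LBA setting can matter to a boot loader: when the BIOS addresses the disk by cylinder/head/sector, it can reach at most 1024 cylinders × 255 heads × 63 sectors of 512 bytes, so anything stored beyond roughly 8.4 GB is out of reach for a loader that relies on CHS. The sketch below only does that arithmetic. The helper names are made up, and the 1024/255/63 geometry is the usual maximum translated geometry, assumed here rather than read from this drive.

```shell
#!/bin/sh
# Hypothetical helper: number of bytes reachable through CHS addressing.
# Arguments: cylinders heads sectors-per-track (512-byte sectors assumed).
chs_limit_bytes() {
  echo $(( $1 * $2 * $3 * 512 ))
}

# Hypothetical helper: succeeds if a byte offset lies below the CHS limit.
# Arguments: offset cylinders heads sectors-per-track.
within_chs_limit() {
  [ "$1" -lt "$(chs_limit_bytes "$2" "$3" "$4")" ]
}

# Usual maximum translated geometry (assumed, not read from this drive).
limit=$(chs_limit_bytes 1024 255 63)
echo "CHS limit: $limit bytes"   # prints "CHS limit: 8422686720 bytes" (about 8.4 GB)
```

    In lilo.conf the matching switch is the global lba32 option, which is commented out in the configuration posted below. With /boot now at the start of the disk, its files are likely well inside this limit either way.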

    I can't seem to find anything to make LILO or GRUB more verbose when booting the kernel. They both just silently hang.

    Current /etc/lilo.conf:
    Code:
    # Modified by YaST2. Last modification on Fri Jul 10 00:15:45 CDT 2009
    menu-scheme = Wb:kw:Wb:Wb
    timeout = 30
    #lba32
    default = "openSUSE"
    boot = /dev/hda1
    root = /dev/hda2
    prompt
    
    image = /boot/vmlinuz-2.6.22.5-31-default
    ###Don't change this comment - YaST2 identifier: Original name: linux###
        label = openSUSE
        append = "pci=nommconf pci=nomsi nofb noresume acpi=off edd=off apm=off splash=silent showopts"
        initrd = /boot/initrd-2.6.22.5-31-default
        root = /dev/hda2
        vga = normal

    Current /boot/grub/menu.lst:
    Code:
    # Modified by YaST2. Last modification on Fri Jul 10 02:37:40 CDT 2009
    default 0
    timeout 3
    ##YaST - activate
    ##YaST - generic_mbr
    
    ###Don't change this comment - YaST2 identifier: Original name: linux###
    title openSUSE 10.3
        root (hd0,0)
        kernel /vmlinuz-2.6.22.5-31-default root=/dev/hda2 vga=normal pci=nommconf pci=nomsi nofb noresume acpi=off edd=off apm=off splash=silent showopts 
        initrd /initrd-2.6.22.5-31-default

    More System Information:
    AMD Duron 700 MHz, 512 MB of RAM, Phoenix BIOS
    The secondary slave is a 500 GB hard drive that is part of a software RAID with two SATA drives on an add-in PCI controller.
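    For reading the two configurations side by side: GRUB legacy numbers BIOS disks and partitions from zero, so root (hd0,0) in menu.lst names the first partition of the first BIOS disk, normally /dev/hda1, the new /boot partition. The helper below is a hypothetical sketch of that translation. It assumes BIOS disk N corresponds to the N-th IDE letter, which is not guaranteed; GRUB keeps the real mapping in /boot/grub/device.map.

```shell
#!/bin/sh
# Hypothetical helper: translate GRUB legacy "(hdN,M)" into a Linux IDE name.
# ASSUMPTION: BIOS disk N is the N-th IDE letter (hd0 -> hda, hd1 -> hdb, ...).
# The authoritative mapping is in /boot/grub/device.map and may differ.
grub_to_linux() {
  spec=$(printf '%s' "$1" | tr -d '()')   # "(hd0,0)" -> "hd0,0"
  disk=${spec%%,*}                         # "hd0"
  part=${spec#*,}                          # "0"
  disk_num=${disk#hd}                      # "0"
  letter=$(printf '%s' "abcdefgh" | cut -c "$((disk_num + 1))")
  echo "/dev/hd${letter}$((part + 1))"     # GRUB counts partitions from 0
}

grub_to_linux "(hd0,0)"   # prints /dev/hda1 under the assumption above
```

    Because /boot is now its own partition, GRUB resolves the kernel and initrd paths relative to that partition, which is why menu.lst reads /vmlinuz-2.6.22.5-31-default rather than /boot/vmlinuz-2.6.22.5-31-default.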

  2. #2
    Just Joined!
    Join Date: Jul 2009
    Posts: 7

    It's using the kernel on the CD

    Don't know why I didn't try this before.
    Code:
    # cat /proc/version
    
    Linux version 2.6.17 (root@Linux) (gcc version 3.3.6) #7 Wed Jun 21 19:49:05 UTC 2006
    So apparently it's using the actual kernel on the CD, not just as a bootstrapper, but as the OS.

    The kernel on my hard drive is
    Code:
    /boot/vmlinuz-2.6.22.5-31-default
    What does this mean? Does it mean that the installed kernel itself can't find the hard drive and is halting there? And maybe the one I'm booting from sees the drive by a different name?
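    One way to check which kernel is actually running against what is installed is to compare the release field of /proc/version (the same string uname -r prints) with the vmlinuz file names in /boot. This is a minimal sketch with made-up helper names; it only compares names and cannot tell why the installed kernel fails to boot.

```shell
#!/bin/sh
# Hypothetical helper: print the release field of a /proc/version line,
# e.g. "Linux version 2.6.17 (root@Linux) ..." -> "2.6.17".
release_from_proc_version() {
  printf '%s\n' "$1" | awk '{print $3}'
}

# Hypothetical helper: print the release encoded in a vmlinuz-<release> name.
release_from_image() {
  basename "$1" | sed 's/^vmlinuz-//'
}

# Compare the running kernel with each installed image, if /proc is available.
if [ -r /proc/version ]; then
  running=$(release_from_proc_version "$(cat /proc/version)")
  echo "running: $running"
  for img in /boot/vmlinuz-*; do
    [ -e "$img" ] || continue   # the glob matched nothing
    if [ "$(release_from_image "$img")" = "$running" ]; then
      echo "match:   $img"
    else
      echo "other:   $img"
    fi
  done
fi
```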

  3. #3
    gogalthorp (Linux Guru)
    Join Date: Oct 2006
    Location: West (by God) Virginia
    Posts: 3,105

    There is a good possibility that the power failure damaged your hard drive. You should run a low-level disk scan. I use SpinRite (not a free program), but you should be able to get a diagnostic tool from your drive manufacturer.

    Chances are good that you have corrupted sectors. This may or may not be correctable and may require a new install. The fact that the BIOS was scrambled does not bode well.

  4. #4
    Just Joined!
    Join Date: Jul 2009
    Posts: 7

    No Errors Found

    I sure thought you had it with that brilliant suggestion. I have a Western Digital drive, so I downloaded the Data Lifeguard Diagnostic boot CD. I ran an extended test, and it did not find any errors, and didn't silently fix any either, as far as I could tell. It still won't boot from the drive.

    I'm almost tempted to copy the random kernel from the CD to my /boot and try to point GRUB at that. OK, I probably will do that next.

  5. #5
    Just Joined!
    Join Date: Jul 2009
    Posts: 7

    Problem... Solved????

    I copied the kernel and initrd from the RIPLinuX live CD to my /boot partition and told LILO to boot from them.

    And that works just fine. I can now boot my computer without a CD-ROM. It just happens that it boots to some other kernel.

    Now I'll admit that 2.6.17 isn't that far off from 2.6.22.5-31, but the latter actually came with my distro.

    Should I just pretend all is well, and leave it running the non-standard kernel and call this a case closed? Doesn't sound right to me.

    Why shouldn't I still be able to boot the stock kernel? Any ideas NOW?

  6. #6
    Jonathan183 (Linux Guru)
    Join Date: Oct 2007
    Posts: 2,905

    I'd recommend backing up your data and doing a fresh install... you're never really going to be sure you've fixed everything, even if you get the system to boot.

  7. #7
    Just Joined!
    Join Date: Jul 2009
    Posts: 7

    It's not my style

    I'm more of a find-and-fix than a slash-and-burn type. I'd rather confirm that something can't be resolved any other way before I go that far. Maybe I'm a little bit OCD in that respect, but over time it means I don't have to guess so often.

    With all the random scripts and programs I've installed, I'd never remember how to get the system back into its current state. It's a hodgepodge server.

    Poor excuse, I'm sure, but I can't afford a proper backup system right now. I only have my data backed up, not /etc or a list of installed programs, not to mention dozens of programs that have been compiled from source.

    Any input from anyone else?

  8. #8
    Jonathan183 (Linux Guru)
    Join Date: Oct 2007
    Posts: 2,905

    Quote:
    I'm more of a find and fix than a slash and burn type.

    ... I know what you mean.

    It's probably worth having a look at the system logs to see if you can get any clues about what happened; I think you can do this through YaST.

    Check the partition structure, and that all partitions are mounted correctly, using the output of:
    Code:
    cat /etc/fstab
    mount
    fdisk -l
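    The fstab-versus-mount comparison can be automated a little. This sketch treats /proc/mounts as the list of active mounts and skips swap, none, and noauto entries; unmounted_fstab_entries is a hypothetical helper name, and it only compares mount points, not devices or options. It prints each mount point that fstab expects but the kernel does not show as mounted.

```shell
#!/bin/sh
# Hypothetical helper: print mount points listed in an fstab-style file that
# do not appear in a mounts table in /proc/mounts format.
# Swap, "none", and noauto entries are skipped: they are not expected to be mounted.
# Arguments: fstab-file mounts-file
unmounted_fstab_entries() {
  awk '
    FILENAME == ARGV[1] { mounted[$2] = 1; next }  # first file: active mounts
    NF < 2 || $1 ~ /^#/ { next }                   # blank lines and comments
    $3 == "swap" || $2 == "none" { next }          # swap has no mount point
    $4 ~ /(^|,)noauto(,|$)/ { next }               # not mounted at boot
    !($2 in mounted) { print $2 }
  ' "$2" "$1"
}

# On a running system: compare /etc/fstab with what the kernel reports.
if [ -r /etc/fstab ] && [ -r /proc/mounts ]; then
  unmounted_fstab_entries /etc/fstab /proc/mounts
fi
```

    Empty output means every regular fstab entry is mounted; fdisk -l is still needed to check those devices against the actual partition table.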

    Now that you have the system running, you should be able to use YaST to install the kernel and reinstall the boot loader (including generating a fresh menu). Before you try that, it may be worth copying your /boot/grub/menu.lst file so you can restore it if you need to. Good luck!
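    Taking that copy of menu.lst can be as simple as a timestamped cp -p. backup_file is a hypothetical helper name; the sketch only copies the file (keeping mode and timestamps) and prints the path of the backup.

```shell
#!/bin/sh
# Hypothetical helper: copy a file to "<file>.bak-<timestamp>" before editing it,
# preserving mode and timestamps, and print the backup path.
backup_file() {
  src=$1
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  dest="$src.bak-$(date +%Y%m%d-%H%M%S)"
  cp -p "$src" "$dest" && echo "$dest"
}

# Example (needs root): backup_file /boot/grub/menu.lst
# To restore later:     cp -p /boot/grub/menu.lst.bak-<timestamp> /boot/grub/menu.lst
```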

  9. #9
    gogalthorp (Linux Guru)
    Join Date: Oct 2006
    Location: West (by God) Virginia
    Posts: 3,105

    How about trying the repair option on the install disc?

  10. #10
    Just Joined!
    Join Date: Jul 2009
    Posts: 7
    Quote:
    It's probably worth having a look at the system logs to see if you can get any clues about what happened; I think you can do this through YaST.

    Not sure what you mean by this. The power went out; I know that happened. The computer didn't even have time to flush the write buffer to disk, let alone log anything. And when it came back up, the root partition was accessed read-only at best (through GRUB), and the kernel hasn't even loaded yet when it hangs.

    The mount command shows all the drives mounted correctly when I boot under this kernel. I moved the kernel and initrd images out of /boot and reinstalled the kernel through YaST. I even tried compiling one from source, and the same thing happened. I had already rewritten the GRUB menu from scratch, and I know I formatted it correctly, because this other kernel loads just fine. I don't need to automagically generate one when I've learned the syntax forwards and backwards over the 12+ hours I've been working on this.
