  1. #1
    Just Joined!
    Join Date
    Feb 2010
    Posts
    3

    FC12 Server Crashes

    Hi,

    We are experiencing several crashes on one of our servers running FC12 (2.6.31.12-174.2.22.fc12.x86_64 #1 SMP). We had similar issues with kernels 2.6.31.12-174.2.3.fc12.x86_64 and 2.6.31.5-127.fc12.x86_64, but now the frequency is increasing from once every 10 days or so, to about once or twice a day, seemingly without a great deal of dependence on CPU/memory load or excessive disk I/O.

    We enabled serial logging and found that the server was crashing with messages of the form "Kernel panic - not syncing: xfs_fs_destroy_inode: cannot reclaim <some hex address>". A quick search suggests that other people have experienced this kind of problem with kernels 2.6.29 and 2.6.30, and some patches were provided, but these did not appear to fix the problem.
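
    To track how often the panic occurs, the captured serial log can be scanned for the message quoted above. The following is a minimal sketch, not a tested tool: the function name `find_xfs_panics` and the sample lines are hypothetical, and the address format (hex digits with an optional `0x` prefix) is an assumption, since the thread only gives it as "<some hex address>".

```python
import re

# Pattern built from the panic text quoted in the post. The address format
# (hex digits, optional 0x prefix) is an assumption, not taken from a real log.
PANIC_RE = re.compile(
    r"Kernel panic - not syncing: xfs_fs_destroy_inode: "
    r"cannot reclaim\s+(?P<addr>(?:0x)?[0-9a-fA-F]+)"
)


def find_xfs_panics(lines):
    """Return a list of (line_number, address) for each matching panic line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = PANIC_RE.search(line)
        if match:
            hits.append((lineno, match.group("addr")))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "nfsd: last server has exited, flushing export cache",
        "Kernel panic - not syncing: xfs_fs_destroy_inode: "
        "cannot reclaim 0xffff880123456789",
    ]
    for lineno, addr in find_xfs_panics(sample):
        print(f"line {lineno}: cannot reclaim {addr}")
```

    In practice the list of lines would come from the file written by the serial console capture; pairing each hit with a timestamp would show whether the crash frequency is really increasing.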

    The server consists of a RAID 1 array of 2 x 500 GB disks, which contains the OS, and two RAID 6 arrays of 12 x 1 TB disks glued together with LVM, which contain our data. The output of xfs_info for the data partition is:

    Code:
    meta-data=/dev/mapper/main-userspace isize=256    agcount=16, agsize=268435455 blks
             =                       sectsz=512   attr=2
    data     =                       bsize=4096   blocks=4294967280, imaxpct=5
             =                       sunit=0      swidth=0 blks
    naming   =version 2              bsize=4096   ascii-ci=0
    log      =internal               bsize=4096   blocks=32768, version=2
             =                       sectsz=512   sunit=0 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0
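
    As a sanity check on these numbers, the filesystem size follows from the data section (blocks x bsize), and the allocation groups should account for every block (agcount x agsize). The sketch below uses only values from the output above; the helper names are hypothetical, and the RAID 6 estimate assumes decimal terabytes and no hot spares, neither of which is stated in the thread.

```python
def fs_size_bytes(bsize, blocks):
    """Size of the XFS data section in bytes: block size times block count."""
    return bsize * blocks


def raid6_usable_bytes(disks, disk_bytes):
    """Usable capacity of one RAID 6 array: two disks' worth goes to parity."""
    return (disks - 2) * disk_bytes


if __name__ == "__main__":
    # Values copied from the xfs_info output above.
    bsize, blocks = 4096, 4294967280
    agcount, agsize = 16, 268435455

    size = fs_size_bytes(bsize, blocks)
    print(f"filesystem size: {size} bytes ({size / 2**40:.4f} TiB)")
    print(f"allocation groups cover all blocks: {agcount * agsize == blocks}")

    # Assumption: 1 TB = 10**12 bytes, 12 disks per array, no hot spares.
    raw = 2 * raid6_usable_bytes(12, 10**12)
    print(f"estimated usable space across both arrays: {raw} bytes")
```

    The filesystem comes out at just under 16 TiB (2^44 bytes minus 64 KiB), which fits within the estimated usable space of the two arrays.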

    It would be fantastic if anybody could suggest anything we might try in order to alleviate this problem. Please let us know if you require any further information.

    Thanks.

    Best wishes,


    Alastair.

  2. #2
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    What kind of server is this: a web server like Apache/Tomcat, a commercial server such as a DBMS or application server like Oracle or SAP, an open source database server such as PostgreSQL or MySQL, or an in-house developed application server?
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Just Joined!
    Join Date
    Feb 2010
    Posts
    3
    It's a fileserver and the headnode to one of our small clusters. We run NFS and Samba. Hope that answers your question.


    Alastair.

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    OK. So the system itself is crashing, not just one of the servers running on it. From the message, it appears that there is a file system/driver problem with XFS. I assume that you recover with xfs_repair or something similar (after checking with xfs_check). Is there a possibility that you have a drive, controller, or array failure happening?

  5. #5
    Just Joined!
    Join Date
    Feb 2010
    Posts
    3
    Yes, we wondered whether it might be a RAID controller issue, because every few days the card sends warnings that it has fixed data/parity mismatches. We've sent detailed logs from the card to the manufacturer to see if they can diagnose any hardware issues.

    Thanks,


    Alastair.
