  1. #1
    Just Joined!
    Join Date
    Jul 2013
    Posts
    7

    Linux server crash (kernel NULL pointer dereference + soft lockup CPU)


    I have a Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.46-1 x86_64 GNU/Linux server that keeps crashing, usually once every 24-72 hours.

    I run lighttpd, MySQL, HAProxy and a couple of always-running Java processes, together with a bunch of shorter-lived Java processes.

    I have attached /var/log/syslog and /var/log/messages. Both contain the kernel NULL pointer dereference and soft lockup BUG lines.
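Before reading individual traces, it can help to pull every matching line out of the logs to see how often each kind of report occurs and when. The sketch below does that in Java; the class and method names are hypothetical, and the two marker substrings are simply the phrases named above (a `grep` for the same strings would do the job just as well).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KernelBugScan {
    // Assumed marker substrings, taken from the two kinds of report in the logs.
    private static final List<String> MARKERS =
        Arrays.asList("NULL pointer dereference", "soft lockup");

    // Returns every line that mentions one of the markers, in file order.
    static List<String> findKernelBugs(List<String> lines) {
        List<String> hits = new ArrayList<String>();
        for (String line : lines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // report each line once even if it matches both markers
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/var/log/syslog";
        // ISO-8859-1 accepts every byte, so stray bytes in syslog cannot abort the read.
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
        for (String hit : findKernelBugs(lines)) {
            System.out.println(hit);
        }
    }
}
```

The syslog timestamps on the printed lines can then be compared against the application logs around the same moments.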

    Does anyone have any idea how to debug this?

    Thanks

  2. #2
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,413
    How about error logging with your Java applications? If you log errors/exceptions with log4j you might have some other data that can help.
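A minimal sketch of that idea, using `java.util.logging` from the JDK instead of log4j so it needs no extra jar: install a JVM-wide default uncaught-exception handler early in `main()`, so that an exception which kills a worker thread is at least recorded with its stack trace. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class CrashLogging {
    private static final Logger LOG = Logger.getLogger(CrashLogging.class.getName());

    // Builds the one-line summary that precedes the stack trace in the log.
    static String describe(Thread thread, Throwable error) {
        return "Uncaught exception in thread \"" + thread.getName() + "\": " + error;
    }

    // Installs a JVM-wide handler so that an exception escaping any thread's
    // run() method is logged (with its stack trace) instead of vanishing.
    static void installHandler() {
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            @Override
            public void uncaughtException(Thread thread, Throwable error) {
                LOG.log(Level.SEVERE, describe(thread, error), error);
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        installHandler();
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                throw new IllegalStateException("simulated failure");
            }
        }, "worker-1");
        worker.start();
        worker.join(); // the handler runs on the dying thread before join() returns
    }
}
```

Anonymous classes are used rather than lambdas so the sketch also compiles on Java 7, which is what this server runs. Note that this only catches Java-level failures; a kernel oops like the one reported here would not show up in it.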

    Are you running a vanilla kernel with no custom-built drivers?
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!

  3. #3
    Just Joined!
    Join Date
    Jul 2013
    Posts
    7
    Good idea; I will enable some logging in the Java applications. And yes, I am running a vanilla kernel.

  4. #4
    Just Joined!
    Join Date
    Jul 2013
    Posts
    7
    The Java logging turned up nothing; however, this time I had a terminal open and got this just before it died:

    Code:
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.097507] Oops: 0000 [#1] SMP
    
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.098872] Stack:
    
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.099163] Call Trace:
    
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.099197]  <IRQ>
    
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.102262]  <EOI>
    
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.102267] Code: 04 01 00 00 7f a1 eb b0 58 5b 5d c3 48 81 fa 30 75 00 00 76 0c 48 c7 c0 d8 6d 2c 81 ba 30 75 00 00 83 fe 03 74 05 83 fe 01 75 21 <c8> 8b 05 12 e2 3c 00 40 88 b7 12 04 00 00 48 8d b7 50 03 00 00
    
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.102442] CR2: 000000004744765c
    
    Message from syslogd@gs at Jul 31 19:37:37 ...
     kernel:[60109.102838] Kernel panic - not syncing: Fatal exception in interrupt

  5. #5
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,413
    This looks very much like a kernel or driver bug: something is going wrong inside an interrupt handler. It is not likely a Java problem, but perhaps something that Java is interacting with. Have you tried upgrading, or even downgrading, the kernel you are using? If it still happens after that, it is probably caused by a loaded driver. Sorry, but without hands on the system, this is about as good as I can give you.

  6. #6
    Just Joined!
    Join Date
    Jul 2013
    Posts
    7
    If this keeps happening, I will indeed try different kernel versions.

    However, all these futex lines got me thinking that maybe all the Java synchronized methods were messing things up. So I removed almost all of them and replaced the critical ones with synchronized blocks instead. No crashes since (fingers crossed).
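For reference, the change described above looks roughly like the sketch below (hypothetical class names). A `synchronized` method holds the monitor of `this` for its entire body, while a `synchronized` block can guard only the shared state and can lock a private object that outside code cannot also lock. On Linux, contended JVM monitors are typically parked via futexes, which is why futex frames show up in the traces; neither form should be able to crash the kernel by itself, so the change most plausibly alters contention and timing rather than removing a root cause.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncStyles {
    // Before: the whole method holds the monitor of `this`, including work
    // that touches no shared state.
    static class MethodLocked {
        private final List<String> events = new ArrayList<String>();

        public synchronized void record(String raw) {
            String cleaned = raw.trim().toLowerCase(); // no shared state here
            events.add(cleaned);
        }

        public synchronized int size() {
            return events.size();
        }
    }

    // After: only the mutation of the shared list is guarded, and the lock is
    // a private object that outside code cannot synchronize on.
    static class BlockLocked {
        private final Object lock = new Object();
        private final List<String> events = new ArrayList<String>();

        public void record(String raw) {
            String cleaned = raw.trim().toLowerCase(); // done outside the lock
            synchronized (lock) {
                events.add(cleaned);
            }
        }

        public int size() {
            synchronized (lock) {
                return events.size();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final BlockLocked log = new BlockLocked();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < 1000; i++) {
                        log.record("  Event ");
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(log.size()); // 4000: every record() call was counted
    }
}
```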

    Also, earlier I got this warning in syslog (no crash):

    Code:
    Aug  1 21:44:04 gs kernel: [104867.728135] general protection fault: 0000 [#1] SMP
    Aug  1 21:44:04 gs kernel: [104867.728148] CPU 5
    Aug  1 21:44:04 gs kernel: [104867.728152] Modules linked in: iptable_filter iptable_mangle xt_mark ipt_REDIRECT xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ip_tables x_tables cpufreq_powersave cpufreq_userspace cpufreq_stats cpufreq_conservative ext3 jbd loop acpi_cpufreq mperf coretemp crc32c_intel mxm_wmi ghash_clmulni_intel snd_pcm aesni_intel snd_page_alloc aes_x86_64 snd_timer aes_generic snd cryptd soundcore i2c_i801 pcspkr evdev i2c_core parport_pc parport shpchp iTCO_wdt wmi iTCO_vendor_support video button processor ext4 crc16 jbd2 mbcache btrfs crc32c libcrc32c zlib_deflate dm_mod raid1 md_mod sg sd_mod crc_t10dif xhci_hcd ahci libahci r8169 mii thermal libata fan thermal_sys scsi_mod ehci_hcd usbcore usb_common [last unloaded: scsi_wait_scan]
    Aug  1 21:44:04 gs kernel: [104867.728503]
    Aug  1 21:44:04 gs kernel: [104867.728519] Pid: 32652, comm: java Not tainted 3.2.0-4-amd64 #1 Debian 3.2.46-1 MSI MS-7816/H87-G43 (MS-7816)
    Aug  1 21:44:04 gs kernel: [104867.728571] RIP: 0010:[<ffffffff8106dbd8>]  [<ffffffff8106dbd8>] drop_futex_key_refs+0x1/0x60
    Aug  1 21:44:04 gs kernel: [104867.728614] RSP: 0018:ffff8805df68dcb8  EFLAGS: 00010282
    Aug  1 21:44:04 gs kernel: [104867.728635] RAX: 0000000000000001 RBX: ffff8805df68dd68 RCX: ffff8801b4c3dd08
    Aug  1 21:44:04 gs kernel: [104867.728671] RDX: ffff8807fdf63d08 RSI: ffffffff817abb10 RDI: 7fff8805df68dd28
    Aug  1 21:44:04 gs kernel: [104867.728707] RBP: ffff8805df68df40 R08: ffff8805df68dcf8 R09: 0000000000000202
    Aug  1 21:44:04 gs kernel: [104867.728742] R10: 0000000000000202 R11: ffff88081c84a000 R12: 00000000ffffffff
    Aug  1 21:44:04 gs kernel: [104867.728778] R13: ffff8802b206cea0 R14: 7fff8805df68dd28 R15: 00000000016d2954
    Aug  1 21:44:04 gs kernel: [104867.728814] FS:  00007f5a674cd700(0000) GS:ffff880820b40000(0000) knlGS:0000000000000000
    Aug  1 21:44:04 gs kernel: [104867.728852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Aug  1 21:44:04 gs kernel: [104867.728874] CR2: 00007fe66c8b3500 CR3: 00000006412b5000 CR4: 00000000001406e0
    Aug  1 21:44:04 gs kernel: [104867.728910] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    Aug  1 21:44:04 gs kernel: [104867.728945] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Aug  1 21:44:04 gs kernel: [104867.728982] Process java (pid: 32652, threadinfo ffff8805df68c000, task ffff8802b206cea0)
    Aug  1 21:44:04 gs kernel: [104867.729019] Stack:
    Aug  1 21:44:04 gs kernel: [104867.730477]  ffff8805df68dd68 ffffffff8106de64 00000000000001ab ffffffff00000001
    Aug  1 21:44:04 gs kernel: [104867.730516]  00000000fb601ec0 0000000100000002 0000000000000000 0000000000000064
    Aug  1 21:44:04 gs kernel: [104867.730554]  ffff8805df68dcf8 ffff8805df68dcf8 ffff8805df68dd08 ffff8805df68dd08
    Aug  1 21:44:04 gs kernel: [104867.730593] Call Trace:
    Aug  1 21:44:04 gs kernel: [104867.730613]  [<ffffffff8106de64>] ? futex_wait+0x16a/0x236
    Aug  1 21:44:04 gs kernel: [104867.730636]  [<ffffffff81062116>] ? update_rmtp+0x62/0x62
    Aug  1 21:44:04 gs kernel: [104867.730659]  [<ffffffff8106d28c>] ? futex_wait_queue_me+0x90/0xd5
    Aug  1 21:44:04 gs kernel: [104867.730683]  [<ffffffff8106f10c>] ? do_futex+0xb5/0x810
    Aug  1 21:44:04 gs kernel: [104867.730706]  [<ffffffff810fb5eb>] ? fget_light+0x67/0x7b
    Aug  1 21:44:04 gs kernel: [104867.730730]  [<ffffffff812800d8>] ? sys_sendto+0x108/0x137
    Aug  1 21:44:04 gs kernel: [104867.730754]  [<ffffffff8106f987>] ? sys_futex+0x120/0x151
    Aug  1 21:44:04 gs kernel: [104867.730776]  [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b
    Aug  1 21:44:04 gs kernel: [104867.730799] Code: c7 fa f3 4c 81 e8 d2 8f fd ff 48 89 6b 30 48 c7 43 58 00 00 00 00 be 03 00 00 00 48 8b 7b 28 48 83 c4 18 5b 5d e9 f7 1a fd ff 53 <48> 8b 5f 08 48 85 db 75 23 80 3d e8 f2 73 00 01 74 4b be cb 00
    Aug  1 21:44:04 gs kernel: [104867.730902] RIP  [<ffffffff8106dbd8>] drop_futex_key_refs+0x1/0x60
    Aug  1 21:44:04 gs kernel: [104867.730927]  RSP <ffff8805df68dcb8>
    Aug  1 21:44:04 gs kernel: [104867.731281] ---[ end trace d8f7cbe87d46e581 ]---
    The sys_sendto line is interesting, as the entire Java application calls sendto in only one place, and that method was still synchronized.
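One detail worth checking before reading too much into sys_sendto: on x86, a `?` printed in front of a call-trace frame means the unwinder found that return address on the stack but could not confirm it is part of the current call chain, so it may be a stale leftover from an earlier call. Every frame in the trace above carries that mark. The sketch below (hypothetical names, assuming the `[<address>] ? symbol+0xoffset/0xsize` layout shown in this log) splits frames into their parts so traces from several crashes can be compared side by side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CallTraceParser {
    // One frame of an x86 kernel "Call Trace:" section.
    static final class Frame {
        final String symbol;
        final long offset;      // bytes into the function
        final long size;        // total size of the function in bytes
        final boolean reliable; // false when the kernel printed "?" before the symbol

        Frame(String symbol, long offset, long size, boolean reliable) {
            this.symbol = symbol;
            this.offset = offset;
            this.size = size;
            this.reliable = reliable;
        }

        @Override
        public String toString() {
            return (reliable ? "" : "? ") + symbol
                + "+0x" + Long.toHexString(offset) + "/0x" + Long.toHexString(size);
        }
    }

    // Matches e.g. "[<ffffffff8106de64>] ? futex_wait+0x16a/0x236".
    private static final Pattern FRAME = Pattern.compile(
        "\\[<[0-9a-f]+>\\]\\s+(\\?\\s+)?([A-Za-z0-9_.]+)\\+0x([0-9a-f]+)/0x([0-9a-f]+)");

    // Extracts frames from the lines that follow "Call Trace:"; other lines are skipped.
    static List<Frame> parse(List<String> lines) {
        List<Frame> frames = new ArrayList<Frame>();
        for (String line : lines) {
            Matcher m = FRAME.matcher(line);
            if (m.find()) {
                frames.add(new Frame(m.group(2),
                    Long.parseLong(m.group(3), 16),
                    Long.parseLong(m.group(4), 16),
                    m.group(1) == null));
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
            "[104867.730613]  [<ffffffff8106de64>] ? futex_wait+0x16a/0x236",
            "[104867.730730]  [<ffffffff812800d8>] ? sys_sendto+0x108/0x137",
            "[104867.730776]  [<ffffffff81353b52>] ? system_call_fastpath+0x16/0x1b");
        // prints "? futex_wait+0x16a/0x236", then the other two frames
        for (Frame f : parse(trace)) {
            System.out.println(f);
        }
    }
}
```

Feed it only the lines after `Call Trace:`; the RIP lines have the same bracketed shape and would also be picked up.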

  7. #7
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,413
    Which version of Java are you using, and is it up to date?

  8. #8
    Just Joined!
    Join Date
    Jul 2013
    Posts
    7
    Code:
    java -version
    java version "1.7.0_25"
    OpenJDK Runtime Environment (IcedTea 2.3.10) (7u25-2.3.10-1~deb7u1)
    OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

  9. #9
    Linux Guru Rubberman
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, in Chicago, or in a galaxy far, far away.
    Posts
    11,413
    OK. You are running OpenJDK (IcedTea). I would suggest trying a pure Oracle Java stack to see what happens.

  10. #10
    Just Joined!
    Join Date
    Jul 2013
    Posts
    7
    Ever since I minimized the number of synchronized methods, I have had no crashes, only a few warnings.

    However, I decided to try the Oracle stack anyway to see if I get any more warnings.
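When switching between OpenJDK and Oracle's JVM, it helps if each application logs which runtime and kernel it is running on, so later warnings can be matched to the right setup. A minimal sketch using only standard system properties (the class name is hypothetical); on Linux, `os.version` should report the running kernel release, such as 3.2.0-4-amd64 here.

```java
public class RuntimeInfo {
    // Builds a one-line summary from standard JVM system properties.
    static String describe() {
        return System.getProperty("java.vm.name")
            + " " + System.getProperty("java.version")
            + " (vendor: " + System.getProperty("java.vendor")
            + ", VM build: " + System.getProperty("java.vm.version")
            + ") on " + System.getProperty("os.name")
            + " " + System.getProperty("os.version")
            + " " + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        // Print once at startup so every log shows which JVM and kernel produced it.
        System.out.println(describe());
    }
}
```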

    EDIT:

    Okay, so I just got this in the terminal:

    Code:
    Message from syslogd@gs at Aug  5 05:12:09 ...
     kernel:[379727.623646] Uhhuh. NMI received for unknown reason 21 on CPU 0.
    
    Message from syslogd@gs at Aug  5 05:12:09 ...
     kernel:[379727.623671] Do you have a strange power saving mode enabled?
    
    Message from syslogd@gs at Aug  5 05:12:09 ...
     kernel:[379727.623693] Dazed and confused, but trying to continue
    That doesn't look good, but it didn't crash.
    Last edited by Sheepa; 08-05-2013 at 03:16 AM.
