  1. #1
    Just Joined!
    Join Date
    Aug 2009
    Posts
    6

    Nagios frozen, not updating service status.

    Hi everybody,

    I have a very strange problem on my nagios server.

    I run nagios 3.2.0, compiled from source on a RHEL 5.5 plain server.
    It had been working fine since it was first installed about a year ago.

    However, about 12 hours ago it stopped working. The admin interface loads fine and I can browse every option in the menu, but the web interface is not updating the status of any of my service checks on any of my 60 servers.

    In fact, the checks are not being executed at all: I see no Nagios activity in the logs, and it does not check any local or remote service. These are simple checks, such as a TCP response on port 80 or mail on port 25. NRPE shows no activity for the internal load average or disk space checks either.

    When I run the NRPE or TCP checks by hand from the shell, they work fine and report good results:

    Code:
    # /home/nagios/libexec/check_tcp -H REMOTE.SRV.IP -p 80
    TCP OK - 0.001 second response time on port 80|time=0.000535s;;;0.000000;10.000000
    Code:
    # /home/nagios/libexec/check_nrpe -H REMOTE.SRV.IP -c check_load
    OK - load average: 2.35, 6.33, 4.46|load1=2.350;15.000;30.000;0; load5=6.330;10.000;25.000;0; load15=4.460;5.000;20.000;0;
    #
    However, these results never show up in the Nagios web interface. It is as if Nagios were frozen.
    This is the last thing Nagios wrote to the log before it froze:
    Code:
    [1289196000] CURRENT SERVICE STATE: server223_01;Particion /mnt/disk2;OK;HARD;1;DISK OK - free space: /mnt/disk2 193225 MB (88% inode=99%):
    [1289196279] Auto-save of retention data completed successfully.
    [1289199879] Auto-save of retention data completed successfully.
    [1289203479] Auto-save of retention data completed successfully.
    [1289207079] Auto-save of retention data completed successfully.
    [1289210679] Auto-save of retention data completed successfully.
    [1289214279] Auto-save of retention data completed successfully.
    [1289217879] Auto-save of retention data completed successfully.
    [1289221479] Auto-save of retention data completed successfully.
    [1289221859] Caught SIGTERM, shutting down...
    [1289221859] Successfully shutdown... (PID=18009)
    [1289221860] Nagios 3.2.0 starting... (PID=29329)
    [1289221860] Local time is Mon Nov 08 07:11:00 CST 2010
    [1289221860] LOG VERSION: 2.0
    [1289221860] Finished daemonizing... (New PID=29330)
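    The bracketed numbers in the log are Unix epoch seconds, which makes it hard to see at a glance when the last real activity happened. Below is a minimal sketch, not part of Nagios itself, that converts them to readable UTC times and shows the gap between consecutive entries. The helper names parse_line and annotate are made up for this illustration, and the sample lines are copied from the log above.

```python
import re
from datetime import datetime, timezone

# A Nagios log line looks like: [epoch_seconds] message
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_line(line):
    """Return (epoch_seconds, message), or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def annotate(lines):
    """Return (utc_time_string, seconds_since_previous_entry, message) tuples."""
    rows = []
    previous = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a timestamped entry
        ts, message = parsed
        gap = 0 if previous is None else ts - previous
        stamp = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        rows.append((stamp, gap, message))
        previous = ts
    return rows


if __name__ == "__main__":
    sample = [
        "[1289217879] Auto-save of retention data completed successfully.",
        "[1289221479] Auto-save of retention data completed successfully.",
        "[1289221859] Caught SIGTERM, shutting down...",
        "[1289221860] Nagios 3.2.0 starting... (PID=29329)",
    ]
    for stamp, gap, message in annotate(sample):
        print(f"{stamp} UTC  (+{gap}s)  {message}")
```

    To run it against a real log, read the lines from the file and pass them to annotate(). A long run of entries that are exactly 3600 seconds apart with nothing in between means only the hourly retention auto-save was being logged during that time.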
    Now the only thing I get in the log when I restart the service is this:
    Code:
    [1289238578] Caught SIGTERM, shutting down...
    [1289238578] Successfully shutdown... (PID=13234)
    [1289238579] Nagios 3.2.0 starting... (PID=13293)
    [1289238579] Local time is Mon Nov 08 11:49:39 CST 2010
    [1289238579] LOG VERSION: 2.0
    [1289238579] Finished daemonizing... (New PID=13294)
    Code:
    # pidof nagios
    13294
    Any ideas are appreciated!

    Thanks!

  2. #2
    Just Joined!
    Join Date
    Aug 2009
    Posts
    6
    By the way, the configuration check of the nagios binary (the -v option) also reports everything OK:

    Code:
    [root@server.myserver.com:~]/home/nagios/bin/nagios -v /home/nagios/etc/nagios.cfg
    
    Nagios Core 3.2.0
    Copyright (c) 2009 Nagios Core Development Team and Community Contributors
    Copyright (c) 1999-2009 Ethan Galstad
    Last Modified: 08-12-2009
    License: GPL
    
    Website: Nagios - The Industry Standard in IT Infrastructure Monitoring
    Reading configuration data...
       Read main config file okay...
    Processing object config file '/home/nagios/etc/objects/commands.cfg'...
    Processing object config file '/home/nagios/etc/objects/contacts.cfg'...
    Processing object config file '/home/nagios/etc/objects/timeperiods.cfg'...
    Processing object config file '/home/nagios/etc/objects/templates.cfg'...
    Processing object config file '/home/nagios/etc/objects/localhost.cfg'...
       Read object config files okay...
    
    Running pre-flight check on configuration data...
    
    Checking services...
        Checked 496 services.
    Checking hosts...
        Checked 69 hosts.
    Checking host groups...
        Checked 1 host groups.
    Checking service groups...
        Checked 0 service groups.
    Checking contacts...
        Checked 1 contacts.
    Checking contact groups...
        Checked 1 contact groups.
    Checking service escalations...
        Checked 0 service escalations.
    Checking service dependencies...
        Checked 0 service dependencies.
    Checking host escalations...
        Checked 0 host escalations.
    Checking host dependencies...
        Checked 0 host dependencies.
    Checking commands...
        Checked 32 commands.
    Checking time periods...
        Checked 5 time periods.
    Checking for circular paths between hosts...
    Checking for circular host and service dependencies...
    Checking global event handlers...
    Checking obsessive compulsive processor commands...
    Checking misc settings...
    
    Total Warnings: 0
    Total Errors:   0
    
    Things look okay - No serious problems were detected during the pre-flight check
