Old 05-12-2008   #1 (permalink)
Kumado
Just Joined!
Join Date: Jul 2006
Posts: 51
server shutting down ????

I have an internal server for dhcp, internal dns, ftp installs, ntp, samba (as PDC and for file storage) and apache for in-house info pages.

Suse 10 - Asus / AMD
Linux version 2.6.13-15.12-default (geeko@buildhost) (gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)) #1 Thu Aug 24 11:23:58 UTC 2006

It locks up at random times. It is powered on but unresponsive at the tty or via ssh, etc.

Usually it is over the weekend: I come in on Monday morning and it is locked up. Maybe it gets lonely.

So far I have:

blown out the system
reseated the cpu and mem sticks
applied new heat sink grease ( bios did not show it running very hot right after reboot )
run a memory test - walking bits +++
replaced the power cord ( I thought it seemed odd, smaller than most, and it did flicker when I just touched it )

In the warn logs I have this entry several times prior to last entry:

Normal free:6772kB min:3756kB low:4692kB high:5632kB active:543436kB inactive:262292kB present:901120kB pages_scanned:17031822 all_unreclaimable? yes


I do not quite follow it.

There is nothing in the faillog.

Any ideas where I should start looking?

After I reboot, it is up and running with no problems and can run 7 to 16 days with no errors.

There are approximately 150 users in the network with 20 ip based cameras.

I only have the one internal dns server.

Thanks

kumado
__________________
E|!
Old 05-14-2008   #2 (permalink)
wildpossum
Just Joined!
Join Date: Apr 2008
Location: Sydney/Australia
Posts: 74
At first pass - I think you may have a memory leak somewhere. That is what the log entry is talking about.
Free memory in the Normal zone has gone down to under 7MB out of roughly 880MB, and the kernel flags the zone as all_unreclaimable, which points to a lot of memory (on the heap) not having been released or freed by the acquiring processes.
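To make the numbers in that warn log entry easier to read, here is a minimal sketch that splits the line quoted in the first post into its fields. It assumes a current Python 3 on whatever machine you read the logs from; the function name parse_zone_line is made up for illustration, and the script only reformats what the log already says.

```python
import re

# Log line copied from the first post in this thread.
LINE = ("Normal free:6772kB min:3756kB low:4692kB high:5632kB "
        "active:543436kB inactive:262292kB present:901120kB "
        "pages_scanned:17031822 all_unreclaimable? yes")

def parse_zone_line(line):
    """Return (zone name, {field: value in kB}, all_unreclaimable flag)."""
    zone = line.split()[0]
    # Only fields with a kB suffix are collected; pages_scanned is a page count.
    fields = {k: int(v) for k, v in re.findall(r"(\w+):(\d+)kB", line)}
    unreclaimable = bool(re.search(r"all_unreclaimable\?\s+yes", line))
    return zone, fields, unreclaimable

zone, kb, stuck = parse_zone_line(LINE)
print(f"zone {zone}: free {kb['free']} kB of {kb['present']} kB present")
print(f"watermarks: min {kb['min']} kB, low {kb['low']} kB, high {kb['high']} kB")
print(f"free share of zone: {100 * kb['free'] / kb['present']:.2f}%")
print(f"all_unreclaimable: {stuck}")
```

Running the same function over every such line in the warn log would show whether the free figure shrinks steadily before each lockup.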

Have you enabled core dumps with the bash ulimit command? Alternatively, if the system is somewhat running but not really responding, have you tried the "SysRq key" functions to capture data on what the issue(s) may be? See /usr/src/linux/Documentation/sysrq.txt for those details. It is very possible that this will respond even though you don't see any visible response on the screen - you can usually use the keyboard if the LED changes when you toggle caps lock.
If toggling caps lock doesn't toggle the LED, you've got a complete lockup, which may be simply because the system sees no memory left to use.
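The core file limit that bash's ulimit controls can also be inspected and raised from a script. The following is only a sketch, assuming Python 3 on Linux; enable_core_dumps is a made-up helper name, and the change applies only to this process and anything it starts, not to services that are already running.

```python
import resource

def enable_core_dumps():
    """Raise the soft core-file limit to the hard limit; return the new pair."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_CORE)
    after = enable_core_dumps()
    print("core limit (soft, hard) before:", before)
    print("core limit (soft, hard) after: ", after)
    if after[0] == 0:
        print("hard limit is 0, so core dumps stay disabled for this process")
```

A value of -1 in the output means unlimited.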

A very quick fix, not one you want to leave in place, but one that may help you catch the offending application, is to increase the swap space (x10 say). Generally, just add another swap partition or, if that is not doable, a swap file. Better still, add another drive and use it for additional swap space. Modify /etc/init.d/boot.local to add this drive to the available swap space by adding the command "swapon /dev/diskdrivename".
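Before and after adding swap it helps to see how much is actually in use. A minimal sketch, assuming Python 3 and the usual /proc/meminfo layout; the helper names and the SAMPLE figures are invented for illustration and are not taken from this server.

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def swap_used_percent(info):
    """Percentage of swap in use; 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total

# Made-up sample in /proc/meminfo format, used only when /proc is unavailable.
SAMPLE = """MemTotal:        1035000 kB
MemFree:            6772 kB
SwapTotal:       1048568 kB
SwapFree:         262142 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample
    print(f"swap used: {swap_used_percent(parse_meminfo(text)):.1f}%")
```

If swap use keeps climbing between reboots, that supports the leak theory; if it stays flat, look elsewhere.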

ALSO:
Have you updated the system with all the security and application fixes on a regular basis? If not, why not?
I especially think you should update to the latest SuSE 10.0 kernel. Also, you may not be aware that the OpenSUSE 10.0 release has many problems that were only finally fixed in OpenSUSE 10.3, IMHO.

Are you able to see which processes dominate what's running over the weekend with no one around? i.e.: have a console running "top" with the applicable columns like "WaitChan & Flags" showing, as this is a great tool for determining which process is doing what, or which processes are waiting on an event that maybe never comes around.
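Since nobody is watching the console over the weekend, a small logger can stand in for "top". This is a sketch only, assuming Python 3 on Linux and the standard /proc/<pid>/status fields Name and VmRSS; the function names are made up for illustration.

```python
import os

def rss_kb(status_text):
    """Return (name, VmRSS in kB) from the text of /proc/<pid>/status."""
    name, rss = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss = int(line.split()[1])
    return name, rss  # kernel threads have no VmRSS line and report 0

def snapshot(proc="/proc"):
    """List (rss_kb, pid, name) for every readable process, largest first."""
    rows = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                name, rss = rss_kb(fh.read())
        except OSError:
            continue  # process exited or is not readable
        rows.append((rss, int(entry), name))
    return sorted(rows, reverse=True)

if __name__ == "__main__":
    for rss, pid, name in snapshot()[:5]:
        print(f"{rss:>10} kB  pid {pid:<6} {name}")
```

Appending the output with a timestamp to a file every few minutes leaves a record on disk that survives the lockup, so Monday morning you can see who was growing.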

I assume the system HAS a UPS which has been serviced regularly and the batteries are exchanged every two years (min)?

Is your system secure? If so, how do you come to that conclusion?
What have you done, or what has been done, to ensure system security? Even if the system is only used in-house, does YaST Security have the internal net under protection as well? If not, why not? Any user can, and usually does, introduce crazy stuff via web sites, floppies, USB sticks etc., so be aware of this.

Have you got smartctl running on your drives? Maybe the disk drive is developing errors as it gets older, which only show over time - it is certainly worth a look with this tool.

It may be that the camera image collector or its controlling program is leaking, as it's probably the only one running 24/7 (i.e.: the application is not releasing used memory or heap space over time). If you're running ZoneMinder, that is one program I have seen develop such leakage, as I have seven cameras watching possums (at night) and birdlife during the day. You could run a "ddd" session on this application, or run it via strace to capture a hanging system call.
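One rough way to tell a leak from normal load is to check whether a process's memory only ever goes up. A minimal sketch, assuming Python 3 and a list of periodic RSS readings for one process; the helper names and the readings below are hypothetical, not measurements from this server or from ZoneMinder.

```python
def steady_growth(samples, min_increase_kb=0):
    """True if every sample exceeds the previous one by more than min_increase_kb."""
    return len(samples) >= 2 and all(
        later - earlier > min_increase_kb
        for earlier, later in zip(samples, samples[1:])
    )

def growth_rate_kb_per_hour(samples, interval_minutes):
    """Average RSS growth per hour between the first and last sample."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    hours = (len(samples) - 1) * interval_minutes / 60
    return (samples[-1] - samples[0]) / hours

# Hypothetical RSS readings in kB, taken every 15 minutes.
readings = [41200, 41950, 42710, 43400, 44180]

if __name__ == "__main__":
    print("steady growth:", steady_growth(readings))
    print("rate:", growth_rate_kb_per_hour(readings, 15), "kB/hour")
```

A process that grows without ever dropping back over a quiet weekend is the first one to put under strace.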

Otherwise, you will have to compose a list of applicable issues, starting with the hardware (things like voltages, temperatures, disk activity logs, printing the console output directly to another PC so as to log all activity over the w/end), then the software processes as somewhat outlined above. Investigate each and every one, one step at a time.

Hope this assists you.
__________________
Grahame
AMD Phenom(QuadCore), 8GB, 3ware RAID6 1.8TB, HD3850(512MB) ..etc.
Old 05-14-2008   #3 (permalink)
Kumado
Just Joined!
Join Date: Jul 2006
Posts: 51
Thanks for the reply, it does help a lot.

I have not been able to keep up with the servers as I would like. I am a teacher and this is a side thing that has grown greatly. Enough excuses though.

I would like to pull it off line and bring it up to 10.3, redo the services on it. I may get the chance yet or it may force me to it.

It was a wonder to me because this box has been in service for so long with few changes, and uptimes of 150+ days. This seems 'out of the blue' of sorts.

It is on a 1 year old UPS. I do take it down and blow it out, check fans, filters and such, do a mem test. It is in a room with an AC unit to keep the room cool.

I had been trying to determine if this was a hardware issue or software. The one log entry was the only weird event, besides locking up, I could find so far.
You have mentioned tests I never knew so I will begin there.
Thanks again for the help.

kumado
__________________
E|!
Old 05-14-2008   #4 (permalink)
wildpossum
Just Joined!
Join Date: Apr 2008
Location: Sydney/Australia
Posts: 74
Hi Kumado.

You probably do not need to upgrade the complete distro if it is doing the job you're happy with. Linux is not like Windoze, where you have to keep it all up to date or applications just don't work anymore.

Unless there is a need to upgrade you needn't.

Just connect a good fast internet line to the ethernet port, run YaST - Online Update and everything will be done (OS-wise) automagically for you. You will have to look at each application's home web site to determine whether it needs upgrading. YaST can probably help you out on this too. I run 10.3 and haven't touched 10.0 since it was released, so sadly I cannot add much to assist on the application side.

*** By the Way ****
I had one client @ Sydney Uni - Chem Lab that had a 12 year old server still running on a really old Slackware distro (kernel was 1.2.8, I recall). But because it was only an internal machine with a secure router/firewall in front of it, it just served the students for all those years without any problems (until they turned off the power for only the second time in ten years, whereupon the PSU blew on the next power-up, the fans stopped working and the disk drive crashed). Linux-wise, it still worked wonderfully. It even took them two days to find it, as it was working away under the stairs in a small, not-too-easy-to-get-to cupboard.

If you have further questions just ask.
Cheers.
__________________
Grahame
AMD Phenom(QuadCore), 8GB, 3ware RAID6 1.8TB, HD3850(512MB) ..etc.
Old 05-27-2008   #5 (permalink)
Kumado
Just Joined!
Join Date: Jul 2006
Posts: 51
School is out, I have more time than I did and I see a slip in my server logs. I had added a cron job for my NTP server on this machine.

*/15 * * * * root ntpd -s ntpdate -s time.nist.gov

That change is recent enough to fall in the same period as the shutdowns.

I am doing the patches on the machine also, I have several to get.

I am going to remove the cron job for a time ( no pun ) to see if it has anything to do with it.

I have read several articles. Do you have a good idea how often you 'need' to update the time server? Most seemed to think it had to be often to prevent drift. I may have done myself in here.
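For a feel of how much drift is at stake, the arithmetic is simple: error = drift rate x time between corrections. Here is a sketch assuming Python 3 and an invented drift rate of 50 ppm; the real figure for this board is unknown and would have to be measured. At that assumed rate, 15 minutes without correction comes to about 45 ms.

```python
def drift_seconds(ppm, interval_seconds):
    """Clock error accumulated after running uncorrected for interval_seconds."""
    return ppm * 1e-6 * interval_seconds

ASSUMED_PPM = 50  # hypothetical drift rate, not measured on this server

for label, seconds in [("15 minutes", 900), ("1 hour", 3600), ("1 day", 86400)]:
    ms = drift_seconds(ASSUMED_PPM, seconds) * 1000
    print(f"{label:>10}: {ms:.0f} ms at an assumed {ASSUMED_PPM} ppm")
```

Note that ntpd, when it runs as a daemon, disciplines the clock continuously, so a periodic one-shot sync from cron is usually only needed when no daemon is running.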

Thanks

Kumado
__________________
E|!