- 03-31-2009 #1 Just Joined! (Join Date: May 2005; Posts: 17)
Runaway open files on Oracle RAC, help please
I have a two node Oracle 10gR2 RAC environment that runs on RHEL AS 4 u7, kernel 2.6.9-67.0.15.ELsmp.
We've had two things done to the environment:
1. Security scripts required by DoD
2. Oracle patch 7117233 (mostly a CRS patch, but you have to apply it to both the cluster registry and the RDBMS).
Last week this environment crashed because it ran out of open files; the limit is set to 65536 (both soft and hard) for the 'oracle' user. The database has ~8,700 datafiles at this point, and our Production environment, with the same OS kernel and the same version of Oracle, usually hovers around 11k open files.
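Two different counters are in play here: the 65536 "nofile" limit is enforced per process (RLIMIT_NOFILE), while the kernel also tracks system-wide file handles in /proc/sys/fs/file-nr. A minimal sketch for checking both for one process, assuming Linux, Python 3.9+, and permission to read the target's /proc entries (same user or root). The function names are hypothetical. Note that /proc/<pid>/limits only exists on kernels 2.6.24 and later, so on the 2.6.9 kernel described here the per-process limit would have to come from `ulimit -n` in the oracle user's shell instead.

```python
import os


def parse_max_open_files(limits_text: str) -> tuple[str, str]:
    """Return (soft, hard) from /proc/<pid>/limits text; either may be 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line[len("Max open files"):].split()
            return fields[0], fields[1]
    raise ValueError("no 'Max open files' line found")


def parse_file_nr(text: str) -> tuple[int, int, int]:
    """Parse /proc/sys/fs/file-nr: (allocated handles, unused handles, system-wide max)."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return allocated, unused, maximum


def report(pid: str = "self") -> str:
    """Summarize one process's descriptor usage against its limits (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = parse_max_open_files(f.read())
    used = len(os.listdir(f"/proc/{pid}/fd"))
    with open("/proc/sys/fs/file-nr") as f:
        allocated, _unused, maximum = parse_file_nr(f.read())
    return (f"pid {pid}: {used} fds open (nofile soft={soft}, hard={hard}); "
            f"system-wide: {allocated} of {maximum} file handles allocated")
```

For example, `print(report("6612"))` would summarize a single Oracle process.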
If I shut down the two databases, open files drop to about 1,000, but the instant I bring the databases up, the count jumps to over 30k and keeps growing over a 4-5 day period until it reaches the open-file limit and crashes the databases and servers (unless I bounce the database first, after which it starts climbing again).
Oracle Support told me it was the version of OCFS2, but upgrading OCFS2 to the most recent version for our kernel resolved nothing. Note: this is an OCFS2 cluster file system on an EMC Symmetrix SAN. This environment had been stable for 2+ years after moving from OCFS to OCFS2. Support seems to think the problem has to do with mmap'd files, but the OCFS2 patch didn't fix it.
I suspect the security scripts that were run against the servers, because I already had to change the umask from 077 back to 022 after weeks of troubleshooting permission issues. If I shut down the database, the open files are closed, but once the database starts, it seems like a massive number of duplicate files are created. It looks as if files are opened, can't be read, and are opened again when needed, so the open-file count keeps growing until it hits the ulimit.
Does anyone have any ideas on this? A strategy to find the culprit? Any help is greatly appreciated.
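One strategy for finding the culprit is to rank processes by how many descriptors each one holds, reading /proc directly rather than watching a single total. A minimal sketch, assuming Linux /proc and a Python 3.9+ interpreter on the host; the function names are hypothetical. Run it as root (or as oracle) so the relevant /proc/<pid>/fd directories are readable.

```python
import os
from collections import Counter


def process_name(proc_root: str, pid: str) -> str:
    """Read the 'Name:' field from /proc/<pid>/status."""
    with open(os.path.join(proc_root, pid, "status")) as f:
        for line in f:
            if line.startswith("Name:"):
                return line.split(":", 1)[1].strip()
    return "?"


def fd_counts(proc_root: str = "/proc") -> Counter:
    """Map 'pid (name)' to the number of entries in /proc/<pid>/fd.

    Processes that exit mid-scan, or whose fd directory cannot be read, are skipped.
    """
    counts: Counter = Counter()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            name = process_name(proc_root, pid)
            counts[f"{pid} ({name})"] = len(os.listdir(os.path.join(proc_root, pid, "fd")))
        except OSError:
            continue
    return counts
```

Usage: `for proc, n in fd_counts().most_common(15): print(f"{n:8d}  {proc}")` lists the heaviest holders first. Taking the same snapshot before and after starting the database would show which processes account for the jump.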
- 04-20-2009 #2 Just Joined! (Join Date: May 2005; Posts: 17)
OK, let me refine the question: how can I troubleshoot mmap file errors/duplicates? Once Oracle starts, at least three times as many files are open as there should be.
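Memory-mapped files are worth separating from descriptors. A mapping appears in /proc/<pid>/maps and keeps its file referenced even after the descriptor used to create it is closed, but it does not occupy a descriptor slot, so it does not count against the nofile limit. lsof lists such mappings with `mem` in the FD column, so a raw lsof line count mixes the two. A minimal sketch for tallying mapping regions per file, assuming Linux and Python 3.9+ (function names hypothetical). A normal shared library shows several regions (code, read-only data, writable data), so look for outliers rather than any count above one. If the SGA uses System V shared memory, it shows up as a `/SYSV... (deleted)` entry, which is expected.

```python
from collections import Counter


def mapped_files(maps_text: str) -> Counter:
    """Count mapping regions per backing file in the text of /proc/<pid>/maps.

    Line format: address perms offset dev inode [pathname]
    """
    regions: Counter = Counter()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        # Anonymous mappings have no pathname; [heap], [stack], [vdso] are not files.
        if len(fields) == 6 and fields[5].startswith("/"):
            regions[fields[5].strip()] += 1
    return regions


def read_maps(pid: str) -> Counter:
    """Linux only: summarize the file-backed mappings of one process."""
    with open(f"/proc/{pid}/maps") as f:
        return mapped_files(f.read())
```

Usage: `for path, n in read_maps("6612").most_common(10): print(n, path)`.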
- 04-20-2009 #3 Linux Guru (Join Date: Apr 2009; Location: I can be found either 40 miles west of Chicago, or in a galaxy far, far away; Posts: 8,974)
This really sounds like an Oracle problem, possibly exacerbated by something in your environment, as you surmised about the security scripts. Have you determined what these files are (temporary files, duplicate opens of the same file, etc.) and which component(s) of Oracle are opening them? That information might be useful to Oracle tech support.
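To answer "what are these files" for a single process: each entry in /proc/<pid>/fd is a symlink to whatever that descriptor refers to. A minimal sketch, assuming Linux and Python 3.9+, that groups one process's descriptors into duplicate opens of the same path, files deleted while still open, and sockets/pipes. The categories and function names are illustrative, not an Oracle diagnostic.

```python
import os
from collections import Counter


def fd_targets(pid: str) -> list[str]:
    """Resolve each /proc/<pid>/fd/N symlink to what it points at (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:  # descriptor was closed between listdir() and readlink()
            continue
    return targets


def classify(targets: list[str]) -> dict:
    """Group descriptor targets into a few coarse categories."""
    counts = Counter(targets)
    return {
        # the same path held open through more than one descriptor
        "duplicates": {t: n for t, n in counts.items() if n > 1},
        # files unlinked while still open (the kernel appends " (deleted)")
        "deleted": sorted(t for t in counts if t.endswith(" (deleted)")),
        # non-file descriptors such as "socket:[12345]" or "pipe:[678]"
        "sockets_pipes": sum(n for t, n in counts.items()
                             if t.startswith(("socket:", "pipe:"))),
    }
```

Usage: `classify(fd_targets("6612"))` for one suspect process.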
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 04-21-2009 #4 Just Joined! (Join Date: May 2005; Posts: 17)
Oracle has been focusing on hc_<instance name>.dat issues since I gave that as an example; the file is created by racgimon. hc_<xxx>.dat is simply a health-check file placed in $ORACLE_HOME/dbs/ and written to every few seconds. racgimon is unable to read the file and keeps creating new ones, BUT that doesn't come close to accounting for the number of open files I have. So far Oracle has had me update OCFS2 to the latest version for our kernel, apply a 10.2.0.3 patch, upgrade to 10.2.0.4, apply a 10.2.0.4 patch, roll back that 10.2.0.4 patch, and apply another 10.2.0.4 patch. That's why I'm here, looking for other possibilities.
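To measure how much of the total the health-check churn explains, one could count, per process, the descriptors that point at health-check files. A sketch assuming Linux and Python 3.9+; the `hc_*.dat` pattern is an assumption based on the file naming in this thread, and the function names are hypothetical. If racgimon replaces the file by deleting and recreating it while still holding descriptors on older copies, those descriptors would carry the kernel's " (deleted)" suffix, which is why the sketch strips it before matching.

```python
import fnmatch
import os
from collections import Counter

PATTERN = "hc_*.dat"  # assumption: matches the hc_<instance name>.dat naming


def count_matching(targets_by_pid: dict[str, list[str]], pattern: str = PATTERN) -> Counter:
    """Count, per pid, the descriptors whose file name matches `pattern`."""
    counts: Counter = Counter()
    for pid, targets in targets_by_pid.items():
        for target in targets:
            # The kernel appends " (deleted)" to targets that were unlinked while open.
            path = target.removesuffix(" (deleted)")
            if fnmatch.fnmatch(os.path.basename(path), pattern):
                counts[pid] += 1
    return counts


def scan(proc_root: str = "/proc") -> dict[str, list[str]]:
    """Collect descriptor targets for every readable process (Linux only)."""
    result: dict[str, list[str]] = {}
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            names = os.listdir(fd_dir)
        except OSError:  # process exited, or we lack permission
            continue
        targets = []
        for fd in names:
            try:
                targets.append(os.readlink(os.path.join(fd_dir, fd)))
            except OSError:
                continue
        result[pid] = targets
    return result
```

Usage: `print(count_matching(scan()).most_common(10))`. Comparing the sum of these counts with the total from the whole scan shows how much of the growth is left unexplained.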
- 04-21-2009 #5 Linux Guru (Join Date: Apr 2009; Location: I can be found either 40 miles west of Chicago, or in a galaxy far, far away; Posts: 8,974)
If you can identify the process that is holding those files open, that's the one with the bug, and it IS a bug in an application process, not the OS. When a process terminates, all open file descriptors it holds are closed. This might be an issue if the process isn't cleaned up but has simply become a zombie. Does your system show any zombies?
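The zombie check can be scripted by reading the state field of each /proc/<pid>/stat. A sketch assuming Linux and Python 3.9+ (function names hypothetical); it also reports the parent PID, since a zombie is reaped when its parent collects its exit status, or when the parent exits and init adopts it. One caveat from how the Linux kernel handles exit: a process's descriptors are released while it is exiting, before it lingers as a zombie, so a zombie by itself should not be holding files open. A hung or still-running process would.

```python
import os


def parse_stat(stat_text: str) -> tuple[str, int]:
    """Return (state, ppid) from /proc/<pid>/stat text.

    The command name is wrapped in parentheses and may itself contain spaces or ')',
    so parse from the last ')' rather than splitting the whole line.
    """
    rest = stat_text[stat_text.rindex(")") + 2:].split()
    return rest[0], int(rest[1])


def zombies(proc_root: str = "/proc") -> list[tuple[str, int]]:
    """List (pid, ppid) for every process in state 'Z' (Linux only)."""
    found = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "stat")) as f:
                state, ppid = parse_stat(f.read())
        except OSError:  # process exited during the scan
            continue
        if state == "Z":
            found.append((pid, ppid))
    return found
```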
- 04-22-2009 #6 Just Joined! (Join Date: May 2005; Posts: 17)
It isn't just one. For example, racgimon (PID 6612) has 29 open files for hc_<instance name>.dat, and that is only about 10 minutes after restarting the database; there should be just one hc_<instance name>.dat file per instance. I should have ~11k open files once the database is up, and it's at 37k+ after 10 minutes.
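Since the count is still climbing minutes after startup, sampling one process over time would show whether a given PID (such as racgimon's 6612) grows steadily or levels off. A minimal sketch, assuming Linux and Python 3.9+; the sampling interval is arbitrary and the helper names are hypothetical.

```python
import os
import time


def fd_count(pid: str) -> int:
    """Number of descriptors currently open in one process (Linux /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample(pid: str, interval_s: float, n: int) -> list[tuple[float, int]]:
    """Take n (monotonic_seconds, fd_count) samples, interval_s apart."""
    samples = []
    for i in range(n):
        samples.append((time.monotonic(), fd_count(pid)))
        if i < n - 1:
            time.sleep(interval_s)
    return samples


def growth_per_minute(samples: list[tuple[float, int]]) -> float:
    """Average change in open descriptors per minute, first sample to last."""
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need samples spanning a positive time interval")
    return (n1 - n0) / ((t1 - t0) / 60.0)
```

For example, `growth_per_minute(sample("6612", 30, 21))` covers 10 minutes at 30-second intervals; running it for several PIDs shows which ones are actually leaking.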
- 04-22-2009 #7 Linux Guru (Join Date: Apr 2009; Location: I can be found either 40 miles west of Chicago, or in a galaxy far, far away; Posts: 8,974)
I assume you have sent a list of the processes with excessive opens to Oracle tech support? Something is causing Oracle to keep opening the same file. Perhaps there is some issue with your file system and/or device driver that reports an error to Oracle when it opens them, so Oracle thinks they weren't opened successfully, but the OS still considers them open? Did you change ANY system configuration (hardware or software) between the time everything was OK and the start of this problem (pardon me if you have already answered this)?
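The hypothesis above, that Oracle might treat a successful open as a failure, corresponds to a classic descriptor leak: once open() succeeds, the kernel counts the descriptor until close() is called, whatever the application concludes about the attempt. A generic illustration in Python 3.9+, not Oracle code; the function names and the "empty read means failure" rule are hypothetical.

```python
import os


def open_fd_count() -> int:
    """Descriptors open in this process; /dev/fd lists them on Linux and macOS."""
    return len(os.listdir("/dev/fd"))


def leaky_retry(path: str, attempts: int) -> bool:
    """Hypothetical buggy pattern: an attempt judged 'failed' never closes its fd."""
    for _ in range(attempts):
        fd = os.open(path, os.O_RDONLY)  # the kernel counts this fd as open from here on
        if os.read(fd, 1):
            os.close(fd)
            return True
        # empty read treated as an error: retry without os.close(fd), leaking it
    return False


def fixed_retry(path: str, attempts: int) -> bool:
    """Same logic, but every descriptor is closed whatever the outcome."""
    for _ in range(attempts):
        fd = os.open(path, os.O_RDONLY)
        try:
            if os.read(fd, 1):
                return True
        finally:
            os.close(fd)
    return False
```

Against an empty file, each leaky_retry call leaves one descriptor open per attempt, while fixed_retry leaves none. From outside, the leaky version looks exactly like a process whose descriptor count on a single path keeps rising.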


