Saturday, 30 April 2011

-bash: fork: Resource temporarily unavailable

When something like this happens, the odds are very low that you will be able to troubleshoot any further or run even basic commands like '/usr/bin/free' or 'ps -eaf | wc -l', since the shell can no longer fork new processes.


I wish the error were more specific about which bloody resource it ran out of, but anyway, Google hints at the usual suspects:

1) The process table filled up completely (check mainly for defunct processes).
2) The server ran out of memory (IMHO, when memory becomes scarce it should trigger the OOM killer rather than throw an 'out of resource' error).
3) The system hit its maximum number of process PIDs (32768 by default).
4) The number of open file descriptors got exhausted.
(Quick checks for each are sketched just below.)
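
For what it's worth, these are the quick checks that map to the causes above, if you can get them to run at all (from an already-open root shell, or right after recovery); just a rough sketch:

                # ps -eaf | wc -l                    # total processes vs. pid_max
                # ps -eaf | grep defunct | wc -l     # zombie/defunct processes
                # /usr/bin/free -m                   # free memory and swap
                # cat /proc/sys/kernel/pid_max       # system-wide PID limit
                # cat /proc/sys/fs/file-nr           # allocated vs. maximum file descriptors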

Since I couldn't capture any of the above details when the issue happened, I kept quiet and waited for it to recur.

Fortunately/unfortunately, it happened again last night (on one of our DB boxes running RedHat 5).
This time, rather than wasting time getting access to the server and then figuring out what was going wrong, I decided to just crash the system via the SysRq keys and grab a core dump.
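
(For the record, there are two common ways to do this: the keyboard Alt+SysRq+C combination, or the /proc interface shown below. Either way, kdump/kexec has to be configured beforehand to catch the panic and write out the vmcore. Since echo is a bash builtin, the /proc route stands a chance of working even while fork() is failing.)

                # echo c > /proc/sysrq-trigger       # panic the kernel; kdump captures the vmcore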

Once the core was dumped, the host was rebooted and the apps came up fine....thank God!

The vmcore was soon fed to the crash tool and, to no surprise, I could see more than 32000 processes (started as the oracle user, all named 'orarootagent.bi') floating around.
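
(Roughly, this is all it took inside crash; the vmlinux and vmcore paths below are placeholders for wherever your debuginfo kernel and dump actually live:)

                # crash /usr/lib/debug/lib/modules/`uname -r`/vmlinux /var/crash/<timestamp>/vmcore
                crash> sys                                  # sanity check: panic reason, load, memory
                crash> ps | grep orarootagent | wc -l       # count the offending processes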




A-ha.....so my system-wide limit on the number of PIDs (/proc/sys/kernel/pid_max) had been nearly exhausted by some rogue process. Cool ;) What next?

The boss is going to tap my shoulder tomorrow for this finding, but the very next second I might be asked.....HOW DO WE AVOID THIS HAPPENING IN FUTURE???

Suggestions:
a) Increase the overall system limit (kernel.pid_max).
b) Cap the 'oracle' user at something lower (probably half the overall system limit) -- see the sketch below.
c) Just blame the DBAs for such a nasty process and relax!
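
For (a) and (b), something along these lines should do on RHEL 5; the numbers here are made up and need sizing against your actual workload:

                # echo "kernel.pid_max = 65536" >> /etc/sysctl.conf                  # (a) raise the system-wide PID limit
                # sysctl -p                                                          # reload sysctl settings
                # echo "oracle  hard  nproc  16384" >> /etc/security/limits.conf     # (b) cap processes for the oracle user

Note that limits.conf entries only apply to new logins, so the oracle processes have to be restarted before (b) actually bites.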

If { } any of the above hacks works for you, skip the rest of the section and close this page....else { } continue :)

Tuesday, 12 April 2011

How to find the list of forcefully installed RPMs

Recently we were asked to audit RedHat servers for RPMs that had been forcefully installed
(with the "--nodeps" or "--force" option) in the past.

Below is how we managed to get that working:

1) Make sure you have the "yum-utils" package installed on the server.
    If not, you can install it using yum.
                 
                # yum install yum-utils

2) Once done, you can use the "package-cleanup" command to find dependency problems
in the local RPM database.
           
                  # package-cleanup --problems
                   Setting up yum
                   Reading local RPM database
                   Processing all local requires
                   Missing dependencies:
                   Package ess-openldap-2.3.1 requires ess-openssl= V2.1.1
                   Package kernel-debuginfo requires kernel-debuginfo-common-i686 = 2.6.18-92.el5

3) Based on the above output it was clear that both the 'ess-openldap-2.3.1' and 'kernel-debuginfo' RPMs
  were installed on the server without their underlying dependencies being resolved (probably with the --nodeps option).
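
Before actually removing anything, it doesn't hurt to ask the RPM database what (if anything) provides the missing capability and whether anything still depends on the suspect package; the names here are just the ones taken from the output above:

                # rpm -q --whatprovides ess-openssl        # is the required capability installed at all?
                # rpm -q --whatrequires ess-openldap       # does anything else depend on the suspect package?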

4) Finally we had to clear these stale RPM entries from the database (with the --justdb option) for the OS
  upgrade script to work:

                    # rpm -e --justdb ess-openldap-2.3.1
                    # rpm -e --justdb kernel-debuginfo-2.6.18-92.el5
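
Re-running the earlier check should now come back clean, which is what the upgrade script wanted:

                    # package-cleanup --problems

Keep in mind that --justdb only touches the RPM database; the files themselves stay on disk, so reinstall the packages properly later if they are genuinely needed.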