24 Feb 2009

Debugging the notorious "Segmentation fault" in Apache Error Log?

Have you ever noticed errors as below in the Apache Error Log and scratched your head as from which URL has caused this.

child pid xxxxx exit signal Segmentation fault (11) --> xxxxx is a pid number.

Problem is that whenever an Apache Thread / Process crashes, the error log reports this line and nothing more than that.

That is we get no idea what type of request led to the segfault.

To find out the reasons, one can start Apache in Single Process mode and try accessing the web-site and view the error log. Chances are that you may catch it.

But, generally on a large Site that's really difficult. Moreover, if its a production site, possibility of taking this route is more difficult.

You can also go through a suggested solution at Debugging intermittent crashes.
Solution

There is a module mod_whatkilledus by an Apache Developer Jeff Trawick, which helped me a lot. I hope that this may help others too.

mod_whatkilledus keeps a little bit of state on each active connection, which allows it to know what a thread was handling in case the thread segfaults. If that happens, it writes a report to the error log.

Here is short note from within the mod_whatkilledus code itself:

mod_whatkilledus is an experimental module for Apache httpd 2.x which tracks the current request and logs a report of the active request when a child process crashes. You should verify that it works reasonably on your system before putting it in production.

mod_whatkilledus is called during request processing to save information about the current request. It also implements a fatal exception hook that will be called when a child process crashes.
Apache httpd requirements for mod_whatkilledus:

Apache httpd >= 2.0.49 must be built with the --enable-exception-hook
configure option and mod_so enabled.

Compiling mod_whatkilledus:

apxs -ci -I/path/to/httpd-2.0/server mod_whatkilledus.c

Activating mod_whatkilledus:

1. Load it like any other DSO.

LoadModule whatkilledus_module modules/mod_whatkilledus.so

2. Enable exception hooks for modules like mod_whatkilledus:
EnableExceptionHook On

3. Choose where the report on current activity should be written. If you want it reported to some place other than the error log, use the WhatKilledUsLog directive to specify a fully-qualified filename for the log. Note that the web server user id (e.g., "nobody") must be able to create or append to this log file, as the log file is not opened until a crash occurs.

How I did:

I just recompiled my Apache with extra --enable-exception-hook option in configure as:

cd /usr/local/src/httpd-2.0.58 --> then I was using 2.0.58
./configure \
--prefix=/usr/local/apache \
--with-mpm=prefork \
--enable-so \
--enable-cgi \
--enable-info \
--enable-mod_status \
--enable-usertrack \
--enable-mime-magic \
--enable-rewrite=shared \
--enable-speling=shared \
--enable-nonportable-atomics=yes \
--enable-deflate \
--enable-expires \
--enable-headers \
--enable-setenvif \
--enable-env \
--enable-unique-id \
--enable-logio \
--enable-exception-hook
make
make install

Now, the Apache was ready for Exception-Hook based modules.

Lets, compile the mod_whatkilledus itself:

cd /usr/local/src
mkdir mod_whatkilledus
cd mod_whatkilledus
wget http://people.apache.org/~trawick/mod_whatkilledus.c
wget http://people.apache.org/~trawick/test_char.h

/usr/local/apache/bin/apxs -ci mod_whatkilledus.c
For me it produced the following output:

[root@maria mod_whatkilledus]# /usr/local/apache_back/bin/apxs -ci mod_whatkilledus.c
/usr/local/apache_back/build/libtool --silent --mode=compile gcc -prefer-pic -DAP_HAVE_DESIGNATED_INITIALIZER -DLINUX=2 -D_REENTRANT -D_GNU_SOURCE -g -O2 -pthread -I/usr/local/apache_back/include -I/usr/local/apache_back/include -I/usr/local/apache_back/include -c -o mod_whatkilledus.lo mod_whatkilledus.c && touch mod_whatkilledus.slo
/usr/local/apache_back/build/libtool --silent --mode=link gcc -o mod_whatkilledus.la -rpath /usr/local/apache_back/modules -module -avoid-version mod_whatkilledus.lo
/usr/local/apache_back/build/instdso.sh SH_LIBTOOL='/usr/local/apache_back/build/libtool' mod_whatkilledus.la /usr/local/apache_back/modules
/usr/local/apache_back/build/libtool --mode=install cp mod_whatkilledus.la /usr/local/apache_back/modules/
cp .libs/mod_whatkilledus.so /usr/local/apache_back/modules/mod_whatkilledus.so
cp .libs/mod_whatkilledus.lai /usr/local/apache_back/modules/mod_whatkilledus.la
cp .libs/mod_whatkilledus.a /usr/local/apache_back/modules/mod_whatkilledus.a
ranlib /usr/local/apache_back/modules/mod_whatkilledus.a
chmod 644 /usr/local/apache_back/modules/mod_whatkilledus.a
PATH="$PATH:/sbin" ldconfig -n /usr/local/apache_back/modules
----------------------------------------------------------------------
Libraries have been installed in:
/usr/local/apache_back/modules

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the `LD_LIBRARY_PATH' environment variable
during execution
- add LIBDIR to the `LD_RUN_PATH' environment variable
during linking
- use the `-Wl,--rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf'

See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
chmod 755 /usr/local/apache_back/modules/mod_whatkilledus.so
[root@maria mod_whatkilledus]#
In my httpd.conf, I put the following:

LoadModule whatkilledus_module modules/mod_whatkilledus.so

EnableExceptionHook On

WhatKilledUsLog /var/log/whatkilledus.log
Restarted the Apache (/etc/init.d/httpd restart).

Started watching Apache Error Log:

tail -f /var/log/apache_error_log
I just noticed a SegFault line:

[Mon Feb 23 16:24:56 2009] [notice] child pid 29412 exit signal Segmentation fault (11)

Went on to check the /var/log/whatkilledus.log and I found something similar to the following:

[Mon Feb 23 16:24:56 2009] pid 29412 mod_whatkilledus sig 11 crash
[Mon Feb 23 16:24:56 2009] pid 29412 mod_whatkilledus active connection: 9.65.120.97:2035->9.27.177.26:8080 (conn_rec 434b90)
[Mon Feb 23 16:24:56 2009] pid 29412 mod_whatkilledus active request (request_rec 438b48): GET /silly?fn=sigsegv HTTP/1.1|Host:aplinux.raleigh.ibm.com%3a8080|User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv%3a1.5)Gecko/20031007|Accept:text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1|Accept-Language:en-us,en;q=0.5|Accept-Encoding:gzip,deflate|Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7|Keep-Alive:300|Connection:keep-alive|Cache-Control:max-age=0
[Mon Feb 23 16:24:56 2009] pid 29412 mod_whatkilledus end of report

The above logs are not the actual once (I am unable to show due to obvious reasons). Thus, I was able to track the cause and intimated the Development Team and had it resolved.

You can read more about it from Jeff Trawick's pages here and here.

4 comments:

  1. This error is a black box for most of web admins. Invaluable explanation thanks ..

    ReplyDelete
  2. will it effect performance of the server in terms of cpu process, memory and IO

    ReplyDelete
  3. I did not notice significant effect on performance.

    In any case, you would do it temporarily. As soon as you have done your diagnosis, you would definately disable it by commenting the lines:

    #LoadModule whatkilledus_module modules/mod_whatkilledus.so

    #EnableExceptionHook On

    #WhatKilledUsLog /var/log/whatkilledus.log

    ReplyDelete
  4. Hello Sir,

    Just a quick question may be stupid, bcz i haven't used it yet...

    Do mod_whatkilledus, will track every request (process) or just those process who is causing segfault?

    Regards
    Manish Singh

    ReplyDelete