rmoff

December 3, 2010

OBIEE 10g – javahost hang

Filed under: javahost, obiee, sawserver — rmoff @ 14:06

Hot on the heels of one problem, another has just reared its head.

Users started reporting an error with reports that included charts:

Chart server does not appear to be responding in a timely fashion. It may be under heavy load or unavailable.

The setup is an OBIEE 10.1.3.4.1 two-server deployment with BI/PS/Javahost clustered and load-balanced throughout.

Diagnostics

Javahost was running, and listening, on both servers:

$ps -ef|grep javahost
obieeadm 14076     1  0  Nov 25  ?         9:23 /app/oracle/product/OracleAS_1/jdk/bin/IA64N/java -server -classpath /app/oracle/product/obiee/web/javahost/lib/core/sautils.ja
$netstat -a|grep 9810|grep LISTEN
tcp        0      0  *.9810                 *.*                     LISTEN
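The port check above can be wrapped in a small helper so it is easy to repeat on each server. This is only a sketch: `port_is_listening` is a name I made up, and it assumes netstat output where the listening line contains the word LISTEN and a local address ending in `.PORT` or `:PORT`, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: reads netstat-style output on stdin and succeeds
# if any LISTEN line has a local address ending in ".PORT" or ":PORT".
port_is_listening() {
    port="$1"
    awk -v port="$port" '
        /LISTEN/ {
            # Check every field, because column order differs between platforms.
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[.:]" port "$")) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be something like `netstat -an | port_is_listening 9810 && echo "javahost port is listening"`. Bear in mind that a listening socket only shows the process still holds the port open; it says nothing about whether requests are being answered.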

The Javahost log file on both servers contained the following errors, but they had been logged at some point since javahost started over a week earlier, rather than when the charts stopped working:

Nov 30, 2010 8:08:36 AM MessageProcessorImpl processMessage
WARNING: Unexpected exception. Connection will be closed
java.io.EOFException
        at com.siebel.analytics.web.sawconnect.sawprotocol.SAWProtocol.readInt(SAWProtocol.java:167)
        at com.siebel.analytics.javahost.MessageProcessorImpl.processMessage(MessageProcessorImpl.java:133)
        at com.siebel.analytics.javahost.Listener$Job.run(Listener.java:223)
        at com.siebel.analytics.javahost.standalone.SAJobManagerImpl.threadMain(SAJobManagerImpl.java:205)
        at com.siebel.analytics.javahost.standalone.SAJobManagerImpl$1.run(SAJobManagerImpl.java:153)
        at java.lang.Thread.run(Thread.java:595)

Charts are written to a temp folder, but none have been written since yesterday afternoon:

$ls -lrt /data/bi/tmp/sawcharts/ |tail -n 2
-rw-r-----   1 obieeadm   biadmin      13611 Dec  2 16:30 saw4cee1a27-7.tmp
-rw-r-----   1 obieeadm   biadmin          0 Dec  2 16:31 saw4cee1a27-32.tmp

$ls -lrt /data/bi/tmp/sawcharts/ |tail -n 2
-rw-r-----   1 obieeadm   biadmin       7454 Dec  2 15:25 saw4cee219b-1.tmp
-rw-r-----   1 obieeadm   biadmin          0 Dec  2 15:28 saw4cee219b-6.tmp
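Eyeballing the newest files works, but the same check can be scripted. A minimal sketch, assuming a `find` that supports `-mmin` (GNU and BSD find do, but it is not part of POSIX, so it may not be available on every Unix); `files_written_within` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if at least one regular file under DIR
# was modified less than MINUTES minutes ago.
# Assumes find supports -mmin (not POSIX).
files_written_within() {
    dir="$1"
    minutes="$2"
    [ -n "$(find "$dir" -type f -mmin "-$minutes" 2>/dev/null | head -n 1)" ]
}
```

For example, `files_written_within /data/bi/tmp/sawcharts 60 || echo "no charts written in the last hour"`. On a quiet system a gap is normal, so the threshold would need to suit how often charts are actually requested.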

The first time the error was seen (from sawserver.out.log):

server01: Fri Dec  3 09:40:23 2010
server02: Thu Dec  2 15:44:38 2010
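Finding the first occurrence can be done with grep. The sketch below assumes the log contains the same wording that users saw on screen, which may not hold exactly; the layout of sawserver.out.log is not shown here, so it only prints the first matching line and its line number, and the timestamp still has to be read from the surrounding lines. `first_occurrence` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of LOGFILE that contains TEXT,
# prefixed with its line number. TEXT is matched literally (-F), not as a regex.
first_occurrence() {
    text="$1"
    logfile="$2"
    grep -n -F -- "$text" "$logfile" | head -n 1
}
```

Run from the directory holding the log, for example: `first_occurrence "Chart server does not appear to be responding" sawserver.out.log`.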

Resolution

It looked like javahost was up, but not responding to requests — which is pretty much what the error message said on the tin. The solution was that of many a computer problem – turn it off and turn it back on again.

Since the rest of the (production!) OBIEE service was up and in use, I didn’t want to use the normal shutdown script run-saw.sh as this would also kill Presentation Services. Therefore I extracted the following from run-saw.sh and ran it manually on server01:

set +u
ANA_INSTALL_DIR=/app/oracle/product/obiee
. ${ANA_INSTALL_DIR}/setup/common.sh
./shutdown.sh -service

This successfully killed javahost. I restarted it using:

nohup ./run.sh -service >> /data/bi/web/log/javahost.out.log 2>&1 &

But – the error remained when I refreshed the reports (on both servers).

I then killed javahost on server02 using the same method. At this point, charts started working again. Presumably Presentation Services had been using javahost on server02 and, not recognising that it had hung, saw no reason to switch to javahost on server01. Once javahost was killed on server02, Presentation Services switched to server01 and charts started working again.
To complete the work I restarted javahost on server02.

Investigation

The only hit on MOS and Google I found was this: OBIEE Chart Server Error When Showing Charts (Doc ID 944139.1) which details some parameters to tweak, although more to do with javahost being busy (which it wasn’t in this case).

1 Comment

  1. You may want to start your services with the autorestart parameter, which should restart them automatically if they crash. Having said that, the run-saw.sh script, which is the one that starts the Java Host service, doesn’t seem to be coded correctly: it seems to check only that the Presentation Server is running, not the Java Host. It shouldn’t take much time to change the script to check both. If you do that, let me know, as we had the same problem in Production a few weeks ago. If you do use the autorestart option, you should enable core dumps and have a script check for them, as autorestart may “hide” core dumps given that services are restarted automatically. CT

    Comment by Christian Turri — December 10, 2010 @ 00:09

