rmoff

October 16, 2009

BI Server hung – nQSError 14054 / 15001 / 23005

Filed under: obiee — rmoff @ 10:43

Watch out if you are using init blocks in your RPD. We hit a bug (#9019374) recently that caused BI Server (10.1.3.4) to hang.

The init block in question should have returned a date to update a repository variable, but because of badly written SQL and abnormal data in the source table, it actually returned a null value. BI Server evidently didn’t like this null being inserted where it didn’t belong, and understandably logged:

[14054] Unable to load subject area: Core.[nQSError: 15001] Could not load navigation space for subject area Core.
[nQSError: 23005] The repository variable, LAST_REFRESH_DT, has no value definition.

Instead of invalidating the subject area Core and continuing as normal (which is what it should do), it hung and had to be killed forcefully (kill -9 on the nqsserver process).
We fixed the init block SQL to exclude the NULL value, and restarted BI Server successfully.
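
To illustrate the kind of fix involved, here is a minimal sketch, not the actual init block SQL (which isn't reproduced in this post). It uses Python's built-in sqlite3 as a stand-in for the real source database, and the table etl_log, its columns run_id and refresh_dt, and the fallback date are all hypothetical names chosen for the example. The unsafe query returns whatever is in the latest row, NULL included; the safer one filters out NULLs and, as an extra assumption beyond what we did, falls back to a default date if nothing is left.

```python
import sqlite3

# Hypothetical default used when no usable date exists at all.
FALLBACK_DATE = "1900-01-01"

# Unsafe: returns the latest row's value even if it is NULL,
# and returns no row at all if the table is empty.
UNSAFE_SQL = "SELECT refresh_dt FROM etl_log ORDER BY run_id DESC LIMIT 1"

# Safer: ignore NULL dates, and always return exactly one non-NULL value.
SAFE_SQL = """
SELECT COALESCE(MAX(refresh_dt), ?)
FROM etl_log
WHERE refresh_dt IS NOT NULL
"""


def make_db(rows):
    """Build an in-memory stand-in for the source table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE etl_log (run_id INTEGER PRIMARY KEY, refresh_dt TEXT)"
    )
    conn.executemany(
        "INSERT INTO etl_log (run_id, refresh_dt) VALUES (?, ?)", rows
    )
    return conn


def unsafe_value(conn):
    """Run the unsafe query; None means NULL or no row came back."""
    row = conn.execute(UNSAFE_SQL).fetchone()
    return row[0] if row else None


def safe_value(conn):
    """Run the safer query; an aggregate without GROUP BY always yields one row."""
    return conn.execute(SAFE_SQL, (FALLBACK_DATE,)).fetchone()[0]
```

With "abnormal" data such as `make_db([(1, "2009-10-15"), (2, None)])`, `unsafe_value` comes back as None while `safe_value` still returns the last good date. Whether a fallback date is appropriate depends on what the repository variable is used for, so treat that part as a design choice rather than a recommendation.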

The hang was worse than a crash because it is harder to diagnose: nothing is visibly “broken”, so it isn’t immediately clear what the problem is; things just aren’t working right. The symptoms we saw were:

  • Users already logged in found that dashboards or Answers weren’t working (no errors, just no response when items were clicked)
  • Admin Tool hanging after a short period of time when connected to the server.
  • Users who weren’t logged in getting stuck at the OBIEE “Logging in” screen.

It’s always interesting with these kinds of problems to look back on the initial diagnosis. With hindsight, it’s obvious why this caused a problem, but at the time we were scratching our heads. How could our server suddenly have stopped working, when we’d not changed anything for weeks? But of course something had changed: the data! Our init block refreshed hourly, and the data it read from had changed since the previous hour’s refresh.

So lessons to take away are:

  • Don’t write SQL that can return NULL values – never mind if “the data should never be null” – if it CAN then it MIGHT, so code for it!
  • If something’s “suddenly” stopped working, remember to think about less obvious factors like init blocks
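
One way to act on the first lesson is to test init block SQL against awkward data before it goes anywhere near the RPD. The sketch below is only an illustration: check_init_block_sql is a hypothetical helper, sqlite3 stands in for the real source database, and the rule it enforces (exactly one row, no NULLs) is my assumption about what a repository variable init block should return.

```python
import sqlite3


def check_init_block_sql(conn, sql):
    """Run a candidate init block query and return a list of problems found.

    An empty list means the result had exactly one row and no NULL values
    (the assumed contract for a repository variable init block).
    """
    rows = conn.execute(sql).fetchall()
    problems = []
    if len(rows) != 1:
        problems.append(f"expected exactly 1 row, got {len(rows)}")
    for row in rows:
        if any(value is None for value in row):
            problems.append("result contains NULL")
            break
    return problems
```

Running it against an empty table and against a table containing only NULLs covers the two cases that "should never happen" but might.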

Update 12th Feb 2010: Another bug’s been raised on the back of this: BUG 9358471 – REPOSITORY VARIABLES INIT BLOCK SQL SHOULD HANDLE NULL CONDITIONS


2 Comments

  1. It’s so kind of you to blog this. We have been facing a similar issue for the past several months. In fact, we have had a Sev1 open with Oracle for the past 5 months… They are still in the data collection stage!

    The issue we have today is that the server freezes for anywhere from a few seconds to several minutes. During this period, users cannot log in to the system (Logging in screen) and users who are already logged in cannot do anything (no response on any click). The quick solution we have is to restart the services. Once we do, it mostly fixes the issue; otherwise we hit the issue again.

    The only indicator we have for this issue is the number of active sessions. If it goes beyond 4,5, then we have this issue.

    Did you have the same situation? I’m going to ask my teams to check all the init blocks they are using.

    Thanks a lot!

    Comment by Krishna — October 16, 2009 @ 16:20

  2. Thanks. This is an old article but was useful for us yesterday. The W_DAY_D table from OBIEE 10g did not have the information for Feb 29th filled out properly. I suppose Oracle engineering did not account for this leap year day when it was initially created years ago.

    Comment by wentari — March 2, 2012 @ 04:20

