rmoff

May 19, 2011

OBIEE performance – get your database sweating

Filed under: bi, DWH, obiee, oracle, performance, rant, rrdtool — rmoff @ 16:01

Just because something produces the correct numbers on the report, it doesn’t mean you can stop there.

How you are producing those numbers matters, and matters a lot if you have an interest in the long-term health of your system and its ability to scale.

OBIEE is the case in point here, but the principle applies to any architecture with >1 tiers or components.

Let me start with a rhetorical question. The user has got a report which has ten rows of data. Which of the following methods is going to be a more efficient way to generate the report?

  1. Transfer the ten rows of data from the database back to the rendering application, and the rendered report to the user
  2. Transfer fourteen million rows of data from the database back to the rendering application, get the application to crunch these to the required ten rows, transfer rendered report to the user

Obviously, it’s the first one. In the second one (and this is no hyperbolic example to prove a point, I’m sorry to say) we have the following overheads:

  • Network traffic of fourteen million rows from the database to the application server
  • Disk and/or Memory impact on the application server, depending on how it’s processing those fourteen millions rows
  • CPU impact on the application server when it processes those fourteen million rows

Considering OBIEE specifically, you should be aiming to be able to answer all of your questions on the database directly. As soon as you start using the database simply as a source for dumping big volumes of rows into your OBIEE server, you’re storing up problems.
An RDBMS is designed for crunching lots of data. You’ve already lifted all that data off the disk when you selected it out of the tables on Oracle DB – why not get Oracle DB to do the additional processing required too? Where’s the benefit in lifting & shifting all of that data to then reprocess it again?
In fact, consider why you’re crunching big volumes of data each time in the first place. If multiple people have the same big queries, are you missing some aggregates in your data source that would support these queries much better? Or do you need to be looking towards multi-dimensional sources such as Oracle OLAP or Essbase?

The flexibility and power of OBIEE to do its own calculations and aggregations on data is a double-edged sword, and one to be wielded with responsibility. It is a great feature and one which cannot be avoided if you’re using federated queries across disparate sources. But if you’re using it out of ignorance or laziness to brute-force a solution instead of doing the big work at the lowest level possible then you’re asking for trouble.

If you rely on OBIEE to do the heavy work of your reporting solution, then you need to factor this in to your hardware spec for the machine. As a reporting interface to well aggregated data with fairly simple reports, we’ve found that it runs at minimal CPU, and doesn’t stress the disk. You can predict fairly reliably that this should scale just fine so long as your database can. But if you put the workload on the OBIEE server too, you’re going to hit bottlenecks much sooner.

The principle of moving as little data around as possible is described by Cary Millsap as “Filter Early” and described well in his blog article. Thanks to @RonCrisco for helping me remember the name. @cdturri pointed out that it’s also just common sense :-).
Applying it to a specific system, Exadata uses it in its SmartScan technology where it applies filtering of data directly on the storage rather than bringing all the data back up to the database SQL processing layer (h/t @ocpdba).

Case-study

Here’s an example of a situation where all is not well in OBIEE-land.

We were alerted to it by unix complaining that a filesystem was running low on space. Checking the contents of the filesystem we spotted these monsters:

[user@server]/data/bi/tmp $ls -lrt
total 112914646
[...]
-rw-------   1 user   biadmin    29122976800 Apr  6 11:32 nQS_20117_465_33898735.TMP
-rw-------   1 user   biadmin    24816966976 Apr  6 11:12 nQS_20117_464_33792457.TMP
-rw-------   1 user   biadmin    3582054936 Apr  6 11:46 nQS_20117_469_37979712.TMP
[...]

These are temporary files generated by BI Server (nqsserver), and for those of you viewing in black and white, those files are getting on for 30GB a go!

When queried, NQQuery.log embarrassingly admits facts such as:

Rows 13,894,550, bytes 3,260,497,648 retrieved from database query id: xxxx

and Usage Tracking shows the kind of damage being inflicted:

                      Total
                       time     row num db     cum num
START_TS     END_TS     sec   count  query      db row 

06-APR 10:05 10:18      579       0      6   3,436,816 
06-APR 10:05 10:18      553       0      4   3,239,101 
06-APR 10:05 10:18      383       0      3   1,624,656 
06-APR 10:11 11:48     5694       0      1  13,894,550 
06-APR 10:11 11:48     4314       0      1  11,840,156 
06-APR 10:21 10:27      336     456      4   3,239,101 

A look at the machine’s vital statistics for this time period shows the impact:

Affected BI Server, showing impact of workload which quietens down c.12:00

For a bit more detail, here’s the time period at greater resolution:

Systems metric during the heavy report execution

Notice that the server starts paging out during the work, and there’s a lot of disk activity – both read and write. CPU goes from a normal c.5% up to 20-30%.

This was one user doing this — care to place bets on how the system would perform if it were rolled out to ten users running this kind of workload?

For reference, here’s one of our other BI Servers which serves at least 10 times as many users, where the reports do the bulk of their work in the database:

Another BI Server on the same day, with a greater number of users but reports which do their big work on the database

Monitoring for problems

There are a few easy ways to look out for this bad behaviour. The first is in Usage Tracking, where you can look at S_NQ_ACCT.CUM_NUM_DB_ROW and compare it to S_NQ_ACCT.ROW_COUNT, even writing an alert for when this ratio goes above a defined threshold.
You could also look at an alert on S_NQ_ACCT.CUM_NUM_DB_ROW going above a fixed number, since depending on your system’s purpose and architectural principles you might want to aim to never be pulling back more rows from the database than necessary for a report.

I’d also pay close attention to S_NQ_ACCT.NUM_DB_QUERY, as this will show you reports generating more than one query on the database and thus increasing the workload on the BI Server processing the combined results.

Another thing to check is the NQQuery.log, looking at the number of bytes returned from a database query. This could feed into some kind of generated alert for closer investigation if a query returns above a certain number.
OBI 10g:

grep retrieved NQQuery.log|awk '{print $5}'|sort -n

OBI 11g:

grep retrieved NQQuery.log|awk '{print $20}'|sort -n

(if you’re running on Windows then to run this command you’ll need either cygwin or google for a Win32 version of grep/awk/sort – or brew your own Powershell version)

I’d also strongly recommend monitoring your BI Server’s TEMP folder (defined as WORK_DIRECTORY_PATHS in NQSConfig.INI), as this is where it lands the data to disk when it’s got to crunch it. Monitor this for two reasons – to spot when bad stuff’s happening, but also in case it fills up and causes your BI Server to barf. For the latter, you can expect to get:

[nQSError: 10058] A general error has occurred. 
[nQSError: 46118] Out of disk space. 

Developing for the long term

How do you assess your development projects for quality?
You presumably are keen that the reports match the specifications and return the right numbers. You hopefully also have NFRs for how quickly these reports run.
But what about system impact of new developments? How do you quantify this?

Unless you are delivering a standalone project, fire-and-forget, then maybe you can performance test for load and concurrency up-front to validate how well your solution will scale to the required users.
But lots of BI projects are iterative and may well be across functions too. Just because your development project is first to land it doesn’t give you the right to dump a steaming pile on the servers and proclaim that it works with your user volumes and meets their NFRs, so all is well. What about all the projects that come along next, and are going to be hindered by your mess?

This is where another double-edged sword comes into play – “Best Practice”. For me, Best Practice is a way of doing something that multiple implementations, time and expertise has shown to be the best way of not screwing up. It is the starting point from which to work, understanding deviations as required. What it is not, and what gets it the bad reputation, is a fixed set of crude rules to be adhered to blindly and implemented without questioning or understanding.

If a system is not adhering to the kind of best practice I’m talking about here – filter early, in essence – then there may be a good reason. But that reason must be consciously and loudly acknowledged and documented.

October 18, 2010

When is a bug not a bug? When it’s a “design decision”

Filed under: informatica, obia, rant, support — rmoff @ 13:02

Last month I wrote about a problem that Informatica as part of OBIA was causing us, wherein an expired database account would bring Oracle down by virtue of multiple connections from Informatica.

I raised an SR with Oracle (under OBIA support), who after some back-and-forth with Informatica, were told:

This is not a bug. That the two error messages coming back from Oracle are handled differently is the result of a design decision and as such not a product fault.

Is “design decision” the new “undocumented feature” ?

September 13, 2010

RTFM? But where TF is the FM? >> Offline searchable OBIEE 11g documentation

Filed under: documentation, obiee, oracle, rant — rmoff @ 11:32

I’m a geek. I like understanding things in their absolute entirety. It frustrates me to have to make presumptions or assumptions about something. I like to get down ‘n dirty and find out what makes things tick.

So that necessitates reading manuals. And unfortunately, Oracle don’t make that easy all the time.

Some of the manuals are not very good, and others are difficult to find. Given the complexity of the OBIEE stack and proliferation of terminology and product names, sorting the wheat from the chaff can be a headache. Fair enough, for a corporation as big as Oracle with a product portfolio in the [hundreds? thousands?] creating a unified, comprehensive, easy-to-navigate point of reference for documentation must be near on impossible – but that’s their problem to figure out and mine to bitch about until it’s done 😉

In all seriousness, documentation needs to be all in one place but sufficiently isolated so that generic search terms (think “system requirements” etc) can be found within the context of the product, not a dozen other related products.

So – Christian Turri has done the OBIEE 11g developer community a huge favour by taking matters into his own hands and creating a offline, fully indexed and searchable, collection of PDFs. You can read his rationale and further details here, or download the documentation as a zip file directly here.

February 23, 2010

Oracle Support Blog back online, kinda.

Filed under: oracle, rant — rmoff @ 19:02

After it was ripped down last week, Chris Warticki’s blog is back online, albeit with the last posting redirecting visitors to a new location on Oracle Communities.

Maybe I’m getting too old for this s##t, but I’m yet to really get a handle on how Oracle want to interact with real people on the ground. Oracle Communities is a fairly new site that I’ve not explored so much because of no OBIEE area. My Oracle Support has feedback and comment sections, but in general usability stinks. A support blog is torn down and then reappears. Oracle Forums are popular as ever but the software blows and some forums (eg OBIEE) are withering under a very high noise to quality-content ratio. Oh, and Oracle Mix (“Bringing Oracle Customers, Employees, and Developers together”) too, which looked quite neat but I have no idea how that’s supposed to fit it the picture.

Thriving online communities are worth many times over the sum of the parts, but I have a “real” job to do as well as trying to keep up with the numerous sites (and these are just the official Oracle ones) and establishing a presence on each. There’s no point a person having to ask a question in multiple locations just to make sure everyone who can answer it will see it. And unless you have lots of experts in one place you can end up with one professed expert doling out answers with no-one to challenge them.

As an example of a thriving online community (and this is a prime example for any Web2.0-social-medja-bullsh##ter to take heed of) I would offer up : Oracle-L. A mailing list based on email. Not very cutting edge huh? But it’s followed by a huge number of real experts, and it’s moderated enough that there’s no rubbish on it. Who cares if it’s not written in flash, HTML5, or doesn’t have a twitter feed? The content is superb, and the delivery mechanism suffices. Job done. Sure, it would be nice to have recognition points, user profiles, blah blah blah. But build on the thing that matters – people who know their stuff, and content. Oh, and delivery mechanism. I want RSS (or email, I suppose), I don’t want to have to visit five different sites to find out what’s new.

Maybe it’s too many chefs spoiling the broth at Oracle (i.e. everyone knows there should be one site, but so long as it’s their site). But seriously guys, it shouldn’t be too hard to have a single hub of interaction. Should it?

November 11, 2009

#Fail: My Oracle Support

Filed under: oracle, rant, support — rmoff @ 15:45

Metalink was retired this weekend and made way for the new My Oracle Support system. It didn’t go as smoothly as it could have done.

This post is going to be a bit of a rambling rant.

Ultimately people, including me, don’t like their cheese being moved (not unless there’s a really runny piece of Camembert at the end of it). That makes it a bit more difficult to discuss because some of people’s complaints will just be geeks being stubborn (and boy, can geeks be stubborn). Arguments descend into minutiae of detail and flash vs DHTML – whilst the bigger picture gets lost.

People especially don’t like their cheese being moved (okay okay enough of the cheese) change to systems that they depend on to do their job. If it were the migration of a blogging website or somesuch then it’d be a bummer, there’d be grumbling about it, but ultimately people would probably be quite sanguine about it. When it comes to a support website though, it has to be available.

If this were a system that we delivered to our users then we’d (hopefully) get laughed out the building and/or strung up. It stinks, and there’s no denying it. Maybe once upon a time the concept was a good one, but somewhere along the line looks overtook functionality and someone in charge forgot that this wasn’t a beauty contest but a support website relied on by many many people for doing their jobs. Some of the new functionality (and it is there) in MOS is quite neat — but I only discover it by accident because most of the time I’m waiting for the s#dding thing to load or respond to a mouse click.

I can understand a marketing agency designing some krazy kool website to sell junk food to kidz using lots of flash and clever code, and the benefit (whizzy effects impressing target audience) outweighs the disadvantage (lower spec’d PCs can’t display it properly or at reasonable speed). But a support website? C’mon! It’s a support website! It should work in Lynx (maybe not quite). It was apparently tested on a 2 gig / 3Ghz PC – I’d suggest that’s hardly standard fare yet.

I want to go to the Oracle support website and get support. I shouldn’t have to attend training or webinars to use a website. If I do, then the website’s badly designed. Seriously. And enough with the rambling waffly emails already. I get enough emails everyday that any communication about Metalink/MOS needs to be clear and concise. It doesn’t need BS in it about a “Leveraging the personalized, proactive, and collaborative support capabilities […] reduce the time you spend maintaining Oracle solutions” (literal quotation).

As an OBIEE user I’ve already been using My Oracle Support after Metalink3 was discontinued a few months ago. After that migration I raised several non-technical SRs reporting various problems, and almost always got a response with the implication that I was doing something wrong or needed helping, rather than the impression that I’d reported a bug which needed fixing.
Somehow, and I’d have hoped this would come from within the organisation, bugs reported by customers need to go straight to Dev, rather than the customer fobbed off. And I was fobbed off without a doubt. Next time I shall not bother reporting problems because it’s not worth the time I spend on it.

Sr. Customer Support Manager Chris Warticki at Oracle has blogged about the cutover:

There’s another blog from Support here.

OUG survey

OUG are running a survey until 19th Nov:

Last weekend, Oracle closed the current Metalink service and migrated the users to My Oracle Support.
UKOUG has had reports from its membership and from across EMEA of a number of problems in this migration.
In order to enter into dialogue with Oracle on this, we would appreciate it if you could complete the following very short survey.

You can find the survey here

Footnote – non-flash My Oracle Support

There is a non-flash version of My Oracle Support at http://supporthtml.oracle.com. However from where I am I can’t login directly (see errors below)

You might be able to get in indirectly on this link.

Clicking the Home link when going in on this link or trying to login from http://supporthtml.oracle.com gives 500 Internal Server error on IE and “Recursive error in error-page calling for /secure/error.jspx, see the application log for details. ” in FireFox.

Looking at http://supporthtml.oracle.com and having used the flash version for a while now the non-flash version looks pretty similar. More effort’s gone into its appearance than I’d expect for a site that’s been knocked out in HTML as a purely-functional alternative to the main flash site.
It’s evidently not fully functional yet but I wonder if someone’s taken the wise idea to do the rewrite in non-flash and will ditch the flash version at some point in the future?

Follow up

It looks like things are stabilising a bit, although I still get inconsistent results when using supporthtml.oracle.com.

Some more blogs about the problems:

October 30, 2009

The state of OBIEE on the web

Filed under: obiee, rant — rmoff @ 11:06

An advance footnote

I’ll start this by saying why I think things are how they are, and then I’ll get to the meat of my article.

OBIEE in its current incarnation (v10.1.3) is a mature product. All the big bugs have been caught and fixed. All the known quirks are well documented. All the missing features are known. All the clever workarounds have been found. All the neat little hacks have been explored.

What’s left to talk about?

Where have all the good men (and women) gone?

The OTN OBIEE Forum has become a depressing place.

This is no forum of equals, sharing and discussing a joint interest in a product.

I would say that 90%, if not 95%, of posts are from beginners. Already I can hear the shaking of heads, but bear with me. There’s nothing wrong with beginners posting on forums. Everyone has to start somewhere. But the problem is — they then disappear. So instead of organically growing as part of a community and in turn becoming more knowledgeable and helping others with their problems they leave. This leaves a fresh batch of beginners with questions, yet the same people answering them (invariably growing more frustrated and cynical).

The second problem, and not limited to this forum, is that people don’t RTFM! They don’t search in Google, and they sure as heck don’t search the forum. Why should I spend MY time looking up something in the manual for someone just so that THEY don’t have to?

This maybe shows my age, but on Usenet (nntp:// !) the mantra was always to “lurk” first. You gained a feeling for the group, and you saw quickly what questions came up repeatedly. You read the FAQ, so that you didn’t post a Frequently Asked Question. That’s where FAQs came from – people ask the same question again and again.

Asking questions is brilliant. Asking questions is how one learns. Showing curiosity in something and exploring it is how one becomes truly knowledgeable. Answering those questions and thinking about them is how one learns more and proves ones knowledge.
Asking for the hundredth time why Presentation Services won’t start, if OBIEE is supported on a certain OS, how to configure a DSN on Linux, or how to implement SSO is not learning. It’s plain lazy. It’s asking someone else to do your job for you.
If you really can’t get Presentation Services to start, and you post up the full spec of your environment, configuration, and what you’ve tried already and with details of what didn’t work with all the symptoms, links to all the posts you’ve found and ruled out as being relevant to your problem and why — then, and only then, is it fair to be posting your question.

And finally, since when did a lack of manners become acceptable? If someone helps you out, it’s polite to say thank you. People constantly abandon threads yet go on to post new ones. It’s also important from a point of view of people searching (hah!) for previously answered questions. If a suggested solution has been acknowledged to work other people may try it. If a suggested solution has been met by a deafening silence then people may give it less credence.

To finish this section on a positive note, here’s an example of what I think the forum should be about. These are a couple of threads I’ve been involved with where it’s not been necessarily something documented, where there’s been a back and forth of diagnostics, where both sides have benefited from the thread

Silence in blog land

Related to the OTN forum is the decline in OBIEE blogging. Just as there’s no sparks of new ideas or problems being discussed on the OTN Forum, there’s not a lot being posted on the OBIEE blogs any more. It could be literally the case that everything that’s to write about has been.
If one looks at the blog archives of people like Venkat and John M, what else is there to be said, other than re-hashing the same points?
There are new blogs coming along (and I include myself in that, in internet years I’m extremely recent too), but there’s either not a lot to write about, very esoteric, or just re-working the same ideas.

OBIEE 11g

I’ll say again what I did at the top of this posting.

OBIEE in its current incarnation (v10.1.3) is a mature product. All the big bugs have been caught and fixed. All the known quirks are well documented. All the missing features are known. All the clever workarounds have been found. All the neat little hacks have been explored.

On the horizon and (slowly) approaching is OBIEE 11g. It’s is already in a limited ßeta. Once it’s open to a more wide field (presumably those currently ßeta testing it are under NDA) there’ll be a whole raft of exciting new things to talk about and I would imagine things will pick up again

The future

A new OBIEE forum?

I think there’s a real need for a proper, moderated, OBIEE forum. The OTN one is probably beyond redemption. Much better to start a new one with clear posting rules, and as people stray gently bring them round 🙂 No-one has time to herd cats (per the OTN forum), but the odd stray question can easily be redirected. A forum outside of OTN would also be free of its dodgy functionality and restrictions.
A well organised forum with little repetition will have less traffic to read through, and thus be much easier searched.
Registration would be mandatory, to reduce the number of people “fire and forgetting” their questions.
Even have a ‘waiting period’ after registration to reduce the number of “URGETN!!!!!” posts.

Volunteers? 😀

Blogging

Over here it’s approaching the time of year when animals go into hibernation, and maybe some OBIEE bloggers are doing the same, ready to burst forth full of life once OBIEE 11g is available.

A real footnote

Maybe it’s the dark autumn evenings making me a grumpy old sod, and things aren’t as I make out. I’d be interested in people’s comments 🙂

August 4, 2009

OBIA grumble

Filed under: obia, oracle, rant — rmoff @ 16:35

I’m starting on an upgrade from OBIA 7.9.5 to 7.9.6 and wading through the two main docs:

It would be nice if Oracle could come up with some less confusing terminology. It seems that not only is the whole product of OBIA referred to as OBIA (see @lex’s posting for a good explanation), but that the sub-components which are not-OBIEE-or-DAC-or-Informatica is also OBIA, c.f. page 6-1 of the Upgrade guide “[…]upgrade your Oracle BI Applications environment to the current version.”
To me that implies that once I’ve done this, my OBIA will be upgraded – but no, actually, some of the supporting bits will be upgraded, but I still have to do a heck of a lot more grunt work before what I consider OBIA (and the manual is called OBIA too!) is upgraded.

And whilst I’m at it … it tells you to “[…]uninstall […] previous release of Oracle BI Applications […]” – but doesn’t tell you how.
I checked the docs but found no reference, and in the end manually moved the objects listed in the Installation guide page 4-13 as being installed by the Oracle BI Applications installation (which of course installs Oracle BI Applications, but not Oracle BI Applications – spot the difference? sorry, I’m just being fatuous now) out of c:\OracleBI and C:\OracleBIData.
After running the 7.9.6 OBIA installer I noticed a c:\OracleBI\UninstallApps folder, so maybe that was there for 7.9.5 before too and I just missed it.
Still – how difficult would it have been to include in the documentation “You must install previous versions first, run c:\OracleBI\UninstallApps\setup.exe to do this“?

July 21, 2009

OTN forum rant

Filed under: oracle, rant, silly — rmoff @ 15:55

I read and post a bit on the OBIEE and OBIA OTN forums. The noise ratio isn’t too bad, but a few things really get my goat:

  1. Not responding to answers!
    If I’ve gone out of my way to help, or try to help, at least have the courtesy to acknowledge it, and ideally mark as Helpful or Correct as appropriate.
    Even a simple “thanks.” would do. It’s just good manners.
    It also helps people who come along afterwards as you get the completed picture rather than half a question/answer thread.
    Forums Etiquette
  2. Questions like “how do i install obiee will it work on my amiga”. R.T.F.M. has never been more appropriate!

I’m sure there are more, but that’s my steam vented for the moment 🙂

And on a lighter note: Top 10 funniest OBIEE topics

Blog at WordPress.com.