rmoff

August 4, 2011

Security issue on OBIEE 10.1.3.4.1, 11.1.1.3

Filed under: bug, obiee, security — rmoff @ 10:00

July’s Critical Patch Update from Oracle includes CVE-2011-2241, which affects OBIEE versions 10.1.3.4.1 and 11.1.1.3.
No details of the exploit are given, other than that it “allows remote attackers to affect availability via unknown vectors related to Analytics Server.”

It is categorised with a CVSS score of 5 (on a scale of 10), with no impact on Authentication, Confidentiality, or Integrity, and “Partial+” impact on Availability. So to a security-unqualified layman (me), it sounds like someone could remotely crash your NQSServer process, but not do any more damage than that.

Patches 11833743 and 11833750 are available for 10.1.3.4.1 and 11.1.1.3 respectively.

March 8, 2011

OBIEE Systems Management – dodgy counter behaviour

Filed under: bi, bug, jmx, mbeans, obiee, systemsmanagement — rmoff @ 10:59

Over the last few months I’ve been doing a lot of exploring of OBIEE Systems Management data, covered in a mini-series of blog posts, Collecting OBIEE systems management data.

There are a vast number of counters exposed, ranging from the very interesting (Active Sessions, Cache Hits, etc) to the less so (Total Query Piggybacks, although for some seriously hardcore performance tuning even this may be of interest).

This short blog post is about a couple of counters which I’ve been monitoring but which look to be not entirely reliable. Both are in the Oracle BI DB Connection Pool:

  • Current Connection Count – The current number of open connections in the thread pool.
  • Current Busy Connection Count – The current number of connections assigned to process a query or processing a query in the DB Connection pool.

A picture tells a thousand words in this case:

OBIEE Connection Pool data

We can clearly see:

  • Current Busy Connection Count (“The current number of connections assigned to process a query or processing a query in the DB Connection pool.”) goes negative!
  • Current Connection Count (“The current number of open connections in the thread pool.”) accumulates. Either the counter is buggy, or there really are thousands of open connections in the thread pool, which sounds worrying in itself.

These two counters reset and correct themselves only on a BI Server restart, as can be seen from the red vertical lines on the graph.

A snapshot of the current figures (via JConsole) backs up these numbers and puts “Current Connection Count” in context next to the ‘Peak’ and ‘Accumulated’ figures:

OBIEE Systems Management data, viewed through JConsole

January 8, 2011

Materialised Views – PCT Partition Truncation

Filed under: bug, DWH, MViews, oracle — rmoff @ 18:12

I’ve been doing some work recently that involved the use of Materialised Views on Oracle 11g (11.1.0.7), particularly around PCT refresh. There are some things that are not clear from the documentation, or are actually bugs so far as I’m concerned, and I’ve detailed these below.

In this example I was working on part of a DWH with c. 2 million rows aggregated up daily. One of the things that I spent a long time trying to get to work was partition truncation when using PCT refresh. We had tried and discarded “FAST” refresh as being completely non-performant for our volumes.

There was something about PCT that confused me for a while, and maybe other thickos like me will also be confused, so I’ll clarify it here: PCT is about tracking the change on partitions of the detail table. It can interact with a partition on an MView, but that is not what its terminology refers to.

Our desired aggregation behaviour was as follows. Given the following:

  • a base fact table (partitioned by day)
  • a weekly aggregation MV (partitioned by week)
  • data loaded daily, MV refreshed daily

we would expect the following to be the most efficient way of refreshing the MV:

  1. When we load a new day’s data, for a new week: a new partition populated on the weekly MV
  2. When we load a new day’s data, for an existing week : truncate the partition on the weekly MV, and reload it (As any fule kno, truncate/insert is going to perform better than delete/insert)
  3. When existing data is updated on the base table: corresponding weekly MV partition truncated and reloaded

From what I read in the documentation, this is what PCT would be expected to do. However, it didn’t work like this for us at first.

The Answer

I’ve discussed these elements in more detail in sections below, but to cut to the chase, here is how we got PCT Partition Truncation to work for a detail table partitioned on day and aggregate table partitioned on week:

  • In the MView definition:
    1. Partition by range or list (n.b. not interval)
    2. Join to reference table for partition key (n.b. don’t use ANSI syntax)
      • Eg. Day->Week aggregation, join to calendar reference table
      • Alternatively partition MView on same partition key as detail table
    3. Define the MView USING NO INDEX to prevent the default global index being created
  • When refreshing the MView:

    1. alter session set "_mv_refresh_costing"='rule_pt_pd_fa_co';
    2. dbms_mview.refresh(
      • atomic_refresh=>false
      • method=>'P' or '?'

Note: this is what we did to get it to work; other versions may differ, particularly around the _mv_refresh_costing parameter and its default value. I presume the standard disclaimer about using undocumented parameters applies too.
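
To make the recipe concrete, here’s a minimal sketch of the shape we ended up with. All object names are hypothetical and the DDL is illustrative rather than lifted from our system; it’s the structure that matters: a day-partitioned detail table, a calendar table supplying the week key, and a week-partitioned MView with no default index.

-- Detail fact table, range-partitioned by day (all names hypothetical)
CREATE TABLE fact_sales (
  day_key   DATE         NOT NULL,
  store_key NUMBER       NOT NULL,
  sales_amt NUMBER(12,2)
)
PARTITION BY RANGE (day_key) (
  PARTITION p20101122 VALUES LESS THAN (DATE '2010-11-23'),
  PARTITION p20101123 VALUES LESS THAN (DATE '2010-11-24')
);

-- Calendar reference table relating days to weeks (the join dependent expression)
CREATE TABLE ref_calendar (
  day_key  DATE   NOT NULL,
  week_key NUMBER NOT NULL
);

-- Weekly aggregate MView: range-partitioned on the week key, USING NO INDEX
-- to suppress the default global I_SNAP$ index, old-style (non-ANSI) join
CREATE MATERIALIZED VIEW mv_week
PARTITION BY RANGE (week_key) (
  PARTITION part_201047 VALUES LESS THAN (201048)
)
BUILD IMMEDIATE
USING NO INDEX
REFRESH FORCE ON DEMAND
AS
SELECT c.week_key, f.store_key, SUM(f.sales_amt) AS sales_amt
FROM   fact_sales f, ref_calendar c
WHERE  f.day_key = c.day_key
GROUP  BY c.week_key, f.store_key;

-- Daily refresh
ALTER SESSION SET "_mv_refresh_costing" = 'rule_pt_pd_fa_co';
EXEC DBMS_MVIEW.REFRESH('MV_WEEK', method => 'P', atomic_refresh => FALSE);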

RTFM!

Links to 11gR1 docs:

What it says in the manual

See here for the documentation regarding PCT partition truncation.
There are three conditions that you must obey for it to work (although see below for a further condition):

  1. The materialized view is partitioned on the partitioning key column or join dependent expressions of the detail table.
    If you think about this it makes a lot of sense. PCT is about tracking change on the partition of the detail table, and so if data in this partition changes, Oracle must be told how this relates to the data in the MView, so that it can refresh just the relevant subset of it.

    • An example of the former (partitioned on the partitioning key of the detail table) is base table and aggregate MV both partitioned on Store, or Warehouse, etc.

      The same partitioning key

    • An example of the latter (join dependent expression) is where you link to a second table in the MV definition, giving Oracle the understanding of the relation between two different partitioning columns (for example, how do weeks relate to days):

      Join Dependent Expression

  2. The materialized view should be range or list partitioned
  3. PCT refresh is nonatomic
    • That is, you specify the atomic_refresh parameter in your refresh call as false:
      EXEC DBMS_MVIEW.REFRESH([…],
                              […],
                              atomic_refresh=>false,
                              […]);
      
    • By default (i.e. if you do not specify it), atomic_refresh=>true

<grumble>
The documentation on all this is there, if you look for it. My personal opinion is that other functionality of Oracle DB is better documented than this, and that DBMS_MVIEW as a whole gives away its long history in the disjointed nature of the documentation. If it were re-written today with a focus on data warehousing I’m confident it would be clearer. Anyway, that’s my excuse for not RTFMing properly…
</grumble>

What it doesn’t say in the manual

Despite having met all the above conditions for PCT partition truncation, it still wasn’t happening. When adding or updating data for an existing weekly partition, we always saw DELETE … INSERT, rather than TRUNCATE … INSERT. The DELETE took ages, as would be expected.

A 10979 trace (see below) on the MView refresh showed this crucial line:

[...]

Value of _mv_refresh_costing : rule

[...]

This undocumented parameter is at the very heart of getting PCT partition truncation. By default it is set to Rule, which means that you will get DELETE … INSERT and never TRUNCATE…INSERT. There is a document on My Oracle Support: PCT REFRESH ISSUES DELETE WHERE IT SHOULD ISSUE TRUNCATE (Doc ID 733673.1).

The suggested solution is to set it to Cost, which instructs Oracle to cost the different options and use the best for the given situation. However, during testing we saw an instance of bad costing that led Oracle to still DELETE instead of TRUNCATE. We therefore opted for forcing the choice through rule, but specified in order of preference:

alter session set "_mv_refresh_costing"='rule_pt_pd_fa_co';

where:

  • pt = PCT – TRUNC (i.e. TRUNCATE … INSERT)
  • pd = PCT – DEL/TRUNC (i.e. DELETE … INSERT)
  • fa = FAST
  • co = COMPLETE

(Credit: “wilhelm2000” on this OTN forum post)

N.B. Despite the ‘TRUNC’ in the name, “PCT – DEL/TRUNC” does not do a TRUNCATE – maybe the name comes from DELETE being used as a kind of TRUNCATE?

[What it kind of says in the manual] – Default global index

Our elation at seeing a TRUNCATE occur was short-lived, because the refresh still took ages to run.

After a bit of head-scratching and more RTFMing, we found that the reason for the slow refresh was a default global index that is created on MViews. Named I_SNAP$_[…], these indexes are used when you do “fast” (incremental) refreshes, using the mv$log tables. However, if you’re only doing PCT refreshes of an MView they are a positive hindrance, because they need rebuilding every time; on an MView with millions of rows this takes a long time.

Here is an abridged trace for a PCT-Truncate refresh of an MView with the default global index in place:

/* MV_REFRESH (IND_UNUSABLE) */ ALTER INDEX "HR"."I_SNAP$_MV_WEEK" UNUSABLE

ALTER TABLE "HR"."MV_WEEK" TRUNCATE  PARTITION PART_20101122 UPDATE GLOBAL INDEXES

/* MV_REFRESH (INS) */ INSERT /*+ SKIP_UNQ_UNUSABLE_IDX APPEND BYPASS_RECURSIVE_CHECK */ INTO[…]

BEGIN  sys.dbms_index_utl.multi_level_build(index_list=>'"HR"."I_SNAP$_MV_WEEK"',just_unusable=>TRUE, cont_after_err=>TRUE, concurrent=>TRUE); END;

By defining the MView using the USING NO INDEX clause, the global index is not created and the PCT-Truncate works much more efficiently:

ALTER TABLE "HR"."MV_WEEK" TRUNCATE PARTITION[…]

/* MV_REFRESH (INS) */ INSERT /*+ APPEND  […]

Refresh method

To get PCT refresh, use method=>’P’

EXEC DBMS_MVIEW.REFRESH(	[…], 
			[…],
			method=>'P',
			[…]);

You can also use method=>'?', where Oracle will try PCT first, and then Complete if PCT is not possible. However, if PCT isn’t possible you may have a problem that you want to know about, rather than silently rebuilding the whole MV each time.
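
Rather than finding out the hard way, you can ask Oracle up front whether PCT is possible for an MView with DBMS_MVIEW.EXPLAIN_MVIEW, which writes its findings to MV_CAPABILITIES_TABLE (created by utlxmv.sql). A quick sketch, against the hypothetical MV_WEEK from earlier:

-- Create MV_CAPABILITIES_TABLE in your schema if you don't already have it
@?/rdbms/admin/utlxmv.sql

EXEC DBMS_MVIEW.EXPLAIN_MVIEW('MV_WEEK');

-- POSSIBLE is Y/N; MSGTXT explains why a capability is unavailable
SELECT capability_name, possible, msgtxt
FROM   mv_capabilities_table
WHERE  capability_name LIKE '%PCT%';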

Bug: Interval Partitioned MView and PCT Partition Truncation

Interval partitioning removes the headache of partition management for new data. Unfortunately, it appears that you can’t interval-partition MViews and refresh them with PCT partition truncation.
If you try a PCT partition truncate refresh of an interval-partitioned MView, you get this error:

ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-00936: missing expression
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2545
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2751
ORA-06512: at "SYS.DBMS_SNAPSHOT", line 2720
ORA-06512: at line 1

You don’t get the error if you do an atomic refresh, but with an atomic refresh you don’t get partition truncation (you get DELETE instead).

Test case: mv_issue_01a.sql

Bug: MView defined with ANSI SQL

This will probably have some people exclaiming “serves you right …”, but ANSI SQL joins (INNER JOIN etc, instead of joining tables in the WHERE clause) are what I was brought up on. I discovered after several frustrating hours that you shouldn’t use them when you define MViews, as they can result in an erroneous “COMPILATION ERROR” state for the MView when data on the base table is changed.
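
For illustration, here’s a sketch of the two styles against the hypothetical tables from earlier; the second, old-style form is the one that behaved correctly for us:

-- ANSI join syntax: avoid this in MView definitions
CREATE MATERIALIZED VIEW mv_week_ansi
REFRESH FORCE ON DEMAND
AS
SELECT c.week_key, SUM(f.sales_amt) AS sales_amt
FROM   fact_sales f
       INNER JOIN ref_calendar c ON f.day_key = c.day_key
GROUP  BY c.week_key;

-- The same MView with the join in the WHERE clause: the safe equivalent
CREATE MATERIALIZED VIEW mv_week_safe
REFRESH FORCE ON DEMAND
AS
SELECT c.week_key, SUM(f.sales_amt) AS sales_amt
FROM   fact_sales f, ref_calendar c
WHERE  f.day_key = c.day_key
GROUP  BY c.week_key;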

There’s a MOS article, Mview Refresh Fails And COMPILE_STATE => COMPILATION_ERROR (Doc ID 1081493.1), which refers to bug 5759944, for which there’s a patch. It’s supposedly fixed in 11.2 but appears to still be present in 11.2.0.1.0.

Test case: mv_issue_02.sql

10979 Tracing

To understand exactly what happens when a MView refresh takes place, you can enable tracing using:

-- enable the trace
alter session set events '10979 trace name context forever';
-- SQL commands to trace go here
 [...]
-- Disable the trace 
alter session set events '10979 trace name context off';
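
The output goes to the session’s trace file in the usual diagnostic destination; on 11g, one way to locate it (an aside of mine, not part of the trace itself) is:

SELECT value
FROM   v$diag_info
WHERE  name = 'Default Trace File';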

Here’s an annotated trace log of a successful PCT partition truncation MView refresh:

First is one of the key bits: is the refresh atomic (atomic_refresh=>true, which is the default) or non-atomic (atomic_refresh=>false, which is required for PCT partition truncation)?

 NON ATOMIC REFRESH
 ON DEMAND REFRESH

 ONLY TRUNCATE based PCT REFRESH possible

Next come the statements that Oracle would execute, for all of the MV refresh methods that it thinks are applicable in this refresh scenario.

Method 4 is a DELETE … INSERT, whilst Method 5 is TRUNCATE … INSERT.
Note that the global index hasn’t been removed from the MView, hence the additional index maintenance commands.

Refresh Method 4
REFRESH Stmt 0
/* MV_REFRESH (DEL) */ DELETE FROM "HR"."MV_WEE[...]
REFRESH Stmt 1
/* MV_REFRESH (INS) */ INSERT /*+ BYPASS_RECURSIVE_CHEC[...]

Refresh Method 5
REFRESH Stmt 0
/* MV_REFRESH (IND_UNUSABLE) */ ALTER INDEX "HR[...]
REFRESH Stmt 1
ALTER TABLE "HR"."MV_WEEK" TRUNCATE  PARTITION [...]
REFRESH Stmt 2
/* MV_REFRESH (INS) */ INSERT /*+ SKIP_UNQ_UNUSABLE_IDX[...]
REFRESH Stmt 3
BEGIN  sys.dbms_index_utl.multi_level_build(index_list=[...]

Watch out for this parameter value – it’s crucial!


Value of _mv_refresh_costing : COST

By default (11.1.0.7) it’s “RULE”, which will *always* do DELETE…INSERT instead of TRUNCATE…INSERT.

If you’re using COST, then the cost of the possible refresh methods will be calculated:

REFRESH STATEMENT
/* MV_REFRESH (DEL) */ DELETE FROM "HR"."MV_WEE[...]
COST = 4.002855
REFRESH STATEMENT
/* MV_REFRESH (INS) */ INSERT /*+ BYPASS_RECURSIVE_CHEC[...]
COST = 2.571476
TOTAL COST OF REFRESH = 6.574331


REFRESH STATEMENT
/* MV_REFRESH (IND_UNUSABLE) */ ALTER INDEX "HR[...]
COST = 0.000000
REFRESH STATEMENT
ALTER TABLE "HR"."MV_WEEK" TRUNCATE  PARTITION [...]
COST = 0.000000
REFRESH STATEMENT
/* MV_REFRESH (INS) */ INSERT /*+ SKIP_UNQ_UNUSABLE_IDX[...]
COST = 2.279282
REFRESH STATEMENT
BEGIN  sys.dbms_index_utl.multi_level_build(index_list=[...]
COST = 0.000000
TOTAL COST OF REFRESH = 2.279282

After costing, or by rule, the method chosen will be stated:


Refresh method picked PCT - TRUNC

Oracle then records the actual statements executed:


 Executed Stmt -
/* MV_REFRESH (IND_UNUSABLE) */ ALTER INDEX "HR[...]
 Executed Stmt -
ALTER TABLE "HR"."MV_WEEK" TRUNCATE  PARTITION [...]
 Executed Stmt -
/* MV_REFRESH (INS) */ INSERT /*+ SKIP_UNQ_UNUSABLE_IDX[...]
 Executed Stmt -
BEGIN  sys.dbms_index_utl.multi_level_build(index_list=[...]

A final word of warning: remember that the trace file is an internal diagnostic and is not for user consumption. I spent a while worrying about this entry in the log:

PARSE ERROR #19:len=12330 dep=1 uid=142 oct=3 lid=142 tim=35441256289311 err=10980
SELECT "R"."BSNS_WK_KEY" "WK_KEY","D"."SOURCE_SYSTEM_KEY" "SOURCE_SYSTEM_KEY","D"."
CLOSE #19:c=0,e=0,dep=1,type=0,tim=35441256289311

Even though there was no error returned to the user session calling the refresh, I figured this must be some problem. However, according to MOS doc Errors In Refresh Snapshot Trace Files ORA-10980 (Doc ID 294513.1), the parse error is raised and cleared internally and is therefore nothing to worry about.

September 2, 2010

Misbehaving Informatica kills Oracle

Filed under: bug, informatica, obia, ORA-28001, oracle, security — rmoff @ 13:23

This problem, which in essence is bad behaviour from Informatica bringing down Oracle, is a good illustration of the unintended consequences of an apparently innocuous security setting.
Per our company’s security standards, database passwords expire every 90 days. When this happens users are prompted to change their password before they can continue logging into Oracle. This applies to application user IDs too.
It appears that Informatica 8.6.1 HF6 (part of OBIA 7.9.6.1) doesn’t handle an expired password well, spawning multiple connections to the database, eventually bringing Oracle down through memory SWAP space exhaustion.
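
For context, the 90-day expiry is just the PASSWORD_LIFE_TIME limit on the account’s profile, something along these lines (profile name hypothetical); a query like the second one below is a cheap way of spotting accounts about to hit the problem:

-- Profile limit behind the 90-day expiry (hypothetical profile name)
ALTER PROFILE app_users LIMIT PASSWORD_LIFE_TIME 90;

-- Spot application accounts due to expire in the next fortnight
SELECT username, account_status, expiry_date
FROM   dba_users
WHERE  expiry_date < SYSDATE + 14;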

As a side note, one of our DBAs has been investigating how to prevent a client connection accidentally (through bad coding) or maliciously (DoS) bringing down Oracle in this way, his findings are documented here.

Investigation

As the introduction to any good IT horror story goes … “Everything was running fine; nothing had changed”.

Then our monitoring showed swap space usage on our Oracle 11g database server increasing, and soon after Oracle crashed. The DBAs restarted Oracle, but shortly afterwards swap space usage was on its way up again. The Unix tool Glance showed a lot of Oracle processes on the box.

Database

Our Informatica repository user expired on the day on which this happened (Aug 27th):

select username, account_status, expiry_date from dba_users;

USERNAME                       ACCOUNT_STATUS                   EXPIRY_DATE
------------------------------ -------------------------------- ---------
INF_REPO                       EXPIRED                          27-AUG-10

When a user ID expires an ORA-28001 is given at login:

sqlplus INF_REPO/password

SQL*Plus: Release 11.1.0.7.0 - Production on Thu Sep 2 08:40:17 2010

Copyright (c) 1982, 2008, Oracle.  All rights reserved.

ERROR:
ORA-28001: the password has expired

Changing password for INF_REPO
New password:
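
The immediate fix is to (re)set the password, which clears the expired state (“new_password” below is a placeholder):

-- Clear the expired state by (re)setting the password
ALTER USER inf_repo IDENTIFIED BY "new_password";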

These are the throughput figures for Oracle from Enterprise Manager; note the Logons rate starting to increase at c. 13:30 (the rate at 03:00 is representative of the usual logon load on the database):

Server

Note the SWAP space increase (dark blue line) at c. midday (n.b. GMT/BST differences mean not all the graphs align):

Database server metrics

Note number of Alive Oracle processes (faint yellow line!):

Database server metrics - Oracle application only

Informatica

In the Informatica exceptions.log and node.log are the initial errors:

ERROR [Master Elect Data Writer] [DOM_10162] An exception occurred while updating the master election row.
java.sql.SQLException: [informatica][Oracle JDBC Driver]No more data available to read.

ERROR [Master Elect Data Writer] [DOM_10162] An exception occurred while updating the master election row.
java.sql.SQLException: [informatica][Oracle JDBC Driver][Oracle]ORA-01034: ORACLE not available
ORA-27101: shared memory realm does not exist
HPUX-ia64 Error: 2: No such file or directory

ERROR [Master Elect Data Writer] [DOM_10162] An exception occurred while updating the master election row.
java.sql.SQLException: [informatica][Oracle JDBC Driver][Oracle]ORA-28001: the password has expired

This is followed by the same error repeated approximately every ten seconds:

ERROR [Master Elect Data Writer] [DOM_10162] An exception occurred while updating the master election row.
java.sql.SQLException: [informatica][Oracle JDBC Driver][Oracle]ORA-28001: the password has expired

There are also final errors in the log, occurring once only, just after midnight:

FATAL [Domain Monitor] [PCSF_10374] Failed to persist [CpuUsageSummary] with error [[informatica][Oracle JDBC Driver]No more data available to read.].
FATAL [Domain Monitor] [PCSF_10374] Failed to persist [RepoUsageSummary] with error [[informatica][Oracle JDBC Driver]No more data available to read.].

After these, Informatica shut down.

Theory

This is what I think is happening:

  • Informatica has a polling component (“[Master Elect Data Writer]”) that updates a database table (part of the Informatica repository) every minute
  • Once the user has expired, Informatica gets ORA-28001: the password has expired when it tries to connect
  • Informatica does not handle ORA-28001 correctly
  • It appears that it leaves the connection open
  • It then retries a few seconds later
  • The connections stack up, each taking swap space allocated to the Oracle process that the connection spawns
  • Eventually the server runs out of resource and Oracle crashes
  • At midnight another Informatica component (“[Domain Monitor]”) tries to update a database table (part of the Informatica repository), and gets the ORA-28001 error.
  • This second component (“[Domain Monitor]”) correctly takes the error as fatal and aborts the Informatica server process

Resolution

In my opinion, Informatica should consistently treat ORA-28001 as fatal.

At the very least, if Informatica isn’t treating ORA-28001 as fatal it should close the connection to the database correctly.


An update from Informatica here

October 21, 2009

Critical Patch Update – OBIEE vuln CVE-2009-1990

Filed under: bug, obiee, security — rmoff @ 11:54

October’s Oracle Critical Patch Update Advisory has been released. There are two vulnerabilities (CVE-2009-1999, CVE-2009-1990) listed under Oracle Application Server for “Component” Business Intelligence Enterprise Edition and one (CVE-2009-3407) for “component” Portal.

  • CVE-2009-1999 is OBIEE and “Fixed in all supported versions. No patch provided in this Critical Patch Update.”.
  • CVE-2009-3407 looks like only OAS (not OBIEE), up to versions 10.1.2.3 and 10.1.4.2.
  • CVE-2009-1990 is OBIEE and is the main vuln of interest. It’s unclear whether it’s just OBIEE 10.1.3.4.x, or all versions of OBIEE up to and including 10.1.3.4.1. It’s also confusing putting it in the same table as OAS, especially given OAS has similar versioning (10.1.3.x.x).

For information about patches, see My Oracle Support Note 881382.1. This doc lists patches 8927890 and 8927886 for OBIEE 10.1.3.4.1 and 10.1.3.4.0 respectively. Since no other versions are mentioned, that suggests it doesn’t affect them; but that’d be a heck of an assumption to make, and if I were running < 10.1.3.4.0 I’d be raising an SR to seek clarification, especially given the ambiguity of the table in the Advisory doc.

The patch (8927890 for 10.1.3.4.1 / 8927886 for 10.1.3.4.0) updates libnqsmetadata and libnqsexecutionlist libraries (dll / so), so installation should be simple (and thus backout too).

Watch out for the pre-reqs on 8927890, which list the same build (10.1.3.4.0.080726.1900) as 8927886, even though it’s supposed to be for 10.1.3.4.1.
You also need to shut down BI Scheduler (nqscheduler), even though only BI Server is named in the readme.txt.

There are no details on the vuln itself that I can find. The READMEs for each patch simply say “This patch fixes the following bug(s)” and list the patch number (8927886 or 8927890). On My Oracle Support there are no results for these bug numbers except a JDEdwards bug (!). On Metalink2 each bug turns up but is not publicly visible.

October 16, 2009

Heads up – Critical Patch Update affecting OBIEE

Filed under: bug, OAS, obiee, security — rmoff @ 09:40

The Critical Patch Update Pre-Release Announcement for October has been published. The pre-release is advance notice of the affected software prior to release of the quarterly Critical Patch Update. It is published on the Thursday prior to the patch releases (which was postponed by a week because of OOW).

It looks like if you’re running OBIEE 10.1.3.4.0 or 10.1.3.4.1 through OAS 10.1.2.3.0/10.1.3.4.0/10.1.3.5.0 then you should check back next Tuesday 20th for details.

Paraphrasing the announcement:

Security vulnerabilities addressed by this Critical Patch Update affect the following products:
[…]
• Oracle Application Server 10g Release 3 (10.1.3), versions 10.1.3.4.0, 10.1.3.5.0
• Oracle Application Server 10g Release 2 (10.1.2), version 10.1.2.3.0
[…]
• Oracle Business Intelligence Enterprise Edition, versions 10.1.3.4.0, 10.1.3.4.1
[…]
Oracle Application Server Executive Summary

This Critical Patch Update contains 3 new security fixes for the Oracle Application Server. 2 of these vulnerabilities may be remotely exploitable without authentication, i.e., may be exploited over a network without the need for a username and password. None of these fixes are applicable to client-only installations, i.e., installations that do not have an Oracle Application Server installed.
[…]
The highest CVSS base score of vulnerabilities affecting Oracle Application Server products is 4.3.

The Oracle Application Server components affected by vulnerabilities that are fixed in this Critical Patch Update are:

* Oracle Business Intelligence Enterprise Edition
[…]

More details from the Oracle Critical Patch Updates and Security Alerts page.

[update 21st October]
Details here; the patch is for BI Server, so presumably the application server is irrelevant.

August 13, 2009

OBIA upgrade 7.9.5 to 7.9.6 – first thoughts

Filed under: bug, dac, obia, oracle — rmoff @ 13:05

We’re upgrading from OBIA 7.9.5 (Financials – GL) to OBIA 7.9.6. Our reasons are for support (7.9.5 does not support Oracle 11g) and minor functionality additions.

Our architecture is: HP-UX 64 bit Itanium (11.31), Oracle 11g (11.1.0.7), separate ETL server, 4x OBIEE servers (2x BI, 2xPS). We have no customisations in the ETL except something for budgets, which is superseded in 7.9.6.

This post is a semi-formed articulation of my frustrations encountered during an initial run through of the upgrade in a sandbox. As we progress with the upgrade I will post further, hopefully more useful, information on what we encounter.

Grumble

Oracle’s upgrade documentation is, in my opinion, not very good. Whilst I find the rest of their OBIA and OBIEE documentation in general clear and comprehensive, the OBIA upgrade document is ambiguous and in places completely lacking in detail.

There is probably also an element of failing to please all the punters all the time… OBIA covers so many different subject areas and so many potential technologies that the upgrade instructions maybe have to be generic.
If we had some highly-customised deployment on some obscure technology set then I would expect to have to work a lot out for myself, so my sense of frustration comes from using the most vanilla of vanilla – Oracle-everything, no customisations – yet still having to figure out so much for myself.

The upgrade document is a compendium of “see xxxx document for more details”. Anyone taking the upgrade half-seriously will invariably end up compiling their own version of instructions, copying-and-pasting the essence of the referenced documents into one place to produce their own upgrade instructions. This is good for understanding the process but leaves me feeling that I’m writing Oracle’s documentation for them. As a side-note to this, the HTML version of the upgrade guide doesn’t use hyperlinking to its many references to the installation guide!

“Upgrading Oracle BI Applications”

Ref: Page 6-1 of “Oracle® Business Intelligence Applications Upgrade Guide for Informatica PowerCenter Users”
This is so patently ridiculous. As I see it, OBIA is basically a combination of OBIEE, Informatica, DAC, etc – all individual components which are upgraded separately. Yet this section of the upgrade doc innocently states “Run the Oracle BI Applications installer to upgrade your Oracle BI Applications environment to the current version”. Simple as that! So once I’ve run the installer, my Oracle BI Applications environment will be at the “current version”, right? Wrong, obviously.
What this actually installs is the bits and pieces that are the “money-shot” of OBIA — pre-built OBIEE RPD, pre-built DAC repository, pre-built Informatica repository.
OBIA is a confusing enough product without stupid documentation like this. Maybe the marketeers or salesmen had a hand in it, but OBIA is either the sum of the constituent parts (apps plus pre-built repositories), or it’s just the pre-built repositories; it can’t be both.
It really should be called “Upgrade OBIA pre-built repositories” or something like that. Or provide an installer which does actually install all of OBIA….

Documentation errors

Out of date screenshots

Ref: p. 4-31 of Oracle® Business Intelligence Applications Installation Guide for Informatica PowerCenter Users
The screenshot is out of date; it hasn’t been updated for 7.9.6. Two category check boxes are missing: “User Data” and “Overwriten log file”.


Ref: To register Informatica Services in DAC / p. 4-33.
The screenshot is out of date; it doesn’t list the new Domain field.


Configuring Informatica Integration service in DAC

The documentation is wrong on this: it does not take into account the DAC changes of the new Domain field, and of Server Hostname being replaced by Service. It also says to modify the record with Name = Oracle_BI_DW_Server, but by default the record has a name of INFORMATICA_DW_SERVER.
Therefore when configuring this you need to set:

  • Service: The name of your Informatica Integration Service, eg. Oracle_BI_DW_Base_Integration_Service
  • Domain: The name of your Informatica Domain, eg. Domain_serverA

The rest of the fields as per the document.

Known bugs

Watch out for: 8723317 – unique constraint (DAC_REPO.W_ETL_SYSPROP_U1) violated when importing metadata into the DAC repository

DAC and Informatica repositories

Your existing Informatica and DAC repositories are not upgraded.

The document says to “Rename and copy” your existing repositories. This comes under “Best practices for preparing to upgrade”. What it should say is you must either use new database schemas when you upgrade, or move your existing ones.
We used Oracle’s export/import to create copies of the existing repository schemas into new schemas (eg DAC_REPO to DAC_REPO_795), and then dropped and recreated the existing repository schema (eg DAC_REPO) so that they were empty to install to.
The upgrade doc simply doesn’t say to do this; instead it misleadingly tells you to rename your Informatica repository, which in the context of an Informatica repository is not possible.
If you don’t do this then you hit problems after the installation of DAC, because you already have an existing repository, which you are prompted to upgrade when you launch the client.
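
For what it’s worth, a sketch of the schema shuffle (names and details hypothetical; the copy itself was done with Oracle’s export/import, remapping DAC_REPO to DAC_REPO_795):

-- After exporting DAC_REPO and importing it as DAC_REPO_795,
-- drop and recreate the original schema, empty, ready for the 7.9.6 install
DROP USER dac_repo CASCADE;
CREATE USER dac_repo IDENTIFIED BY "a_password"
  DEFAULT TABLESPACE dac_data
  QUOTA UNLIMITED ON dac_data;
GRANT connect, resource TO dac_repo;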

[edit] Actually – maybe your repositories are upgraded. It depends. You could upgrade your DAC repo in place and then Refresh Base from the new DAC repository metadata, or you can do it the other way around. You’d do the former if you had lots of customisations and the latter if you didn’t. [edit2] If you do the latter then you import a 7.9.5 version of the DAC repo, which throws lots of errors because tables changed in 7.9.6 with quite a lot of non-nullable columns added. Maybe you upgrade your 7.9.5 repository (you’re prompted to do so when you log in with the new DAC client), then export it, then import the new one and then refresh base on that? Or maybe if you’ve few customisations you just install a fresh 7.9.6 DAC repository and apply your few customisations to it manually?
It’s probably all patently obvious to a DAC/OBIA expert, but it’s about as clear as mud to someone just trying to figure it out from the manuals.[/edit2]

It would be really helpful if this kind of thing didn’t have to be inferred and best-guessed from the documentation. Even if the doc just laid out the two approaches and left it to the user to choose the best. As it is, the documentation states one method, or none at all. [/edit]

In summary …

If you’re doing an upgrade of OBIA then plan PLENTY of time for figuring out how it’ll work in your environment. Set up a sandbox you can play in and roll back; don’t assume you can follow the documentation sequentially and it will all work nicely.

Oracle should be aiming to produce an Upgrade Guide as comprehensive as their Installation Guide. A product, especially an Enterprise one of this scale, is made up of more than its binaries. It is the binaries, the documentation, the support. An upgrade can be traumatic at the best of times – the documentation should give the user confidence to unleash it on their Production environment, not the feeling that one is half-guessing at the solution.


July 23, 2009

sawserver charts crash

Filed under: bug, obiee, sawserver — rmoff @ 15:31

By a strange co-incidence, after following this thread on OTN forums about a BI crash and struggling to understand the actual problem, I think I’ve encountered it myself!

I’ve got a test install of OBIEE running on my Windows XP laptop, and whilst building a report in Answers I got this:

The “Error signature” (brings to mind fountain pens and blotting pads) was:

szAppName : sawserver.exe szAppVer : 10.1.3.4 szModName : kernel32.dll
szModVer : 5.1.2600.3119 offset : 000097a3

Going to the sawserver log at c:\OracleBIData\web\log\sawlog0.log disappointingly showed no error entries 😦

There’s a suggestion here of a bug, but it relates to pivot tables, which I wasn’t using.

[update]
I’ve been able to reproduce this error. It’s on a report with a table and chart. It works fine with the original dataset of 97 rows, but when I remove a filter and 22,000 rows are returned, sawserver barfs when rendering it.

I’ve increased the logging level in sawserver, but still can’t see anything helpful in the log.

It’s definitely something to do with the chart, because I’ve now whittled the report down to just the chart and the large dataset query, and it crashes Presentation Services every time it’s run.

I’ve upped the javahost logging (see [OracleBI]\web\javahost\config\logconfig.txt), nothing to be seen there. I dug around in CORDA, the chart generation software, but failed to get any logs out.

The only bit of firm diagnostic is that when a chart is successfully rendered there are two files created in C:\WINDOWS\TEMP:
Charts_150001_63825.javahost.in
Charts_150001_63825.javahost.out
(where 150001_63825 varies). The .in file is XML, I think as sent to Corda for rendering. The .out file is binary and I assume is the Flash file.
When I run my dodgy report, neither of these two files gets created. For successful charts the javahost log shows:

23-Jul-2009 23:30:33 ChartRpcCall processMessage
INFO: Saved request to C:\WINDOWS\TEMP\Charts_150001_63825.javahost.in. Saving response to C:\WINDOWS\TEMP\Charts_150001_63825.javahost.out

In the javahost log is a “WARNING: Unexpected exception. Connection will be closed” message, presumably from sawserver barfing out and dropping the connection.

So if sawserver “sends” charts to Corda via javahost, and javahost isn’t logging the request (and it’s the sawserver process that’s crashing), the fault is presumably somewhere in sawserver; my guess is that it’s when it tries to render a chart with too many data points.
[/update]

[update 2]
After upgrading to 10.1.3.4.1 I now get this error displayed in Answers when I try to render the chart:

A fatal error occurred while processing the request. The server responded with: Error while executing ChartRpcCall.processMessage
com.siebel.analytics.utils.InputStreamWithLimit$ReadOverTheLimitException
 at com.siebel.analytics.utils.InputStreamWithLimit.incTotalBytes(InputStreamWithLimit.java:58)
 at com.siebel.analytics.utils.InputStreamWithLimit.read(InputStreamWithLimit.java:41)
 at com.siebel.analytics.utils.IOUtils.copyStreams(IOUtils.java:38)
 at com.siebel.analytics.utils.IOUtils.copyStreams(IOUtils.java:28)
 at com.siebel.analytics.web.javahostrpccalls.corda.ChartRpcCall.ensureSafeAttrs(ChartRpcCall.java:84)
 at com.siebel.analytics.web.javahostrpccalls.corda.ChartRpcCall.processUnidi(ChartRpcCall.java:225)
 at com.siebel.analytics.web.javahostrpccalls.corda.ChartRpcCall.processMessageInternal(ChartRpcCall.java:265)
 at com.siebel.analytics.javahost.AbstractRpcCall.processMessage(AbstractRpcCall.java:107)
 at com.siebel.analytics.javahost.MessageProcessorImpl.processMessage(MessageProcessorImpl.java:175)
 at com.siebel.analytics.javahost.Listener$Job.run(Listener.java:223)
 at com.siebel.analytics.javahost.standalone.SAJobManagerImpl.threadMain(SAJobManagerImpl.java:205)
 at com.siebel.analytics.javahost.standalone.SAJobManagerImpl$1.run(SAJobManagerImpl.java:153)
 at java.lang.Thread.run(Thread.java:619)
Error Codes: AGEGTYVF
