rmoff

March 30, 2009

sawserver won’t start (analytics: Servlet error java.net.ConnectException: Connection refused (errno:239))

Filed under: obiee, sawserver, unix — rmoff @ 12:18

We’re getting this error in the Presentation Services plug-in [analytics].
Log file: /j2ee/home/application-deployments/analytics/home_default_group_1/application.log

09/03/30 13:16:38.75 analytics: Servlet error
java.net.ConnectException: Connection refused (errno:239)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:517)
at java.net.Socket.connect(Socket.java:467)
at java.net.Socket.<init>(Socket.java:364)
at java.net.Socket.<init>(Socket.java:178)
at com.siebel.analytics.web.sawconnect.ConnectionPoolSocketFactoryImpl.createSocket(ConnectionPoolSocketFactoryImpl.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at com.siebel.analytics.web.sawconnect.ConnectionPoolSocketFactoryImpl.createSocket(ConnectionPoolSocketFactoryImpl.java:70)
at com.siebel.analytics.web.sawconnect.ConnectionPool.createNewConnection(ConnectionPool.java:314)
at com.siebel.analytics.web.sawconnect.ConnectionPool.getConnection(ConnectionPool.java:133)
at com.siebel.analytics.web.SAWBridge.processRequest(SAWBridge.java:299)
at com.siebel.analytics.web.SAWBridge.doGet(SAWBridge.java:325)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:743)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at com.evermind[Oracle Containers for J2EE 10g (10.1.3.3.0) ].server.http.ServletRequestDispatcher.invoke(ServletRequestDispatcher.java:713)
at com.evermind[Oracle Containers for J2EE 10g (10.1.3.3.0) ].server.http.ServletRequestDispatcher.forwardInternal(ServletRequestDispatcher.java:370)
at com.evermind[Oracle Containers for J2EE 10g (10.1.3.3.0) ].server.http.HttpRequestHandler.doProcessRequest(HttpRequestHandler.java:871)
at com.evermind[Oracle Containers for J2EE 10g (10.1.3.3.0) ].server.http.HttpRequestHandler.processRequest(HttpRequestHandler.java:453)
at com.evermind[Oracle Containers for J2EE 10g (10.1.3.3.0) ].server.http.AJPRequestHandler.run(AJPRequestHandler.java:302)
at com.evermind[Oracle Containers for J2EE 10g (10.1.3.3.0) ].server.http.AJPRequestHandler.run(AJPRequestHandler.java:190)
at oracle.oc4j.network.ServerSocketReadHandler$SafeRunnable.run(ServerSocketReadHandler.java:260)
at oracle.oc4j.network.ServerSocketAcceptHandler.procClientSocket(ServerSocketAcceptHandler.java:239)
at oracle.oc4j.network.ServerSocketAcceptHandler.access$700(ServerSocketAcceptHandler.java:34)
at oracle.oc4j.network.ServerSocketAcceptHandler$AcceptHandlerHorse.run(ServerSocketAcceptHandler.java:880)
at com.evermind[Oracle Containers for J2EE 10g (10.1.3.3.0) ].util.ReleasableResourcePooledExecutor$MyWorker.run(ReleasableResourcePooledExecutor.java:303)
at java.lang.Thread.run(Thread.java:595)


The relevant bit of the stacktrace looks like “com.siebel.analytics.web.sawconnect.ConnectionPoolSocketFactoryImpl.createSocket”, i.e. it’s trying to connect to SAW.

I checked if Presentation Services is running:

ps -ef|grep saw

which it was. However, looking in the /web/log folder the sawserver.out.log file is zero bytes!

I stopped all services including OAS and restarted them:
– OAS comes up fine
– Javahost starts fine
– Presentation services process starts but no log file generated
When I try to connect to Analytics I get a 500 Server Error, and the “analytics: Servlet error java.net.ConnectException: Connection refused (errno:239)” error is logged in the analytics Presentation Services plug-in log file.

Looking at this logically, sawserver is the problem. It’s not starting up – there’s no log and the port 9710 doesn’t get opened up by it.
The strangest thing at this point is that there is no log – there’d normally be at least a “… starting up” type entry, even if nothing else.
Even after increasing the log levels (/web/config/logconfig.xml) to minute levels (100 for each), there is still nothing logged.
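One way to confirm that sawserver never opened port 9710 is to look for a listening socket in the netstat output. This is only a rough sketch: port_listening is a hypothetical helper name, and it assumes that netstat -an prints the local address ending in ".9710" or ":9710" with LISTEN later on the same line, which varies between platforms.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -an`-style text on stdin and succeeds
# if some line shows the given port in LISTEN state.
# Assumption: the local address ends in ".PORT" or ":PORT" and the state
# appears later on the same line; adjust the pattern for your platform.
port_listening() {
    port="$1"
    grep -E "[.:]${port}[[:space:]].*LISTEN" >/dev/null
}

# Usage (9710 is the sawserver port mentioned above):
#   netstat -an | port_listening 9710 && echo "listening" || echo "not listening"
```

If nothing is listening, a "Connection refused" from the plug-in is exactly what you would expect, whatever ps says about the process.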

On a dev box on which nothing’s changed recently (and on which sawserver was running without complaint) I did

run-saw.sh stop

(…wait for a while…)

run-saw.sh start64

and sawserver didn’t come up! This to me points the finger towards the server.

This has now gone to Oracle as an SR, as something is clearly up with sawserver 😦

Update: running gpm (glance for x-windows) I found this:
sawserver64 being reported as “Blocked On” “Other” for 100% of the time.
Not sure what that translates to in real money yet though.

Update: solution here!

Bug in Clustered Publisher Scheduler – ClusterManager: detected 1 failed or restarted instances

Filed under: BI publisher, cluster, quartz — rmoff @ 10:40

Following on from setting up Publisher in a clustered environment, I’ve found a nasty little bug in Quartz, the scheduling element of Publisher.

Looking at the oc4j log file /opmn/logs/default_group~home~default_group~1.log I can see OC4J starting up, and then a whole load of repeated messages:

09/03/30 11:28:43 Oracle Containers for J2EE 10g (10.1.3.3.0) initialized
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
– ClusterManager: detected 1 failed or restarted instances.
– ClusterManager: Scanning for instance "myserver.fqdn.company.net1238408921404"'s failed in-progress jobs.
[… repeated for 38MB worth ]
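With a log flooded like this, counting the repeated messages is quicker than paging through them. A rough sketch, where top_messages is a hypothetical helper name and the file argument is whichever log is affected:

```shell
#!/bin/sh
# Hypothetical helper: print the most frequent distinct lines in a log,
# most common first, each prefixed by its count.
# $1 = log file, $2 = how many lines to show (defaults to 5).
top_messages() {
    sort "$1" | uniq -c | sort -rn | head -n "${2:-5}"
}

# Usage:
#   top_messages /path/to/the/oc4j.log 5
```

This only groups lines that are byte-for-byte identical, which suits these ClusterManager messages because they carry no timestamp.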

Metalink to the rescue: a search for “ClusterManager: Scanning for instance” throws up doc 739623.1 (Repeated Error Appears In Log File – ClusterManager: detected 1 failed or restarted instances), which details the problem and references bug # 7264646.

This is a bug in Quartz (the Publisher scheduling tool), which has been fixed in 1.5.2 (the version that’s included with Publisher is 1.5.1).

On my installation quartz was located in /j2ee/home/applications/xmlpserver/xmlpserver/WEB-INF/lib

Implementing the fix described in Metalink doc 739623.1 solved the problem.

March 27, 2009

ODI Server install – missing odiparams.sh file

Filed under: odi, oui, unix — rmoff @ 14:54

I’m installing ODI agent on our database server using OUI. I selected the “Server” option at install time to get the Agent only, but looking in oracledi/bin odiparams.sh is missing:

$ls -l *.sh
-rwxrwxrwx 1 odiadm dba 685 Nov 21 15:58 agent.sh
-rwxrwxrwx 1 odiadm dba 908 Nov 21 15:58 agentscheduler.sh
-rwxrwxrwx 1 odiadm dba 707 Nov 21 15:58 agentstop.sh
-rwxrwxrwx 1 odiadm dba 941 Nov 21 15:58 agentweb.sh
-rwxrwxrwx 1 odiadm dba 724 Nov 21 15:58 jython.sh

My understanding was that odiparams.sh was necessary, and looking at the code for agent.sh it must be as it includes:

. $ODI_HOME/bin/odiparams.sh

According to the manual, it’s possible to install ODI simply by unzipping the installation folder, so I copied the rest of the bin directory from the original installation.

I then edited the odiparams.sh as required, and the agent started fine.
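Since agent.sh sources odiparams.sh, a quick pre-flight check saves a confusing failure at start-up. This is a minimal sketch; check_odiparams is a hypothetical helper, and it only assumes the $ODI_HOME/bin/odiparams.sh layout shown above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm odiparams.sh is present and readable
# under the given ODI home before any of the agent scripts try to source it.
check_odiparams() {
    params="$1/bin/odiparams.sh"
    if [ -r "$params" ]; then
        echo "found: $params"
    else
        echo "missing: $params" >&2
        return 1
    fi
}

# Usage:
#   check_odiparams "$ODI_HOME" || echo "copy odiparams.sh from another install first"
```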

Remove Windows carriage return (^M) characters in vi

Filed under: unix — rmoff @ 13:12

If you work with a file in both Windows and Unix, at some point you might end up with Windows carriage return characters (displayed as ^M) at the end of each line of your Unix file. It’ll look like this:


one line of text ^M
next line ^M
and next line with more ^M

To remove the ^M character, load the file into vi on unix and enter as a line command the following:

:1,$s/^M//

but instead of typing ^M literally, press Ctrl-V then Ctrl-M to enter the character.

Alternatively, load the file in Windows into Notepad++ and use Format -> Convert to UNIX format, then FTP the file back to Unix.
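The same clean-up can be done from the shell without opening an editor, using the standard tr utility. A minimal sketch (strip_cr is just an illustrative wrapper name, and the file names in the usage line are placeholders):

```shell
#!/bin/sh
# Strip every carriage return (the ^M that vi displays) from stdin
# and write the result to stdout.
strip_cr() {
    tr -d '\r'
}

# Usage (write to a new file; do not redirect onto the input file):
#   strip_cr < dosfile.txt > unixfile.txt
```

Note that this removes every carriage return in the file, not only those at the end of a line, which is normally what you want for a text file that has been through Windows.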

March 25, 2009

ORA-12537 / ORA-12518 [Informatica DAC error CMN_1022]

Filed under: dac, informatica, obia, oracle — rmoff @ 08:44

We’re getting problems with an instance of Informatica / out-of-the-box OBIA on a new set of servers. When we run the execution plan we get this error soon after starting:

MAPPING> DBG_21075 Connecting to database [TNSENTRY], user [MYUSER]
MAPPING> CMN_1761 Timestamp Event: [Tue Mar 24 18:56:33 2009]
MAPPING> CMN_1022 Database driver error…
CMN_1022 [
Database driver error…
Function Name : Logon
ORA-12537: TNS:connection closed

Database driver error…
Function Name : Connect
Database Error: Failed to connect to database using user [MYUSER] and connection string [TNSENTRY].]

MAPPING> CMN_1761 Timestamp Event: [Tue Mar 24 18:56:33 2009]
MAPPING> CMN_1076 ERROR creating database connection.

One or two tasks using the DataWarehouse connection succeed, and then the rest fail with the above error.

That one or two tasks succeed proves that the connection string is specified correctly, plus I’d expect to see an auth error if our username/pw was incorrect. We’ve verified the Physical Data Source in DAC, but stupidly in Informatica (Workflow Manager – Connections – Relational) there’s no “test connection”.

Both of the errors, Informatica’s CMN_1022 and Oracle’s ORA-12537, are generic “somat’s bust” ones, neither providing a clue to what the problem is.

Metalink 3 has several entries for CMN_1022 but they just point to configuration/installation errors with the database connectivity.

There’s a matching article on OTN Forums, but without a definitive solution.

In DAC Physical Data Sources the Max Num Connections is 10. The OTN forum posting refers to performance so, guessing that maybe Oracle wasn’t happy with the number of concurrent connections, I changed it to 1, but the problem remained.

This is on Informatica 8.1.1, Oracle client 10.2.0 and Oracle DB 11.1.0.7.

Our DBA had a look and validated all the connectivity, and also granted the user DBA just to make sure it wasn’t a privileges issue.

I turned on tracing in the Oracle client (add trace_level_client=16 to the sqlnet.ora in $TNS_ADMIN) and got this rather helpful output:

***********************************************************************
Fatal NI connect error 12537, connecting to:
(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=mydb.company.com)(PORT=1521))(CONNECT_DATA=(SID=TNSENTRY)(SERVER=DEDICATED)(CID=(PROGRAM=pmdtm)(HOST=apphost)(USER=unixuser))))

VERSION INFORMATION:
TNS for HPUX: Version 10.2.0.1.0 – Production
TCP/IP NT Protocol Adapter for HPUX: Version 10.2.0.1.0 – Production
Time: 25-MAR-2009 11:09:46
Tracing to file: /app/oracle/product/informatica/server/bin/cli_2844.trc
Tns error struct:
ns main err code: 12537
TNS-12537: TNS:connection closed
ns secondary err code: 12560
nt main err code: 507
TNS-00507: Connection closed
nt secondary err code: 0
nt OS err code: 0

and delving into the guts of the .trc file found:

(11) [25-MAR-2009 11:09:46:011] nsprecv: reading from transport…
(11) [25-MAR-2009 11:09:46:011] nttrd: entry
(11) [25-MAR-2009 11:09:46:100] nttrd: exit
(11) [25-MAR-2009 11:09:46:100] ntt2err: entry
(11) [25-MAR-2009 11:09:46:100] ntt2err: Read unexpected EOF ERROR on 38
(11) [25-MAR-2009 11:09:46:100] ntt2err: exit
(11) [25-MAR-2009 11:09:46:100] nsprecv: error exit
(11) [25-MAR-2009 11:09:46:100] nserror: entry
(11) [25-MAR-2009 11:09:46:101] nserror: nsres: id=0, op=68, ns=12537, ns2=12560; nt[0]=507, nt[1]=0, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0

So maybe it’s the DB server that’s not playing ball? I’m guessing the “Read unexpected EOF ERROR on 38” might be relevant.

Taking the opportunity to learn a bit more about Oracle connectivity, I had a look at Oracle® Database Net Services Administrator’s Guide 10g Release 2 (10.2) – Troubleshooting Oracle Net Services. This details setting up logs and traces, and points to Trace Assistant, trcasst. Running it on one of the trace files from a failed connection reported this:

///////////////////////////////////////////////////////////////
Error found. Error Stack follows for thread #: 11
id:0
Operation code:68
NS Error 1:12537
NS Error 2:12560
NT Generic Error:507
Protocol Error:0
OS Error:0
NS & NT Errors Translation
12537, 00000 "TNS:connection closed"
// *Cause: "End of file" condition has been reached; partner has disconnected.
// *Action: None needed; this is an information message.
/
12560, 00000 "TNS:protocol adapter error"
// *Cause: A generic protocol adapter error occurred.
// *Action: Check addresses used for proper protocol specification. Before
// reporting this error, look at the error stack and check for lower level
// transport errors.For further details, turn on tracing and reexecute the
// operation. Turn off tracing when the operation is complete.
/
00507, 00000 "Connection closed"
// *Cause: Normal "end of file" condition has been reached; partner has
// disconnected.
// *Action: None needed; this is an information message.
/
///////////////////////////////////////////////////////////////

which is the same error as I found in the trace file but with each code explained.
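When there are many trace files to wade through, the error codes can be pulled straight out of the nserror lines. This is a rough sketch that assumes the "ns=..., ns2=...; nt[0]=..." layout seen in the trace extract above; trace_error_codes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the ns/nt error codes from each "nserror: nsres:"
# line of a client trace file, one summary line per error.
# Assumption: the line layout matches the trace extract shown earlier.
trace_error_codes() {
    grep 'nserror: nsres:' "$1" |
        sed -n 's/.*ns=\([0-9]*\), ns2=\([0-9]*\); nt\[0\]=\([0-9]*\).*/ns=\1 ns2=\2 nt=\3/p'
}

# Usage:
#   trace_error_codes cli_2844.trc
```

If Oracle’s oerr utility is available on the box, something like oerr tns 12537 should then give the same cause/action text for each code.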

We tested different permutations of servers:

Inf server A / 10g client -> DB Server A (11g) -> Fails
Inf server A / 10g client -> DB Server Y (11g) -> Success
Inf server B / 10g client -> DB Server B (11g) -> Success
Inf server A / 10g client -> DB Server Z (10g) -> Success
Inf server C / 11g client -> DB Server C (11g) -> Success
Inf Server C / 11g client -> DB Server A (11g) -> Success

So now we have three identical setups (same informatica/oracle client/oracle DB), two of which work, one fails – when run against Server A.

Our DBA ran a trace on the listener on Server A and picked up this error:

TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
HPUX Error: 32: Broken pipe

which points to a possible OS issue.

Ref: Oracle® Database Installation Guide 11g Release 1 (11.1) for HP-UX – 2.7 Configure Kernel Parameters
Ref: Metalink article 550859.1 – TROUBLESHOOTING GUIDE TNS-12518 TNS listener could not hand off client connection

The UNIX team checked the kernel settings between DB Server A and DB Server Y, but found no differences (in particular they checked maxuprc and nproc).

This problem eventually got resolved after two actions:

1) Database server was restarted
2) Oracle PROCESSES was increased from 200 to 500

We suspect the restart fixed the problem as one of the UNIX guys spotted some “performance funnies” (technical term 😉 ) on the box prior to the restart.

March 24, 2009

Which jdbc driver to use

Filed under: BI publisher, jdbc, obiee — rmoff @ 13:14

While setting up the scheduler in Publisher I discovered a useful difference between JDBC drivers.
Our repository is on Oracle 11g.
According to the manual oracle.jdbc.driver.OracleDriver should be used, but previous installations have used oracle.bi.jdbc.AnaJdbcDriver so I tried this too.

In experimenting with both I found you get more useful feedback from the second one. Here’s the same problem reported by both drivers:

· Exception [TOPLINK-4002] (Oracle TopLink – 11g Release 1 (11.1.1.0.0) (Build 080319)): oracle.toplink.exceptions.DatabaseException Internal Exception: java.sql.SQLException: ORA-28000: the account is locked Error Code: 28000

· Exception [TOPLINK-4021] (Oracle TopLink – 11g Release 1 (11.1.1.0.0) (Build 080319)): oracle.toplink.exceptions.DatabaseException Exception Description: Unable to acquire a connection from driver [oracle.bi.jdbc.AnaJdbcDriver], user [OBIEE_PUBL_SCHED] and URL [jdbc:oracle:thin:@dbserver.company.com:1521:ORACLESID]. Verify that you have set the expected driver class and URL. Check your login, persistence.xml or sessions.xml resource. The jdbc.driver property should be set to a class that is compatible with your database platform Internal Exception: java.sql.SQLException: ORA-28000: the account is locked Error Code: 28000

As you can see, the second driver gives you the really useful stuff – which ID and server it’s trying to connect to.

Obviously I can check what’s been configured to trace back which ID and server should be being used – but it’s always useful to get confirmation of what it’s actually doing just to rule out me having been stupid and typed the wrong options 🙂

Clustering Publisher – Scheduler and Report Repository

Filed under: BI publisher, cluster, obiee, quartz — rmoff @ 11:28

The Oracle BI Publisher Enterprise Cluster Deployment doc which I just found through Metalink highlighted a couple of points:
– Report repository should be shared
– The scheduler should be configured for a cluster

Report Repository
Through Admin>System Maintenance>Report Repository I changed the path from the default, /xmlp/XMLP, to an NFS mount, data/shared/xmlp, and restarted the xmlpserver application in OAS. On coming back up, Publisher complained because all its config files (in xmlp/Admin) had disappeared. I’d not moved any of the contents of /xmlp/XMLP, since the name “Report Repository” suggested to me that it was just for reports; ergo, with no reports yet created, there was nothing to move.
So, pedantries aside, I moved the contents of /xmlp/XMLP to my new share, data/shared/xmlp. Publisher was happy after this.

A side effect of config being held in the “Report Repository” path is that when I configured the second BI Publisher server to use this new shared path all of the config I’d done on the first server was applied to the second. I wonder if this is how it’s supposed to work, or there’s going to be server-specific config written to a shared location which will cause problems?

With hindsight, and if the config can be shared like this, then setting up the shared file system first would have been best, and then I’d have only had to configure the one server and the second would have picked it up (for Scheduler changes etc).

Scheduler
I installed the Scheduler schema successfully, and ticked Enable Clustering under Scheduler Properties. Doing some poking around (google for “Enable Clustering” “Scheduler Properties”) I found this page which documents Quartz (used for scheduling in BI Publisher, some more info here). It states:

Enable clustering by setting the “org.quartz.jobStore.isClustered” property to “true”. Each instance in the cluster should use the same copy of the quartz.properties file.

The last sentence of this is reassuring as it describes what I’ve now got with the shared Report Repository folder. Checking data/shared/xmlp/Admin/Scheduler/quartz-config.properties shows that it now includes:

org.quartz.jobStore.isClustered=true
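To confirm from each node that it really sees the same setting, the property can be read back from the shared file. A minimal sketch, assuming simple key=value lines with no line continuations; get_prop is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key from a Java-style properties
# file. Handles plain "key=value" lines only (no continuations or escapes).
# $1 = properties file, $2 = key.
get_prop() {
    # Escape the dots in the key so they match literally in the sed pattern.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Usage, run on each server in the cluster:
#   get_prop data/shared/xmlp/Admin/Scheduler/quartz-config.properties org.quartz.jobStore.isClustered
```

If the key appears more than once, the last occurrence is printed.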

Metalink Metalink Metalink

Filed under: metalink, oracle, support — rmoff @ 09:21

I’m learning about a lot of the Oracle BI stack through reading manuals and trial-and-error.
One thing I’ve realised is that Metalink holds a whole heap of useful information.
For example, simply searching on “publisher cluster” throws up these two very pertinent docs:

  • OBIEE Clustered Installation with BI Publisher (Doc ID 744515.1)
  • BI Publisher does not accept cluster jdbc connection strings (Doc ID 559795.1)

The first one is a publicly available PDF; the second one is the answer to the problem I spent more time than I needed to scratching my head over yesterday.

My attitude until now has been to reach for Metalink after I’ve become stuck with a problem – I am now going to try searching it as a matter of course when starting out on any new task.

All I need now is an interface that searches both Metalink and Metalink 3. It seems that some docs and bugs are in both, and some only in one or the other 😦

Firefox add-ins – ones I find useful

Filed under: firefox — rmoff @ 08:40

  • Delicious Bookmarks
  • FireGestures
  • FoxyProxy
  • Greasemonkey
  • HttpFox
  • Screengrab
  • User Agent Switcher

March 23, 2009

Finding config files in unix

Filed under: BI publisher, unix — rmoff @ 16:09

Following my previous work on configuring Publisher, I wanted to note down where the changes were written to.

The -mtime option of the unix find command comes in handy here:

find /app/oracle/product/obiee -mtime -1

This shows me all files under the specified path that were modified within the last 24 hours

and helpfully throws up:

/app/oracle/product/obiee/xmlp/XMLP/Admin/DataSource/datasources.xml
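On a busy install the plain find also lists directories and anything else touched in the window, so it can help to restrict the search to regular files. A small sketch using only standard find options; recently_modified is an illustrative wrapper name.

```shell
#!/bin/sh
# List regular files under a directory that were modified within the last
# N days (default 1), leaving out the directories themselves.
# $1 = directory to search, $2 = number of days.
recently_modified() {
    find "$1" -type f -mtime "-${2:-1}"
}

# Usage:
#   recently_modified /app/oracle/product/obiee 1
```

Running it straight after making a change in the admin screens keeps the list short enough to spot the config file that was written.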
