Monday, August 18, 2014

kgxgncin: CLSS init failed with status 3

I had a problem with an Oracle 11.1.0.6.0 database producing trace files every few minutes. Here is one of them:
 
Trace file /oracle/app/oracle/product/diag/rdbms/ecopwpr1/ECOPWPR1/trace/ECOPWPR1_ora_15634.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.1.0
System name:    SunOS
Node name:      sge-mktred-spo2
Release:        5.10
Version:        Generic_142900-09
Machine:        sun4u
Instance name: ECOPWPR1
Redo thread mounted by this instance: 1
Oracle process number: 58
Unix process pid: 15634, image: oracle@sge-mktred-spo2
 
*** 2014-07-08 10:18:36.261
*** SESSION ID:(99.22383) 2014-07-08 10:18:36.261
*** CLIENT ID:() 2014-07-08 10:18:36.261
*** SERVICE NAME:(SYS$USERS) 2014-07-08 10:18:36.261
*** MODULE NAME:(JDBC Thin Client) 2014-07-08 10:18:36.261
*** ACTION NAME:() 2014-07-08 10:18:36.261
 
clsc_connect: (109c5b200) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_sge-mktred-spo2_))
2014-07-08 10:18:36.261: [ CSSCLNT]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_sge-mktred-spo2_)), rc 9
kgxgncin: CLSS init failed with status 3
 
I looked for the session afterwards but it had gone:
 
SQL> l
  1  select username from v$session
  2* where sid = 99 and serial# = 22383
SQL> /
 
no rows selected
 
SQL>
 
I started tracing all sessions in the database:
 
SQL> alter system set sql_trace = true
  2  /
 
System altered.
 
SQL>
 
I saw that the trace file directory was quickly filling up with large files so I did the following:
 
SQL> l
  1  alter system
  2* set max_dump_file_size = '1M'
SQL> /
 
System altered.
 
SQL>
 
N.B. If you just give a numeric value without a K or M suffix then Oracle interprets it as a number of operating system blocks so you need to be careful with this parameter.
 
I repeated the following command in the trace file directory every few seconds:
 
Solaris >
grep "kgxgncin: CLSS init failed with status 3" *trc
Solaris >
 
… until a trace file appeared with the error (I didn’t have to wait for long):
 
Solaris > grep "kgxgncin: CLSS init failed with status 3" *trc
ECOPWPR1_ora_23488.trc:kgxgncin: CLSS init failed with status 3
Solaris >
 
I stopped tracing sessions:
 
SQL> alter system set sql_trace = false;
 
System altered.
 
SQL>
 
I looked in the trace file and saw that the error was being produced by the following SQL:
 
=====================
PARSING IN CURSOR #4 len=73 dep=0 uid=76 oct=3 lid=76 tim=2030251677801 hv=3508623573 ad='3af1e7140' sqlid='f25dysg8k2q6p'
select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from V$ASM_DISKGROUP
END OF STMT
PARSE #4:c=0,e=104,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=2030251677798
EXEC #4:c=0,e=66,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=2030251677969
clsc_connect: (109c5f460) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_sge-mktred-spo2_))
2014-07-08 11:05:25.293: [ CSSCLNT]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_sge-mktred-spo2_)), rc 9
kgxgncin: CLSS init failed with status 3
FETCH #4:c=10000,e=7591,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=2030251686303
STAT #4 id=1 cnt=0 pid=0 pos=1 obj=0 op='FIXED TABLE FULL X$KFGRP (cr=0 pr=0 pw=0 time=0 us cost=0 size=95 card=1)'
=====================
 
I looked for the username in the trace file but could not see it so I found it like this:
 
SQL> l
  1  select parsing_schema_name
  2  from v$sql
  3* where sql_text = 'select GROUP_NUMBER,NAME,STATE,TYPE,TOTAL_MB,FREE_MB from V$ASM_DISKGROUP'
SQL> /
 
PARSING_SCHEMA_NAME
------------------------------
OEMGRIDDBA
OEMGRIDDBA
 
SQL>
 
I realized it might be difficult to change the SQL which Oracle Enterprise Manager runs. I confirmed that I could reproduce the problem by running the SQL in a stand-alone SQL*Plus session both in the database concerned and in two other Oracle 11.1.0.6 databases. I confirmed that I could NOT reproduce the problem in an Oracle 11.2.0.1 database so I assumed that it was caused by an Oracle bug. To work around the problem, I set up a job to delete these trace files after a few days.  

No comments: