Wednesday, April 15, 2015

COMMIT Causes ORA-03113

A colleague had a problem with an Oracle 11.2.0.1 database today. It affected the following table:

SQL> desc bepe.sd_national_holiday
Name                       Null?    Type
-------------------------- -------- ------------------
SD_NATIONAL_HOLIDAY_ID     NOT NULL NUMBER(15)
DAY_DATE                            DATE
DESCRIPTION                         VARCHAR2(500)
CREATED_BY                 NOT NULL VARCHAR2(100)
CREATION_DATE              NOT NULL TIMESTAMP(6)
ACTIVITY_BY                NOT NULL VARCHAR2(100)
ACTIVITY_DATE              NOT NULL TIMESTAMP(6)
TCN                        NOT NULL NUMBER
 
SQL>

The table was empty:

SQL> select count(*) from bepe.sd_national_holiday
  2  /
 
  COUNT(*)
----------
         0
 
SQL>

… but, when I tried to add data to it, the COMMIT, which followed, failed with an ORA-03113. Typing EXIT, without doing a COMMIT, caused the same problem:

SQL> insert into bepe.sd_national_holiday
  2  (day_date) values ('15-APR-2015')
  3  /
 
1 row created.
 
SQL> commit
  2  /
commit
     *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 7037
Session ID: 133 Serial number: 6339
 
SQL> conn /
Connected.
SQL> insert into bepe.sd_national_holiday
  2  (day_date) values ('15-APR-2015')
  3  /
 
1 row created.
 
SQL> exit
ERROR:
ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 133 Serial number: 6343
 
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options (with complications)
APXDEV1 /export/home/oracle/andrew >

The table remained empty:

SQL> l
  1  select count(*)
  2* from bepe.sd_national_holiday
SQL> /
 
  COUNT(*)
----------
         0
 
SQL>

… and trying to move it did not work either:

SQL> l
  1  alter table bepe.sd_national_holiday
  2* move tablespace apex_13482906934262524
SQL> /
alter table bepe.sd_national_holiday
                 *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 11685
Session ID: 156 Serial number: 21804
 
SQL>

I was able to reproduce this error again and again. I thought there might be some corruption in the underlying data file but dbverify did not show anything. I looked at the messages in the alert log:

ORA-07445: exception encountered: core dump [qesrcRseObj_Invalidate()+4] [SIGSEGV] [ADDR:0x18] [PC:0x1029AFAC4] [Address not mapped to object] []

The closest match I could find for these on My Oracle Support was unpublished bug 9025861. In Oracle 11.2, results from functions run against tables can be cached. Then Oracle can access these cached values rather than rerun the functions. The suggested fix for this bug is to upgrade to Oracle 11.2.0.2 or set result_cache_max_size to zero, which stops cached results being used. I set the parameter to zero:

SQL> l
  1  select value from v$parameter
  2* where name = 'result_cache_max_size'
SQL> /
 
VALUE
----------
2621440
 
SQL> alter system set result_cache_max_size = 0
  2  /
 
System altered.
 
SQL>

… and the problem went away:

SQL> insert into bepe.sd_national_holiday
  2  (day_date) values ('15-APR-2015')
  3  /
 
1 row created.
 
SQL> commit
  2  /
 
Commit complete.
 
SQL>

Strangely enough, resetting the parameter to its previous value did not make the problem return:

SQL> alter system set result_cache_max_size = 2621440
  2  /
 
System altered.
 
SQL> select value from v$parameter
  2  where name = 'result_cache_max_size'
  3  /
 
VALUE
----------
2621440
 
SQL> insert into bepe.sd_national_holiday
  2  (day_date) values ('16-APR-2015')
  3  /
 
1 row created.
 
SQL> commit
  2  /
 
Commit complete.
 
SQL> 

However, this is not a permanent cure. I did the same thing just over a month ago and the problem went away then too but, as you can see, it returned today for no apparent reason.

We noticed that an Oracle 11.2.0.2 database, which is being used for similar work, has not been affected by this problem so upgrading to this version might be the safest option.

Monday, April 13, 2015

statement suspended, wait error to be cleared

This happened in an Oracle 11.2.0.4 database. A colleague complained of poor performance. I looked to see what his session was waiting for:

SQL> l
  1  select event
  2  from v$session_wait
  3  where sid = 226
  4* and state = 'WAITING'
SQL> /
 
EVENT
----------------------------------------------------------------
statement suspended, wait error to be cleared
 
SQL>

I looked in the alert log and saw the following error:

statement in resumable session 'User WORK(105), Session 226, Instance 1' was suspended due to
    ORA-01653: unable to extend table WORK.WRK_CONSUMPTION_NHH by 8192 in tablespace MI_DATA

I increased the size of one of the tablespace’s datafiles:

SQL> l
  1  alter database datafile
  2  '/cogdbase/ecocog5/eco_data/mi_data2.dbf'
  3* resize 31g
SQL> /
 
Database altered.
 
SQL>

Then I looked in the alert log and saw that the statement had started working again:

Mon Apr 13 16:23:19 2015
Completed: alter database datafile
'/cogdbase/ecocog5/eco_data/mi_data2.dbf'
resize 31g
Mon Apr 13 16:23:19 2015
statement in resumable session 'User WORK(105), Session 226, Instance 1' was resumed

Finally, I checked the session and it was waiting for a different event:

SQL> l
  1  select event
  2  from v$session_wait
  3  where sid = 226
  4* and state = 'WAITING'
SQL> /
 
EVENT
----------------------------------------------------------------
db file sequential read
 
SQL>

1st November 2016:
Checked for relevance.
Shared on LinkedIn.

Thursday, April 09, 2015

enq: TM - contention

This example was tested in an Oracle 11.1 database. I created a DEPT (or parent) table, added a couple of departments then made it available to other database users:

SQL> conn andrew/reid
Connected.
SQL> create table dept
  2  (dept_code varchar2(2),
  3   dept_desc varchar2(10))
  4  /
 
Table created.
 
SQL> insert into dept
  2  (dept_code, dept_desc)
  3  values
  4  ('10','IT')
  5  /
 
1 row created.
 
SQL> insert into dept
  2  (dept_code, dept_desc)
  3  values
  4  ('20','Sales')
  5  /
 
1 row created.
 
SQL> select * from dept
  2  /
 
DEPT_CODE  DEPT_DESC
---------- ----------
10         IT
20         Sales
 
SQL> grant all on dept to public
  2  /
 
Grant succeeded.
 
SQL>

I created an EMP (or child) table and tried to cross-check the EMP_DEPT column against the DEPT_CODE column in the DEPT table. This failed with an ORA-02270 as DEPT_CODE in the DEPT table was not unique. This was quite understandable because, if I had joined the EMP and DEPT tables to get the description (DEPT_DESC) for each employee’s department, I needed to be sure that only one value would be returned. I have covered this before but decided it would do no harm to repeat it here. I made the DEPT_CODE column unique in the DEPT table and the problem went away. I added two names to the EMP table with misspelt names (you will see why later). Then I made the table available to other database users:

SQL> create table emp
  2  (emp_name varchar2(10),
  3   emp_dept varchar2(2),
  4   constraint fk1
  5   foreign key(emp_dept)
  6   references dept(dept_code))
  7  /
references dept(dept_code))
                 *
ERROR at line 6:
ORA-02270: no matching unique or primary key for this
column-list
 
SQL> alter table dept
  2  add constraint dept_no_pk
  3  unique(dept_code)
  4  /
 
Table altered.
 
SQL> create table emp
  2  (emp_name varchar2(10),
  3   emp_dept varchar2(2),
  4   constraint fk1
  5   foreign key(emp_dept)
  6   references dept(dept_code))
  7  /
 
Table created.
 
SQL> insert into andrew.emp
  2  (emp_name, emp_dept)
  3  values
  4  ('Dayvid', '10')
  5  /
 
1 row created.
 
SQL> insert into andrew.emp
  2  (emp_name, emp_dept)
  3  values
  4  ('Edwood', '10')
  5  /
 
1 row created.
 
SQL> grant all on emp to public
  2  /
 
Grant succeeded.
 
SQL> 

USERA logged into the database, noticed one of the spelling mistakes, corrected it but did not commit the change:

SQL> conn usera/usera
Connected.
SQL> select * from andrew.emp
  2  /
 
EMP_NAME   EMP_DEPT
---------- ----------
Dayvid     10
Edwood     10
 
SQL> update andrew.emp
  2  set emp_name = 'David'
  3  where emp_name = 'Dayvid'
  4  /
 
1 row updated.
 
SQL> 

USERB logged into the database and tried to change the DEPT_CODE for Sales from 20 to 30 but nothing appeared to happen:

SQL> conn userb/userb
Connected.
SQL> update andrew.dept
  2  set dept_code = '30'
  3  where dept_desc = 'Sales'
  4  / 

USERB reported the problem to a DBA who investigated it as follows:

SQL> conn / as sysdba
Connected.
SQL> select event, state from v$session
  2  where username = 'USERB'
  3  /
 
EVENT                          STATE
------------------------------ -------------------
enq: TM - contention           WAITING
 
SQL>

This showed her that the problem was caused by a foreign key locking issue so she looked at the locks in Enterprise Manager:

 
This showed that the child table was EMP.
 
Eventually, USERA committed his change:

SQL> conn usera/usera
Connected.
SQL> select * from andrew.emp
  2  /
 
EMP_NAME   EMP_DEPT
---------- ----------
Dayvid     10
Edwood     10
 
SQL> update andrew.emp
  2  set emp_name = 'David'
  3  where emp_name = 'Dayvid'
  4  /
 
1 row updated.
 
SQL> commit
  2  /
 
Commit complete.
 
SQL>

And USERB’s transaction finished moments later so he was able to commit it too:

SQL> conn userb/userb
Connected.
SQL> update andrew.dept
  2  set dept_code = '30'
  3  where dept_desc = 'Sales'
  4  /
 
1 row updated.
 
SQL> commit
  2  /
 
Commit complete.
 
SQL>

The DBA looked to see how the EMP (child) table had been created. (I won’t go into how she knew Andrew’s password):

SQL> conn andrew/reid
Connected.
SQL> select dbms_metadata.get_ddl('TABLE','EMP')
  2  from dual
  3  /
 
DBMS_METADATA.GET_DDL('TABLE','EMP')
--------------------------------------------------------------------------------
 
  CREATE TABLE "ANDREW"."EMP"
   (    "EMP_NAME" VARCHAR2(10),
        "EMP_DEPT" VARCHAR2(2),
         CONSTRAINT "FK1" FOREIGN KEY ("EMP_DEPT")
          REFERENCES "ANDREW"."DEPT" ("DEPT_CODE") ENABLE
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
  TABLESPACE "USERS"
 
SQL> 

She noticed that the EMP_DEPT column was used for referential integrity checking against the DEPT table. For this to work efficiently, there needs to be an index on this column but Oracle does not enforce this requirement. The DBA looked for indexes on the EMP table but did not find any:

SQL> l
  1  select index_name, column_name, column_position
  2  from dba_ind_columns
  3  where table_owner = 'ANDREW'
  4  and table_name = 'EMP'
  5* order by 1,3
SQL> /
 
no rows selected
 
SQL>

She created the appropriate index:

SQL> l
  1  create index andrew.emp_ind1
  2* on andrew.emp(emp_dept)
SQL> /
 
Index created.
 
SQL> 

USERA noticed the other spelling mistake, corrected it but did not commit the change:

SQL> conn usera/usera
Connected.
SQL> select * from andrew.emp
  2  /
 
EMP_NAME   EMP_DEPT
---------- ----------
David      10
Edwood     10
 
SQL> update andrew.emp
  2  set emp_name = 'Edward'
  3  where emp_name = 'Edwood'
  4  /
 
1 row updated.
 
SQL> 

USERB decided to change the DEPT_CODE for the Sales department to 40 and this time, the update worked immediately:

SQL> conn userb/userb
Connected.
SQL> update andrew.dept
  2  set dept_code = '40'
  3  where dept_desc = 'Sales'
  4  /
 
1 row updated.
 
SQL>

Thursday, April 02, 2015

One Cause of ORA-01092

I saw the following error in a job which had just recreated the control file of a test database after cloning:

SQL>   2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   3
2   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63
  64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80   81   82   83   84   85   86   87   88   89   90   91   92   93   94
95   96   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112  113  114  115  116  117  118  119  120  121  122  123  124  125  126
  127  128  129  130  131  132  133  134  135  136  137
Control file created.
 
SQL> ALTER DATABASE OPEN RESETLOGS
*
ERROR at line 1:
ORA-01092: ORACLE instance terminated. Disconnection forced
 
SQL>

I looked in the alert log and noticed the errors below:

Wed Apr  1 19:30:46 2015
Errors in file /oracle/app/oracle/product/10.2.0/admin/mrmdpt1/udump/mrmdpt1_ora_8712.trc:
ORA-00600: internal error code, arguments: [25025], [115], [], [], [], [], [], []
Wed Apr  1 19:30:47 2015
Errors in file /oracle/app/oracle/product/10.2.0/admin/mrmdpt1/udump/mrmdpt1_ora_8712.trc:
ORA-00600: internal error code, arguments: [25025], [115], [], [], [], [], [], []
Wed Apr  1 19:30:47 2015
Error 600 happened during db open, shutting down database
USER: terminating instance due to error 600
Instance terminated by USER, pid = 8712
ORA-1092 signalled during: ALTER DATABASE OPEN RESETLOGS...

I checked the SQL which had recreated the control file. I saw that one of the datafiles from the source database was missing. The missing file was called '/agasprd/mrmprod/mrm_tables3/undo_1a.dbf'. I checked its file_id in the source database:

SQL> l
  1  select file_id from dba_data_files
  2  where file_name =
  3* '/agasprd/mrmprod/mrm_tables3/undo_1a.dbf'
SQL> /
 
   FILE_ID
----------
       115
 
SQL>

I noticed that it matched the 2nd parameter after the ORA-00600 (i.e. [115]). If this happens to you, you should correct the SQL so that it includes all the source database’s datafiles then repeat the cloning process from the start.