Monday, September 10, 2012

Excessive CPU Usage in Oracle 11.2.0.1.0

I had a problem with excessive CPU usage in Oracle 11.2.0.1.0. One piece of SQL in particular, which runs quickly on other versions, was taking up to 30 seconds to finish. While it was running, two waits could be seen in the database:

SQL> l
  1  select sid, event, seconds_in_wait
  2  from v$session_wait
  3  where wait_class <> 'Idle'
  4* order by sid, event
SQL> /
 
   SID EVENT                          SECONDS_IN_WAIT
------ ------------------------------ ---------------
   156 SQL*Net message to client                    0
 
SQL> /
 
   SID EVENT                          SECONDS_IN_WAIT
------ ------------------------------ ---------------
   101 SQL*Net more data from client                7
   156 SQL*Net message to client                    0
 
SQL> /
 
   SID EVENT                          SECONDS_IN_WAIT
------ ------------------------------ ---------------
   101 asynch descriptor resize                     3
   156 SQL*Net message to client                    0
 
SQL> /
 
   SID EVENT                          SECONDS_IN_WAIT
------ ------------------------------ ---------------
   101 asynch descriptor resize                     7
   156 SQL*Net message to client                    0
 
SQL> /
 
   SID EVENT                          SECONDS_IN_WAIT
------ ------------------------------ ---------------
   101 asynch descriptor resize                    10
   156 SQL*Net message to client                    0
 
SQL> /
 
   SID EVENT                          SECONDS_IN_WAIT
------ ------------------------------ ---------------
   156 SQL*Net message to client                    0
 
SQL>
 
At this stage, I was not concerned with the SQL*Net more data from client event, just the asynch descriptor resize event. I looked on My Oracle Support and decided it could be caused by bug 9829397. Oracle has various patches available to fix the problem. The workaround is to set disk_asynch_io to false.
 
According to Oracle:
 
DISK_ASYNCH_IO controls whether I/O to datafiles, control files, and logfiles is asynchronous (that is, whether parallel server processes can overlap I/O requests with CPU processing during table scans). If your platform supports asynchronous I/O to disk, Oracle recommends that you leave this parameter set to its default value. However, if the asynchronous I/O implementation is not stable, you can set this parameter to false to disable asynchronous I/O. If your platform does not support asynchronous I/O to disk, this parameter has no effect.
 
First I checked that it was set to true:
 
SQL> l
  1  select value from v$parameter
  2* where name = 'disk_asynch_io'
SQL> /
 
VALUE
--------------------
TRUE
 
SQL>
 
Then I set it to false to check if this was causing my problem (I had to bounce the database to do so):
 
SQL> l
  1  select value from v$parameter
  2* where name = 'disk_asynch_io'
SQL> /
 
VALUE
--------------------
FALSE
 
SQL>
 
Then I read elsewhere in the Oracle documentation that:
 
If you set DISK_ASYNCH_IO to false, then you should also set DBWR_IO_SLAVES to a value other than its default of zero in order to simulate asynchronous I/O.
 
So I set it to 1 (I had to bounce the database to do this):
 
SQL> l
  1  select value from v$parameter
  2* where name = 'dbwr_io_slaves'
SQL> /
 
VALUE
----------
1
 
SQL>
 
After this, the performance was much better and the worst performing piece of SQL took an average of 3 seconds to run instead of 30. This showed that I had identified the problem correctly. However, as a long term solution it might be better to move to a version of Oracle which is not affected by this problem at all.

No comments:

Post a Comment