Thursday, November 08, 2012

How I Fixed a Long Running Delete Statement in Oracle

I was asked to look at a long running DELETE statement recently. Once I had solved the problem, I decided to record it for future reference. I set up data to reproduce the problem in an Oracle 11.2.0.2.7 test database and ran it as follows:

SQL> conn andrew/reid
Connected.
SQL> select to_char(sysdate,'DD-MON-YY HH24:MI:SS')
  2  start_time from dual
  3  /

START_TIME
------------------
25-OCT-12 18:38:00

SQL> alter session set sql_trace = true
  2  /

Session altered.

SQL> delete t1
  2  /

1000000 rows deleted.

SQL> alter session set sql_trace = false
  2  /

Session altered.

SQL> select to_char(sysdate,'DD-MON-YY HH24:MI:SS')
  2  finish_time from dual
  3  /

FINISH_TIME
------------------
26-OCT-12 07:37:07

SQL>
 
Deleting 1 million rows would not normally take almost 13 hours so there was something else going on. I ran the trace file through tkprof with sys=no and saw the following in the output file:
 
********************************************************************************
 
SQL ID: 9mk1dmrqf9dv8 Plan Hash: 784337829
 
delete t1
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          3          0           0
Execute      1    604.16     795.65       3645       1624    6062716     1000000
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2    604.17     795.66       3645       1627    6062716     1000000
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 126  (ANDREW)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  DELETE  T1 (cr=1561304066 pr=3714 pw=0 time=3797440855 us)
   1000000    1000000    1000000   TABLE ACCESS FULL T1 (cr=1562 pr=1467 pw=0 time=18489606 us cost=461 size=13097435 card=1007495)
 
Rows     Execution Plan
-------  ---------------------------------------------------
      0  DELETE STATEMENT   MODE: ALL_ROWS
      0   DELETE OF 'T1'
1000000    TABLE ACCESS   MODE: ANALYZED (FULL) OF 'T1' (TABLE)
 
********************************************************************************
 
According to this, the deletion had only used 10 minutes of CPU time, which did not explain the 13 hour run time. At the end of the file I saw this, which gave me a clue:
 
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
 
call      count       cpu    elapsed       disk      query    current        rows
-------  ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       120      0.00       0.01          0          0          6           0
Execute 1000129  42414.03   45950.87         60 1561302066    1038509     1000000
Fetch       270      0.01       0.58          9        478          0         147
-------  ------  -------- ---------- ---------- ---------- ----------  ----------
total   1000519  42414.04   45951.47         69 1561302544    1038515     1000147
 
Recursive statements are commands run automatically by Oracle so I ran the tkprof again, this time without the sys=no. An extra entry appeared in the output file and it accounted for the excessive run time:
 
********************************************************************************
 
SQL ID: dm6vj6u32r27r Plan Hash: 1813935393
 
delete from "ANDREW"."T2"
where
"COL1" = :1
 
call      count       cpu    elapsed       disk      query    current        rows
-------  ------  -------- ---------- ---------- ---------- ----------  ----------
Parse         1      0.00       0.00          0          0          0           0
Execute 1000000  42413.99   45950.84         60 1561302066    1038509     1000000
Fetch         0      0.00       0.00          0          0          0           0
-------  ------  -------- ---------- ---------- ---------- ----------  ----------
total   1000001  42413.99   45950.84         60 1561302066    1038509     1000000
 
Misses in library cache during parse: 1
Misses in library cache during execute: 2
Optimizer mode: CHOOSE
Parsing user id: SYS   (recursive depth: 1)
Number of plan statistics captured: 1
 
Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  DELETE  T2 (cr=1562 pr=1 pw=0 time=34707 us)
         0          0          0   TABLE ACCESS FULL T2 (cr=1562 pr=1 pw=0 time=34700 us)
 
********************************************************************************
 
I had not run this statement so something was causing Oracle to execute it automatically one million times. I thought it might be some sort of referential integrity checking so I looked for indexes on table T1:
 
SQL> l
  1  select index_name, column_name, column_position
  2  from dba_ind_columns
  3  where table_owner = 'ANDREW'
  4* and table_name = 'T1'
SQL> /
 
INDEX_NAME      COLUMN_NAME     COLUMN_POSITION
--------------- --------------- ---------------
PK1             COL1                          1
 
SQL>
 
Then I checked to see if any tables were referring to index PK1:
 
SQL> l
  1  select owner, table_name
  2  from dba_constraints
  3* where r_constraint_name = 'PK1'
SQL> /
 
OWNER           TABLE_NAME
--------------- ---------------
ANDREW          T2
 
SQL>
 
... and looked up the DDL used to create this table:
 
SQL> l
  1* select dbms_metadata.get_ddl('TABLE','T2','ANDREW') from dual
SQL> /
 
  CREATE TABLE "ANDREW"."T2"
   (    "COL1" NUMBER,
         CONSTRAINT "CHECK_NUMBER1" FOREIGN KEY ("COL1")
          REFERENCES "ANDREW"."T1" ("COL1") ON DELETE CASCADE ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS" ;
 
SQL>
 
This showed referential integrity checking between tables T1 and T2. Each time a row was deleted from table T1, the corresponding row(s) had to be removed from table T2. This was the reason for the 1 million executions of  delete from "ANDREW"."T2" where "COL1" = :1 in the tkprof output above.
 
I looked for indexes on table T2 but there were none:
 
SQL> l
  1  select index_name
  2  from dba_indexes
  3  where owner = 'ANDREW'
  4* and table_name = 'T2'
SQL> /
 
no rows selected
 
SQL>
 
This explained the full table scans when rows were deleted from table T2. I created an index on table T2 and reran the test. This time the deletion took a little over 3 minutes:
 
SQL> create index t2_index on t2(col1)
  2  /
 
Index created.
 
SQL> select to_char(sysdate,'DD-MON-YY HH24:MI:SS')
  2  start_time from dual
  3  /
 
START_TIME
------------------
29-OCT-12 12:14:26
 
SQL> delete t1
  2  /
 
1000000 rows deleted.
 
SQL> select to_char(sysdate,'DD-MON-YY HH24:MI:SS')
  2  finish_time from dual
  3  /
 
FINISH_TIME
------------------
29-OCT-12 12:17:45
 
SQL>

No comments: