Wednesday, October 12, 2011

NOT EXISTS v NOT IN

This example, tested on Oracle 10, shows how NOT EXISTS may be more efficient than NOT IN.
 
First, create an emp table for employees E1 through E9999:

SQL> create table emp
  2  (empno  varchar2(5),
  3   deptno varchar2(5))
  4  /

Table created.

SQL>


Next, create a dept table for departments D1 through D9999:


SQL> create table dept
  2  (deptno varchar2(5))
  3  /

Table created.

SQL>


Put employee E1 in department D1, employee E2 in department D2 etc:

SQL> declare
  2  begin
  3  for ctr in 1..9999 loop
  4  insert into emp values ('E'||ctr,'D'||ctr);
  5  insert into dept values ('D'||ctr);
  6  end loop;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> 


Create an index and gather statistics on the emp table:

SQL> create index deptno_index on emp(deptno)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats -
> (ownname => 'SYSTEM', tabname => 'EMP');

PL/SQL procedure successfully completed.

SQL>


Then delete employee E5000. This will leave department D5000 empty:

SQL> delete from emp
  2  where empno = 'E5000';

1 row deleted.

SQL>


Finally, find the empty department using 2 different SQL statements:
 
SQL> set timing on
SQL> alter session
  2  set timed_statistics = true
  3  /

Session altered.

Elapsed: 00:00:00.05
SQL> alter session
  2  set sql_trace = true
  3  /

Session altered.

Elapsed: 00:00:00.05

SQL>

First, use NOT EXISTS. This runs in a fraction of a second:

SQL> select deptno from dept
  2  where not exists
  3  (select deptno from emp
  4  where emp.deptno = dept.deptno)
  5  /

DEPTN
-----
D5000

Elapsed: 00:00:00.09

SQL>

For the second version, use NOT IN. This takes around 7 seconds:
 
SQL> select deptno from dept
  2  where deptno not in
  3  (select deptno from emp)
  4  /

DEPTN
-----
D5000

Elapsed: 00:00:07.10
SQL> alter session
  2  set sql_trace = false
  3  /

Session altered.

Elapsed: 00:00:00.04
SQL>


By running the trace file through tkprof, you can see why NOT EXISTS is faster. It uses the index whereas NOT IN does not. As usual, click on the screen prints to display them at their original size and bring them into focus:

































2 comments:

Javier Morales said...

Hi Andrew,

Your post is really interesting. I was really astonished!

Well, I would like to point 2 things about your test:

1.- I assume you obviate to gather statistics for DEPT table intentionally, isn't it?

2.- Your test is valid for 10g, but in Oracle11g NOT IN works as faster as NOT EXISTS.

(sorry, I couldn't set Courier New font for comments...)

SQL> select deptno from dept
2 where not exists
3 (select deptno from emp
4 where emp.deptno = dept.deptno)
5 /

DEPTN
-----
D5000

Transcurrido: 00:00:00.02
SQL> select deptno from dept
2 where deptno not in
3 (select deptno from emp)
4 /

DEPTN
-----
D5000

Transcurrido: 00:00:00.02


And so the trace looks like NOT IN and NOT EXISTS approach differently the use of the index created, similar of what you show, but very different costs.


********************************************************************************

select deptno from dept
where not exists
(select deptno from emp
where emp.deptno = dept.deptno)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.01 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 52 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.01 0 53 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 HASH JOIN ANTI (cr=52 pr=0 pw=0 time=6581 us cost=16 size=99990 card=9999)
9999 9999 9999 TABLE ACCESS FULL DEPT (cr=23 pr=0 pw=0 time=1413 us cost=7 size=39996 card=9999)
9998 9998 9998 INDEX FAST FULL SCAN DEPTNO_INDEX (cr=29 pr=0 pw=0 time=1041 us cost=8 size=59994 card=9999)(object id 212520)



********************************************************************************

select deptno from dept
where deptno not in
(select deptno from emp)

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 1 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 53 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 54 0 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 HASH JOIN ANTI NA (cr=53 pr=0 pw=0 time=6618 us cost=17 size=99990 card=9999)
9999 9999 9999 TABLE ACCESS FULL DEPT (cr=23 pr=0 pw=0 time=1542 us cost=7 size=39996 card=9999)
9998 9998 9998 TABLE ACCESS FULL EMP (cr=30 pr=0 pw=0 time=1027 us cost=9 size=59994 card=9999)



I could reproduce your test case setting OPTIMIZER_FEATURES_ENABLE to 10.2.0.5, and so 9.2.0.8

Andrew Reid said...

Dear Javier,

Thank you for your comments.

I didn't gather statistics on the DEPT table because I assumed that Oracle would do a full table scan in both cases. Perhaps I should have gathered them then let Oracle decide what to do.

I noticed that this behaviour changed in Oracle 11 some time later but could not decide how to present it without copying the original post. However, your test with OPTIMIZER_FEATURES_ENABLE has given me an idea.

I also did this post in Spanish and have added a link to it at the end of this post for you to see it if you wish.

Kind Regards,

Andrew