I
attended the UKOUG conference early in December. While I was there, I
went to a presentation by Chris Saxon, where he explained how Oracle
decides when to use an index. The example below, which I ran in an
Oracle 11.1 database, is based on what I learnt.
I used to believe that if a where
clause only returned a small percentage of the rows in a table, Oracle
would use an index if it could. Then I would be surprised whenever
Oracle did not follow this rule. However, I now know that it is not as
simple as this; it also depends on the clustering factor.
I ran the query below to look at the DBA_SEGMENTS view and show the number of rows for segments owned by SYSTEM, the total number of rows and the percentage of rows for segments owned by SYSTEM. The important point for this example is that SYSTEM owned less than 2% of the segments:
SQL> l
1 select
2 (select count(*)
3 from dba_segments
4 where owner = 'SYSTEM') as system,
5 (select count(*)
6 from dba_segments) as total,
7 (select count(*)
8 from dba_segments
9 where owner = 'SYSTEM') /
10 (select count(*)
11 from dba_segments) * 100
12 as percentage
13* from dual
SQL> /
SYSTEM TOTAL PERCENTAGE
---------- ---------- ----------
434 23153 1.87448711
SQL>
I copied 3 columns from every row in this view into a table:
SQL> create table tab1
2 as select owner, segment_name, bytes
3 from dba_segments
4 /
Table created.
SQL>
Based on what I used to think, if I queried this table to find all the rows for segments belonging to SYSTEM,
Oracle would use an index if it had one. The problem with this idea is
that even with an index, Oracle needs to read every block containing a
row belonging to SYSTEM. So, once I had created the table, I
counted the number of blocks with one (or more) of these rows. The rows
were in several blocks as they were scattered throughout the table:
SQL> select count
2 (distinct dbms_rowid.rowid_block_number(rowid))
3 from tab1
4 where owner = 'SYSTEM'
5 /
COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
84
SQL>
… then I counted the number of blocks in the table:
SQL> select blocks
2 from user_segments
3 where segment_name = 'TAB1'
4 /
BLOCKS
----------
128
SQL>
Although less than 2% of the table’s rows were for segments belonging to SYSTEM, even with an index in place, Oracle would have to read 84/128x100 = (almost) 66% of the blocks to get to them.
I created an index on the owner column:
SQL> create index ind1 on tab1(owner)
2 /
Index created.
SQL>
I checked the clustering_factor
of the index. If this is close to the number of BLOCKS in the table,
the data is well clustered (in relation to the index in question). If
the clustering_factor is higher, approaching the number of ROWS
in the table, the data is not well clustered. In this case, the rows for
segments belonging to SYSTEM were all over the place so the data was not well clustered:
SQL> select clustering_factor
2 from user_indexes
3 where index_name = 'IND1'
4 /
CLUSTERING_FACTOR
-----------------
1087
SQL>
I gathered statistics for the table. I found that if I did not do this, Oracle had to do some dynamic sampling when querying the table and I want to look at this in a future post, not now:
SQL> exec dbms_stats.gather_table_stats(-
> ownname=>'ANDREW',tabname=>'TAB1');
PL/SQL procedure successfully completed.
SQL>
I traced my session and ran a test query, which calculated how much space the segments belonging to SYSTEM occupied:
SQL> alter session set sql_trace = true
2 /
Session altered.
SQL> select sum(bytes)
2 from tab1
3 where owner = 'SYSTEM'
4 /
SUM(BYTES)
----------
52953088
SQL> alter session set sql_trace = false
2 /
Session altered.
SQL>
… then I ran the trace file through tkprof
and found that Oracle had not used the index when running the query. It
had to read well over half of the table’s blocks so a full table scan
was much quicker:
********************************************************************************
select sum(bytes)
from tab1
where owner = 'SYSTEM'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 120 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 120 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 1698 (ANDREW)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=120 pr=0 pw=0 time=0 us)
434 TABLE ACCESS FULL TAB1 (cr=120 pr=0 pw=0 time=6 us cost=11 size=5655 card=435)
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: ALL_ROWS
1 SORT (AGGREGATE)
434 TABLE ACCESS MODE: ANALYZED (FULL) OF 'TAB1' (TABLE)
********************************************************************************
I repeated the test but this time I sorted the data on the owner column first so the rows for segments belonging to SYSTEM would be together:
SQL> create table tab2
2 as select owner, segment_name, bytes
3 from dba_segments
4 order by owner
5 /
Table created.
SQL>
These rows were only in 3 blocks:
SQL> select count
2 (distinct dbms_rowid.rowid_block_number(rowid))
3 from tab2
4 where owner = 'SYSTEM'
5 /
COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
3
SQL>
… but (obviously), the total number of blocks was the same as in TAB1:
SQL> select blocks
2 from user_segments
3 where segment_name = 'TAB2'
4 /
BLOCKS
----------
128
SQL>
… so, to read all the rows for segments belonging to SYSTEM,
Oracle would only need to access 3/128x100 = (just over) 2.3% of the
blocks in the table (assuming it had an index to tell it where to go):
SQL> create index ind2 on tab2(owner)
2 /
Index created.
SQL>
The clustering_factor was very close to the number of BLOCKS in the table so, as expected, the data was well clustered for the index:
SQL> select clustering_factor
2 from user_indexes
3 where index_name = 'IND2'
4 /
CLUSTERING_FACTOR
-----------------
116
SQL> exec dbms_stats.gather_table_stats(-
> ownname=>'ANDREW',tabname=>'TAB2');
PL/SQL procedure successfully completed.
SQL> alter session set sql_trace = true
2 /
Session altered.
SQL> select sum(bytes)
2 from tab2
3 where owner = 'SYSTEM'
4 /
SUM(BYTES)
----------
52953088
SQL> alter session set sql_trace = false
2 /
Session altered.
SQL>
… and the trace file showed that Oracle used the index when it ran the query against TAB2:
********************************************************************************
select sum(bytes)
from tab2
where owner = 'SYSTEM'
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 0.00 0.00 0 6 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 0.00 0.00 0 6 0 1
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 1698 (ANDREW)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=6 pr=0 pw=0 time=0 us)
434 TABLE ACCESS BY INDEX ROWID TAB2 (cr=6 pr=0 pw=0 time=3 us cost=5 size=5499 card=423)
434 INDEX RANGE SCAN IND2 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=423)(object id 219070)
Rows Execution Plan
------- ---------------------------------------------------
0 SELECT STATEMENT MODE: ALL_ROWS
1 SORT (AGGREGATE)
434 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'TAB2'
(TABLE)
434 INDEX MODE: ANALYZED (RANGE SCAN) OF 'IND2' (INDEX)
********************************************************************************
The
only problem with this is that my tables were only queried on one
column. In real-life, tables are often queried on several columns. You
can add indexes to speed up these queries but the data can only be
stored one way round. It is therefore unlikely that all of the indexes
will have an optimal clustering_factor. To illustrate this, I found a table with 21 indexes in an Oracle 11.2 production database:
SQL> select count(*) from dba_indexes
2 where table_owner = 'UIMSMGR'
3 and table_name = 'UABOPEN'
4 /
COUNT(*)
----------
21
SQL>
I checked how many blocks and rows it had:
SQL> select blocks, num_rows from dba_tables
2 where owner = 'UIMSMGR'
3 and table_name = 'UABOPEN'
4 /
BLOCKS NUM_ROWS
---------- ----------
4606999 99046543
SQL>
I checked the clustering_factor of each index:
SQL> select clustering_factor from dba_indexes
2 where table_owner = 'UIMSMGR'
3 and table_name = 'UABOPEN'
4 order by 1
5 /
CLUSTERING_FACTOR
-----------------
0
0
1324
1640
1284785
4319498
4431395
5113131
10048510
11847679
14469614
14808490
20604699
21757869
22821251
23166573
23336072
24103639
24306402
25662626
26071543
21 rows selected.
SQL>
The first five indexes have a clustering_factor well below the number of blocks in the table. I’m not sure what this means so I will try to look at it in a future post.
The next three indexes have a clustering_factor close to the number of blocks. The data is well clustered for these indexes.
However, the final 13 indexes have a higher clustering_factor. The data is not well clustered for these indexes so Oracle is less likely to use them.
The indexes with low clustering factors - are they on columns with a large number of null values?
ReplyDeleteYes, you are right. I have shown this now in the next post.
ReplyDelete