Wednesday, December 23, 2015

Clustering_Factor

I attended the UKOUG conference early in December. While I was there, I went to a presentation by Chris Saxon, where he explained how Oracle decides when to use an index. The example below, which I ran in an Oracle 11.1 database, is based on what I learnt.
 
I used to believe that if a where clause only returned a small percentage of the rows in a table, Oracle would use an index if it could. Then I would be surprised whenever Oracle did not follow this rule. However, I now know that it is not as simple as this; it also depends on the clustering factor.
 
I ran the query below to look at the DBA_SEGMENTS view and show the number of rows for segments owned by SYSTEM, the total number of rows and the percentage of rows for segments owned by SYSTEM. The important point for this example is that SYSTEM owned less than 2% of the segments:
 
SQL> l
  1  select
  2  (select count(*)
  3   from dba_segments
  4   where owner = 'SYSTEM') as system,
  5  (select count(*)
  6   from dba_segments) as total,
  7  (select count(*)
  8   from dba_segments
  9   where owner = 'SYSTEM') /
 10  (select count(*)
 11   from dba_segments) * 100
 12  as percentage
 13* from dual
SQL> /
 
    SYSTEM      TOTAL PERCENTAGE
---------- ---------- ----------
       434      23153 1.87448711
 
SQL>
 
I copied 3 columns from every row in this view into a table:
 
SQL> create table tab1
  2  as select owner, segment_name, bytes
  3  from dba_segments
  4  /
 
Table created.
 
SQL>
 
Based on what I used to think, if I queried this table to find all the rows for segments belonging to SYSTEM, Oracle would use an index if it had one. The problem with this idea is that even with an index, Oracle needs to read every block containing a row belonging to SYSTEM. So, once I had created the table, I counted the number of blocks with one (or more) of these rows. The rows were in several blocks as they were scattered throughout the table:
 
SQL> select count
  2  (distinct dbms_rowid.rowid_block_number(rowid))
  3  from tab1
  4  where owner = 'SYSTEM'
  5  /
 
COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
                                                 84
 
SQL>
 
… then I counted the number of blocks in the table:
 
SQL> select blocks
  2  from user_segments
  3  where segment_name = 'TAB1'
  4  /
 
   BLOCKS
----------
       128
 
SQL>
 
Although less than 2% of the table’s rows were for segments belonging to SYSTEM, even with an index in place, Oracle would have to read 84/128x100 = (almost) 66% of the blocks to get to them.
 
I created an index on the owner column:
 
SQL> create index ind1 on tab1(owner)
  2  /
 
Index created.
 
SQL>
 
I checked the clustering_factor of the index. If this is close to the number of BLOCKS in the table, the data is well clustered (in relation to the index in question). If the clustering_factor is higher, approaching the number of ROWS in the table, the data is not well clustered. In this case, the rows for segments belonging to SYSTEM were all over the place so the data was not well clustered:
 
SQL> select clustering_factor
  2  from user_indexes
  3  where index_name = 'IND1'
  4  /
 
CLUSTERING_FACTOR
-----------------
             1087
 
SQL>
 
I gathered statistics for the table. I found that if I did not do this, Oracle had to do some dynamic sampling when querying the table and I want to look at this in a future post, not now:
 
SQL> exec dbms_stats.gather_table_stats(-
> ownname=>'ANDREW',tabname=>'TAB1');
 
PL/SQL procedure successfully completed.
 
SQL>
 
I traced my session and ran a test query, which calculated how much space the segments belonging to SYSTEM occupied:
 
SQL> alter session set sql_trace = true
  2  /
 
Session altered.
 
SQL> select sum(bytes)
  2  from tab1
  3  where owner = 'SYSTEM'
  4  /
 
SUM(BYTES)
----------
  52953088
 
SQL> alter session set sql_trace = false
  2  /
 
Session altered.
 
SQL>
 
… then I ran the trace file through tkprof and found that Oracle had not used the index when running the query. It had to read well over half of the table’s blocks so a full table scan was much quicker:
 
********************************************************************************
 
select sum(bytes)
from tab1
where owner = 'SYSTEM'
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0        120          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0        120          0           1
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 1698  (ANDREW)
 
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=120 pr=0 pw=0 time=0 us)
    434   TABLE ACCESS FULL TAB1 (cr=120 pr=0 pw=0 time=6 us cost=11 size=5655 card=435)
 
 
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      1   SORT (AGGREGATE)
    434    TABLE ACCESS   MODE: ANALYZED (FULL) OF 'TAB1' (TABLE)
 
********************************************************************************
 
I repeated the test but this time I sorted the data on the owner column first so the rows for segments belonging to SYSTEM would be together:
 
SQL> create table tab2
  2  as select owner, segment_name, bytes
  3  from dba_segments
  4  order by owner
  5  /
 
Table created.
 
SQL>
 
These rows were only in 3 blocks:
 
SQL> select count
  2  (distinct dbms_rowid.rowid_block_number(rowid))
  3  from tab2
  4  where owner = 'SYSTEM'
  5  /
 
COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
                                                  3
 
SQL>
 
… but (obviously), the total number of blocks was the same as in TAB1:                                                
 
SQL> select blocks
  2  from user_segments
  3  where segment_name = 'TAB2'
  4  /
 
   BLOCKS
----------
       128
 
SQL>
 
… so, to read all the rows for segments belonging to SYSTEM, Oracle would only need to access 3/128x100 = (just over) 2.3% of the blocks in the table (assuming it had an index to tell it where to go):
 
SQL> create index ind2 on tab2(owner)
  2  /
 
Index created.
 
SQL>
 
The clustering_factor was very close to the number of BLOCKS in the table so, as expected, the data was well clustered for the index:
 
SQL> select clustering_factor
  2  from user_indexes
  3  where index_name = 'IND2'
  4  /
 
CLUSTERING_FACTOR
-----------------
              116
 
SQL> exec dbms_stats.gather_table_stats(-
> ownname=>'ANDREW',tabname=>'TAB2');
 
PL/SQL procedure successfully completed.
 
SQL> alter session set sql_trace = true
  2  /
 
Session altered.
 
SQL> select sum(bytes)
  2  from tab2
  3  where owner = 'SYSTEM'
  4  /
 
SUM(BYTES)
----------
  52953088
 
SQL> alter session set sql_trace = false
  2  /
 
Session altered.
 
SQL>
 
… and the trace file showed that Oracle used the index when it ran the query against TAB2:
 
********************************************************************************
 
select sum(bytes)
from tab2
where owner = 'SYSTEM'
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          0          6          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.00          0          6          0           1
 
Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 1698  (ANDREW)
 
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=6 pr=0 pw=0 time=0 us)
    434   TABLE ACCESS BY INDEX ROWID TAB2 (cr=6 pr=0 pw=0 time=3 us cost=5 size=5499 card=423)
    434    INDEX RANGE SCAN IND2 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=423)(object id 219070)
 
 
Rows     Execution Plan
-------  ---------------------------------------------------
      0  SELECT STATEMENT   MODE: ALL_ROWS
      1   SORT (AGGREGATE)
    434    TABLE ACCESS   MODE: ANALYZED (BY INDEX ROWID) OF 'TAB2'
               (TABLE)
    434     INDEX   MODE: ANALYZED (RANGE SCAN) OF 'IND2' (INDEX)
 
********************************************************************************
 
The only problem with this is that my tables were only queried on one column. In real-life, tables are often queried on several columns. You can add indexes to speed up these queries but the data can only be stored one way round. It is therefore unlikely that all of the indexes will have an optimal clustering_factor. To illustrate this, I found a table with 21 indexes in an Oracle 11.2 production database:
 
SQL> select count(*) from dba_indexes
  2  where table_owner = 'UIMSMGR'
  3  and table_name = 'UABOPEN'
  4  /
 
  COUNT(*)
----------
        21
 
SQL>
 
I checked how many blocks and rows it had:
 
SQL> select blocks, num_rows from dba_tables
  2  where owner = 'UIMSMGR'
  3  and table_name = 'UABOPEN'
  4  /
 
    BLOCKS   NUM_ROWS
---------- ----------
   4606999   99046543
 
SQL>
 
I checked the clustering_factor of each index:
 
SQL> select clustering_factor from dba_indexes
  2  where table_owner = 'UIMSMGR'
  3  and table_name = 'UABOPEN'
  4  order by 1
  5  /
 
CLUSTERING_FACTOR
-----------------
                0
                0
             1324
             1640
          1284785
          4319498
          4431395
          5113131
         10048510
         11847679
         14469614
         14808490
         20604699
         21757869
         22821251
         23166573
         23336072
         24103639
         24306402
         25662626
         26071543
 
21 rows selected.
 
SQL>
 
The first five indexes have a clustering_factor well below the number of blocks in the table. I’m not sure what this means so I will try to look at it in a future post.
 
The next three indexes have a clustering_factor close to the number of blocks. The data is well clustered for these indexes.
 
However, the final 13 indexes have a higher clustering_factor. The data is not well clustered for these indexes so Oracle is less likely to use them.

2 comments:

  1. The indexes with low clustering factors - are they on columns with a large number of null values?

    ReplyDelete
  2. Yes, you are right. I have shown this now in the next post.

    ReplyDelete