Andrew's Oracle Blog: Clustering

I attended the UKOUG conference early in December. While I was there, I went to a presentation by Chris Saxon, where he explained how Oracle decides when to use an index. The example below, which I ran in an Oracle 11.1 database, is based on what I learnt.

I used to believe that if a where clause only returned a small percentage of the rows in a table, Oracle would use an index if it could. Then I would be surprised whenever Oracle did not follow this rule. However, I now know that it is not as simple as this; it also depends on the clustering factor.

I ran the query below to look at the DBA_SEGMENTS view and show the number of rows for segments owned by SYSTEM, the total number of rows and the percentage of rows for segments owned by SYSTEM. The important point for this example is that SYSTEM owned less than 2% of the segments:

SQL> l

1 select

2 (select count(*)

3 from dba_segments

4 where owner = 'SYSTEM') as system,

5 (select count(*)

6 from dba_segments) as total,

7 (select count(*)

8 from dba_segments

9 where owner = 'SYSTEM') /

10 (select count(*)

11 from dba_segments) * 100

12 as percentage

13* from dual

SQL> /

SYSTEM TOTAL PERCENTAGE

---------- ---------- ----------

434 23153 1.87448711

SQL>

I copied 3 columns from every row in this view into a table:

SQL> create table tab1

2 as select owner, segment_name, bytes

3 from dba_segments

4 /

Table created.

SQL>

Based on what I used to think, if I queried this table to find all the rows for segments belonging to SYSTEM, Oracle would use an index if it had one. The problem with this idea is that even with an index, Oracle needs to read every block containing a row belonging to SYSTEM. So, once I had created the table, I counted the number of blocks with one (or more) of these rows. The rows were in several blocks as they were scattered throughout the table:

SQL> select count

2 (distinct dbms_rowid.rowid_block_number(rowid))

3 from tab1

4 where owner = 'SYSTEM'

5 /

COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))

---------------------------------------------------

SQL>

… then I counted the number of blocks in the table:

SQL> select blocks

2 from user_segments

3 where segment_name = 'TAB1'

4 /

BLOCKS

----------

128

SQL>

Although less than 2% of the table’s rows were for segments belonging to SYSTEM, even with an index in place, Oracle would have to read 84/128x100 = (almost) 66% of the blocks to get to them.

I created an index on the owner column:

SQL> create index ind1 on tab1(owner)

2 /

Index created.

SQL>

I checked the clustering_factor of the index. If this is close to the number of BLOCKS in the table, the data is well clustered (in relation to the index in question). If the clustering_factor is higher, approaching the number of ROWS in the table, the data is not well clustered. In this case, the rows for segments belonging to SYSTEM were all over the place so the data was not well clustered:

SQL> select clustering_factor

2 from user_indexes

3 where index_name = 'IND1'

4 /

CLUSTERING_FACTOR

-----------------

1087

SQL>

I gathered statistics for the table. I found that if I did not do this, Oracle had to do some dynamic sampling when querying the table and I want to look at this in a future post, not now:

SQL> exec dbms_stats.gather_table_stats(-

> ownname=>'ANDREW',tabname=>'TAB1');

PL/SQL procedure successfully completed.

SQL>

I traced my session and ran a test query, which calculated how much space the segments belonging to SYSTEM occupied:

SQL> alter session set sql_trace = true

2 /

Session altered.

SQL> select sum(bytes)

2 from tab1

3 where owner = 'SYSTEM'

4 /

SUM(BYTES)

----------

52953088

SQL> alter session set sql_trace = false

2 /

Session altered.

SQL>

… then I ran the trace file through tkprof and found that Oracle had not used the index when running the query. It had to read well over half of the table’s blocks so a full table scan was much quicker:

********************************************************************************

select sum(bytes)

from tab1

where owner = 'SYSTEM'

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.00 0.00 0 120 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.00 0.00 0 120 0 1

Misses in library cache during parse: 1

Optimizer mode: ALL_ROWS

Parsing user id: 1698 (ANDREW)

Rows Row Source Operation

------- ---------------------------------------------------

1 SORT AGGREGATE (cr=120 pr=0 pw=0 time=0 us)

434 TABLE ACCESS FULL TAB1 (cr=120 pr=0 pw=0 time=6 us cost=11 size=5655 card=435)

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT MODE: ALL_ROWS

1 SORT (AGGREGATE)

434 TABLE ACCESS MODE: ANALYZED (FULL) OF 'TAB1' (TABLE)

********************************************************************************

I repeated the test but this time I sorted the data on the owner column first so the rows for segments belonging to SYSTEM would be together:

SQL> create table tab2

2 as select owner, segment_name, bytes

3 from dba_segments

4 order by owner

5 /

Table created.

SQL>

These rows were only in 3 blocks:

SQL> select count

2 (distinct dbms_rowid.rowid_block_number(rowid))

3 from tab2

4 where owner = 'SYSTEM'

5 /

COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))

---------------------------------------------------

SQL>

… but (obviously), the total number of blocks was the same as in TAB1:

SQL> select blocks

2 from user_segments

3 where segment_name = 'TAB2'

4 /

BLOCKS

----------

128

SQL>

… so, to read all the rows for segments belonging to SYSTEM, Oracle would only need to access 3/128x100 = (just over) 2.3% of the blocks in the table (assuming it had an index to tell it where to go):

SQL> create index ind2 on tab2(owner)

2 /

Index created.

SQL>

The clustering_factor was very close to the number of BLOCKS in the table so, as expected, the data was well clustered for the index:

SQL> select clustering_factor

2 from user_indexes

3 where index_name = 'IND2'

4 /

CLUSTERING_FACTOR

-----------------

116

SQL> exec dbms_stats.gather_table_stats(-

> ownname=>'ANDREW',tabname=>'TAB2');

PL/SQL procedure successfully completed.

SQL> alter session set sql_trace = true

2 /

Session altered.

SQL> select sum(bytes)

2 from tab2

3 where owner = 'SYSTEM'

4 /

SUM(BYTES)

----------

52953088

SQL> alter session set sql_trace = false

2 /

Session altered.

SQL>

… and the trace file showed that Oracle used the index when it ran the query against TAB2:

********************************************************************************

select sum(bytes)

from tab2

where owner = 'SYSTEM'

call count cpu elapsed disk query current rows

------- ------ -------- ---------- ---------- ---------- ---------- ----------

Parse 1 0.00 0.00 0 0 0 0

Execute 1 0.00 0.00 0 0 0 0

Fetch 2 0.00 0.00 0 6 0 1

------- ------ -------- ---------- ---------- ---------- ---------- ----------

total 4 0.00 0.00 0 6 0 1

Misses in library cache during parse: 1

Optimizer mode: ALL_ROWS

Parsing user id: 1698 (ANDREW)

Rows Row Source Operation

------- ---------------------------------------------------

1 SORT AGGREGATE (cr=6 pr=0 pw=0 time=0 us)

434 TABLE ACCESS BY INDEX ROWID TAB2 (cr=6 pr=0 pw=0 time=3 us cost=5 size=5499 card=423)

434 INDEX RANGE SCAN IND2 (cr=3 pr=0 pw=0 time=0 us cost=2 size=0 card=423)(object id 219070)

Rows Execution Plan

------- ---------------------------------------------------

0 SELECT STATEMENT MODE: ALL_ROWS

1 SORT (AGGREGATE)

434 TABLE ACCESS MODE: ANALYZED (BY INDEX ROWID) OF 'TAB2'

(TABLE)

434 INDEX MODE: ANALYZED (RANGE SCAN) OF 'IND2' (INDEX)

********************************************************************************

The only problem with this is that my tables were only queried on one column. In real-life, tables are often queried on several columns. You can add indexes to speed up these queries but the data can only be stored one way round. It is therefore unlikely that all of the indexes will have an optimal clustering_factor. To illustrate this, I found a table with 21 indexes in an Oracle 11.2 production database:

SQL> select count(*) from dba_indexes

2 where table_owner = 'UIMSMGR'

3 and table_name = 'UABOPEN'

4 /

COUNT(*)

----------

SQL>

I checked how many blocks and rows it had:

SQL> select blocks, num_rows from dba_tables

2 where owner = 'UIMSMGR'

3 and table_name = 'UABOPEN'

4 /

BLOCKS NUM_ROWS

---------- ----------

4606999 99046543

SQL>

I checked the clustering_factor of each index:

SQL> select clustering_factor from dba_indexes

2 where table_owner = 'UIMSMGR'

3 and table_name = 'UABOPEN'

4 order by 1

5 /

CLUSTERING_FACTOR

-----------------

1324

1640

1284785

4319498

4431395

5113131

10048510

11847679

14469614

14808490

20604699

21757869

22821251

23166573

23336072

24103639

24306402

25662626

26071543

21 rows selected.

SQL>

The first five indexes have a clustering_factor well below the number of blocks in the table. I’m not sure what this means so I will try to look at it in a future post.

The next three indexes have a clustering_factor close to the number of blocks. The data is well clustered for these indexes.

However, the final 13 indexes have a higher clustering_factor. The data is not well clustered for these indexes so Oracle is less likely to use them.

Andrew's Oracle Blog

Wednesday, December 23, 2015

Clustering_Factor

2 comments: