July | 2015 | Oracle Tips and Tricks -- David Fitzjarrell

July 31, 2015

What’s In A (User) Name?

Filed under: General — dfitzjarrell @ 13:44


"So each one of you agrees to disagree with whatever the other one agrees with,
but if you both disagree with the same thing, aren't you really in agreement?"
- Norton Juster, The Phantom Tollbooth

Recently a blog post caught my attention, primarily because it was talking about primary keys in a database. In the post an interesting situation was presented, one involving using a username column as the key. The point was made that, because duplicate username values can be generated, the username alone wasn’t enough to guarantee uniqueness. Surrogate keys were also mentioned but dismissed because adding a sequence column didn’t make the actual username unique. Let’s look at a way a username can be generated from the actual data and a surrogate value. First, let’s create a table:


SQL> create table employee(
  2          username        varchar2(8),
  3          first_name      varchar2(40),
  4          last_name       varchar2(40),
  5          hire_date       date,
  6          position        varchar2(30),
  7          office_loc      varchar2(40));

Table created.

SQL>

Nothing unusual there, but now the fun part begins. The business rules for usernames are as follows:

	1	Usernames must be no longer than 8 characters
	2	Usernames must include the first initial and the first five characters of the last name
	3	Usernames must be unique

An interesting set of business rules, indeed. The first two rules are fairly easy to enforce, especially since rule 2 provides 6 characters. This leaves two characters to ensure uniqueness. This is where the argument over natural and surrogate primary keys rears its ugly head. In this situation a ‘natural’ primary key isn’t possible, since two employees could generate the same 6-character username. Let’s make this happen for the Rumenfrumper Rock Company. Over the years they have hired 5 of the Williams sisters (Dee, Diane, Dora, Desdimona and Darcy). They are also changing vendors for their HR application and must use the table defined above. They now need to migrate their existing data and generate new usernames. This almost looks like an impossible task.
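
To see the collision without touching the database, here is a minimal Python sketch of rule 2. The function name base_username is made up for this illustration and the names are supplied in upper case, as they are in the examples that follow; it is only a model of the rule, not part of the HR application.

```python
def base_username(first_name, last_name):
    """Rule 2: first initial plus the first five characters of the last name."""
    return first_name[:1] + last_name[:5]


sisters = ["DEE", "DIANE", "DORA", "DESDIMONA", "DARCY"]
bases = [base_username(first, "WILLIAMS") for first in sisters]

# Every sister produces the same six-character base.
print(bases)
print(len(set(bases)), "distinct base value(s) for", len(sisters), "employees")
```

With only one distinct base for five people, the two remaining characters have to carry all of the uniqueness.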

Taking a step back for a moment and examining the situation it appears that a surrogate key could help here. Unfortunately if the table is redesigned to include a user_id column we still get the problem of duplicate usernames, making it impossible for all of the sisters to use the new HR application (no matter which one of the sisters tries to register time on a timesheet the hours will all go to one timesheet and the others will show no hours). So, let’s re-think how we can use a surrogate value to generate truly unique usernames. If the generated numeric value is appended to the existing username, we get a unique username that allows every employee to use the new HR application. To do that a procedure could be used to insert new employees into the system. First, a primary key is created against the username column:


SQL> alter table employee add constraint emp_pk primary key (username);

Table altered.

SQL>

A procedure, like this one, could be written:


SQL>
SQL> --
SQL> -- Insert a new employee
SQL> --
SQL> -- Generate a unique username
SQL> --
SQL> create or replace procedure add_employee(p_firstnm in varchar2, p_lastnm in varchar2,
                                p_hiredt in date, p_pos in varchar2, p_offloc in varchar2)
  2  is
  3          v_username      varchar2(8);
  4          v_max           varchar2(8);
  5          v_ct            number:=0;
  6
  7  --
  8  -- Get the last known username for the 'base' username
  9  --
 10          cursor get_max is
 11          select max(username)
 12          from employee
 13          where username like v_username||'__';  -- base plus exactly two characters
 14
 15  begin
 16  --
 17  -- Generate the 'base' username
 18  -- from the first and last names supplied
 19  --
 20          v_username:=substr(p_firstnm, 1, 1)||substr(p_lastnm,1,5);
 21
 22  --
 23  -- Fetch the 'last' username in the database using the
 24  -- generated 'base'
 25  --
 26  -- Since each username has a two-digit number appended this will be
 27  -- the username with the highest number
 28  --
 29          open get_max;
 30          fetch get_max into v_max;
 31          close get_max;
 32
 33  --
 34  -- Extract the appended number, then add 1
 35  --
 36          v_ct := nvl(to_number(substr(v_max, -2)), 0);  -- last two characters, so short last names also work
 37          v_ct := v_ct + 1;
 38  --
 39  -- Generate the unique username
 40  --
 41          v_username:=v_username||lpad(v_ct, 2, '0');
 42
 43  --
 44  -- Insert the new employee record
 45  --
 46          insert into employee
 47          values (v_username, p_firstnm, p_lastnm, p_hiredt, p_pos, p_offloc);
 48
 49  --
 50  -- Commit the insert
 51  --
 52          commit;
 53  --
 54  -- Provide usable output when exceptions occur
 55  --
 56  exception
 57          WHEN others THEN -- Now we figure out what failed and why.
 58                -- Output desired error message
 59                dbms_output.put_line('-20999: Sumthin'' bad happened -- error stack follows');
 60                -- Output actual line number of error source
 61                dbms_output.put(dbms_utility.format_error_backtrace);
 62                -- Output the actual error number and message
 63                dbms_output.put_line(dbms_utility.format_error_stack);
 64  end;
 65  /

Procedure created.

SQL>
SQL> show errors
No errors.
SQL>

Enter new employees into the system using the procedure:


SQL>
SQL> --
SQL> -- Add new employees
SQL> --
SQL> -- All employees will generate the same 'base'
SQL> --
SQL> exec add_employee('DEE','WILLIAMS', sysdate - 1000, 'CLERK','OMAHA')

PL/SQL procedure successfully completed.

SQL> exec add_employee('DIANE','WILLIAMS', sysdate - 900, 'CLERK','OMAHA')

PL/SQL procedure successfully completed.

SQL> exec add_employee('DORA','WILLIAMS', sysdate - 800, 'CLERK','OMAHA')

PL/SQL procedure successfully completed.

SQL> exec add_employee('DESDIMONA','WILLIAMS', sysdate - 600, 'CLERK','OMAHA')

PL/SQL procedure successfully completed.

SQL> exec add_employee('DARCY','WILLIAMS', sysdate - 300, 'CLERK','OMAHA')

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Check the contents of the employee table
SQL> --
SQL> select *
  2  from employee;

USERNAME FIRST_NAME       LAST_NAME      HIRE_DATE POSITION                       OFFICE_LOC
-------- ---------------- -------------- --------- ------------------------------ -----------
DWILLI01 DEE              WILLIAMS       03-NOV-12 CLERK                          OMAHA
DWILLI02 DIANE            WILLIAMS       11-FEB-13 CLERK                          OMAHA
DWILLI03 DORA             WILLIAMS       22-MAY-13 CLERK                          OMAHA
DWILLI04 DESDIMONA        WILLIAMS       08-DEC-13 CLERK                          OMAHA
DWILLI05 DARCY            WILLIAMS       04-OCT-14 CLERK                          OMAHA

SQL>

All five sisters were entered successfully, each with a unique username. The company then hired the last sister, Dierdra:


SQL>
SQL> --
SQL> -- Add another employee
SQL> --
SQL> exec add_employee('DIERDRA','WILLIAMS', sysdate - 30, 'CLERK','OMAHA')

PL/SQL procedure successfully completed.

SQL>
SQL> select *
  2  from employee;

USERNAME FIRST_NAME       LAST_NAME           HIRE_DATE POSITION                  OFFICE_LOC
-------- ---------------- ------------------- --------- ------------------------- ----------
DWILLI01 DEE              WILLIAMS            03-NOV-12 CLERK                     OMAHA
DWILLI02 DIANE            WILLIAMS            11-FEB-13 CLERK                     OMAHA
DWILLI03 DORA             WILLIAMS            22-MAY-13 CLERK                     OMAHA
DWILLI04 DESDIMONA        WILLIAMS            08-DEC-13 CLERK                     OMAHA
DWILLI05 DARCY            WILLIAMS            04-OCT-14 CLERK                     OMAHA
DWILLI06 DIERDRA          WILLIAMS            01-JUL-15 CLERK                     OMAHA

6 rows selected.

SQL>

Notice the similar usernames have incremented numeric values based on the highest existing username for that ‘base’ at the time of the insert. Now let’s add another employee with a different name and see what the procedure generates:


SQL>
SQL> --
SQL> -- Add another employee
SQL> --
SQL> exec add_employee('WAXAHATCHIE','BORGENFORFER', sysdate - 10, 'CLERK','OMAHA')

PL/SQL procedure successfully completed.

SQL>
SQL> select *
  2  from employee;

USERNAME FIRST_NAME          LAST_NAME        HIRE_DATE POSITION                   OFFICE_LOC
-------- ------------------- ---------------- --------- -------------------------- ----------
DWILLI01 DEE                 WILLIAMS         03-NOV-12 CLERK                      OMAHA
DWILLI02 DIANE               WILLIAMS         11-FEB-13 CLERK                      OMAHA
DWILLI03 DORA                WILLIAMS         22-MAY-13 CLERK                      OMAHA
DWILLI04 DESDIMONA           WILLIAMS         08-DEC-13 CLERK                      OMAHA
DWILLI05 DARCY               WILLIAMS         04-OCT-14 CLERK                      OMAHA
DWILLI06 DIERDRA             WILLIAMS         01-JUL-15 CLERK                      OMAHA
WBORGE01 WAXAHATCHIE         BORGENFORFER     21-JUL-15 CLERK                      OMAHA

7 rows selected.

SQL>

Each new employee is added with a unique username, and this works because only one person will be entering new employees into the system and the procedure commits after each employee is added. This guarantees that the sequence will be unbroken for a given ‘base’. If this were a multi-user application such a technique would fail because duplicate usernames could be generated, which is why using transactional methods to generate numeric primary key values won’t work. And if you have more than 99 employees who generate the same username ‘base’ the inserts will fail for the table this example is using. I haven’t seen a business where 100 people all have the same 6 letter combination of initial and first five of the last name, but stranger things have happened.
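
The two limits mentioned above can be shown with a small Python model of the same max-plus-one logic. This is a sketch only: next_username and lpad are names invented here, the set stands in for the committed rows of the table, and lpad copies the one Oracle LPAD behavior that matters, which is that a value longer than the requested width is truncated to that width.

```python
def lpad(value, width, fill="0"):
    """Mimic Oracle LPAD: pad on the left, truncate to `width` if too long."""
    return str(value).rjust(width, fill)[:width]


def next_username(existing, first_name, last_name):
    """Model of the max-plus-one logic: base plus a two-digit counter."""
    base = first_name[:1] + last_name[:5]
    # Usernames made of this base followed by exactly two characters.
    matches = [u for u in existing
               if u.startswith(base) and len(u) == len(base) + 2]
    count = int(max(matches)[-2:]) if matches else 0
    return base + lpad(count + 1, 2)


# Single session: each insert is committed before the next lookup.
committed = set()
for first in ["DEE", "DIANE", "DORA"]:
    committed.add(next_username(committed, first, "WILLIAMS"))
print(sorted(committed))

# Two sessions read the same committed data before either one inserts.
session_a = next_username(committed, "DESDIMONA", "WILLIAMS")
session_b = next_username(committed, "DARCY", "WILLIAMS")
print(session_a, session_b, "duplicate:", session_a == session_b)

# The two digits run out at the 100th employee for one base.
full = {"DWILLI" + lpad(n, 2) for n in range(1, 100)}
overflow = next_username(full, "DIERDRA", "WILLIAMS")
print(overflow, "already exists:", overflow in full)
```

In the model the second session generates the same username as the first, and the 100th employee is handed a username that is already taken; in the real table both cases would run into the primary key.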

This certainly isn’t the only way to address such a problem, and I’m certain that more than one enterprising developer will try another approach. My goal is to offer a different view on defining primary keys, one not solely dependent on ‘natural’ values or surrogate keys, but a combination of both.

Sometimes you agree to disagree.


July 28, 2015

It Isn’t The Index

Filed under: Indexes,Performance — dfitzjarrell @ 15:36


"If you want sense, you'll have to make it yourself."
- Norton Juster, The Phantom Tollbooth

One of the more confusing statistics in Oracle is, apparently, one called the clustering factor. Associated with an index (as evidenced by the columns in DBA_INDEXES) it’s actually dependent on the table data, more specifically the distance between ‘jumps’ for a given index key. Let’s define a ‘jump’ to be the number of blocks between rows containing the given index key starting with the first block found containing that key. That’s confusing, but don’t despair; this will be explained in detail.

Oracle, in its infinite wisdom, computes a clustering factor for an index based upon the key value and the scattering of the rows containing that key across the table data. As a simple example, let’s look at a non-unique index (the kind of index for which the clustering factor matters) and the associated table data. The example starts by building the table and loading the data:


SQL> 
SQL> create table clust_fact(
  2  	     id      number,
  3  	     nm      varchar2(40),
  4  	     dt      date,
  5  	     sq      number);

Table created.

SQL> 
SQL> begin
  2  	     for i in 1..1000000 loop
  3  		     insert into clust_fact(id,nm,dt,sq)
  4  		     values(mod(i,77), 'Record '||i, sysdate - mod(i, 77), i);
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL>

Next, build the index and generate the statistics:


SQL>
SQL> create index clust_fact_idx on clust_fact(id);

Index created.

SQL> 
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);

PL/SQL procedure successfully completed.

SQL>

Query the clustering factor computed by Oracle:


SQL> select index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'CLUST_FACT';

INDEX_NAME                     CLUSTERING_FACTOR                                                                                    
------------------------------ -----------------                                                                                    
CLUST_FACT_IDX                            370731                                                                                    

SQL>

The clustering factor provides a representation of how far Oracle needs to ‘jump’, on average, between blocks containing the index key value. To understand this a bit better let’s look at the code Oracle actually uses to compute the clustering factor and explain what it does:


SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL>

Oracle uses a function named sys_op_countchg, which takes two parameters: the first 15 bytes of the rowid and the number of blocks to traverse before the count increases. By default Oracle increments the clustering factor counter every time the block address changes. If, for example, three consecutive rows are in the current block the counter does not increment; if the next three rows are in a different block the counter is incremented by 1 (the block changes once, and the two reads that follow stay in that block). This continues throughout the table to compute the final clustering factor. Knowing the function used by Oracle and its parameters, it is possible to modify the call to accept larger values for the number of block changes before the clustering factor counter is incremented. To allow Oracle to consider 3 blocks ‘close enough’ to not change the counter the query is modified as follows:


SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL>
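
The counting rule described above can be modelled in a few lines of Python. This is a sketch under assumptions, not Oracle's code: the function name clustering_factor and its window argument are invented here, and treating the second argument of sys_op_countchg as a short history of recently visited blocks is my reading of its behavior rather than something documented.

```python
from collections import deque


def clustering_factor(blocks_in_index_order, window=1):
    """Count block changes while walking table blocks in index order.

    `window` stands in for the second argument of sys_op_countchg: a block
    found among the last `window` counted blocks does not bump the counter.
    """
    recent = deque(maxlen=window)
    changes = 0
    for block in blocks_in_index_order:
        if block not in recent:
            changes += 1
            recent.append(block)
    return changes


three_then_three = [10, 10, 10, 11, 11, 11]
alternating = [10, 11, 10, 11, 10, 11]

print(clustering_factor(three_then_three))        # 2: one per block
print(clustering_factor(alternating))             # 6: every read changes block
print(clustering_factor(alternating, window=3))   # 2: both blocks stay in the history
```

A larger window only helps when the index walk keeps returning to blocks it visited a moment ago; a walk that always moves on to a new block is counted the same way for any window.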

Given how the values were computed for the ID column the clustering factor doesn’t change when we increase the block change count Oracle considers to be within range of the starting block address. Possible values for this second parameter are 1, 3, 5, 9 and 16. Going through the list of possible values it’s obvious with the current data loading the clustering factor won’t improve no matter which value we choose:


SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL>

Since the order of the table data affects the clustering factor, the table is reloaded with data ordered by the ID column:


SQL> 
SQL> create table clust_tmp as select * from clust_fact order by id;

Table created.

SQL> 
SQL> truncate table clust_fact;

Table truncated.

SQL> 
SQL> insert into clust_fact select * from clust_tmp;

1000000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);

PL/SQL procedure successfully completed.

SQL>

Querying the clustering factor reveals it has significantly decreased since the keys are much closer together:


SQL> 
SQL> select index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'CLUST_FACT';

INDEX_NAME                     CLUSTERING_FACTOR                                                                                    
------------------------------ -----------------                                                                                    
CLUST_FACT_IDX                              4889                                                                                    

SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4889                                                                                                                          

SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4888                                                                                                                          

SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4888                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4885                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4881                                                                                                                          

SQL>
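
The difference between the two loads can be reproduced, in miniature, with a Python model. Everything here is an assumption made for illustration: 77,000 rows instead of 1,000,000, exactly 200 rows per block, and a hypothetical clustering_factor function that treats the second sys_op_countchg argument as a short history of recently counted blocks. The absolute numbers will not match the Oracle output above; only the pattern is of interest.

```python
from collections import deque

ROWS = 77_000          # scaled down from the 1,000,000 rows used above
ROWS_PER_BLOCK = 200   # assumed fixed number of rows per table block


def clustering_factor(blocks_in_index_order, window=1):
    """Count block changes, ignoring blocks seen within the last `window`."""
    recent = deque(maxlen=window)
    changes = 0
    for block in blocks_in_index_order:
        if block not in recent:
            changes += 1
            recent.append(block)
    return changes


def index_walk(ids_in_table_order):
    """Blocks visited when the table is read through an index on id."""
    positions = sorted(range(len(ids_in_table_order)),
                       key=lambda p: (ids_in_table_order[p], p))
    return [p // ROWS_PER_BLOCK for p in positions]


round_robin = [i % 77 for i in range(1, ROWS + 1)]   # original load order
ordered_by_id = sorted(round_robin)                  # reload ordered by id

for label, ids in (("round robin", round_robin), ("ordered by id", ordered_by_id)):
    walk = index_walk(ids)
    print(label, [clustering_factor(walk, w) for w in (1, 3, 5, 9, 16)])
```

In the round-robin load every key value is spread across every block, so each key walks the whole table and no window size helps. Once the rows are ordered by id the count collapses to the number of blocks. The model's blocks are perfectly tidy, so it does not show the small decreases seen above for the larger limits.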

It is worth noting that the clustering factor decreases slightly with larger values for the number of blocks Oracle can skip before updating the change count. The changes are not huge: the result differs by only 8 between the value Oracle calculates by default and the value calculated with the ‘blocks to skip’ parameter set to 16, its highest possible value. Since the original load created the SQ values in sequence, the table is reloaded with the SQ values in order and a new concatenated index using both the ID and SQ columns is created. This is done to show how the clustering factor behaves for other indexes when it’s ‘optimized’ for one index:


SQL> 
SQL> drop table clust_tmp purge;

Table dropped.

SQL> 
SQL> create table clust_tmp as select * from clust_fact order by sq;

Table created.

SQL> 
SQL> truncate table clust_fact;

Table truncated.

SQL> 
SQL> insert into clust_fact select * from clust_tmp;

1000000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> create index clust_fact_idx2 on clust_fact(id, sq);

Index created.

SQL> 
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);

PL/SQL procedure successfully completed.

SQL>

Querying the clustering factor, the value from the original table load appears again and, again, doesn’t change even when the block skip parameter to the sys_op_countchg function is increased through the available values:


SQL> 
SQL> select index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'CLUST_FACT';

INDEX_NAME                     CLUSTERING_FACTOR                                                                                    
------------------------------ -----------------                                                                                    
CLUST_FACT_IDX2                           370731                                                                                    
CLUST_FACT_IDX                            370731                                                                                    

SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    370731                                                                                                                          

SQL>

Ordering the data by ID shows that the clustering factor decreases for both indexes, since both include the ID column. The presence of the SQ column causes the second index's clustering factor to be much greater when the default parameters are used; interestingly, as the block limit increases for the second index, the clustering factor drops to roughly that of the single-column index:


SQL> 
SQL> drop table clust_tmp purge;

Table dropped.

SQL> 
SQL> create table clust_tmp as select * from clust_fact order by id;

Table created.

SQL> 
SQL> truncate table clust_fact;

Table truncated.

SQL> 
SQL> insert into clust_fact select * from clust_tmp;

1000000 rows created.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);

PL/SQL procedure successfully completed.

SQL> 
SQL> select index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'CLUST_FACT';

INDEX_NAME                     CLUSTERING_FACTOR                                                                                    
------------------------------ -----------------                                                                                    
CLUST_FACT_IDX2                            75994                                                                                    
CLUST_FACT_IDX                              4889                                                                                    

SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4889                                                                                                                          

SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4889                                                                                                                          

SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4889                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4886                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4882                                                                                                                          

SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
     75994                                                                                                                          

SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4967                                                                                                                          

SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4861                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4861                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4861                                                                                                                          

SQL>

For multiple indexes on the same table, where no column is common between them, the clustering factor can skyrocket for some indexes when the table data is ordered to favor one particular index. That example wasn’t run and is left for you to investigate. Let’s do something similar and change the order of the columns in the second index and see how that affects the clustering factor:


SQL> select index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'CLUST_FACT';

INDEX_NAME                     CLUSTERING_FACTOR                                                                                    
------------------------------ -----------------                                                                                    
CLUST_FACT_IDX2                           999810                                                                                    
CLUST_FACT_IDX                              4889                                                                                    

SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4889                                                                                                                          

SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4889                                                                                                                          

SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4889                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4886                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4882                                                                                                                          

SQL> 
SQL> --
SQL> -- 1 block limit (Oracle default)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),1) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    999810                                                                                                                          

SQL> 
SQL> --
SQL> -- 3 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),3) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    999810                                                                                                                          

SQL> 
SQL> --
SQL> -- 5 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),5) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    999810                                                                                                                          

SQL> 
SQL> --
SQL> -- 9 block limit
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),9) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    999810                                                                                                                          

SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
    999810                                                                                                                          

SQL>

Even though the indexes share a common column, the column order matters: ordering the data by the key column of the single-column index caused the clustering factor for the second index to be much greater than it was when ID was the leading column of the second index, and none of the block limits tested reduced it.
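
The transcript above doesn't show the second index being rebuilt. A minimal sketch of that step follows, assuming the index was originally created on (ID, SQ) and is recreated with SQ as the leading column; the exact DDL used isn't shown in this post, so treat the column list as an assumption:

```sql
-- Assumption: CLUST_FACT_IDX2 was originally built on (id, sq).
-- Recreate it with the column order reversed so SQ leads.
drop index clust_fact_idx2;

create index clust_fact_idx2
on clust_fact(sq, id);

-- Regather statistics so USER_INDEXES reflects the new clustering factor.
exec dbms_stats.gather_table_stats(user, 'CLUST_FACT', cascade=>true);
```

The table data is left ordered by ID, so only the index definition changes between the two tests.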

Although associated with an index, the clustering factor is affected by the table data so no series of index rebuilds will ‘fix’ a ‘bad’ value. The table data needs to be ordered to favor an index (or indexes if they contain the column used to order the table data). This, of course, improves performance for the positively affected indexes and, unfortunately, adversely affects other indexes created against that same table. Care must be taken when deciding whether or not to order the table data; ‘reasonable’ performance metrics must be decided upon before embarking on such a change, and it should be done in test or dev before considering it in production. It may be that overall ‘acceptable’ performance across all of the indexes is better than ‘outstanding’ performance for one index that causes ‘terrible’ performance for all others. It’s not a simple decision and needs to be carefully considered.
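
One way to judge whether a clustering factor is 'good' or 'bad' before reordering anything is to compare it to the table's block and row counts. The query below is a sketch using the CLUST_FACT table from this post; it relies only on the standard USER_INDEXES and USER_TABLES views and assumes statistics are current:

```sql
-- Compare each index's clustering factor to the table's block and row counts.
-- A value near BLOCKS suggests the table data is well ordered for that index;
-- a value near NUM_ROWS suggests the rows are scattered with respect to it.
select i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
from user_indexes i
join user_tables t on t.table_name = i.table_name
where i.table_name = 'CLUST_FACT';
```

In the earlier output a clustering factor of 999810 against a 1000000-row table sits at the 'scattered' end of that range, while 4889 is at the 'well ordered' end.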

If you want to implement the clustering factor computed by the modified version of the query Oracle uses to generate this statistic, you will need to update the index statistics manually with one of the SET_INDEX_STATS procedures in the DBMS_STATS package:


SQL> 
SQL> --
SQL> -- 16 block limit (ASM configuration)
SQL> --
SQL> 
SQL> column clf new_value cl_f
SQL> 
SQL> select /*+  no_parallel_index(t, "CLUST_FACT_IDX2")  dbms_stats
  2    cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
  3    no_substrb_pad  no_expand index(t,"CLUST_FACT_IDX2") */
  4    sys_op_countchg(substrb(t.rowid,1,15),16) as clf
  5  from
  6   "BING"."CLUST_FACT" t where "ID" is not null
  7  and "SQ" is not null;

       CLF                                                                                                                          
----------                                                                                                                          
      4861                                                                                                                          

SQL> exec dbms_stats.set_index_stats(ownname => 'BING', indname => 'CLUST_FACT_IDX2', clstfct => &cl_f);

PL/SQL procedure successfully completed.

SQL>

Using the new_value functionality in SQL*Plus saves the computed value into a reusable substitution variable that can be passed to procedures and functions that need it. The above code would set the clustering factor for the CLUST_FACT_IDX2 index to 4861, replacing the Oracle-calculated value of 75994. Again, care must be taken when undertaking such a change since you cannot predict the resulting performance change.
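
Because the effect can't be predicted, it may be worth saving the Oracle-calculated statistics before overwriting them. The sketch below uses the DBMS_STATS export and import procedures; the statistics table name CLF_STATS_BACKUP and the statid BEFORE_CLF are hypothetical names chosen for illustration:

```sql
-- Hypothetical statistics table name; any unused name will do.
exec dbms_stats.create_stat_table(ownname => 'BING', stattab => 'CLF_STATS_BACKUP');

-- Save the current (Oracle-calculated) index statistics before overwriting them.
exec dbms_stats.export_index_stats(ownname => 'BING', indname => 'CLUST_FACT_IDX2', stattab => 'CLF_STATS_BACKUP', statid => 'BEFORE_CLF');

-- Later, to put the original values back:
exec dbms_stats.import_index_stats(ownname => 'BING', indname => 'CLUST_FACT_IDX2', stattab => 'CLF_STATS_BACKUP', statid => 'BEFORE_CLF');
```

Keep in mind that a later statistics gather will recompute the clustering factor and replace a manually set value unless the statistics are locked (DBMS_STATS.LOCK_TABLE_STATS is one option).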

Since this post was originally written, Oracle has enhanced DBMS_STATS to allow the DBA to tell Oracle how many table blocks can be expected to be cached. The SET_TABLE_PREFS, SET_SCHEMA_PREFS and SET_DATABASE_PREFS procedures now accept the TABLE_CACHED_BLOCKS preference; this tells Oracle how many table blocks to treat as cached, so that revisits to those blocks do not increment the clustering factor when statistics are gathered. Looking at an example we see:


SQL> create table mytab(
  2  myid    number,
  3  mydata  varchar2(40),
  4  mydate  date)
  5  cache;

Table created.

SQL> 
SQL> begin
  2  	     for i in 1..1000000 loop
  3  		     insert into mytab
  4  		     values(trunc(9999*(dbms_random.value)), 'Line '||i, trunc(sysdate - mod(i, 4637)));
  5  	     end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> create index mytab_idx
  2  on mytab(myid, mydate);

Index created.

SQL> 
SQL> exec dbms_stats.gather_table_stats(ownname=>'BING', tabname=>'MYTAB', cascade=>true, estimate_percent=>null)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'MYTAB_IDX', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> 
SQL> select table_name, index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'MYTAB';

TABLE_NAME                     INDEX_NAME                     CLUSTERING_FACTOR                                                     
------------------------------ ------------------------------ -----------------                                                     
MYTAB                          MYTAB_IDX                                 996238                                                     

SQL>

With TABLE_CACHED_BLOCKS left at its default value of 1 we see a really large clustering factor (due to the random nature of the MYID values generated). Let’s set TABLE_CACHED_BLOCKS to 47 for MYTAB and see what Oracle calculates for the clustering factor:


SQL> exec dbms_stats.set_table_prefs(ownname=>'BING', tabname=>'MYTAB', pname=>'TABLE_CACHED_BLOCKS', pvalue=>47);

PL/SQL procedure successfully completed.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats(ownname=>'BING', tabname=>'MYTAB', cascade=>true, estimate_percent=>null)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'MYTAB_IDX', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> 
SQL> select table_name, index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'MYTAB';

TABLE_NAME                     INDEX_NAME                     CLUSTERING_FACTOR                                                     
------------------------------ ------------------------------ -----------------                                                     
MYTAB                          MYTAB_IDX                                 987551                                                     

SQL>

The clustering factor has decreased somewhat (again it’s not a dramatic decrease due to the random nature of the MYID values). Let’s set the parameter to its maximum value (255) and see what Oracle provides as a clustering factor:


SQL> exec dbms_stats.set_table_prefs(ownname=>'BING', tabname=>'MYTAB', pname=>'TABLE_CACHED_BLOCKS', pvalue=>255);

PL/SQL procedure successfully completed.

SQL> 
SQL> commit;

Commit complete.

SQL> 
SQL> exec dbms_stats.gather_table_stats(ownname=>'BING', tabname=>'MYTAB', cascade=>true, estimate_percent=>null)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'MYTAB_IDX', estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> 
SQL> select table_name, index_name, clustering_factor
  2  from user_indexes
  3  where table_name = 'MYTAB';

TABLE_NAME                     INDEX_NAME                     CLUSTERING_FACTOR                                                     
------------------------------ ------------------------------ -----------------                                                     
MYTAB                          MYTAB_IDX                                 939451                                                     

SQL>

It’s reduced again; the drop is not dramatic, but the value is less than it was originally (939451 versus 996238). Setting this preference through the DBMS_STATS.SET_TABLE_PREFS, DBMS_STATS.SET_SCHEMA_PREFS or DBMS_STATS.SET_DATABASE_PREFS procedures can potentially fix a pessimistic clustering factor and improve index selection and, in turn, query performance.
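
If the new value turns out to hurt rather than help, the preference can be inspected and removed. This is a sketch against the MYTAB example above, using the DBMS_STATS.GET_PREFS function and the DELETE_TABLE_PREFS procedure; the column alias is arbitrary:

```sql
-- Show the TABLE_CACHED_BLOCKS value currently in effect for MYTAB.
select dbms_stats.get_prefs('TABLE_CACHED_BLOCKS', 'BING', 'MYTAB') as cached_blocks
from dual;

-- Remove the table-level preference so MYTAB falls back to the default.
exec dbms_stats.delete_table_prefs(ownname => 'BING', tabname => 'MYTAB', pname => 'TABLE_CACHED_BLOCKS');

-- Regather so the clustering factor is recomputed under the default setting.
exec dbms_stats.gather_table_stats(ownname => 'BING', tabname => 'MYTAB', cascade => true, estimate_percent => null)
```

The preference only takes effect when statistics are gathered, so the regather is needed after either setting or deleting it.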

The clustering factor strongly influences which index Oracle will choose for a query against a given table. Changing the clustering factor will influence the CBO calculations, sometimes for the better, sometimes for the worse. Plan carefully and test, test, test before second-guessing Oracle.

Which just makes sense.
