Oracle Tips and Tricks — David Fitzjarrell

June 12, 2013

Sorting Out Constraints

Filed under: General — dfitzjarrell @ 13:24

An interesting question was posed on the Oracle-L list a while back; a member was having difficulty getting a constraint created using an existing non-unique index as Oracle would consistently consume all of the TEMP space and cause the process to terminate for lack of resources. The conventional wisdom was that since that index was already in existence no sorting would be required and it should be a relatively small task to validate the uniqueness of the data and create the constraint. Unfortunately that isn’t the case; I’ve tried to replicate the situation in my personal database but could not create the 1.8 billion rows reported in the original question (yes, that’s 1.8 BILLION rows). The data volume doesn’t really matter as far as the plans and execution path Oracle takes are concerned so I set up an example using the EMP table and set off to trace the session and see what Oracle was doing using guidance from a similar example posted by Jonathan Lewis. First let’s do some setup:


alter table emp modify ename varchar2(12);

Table altered.

alter table emp modify empno number;

Table altered.

SQL> 
SQL> begin
  2  	     for i in 1..6000 loop
  3  		     insert into emp
  4  		     values(i, 'RAMBO'||i, 'FRUMP', 7000+mod(i,29), add_months(sysdate, (-1*i)), mod(1000*i, 43), 0, 40);
  5  	     end loop;
  6  
  7  	     for i in 8000..1000000 loop
  8  		     insert into emp
  9  		     values(i, 'RAM'||i, 'FRUMP', 7000+mod(i,29), add_months(sysdate, -1), mod(i, 43), 0, 40);
 10  	     end loop;
 11  
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL> 

I’ve modified the table to increase the length of the ename column and increase the size of the empno column, then loaded 999,000 rows into the table. Now it’s time to create the non-unique index:


SQL> create index emp_idx on emp(empno);

Index created.

SQL> 

Verifying the existence of the index:


SQL> select * from user_segments where segment_name = 'EMP_IDX';

SEGMENT_NAME                                                                      PARTITION_NAME                 SEGMENT_TYPE       SEGMENT_SU TABLESPACE_NAME                     BYTES     BLOCKS
--------------------------------------------------------------------------------- ------------------------------ ------------------ ---------- ------------------------------ ---------- ----------
   EXTENTS INITIAL_EXTENT NEXT_EXTENT MIN_EXTENTS MAX_EXTENTS   MAX_SIZE RETENTI MINRETENTION PCT_INCREASE  FREELISTS FREELIST_GROUPS BUFFER_ FLASH_C CELL_FL
---------- -------------- ----------- ----------- ----------- ---------- ------- ------------ ------------ ---------- --------------- ------- ------- -------
EMP_IDX                                                                                                          INDEX              ASSM       USERS                            18874368       2304
        33          65536     1048576           1  2147483645 2147483645                                                              DEFAULT DEFAULT DEFAULT


SQL> 

So I know the index exists (yes, I knew that from the ‘Index created.’ statement but I like to see that Oracle has all of its ducks in a row). I will now add the primary key constraint using the non-unique index, this on Oracle version 11.2.0.3:


SQL> alter table emp add constraint emp_pk primary key(empno) using index emp_idx;

Table altered.

SQL> 

In this case the constraint created successfully without running out of memory or temp space, but I had far fewer rows in the table. Looking at the trace file generated for that session a curious query rears its ugly head:


********************************************************************************

SQL ID: 2u8n8uzz763hd Plan Hash: 0

alter table emp add constraint emp_pk primary key(empno) using index emp_idx
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.06          0          2          0           0
Execute      1      0.00       0.02          0       4805          4           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.00       0.08          0       4807          4           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 611  

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  log file sync                                   1        0.00          0.00
  SQL*Net message to client                       1        0.00          0.00
  SQL*Net message from client                     1        0.00          0.00
********************************************************************************

SQL ID: bwy3tr44u2071 Plan Hash: 1273047431

select /*+ all_rows ordered */ A.rowid, :1, :2, :3 
from
 "BING"."EMP" A, (select /*+ all_rows */ "EMPNO" from "BING"."EMP" A where( 
  "EMPNO" is not null) group by  "EMPNO" having count(1) > 1) B where( 
  "A"."EMPNO" = "B"."EMPNO")


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          4          0           0
Fetch        1      9.43       9.94       2227       4448          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      9.43       9.95       2227       4452          0           0

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS   (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  HASH JOIN  (cr=4448 pr=2227 pw=0 time=9948046 us cost=3476 size=33098532 card=871014)
    998015     998015     998015   INDEX FULL SCAN EMP_IDX (cr=2224 pr=2227 pw=0 time=1530974 us cost=456 size=21775350 card=871014)(object id 83641)
         0          0          0   VIEW  (cr=2224 pr=0 pw=0 time=5014325 us cost=456 size=11323182 card=871014)
         0          0          0    FILTER  (cr=2224 pr=0 pw=0 time=5014317 us)
    998015     998015     998015     SORT GROUP BY (cr=2224 pr=0 pw=0 time=3919827 us cost=456 size=11323182 card=871014)
    998015     998015     998015      INDEX FULL SCAN EMP_IDX (cr=2224 pr=0 pw=0 time=1251086 us cost=456 size=11323182 card=871014)(object id 83641)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  db file sequential read                       260        0.02          0.15
  db file scattered read                        284        0.01          0.16
********************************************************************************

(The full query Oracle is executing is this:


select /*+ all_rows ordered */
	A.rowid, :1, :2, :3
from
	BING.EMP A,
	(
	select /*+ all_rows */
		EMPNO
	from
		BING.EMP A
	where	(EMPNO is not null)
	group by
		EMPNO
	having
		count(1) > 1
	) B
where
	(A.EMPNO = B.EMPNO)
union all
select
	/*+ all_rows ordered */
	A.rowid, :1, :2, :3
from
	BING.EMP A
where
	(EMPNO is null)
;

and the formatted plan from dbms_xplan is:


Plan hash value: 628507593

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |  1048K|    25M|   487   (7)| 00:00:06 |
|   1 |  UNION-ALL         |         |       |       |            |          |
|*  2 |   FILTER           |         |       |       |            |          |
|   3 |    SORT GROUP BY   |         |  1048K|    25M|   487   (7)| 00:00:06 |
|   4 |     INDEX FULL SCAN| EMP_IDX |  1048K|    25M|   456   (1)| 00:00:06 |
|*  5 |   FILTER           |         |       |       |            |          |
|   6 |    INDEX FULL SCAN | EMP_IDX |  1048K|    25M|   456   (1)| 00:00:06 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(COUNT(*)>1)
   5 - filter(NULL IS NOT NULL)

which differs from the plan shown in the tkprof output as the UNION-ALL is replaced by a HASH JOIN. I will use the actual plan from the tkprof report to continue this discussion.)

Notice that Oracle is not only using index full scans but is also sorting the results and creating a hash join to validate no key has more than one entry in the table. Since an index is already in place the query above seems to be rather inefficient; a better query is shown below which uses the index values in order (it walks the index keys) to eliminate the sort operation. Running the query and checking the plan it’s obvious this method of attack saves time and work — notice the SORT GROUP BY NOSORT step:


SQL> 
SQL> set autotrace on
SQL> 
SQL> select *
  2  from
  3  (select /*+ index(emp(empno)) */ empno
  4  	     from emp
  5  	     where empno is not null
  6  	     group by empno
  7  	     having count(*) > 1)
  8  /

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 2890940416

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |  1048K|    13M|   456   (1)| 00:00:06 |
|*  1 |  FILTER               |         |       |       |            |          |
|   2 |   SORT GROUP BY NOSORT|         |  1048K|    13M|   456   (1)| 00:00:06 |
|   3 |    INDEX FULL SCAN    | EMP_IDX |  1048K|    13M|   456   (1)| 00:00:06 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(COUNT(*)>1)

SQL> 

But, if that query is used as a subquery the plan isn’t what would be expected:


SQL> 
SQL> select /*+ leading(y x) use_nl(x) index(emp(empno)) no_merge(y) */
  2  	     x.rowid
  3  from
  4  (select *
  5  from
  6  (select /*+ index(emp(empno)) */ empno
  7  	     from emp
  8  	     where empno is not null
  9  	     group by empno
 10  	     having count(*) > 1)) y, emp x
 11  where x.empno = y.empno
 12  /

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1374118939

-------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |  1048K|    38M|   464   (2)| 00:00:06 |
|   1 |  NESTED LOOPS       |         |  1048K|    38M|   464   (2)| 00:00:06 |
|   2 |   VIEW              |         |  1048K|    13M|   456   (1)| 00:00:06 |
|*  3 |    FILTER           |         |       |       |            |          |
|   4 |     HASH GROUP BY   |         |  1048K|    13M|   456   (1)| 00:00:06 |
|   5 |      INDEX FULL SCAN| EMP_IDX |  1048K|    13M|   456   (1)| 00:00:06 |
|*  6 |   INDEX RANGE SCAN  | EMP_IDX |     1 |    25 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(COUNT(*)>1)
   6 - access("X"."EMPNO"="Y"."EMPNO")

SQL> 

A HASH GROUP BY replaces the SORT GROUP BY NOSORT and the sorting returns (as part of the HASH GROUP BY a sort is executed on the hash keys). That can be ‘turned off’ by setting _gby_hash_aggregation_enabled to FALSE which would produce the following plan:


SQL> select /*+ leading(y x) use_nl(x) index(emp(empno)) no_merge(y) */
  2          x.rowid
  3  from
  4  (select *
  5  from
  6  (select /*+ index(emp(empno)) */ empno
  7          from emp
  8          where empno is not null
  9          group by empno
 10          having count(*) > 1)) y, emp x
 11  where x.empno = y.empno
 12  /

no rows selected


Execution Plan
----------------------------------------------------------
Plan hash value: 1759981554

-------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |  1048K|    38M|   464   (2)| 00:00:06 |
|   1 |  NESTED LOOPS       |         |  1048K|    38M|   464   (2)| 00:00:06 |
|   2 |   VIEW              |         |  1048K|    13M|   456   (1)| 00:00:06 |
|*  3 |    FILTER           |         |       |       |            |          |
|   4 |     SORT GROUP BY   |         |  1048K|    13M|   456   (1)| 00:00:06 |
|   5 |      INDEX FULL SCAN| EMP_IDX |  1048K|    13M|   456   (1)| 00:00:06 |
|*  6 |   INDEX RANGE SCAN  | EMP_IDX |     1 |    25 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(COUNT(*)>1)
   6 - access("X"."EMPNO"="Y"."EMPNO")

SQL>

which more clearly shows the sort in action. The query has been hinted to provide a nested loops path rather than the overall hash join and the INDEX FULL SCAN driving the orignal hash join has been replaced with an INDEX RANGE SCAN which returns a smaller set of data. But it’s still sorting the data, unnecessarily, and try as we might it doesn’t appear it can be fixed, at least not simply.

Oracle offers a package named DBMS_ADVANCED_REWRITE that can be used to ‘divert’ execution from the submitted query to a similar, better performing, query. Unfortunately it doesn’t set hidden parameters so the query submitted to DBMS_ADVANCED_REWRITE needs to generate a better plan without the use of such parameters. As Jonathan Lewis pointed out to rewrite the above query to get the desired plan requires subquery factoring (the WITH clause) and DBMS_ADVANCED_REWRITE doesn’t accept queries using subquery factoring. Drat, drat and double drat.

It appears that using an existing nonunique index to enforce a primary key constraint may not be the path to take with extremely large tables (with millions or billions of rows). Dropping the existing index and constraint and creating a new primary key constraint provides this execution path:


********************************************************************************

SQL ID: 336zud14mr3ma Plan Hash: 2730705978

CREATE UNIQUE INDEX "BING"."EMP_PK" on "BING"."EMP"("EMPNO") NOPARALLEL


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          0          1          0           0
Execute      1      5.48       6.28          0       6543       3180           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      5.48       6.30          0       6544       3180           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 721     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  INDEX BUILD UNIQUE EMP_PK (cr=6646 pr=0 pw=2086 time=6277350 us)(object id 0)
    998015     998015     998015   SORT CREATE INDEX (cr=6464 pr=0 pw=0 time=4524100 us)
    998015     998015     998015    TABLE ACCESS FULL EMP (cr=6464 pr=0 pw=0 time=1430193 us cost=1744 size=6814899 card=524223)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path write                               1        0.00          0.00
  reliable message                                1        0.00          0.00
********************************************************************************

A single sort on the table data, using a full table scan, seems to be a more efficient path to the primary key, especially on large tables as no HASH JOIN is necessary. That’s not what I would expect given the existence of an index, unique or not, for the key but that’s currently how Oracle is behaving. Since there is no obvious way around this it’s just a matter of time until Oracle modifies the code to make this process more efficient.

Hopefully such occurrences (primary key constraints using existing nonunique indexes failing to create) are few and far between; it is nice to know, though, what Oracle is doing to create/enable such constraints to better prepare for the possiblity the constraint won’t create as expected.

Boy, I’m glad that’s sorted.

About these ads

2 Comments »

  1. I thought you were going to help me solve my problem with a 10gr2 (10.2.0.5) DB with a similar symptom, but I sure don’t have the index created in the same way.
    I’ve got a straight unique PK.

    However, thanks for the article.
    I especially like the pun at the very end. Worth a chuckle.

    Comment by Darryl Griffiths — September 4, 2013 @ 08:28 | Reply

    • Not knowing your situation it’s difficult to give any assistance in solving your problem. Can you elaborate on what your problem actually is? Once I know that I may be able to assist.

      Comment by dfitzjarrell — September 4, 2013 @ 11:05 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 706 other followers

%d bloggers like this: