Oracle Tips and Tricks — David Fitzjarrell

September 8, 2014

Blooms

Filed under: General — dfitzjarrell @ 09:41

In Oracle releases 10.2.0.x and later join processing can be made more efficient by the use of Bloom filters, primarily to reduce traffic between parallel query slaves. What is a Bloom filter? Named after Burton Howard Bloom, who came up with the concept in the 1970s, it’s an efficient data structure used to quickly determine if an element has a high probability of being a member of a given set. It’s based on a bit array that allows for rapid searches and returns one of two results: either the element is probably in the set (which can produce false positives) or the element is definitely not in the set. The filter cannot produce false negatives, and the incidence of false positives is relatively rare. Another advantage to Bloom filters is their small size relative to other data structures used for similar purposes (self-balancing binary search trees, hash tables, or linked lists). The possibility of false positives necessitates the addition of another filter to eliminate them from the results, yet such a filter doesn’t add appreciably to the process time and, therefore, goes relatively unnoticed.
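For illustration only, here is a toy version of the concept in PL/SQL. This is a hypothetical sketch, not Oracle's internal implementation: an associative array stands in for the bit array and DBMS_UTILITY.GET_HASH_VALUE, called with two different base values, stands in for the independent hash functions.

set serveroutput on

declare
    type bit_array_t is table of pls_integer index by pls_integer;
    bits      bit_array_t;
    bit_count constant pls_integer := 1024;    -- size of the simulated bit array

    -- set the two positions derived from two independent hash functions
    procedure add_member(p_val in varchar2) is
    begin
        bits(dbms_utility.get_hash_value(p_val, 1, bit_count)) := 1;
        bits(dbms_utility.get_hash_value(p_val, 7, bit_count)) := 1;
    end add_member;

    -- "probably present" can be a false positive; "definitely not" never is
    function check_member(p_val in varchar2) return varchar2 is
    begin
        if bits.exists(dbms_utility.get_hash_value(p_val, 1, bit_count))
           and bits.exists(dbms_utility.get_hash_value(p_val, 7, bit_count)) then
            return 'probably in the set';
        else
            return 'definitely not in the set';
        end if;
    end check_member;
begin
    add_member('SALES');
    add_member('PROCUREMENT');

    dbms_output.put_line('PROCUREMENT: '||check_member('PROCUREMENT'));
    dbms_output.put_line('FACILITIES:  '||check_member('FACILITIES'));
end;
/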

When Bloom filters are used for a query on an Exadata system, the filter predicate and the storage predicate will list the SYS_OP_BLOOM_FILTER function as being called. This function includes the additional filter to eliminate any false positives that could be returned. It's the storage predicate that provides the real power of the Bloom filter on Exadata. Using a Bloom filter to pre-join the tables at the storage server level reduces the volume of data the database servers need to process and can significantly reduce the execution time of the given query.

An example of Bloom filters in action follows; a script named Bloom_fltr_ex.sql was executed to generate this output on an Exadata system; the script can be emailed on request:


SQL> --
SQL> -- Create sample tables
SQL> --
SQL> -- Create them parallel, necessary 
SQL> -- to get a Smart Scan on these tables
SQL> --
SQL> create table emp(
  2  	empid   number,
  3  	empnm   varchar2(40),
  4  	empsal  number,
  5  	empssn  varchar2(12),
  6  	constraint emp_pk primary key (empid)
  7  ) parallel 4;

Table created.

SQL>
SQL> create table emp_dept(
  2  	empid   number,
  3  	empdept number,
  4  	emploc  varchar2(60),
  5    constraint emp_dept_pk primary key(empid)
  6  ) parallel 4;

Table created.

SQL>
SQL> create table dept_info(
  2  	deptnum number,
  3  	deptnm  varchar2(25),
  4    constraint dept_info_pk primary key(deptnum)
  5  ) parallel 4;

Table created.

SQL>
SQL> --
SQL> -- Load sample tables with data
SQL> --
SQL> begin
  2  	     for i in 1..2000000 loop
  3  		     insert into emp
  4  		     values(i, 'Fnarm'||i, (mod(i, 7)+1)*1000, mod(i,10)||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||mod(i,10)||mod(i,10));
  5  		     insert into emp_dept
  6  		values(i, (mod(i,8)+1)*10, 'Zanzwalla'||(mod(i,8)+1)*10);
  7  		     commit;
  8  	     end loop;
  9  	     insert into dept_info
 10  	     select distinct empdept, case when empdept = 10 then 'SALES'
 11  					   when empdept = 20 then 'PROCUREMENT'
 12  					   when empdept = 30 then 'HR'
 13  					   when empdept = 40 then 'RESEARCH'
 14  					   when empdept = 50 then 'DEVELOPMENT'
 15  					   when empdept = 60 then 'EMPLOYEE RELATIONS'
 16  					   when empdept = 70 then 'FACILITIES'
 17  					   when empdept = 80 then 'FINANCE' end
 18  	     from emp_dept;
 19  
 20  end;
 21  /

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Run join query using Bloom filter
SQL> --
SQL> -- Generate execution plan to prove Bloom
SQL> -- filter usage
SQL> --
SQL> -- Also report query execution time
SQL> --
SQL> set autotrace on
SQL> set timing on
SQL>
SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20
  4  group by e.empid, e.empnm, d.deptnm, e.empsal;

     EMPID EMPNM                  DEPTNM                        EMPSAL       EMPS
---------- ---------------------- ------------------------- ---------- ----------
    904505 Fnarm904505            PROCUREMENT                     1000          1
    907769 Fnarm907769            PROCUREMENT                     3000          1
    909241 Fnarm909241            PROCUREMENT                     5000          1
    909505 Fnarm909505            PROCUREMENT                     3000          1
    909641 Fnarm909641            PROCUREMENT                     6000          1
    910145 Fnarm910145            PROCUREMENT                     6000          1
...
    155833 Fnarm155833            PROCUREMENT                     7000          1
    155905 Fnarm155905            PROCUREMENT                     2000          1
    151081 Fnarm151081            PROCUREMENT                     1000          1
    151145 Fnarm151145            PROCUREMENT                     2000          1

250000 rows selected.

Elapsed: 00:00:14.27

Execution Plan
----------------------------------------------------------
Plan hash value: 2313925751

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows | Bytes | Cost (%CPU)| Time     |   TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |  218K|    21M|  1378   (1)| 00:00:01 |       |      |            |
|   1 |  PX COORDINATOR                          |              |      |       |            |          |       |      |            |
|   2 |   PX SEND QC (RANDOM)                    | :TQ10003     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                         |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                           |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   5 |      PX SEND HASH                        | :TQ10002     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | P->P | HASH       |
|   6 |       HASH GROUP BY                      |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | PCWP |            |
|*  7 |        HASH JOIN                         |              |  218K|    21M|  1376   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   8 |         PX RECEIVE                       |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   9 |          PX SEND BROADCAST               | :TQ10001     |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | P->P | BROADCAST  |
|  10 |           NESTED LOOPS                   |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  11 |            BUFFER SORT                   |              |      |       |            |          | Q1,01 | PCWC |            |
|  12 |             PX RECEIVE                   |              |      |       |            |          | Q1,01 | PCWP |            |
|  13 |              PX SEND BROADCAST           | :TQ10000     |      |       |            |          |       | S->P | BROADCAST  |
|  14 |               TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |    1 |    27 |     1   (0)| 00:00:01 |       |      |            |
|* 15 |                INDEX UNIQUE SCAN         | DEPT_INFO_PK |    1 |       |     1   (0)| 00:00:01 |       |      |            |
|  16 |            PX BLOCK ITERATOR             |              |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWC |            |
|* 17 |             TABLE ACCESS STORAGE FULL    | EMP_DEPT     |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  18 |         PX BLOCK ITERATOR                |              | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWC |            |
|* 19 |          TABLE ACCESS STORAGE FULL       | EMP          | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("ED"."EMPID"="E"."EMPID")
  15 - access("D"."DEPTNUM"=20)
  17 - storage("ED"."EMPDEPT"=20)
       filter("ED"."EMPDEPT"=20)
  19 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
     60  recursive calls
    174  db block gets
  40753  consistent gets
  17710  physical reads
   2128  redo size
9437983  bytes sent via SQL*Net to client
 183850  bytes received via SQL*Net from client
  16668  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
 250000  rows processed

SQL>

In less than 15 seconds, 250,000 rows were returned from a three-table join of over 4 million rows; the Bloom filter made a dramatic difference in how this query was processed and provided exceptional performance given the volume of data queried.

Without the storage-level execution of the Bloom filter, the query execution time increased by 2.33 seconds, a 16.3 percent increase. For longer execution times, this difference can be significant. It isn't the Bloom filter alone that gives Exadata such power with joins; it's the fact that Exadata can execute it not only at the database level but also at the storage level, something commodity hardware configurations can't do.
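If you want to reproduce the non-offloaded timing on an Exadata system, one way (a simple test, not necessarily how the figures above were produced) is to disable Smart Scan offload for your session and re-run the query:

alter session set cell_offload_processing = false;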

Of course Bloom filters are not exclusive to Exadata, as the following run of the same script on a non-Exadata system shows (only the abbreviated output section is reproduced here):


SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL                                                              
---------- ---------------------------------------- ------------------------- ----------                                                              
   1670281 Fnarm1670281                             PROCUREMENT                     5000                                                              
   1670289 Fnarm1670289                             PROCUREMENT                     6000                                                              
   1670297 Fnarm1670297                             PROCUREMENT                     7000                                                              
   1670305 Fnarm1670305                             PROCUREMENT                     1000                                                              
   1670313 Fnarm1670313                             PROCUREMENT                     2000                                                              
   1670321 Fnarm1670321                             PROCUREMENT                     3000                                                              
   1670329 Fnarm1670329                             PROCUREMENT                     4000                                                              
   1670337 Fnarm1670337                             PROCUREMENT                     5000                                                              
   1670345 Fnarm1670345                             PROCUREMENT                     6000                                                              
   1670353 Fnarm1670353                             PROCUREMENT                     7000                                                              
   1670361 Fnarm1670361                             PROCUREMENT                     1000                                                              
...
   1857369 Fnarm1857369                             PROCUREMENT                     4000                                                              
   1857377 Fnarm1857377                             PROCUREMENT                     5000                                                              
   1857385 Fnarm1857385                             PROCUREMENT                     6000                                                              

250000 rows selected.

Elapsed: 00:00:54.86

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2643012915                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------                    
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |                    
----------------------------------------------------------------------------------------------------------------------------------                    
|   0 | SELECT STATEMENT                     |              |   273K|    26M|  1378   (1)| 00:00:01 |        |      |            |                    
|   1 |  PX COORDINATOR                      |              |       |       |            |          |        |      |            |                    
|   2 |   PX SEND QC (RANDOM)                | :TQ10002     |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |                    
|*  3 |    HASH JOIN                         |              |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   4 |     PX RECEIVE                       |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   5 |      PX SEND BROADCAST               | :TQ10001     |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | P->P | BROADCAST  |                    
|   6 |       NESTED LOOPS                   |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|   7 |        BUFFER SORT                   |              |       |       |            |          |  Q1,01 | PCWC |            |                    
|   8 |         PX RECEIVE                   |              |       |       |            |          |  Q1,01 | PCWP |            |                    
|   9 |          PX SEND BROADCAST           | :TQ10000     |       |       |            |          |        | S->P | BROADCAST  |                    
|  10 |           TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |     1 |    27 |     1   (0)| 00:00:01 |        |      |            |                    
|* 11 |            INDEX UNIQUE SCAN         | DEPT_INFO_PK |     1 |       |     1   (0)| 00:00:01 |        |      |            |                    
|  12 |        PX BLOCK ITERATOR             |              |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWC |            |                    
|* 13 |         TABLE ACCESS FULL            | EMP_DEPT     |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|  14 |     PX BLOCK ITERATOR                |              |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWC |            |                    
|* 15 |      TABLE ACCESS FULL               | EMP          |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
----------------------------------------------------------------------------------------------------------------------------------                    
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   3 - access("ED"."EMPID"="E"."EMPID")                                                                                                               
  11 - access("D"."DEPTNUM"=20)                                                                                                                       
  13 - filter("ED"."EMPDEPT"=20)                                                                                                                      
  15 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))
                                                                                                                                                      
Note                                                                                                                                                  
-----                                                                                                                                                 
   - dynamic sampling used for this statement (level=2)                                                                                               


Statistics
----------------------------------------------------------                                                                                            
         33  recursive calls                                                                                                                          
        139  db block gets                                                                                                                            
      36224  consistent gets                                                                                                                          
      17657  physical reads                                                                                                                           
          0  redo size                                                                                                                                
    9526012  bytes sent via SQL*Net to client                                                                                                         
     183846  bytes received via SQL*Net from client                                                                                                   
      16668  SQL*Net roundtrips to/from client                                                                                                        
          5  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
     250000  rows processed                                                                                                                           

SQL>

The only differences between the two plans involve the storage component of Exadata. Notice also the vast difference in elapsed times between the Exadata and non-Exadata systems: Exadata returned the 250,000 rows in just over 14 seconds, while the non-Exadata system returned the same volume of data in just under 55 seconds. Both times are substantial improvements over not using a Bloom filter, so any hardware running 11.2.0.2 or later will benefit from Bloom filters.

Of course there is a 'gotcha' with Bloom filters, and Oracle recognizes this as a bug in releases older than 12.1.0.2: hash left outer joins, hash right outer joins and hash full outer joins do not use a Bloom filter. For those interested, the unpublished bug number is 17716301. MOS document 1919508.1 reports details on this behavior, including an example. As reported in that document, the patch can be backported to earlier releases when an SR is filed requesting it.

Bloom filters are a welcome addition to both Exadata and non-Exadata installations as they can improve query performance with parallel queries, as evidenced here. Now that you know how to spot when they are used, you may find they are utilized more often than you expected.
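If you'd like to see which statements currently in the library cache already use a Bloom filter, a quick (and by no means exhaustive) check is to scan the plan predicates:

select distinct sql_id, child_number
from v$sql_plan
where filter_predicates like '%SYS_OP_BLOOM_FILTER%'
   or access_predicates like '%SYS_OP_BLOOM_FILTER%';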

It’s a bloomin’ miracle.

August 17, 2014

“You’re A Real ACE”

Filed under: General — dfitzjarrell @ 13:40

Recently I was declared, for want of a better word, an Oracle ACE. I was nominated by someone I respect and admire; just the nomination itself was enough, really, to make my day. When I received notification that I had been selected I was truly surprised. I immediately thanked my nominator, who told me:


                     "Still, it is just a nomination form, and you did all the work that
                     made it so impressive. Congratulations!"

Honestly, it wasn’t, and isn’t, work. It’s a pleasure to share my knowledge with others, which is why I blog, write for http://www.databasejournal.com twice a month and wrote “Oracle Exadata Survival Guide”.

Being an Oracle ACE is an honor, but it's also a responsibility. What got me here was writing and participating in discussion groups, and that won't change. Knowledge is to be shared, not hoarded. What you know does no one else any good if you don't share that knowledge and experience. If Da Vinci had kept his notes to himself, if Newton hadn't published his Laws of Motion, if Copernicus had kept quiet, our world might have been quite different. It's because these people had the foresight to look deeper into our world and then tell us what they found that puts us where we are today. It's only right that we, as beneficiaries of the knowledge others before us shared, share our knowledge no matter how unimportant it may seem. Someone, somewhere, will find it useful and will be grateful that we did.

It may be that our knowledge is gained equally by insight and by mistakes. I know that I’ve made mistakes I’ve learned from, and I know others have done the same. Norton Juster, an architect who wrote a wonderful book named “The Phantom Tollbooth”, said it best:


                     "You may not see it now," said the Princess of Pure Reason, looking
                     knowingly at Milo's puzzled face, "but whatever we learn has a
                     purpose and whatever we do affects everything and everyone else, if
                     even in the tiniest way. Why, when a housefly flaps his wings, a
                     breeze goes round the world; when a speck of dust falls to the
                     ground, the entire planet weighs a little more; and when you stamp
                     your foot, the earth moves slightly off its course. Whenever you
                     laugh, gladness spreads like the ripples in a pond; and whenever
                     you're sad, no one anywhere can be really happy. And it's much the
                     same thing with knowledge, for whenever you learn something new,
                     the whole world becomes that much richer."

                     "And remember, also," added the Princess of Sweet Rhyme, "that many
                     places you would like to see are just off the map and many things 
                     you want to know are just out of sight or a little beyond your 
                     reach. But someday you'll reach them all, for what you learn today, 
                     for no reason at all, will help you discover all the wonderful 
                     secrets of tomorrow."

I think this sums up being an Oracle ACE better than any words I could put on a page. I keep this quote with me, to remind me that mistakes aren’t the end of the world if you learn from them and move on. And what I learn from those mistakes may help others so they don’t make those same mistakes.

I love sharing what I know; I’ve been doing it for over 16 years now, in various forums, some that are no longer as popular as they once were. I never realized how far my commentary reached until I became an Oracle ACE; I have received congratulations and comments that I never expected, mostly of the ‘it’s about time’ sentiment. Simply because you don’t see the gratitude of others who benefit from your knowledge doesn’t mean that gratitude doesn’t exist. I see now that it does, and I am humbled by it.

“Oracle ACE”. To be among those who have been graced by this honor gives me pause. It’s a good pause, awesome if I may be allowed to say it. And it’s an honor that will only make me work harder so that I can live up to such high standards.

Now, back to our regularly scheduled programming.

July 22, 2014

That’s … Huge!

Filed under: Exadata,General,Performance — dfitzjarrell @ 14:27

Recently I’ve noticed the occasional thread in Oracle newsgroups and lists asking about hugepages support in Linux, including ‘best practices’ for hugepages configuration. This information is out on that ‘world-wide web’ in various places; I’d rather put a lot of that information in this post to provide an easier way to get to it. I’ll cover what hugepages are, what they do, what they can’t do and how best to allocate them for your particular installation. Let’s get started.

“Normal” memory pages in Linux are 4 KB in size and are allocated as needed wherever the memory map will allow, so they are likely not contiguous. Hugepages, in comparison, are 2 MB pages locked in memory and allocated in a contiguous 'chunk'; these are allocated at boot time using a parameter in the /etc/sysctl.conf file named vm.nr_hugepages for RedHat Linux kernel 2.6 and a parameter named vm.hugetlb_pool for RedHat Linux kernel 2.4. Remember that each page is 2 MB in size, as this affects how many hugepages you need to allocate to cover all of the SGAs of the running Oracle databases. Set this too high and your system might not boot; set this too low and you won't get the benefit of using hugepages in one or more Oracle instances. [When use_large_pages is set to ONLY and insufficient hugepages are available the database won't start; changing use_large_pages to TRUE allows the database to start, but it won't be using hugepages, which results in more paging activity and lower performance.]

Since hugepages are contiguous, it stands to reason that performance will improve, as Oracle won't need to access memory segments scattered across the entire range of installed memory. Hugepages also reduce the size of the page tables by cutting the per-connection mapping overhead; page tables are the part of Linux memory management created and used to map a virtual address to a physical memory location. For SGAs of 100 MB and greater, using hugepages is probably a good idea, and that idea gets better as the SGA size increases.

Kevin Closson reported back in 2009 on the page table overhead for an Oracle database with an 8000 MB SGA running 500 simultaneous connections. The page table overhead reported was startling: almost 6.8 GB of memory used for page tables when hugepages were not configured. He ran the same test again, on the same Oracle database, after hugepages were allocated and in use by the database, and the page table overhead was reduced to slightly less than 23 MB. That is a dramatic change in memory usage, one that can definitely improve performance.
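As a rough, back-of-the-envelope check only (the Oracle-supplied script described below remains the recommended method), you can estimate the hugepages a single instance needs by dividing its SGA by the 2 MB page size:

select ceil(sum(value)/(2*1024*1024)) approx_hugepages_for_this_sga
from v$sga;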

Hugepages are available for IPC (Inter-Process Communication) shared memory; this is the 'standard' shared memory model (dating back to UNIX System V) that allows multiple processes to access the same shared memory segment. There is another form of shared memory segment, the memory-mapped file, and currently such segments are not supported by hugepages. Oracle, on Linux, gives you a choice between hugepages and memory-mapped files, and you implement that choice by deciding whether to use Automatic Memory Management (AMM). Choosing AMM rules out hugepages since AMM uses memory-mapped files located in /dev/shm; you cannot use AMM and hugepages together. For databases with SGAs up to 100 MB, using AMM is probably a good idea; the script Oracle supplies to determine how many hugepages to allocate won't consider SGAs smaller than 100 MB. For Oracle databases with larger SGAs, hugepages are, in my opinion, the choice to make, as you can significantly reduce shared memory overhead where page tables are concerned.

Calculating how many hugepages to allocate has been a difficult task in the past; Oracle supplies a script that can calculate that value for you, provided all of the databases you want to use hugepages are running. The script won't consider databases that aren't up and available, as it uses the ipcs -m command to return the allocated shared memory segments. The script is available from My Oracle Support in Doc ID 401749.1. It generates a list of IPC shared memory segment sizes owned by Oracle and computes the total number of hugepages required, outputting the result to be used in setting the vm.nr_hugepages parameter in /etc/sysctl.conf. Once that setting is modified, the system has to be rebooted before the change takes effect. Sample output from that script is shown below:


This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m


Press Enter to proceed...

Recommended setting: vm.nr_hugepages = 25240

For Linux RedHat kernel 2.6 this is the number of hugepages, not the overall size of the memory allocated for those hugepages. For RedHat kernel 2.4 a different parameter, vm.hugetlb_pool, is set to the actual size that pool needs to be. Fortunately the script knows which kernel is running and reports the correct parameter and value to set.

After you have hugepages allocated you need to tell Oracle to use them exclusively. An init parameter, use_large_pages, should be set to ONLY to keep Oracle from allocating conventionally managed memory segments for the SGA. If your hugepages calculations are correct then all of the databases on your database server will start without error. Allocate too few hugepages and databases with use_large_pages set to ONLY won’t start after all of the existing hugepages are used. Setting use_large_pages to TRUE will allow a database to start without using hugepages but that’s not what you want to get better performance from Oracle.
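The use_large_pages parameter isn't dynamically modifiable, so set it in the spfile and restart the instance; for example (the sid clause covers all instances in a RAC configuration and is harmless for a single instance):

alter system set use_large_pages='ONLY' scope=spfile sid='*';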

Earlier I mentioned that allocating too many hugepages can be worse than allocating too few. Since each hugepage consumes 2 MB of space, multiplying the vm.nr_hugepages setting by 2097152 will show how much memory is occupied by hugepages. It is possible to try to allocate all of the system memory as hugepages, resulting in a system that will not boot in multi-user mode. The kernel itself isn't configured to use hugepages, so the more hugepages that are allocated the less conventionally managed memory is available; take it all away and there is no place for the operating system to start necessary processes after the kernel is loaded at boot time. Using single-user mode will allow the system administrator to modify the offending entry and change it to a more reasonable value so the system can be rebooted. Of course you don't want to end up in that situation; manually calculating such a value can result in errors, so use the Oracle-supplied script to calculate the hugepages value you actually need.

Exadata systems run best when hugepages are allocated and used exclusively. The exachk script checks for such settings in each running database on the system and reports those that do not have use_large_pages set to ONLY. It also checks whether the currently configured hugepages setting meets or exceeds the estimated number of hugepages required by all running instances on the given database (compute) node. If this check fails it most likely will fail for all available nodes on the system. Failing this check isn't the end of the world, but it does mean that the failing databases aren't using memory as efficiently as they could. For each node run the script, make the reported changes, then reboot. Before starting any of the databases, ensure that all have use_large_pages set to ONLY; then start the databases across the cluster using srvctl. The next run of exachk should report that the hugepages check passed.

RAC databases need to be configured the same way Exadata databases are configured (since Exadata databases are RAC databases). Modify kernel parameters on every database server in the cluster, then reboot. Modify the spfile for each database in the cluster then use srvctl to start those databases cluster-wide. Presuming the hugepages settings are correct across the cluster there should be no error when starting the databases.

Using hugepages for Oracle databases on Linux is, to me, the only way to go if you want to increase database performance. Of course, if you have loads of memory to spare you may not see the need. Most systems I've worked on don't have that luxury, so being as 'frugal' with memory as you can is good. Making the most of available resources is key, and hugepages on Linux can definitely help where Oracle is concerned.

Size does matter.

July 15, 2014

It Pays To Be Smart

Filed under: Exadata,General — dfitzjarrell @ 09:40

Exadata is a powerful system, able to provide exceptional performance. Much of this performance is due to Smart Scans, Exadata's mechanism for shifting the workload to the storage cells and having them filter and reduce the amount of data the database servers must process. Not every Smart Scan that starts ends up completing, though. Oracle may decide that a different path, one that doesn't include a Smart Scan, may be more efficient. Certain conditions must be met, and there is a metric that records Smart Scan starts and another that records how many of those starts actually execute Smart Scans. Let's look at why Oracle may decide a Smart Scan isn't the 'smartest' route to the data and which metrics you can use to see how many Smart Scans end up using another execution path.

Three conditions can trigger Oracle foregoing a Smart Scan: setting optimizer_mode to FIRST_ROWS, setting it to FIRST_ROWS_n, or executing a query using a 'where rownum' predicate. Oracle initially opts to do the Smart Scan, then can perform a few block I/O operations to see if that pathway can satisfy the query more efficiently. When that happens a metric, cell num fast response sessions, is incremented. If the block I/O is indeed faster, Oracle foregoes the Smart Scan and completes the operation with conventional block I/O. If the block I/O proves that correct results won't be returned to the calling session in a timely fashion, Oracle resumes the Smart Scan and continues processing. Simply because this counter is incremented does not mean, by itself, that Oracle didn't execute a Smart Scan. You'll need another metric for that.

When Oracle decides to return to Smart Scan execution another metric is incremented, cell num fast response sessions continuing to smart scan. The same conditions that triggered the previous metric also apply here. Thus when Oracle decides to forego a Smart Scan only the previous metric, cell num fast response sessions, is incremented.

It’s fairly easy to see how many Smart Scans followed through and how many didn’t with the following query:


select a.value smart_scan_started, b.value smart_scan_cont, a.value - b.value no_smart_scan
from v$sysstat a, v$sysstat b
where a.statistic#=262
and b.statistic#=263
/
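The statistic# values used above can change between releases, so a variant keyed on the statistic names avoids that dependency:

select a.value smart_scan_started, b.value smart_scan_cont, a.value - b.value no_smart_scan
from v$sysstat a, v$sysstat b
where a.name = 'cell num fast response sessions'
and b.name = 'cell num fast response sessions continuing to smart scan'
/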

You may find that none of the Smart Scans that were started actually went on to finish:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
              1222               0          1222

Or you may find that all of them finished:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
               407             407             0

You may also find that some did, and some didn’t, continue:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
                53              27            26

What is important to note is that simply because Oracle started a Smart Scan then chose not to continue with it doesn't indicate there is any problem with your Exadata system. On the contrary, it indicates that Exadata is doing exactly what it's supposed to do: efficiently return query results to the calling session. That may be through a Smart Scan, or it may be through conventional I/O. Think of it as adjusting your route to a destination to avoid heavy traffic. If Oracle can get the correct results through a few conventional I/O operations it will do so; if a Smart Scan is necessary Oracle can again 'change gears' and resume it to ensure correct data is sent to the calling session. It all comes down to using the right tool for the job.

Smart Scans are great when you need them; it’s also good to know that Oracle, on Exadata, can decide when to use them and when it’s best not to. It’s also good to realize that when that happens it’s not because of a problem with Exadata. It’s simply Oracle choosing the right method to get the correct data. Sometimes the shorter path is the better path.

Robert Frost may have said it best:


 
"Two roads diverged in a wood, and I—  
I took the one less traveled by,  
And that has made all the difference." 

Oracle, on Exadata, may take the ‘road less traveled by’ when it comes to choosing a Smart Scan. That’s normally nothing to worry about as Oracle is selecting the most efficient path to return the data.

Anyone for a stroll?

July 1, 2014

What A Performance!

Filed under: General,Performance,stats — dfitzjarrell @ 10:09

Performance is one of the big watchwords in the IT industry; nowhere is it more often used than in the database arena, and it's usually not a good sign. End users know nothing, really, of the inner workings of the database, yet they are the first to point the 'finger of blame' at the database when processing slows or queries take longer than they would like to return data. [For end users performance is time, and the longer a process takes the more dismal performance seems to be.] Yes, there ARE times when the database is at fault: 'missing' indexes or stale statistics can cause execution plans to change, as can inserts and updates to heavily transacted tables (which many times goes back to stale statistics, since they no longer accurately reflect the data and its distribution in the table). Bugs in the database software can also cause issues by causing the gathered statistics to miscalculate values or generate 'useless' frequency histograms, such as on single-valued not null columns. Such histograms can direct the optimizer to choose a path it wouldn't normally take had the statistics and histograms contained useful data.
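One quick way to spot the 'useless' histograms mentioned above is to look for histograms on columns with a single distinct value and then regather those columns without one; this is a hypothetical sketch, with EMP and EMPSAL as placeholder names:

select table_name, column_name, num_distinct, histogram
from user_tab_col_statistics
where histogram <> 'NONE'
and num_distinct = 1;

-- then, for an affected table and column:
exec dbms_stats.gather_table_stats(user, 'EMP', method_opt => 'for columns EMPSAL size 1')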

Here is where the problem can get complicated: it may not be the database at fault when performance is down. Yes, it's the database server you're getting data from. Yes, things were running just fine until today. By 'logical' deduction it's the database at fault, right? Not necessarily. The database also has to deal with the operating system, the storage system, the available memory and the network to get its work done. Problems in any of those areas are likely to adversely affect database performance, so it's not necessarily the database causing the problem. How do you know where the problem originates? Oracle does provide some tools to help in assessing where a problem is likely to be, but you may also need to involve System Administrators, Network Administrators and/or Storage Administrators to fully investigate some issues. Let's take a look at the Oracle-supplied tools, along with some operating system utilities that can be used to diagnose performance issues.

At the first sign of trouble many look to the DBA to investigate the database for bottlenecks or other performance-robbing possibilities that surely must exist. Oracle provides at least two tools in releases 10.1 and later to investigate the database for performance problems: the AWR report, a general overall 'picture' of database 'health' that can give direction to such a quest, and the ASH report, a more detailed 'picture' for a given window of activity that provides more detailed wait information. These utilities should be used at the first report of performance issues to determine whether the problem is database-related or caused by another area, such as the operating system or network.

Starting with the AWR (Automatic Workload Repository) report, the two sections to begin an investigation with, in my estimation, are Wait Events Statistics and SQL Statistics. The first provides a system-wide view of the waits encountered and the time those waits consumed. The Background Wait Events and Wait Event Histogram sections can provide some insight into what was occurring during the report window; waits to watch are I/O-related waits (reads, writes), which can signal issues with a query plan or issues with the storage (issues not addressable by the DBA). The SQL Statistics list SQL statements ordered by several measures, including overall elapsed time, consumed CPU time, I/O wait time and physical reads. The SQL statistics can usually point to 'bad' plans, making performance problems a database issue because such plans are often the result of bad or stale statistics. In 11.2 and later releases dynamic sampling can mitigate missing statistics by automatically sampling the data prior to generating a query execution plan. We won't get into the details of using these reports to effect database tuning procedures, as that's a topic for another article; the intent here is to provide a means to determine IF the problem is a database issue or if it's caused by outside 'forces'.

For releases earlier than 10.1 Oracle provides Statspack, the precursor to AWR and the next step after the original database snapshot utilities utlbstat and utlestat. This utility can provide a very good snapshot of database activity, although it is not quite as robust as the AWR report, and it is still available in 11.2 if you would like to install it. AWR and ASH are part of the Diagnostic and Tuning Pack for OEM, and thus need to be licensed if they are to be used; Statspack doesn't require such a license.
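For reference, all three reports can be generated from SQL*Plus with the scripts shipped in $ORACLE_HOME/rdbms/admin; each script prompts for the snapshot range (or time window) and output format:

-- AWR report
@?/rdbms/admin/awrrpt.sql

-- ASH report
@?/rdbms/admin/ashrpt.sql

-- Statspack report, if Statspack is installed
@?/rdbms/admin/spreport.sql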

The “Top 5 Timed Foreground Events” section gives an overview of the wait events which consumed the most time during the report window:

Top 5 Timed Foreground Events

Event                     Waits  Time(s)  Avg wait (ms)  % DB time  Wait Class
DB CPU                               0                       82.42
log file sync                 8      0           11          30.72   Commit
Disk file operations I/O      1      0            1           0.33   User I/O
os thread startup             0      0                        0.00   Concurrency
log file single write         0      0                        0.00   System I/O

Host CPU (CPUs: 4 Cores: 2 Sockets: 1)

This provides a quick look at the ‘worst offenders’ in the wait event category (although they may not be performance-robbing events). Start here then work down to the Operating System Statistics section:

Operating System Statistics – Detail

Snap Time        Load  %busy  %user  %sys  %idle  %iowait
20-Jun 07:04:42  0.00
20-Jun 08:00:45  0.00   7.12   4.65   2.48  92.88     0.00

The Wait Event Histogram Detail can be very informative:

Wait Event Histogram Detail (64 msec to 2 sec)

  • Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
  • Units for % of Total Waits: ms is milliseconds s is 1024 milliseconds (approximately 1 second)
  • % of Total Waits: total waits for all wait classes, including Idle
  • % of Total Waits: value of .0 indicates value was <.05%; value of null is truly 0
  • Ordered by Event (only non-idle events are displayed)
    % of Total Waits
Event Waits 64ms to 2s <32ms <64ms <1/8s <1/4s <1/2s <1s <2s >=2s
ADR block file read 3 81.3 12.5 6.3          
Disk file operations I/O 1 98.2 1.8            
control file parallel write 15 98.7 .8 .3 .1        
control file sequential read 139 94.0 4.7 .8 .3 .1   .0  
db file parallel read 2   50.0   50.0        
db file parallel write 185 91.4 5.7 1.8 .8 .1 .2    
db file scattered read 63 93.5 5.0 1.2 .3        
db file sequential read 142 94.2 4.5 1.2 .1 .0      
direct path read 1     100.0          
enq: CR – block range reuse ckpt 16 82.2 15.6   1.1   1.1    
log file parallel write 82 92.5 4.6 2.6 .3 .1      
log file switch completion 2     50.0 50.0        
log file sync 1 90.0 10.0            

Wait Event Histogram Detail (4 sec to 2 min)

No data exists for this section of the report.

Wait Event Histogram Detail (4 min to 1 hr)

No data exists for this section of the report.

Service Statistics

  • ordered by DB Time
Service Name    DB Time (s)  DB CPU (s)  Physical Reads (K)  Logical Reads (K)
SYS$USERS                 0           0                   0                  2
SYS$BACKGROUND            0           0                   8                436
smedley                   0           0                   0                  0
smedleyXDB                0           0                   0                  0

Such histograms can show where the preponderance of waits lie, making it easier to determine if they are problematic.

Next on the list is the SQL Statistics report. There are several sections, the one shown is ordered by elapsed time:

SQL ordered by Elapsed Time

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
  • %Total – Elapsed Time as a percentage of Total DB time
  • %CPU – CPU Time as a percentage of Elapsed Time
  • %IO – User I/O Time as a percentage of Elapsed Time
  • Captured SQL account for 1.1E+04% of Total DB Time (s): 0
  • Captured PL/SQL account for 2.3E+03% of Total DB Time (s): 0
Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
5.88 1 5.88 2070.31 15.66 79.60 acc988uzvjmmt   delete from WRH$_SYSMETRIC_HIS…
4.60 22 0.21 1618.80 27.16 64.39 70vs1d7ywk5m0 MMON_SLAVE begin dbms_stats.copy_table_st…
2.49 44 0.06 876.07 10.66 83.39 2mp99nzd9u1qp   delete from histgrm$ where obj…
2.14 769 0.00 753.49 4.38 96.01 db78fxqxwxt7r   select /*+ rule */ bucket, end…
1.57 212 0.01 551.64 21.91 74.49 43ruajmwv9fus   insert into histgrm$(obj#, int…
1.42 1 1.42 501.54 7.67 86.00 0sfsrf0qzvsjv   delete from WRH$_SYSMETRIC_SUM…
1.19 22 0.05 418.37 2.63 93.41 d89c1mh5pvbkz MMON_SLAVE SELECT /*+ rule */ U.NAME, OT….
0.83 1 0.83 291.84 13.18 77.73 2whm2vvjb98k7   delete from WRH$_SQL_PLAN tab …
0.80 22 0.04 281.90 25.34 67.82 8sd37ztuwkyyc MMON_SLAVE begin dbms_swrf_internal.massa…
0.66 1 0.66 231.35 2.38 98.81 28bm5y53nujvz   delete from WRH$_RESOURCE_LIMI…
0.54 1 0.54 188.64 0.00 91.89 4y3yvmbav7uwb   delete /*+ UNNEST(@SEL$2) UNNE…
0.53 22 0.02 185.34 0.00 96.67 63kf3an2j0pkc   insert into sys.wri$_optstat_h…
0.51 1 0.51 179.21 9.20 75.31 96w2cbx7ntmtv   delete from WRH$_BG_EVENT_SUMM…
0.50 1 0.50 175.27 6.27 94.54 31sdu97k3baq1   delete from WRH$_SHARED_POOL_A…
0.49 22 0.02 171.58 12.81 66.96 480utgnq0t6mf MMON_SLAVE begin dbms_stats.unlock_partit…
0.44 1 0.44 156.14 17.60 80.45 1h7zt6jks80pa   delete from WRH$_ENQUEUE_STAT …
0.42 1 0.42 146.27 15.03 94.38 bzscyq07w79ab   delete /*+ dynamic_sampling(4)…
0.38 2,361 0.00 133.20 24.75 80.65 96g93hntrzjtr   select /*+ rule */ bucket_cnt,…
0.35 1 0.35 123.71 4.44 94.32 9v9n97qj8z1dg   delete /*+ dynamic_sampling(4)…
0.35 22 0.02 122.04 27.02 66.45 2rn08n06qr261   update histgrm$ set obj# = dec…
0.34 1 0.34 121.06 0.00 96.73 6ajkhukk78nsr   begin prvt_hdm.auto_execute( :…
0.31 1 0.31 107.67 0.00 97.51 48ffykkmk7sgc   delete /*+ dynamic_sampling(4)…
0.28 1 0.28 99.65 0.00 79.94 350myuyx0t1d6   insert into wrh$_tablespace_st…
0.27 375 0.00 94.36 17.47 87.22 bvn085xhxfn9f   insert into sys.wri$_optstat_h…
0.26 161 0.00 90.72 6.06 90.14 74anujtt8zw4h   select o.owner#, o.name, o.nam…
0.25 1 0.25 86.79 6.33 95.67 47x8f8z2hx1a2   delete from WRH$_MUTEX_SLEEP t…
0.24 1 0.24 85.22 6.45 92.64 az8cj7dfnpp45   delete from WRH$_MEM_DYNAMIC_C…
0.23 1 0.23 81.60 6.73 41.10 3mqvkt9as1phq   merge /*+ dynamic_sampling(mm…
0.22 1 0.22 76.17 0.00 87.96 du2bqfpfj9au2   delete from WRH$_LIBRARYCACHE …
0.21 1 0.21 73.67 14.92 93.36 b7ba0zt64wtc7   delete from WRH$_RSRC_CONSUMER…
0.20 44 0.00 71.59 7.68 71.51 as3uq6ggb3gx6   delete from hist_head$ where o…
0.20 89 0.00 71.52 30.73 92.39 3ktacv9r56b51   select owner#, name, namespace…
0.20 10 0.02 71.14 0.00 99.04 c6awqs517jpj0   select /*+ index(idl_char$ i_i…
0.20 1,098 0.00 70.92 23.24 58.47 3c1kubcdjnppq   update sys.col_usage$ set equa…
0.19 22 0.01 68.12 56.47 8.79 9dy457uf4uxzd MMON_SLAVE SELECT /*+ all_rows */ ‘"’||UI…
0.19 1 0.19 67.57 0.00 0.00 f318xdxdn0pdc   insert into wrh$_log (snap_id,…
0.19 113 0.00 67.01 0.00 84.34 59vjj34vugaav   delete from obj$ where obj# = …
0.18 10 0.02 64.60 0.00 96.35 cvn54b7yz0s8u   select /*+ index(idl_ub1$ i_id…
0.18 375 0.00 63.34 43.38 53.87 95mpkn5xz9001   insert into hist_head$(obj#, i…
0.18 22 0.01 62.59 0.00 71.91 dr277b6yv83uy MMON_SLAVE SELECT COLUMN_NAME FROM DBA_PA…
0.18 671 0.00 62.17 17.68 0.00 5ms6rbzdnq16t   select job, nvl2(last_date, 1,…
0.18 1 0.18 61.81 8.89 96.19 7f8bfqbjxdssr   delete from WRH$_LOG tab where…
0.17 1 0.17 61.43 17.89 95.77 7g732rx16j8jc   insert into WRH$_SERVICE_STAT …
0.16 48 0.00 56.37 48.74 48.27 4h8danbxj5nz6   begin dbms_utility.validate(:b…
0.16 1 0.16 54.88 80.11 0.00 dayq182sk41ks   insert into wrh$_memory_target…
0.15 252 0.00 54.58 30.21 39.65 7ng34ruy5awxq   select i.obj#, i.ts#, i.file#,…
0.15 1 0.15 54.51 80.65 0.00 bm2pwrpcr8ru6   select sga_size s, sga_size_fa…
0.15 1 0.15 51.42 106.87 0.00 bunssq950snhf   insert into wrh$_sga_target_ad…
0.14 1 0.14 49.59 0.00 93.41 084z6qq3wwkdc   delete from WRH$_SQL_WORKAREA_…
0.14 1 0.14 48.68 0.00 90.79 17g3cqghntwb6   delete from WRH$_PGA_TARGET_AD…
0.13 22 0.01 44.60 0.00 35.38 7kc5p5bw4uz55   update tabpart$ set dataobj# =…
0.12 1 0.12 41.02 26.79 91.44 83hpfc7nhjpvg   delete from WRH$_IOSTAT_FILETY…
0.12 22 0.01 40.72 121.45 16.37 267s83vd49cfa MMON_SLAVE SELECT PARTITION_NAME FROM DBA…
0.11 66 0.00 40.29 13.64 87.09 9fg7fjbwmv7v8   insert into sys.wri$_optstat_t…
0.11 1 0.11 39.34 13.97 65.25 9xcfxn794wp5h   select 1 from WRH$_LATCH_CHILD…
0.11 1 0.11 37.29 0.00 98.77 3hy3nssu9shxj   select /*+ FIRST_ROWS(1) PARAL…
0.10 6 0.02 36.11 0.00 63.61 9wncfacx0nj9h   insert into smon_scn_time (thr…
0.10 2 0.05 35.13 0.00 1.98 934ur8r7tqbjx   SELECT DBID FROM V$DATABASE
0.10 1 0.10 33.93 0.00 74.29 1v44r7vam2wbt   delete from WRH$_IOSTAT_FUNCTI…
0.10 1 0.10 33.68 0.00 96.85 cs78htw2dhb0x   delete from WRH$_IC_DEVICE_STA…
0.09 66 0.00 33.17 0.00 92.58 gc7b0drtzbyc6   select max(intcol#) from hist_…
0.09 89 0.00 33.07 0.00 92.64 8swypbbr0m372   select order#, columns, types …
0.09 1 0.09 32.97 16.66 86.94 8h77nkt8bnukh   delete from WRH$_IOSTAT_DETAIL…
0.09 1 0.09 31.42 17.49 83.52 7gfhbhxrxx8wf   delete /*+ dynamic_sampling(4)…

This section can help determine which queries may be creating performance issues; using the SQL Id and text you can retrieve the plans to see if a rewrite may improve the situation. It can also show where statistics may not be current, although it would also be a good idea to run a 10053 trace against each suspect query to report what statistics the optimizer is using.
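A 10053 trace can be enabled for the session before hard-parsing the suspect statement; a minimal sketch is shown below (the statement is a placeholder, and the trace file is written to the instance's diagnostic trace directory):

alter session set events '10053 trace name context forever, level 1';

explain plan for
select /* suspect query goes here */ * from dual;

alter session set events '10053 trace name context off';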

The ASH, or Active Session History, report digs deeper into important wait events and timings, providing a better picture of the wait activity during the requested window; a section of this report is shown below:

SQL ID Planhash Sampled # of Executions % Activity Event % Event Top Row Source % RwSrc SQL Text
572fbaj0fdw2b 1648993636 1 7.69 CPU + Wait for CPU 5.77 SELECT STATEMENT 5.77 select output from table(dbms_…
572fbaj0fdw2b 1648993636 1 7.69 db file sequential read 1.92 ** Row Source Not Available ** 1.92
6ajkhukk78nsr   1 3.85 CPU + Wait for CPU 3.85 ** Row Source Not Available ** 3.85 begin prvt_hdm.auto_execute( :…
1bx8mgs8by25x 2223429142 1 1.92 CPU + Wait for CPU 1.92 HASH JOIN 1.92 select coalesce( p1s.parameter…
2asgk01xtb6p0 3702291220 1 1.92 control file sequential read 1.92 FIXED TABLE – FULL 1.92 select d.dbid dbid , d.name db…
317v5hnvvd49h 1656119837 1 1.92 CPU + Wait for CPU 1.92 UPDATE 1.92 UPDATE wrh$_seg_stat_obj ob S…

Again, use this report to initially determine if the problem is database-related; if that is the case then this report should provide a wealth of data aimed at directing your database/query tuning efforts. If there is no indication of database problems then other areas are likely at fault. Waits to examine from both reports include SQL*Net more data to client, Disk file operations I/O and os thread startup. The first wait in that list can be an indicator of network problems, the second can point to storage issues (array problems, pending disk failures), and the third can point to CPU issues (excessive time for O/S thread startups can indicate over-taxed CPUs, which may be caused by other, non-database processes).

If, after examining the reports generated by AWR, ASH or Statspack, you find the database is not the source of the performance issue, you may be able to discern which area outside of the database is the likely suspect. SQL*Net more data to client waits, if the time per wait is relatively high, can indicate problems in the network layer. Utilities such as netstat can show the configured network interfaces and the network traffic data on those interfaces. Example output is shown below:

$ netstat -ginw
Kernel Interface table
Iface       MTU   Met      RX-OK RX-ERR RX-DRP RX-OVR      TX-OK TX-ERR TX-DRP TX-OVR Flg
bondeth0     1500   0 5084620269      0      0      0 3923352780      0      0      0 BMmRU
bondeth0:1   1500   0      - no statistics available -                                BMmRU
bondeth0:3   1500   0      - no statistics available -                                BMmRU
bondeth0:4   1500   0      - no statistics available -                                BMmRU
eth0         1500   0   42182099      0      0      0   21397203      0      0      0 BMRU
eth4         1500   0     305021      0      0      0     281488      0      0      0 BMsRU
eth5         1500   0 5084315248      0      0      0 3923071292      0      0      0 BMsRU
ib0         64000   0 2419856775      0      1      0 2178061942      3    271      0 BMsRU
ib1         64000   0   21305668      0      0      0          0      0      0      0 BMsRU
lo          16436   0 4495752311      0      0      0 4495752311      0      0      0 LRU
$

The utility reports transmit and receive statistics, including dropped packets and packet errors. Running netstat -ginw may show which local interface is having issues; send such output to the Network Administrator so he or she can interpret the results and take appropriate action.

What if the problem isn't the database, and it isn't the network? The next area that can adversely affect performance is the storage tier. Statspack and ASH/AWR reports also provide disk read and write wait information. Again, if the wait time for an individual read or write is high (for example, a single wait over 300 milliseconds) it may be the storage that's at fault. As disks age their performance can deteriorate, as the alignment the platter had when new is compromised by wear and tear around the spindle. Firmware can also be an issue, especially when disks are upgraded and the firmware is not. This is not an area for the DBA to assess, but it is one he or she can report to the Storage Administrator for further investigation.
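For a quick look at average I/O wait times since instance startup (AWR-interval or session-level figures are better for diagnosing a specific window, but this gives a starting point), something like the following can be used:

select event,
       total_waits,
       round(time_waited_micro/nullif(total_waits,0)/1000, 2) avg_wait_ms
from v$system_event
where wait_class = 'User I/O'
order by time_waited_micro desc;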

Network-related issues may also be a problem with either the local network interface card (NIC) or the cabling. This is, again, a task for the Network Administrator to assess and address, but you may be able to ‘point’ him or her in the right direction with output from the ping utility. Provided you have two or more systems that can ‘see’ the same endpoint a ping comparison may show network problems local to the database server. For example if server A can ping server C with response times less than 20 milliseconds and server B, using the same subnet, reports ping times in the 100 millisecond or greater range this could indicate either cabling issues or a NIC that may be misconfigured or failing. This isn’t likely to be valid if the two servers are on different subnets which is why it’s necessary to get the Network Administrator involved as soon as you suspect a network issue. He or she can provide needed configuration data and can better analyze and diagnose such problems.

And yet another area where performance can suffer is at the operating system level. A system starved for CPU, memory or both because of a temporarily high load can severely slow down database processes along with many other non-database-related services. The System Administrator may need to be called in to report what processes were running during a particular period when response time was slow. Remember that end users equate performance with response time; a large, resource-intensive report that runs once a month can seriously slow down other database activity during that same period. System backups can also consume resources normally available for user and database processes. Knowing the period of poor performance can greatly assist the System Administrator investigating the issue. If it’s a defined window, for example from 8 PM to midnight on Tuesdays, the System Administrator can track down the process causing the slowdown by monitoring system activity during that period. It may be a process that can’t be stopped or rescheduled, but at least the end users will know what is causing their normally fast transactions to run much slower than usual. As mentioned before, when you know it’s not a database issue get the Administrators of the suspected systems involved; the earlier you get more eyes on the problem the faster the cause can be found and a solution put in place.
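
Even before the System Administrator gets involved you can get a rough picture of host CPU pressure from inside the database; a minimal sketch, assuming access to V$OSSTAT, follows (the LOAD statistic is not present on every platform):

select stat_name, value
  from v$osstat
 where stat_name in ('NUM_CPUS', 'LOAD', 'BUSY_TIME', 'IDLE_TIME', 'USER_TIME', 'SYS_TIME');

A LOAD value well above NUM_CPUS, or BUSY_TIME climbing much faster than IDLE_TIME between samples, suggests the host rather than the database is the bottleneck.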

The DBA is the ‘go-to’ person when the database is to blame for performance issues (long run times for specific queries, as an example). But, when the DBA finds nothing awry in the database, other parties may need to get involved. Again, just because it seems to be the database at fault doesn’t mean that the database IS at fault; it could be other, related areas affecting performance. Knowing what database reports can be of use is the first step in being able to assess a situation so other areas, such as network and storage, can be brought in if need be. Knowing who to involve is just as important as knowing that a performance issue exists. Communication is key in these situations; a network or storage administrator may ask for additional information regarding the problem; this is where the AWR and/or Statspack reports may be helpful. Learn how to generate these reports now (if you don’t already know) so that when you need them you can concentrate on the issues at hand rather than how to generate a report.
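
As a reminder, the reports themselves come from scripts shipped with the database; a minimal sketch of the SQL*Plus calls is shown below (AWR and ASH require the Diagnostics Pack license, and Statspack must first be installed via spcreate.sql):

-- AWR report
@?/rdbms/admin/awrrpt.sql

-- ASH report
@?/rdbms/admin/ashrpt.sql

-- Statspack report
@?/rdbms/admin/spreport.sql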

Performance is a term that can have a broad interpretation, depending upon who is doing the interpreting. End users may more often report performance issues when a report that ran fine yesterday now takes much longer to complete because all they may see is their process running slowly. Knowing when the problem is the database and when it isn’t is key to getting such issues diagnosed and resolved. Hopefully this article has given you some tools to help in that regard; use them wisely and communicate well so that you can be part of the solution by knowing who else to involve when it isn’t the database that is at fault.

Now, on with the show!

June 6, 2014

“A Ponderous Chain”

Filed under: General — dfitzjarrell @ 11:59

Chained and migrated rows in an Oracle database can seriously impact I/O performance, especially with Exadata. In a ‘normal’ Oracle database [read that as a "non-Exadata, non-ASM" database] chained rows in small numbers, usually meaning 1% or less of the total rows in a table, are generally a nuisance. The performance hit they generate in such a situation is small and may not even be noticed by the end users. Add more chained/migrated rows to that mix and the increase in I/O work starts becoming noticeable. How do these rows get created, what problems do they cause and how can you mitigate the issue? Let’s look into what chained/migrated rows are, what can cause them and how to deal with them.

Chained and migrated rows are a unique lot — they exist in two or more data blocks and are linked by the “head” rowid, usually the rowid of the original row. The chained piece or pieces come into being when a row is updated and the data block containing it can’t store the entire row any more. A migrated row is one that is moved entirely to a new block with a ‘pointer’ left behind associated with the original rowid. Let’s try to draw a rather simplified picture to illustrate that concept:

Original row -- Data fits into the available space in the block

       ------------------------------------------------------
000001:|Bonsai Bob|14734|23-NOV-2013|343.27|Duluth|Dept 300 |
       ------------------------------------------------------

Update is made -- Data now takes more space than block has available

       ------------------------------------------------------
000001:|Samurai Sam The Singing Elf|14734|23-NOV-2013|343.27|Duluth|Dept 300 |
       ------------------------------------------------------

Data no longer 'fits' in one block, it's divided between blocks --

       ------------------------------------------------------
000001:|Samurai Sam The Singing Elf|14734|23-NOV-2013|000017|
       ------------------------------------------------------
000017:|343.27|Duluth|Dept 300                              |
       ------------------------------------------------------

Notice there is a ‘pointer’ to the next piece of the row at the end of the original row. This occurs for every ‘split’ this row takes to get the data to ‘fit’ in the table. It’s also possible for a row to be migrated, in its entirety, to an empty block to minimize the trips Oracle has to take to retrieve the data; this occurs when the original row was small and occupied minimal space in a data block. Updates cause it to expand in length and make Oracle relocate the data to another block or blocks with more available space:

Original row -- Data occupies available space

       ------------------------------------------------------
000009:||Bonsai Bob|14734|23-NOV-2013|
       ------------------------------------------------------

Update is made -- Data no longer fits

       ------------------------------------------------------
000009:|Samurai Sam The Singing Elf|14734|23-NOV-2013|343.27|Duluth|Dept 300 |
       ------------------------------------------------------

Data is divided between blocks --

       ------------------------------------------------------
000009:||000019|
       ------------------------------------------------------
000019:|Samurai Sam The Singing Elf|14734|23-NOV-2013|000037|
       ------------------------------------------------------
000037:|343.27|Duluth|Dept 300                              |
       ------------------------------------------------------

The original rowid now has a pointer to the new location of the modified data. That migrated location may also be chained to an additional block because the length of the modified data eventually exceeds the data block size.

What does this mean for performance? It depends, really, on the overall number of rows in the table or schema that are chained or migrated. For a database not on Exadata, or one not using ASM, each chained/migrated row requires at least two fetches to return the data, which at least doubles the work Oracle has to do to process that row. For one, ten or one hundred rows this may be a very small decrease in performance. When 10% or more of the data in a table is chained/migrated I/O processing slows down, and this can happen for tables where the row length can exceed the block size. Without intervention Oracle usually sets the block size to 8192 bytes, or 8K. After taking into consideration any ‘overhead’ introduced by Oracle the maximum available space in an 8K block is roughly 7900 to 8000 bytes. Any table row longer than that will create a chained row even in an empty block. This can be changed at database creation or by creating tablespaces that use a larger block size. [Smaller block sizes can also be used but would not benefit any effort to reduce row chaining.] The largest block size supported by Oracle is 32K, so if a table row exceeds that length when fully populated chained rows are guaranteed to exist. The more ‘passes’ Oracle must make to fetch a row the longer the I/O waits can be, resulting in diminished performance, so it’s best to minimize their occurrence.
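
Creating a tablespace with a larger block size requires a buffer cache for that block size to be configured first; the following is only a minimal sketch, with hypothetical names, paths and sizes:

-- allocate a buffer cache for 32K blocks (the size shown is illustrative)
alter system set db_32k_cache_size = 64m;

-- create a tablespace using the larger block size (name and datafile path are hypothetical)
create tablespace wide_rows
  datafile '/u01/oradata/mydb/wide_rows01.dbf' size 1g
  blocksize 32k;

-- relocate a table whose rows no longer fit an 8K block (table name is hypothetical)
alter table bingnorfer.big_row_tbl move tablespace wide_rows;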

Exadata and ASM present a different scenario. Since a diskgroup is striped across all available disks associated with that group the pieces of a chained row can exist on different disks and, for Exadata, different storage cells. For a ‘regular’ database using ASM performance can be slower because Oracle not only needs to visit different blocks to return the data, it may also have to visit different disks, introducing additional latency. On top of that, in an Exadata database chained rows that are not on the same disk or accessed by the same storage cell cause Oracle to revert to regular block I/O, eliminating any chance of using a Smart Scan and any of the optimizations a Smart Scan would provide.

The question to answer is “How do I eliminate, or at least minimize, chained and migrated rows?” The first, and most effective, way is to design tables with the database block size in mind, taking care to not exceed the available space in the block when a row is fully populated and all variable-length columns contain data sized at the maximum defined length. It is possible to influence how that space is allocated by setting two parameters, PCTFREE and PCTUSED, which tell Oracle how to manage the data block space. PCTUSED sets the threshold of used space below which a block must fall before it is returned to the ‘available’ (free) list for new inserts; PCTFREE reserves the declared percentage of each block for updates to existing rows and determines when a block is taken off that list. If rows are being constantly updated, such as in a financial system, it might be best to set PCTUSED to a smaller value than the default (which is 40) and PCTFREE to 100-PCTUSED, to ensure more space is available for updates to existing data. I have seen at least one system where PCTUSED was set to 1 and PCTFREE set to 99, allowing one row per block with plenty of room for updates; in that system the incidence of chained rows dropped significantly. Please note that making such a change to an existing table using ‘alter table …’ won’t affect the currently populated data blocks. To apply the change to existing data in Oracle releases 9.0.1.x and later you’ll need to perform a move operation, rebuilding the table with the new PCTUSED and PCTFREE settings:


SQL> alter table empdrop move pctfree 99 pctused 1;

Table altered.

SQL>

Unfortunately such a move can increase the table size considerably since the number of rows per block has been reduced to a much smaller value. [If you're still in the 'dark ages', using Oracle 8.1 or earlier, you'll need to export the table, pre-create it with a script modified to set PCTUSED and PCTFREE 'properly', then import the data ignoring the table create error.] This is not a task to be taken lightly, nor one for ‘canned’ scripts and ‘best practice’ values, as each implementation is different. Simply because one table benefitted from such drastic settings for PCTUSED and PCTFREE doesn’t mean EVERY table will need such a dramatic change. Base your changes on the incidence of chained rows and the maximum row length as compared to the database block size.
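
A quick way to compare the average row length and the reported chained-row counts against the block size is a query such as the one below; this is only a sketch, and note that CHAIN_CNT is populated by ANALYZE, not by DBMS_STATS:

select owner, table_name, avg_row_len, chain_cnt, num_rows,
       round(100*chain_cnt/greatest(nvl(num_rows,0),1),2) pct_chained
  from dba_tables
 where owner = upper('&1')
 order by chain_cnt desc nulls last;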

“How do I know if I have chained rows?”, you ask. Oracle keeps a system statistic, ‘table fetch continued row’, that increments each time a continued/migrated row is fetched. It doesn’t tell you WHERE those rows are, however the ANALYZE TABLE command is still available to provide that information. It can take a while to run on a large table but by default it will populate a table named, brilliantly enough, CHAINED_ROWS, designed to contain all of the information you could need to identify which rows are chained in a table. There is no way to gather this information at the schema level (DBMS_STATS doesn’t have any procedure or function to gather that data) but you can create a script to do the deed. There is some preparation involved, namely creating the CHAINED_ROWS table, but Oracle has conveniently provided a script for that named utlchain.sql, located in $ORACLE_HOME/rdbms/admin:


rem
rem $Header: utlchain.sql 07-may-96.19:40:01 sbasu Exp $
rem
Rem Copyright (c) 1990, 1995, 1996, 1998 by Oracle Corporation
Rem NAME
REM    UTLCHAIN.SQL
Rem  FUNCTION
Rem    Creates the default table for storing the output of the
Rem    analyze list chained rows command
Rem  NOTES
Rem  MODIFIED
Rem     syeung     06/17/98  - add subpartition_name
Rem     mmonajje   05/21/96 -  Replace timestamp col name with analyze_timestam
Rem     sbasu      05/07/96 -  Remove echo setting
Rem     ssamu      08/14/95 -  merge PTI with Objects
Rem     ssamu      07/24/95 -  add field for partition name
Rem     glumpkin   10/19/92 -  Renamed from CHAINROW.SQL
Rem     ggatlin    03/09/92 -  add set echo on
Rem     rlim       04/29/91 -         change char to varchar2
Rem   Klein      01/10/91 - add owner name for chained rows
Rem   Klein      12/04/90 - Creation
Rem

create table CHAINED_ROWS (
  owner_name         varchar2(30),
  table_name         varchar2(30),
  cluster_name       varchar2(30),
  partition_name     varchar2(30),
  subpartition_name  varchar2(30),
  head_rowid         rowid,
  analyze_timestamp  date
);

Oracle also provides another version of that table in the $ORACLE_HOME/rdbms/admin/utlchn1.sql script, that uses the UROWID datatype which supports both Oracle and non-Oracle database rowid values:


Rem
Rem $Header: utlchn1.sql 24-jun-99.07:57:57 echong Exp $
Rem
Rem utlchn1.sql
Rem
Rem  Copyright (c) Oracle Corporation 1998, 1999. All Rights Reserved.
Rem
Rem    NAME
Rem      utlchn1.sql - 
Rem
Rem    DESCRIPTION
Rem      
Rem
Rem    NOTES
Rem      
Rem
Rem    MODIFIED   (MM/DD/YY)
Rem    echong      06/24/99 - rename
Rem    syeung      06/22/98 - add subpartition_name
Rem    echong      06/05/98 - chained rows table with urowid type
Rem    echong      06/05/98 - Created
Rem

create table CHAINED_ROWS (
  owner_name         varchar2(30),
  table_name         varchar2(30),
  cluster_name       varchar2(30),
  partition_name     varchar2(30),
  subpartition_name  varchar2(30),
  head_rowid         urowid,
  analyze_timestamp  date
);

Generally speaking you probably won’t need the table using the UROWID data type but either table can be used to report chained rows in Oracle databases.

Once the CHAINED_ROWS table is created it’s a simple task to generate a script to analyze all of the tables in a given schema for chained rows:


select 'analyze table '||owner||'.'||table_name||' list chained rows;'
from dba_tables
where owner = upper('&1');

Spool the output from that query to a file, then execute the file from the SQL> prompt. Remember that by default the command inserts data into the CHAINED_ROWS table. You can, if you’re feeling adventurous, create a similar table with a different name and have ANALYZE TABLE insert the chained row data into it by telling Oracle where you want the data to go. For example, if you decide to create a table named BAD_JUJU with the same structure as the CHAINED_ROWS table you can do this:


select 'analyze table '||owner||'.'||table_name||' list chained rows into bad_juju;'
from dba_tables
where owner = upper('&1');

You can then query your BAD_JUJU table for chained row information.
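
Creating that alternate table can be as simple as cloning the CHAINED_ROWS structure; a minimal sketch:

create table bad_juju as
select * from chained_rows where 1 = 0;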

If the ‘table fetch continued row’ statistic reports 0 then you have nothing to worry about and nothing to ‘fix':


SQL> select value from v$sysstat where name = 'table fetch continued row';

     VALUE
----------
         0

SQL>

In reality that isn’t likely to happen as some of the SYS-owned tables can, and do, contain chained rows. This is the result from a small, personal database I have on my laptop:


SQL> select value from v$sysstat where name = 'table fetch continued row';

     VALUE
----------
       778

SQL>

Let’s see where those chained rows are:


SQL> select owner_name, table_name, count(*) chain_ct
  2  from bad_juju
  3  group by owner_name, table_name;

OWNER_NAME                     TABLE_NAME                       CHAIN_CT
------------------------------ ------------------------------ ----------
SYS                            SMON_SCN_TIME                           1
SYS                            TABPART$                                1
SYS                            WRI$_DBU_FEATURE_USAGE                  5
APEX_030200                    WWV_FLOW_STEPS                         10
SYS                            IDL_CHAR$                             165
SYS                            METASTYLESHEET                        113
APEX_030200                    WWV_FLOW_TEMPLATES                     18
MDSYS                          SDO_DATUMS                              2
SYS                            COLLECTION$                             2
SYS                            IDL_UB1$                             6683
APEX_030200                    WWV_FLOW_STEP_PROCESSING               57
MDSYS                          SDO_ELLIPSOIDS                          8
SYS                            HIST_HEAD$                            265
SYS                            USER$                                   1
MDSYS                          SDO_COORD_OPS                          45
SYS                            IDL_SB4$                               31
SYS                            IND$                                    4
SYS                            METHOD$                                 3
SYS                            RESULT$                                 3
SYS                            TRIGGER$                                1
APEX_030200                    WWV_FLOW_PAGE_PLUG_TEMPLATES            1
SYS                            PARAMETER$                             13
SYS                            TAB$                                    2
SYS                            VIEW$                                  67
APEX_030200                    WWV_FLOW_PAGE_PLUGS                    36
MDSYS                          SDO_COORD_OP_PARAM_VALS               116
MDSYS                          SDO_COORD_OP_PATHS                     53
SYS                            ATTRIBUTE$                             29
SYS                            IDL_UB2$                              837
SYS                            JAVA$MC$                              124

30 rows selected.

SQL>

Notice that no user tables or vendor-supplied application schemas have chained rows, only Oracle-supplied tables, so do not be confused if you analyze the tables in a given schema and find there are no chained rows to address. Is it a good idea to change PCTFREE and PCTUSED on those Oracle-supplied tables? It’s your decision to make; if recursive sql calls to these tables are consuming large amounts of time it may be beneficial to change those values. I really suspect it’s not that big of a performance hit and you’d be better off addressing chained rows, should they exist, in any application schemas present in your database.

With Exadata chained rows do more than simply increase the I/O workload; they can shift it from the storage cells to the database server by causing Oracle to revert to regular block I/O rather than using a Smart Scan. Since Smart Scans can’t process chained rows that span disks and/or storage cells this eliminates any use of Storage Indexes, Predicate Filtering and Column Projection. There may be chained rows that can be processed by a Smart Scan (where the row is chained on the same disk accessed by the same storage cell) but those are very likely rare occurrences. Exadata provides three statistics regarding chained rows and their processing: ‘chained rows processed by cell’, ‘chained rows rejected by cell’ and ‘chained rows skipped by cell’. These report the number of rows in each ‘category’ since the instance was started. Each statistic, and what it reports, is listed below:


chained rows processed by cell

This statistic reports on the chained rows that were processed entirely within a storage cell, that is, where the row pieces (or at least the ones Oracle needed to return results) are located within the storage accessed by a given storage cell. This covers intra-block chaining, where the next row piece is located in the same data block as the head row. That is a special case of row chaining for rows with more than 255 columns; because the next piece is in the same block as the head it can be fetched with no additional effort.

chained rows rejected by cell

This counter records the chained rows where the next piece is not in the same block or cell, as described in the previous section. It appears to be incremented only in special cases, as it is not incremented very often on the systems I have dealt with. When it is incremented the Smart Scan fell back to regular block I/O to process the row.

chained rows skipped by cell

This is the statistic most often incremented when a Smart Scan needs to revert to regular block I/O. Again, it's incremented when a chained row is found and the remaining pieces reside across the entire storage stripe, spanning storage cells in the process. It isn't clear when the previous counter or this counter should be incremented in such situations; I have found this counter is the one most often incremented when Smart Scans revert to regular block I/O to process chained rows.
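
A quick check of those counters since instance startup takes a simple query; a minimal sketch, assuming access to V$SYSSTAT:

select name, value
  from v$sysstat
 where name in ('chained rows processed by cell',
                'chained rows rejected by cell',
                'chained rows skipped by cell');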

The same methods used for non-Exadata databases can be applied to those on Exadata; since ASM stripes the diskgroup across all available disks it may not be possible to correct every chained row that exists. That does not mean the chained/migrated row situation should not be addressed, but remember that the storage volume on Exadata systems is oftentimes far larger than on conventional databases using commodity hardware, and that fact can make any process intended to reduce or eliminate chained rows take longer. If the performance hit from the chained rows is large enough, any effort to reduce that I/O workload is worth the time and trouble it may take to implement.

Chained and migrated rows can be a bother, and can introduce performance issues when dealing with large volumes of them. Sometimes it’s not possible to eliminate them because the row length exceeds even the largest available database block size. But there are ways to mitigate their creation for tables where the row length easily fits within the database block, either by altering how the block space is allocated using PCTUSED and PCTFREE, moving the table or tables to a tablespace having a larger block size or by moving the table, in place, to relocate the rows and reduce, or eliminate, the existing chaining. This isn’t a quick operation, and it certainly should not be done online especially in production databases, but the problem, in many cases, can be successfully addressed.

It’s a situation worth pondering.

May 19, 2014

When Life Gives You Lemons…

Filed under: General — dfitzjarrell @ 12:35

Some of the strangest ideas (to me, anyway) can be found in the depths of PL/SQL code, be it a package, a procedure, a function or an anonymous block. Granted, such decisions may be the result of budgetary considerations which prevented a more robust and reliable application design. I was looking at some code just recently that made me scratch my head in wonder and disbelief. Let’s look at a similar piece of PL/SQL that does basically the same thing and see why it’s inefficient and can cause performance problems.

Our example begins with a seemingly innocuous piece of code:


...
      for u_cur in (select rowid rid, user_id from master_user_list
                     where usr_id_status = 'Available') loop
	--
        -- Check if someone has this record locked
	--

	-- This can fail to identify 'locked' records since it relies
	-- on successful completion of a transaction
	--
	-- And some other session could be actively updating this record as
	-- your session is trying to use it
	--

        begin
          select rowid, user_id
            into v_row_id, p_user.user_id
            from master_user_list
           where usr_id_status = 'Available'
             and rowid = u_cur.rid
             for update of usr_id_status, usr_id_status_cd skip locked;
        exception

	--
	-- This should throw an error immediately, rather than
	-- waiting for more procedural/transactional code
	-- to execute
	--

          when no_data_found then null;
        end;

        -- If not then exit the loop 
        exit when p_user.user_id is not null;
      end loop;
      --
      -- Check to verify a user_id was found
      --

      -- Interesting, since we just found an available
      -- user_id in the previous block of code ...
      --

      if p_user.user_id is not null then
	--
        -- Update the master table to show this user_id is now taken
	--

	-- Since two sessions can find the same 'available' user_id and use it
	-- this isn't efficient nor is it reliable
	--

        begin
          update master_user_list set usr_id_status = 'Used', usr_id_status_cd = 'U', dt_id_iss = v_date where rowid = v_row_id;
        exception
          when others then
            rollback;
            p_mess_txt := sqlerrm;
            return(-93);
        end;

      else
        rollback;
        p_mess_txt := 'No available user_id.';
        return(-91);
      end if;
    else
      -- The user_id was specified, make sure it is reserved and available. Return error if it isn't.
      --

      -- Obviously the 'select ... for update of ... skip locked;' statement prevents two sessions from
      -- using the same user_id found in the first part of the code
      --   

      begin
        select rowid
          into v_row_id
          from master_user_list
         where usr_id_status = 'Reserved'
           and user_id = p_user.user_id
           for update of usr_id_status, usr_id_status_cd skip locked;
      exception

      --
      -- And this code should throw an error immediately, rather than waiting until later in the code
      -- to do so
      --

        when no_data_found then null;
      end;
      --
      -- Check to verify a user_id was found
      --

      -- If the exception handler actually threw an error rather than obscuring it this next section of code 
      -- probably wouldn't be needed as a separate block
      --
      -- It is surprising such code works since the v_got_one variable is declared yet never set before it's
      -- used to see if the user_id has been 'found'
      --

      if v_got_one is not null then

        -- Update the master table to show this user_id is now taken
        begin
          update master_user_list set usr_id_status = 'Used', usr_id_status_cd = 'U', dt_id_iss = v_date where rowid = v_row_id;
        exception
          when others then
            rollback;
            p_mess_txt := sqlerrm;
            return(-3);
        end;
      else
        rollback;
        p_mess_txt := 'The user id '||p_user.user_id||' is not available';
        return(-92);
      end if;
    end if;
...

This seems, to me, to be a lot of work to find a usable USER_ID, and it doesn’t even access the actual APPUSER table to check if the user_id is in use. Rewriting the initial select statement that ‘finds’ available user_id values could help by eliminating the redundant steps to return such user_ids. The real issue is that after all of that work to reveal the next “available” user_id the insert into the APPUSER table can fail:


    -- Create new user record
    begin
      insert into appuser(user_id,
                          alternate_id,
                          first_name,
                          last_name,
                          password_hash,
                          date_of_birth,
                          gender,
                          ethnicity,
                          created_by,
                          create_date,
                          upper_user_id
                         )
                  values (p_user.user_id,
                          p_user.alternate_id,
                          p_user.first_name,
                          p_user.last_name,
                          p_user.pwd,
                          p_user.birthdate,
                          p_user.gender,
                          p_user.ethnicity,
                          v_user,
                          v_date,
                          upper(p_user.user_id)
                         );
    exception
      when dup_val_on_index then
        rollback;
        p_mess_txt := 'Duplicate value in the APPUSER table';
        return (-111);

      when others then
        rollback;
        p_mess_txt := sqlerrm;
        return (-112);
    end;

This is a lot of code to execute just to end up with no new user created because the USER_ID it ‘found’ was also found and taken by another session while this session was working to take it. It truly is the ‘long way around’ to try to ensure data integrity; after all of that work Oracle can still prevent you from inserting a new user record because the primary key constraint simply won’t let you.

This illustrates the difficulty of writing transactional code that can scale and allow multiple concurrent sessions to perform work. The logic shows thought; the fact that assigning a new user_id relies on a table of ‘acceptable’ user_ids for the application makes the process difficult to implement without going through all of the motions shown above. Yet, even after all of those checks, the code can still fail to assign a user_id because some other session got to it first.

Can this be improved? Possibly, but it still relies on the results of a select statement that can return values another session has already marked as used but not yet committed, even though it references both the source and destination tables to find available values:


...
      for u_cur in (select mul.rowid rid, mul.user_id from master_user_list mul
                    left outer join appuser au on au.user_id = mul.user_id
                     where usr_id_status = 'Available' and au.user_id is null) loop
...

This is the beauty and the bane of Oracle’s read consistency mechanism, which relies on undo to ‘reconstruct’ the data image as of the time the query began. Uncommitted inserts and updates can ‘sneak in’, allowing two or more sessions to ‘take aim’ at the same value until one session commits, at which time subsequent inserts fail. Updates are no safer, since sessions can queue up to update the same record in a table. When that happens the last update to that record is the one the world sees, and changes made by prior transactions are ‘lost’ as values updated by other sessions are overwritten. True, you can use DBMS_LOGMNR to resurrect those transactions in older releases of Oracle, and flashback technology in 10.2 and later releases allows you to see data as it was, within the scope of undo_retention. It isn’t an easy problem to solve.
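
One way to narrow (though not close) that window is to put the FOR UPDATE ... SKIP LOCKED clause on the driving cursor itself, so a fetched row is already locked by the session that fetched it. The block below is only a sketch using the same hypothetical table and columns as the code above; it is not the original application's logic, and the primary key on APPUSER remains the final arbiter, so the subsequent insert should still handle DUP_VAL_ON_INDEX:

declare
  cursor c_avail is
    select rowid rid, user_id
      from master_user_list
     where usr_id_status = 'Available'
       for update of usr_id_status, usr_id_status_cd skip locked;  -- rows locked by other sessions are skipped
  v_rid    rowid;
  v_userid master_user_list.user_id%type;
begin
  open c_avail;
  fetch c_avail into v_rid, v_userid;   -- first available row this session could lock
  if c_avail%found then
    update master_user_list
       set usr_id_status = 'Used',
           usr_id_status_cd = 'U',
           dt_id_iss = sysdate
     where rowid = v_rid;
    -- ... insert into appuser using v_userid, then commit ...
  end if;
  close c_avail;
end;
/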

Sometimes there isn’t a way around ‘rolling your own’ logic to process application data. Of course the fewer times you have to do that the better off you are since you can rely solely on primary key constraints to ensure the data integrity (which is good since the constraints aren’t transactional).

“Now you see me, now you don’t” moments can happen when seemingly simple tasks take on a level of complexity dictated by the application design. Using pre-defined user_id values, as the code shown above does, makes things more complicated than they need to be, in my opinion. Using generated values with a specific template and a sequence might make the process easier and more efficient, but won’t allow for user_ids that have been pre-allocated; there may be a very good reason to have such user_ids, as those values may be assigned reserved, administrative privileges in the application. Some may insist that using a template and a sequence can’t restrict the total number of users who can access an application, but that’s not true, as a sequence can have a maximum value which prevents any further user accounts from being created once that maximum sequence value has been reached:


SQL> create sequence mynewseq
  2  start with 1 increment by 1 nocycle maxvalue 6 nocache;

Sequence created.

SQL>
SQL> select mynewseq.nextval from dual;

   NEXTVAL
----------
         1

SQL> select mynewseq.nextval from dual;

   NEXTVAL
----------
         2

SQL> select mynewseq.nextval from dual;

   NEXTVAL
----------
         3

SQL> select mynewseq.nextval from dual;

   NEXTVAL
----------
         4

SQL> select mynewseq.nextval from dual;

   NEXTVAL
----------
         5

SQL> select mynewseq.nextval from dual;

   NEXTVAL
----------
         6

SQL> select mynewseq.nextval from dual;
select mynewseq.nextval from dual
*
ERROR at line 1:
ORA-08004: sequence MYNEWSEQ.NEXTVAL exceeds MAXVALUE and cannot be instantiated


SQL>

Using an exception handler to trap and deal with the ORA-08004 error would allow the application to gracefully exit when the sequence has been exhausted.
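
A minimal sketch of such a handler, using PRAGMA EXCEPTION_INIT to map ORA-08004 to a named exception (the sequence name and error text are illustrative), might look like this:

declare
  seq_exhausted exception;
  pragma exception_init(seq_exhausted, -8004);
  v_new_id number;
begin
  v_new_id := mynewseq.nextval;   -- in releases before 11.1 use: select mynewseq.nextval into v_new_id from dual;
  -- ... build the templated user_id from v_new_id and create the account ...
exception
  when seq_exhausted then
    -- the sequence has hit MAXVALUE; no more user_ids can be generated
    raise_application_error(-20001, 'No more user ids are available.');
end;
/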

It’s tough finding yourself between a rock and a hard place when writing application code. Hopefully the developer will consider all possible options and choose the least ‘undesirable’, even though it’s not the ideal solution. It’s better to see such limitations before the application design is implemented and such flaws become painfully apparent, but that’s not always possible as developers may join a project so far in progress that a re-design is not possible. Unfortunately cost considerations can also interfere, causing design shortcuts that affect how the application is written. In the end it may be a ‘do what you can with what you have’ scenario and you must accept that the ‘solution’ isn’t ideal. Of course we can all hope that doesn’t happen, but this is the real world and sometimes restrictive budgets win over proper coding and design.

When life gives you lemons, don’t make lemonade, everybody else does that. Be creative, and make pie. Who knows, you just might improve the situation by showing your resourcefulness.

May 13, 2014

Ooops!

Filed under: General — dfitzjarrell @ 11:01

A recent IT-related survey had in its list of questions one rather interesting offering:

“Are there any safeguards preventing the DBA from accidentally dropping a table?”

That is probably not a commonly considered aspect of data security however it is an issue that should be voiced and addressed. How do you ensure that the DBA doesn’t, or can’t, drop a table accidentally? Oracle has at least two ways to ensure that a table cannot be accidentally dropped but there are some limitations to those methods. Let’s look at those methods to see which one works for the DBA account.

For a DBA who’s been in the ‘trenches’ for a while the first (and possibly most obvious) method is a schema-level trigger to catch the act and prevent it:


SQL> connect / as sysdba
Connected.
SQL> create or replace trigger drop_trigger
  2  before drop on bingnorfer.schema
  3  begin
  4      raise_application_error(
  5        num => -20999,
  6        msg => 'Objects cannot be dropped.');
  7  end;
  8  /

Trigger created.

SQL>

The trigger is in place, so let’s test it first as the schema owner, then as the DBA:


SQL> connect bingnorfer/#@#@#@#@#@#@
Connected.
SQL> drop table empdrop;
drop table empdrop
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-20999: Objects cannot be dropped.
ORA-06512: at line 2


SQL>

So far, so good, the table we wanted to drop can’t be dropped by the schema owner because of the schema-level trigger. Can the DBA drop the table?


SQL> connect dfitzj/^%^%^%^%^%^%^%^%^%
Connected.
SQL> drop table bingnorfer.empdrop;

Table dropped.

SQL>

That’s disconcerting: the trigger doesn’t stop a DBA from dropping the table. But, if you read the documentation on such triggers, you’d know they apply only to the schema owner and no one else. Having such a trigger does prevent accidentally dropping an object from a given schema when the owner is the one attempting the drop; it just doesn’t stop the DBA from doing so.

It’s possible to create a modified DBA role, absent the DROP ANY privileges, which does stop the DBA from accidentally dropping anything except what he or she owns. In 11.2.0.3 the DBA role currently has the following list of DROP privileges:


ROLE                           PRIVILEGE                                ADM
------------------------------ ---------------------------------------- ---
DBA                            DROP TABLESPACE                          YES
DBA                            DROP ROLLBACK SEGMENT                    YES
DBA                            DROP ANY INDEX                           YES
DBA                            DROP ANY INDEXTYPE                       YES
DBA                            DROP ANY EDITION                         YES
DBA                            DROP ANY MINING MODEL                    YES
DBA                            DROP ANY CUBE BUILD PROCESS              YES
DBA                            DROP ANY TRIGGER                         YES
DBA                            DROP ANY MATERIALIZED VIEW               YES
DBA                            DROP ANY MEASURE FOLDER                  YES
DBA                            DROP ANY VIEW                            YES
DBA                            DROP ANY LIBRARY                         YES
DBA                            DROP ANY DIMENSION                       YES
DBA                            DROP ANY EVALUATION CONTEXT              YES
DBA                            DROP ANY CUBE                            YES
DBA                            DROP ANY TABLE                           YES
DBA                            DROP PUBLIC DATABASE LINK                YES
DBA                            DROP ANY TYPE                            YES
DBA                            DROP ANY CONTEXT                         YES
DBA                            DROP ANY SYNONYM                         YES
DBA                            DROP PUBLIC SYNONYM                      YES
DBA                            DROP ANY DIRECTORY                       YES
DBA                            DROP ANY RULE SET                        YES
DBA                            DROP ANY CLUSTER                         YES
DBA                            DROP ANY SEQUENCE                        YES
DBA                            DROP ANY ROLE                            YES
DBA                            DROP ANY PROCEDURE                       YES
DBA                            DROP PROFILE                             YES
DBA                            DROP ANY OUTLINE                         YES
DBA                            DROP USER                                YES
DBA                            DROP ANY OPERATOR                        YES
DBA                            DROP ANY RULE                            YES
DBA                            DROP ANY SQL PROFILE                     YES
DBA                            DROP ANY ASSEMBLY                        YES
DBA                            DROP ANY CUBE DIMENSION                  YES

35 rows selected.

SQL>

Eliminating all of those DROP privileges by creating a modified DBA role could severely alter how a DBA would do his/her job, possibly causing him or her to connect using “/ as sysdba” to bypass the restrictions, which wouldn’t be a good thing. Eliminating the DROP ANY privileges might be a better option. Creating the new, restricted DBA role:


SQL> create role dba_mod;

Role created.

SQL> grant CREATE SESSION to dba_mod;

Grant succeeded.

SQL> grant ALTER SESSION to dba_mod;

Grant succeeded.

SQL> grant DROP TABLESPACE to dba_mod;

Grant succeeded.

SQL> grant BECOME USER to dba_mod;

Grant succeeded.

SQL> grant DROP ROLLBACK SEGMENT to dba_mod;

Grant succeeded.

SQL> grant SELECT ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant INSERT ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant UPDATE ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant SELECT ANY SEQUENCE to dba_mod;

Grant succeeded.

SQL> grant CREATE ROLE to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY PROCEDURE to dba_mod;

Grant succeeded.

SQL> grant ALTER PROFILE to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY DIRECTORY to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY LIBRARY to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY LIBRARY to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY INDEXTYPE to dba_mod;

Grant succeeded.

SQL> grant DEQUEUE ANY QUEUE to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY EVALUATION CONTEXT to dba_mod;

Grant succeeded.

SQL> grant EXPORT FULL DATABASE to dba_mod;

Grant succeeded.

SQL> grant CREATE RULE to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY SQL PROFILE to dba_mod;

Grant succeeded.

SQL> grant ADMINISTER ANY SQL TUNING SET to dba_mod;

Grant succeeded.

SQL> grant CHANGE NOTIFICATION to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY MINING MODEL to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY CUBE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant CREATE CUBE to dba_mod;

Grant succeeded.

SQL> grant ALTER ROLLBACK SEGMENT to dba_mod;

Grant succeeded.

SQL> grant DELETE ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant ALTER DATABASE to dba_mod;

Grant succeeded.

SQL> grant FORCE ANY TRANSACTION to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY PROCEDURE to dba_mod;

Grant succeeded.

SQL> grant UNDER ANY TYPE to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY LIBRARY to dba_mod;

Grant succeeded.

SQL> grant CREATE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant DEBUG ANY PROCEDURE to dba_mod;

Grant succeeded.

SQL> grant CREATE RULE SET to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY RULE SET to dba_mod;

Grant succeeded.

SQL> grant ANALYZE ANY DICTIONARY to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY EDITION to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY ASSEMBLY to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY CUBE to dba_mod;

Grant succeeded.

SQL> grant SELECT ANY CUBE to dba_mod;

Grant succeeded.

SQL> grant RESTRICTED SESSION to dba_mod;

Grant succeeded.

SQL> grant CREATE TABLESPACE to dba_mod;

Grant succeeded.

SQL> grant ALTER TABLESPACE to dba_mod;

Grant succeeded.

SQL> grant CREATE USER to dba_mod;

Grant succeeded.

SQL> grant ALTER USER to dba_mod;

Grant succeeded.

SQL> grant LOCK ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant CREATE VIEW to dba_mod;

Grant succeeded.

SQL> grant GRANT ANY ROLE to dba_mod;

Grant succeeded.

SQL> grant CREATE TRIGGER to dba_mod;

Grant succeeded.

SQL> grant CREATE TYPE to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY OPERATOR to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY DIMENSION to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY DIMENSION to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY OUTLINE to dba_mod;

Grant succeeded.

SQL> grant ADMINISTER DATABASE TRIGGER to dba_mod;

Grant succeeded.

SQL> grant RESUMABLE to dba_mod;

Grant succeeded.

SQL> grant FLASHBACK ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY RULE SET to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY RULE SET to dba_mod;

Grant succeeded.

SQL> grant IMPORT FULL DATABASE to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY RULE to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY PROGRAM to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY EDITION to dba_mod;

Grant succeeded.

SQL> grant CREATE ASSEMBLY to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY ASSEMBLY to dba_mod;

Grant succeeded.

SQL> grant CREATE CUBE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY CUBE BUILD PROCESS to dba_mod;

Grant succeeded.

SQL> grant UPDATE ANY CUBE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY INDEX to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY SEQUENCE to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY ROLE to dba_mod;

Grant succeeded.

SQL> grant ANALYZE ANY to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY OPERATOR to dba_mod;

Grant succeeded.

SQL> grant CREATE INDEXTYPE to dba_mod;

Grant succeeded.

SQL> grant UNDER ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant SELECT ANY DICTIONARY to dba_mod;

Grant succeeded.

SQL> grant GRANT ANY OBJECT PRIVILEGE to dba_mod;

Grant succeeded.

SQL> grant CREATE EVALUATION CONTEXT to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY EVALUATION CONTEXT to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY RULE to dba_mod;

Grant succeeded.

SQL> grant CREATE JOB to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY JOB to dba_mod;

Grant succeeded.

SQL> grant CREATE MINING MODEL to dba_mod;

Grant succeeded.

SQL> grant INSERT ANY CUBE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant UPDATE ANY CUBE BUILD PROCESS to dba_mod;

Grant succeeded.

SQL> grant ALTER SYSTEM to dba_mod;

Grant succeeded.

SQL> grant AUDIT SYSTEM to dba_mod;

Grant succeeded.

SQL> grant UNLIMITED TABLESPACE to dba_mod;

Grant succeeded.

SQL> grant CREATE ROLLBACK SEGMENT to dba_mod;

Grant succeeded.

SQL> grant COMMENT ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant CREATE CLUSTER to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY INDEX to dba_mod;

Grant succeeded.

SQL> grant DROP PUBLIC DATABASE LINK to dba_mod;

Grant succeeded.

SQL> grant CREATE PROFILE to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY MATERIALIZED VIEW to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY TYPE to dba_mod;

Grant succeeded.

SQL> grant UNDER ANY VIEW to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY INDEXTYPE to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY OUTLINE to dba_mod;

Grant succeeded.

SQL> grant ADMINISTER RESOURCE MANAGER to dba_mod;

Grant succeeded.

SQL> grant MANAGE SCHEDULER to dba_mod;

Grant succeeded.

SQL> grant MANAGE FILE GROUP to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY MINING MODEL to dba_mod;

Grant succeeded.

SQL> grant SELECT ANY MINING MODEL to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY MEASURE FOLDER to dba_mod;

Grant succeeded.

SQL> grant DELETE ANY MEASURE FOLDER to dba_mod;

Grant succeeded.

SQL> grant CREATE TABLE to dba_mod;

Grant succeeded.

SQL> grant BACKUP ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY CLUSTER to dba_mod;

Grant succeeded.

SQL> grant DROP PUBLIC SYNONYM to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY VIEW to dba_mod;

Grant succeeded.

SQL> grant CREATE SEQUENCE to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY SEQUENCE to dba_mod;

Grant succeeded.

SQL> grant FORCE TRANSACTION to dba_mod;

Grant succeeded.

SQL> grant CREATE PROCEDURE to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY PROCEDURE to dba_mod;

Grant succeeded.

SQL> grant ALTER RESOURCE COST to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY TYPE to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY OPERATOR to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY INDEXTYPE to dba_mod;

Grant succeeded.

SQL> grant ENQUEUE ANY QUEUE to dba_mod;

Grant succeeded.

SQL> grant ON COMMIT REFRESH to dba_mod;

Grant succeeded.

SQL> grant DEBUG CONNECT SESSION to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY CLASS to dba_mod;

Grant succeeded.

SQL> grant MANAGE ANY FILE GROUP to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY ASSEMBLY to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ASSEMBLY to dba_mod;

Grant succeeded.

SQL> grant COMMENT ANY MINING MODEL to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY CUBE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant DELETE ANY CUBE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant SELECT ANY CUBE DIMENSION to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY TABLE to dba_mod;

Grant succeeded.

SQL> grant CREATE SYNONYM to dba_mod;

Grant succeeded.

SQL> grant CREATE PUBLIC SYNONYM to dba_mod;

Grant succeeded.

SQL> grant AUDIT ANY to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY TRIGGER to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY TRIGGER to dba_mod;

Grant succeeded.

SQL> grant DROP PROFILE to dba_mod;

Grant succeeded.

SQL> grant GRANT ANY PRIVILEGE to dba_mod;

Grant succeeded.

SQL> grant CREATE LIBRARY to dba_mod;

Grant succeeded.

SQL> grant CREATE OPERATOR to dba_mod;

Grant succeeded.

SQL> grant MERGE ANY VIEW to dba_mod;

Grant succeeded.

SQL> grant ADMINISTER SQL TUNING SET to dba_mod;

Grant succeeded.

SQL> grant UPDATE ANY CUBE to dba_mod;

Grant succeeded.

SQL> grant INSERT ANY MEASURE FOLDER to dba_mod;

Grant succeeded.

SQL> grant ADMINISTER SQL MANAGEMENT OBJECT to dba_mod;

Grant succeeded.

SQL> grant MANAGE TABLESPACE to dba_mod;

Grant succeeded.

SQL> grant DROP USER to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY CLUSTER to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY SYNONYM to dba_mod;

Grant succeeded.

SQL> grant CREATE DATABASE LINK to dba_mod;

Grant succeeded.

SQL> grant CREATE PUBLIC DATABASE LINK to dba_mod;

Grant succeeded.

SQL> grant CREATE MATERIALIZED VIEW to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY MATERIALIZED VIEW to dba_mod;

Grant succeeded.

SQL> grant EXECUTE ANY TYPE to dba_mod;

Grant succeeded.

SQL> grant QUERY REWRITE to dba_mod;

Grant succeeded.

SQL> grant GLOBAL QUERY REWRITE to dba_mod;

Grant succeeded.

SQL> grant MANAGE ANY QUEUE to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY CONTEXT to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY EVALUATION CONTEXT to dba_mod;

Grant succeeded.

SQL> grant ALTER ANY RULE to dba_mod;

Grant succeeded.

SQL> grant ADVISOR to dba_mod;

Grant succeeded.

SQL> grant SELECT ANY TRANSACTION to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY SQL PROFILE to dba_mod;

Grant succeeded.

SQL> grant READ ANY FILE GROUP to dba_mod;

Grant succeeded.

SQL> grant CREATE EXTERNAL JOB to dba_mod;

Grant succeeded.

SQL> grant CREATE ANY CUBE to dba_mod;

Grant succeeded.

SQL> grant CREATE MEASURE FOLDER to dba_mod;

Grant succeeded.

SQL> grant CREATE CUBE BUILD PROCESS to dba_mod;

Grant succeeded.

SQL> grant FLASHBACK ARCHIVE ADMINISTER to dba_mod;

Grant succeeded.

SQL>
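
Rather than typing each grant by hand, the list can be generated from the privileges already granted to the DBA role; a minimal sketch, assuming access to DBA_SYS_PRIVS (spool the output to a file and run it):

select 'grant '||privilege||' to dba_mod;'
  from dba_sys_privs
 where grantee = 'DBA'
   and privilege not like 'DROP ANY%';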

Modifying our DBA's privileges:


SQL> revoke dba from dfitzj;

Revoke succeeded.

SQL> grant dba_mod to dfitzj;

Grant succeeded.

SQL>

It’s now time to again test if our DBA can drop the bingnorfer.empdrop table:


SQL> connect dfitzj/^%^%^%^%^%^%^%^%^%^%^
Connected.
SQL> drop table bingnorfer.empdrop;
drop table bingnorfer.empdrop
                      *
ERROR at line 1:
ORA-01031: insufficient privileges


SQL>

It appears we have succeeded in preventing the table from being dropped; removing the DROP ANY privileges from the DBA_MOD role keeps objects that aren’t owned by the specified DBA account from being dropped from the database, including triggers, indexes, packages, procedures, etc. It is a limited victory, however, as a DBA with access to the ‘oracle’ user account on Linux/UNIX or an account granted the ORA_DBA role in a Windows environment can bypass these restrictions:


SQL> connect / as sysdba
Connected.
SQL> drop table bingnorfer.empdrop;

Table dropped.

SQL>

It seems the situation becomes one of ensuring that an accidentally dropped table can be recovered/restored. If the recyclebin hasn’t been disabled for ‘performance’ reasons (an ambiguous ‘reason’ since the recyclebin doesn’t usually produce any performance issues) it’s a simple task to flashback the missing table to before it was dropped:


SQL> flashback table bingnorfer.empdrop to before drop;

Flashback complete.

SQL>

The table has been recovered/restored to its state just before the drop. A DBA exercise afterwards is to rename any indexes present on the table as they still have the recyclebin names:


SQL> select index_name
  2  from user_indexes
  3  where table_name = 'EMPDROP'
  4  /

INDEX_NAME
------------------------------
EMPDROP_IDX2
EMPDROP_IDX1

SQL> connect / as sysdba
Connected.
SQL> drop table bingnorfer.empdrop;

Table dropped.

SQL> flashback table bingnorfer.empdrop to before drop;

Flashback complete.

SQL> connect bingnorfer/#@#@#@#@#@#@#@#@#@#@
Connected.
SQL> select * from user_ind_columns
  2  where table_name = 'EMPDROP'
  3  /

INDEX_NAME                     TABLE_NAME
------------------------------ ------------------------------
COLUMN_NAME
--------------------------------------------------------------------------------
COLUMN_POSITION COLUMN_LENGTH CHAR_LENGTH DESC
--------------- ------------- ----------- ----
BIN$843BovlIRAGGlUqlWe5q/Q==$0 EMPDROP
EMPNO
              1            22           0 ASC

BIN$B62aeRx+RlerON0vDt1+dA==$0 EMPDROP
DEPTNO
              1            22           0 ASC


SQL>

We know which recyclebin name goes with the original index name so the indexes can be renamed:


SQL> alter index "BIN$843BovlIRAGGlUqlWe5q/Q==$0" rename to empdrop_idx1;

Index altered.

SQL> alter index "BIN$B62aeRx+RlerON0vDt1+dA==$0" rename to empdrop_idx2;

Index altered.

SQL> select * from user_ind_columns
  2  where table_name = 'EMPDROP'
  3  /

INDEX_NAME                     TABLE_NAME
------------------------------ ------------------------------
COLUMN_NAME
--------------------------------------------------------------------------------
COLUMN_POSITION COLUMN_LENGTH CHAR_LENGTH DESC
--------------- ------------- ----------- ----
EMPDROP_IDX1                   EMPDROP
EMPNO
              1            22           0 ASC

EMPDROP_IDX2                   EMPDROP
DEPTNO
              1            22           0 ASC


SQL>

This table was small (less than 20 rows) so the flashback and rename were executed fairly quickly. What if the table has more data:


SQL> select count(*) from empdrop;

  COUNT(*)
----------
   9437184

SQL> SQL> select * from user_ind_columns
  2  where table_name = 'EMPDROP'
  3  /

INDEX_NAME                     TABLE_NAME
------------------------------ ------------------------------
COLUMN_NAME
--------------------------------------------------------------------------------
COLUMN_POSITION COLUMN_LENGTH CHAR_LENGTH DESC
--------------- ------------- ----------- ----
BIN$OfyFePmsTPyt5qYQzxXlpw==$0 EMPDROP
EMPNO
              1            22           0 ASC

BIN$P768D2zRQ2mtGkwBkr42SA==$0 EMPDROP
DEPTNO
              1            22           0 ASC


SQL> set timing on
SQL> alter index "BIN$OfyFePmsTPyt5qYQzxXlpw==$0" rename to empdrop_idx1;

Index altered.

Elapsed: 00:00:00.00
SQL> alter index "BIN$P768D2zRQ2mtGkwBkr42SA==$0" rename to empdrop_idx2;

Index altered.

Elapsed: 00:00:00.00
SQL> 

The rename really took no time at all; such an act does lock the table so if the table is being used (the table was restored using flashback table and sessions immediately started executing DML against it) you may need to wait for a ‘quiet time’ in DEV/TEST/QA to run the rename. Since an outage would be required for PROD you should have time to recover the table and rename the indexes before anyone starts using it.

What if the table were dropped with the PURGE option? At that point flashback table wouldn’t work as there would be no recyclebin object to restore. You would then need an RMAN backup of the table to recover it to its last known state. Yes, an export of the table would restore the structure and indexes but wouldn’t restore any of the data inserted or updated since the export was taken so that isn’t really a good option to rely upon. In Oracle 12c you can use RMAN to recover a single table; RMAN goes through the process of creating an auxiliary database, restores the tablespaces necessary to recover the table or tables and associated indexes, creates a data pump export then imports the desired tables into the target database. The following RMAN command accomplishes this, again in Oracle 12c:


recover table bingnorfer.empdrop
until time 'SYSDATE'
auxiliary destination '/tmp/oracle12c/recover'
datapump destination '/tmp/recover12c/dumpfiles'
dump file 'empdrop_exp_dump.dmp';

In 11gR2 and earlier releases it is necessary to create an empty auxiliary database containing the SYSTEM, SYSAUX and UNDO tablespaces (just the SYSTEM and rollback tablespaces for 9i and earlier releases) as well as the tablespace where the dropped table resides; if there are indexes on the table the associated index tablespace must also be created. Perform an incomplete recovery of the auxiliary database, using a restored backup controlfile, to a point in time just before the drop, then export the table from the auxiliary database and import it into the target database. The process is covered in the online Oracle documentation found at http://tahiti.oracle.com.
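
A compressed sketch of those steps follows; it is only an outline, and the point-in-time, directory object, dump file name and connection details are illustrative rather than taken from the example above:


# on the auxiliary instance (built from backups of SYSTEM, SYSAUX, UNDO and
# the application/index tablespaces), recover to just before the drop
run {
  set until time "to_date('2014-03-01 09:00','yyyy-mm-dd hh24:mi')";
  restore database;
  recover database;
}
alter database open resetlogs;

# export the table from the auxiliary database, then import it into the
# target database (exp/imp rather than Data Pump for 9i and earlier)
expdp system tables=bingnorfer.empdrop directory=recover_dir dumpfile=empdrop.dmp
impdp system tables=bingnorfer.empdrop directory=recover_dir dumpfile=empdrop.dmp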

It isn’t an easy task to prevent a DBA from accidentally dropping a table, especially when he or she is connected using “/ as sysdba”, a connection that should really be reserved for creating, dropping, starting up or shutting down a database. Restrictions can be implemented through a modified DBA role, but DBAs with database server access as ‘oracle’ can easily and quickly bypass them. Managing database server access as ‘oracle’ is a topic for another article; for now it is enough to be aware that such connections can be made and that tables can be carelessly or inadvertently dropped. It is also necessary to know how to restore dropped tables using both flashback table and the more involved RMAN procedures.
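
The modified-role approach isn’t shown here, but as one illustrative safeguard (a sketch, not something taken from this article’s test system) a database-level DDL trigger can refuse DROP TABLE statements from ordinary accounts; note that database event triggers do not fire for SYS, so a careless “/ as sysdba” connection is still not protected:


-- refuse DROP TABLE for non-system schemas; the trigger owner needs
-- the ADMINISTER DATABASE TRIGGER privilege
create or replace trigger block_accidental_drops
before drop on database
begin
        if ora_dict_obj_type = 'TABLE'
           and ora_dict_obj_owner not in ('SYS','SYSTEM') then
                raise_application_error(-20001,
                        'DROP TABLE is disabled in this database');
        end if;
end;
/

The trigger would, of course, have to be disabled or dropped whenever a legitimate table drop is required.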

“Ooops, I dropped an important table.” That shouldn’t be a phrase you hear but in the rush to get things done sometimes the unthinkable happens. If a table does get dropped from PROD you have ways to restore and recover that table to put the database, and the associated application, back in working order. Hopefully you won’t need them.

April 1, 2014

Let’s All Join In

Filed under: General — dfitzjarrell @ 10:33

While investigating a question on returning unmatched rows between two tables I set up an example where both tables had indexes which could be used to speed up the query and hopefully return the result set in less time than required for a full table scan. To my surprise when the index was used on the table with the missing record the query returned no rows. Not understanding this behavior (as I KNEW there was at least one row in the data range that wasn’t in the smaller table) I checked the execution plans for queries that worked (meaning they returned the correct ‘missing’ row) and the ones that didn’t. We’ll go through my investigation and reveal the reason for this unexpected result.

First we’ll provide the problem we need to solve, namely how to return unmatched rows from two tables. Setting up the example is simple and straightforward — create two tables and load them with data, ensuring, in this example, that table B is missing rows that are present in table A:


SQL>
SQL> create table a (
  2  	     id number,
  3  	     dt      date
  4  );

Table created.

SQL>
SQL> create table b (
  2  	     id number,
  3  	     dt      date
  4  );

Table created.

SQL>
SQL> begin
  2  	     for i in 1..1000000 loop
  3  		     insert into a
  4  		     values(i, trunc(sysdate)+i);
  5  		     if mod(i,9317) <> 0 then
  6  			     insert into b
  7  			     values(i, trunc(sysdate)+i);
  8  		     end if;
  9  	     end loop;
 10
 11  	     commit;
 12
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>

Compute statistics on the schema to prevent dynamic sampling:


SQL>
SQL> exec dbms_stats.gather_schema_stats('GRIBNAUT')

PL/SQL procedure successfully completed.

SQL>

Create the indexes we want in order to make the select statements more efficient:


SQL>
SQL> create index a_id_idx on a(dt);

Index created.

SQL> create index b_id_idx on b(dt);

Index created.

SQL>

Now the fun begins. Queries are executed that use both indexes to speed up processing and, as a result, return the ‘wrong’ result set:


SQL>
SQL> set autotrace on timing on
SQL>
SQL> select a.id
  2  from a left join b
  3  on b.id = a.id
  4  where a.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  5  and b.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  6  and b.id is null;

no rows selected

Elapsed: 00:00:02.22

Execution Plan
----------------------------------------------------------
Plan hash value: 991751610

-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |    26 |    97   (0)| 00:00:01 |
|*  1 |  FILTER                        |          |       |       |            |          |
|   2 |   NESTED LOOPS                 |          |       |       |            |          |
|   3 |    NESTED LOOPS                |          |     1 |    26 |    97   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| B        |     1 |    13 |    49   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | B_ID_IDX |  9001 |       |    26   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN           | A_ID_IDX |  9002 |       |    25   (0)| 00:00:01 |
|*  7 |    TABLE ACCESS BY INDEX ROWID | A        |     1 |    13 |    48   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   4 - filter("B"."ID" IS NULL)
   5 - access("B"."DT">=TRUNC(SYSDATE@!)+1000 AND "B"."DT"<=TRUNC(SYSDATE@!)+10000)
   6 - access("A"."DT">=TRUNC(SYSDATE@!)+1000 AND "A"."DT"<=TRUNC(SYSDATE@!)+10000)
   7 - filter("B"."ID"="A"."ID")

Note
-----
   - SQL plan baseline "SQL_PLAN_827rna5a37fsc89d874e3" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         49  consistent gets
         28  physical reads
          0  redo size
        330  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL>
SQL> select /*+ leading(b a) */ a.id
  2  from a left join b
  3  on b.id = a.id
  4  where a.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  5  and b.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  6  and b.id is null;

no rows selected

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 991751610

-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |    26 |    97   (0)| 00:00:01 |
|*  1 |  FILTER                        |          |       |       |            |          |
|   2 |   NESTED LOOPS                 |          |       |       |            |          |
|   3 |    NESTED LOOPS                |          |     1 |    26 |    97   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| B        |     1 |    13 |    49   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | B_ID_IDX |  9001 |       |    26   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN           | A_ID_IDX |  9002 |       |    25   (0)| 00:00:01 |
|*  7 |    TABLE ACCESS BY INDEX ROWID | A        |     1 |    13 |    48   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   4 - filter("B"."ID" IS NULL)
   5 - access("B"."DT">=TRUNC(SYSDATE@!)+1000 AND "B"."DT"<=TRUNC(SYSDATE@!)+10000)
   6 - access("A"."DT">=TRUNC(SYSDATE@!)+1000 AND "A"."DT"<=TRUNC(SYSDATE@!)+10000)
   7 - filter("B"."ID"="A"."ID")

Note
-----
   - SQL plan baseline "SQL_PLAN_6p83dmsh1ur9589d874e3" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         49  consistent gets
          0  physical reads
          0  redo size
        330  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL>
SQL> select /*+ leading(b a) use_nl(b a) */ a.id
  2  from a left join b
  3  on b.id = a.id
  4  where a.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  5  and b.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  6  and b.id is null;

no rows selected

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 991751610

-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |    26 |    97   (0)| 00:00:01 |
|*  1 |  FILTER                        |          |       |       |            |          |
|   2 |   NESTED LOOPS                 |          |       |       |            |          |
|   3 |    NESTED LOOPS                |          |     1 |    26 |    97   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| B        |     1 |    13 |    49   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | B_ID_IDX |  9001 |       |    26   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN           | A_ID_IDX |  9002 |       |    25   (0)| 00:00:01 |
|*  7 |    TABLE ACCESS BY INDEX ROWID | A        |     1 |    13 |    48   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   4 - filter("B"."ID" IS NULL)
   5 - access("B"."DT">=TRUNC(SYSDATE@!)+1000 AND "B"."DT"<=TRUNC(SYSDATE@!)+10000)
   6 - access("A"."DT">=TRUNC(SYSDATE@!)+1000 AND "A"."DT"<=TRUNC(SYSDATE@!)+10000)
   7 - filter("B"."ID"="A"."ID")

Note
-----
   - SQL plan baseline "SQL_PLAN_7up4d2184h90h89d874e3" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         49  consistent gets
          0  physical reads
          0  redo size
        330  bytes sent via SQL*Net to client
        509  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

SQL>

Notice the order of the tables in the join; this is due to the Optimizer converting the outer join to an inner join:


  Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$64C6BB79")
      OUTER_JOIN_TO_INNER(@"SEL$9E43CB6E" "B"@"SEL$1")
      OUTLINE(@"SEL$9E43CB6E")
      MERGE(@"SEL$58A6D7F6")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$58A6D7F6")
      MERGE(@"SEL$1")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      INDEX_RS_ASC(@"SEL$64C6BB79" "B"@"SEL$1" ("B"."DT"))
      INDEX(@"SEL$64C6BB79" "A"@"SEL$2" ("A"."DT"))
      LEADING(@"SEL$64C6BB79" "B"@"SEL$1" "A"@"SEL$2")
      USE_NL(@"SEL$64C6BB79" "A"@"SEL$2")
      NLJ_BATCHING(@"SEL$64C6BB79" "A"@"SEL$2")
    END_OUTLINE_DATA
  */

These examples generated several comments, which caused me to examine what I did and what I may have done incorrectly. Re-writing the first ‘non-working’ query by moving the date restriction on table B into the join condition caused Oracle to produce the correct result set:


SQL> select a.id
  2  from a left join b
  3  on b.id = a.id
  4  and b.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  5  where a.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  6  and b.id is null;

        ID
----------
      9317

Elapsed: 00:00:00.70

Execution Plan
----------------------------------------------------------
Plan hash value: 4150175149

-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |    90 |  2340 |    99   (2)| 00:00:01 |
|*  1 |  FILTER                        |          |       |       |            |          |
|*  2 |   FILTER                       |          |       |       |            |          |
|*  3 |    HASH JOIN OUTER             |          |    90 |  2340 |    99   (2)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| A        |  9002 |   114K|    49   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | A_ID_IDX |  9002 |       |    26   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| B        |  9001 |   114K|    49   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | B_ID_IDX |  9001 |       |    26   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   2 - filter("B"."ID" IS NULL)
   3 - access("B"."ID"(+)="A"."ID")
   5 - access("A"."DT">=TRUNC(SYSDATE@!)+1000 AND "A"."DT"<=TRUNC(SYSDATE@!)+10000)
   7 - access("B"."DT"(+)>=TRUNC(SYSDATE@!)+1000 AND "B"."DT"(+)<=TRUNC(SYSDATE@!)+10000)

Obviously, after seeing this example, putting the date restriction for table B in the WHERE clause was incorrect thinking on my part: a WHERE predicate on a column from the outer-joined table (other than the IS NULL test) discards the null-extended rows, which is what allows the optimizer to convert the outer join to an inner join.

Now look at a series of ‘working’ queries:


SQL>
SQL> select a.id
  2  from a, b
  3  where a.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  4  and b.id (+) = a.id
  5  and b.id is null;

        ID
----------
      9317

Elapsed: 00:00:00.14

Execution Plan
----------------------------------------------------------
Plan hash value: 764351325

------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |    90 |  1620 |   740   (2)| 00:00:01 |
|*  1 |  FILTER                       |          |       |       |            |          |
|*  2 |   HASH JOIN ANTI              |          |    90 |  1620 |   740   (2)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| A        |  9002 |   114K|    49   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | A_ID_IDX |  9002 |       |    26   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL          | B        |   999K|  4882K|   687   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   2 - access("B"."ID"="A"."ID")
   4 - access("A"."DT">=TRUNC(SYSDATE@!)+1000 AND
              "A"."DT"<=TRUNC(SYSDATE@!)+10000)

Note
-----
   - SQL plan baseline "SQL_PLAN_a6prrdg4ukbttd488787e" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          1  db block gets
       2606  consistent gets
          0  physical reads
          0  redo size
        521  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select id
  2  from
  3  (select a.id, a.dt
  4   from a left join b
  5   on b.id = a.id
  6   where b.id is null)
  7  where dt between trunc(sysdate)+1000 and trunc(sysdate)+10000;

        ID
----------
      9317

Elapsed: 00:00:00.14

Execution Plan
----------------------------------------------------------
Plan hash value: 764351325

------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |    90 |  1620 |   740   (2)| 00:00:01 |
|*  1 |  FILTER                       |          |       |       |            |          |
|*  2 |   HASH JOIN ANTI              |          |    90 |  1620 |   740   (2)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| A        |  9002 |   114K|    49   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | A_ID_IDX |  9002 |       |    26   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL          | B        |   999K|  4882K|   687   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   2 - access("B"."ID"="A"."ID")
   4 - access("A"."DT">=TRUNC(SYSDATE@!)+1000 AND
              "A"."DT"<=TRUNC(SYSDATE@!)+10000)

Note
-----
   - SQL plan baseline "SQL_PLAN_42g6vdgv7hfm3d488787e" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          1  db block gets
       2606  consistent gets
          0  physical reads
          0  redo size
        521  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select id
  2  from a
  3  where id not in (select id from b where dt between trunc(sysdate)+1000 and trunc(sysdate)+10000)
  4  and a.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000;

        ID
----------
      9317

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 2827420310

------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |    90 |  2340 |    99   (2)| 00:00:01 |
|*  1 |  FILTER                       |          |       |       |            |          |
|*  2 |   HASH JOIN ANTI NA           |          |    90 |  2340 |    99   (2)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| A        |  9002 |   114K|    49   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | A_ID_IDX |  9002 |       |    26   (0)| 00:00:01 |
|   5 |    TABLE ACCESS BY INDEX ROWID| B        |  9001 |   114K|    49   (0)| 00:00:01 |
|*  6 |     INDEX RANGE SCAN          | B_ID_IDX |  9001 |       |    26   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   2 - access("ID"="ID")
   4 - access("A"."DT">=TRUNC(SYSDATE@!)+1000 AND
              "A"."DT"<=TRUNC(SYSDATE@!)+10000)
   6 - access("DT">=TRUNC(SYSDATE@!)+1000 AND "DT"<=TRUNC(SYSDATE@!)+10000)

Note
-----
   - SQL plan baseline "SQL_PLAN_792mbp1nk9zf69c0ae570" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         98  consistent gets
          0  physical reads
          0  redo size
        521  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Another comment stated that the first ‘working’ query was not the same as the first ‘non-working’ query; they are not supposed to be. My goal was to determine why one set of conditions using the index on table B produced no rows while another set of conditions, generally lacking that restriction, produced the correct results. As stated earlier in this article my thought process was a bit flawed, as my presumption that the date restriction on table B belonged in the WHERE clause was not correct.

In the above examples the outer join is preserved, resulting in the ‘correct’ answer. All of this because of an index. Notice, too, that the only table using an index is table A. To be fair and honest it is possible to use an index on table B and get the proper result, as the examples below illustrate:


SQL> select a_id
  2  from
  3  (select id a_id
  4  from a
  5  where dt between trunc(sysdate)+1000 and trunc(sysdate)+10000) a,
  6  (select id b_id
  7  from b
  8  where dt between trunc(sysdate)+1000 and trunc(sysdate)+10000) b
  9  where b_id (+) = a_id
 10  and b_id is null;

      A_ID
----------
      9317

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 4150175149

-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |    90 |  2340 |    99   (2)| 00:00:01 |
|*  1 |  FILTER                        |          |       |       |            |          |
|*  2 |   FILTER                       |          |       |       |            |          |
|*  3 |    HASH JOIN OUTER             |          |    90 |  2340 |    99   (2)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| A        |  9002 |   114K|    49   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | A_ID_IDX |  9002 |       |    26   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| B        |  9001 |   114K|    49   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | B_ID_IDX |  9001 |       |    26   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   2 - filter("B_ID" IS NULL)
   3 - access("ID"(+)="ID")
   5 - access("DT">=TRUNC(SYSDATE@!)+1000 AND "DT"<=TRUNC(SYSDATE@!)+10000)
   7 - access("DT"(+)>=TRUNC(SYSDATE@!)+1000 AND "DT"(+)<=TRUNC(SYSDATE@!)+10000)

Note
-----
   - SQL plan baseline "SQL_PLAN_0awgfy98n1g1rdca7cba2" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         98  consistent gets
          0  physical reads
          0  redo size
        523  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> select id
  2  from a
  3  where dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
  4  minus
  5  select id
  6  from b
  7  where dt between trunc(sysdate)+1000 and trunc(sysdate)+10000;

        ID
----------
      9317

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3815546394

-------------------------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |  9002 |   228K|   101  (52)| 00:00:01 |
|   1 |  MINUS                         |          |       |       |            |          |
|   2 |   SORT UNIQUE                  |          |  9002 |   114K|    50   (2)| 00:00:01 |
|*  3 |    FILTER                      |          |       |       |            |          |
|   4 |     TABLE ACCESS BY INDEX ROWID| A        |  9002 |   114K|    49   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | A_ID_IDX |  9002 |       |    26   (0)| 00:00:01 |
|   6 |   SORT UNIQUE                  |          |  9001 |   114K|    50   (2)| 00:00:01 |
|*  7 |    FILTER                      |          |       |       |            |          |
|   8 |     TABLE ACCESS BY INDEX ROWID| B        |  9001 |   114K|    49   (0)| 00:00:01 |
|*  9 |      INDEX RANGE SCAN          | B_ID_IDX |  9001 |       |    26   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   5 - access("DT">=TRUNC(SYSDATE@!)+1000 AND "DT"<=TRUNC(SYSDATE@!)+10000)
   7 - filter(TRUNC(SYSDATE@!)+1000<=TRUNC(SYSDATE@!)+10000)
   9 - access("DT">=TRUNC(SYSDATE@!)+1000 AND "DT"<=TRUNC(SYSDATE@!)+10000)

Note
-----
   - SQL plan baseline "SQL_PLAN_6h13amvqrnwdx3cdf639d" used for this statement


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         98  consistent gets
          0  physical reads
          0  redo size
        521  bytes sent via SQL*Net to client
        520  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

In the first example both tables are used to create in-line views of the desired data range; those individual result sets are then joined to return the ‘correct’ result. In the second example the two select statements use the MINUS operator to again return the ‘correct’ answer.
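
For completeness, a NOT EXISTS formulation should also preserve the anti-join and return the missing row; this query was not part of the original test run, so no timings or plans are shown for it:


select a.id
from   a
where  a.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000
and    not exists (select null
                   from   b
                   where  b.id = a.id
                   and    b.dt between trunc(sysdate)+1000 and trunc(sysdate)+10000);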

Please refer back to the beginning of this article to see yet another query that uses the index on table B and produces the correct results.

Is this a bug in Oracle 11.2? I suspected that it was; it certainly seemed strange that the outer join was converted to an inner join when an index on the ‘smaller’ table, the one with the missing data, was used. It turns out that a misstep in my logic caused at least one of the empty result sets. Correcting my logic and query produced the correct answer.

With a product as complex as Oracle some bugs are bound to be present; of course, sometimes the ‘bug’ is in the logic behind an apparently misbehaving query, which teaches you to pay careful attention to the results a query delivers. Even though the queries which produced the incorrect results are syntactically correct, you can’t be certain that Oracle won’t do something ‘behind the scenes’ that produces the wrong answer and reveals an error in your logic. My gratitude goes out to those who paid enough attention to comment on this article and point out where I went wrong so I could update it with corrected information.

Now, let’s join the party.

March 7, 2014

Hide And Seek (After A Fashion)

Filed under: General — dfitzjarrell @ 12:12

Indexes can be very useful tools for improving query performance, but it can be difficult to test them in an environment that exactly mirrors production. Although it isn’t the preferred method of testing, Oracle offers the option of invisible indexes in 11g and later releases. What is an invisible index? It’s an index that, by default, can’t be ‘seen’ (or used) by any session, so it can be created and remain out of play until testing is scheduled. This can be confusing since Oracle had never offered such a feature before; looking at an example should clear up any confusion.

Creating an invisible index is fairly straightforward:


SQL> create index emp_empno_idx
  2  on emp(empno)
  3  invisible;

Index created.

SQL>
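
It’s easy to confirm the index really was created invisible; a quick check, not part of the original example, against USER_INDEXES (which includes a VISIBILITY column in 11g and later releases, reporting VISIBLE or INVISIBLE for each index) would look like this:


select index_name, visibility
from   user_indexes
where  table_name = 'EMP';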

We now have an invisible index on the EMPNO column of the ubiquitous EMP table. Will it be used? No, since the required parameter, optimizer_use_invisible_indexes, has not been altered from its default of FALSE. We prove this by executing the following query and examining the plan:


SQL>
SQL> select
  2  	     *
  3  from
  4  	     emp
  5  where
  6  	     empno < 7400;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20


Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    87 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |    87 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMPNO"<7400)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         16  consistent gets
          0  physical reads
          0  redo size
        861  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Notice the full table scan for a single row result set; even though we know the index exists the INVISIBLE keyword prevents Oracle from using it. Let’s fix that with the next attempt:


SQL> alter session set optimizer_use_invisible_indexes=true;

Session altered.

SQL>

SQL> select
  2             *
  3  from
  4             emp
  5  where
  6             empno < 7400;



     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20



Execution Plan
----------------------------------------------------------
Plan hash value: 576302360
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1|    87 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP           |     1|    87 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_EMPNO_IDX |     1|       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"<7400)

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          9  recursive calls
          0  db block gets
         12  consistent gets
          0  physical reads
          0  redo size
        861  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

The current session can now see and use this invisible index even though no one else can, and since no DDL was issued no cursors were invalidated. The parameter optimizer_use_invisible_indexes was set to TRUE at the session level, which allows the optimizer to consider the index. Be aware that if you’ve created more than one invisible index, all of them become visible to the session where this parameter is set to TRUE. This can make testing difficult when several invisible indexes share a common column, especially if it’s the leading column. In the following example the intent was to test EMP_EMPNO_IDX but two similar invisible indexes were created:


SQL> create index emp_empno_idx
  2  on emp(empno,sal)
  3  invisible;

Index created.

SQL>
SQL> create index emp_empdept_idx
  2  on emp(empno,deptno)
  3  invisible;

Index created.

SQL>
SQL> set autotrace on
SQL>
SQL> select
  2             *
  3  from
  4             emp
  5  where
  6             empno < 7400;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20

Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost(%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    87 |     3  (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |    87 |     3  (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("EMPNO"<7400)

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         16  consistent gets
          0  physical reads
          0  redo size
        861  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>
SQL> alter session set optimizer_use_invisible_indexes=true;

Session altered.

SQL>
SQL> select
  2             *
  3  from
  4             emp
  5  where
  6             empno < 7400;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20

Execution Plan
----------------------------------------------------------
Plan hash value: 1632405565
-----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name            |Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |    1 |    87 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP             |    1 |    87 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_EMPDEPT_IDX |    1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"<7400)

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        861  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL>

Notice that the intended index was not selected; the second of the two invisible indexes was favored for the query. It’s best if only one invisible index per table is created to avoid such occurrences. Of course it’s not hard to drop an invisible index so if the first attempt doesn’t function as intended the index can be dropped and a revised invisible index can replace it.
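
If dropping the extra index isn’t convenient, another option (not part of the original run) is to name the intended index in a hint; with optimizer_use_invisible_indexes set to TRUE for the session the optimizer should honor the hint and the test then exercises the index you actually meant to evaluate:


select /*+ index(emp emp_empno_idx) */
       *
from   emp
where  empno < 7400;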

When should you use invisible indexes? Normally in a development or load-test environment when performance issues need to be addressed. An invisible index can be created and then tested using the commands shown previously; the beauty of this approach is that only one session needs to see the invisible index in order to test it. A new index therefore doesn’t affect existing processes, so other sessions on a heavily used test server won’t be disrupted by its presence. Test suite results can be invalidated by creating visible indexes without first verifying that prior testing is complete (I’ve seen indexes created on load-test systems because the person creating them thought the system was ‘free’ and never checked).

So you have an invisible index created and it provides the desired results; how does it become visible to all? The following command establishes its visibility:


SQL> alter index emp_empno_idx visible;

Index altered.

SQL>

A quick query against EMP from another session proves the index can be seen by all:


SQL> select
  2             *
  3  from
  4             emp
  5  where
  6             empno < 7400;

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20

Execution Plan
----------------------------------------------------------
Plan hash value: 576302360
---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |     1|    87 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP           |     1|    87 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_EMPNO_IDX |     1|       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("EMPNO"<7400)

Note
-----
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
        210  recursive calls
          0  db block gets
         47  consistent gets
          0  physical reads
          0  redo size
        861  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

SQL> 

Once you make an index visible the DDL invalidates the current cursors for queries against the table on which the once-invisible index is built. Since the plan can change this should not be a surprising event. The invalidation does force a hard parse on the next execution, so you will likely see the ratio of hard to soft parses increase after invisible indexes are made visible.
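
Should testing show that the index isn’t the help it was expected to be, it can be taken back out of play just as easily, without dropping it (again, a step not shown in the original example):


alter index emp_empno_idx invisible;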

Invisible indexes are a very useful addition to an already world-class database, allowing performance tuning in a relatively isolated environment. Of course such tools must be used with discretion, forethought and care as the example with two similar invisible indexes illustrates. Keeping that in mind I find that invisible indexes can greatly improve the tuning process since the indexes can be tested and refined with a minimal impact on other sessions. Implementation then becomes less of a ‘hit-or-miss’ proposition in the finalized version of a product release or in a firefighting session geared toward immediate performance improvement since various forms of an index can be created and the most beneficial version can be selected. Remember, too, that there are no ‘silver bullets’ when it comes to Oracle performance.
