Oracle Tips and Tricks — David Fitzjarrell

October 22, 2014

Memory Improvement

Filed under: General — dfitzjarrell @ 09:37

Oracle 12c (version 12.1.0.2) offers the option of using in-memory processing to speed things along. Called the In-memory option it’s installed when you install the 12.1.0.2 software. Using it can make a considerable difference in processing speed, provided you have sufficient resources (RAM) available. Let’s revisit an older example, on Bloom filters, and see if Oracle processes things any faster in-memory.

Looking again at the Bloom filter example using 11.2.0.3 on a Windows installation it’s evident that the Bloom filter provides an increased level of performance (only the abbreviated output section is reproduced here):


SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL                                                              
---------- ---------------------------------------- ------------------------- ----------                                                              
   1670281 Fnarm1670281                             PROCUREMENT                     5000                                                              
   1670289 Fnarm1670289                             PROCUREMENT                     6000                                                              
   1670297 Fnarm1670297                             PROCUREMENT                     7000                                                              
   1670305 Fnarm1670305                             PROCUREMENT                     1000                                                              
   1670313 Fnarm1670313                             PROCUREMENT                     2000                                                              
   1670321 Fnarm1670321                             PROCUREMENT                     3000                                                              
   1670329 Fnarm1670329                             PROCUREMENT                     4000                                                              
   1670337 Fnarm1670337                             PROCUREMENT                     5000                                                              
   1670345 Fnarm1670345                             PROCUREMENT                     6000                                                              
   1670353 Fnarm1670353                             PROCUREMENT                     7000                                                              
   1670361 Fnarm1670361                             PROCUREMENT                     1000                                                              
...
   1857369 Fnarm1857369                             PROCUREMENT                     4000                                                              
   1857377 Fnarm1857377                             PROCUREMENT                     5000                                                              
   1857385 Fnarm1857385                             PROCUREMENT                     6000                                                              

250000 rows selected.

Elapsed: 00:00:54.86

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2643012915                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------                    
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |                    
----------------------------------------------------------------------------------------------------------------------------------                    
|   0 | SELECT STATEMENT                     |              |   273K|    26M|  1378   (1)| 00:00:01 |        |      |            |                    
|   1 |  PX COORDINATOR                      |              |       |       |            |          |        |      |            |                    
|   2 |   PX SEND QC (RANDOM)                | :TQ10002     |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |                    
|*  3 |    HASH JOIN                         |              |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   4 |     PX RECEIVE                       |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   5 |      PX SEND BROADCAST               | :TQ10001     |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | P->P | BROADCAST  |                    
|   6 |       NESTED LOOPS                   |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|   7 |        BUFFER SORT                   |              |       |       |            |          |  Q1,01 | PCWC |            |                    
|   8 |         PX RECEIVE                   |              |       |       |            |          |  Q1,01 | PCWP |            |                    
|   9 |          PX SEND BROADCAST           | :TQ10000     |       |       |            |          |        | S->P | BROADCAST  |                    
|  10 |           TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |     1 |    27 |     1   (0)| 00:00:01 |        |      |            |                    
|* 11 |            INDEX UNIQUE SCAN         | DEPT_INFO_PK |     1 |       |     1   (0)| 00:00:01 |        |      |            |                    
|  12 |        PX BLOCK ITERATOR             |              |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWC |            |                    
|* 13 |         TABLE ACCESS FULL            | EMP_DEPT     |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|  14 |     PX BLOCK ITERATOR                |              |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWC |            |                    
|* 15 |      TABLE ACCESS FULL               | EMP          |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
----------------------------------------------------------------------------------------------------------------------------------                    
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   3 - access("ED"."EMPID"="E"."EMPID")                                                                                                               
  11 - access("D"."DEPTNUM"=20)                                                                                                                       
  13 - filter("ED"."EMPDEPT"=20)                                                                                                                      
  15 - filter(SYS_OP_Bloom_FILTER(:BF0000,"E"."EMPID"))                                                                                               
                                                                                                                                                      
Note                                                                                                                                                  
-----                                                                                                                                                 
   - dynamic sampling used for this statement (level=2)                                                                                               


Statistics
----------------------------------------------------------                                                                                            
         33  recursive calls                                                                                                                          
        139  db block gets                                                                                                                            
      36224  consistent gets                                                                                                                          
      17657  physical reads                                                                                                                           
          0  redo size                                                                                                                                
    9526012  bytes sent via SQL*Net to client                                                                                                         
     183846  bytes received via SQL*Net from client                                                                                                   
      16668  SQL*Net roundtrips to/from client                                                                                                        
          5  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
     250000  rows processed                                                                                                                           

SQL>

Now let’s get Oracle’s In-memory option working. By default it’s disabled; the initialization parameter inmemory_size is set to 0. To get In-memory configured and working three steps need to be performed:


1. Set inmemory_size to a non-zero value; in the database used for this example it's been set to 576M.  As this is a parameter that cannot be modified directly
   it will need to be changed using scope=spfile.
2. If desired increase the sga_max_size by the value that inmemory_size has been set to.  This is not a requirement but remember that any memory allocated
   to the in-memory database will be taken from the SGA allocation.
3. Restart the database to effect the changes.

One caveat on this is calculating how much memory will be needed to provide a workable In-memory configuration; it’s similar to configuring the KEEP pool (a task from earlier releases) as you should allocate sufficient memory to contain all of the tables that will be used to prevent delays in loading the data to the IM column store. Now you have the in-memory option configured and ready to use. Alter a table to use inmemory and you will start seeing the possible benefits of this configuration. (By default the following in-memory options are set:


  PRIORITY NONE -- Oracle controls when table data is populated in the IM column store.  Population may be delayed if memory is needed for other processes.
                   Other values are LOW, MEDIUM, HIGH and CRITICAL.  As expected a higher PRIORITY loads before any lower priority.
  MEMCOMPRESS FOR QUERY LOW -- This option provides the best query performance and the least compression.
  DISTRIBUTE AUTO -- Used for RAC this controls how the data is distributed between nodes.  BY ROWID RANGE, BY PARTITION and BY SUBPARTITION are other options.
  NO DUPLICATE -- Another RAC parameter, this controls how many nodes will have duplicated data.  DUPLICATE involves one node for a total of two (so in a two-node
                  RAC configuration DUPLICATE and DUPLICATE ALL provide the same duplication),  DUPLICATE ALL duplicates the table data across all available RAC nodes.

Re-writing the Bloom filter example to take advantage of this new feature allows us to see how it will perform:


SQL>
SQL> --
SQL> -- Create sample tables
SQL> --
SQL> create table emp(
  2  	     empid   number,
  3  	     empnm   varchar2(40),
  4  	     empsal  number,
  5  	     empssn  varchar2(12),
  6  	     constraint emp_pk primary key (empid)
  7  ) inmemory;

Table created.

SQL>
SQL> create table emp_dept(
  2  	     empid   number,
  3  	     empdept number,
  4  	     emploc  varchar2(60),
  5  	     constraint emp_dept_pk primary key(empid)
  6  ) inmemory;

Table created.

SQL>
SQL> create table dept_info(
  2  	     deptnum number,
  3  	     deptnm  varchar2(25),
  4  	     constraint dept_info_pk primary key(deptnum)
  5  ) inmemory;

Table created.

SQL>
SQL> --
SQL> -- Load sample tables with data
SQL> --
SQL> begin
  2  	     for i in 1..2000000 loop
  3  		     insert into emp
  4  		     values(i, 'Fnarm'||i, (mod(i, 7)+1)*1000, mod(i,10)||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||mod(i,10)||mod(i,10));
  5  		     insert into emp_dept
  6  		     values(i, (mod(i,8)+1)*10, 'Zanzwalla'||(mod(i,8)+1)*10);
  7  		     commit;
  8  	     end loop;
  9  	     insert into dept_info
 10  	     select distinct empdept, case when empdept = 10 then 'SALES'
 11  					   when empdept = 20 then 'PROCUREMENT'
 12  					   when empdept = 30 then 'HR'
 13  					   when empdept = 40 then 'RESEARCH'
 14  					   when empdept = 50 then 'DEVELOPMENT'
 15  					   when empdept = 60 then 'EMPLOYEE RELATIONS'
 16  					   when empdept = 70 then 'FACILITIES'
 17  					   when empdept = 80 then 'FINANCE' end
 18  	     from emp_dept;
 19
 20  end;
 21  /

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Gather statistics
SQL> --
SQL> exec dbms_stats.gather_schema_stats('BING')

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Run join query using bloom filter
SQL> --

SQL> set autotrace on
SQL> set timing on
SQL>
SQL> select /*+ bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL
---------- ---------------------------------------- ------------------------- ----------
   1548633 Fnarm1548633                             PROCUREMENT                     3000
   1548641 Fnarm1548641                             PROCUREMENT                     4000
   1548649 Fnarm1548649                             PROCUREMENT                     5000
 ...
    897969 Fnarm897969                              PROCUREMENT                     3000
    897977 Fnarm897977                              PROCUREMENT                     4000
    897985 Fnarm897985                              PROCUREMENT                     5000
    897993 Fnarm897993                              PROCUREMENT                     6000
    898001 Fnarm898001                              PROCUREMENT                     7000

250000 rows selected.

Elapsed: 00:00:25.67

Execution Plan
----------------------------------------------------------
Plan hash value: 2613779428

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |   250K|    10M|       |  3863   (2)| 00:00:01 |
|*  1 |  HASH JOIN                    |           |   250K|    10M|  8304K|  3863   (2)| 00:00:01 |
|   2 |   JOIN FILTER CREATE          | :BF0000   |   250K|    10M|       |  3863   (2)| 00:00:01 |
|*  3 |    HASH JOIN                  |           |   250K|  5371K|       |    94  (23)| 00:00:01 |
|*  4 |     TABLE ACCESS INMEMORY FULL| DEPT_INFO |     1 |    13 |       |     1   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS INMEMORY FULL| EMP_DEPT  |   250K|  2197K|       |    92  (22)| 00:00:01 |
|   6 |   JOIN FILTER USE             | :BF0000   |  2000K|    41M|       |   137  (17)| 00:00:01 |
|*  7 |    TABLE ACCESS INMEMORY FULL | EMP       |  2000K|    41M|       |   137  (17)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ED"."EMPID"="E"."EMPID")
   3 - access("ED"."EMPDEPT"="D"."DEPTNUM")
   4 - inmemory("D"."DEPTNUM"=20)
       filter("D"."DEPTNUM"=20)
   5 - inmemory("ED"."EMPDEPT"=20)
       filter("ED"."EMPDEPT"=20)
   7 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))

Note
-----
   - this is an adaptive plan


Statistics
----------------------------------------------------------
         14  recursive calls
          0  db block gets
         26  consistent gets
          0  physical reads
          0  redo size
    9659722  bytes sent via SQL*Net to client
     183877  bytes received via SQL*Net from client
      16668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     250000  rows processed

SQL>

Notice that the Bloom filter is still used, and in conjunction with the In-memory option the execution time was considerably shorter than the original, 11.2.0.3 Windows server, example. Notice also that since the table was in-memory the memory sorts are 0 (since all processing was done in-memory). If you have the resources available using the In-memory option for Oracle 12.1.0.2 might improve performance significantly.

Bloom filters plus the In-memory database option with Oracle 12.1.0.2 can improve query performance dramatically, and that can make for very happy end users. It’s a bit of a chore to get In-memory configured (the hardest part being the available memory calculations) but it appears to be worth every bit of time spent to get it working.

We could all use a little memory improvement now and then.

October 7, 2014

To Skip, Or Not To Skip

Filed under: General — dfitzjarrell @ 14:02

An interesting issue presented itself just recently with a logical standby database I manage. The database is used for generating reports and the client wanted to skip all DML activity for a given schema as it wasn’t necessary for reporting purposes. I had done this in version 10.2.0.x; it was a simple procedure on a low-traffic database:


alter database stop logical standby apply;
exec dbms_logstdby.skip('DML','','%')
alter database start logical standby apply;

Pretty simple stuff, really. In the 10.2.0.x standby configuration I had been lucky that no DDL changes for the specified schema had occurred, so missing a step wasn’t obvious as it didn’t impact the apply process. For a while the 11.2 standby was happily applying transactions and skipping DML for the specified schema. Overnight, however, the apply process shut down for no immediately apparent reason. The schema in question had all of its DML ignored so why did apply stop? Looking into the alert log provided the following set of errors:


ORA-26808: Apply process AS01 died unexpectedly.
ORA-02149: Specified partition does not exist
LOGSTDBY Apply process AS03 server id=3 pid=85 OS id=27432 stopped

This required further investigation. Checking the primary database I found 594 partitions for tables owned by the schema that was supposed to be skipped. On the logical standby I found 43 fewer partitions for that same schema. So it appeared that the actions I took the day before, to skip all DML activity for this schema, weren’t enough. Pondering the issue I went to the Oracle online documentation and eventually found information showing a step I had missed. Skipping the schema DML wasn’t enough, the DDL also had to be ignored and to do that required the following set of instructions:


ALTER DATABASE ABORT LOGICAL STANDBY APPLY;
EXECUTE DBMS_LOGSTDBY.SKIP(stmt => 'SCHEMA_DDL',-
     schema_name => '', -
     object_name => '%');
EXECUTE DBMS_LOGSTDBY.SKIP ('DML', '', '%');
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;  

Using the abort command does two things: it stops the apply process and it rolls back any pending transactions. This is necessary so that ‘offending’ transactions, the ones that throw the listed error, get rolled back; this ‘resets’ the transaction ‘pointer’ for Log Miner so that when the apply process is restarted it can skip the transactions throwing the errors. Notice the second step in the skip process:


EXECUTE DBMS_LOGSTDBY.SKIP(stmt => 'SCHEMA_DDL',-
     schema_name => '', -
     object_name => '%');

This is necessary to tell Oracle to completely skip any DDL for the specified schema and keeps transaction processing going even if there are DDL changes on the primary database for the schema you want to skip. Since I hadn’t executed the skip procedure twice, with the two sets of slightly different parameters, only half of the job was done; it wasn’t apparent until partitions were added to the specified schema on the primary that the skip wasn’t processed completely. Once that missing step had been executed, and the original skip statement was re-submitted, the apply process started applying transactions and the standby lag started getting shorter and shorter.

It pays to read the documentation when performing unfamiliar tasks or what seem to be familiar tasks in a new release. Missed steps can create issues that can seem to be mysterious; verifying the required steps is necessary to ensure a task is completed successfully. What you may have given the least thought to may be the very thing that stops you in your tracks.

Skip, skip, skip to my Lou.

September 25, 2014

Map Reading

Filed under: General — dfitzjarrell @ 11:55

Consider the following concept: When you are born you are issued a map showing the direction your life will take. Along the way people will come into your life and help you make sense of parts of that map. You may not know these people at the time but they will be important in establishing where you are to be headed and possibly what you should be doing. Eventually you can read the entire map and know where you should go; hopefully you’ll already be there.

Continuing with that premise there have been four extremely influential people who have come into my life to help me read my map and set the direction that landed me here, as an Oracle DBA. I would like to acknowledge these people here; they will be listed in ‘order of appearance’, to take a phrase from the television industry. Any one of them would have been very influential; it’s the fact that they all entered my life at some point and helped guide me along the way that I find important.

If it weren’t for Janet Hamilton, Ph.D., I probably never would have stayed with my plans for a college degree in Chemistry. Janet is a one of a kind person, extremely intelligent yet willing and able to share her knowledge. She taught both Freshman Inorganic Chemistry and Sophomore Organic Chemistry so those of us who were Chemistry majors, Pre-Med or Nursing students had her for two full years. She also taught the Organic Chemistry labs so many of us saw a lot of her each week. She was always willing to help during her office hours and I took advantage of that as often as I could. She was passionate about Chemistry and instilled that passion in us. What she taught me was this: knowledge is best when it’s shared. She was a shining example of this every day and fueled in me a desire to share what I know.

Next in line was Cora Schweitzer, a lovely woman who taught Advanced World Literature and did it with a passion and flair I have yet to experience again. Unlike many English teachers Cora brought the subject to life and did it on a daily basis. I could not wait to go to class and experience her knowledge and wisdom. I learned from her that passion is a key to workplace success and happiness, and that you should do what you love as a career. That is sometimes a difficult task to complete as thought, logic and passion can ‘collide'; doing what you love may not be the most financially advantageous direction to take but it can make a difference in how your life is spent. Sometimes it’s better to make a living doing something you are passionate about than it is to go for the financial success, especially if financial success leads to a great deal of stress. One day in class Cora admitted she would sit near her office door waiting for the sound of others leaving the building. She would then call out, “Going home?” in hopes of getting a ride after her day was done. I made regular pilgrimages to her office soon after that to offer to drive her home. We had lovely conversations while I drove, and I learned even more how interesting and fascinating she was. Sadly Cora died in 1982; she left this world a better place than she found it and I am truly grateful to have been a part of her wonderful life.

Eunice Bonar taught me Quantitative Analysis, and did it like no other person could. She was a stickler for details, but that’s how you had to be when performing quantitative procedures and tests. She showed me the beauty of the Q-test and the satisfaction of producing repeatable results through care and planning. No detail was too small to overlook; she was preparing us for work as Analytical Chemists, a Chemistry career based on precision and accuracy. And she taught us the difference between the two as you can have good precision and poor accuracy, good accuracy and poor precision, or good precision and good accuracy. Regardless of the tasks at hand Eunice made the work interesting and fun, at least to me. Just like my grandfather taught me years before Eunice further instilled in me the desire to do my best at whatever task lies ahead; if you can’t sign your name to it, it isn’t worth doing.

Last, but certainly not least, is Dennis Marynick, who taught Physical Chemistry. This is the toughest subject a Chemistry or Chemical Engineering major would take. Bumper stickers were sold by the local American Chemical Society chapter stating “Honk If You Passed P-Chem!”. The subject matter was tough but Dennis could make it interesting. He could translate all of his years of Doctoral research into concepts that promoted understanding. True, there were topics that he just couldn’t work his magic on but, for the most part, he made Physical Chemistry less intimidating, possibly because he, too, loved what he did for a living. He also never stopped learning, not because he had to but because he wanted to. That Is an important concept that I integrated into my life.

What I learned from these four outstanding people is this: never stop learning, do what you are passionate about and share what you know, no matter how unimportant you think it might be. Knowledge is power, and sharing what you know can empower others. Because of these people I have become who I am today, a person constantly learning new things and sharing that knowledge as often as I can.

Each of us has at least one person who has given us direction and purpose; remember these people and what they did for you. Maybe some day someone will remember you for your positive contributions to their life.

September 23, 2014

“ID, Please”

Filed under: General — dfitzjarrell @ 10:13

In a forum I frequent the following question was raised:

We see some sqls that have been running for a long time in our prod DB, 11.2. when we check the v$session, i see null sql ids for them.
The sql_address shows up as 00. I would like to know what these sqls are doing because they correspond to processes that are burning up
the CPU and driving the usage to > 85%.

An interesting question, indeed. What are those NULL sql_ids doing, and are they actually NULL? To find out I set up an experiment using both a long series of insert statements and a PL/SQL block fetching 100,000 rows at a time. The results were surprising; let’s look at the example code and see what happened when it ran. Here is the code:


set echo on serveroutput on size 1000000

create table emp (empid number not null,
                  empname varchar2(40),
                  deptno        number);

set timing on

begin
        for i in 1..5000000 loop
                insert into emp
                values (i, 'Smorthorper'||i, mod(i, 40)+1);
        end loop;

        commit;
end;
/

set timing off

exec dbms_stats.gather_schema_stats('BING')

set timing on
declare
        type empcoltyp is table of emp%rowtype;
        emp_c empcoltyp;
        recct   number:=0;

        cursor get_emp_data is
        select * from emp;

begin
        open get_emp_data;
        loop
        fetch get_emp_data bulk collect into emp_c limit 100000;
        exit when get_emp_data%notfound;

        for i in 1..emp_c.count loop
                recct:=recct+1;
        end loop;
        dbms_output.put_line('Total records processed: '||recct);
        dbms_lock.sleep(3);
        end loop;
end;
/

set timing off

It’s a fairly simple script, really. I used the bulk collect and sleep to create a long-running fetch cycle to see if that displayed the observed behavior. Now, let’s run the script and see how the sql_id column in V$SESSION behaved. I used two sessions, one running the script and one to check the sql_id values from V$SESSION.  Session 1 generated the following output for the long-running insert:


SQL> create table emp (empid number not null,
  2                    empname varchar2(40),
  3                    deptno        number);

Table created.

SQL>
SQL> set timing on
SQL>
SQL> begin
  2          for i in 1..5000000 loop
  3                  insert into emp
  4                  values (i, 'Smorthorper'||i, mod(i, 40)+1);
  5          end loop;
  6
  7          commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Elapsed: 00:02:43.27
SQL>
SQL> set timing off

Session 2 reported the following behavior for the sql_id column:


SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469               gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469               gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469               gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL>

Notice that occasionally the sql_id becomes NULL then returns to its previous value, as reported in prev_sql_id. When long-running SQL statements are being processed it’s possible that the sql_id column can go NULL, and the sql_address be reported as 00, even though the same statement that was started minutes or even hours ago is still running. In such cases the prev_sql_id holds the sql_id for the currently running SQL. This doesn’t hold true for PL/SQL blocks, even with a sleep added. Session 1 reports the PLSQL block and it’s apparent progress::


SQL> set timing on
SQL> declare
  2          type empcoltyp is table of emp%rowtype;
  3          emp_c empcoltyp;
  4          recct   number:=0;
  5
  6          cursor get_emp_data is
  7          select * from emp;
  8
  9  begin
 10          open get_emp_data;
 11          loop
 12          fetch get_emp_data bulk collect into emp_c limit 100000;
 13          exit when get_emp_data%notfound;
 14
 15          for i in 1..emp_c.count loop
 16                  recct:=recct+1;
 17          end loop;
 18          dbms_output.put_line('Total records processed: '||recct);
 19          dbms_lock.sleep(3);
 20          end loop;
 21  end;
 22  /
Total records processed: 100000
Total records processed: 200000
Total records processed: 300000
Total records processed: 400000
Total records processed: 500000
Total records processed: 600000
Total records processed: 700000
Total records processed: 800000
Total records processed: 900000
Total records processed: 1000000
Total records processed: 1100000
Total records processed: 1200000
Total records processed: 1300000
Total records processed: 1400000
Total records processed: 1500000
Total records processed: 1600000
Total records processed: 1700000
Total records processed: 1800000
Total records processed: 1900000
Total records processed: 2000000
Total records processed: 2100000
Total records processed: 2200000
Total records processed: 2300000
Total records processed: 2400000
Total records processed: 2500000
Total records processed: 2600000
Total records processed: 2700000
Total records processed: 2800000
Total records processed: 2900000
Total records processed: 3000000
Total records processed: 3100000
Total records processed: 3200000
Total records processed: 3300000
Total records processed: 3400000
Total records processed: 3500000
Total records processed: 3600000
Total records processed: 3700000
Total records processed: 3800000
Total records processed: 3900000
Total records processed: 4000000
Total records processed: 4100000
Total records processed: 4200000
Total records processed: 4300000
Total records processed: 4400000
Total records processed: 4500000
Total records processed: 4600000
Total records processed: 4700000
Total records processed: 4800000
Total records processed: 4900000
Total records processed: 5000000

PL/SQL procedure successfully completed.

Elapsed: 00:02:34.16
SQL>
SQL> set timing off

Session 2 again is used to display the sql_id throughout the PL/SQL run; notice the sql_id does not go NULL at any time during the run [the output has been abbreviated to reduce the large number of repeated lines]:


SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

...
SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

...
SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

...
SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

SQL>

At no time did the sql_id display as NULL during the PL/SQL execution. I suspect it’s because the repeated fetch operations, limited to 1/50th of the total data, kept the cursor, and its associated SQL statement, active the entire duration of the run.

It would appear (and I think I have a valid test case as proof) that during long-running SQL statements the sql_id can be reported as NULL even though the statement is still executing. In such cases the prev_sql_id usually provides the identifier for the currently running SQL. Of course not all NULL sql_ids are associated with long-running statements, and V$SESSION_LONGOPS does not, by default, report every long-running statement or operation to hit a database. The ‘long-running’ criteria according to Oracle is any statement that runs for more than 6 seconds. A partial list of operations that are recorded follows:


•Table scan
•Index Fast Full Scan
•Hash join
•Sort/Merge
•Sort Output
•Rollback
•Gather Table Index Statistics

NESTED LOOP JOINS are not automatically recorded in V$SESSION_LONGOPS although HASH JOINS are. It is possible to circumvent this by using the DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS procedure to register your long-running operation with the view. Modifying the previous code to make V$SESSION_LONGOPS report interim activity:


declare
        type empcoltyp is table of emp%rowtype;
        emp_c empcoltyp;
        recct   number:=0;
        r_idx    BINARY_INTEGER;
        l_no      BINARY_INTEGER;
        ttlwrk number;
        so_far     number;
        obj       BINARY_INTEGER;

        cursor get_emp_data is
        select * from emp;

begin
        r_idx := dbms_application_info.set_session_longops_nohint;
        so_far := 0;
        ttlwrk := 50;
        select object_id
        into obj
        from user_objects
        where object_name = 'EMP';

        open get_emp_data;
        loop
        fetch get_emp_data bulk collect into emp_c limit 100000;
        exit when get_emp_data%notfound;

        while so_far < 50 loop
             for i in 1..emp_c.count loop
                recct:=recct+1;
             end loop;
             dbms_output.put_line('Total records processed: '||recct);
             dbms_lock.sleep(3);
             so_far := so_far + 1;
             dbms_application_info.set_session_longops(r_idx, l_no, 'Bounded fetch ', obj, 0, so_far, ttlwrk, 'EMP', 'fetches');
             end loop;
        end loop;
end;
/

V$SESSION_LONGOPS now reports progress for the bulk collect fetch operations:


SQL> select message from v$session_longops where sofar  totalwork;

MESSAGE
-----------------------------------------------------------------
Bounded fetch : EMP 113315: 1 out of 50 fetches done

SQL> /

MESSAGE
-----------------------------------------------------------------
Bounded fetch : EMP 113315: 2 out of 50 fetches done

SQL> /

MESSAGE
-----------------------------------------------------------------
Bounded fetch : EMP 113315: 3 out of 50 fetches done

SQL>
...

Oracle can exhibit ‘strange’ behavior with the sql_id values in V$SESSION, but just because the sql_id is NULL that doesn’t mean Oracle isn’t processing a long-running statement. It’s likely processing the same SQL reported in prev_sql_id so you can look to that column to find (possibly) active SQL. As a side note you can also get V$SESSION_LONGOPS to report on any operation you consider to be long using the example code as a guide. Obviously you can’t get V$SESSION_LONGOPS to report on ‘canned’ application operations if they don’t meet the reporting criteria (that would require the vendor to modify their code, not an easy task to accomplish). The idea to remember is that the sql_id for some long-running operations isn’t lost when the sql_id value goes NULL, it’s just been moved to the prev_sql_id column.

I need to see your ID, please.

September 8, 2014

Blooms

Filed under: Exadata,General,Performance — dfitzjarrell @ 09:41

In Oracle releases 10.2.0.x and later join processing can be made more efficient by the use of Bloom filters, primarily to reduce traffic between parallel query slaves. What is a Bloom filter? Named after Burton Howard Bloom, who came up with the concept in the 1970s, it’s an efficient data structure used to quickly determine if an element has a high probability of being a member of a given set. It’s based on a bit array that allows for rapid searches and returns one of two results: either the element is probably in the set (which can produce false positives) or the element is definitely not in the set. The filter cannot produce false negatives, and the incidence of false positives is relatively rare. Another advantage to Bloom filters is their small size relative to other data structures used for similar purposes (self-balancing binary search trees, hash tables, or linked lists). The possibility of false positives necessitates the addition of another filter to eliminate them from the results, yet such a filter doesn’t add appreciably to the process time and, therefore, goes relatively unnoticed.

When Bloom filters are used for a query on an Exadata system, the filter predicate and the storage predicate will list the SYS_OP_Bloom_FILTER function as being called. This function includes the additional filter to eliminate any false positives that could be returned. It’s the storage predicate that provides the real power of the Bloom filter on Exadata. Using a Bloom filter to pre-join the tables at the storage server level reduces the volume of data the database servers need to process and can significantly reduce the execution time of the given query.

An example of Bloom filters in action follows; a script named Bloom_fltr_ex.sql was executed to generate this output on an Exadata system; the script can be emailed on request:


SQL> --
SQL> -- Create sample tables
SQL> --
SQL> -- Create them parallel, necessary 
SQL> -- to get a Smart Scan on these tables
SQL> --
SQL> create table emp(
  2  	empid   number,
  3  	empnmvarchar2(40),
  4  	empsal  number,
  5  	empssn  varchar2(12),
  6  	constraint emp_pk primary key (empid)
  7  ) parallel 4;

Table created.

SQL>
SQL> create table emp_dept(
  2  	empid   number,
  3  	empdept number,
  4  	emploc  varchar2(60),
  5    constraint emp_dept_pk primary key(empid)
  6  ) parallel 4;

Table created.

SQL>
SQL> create table dept_info(
  2  	deptnum number,
  3  	deptnm  varchar2(25),
  4    constraint dept_info_pk primary key(deptnum)
  5  ) parallel 4;

Table created.

SQL>
SQL> --
SQL> -- Load sample tables with data
SQL> --
SQL> begin
  2  	     for i in 1..2000000 loop
  3  		     insert into emp
  4  		     values(i, 'Fnarm'||i, (mod(i, 7)+1)*1000, mod(i,10)||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||mod(i,10)||mod(i,10));
  5  		     insert into emp_dept
  6  		values(i, (mod(i,8)+1)*10, 'Zanzwalla'||(mod(i,8)+1)*10);
  7  		     commit;
  8  	     end loop;
  9  	     insert into dept_info
 10  	     select distinct empdept, case when empdept = 10 then 'SALES'
 11  					   when empdept = 20 then 'PROCUREMENT'
 12  					   when empdept = 30 then 'HR'
 13  					   when empdept = 40 then 'RESEARCH'
 14  					   when empdept = 50 then 'DEVELOPMENT'
 15  					   when empdept = 60 then 'EMPLOYEE RELATIONS'
 16  					   when empdept = 70 then 'FACILITIES'
 17  					   when empdept = 80 then 'FINANCE' end
 18  	     from emp_dept;
 19  
 20  end;
 21  /

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Run join query using Bloom filter
SQL> --
SQL> -- Generate execution plan to prove Bloom
SQL> -- filter usage
SQL> --
SQL> -- Also report query execution time
SQL> --
SQL> set autotrace on
SQL> set timing on
SQL>
SQL> select /*+ Bloom join 2 parallel 2 use_hash(empemp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal, count(*) emps
  2  fromemp e join emp_depted on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  whereed.empdept = 20
  4  group by e.empid, e.empnm, d.deptnm, e.empsal;

     EMPID EMPNM                  DEPTNM                        EMPSAL       EMPS
---------- ---------------------- ------------------------- ---------- ----------
    904505 Fnarm904505            PROCUREMENT                     1000          1
    907769 Fnarm907769            PROCUREMENT                     3000          1
    909241 Fnarm909241            PROCUREMENT                     5000          1
    909505 Fnarm909505            PROCUREMENT                     3000          1
    909641 Fnarm909641            PROCUREMENT                     6000          1
    910145 Fnarm910145            PROCUREMENT                     6000          1
...
    155833 Fnarm155833            PROCUREMENT                     7000          1
    155905 Fnarm155905            PROCUREMENT                     2000          1
    151081 Fnarm151081            PROCUREMENT                     1000          1
    151145 Fnarm151145            PROCUREMENT                     2000          1

250000 rows selected.

Elapsed: 00:00:14.27

Execution Plan
----------------------------------------------------------
Plan hash value: 2313925751

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows | Bytes | Cost (%CPU)| Time     |   TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |  218K|    21M|  1378   (1)| 00:00:01 |       |      |            |
|   1 |  PX COORDINATOR                          |              |      |       |            |          |       |      |            |
|   2 |   PX SEND QC (RANDOM)                    | :TQ10003     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                         |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                           |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   5 |      PX SEND HASH                        | :TQ10002     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | P->P | HASH       |
|   6 |       HASH GROUP BY                      |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | PCWP |            |
|*  7 |        HASH JOIN                         |              |  218K|    21M|  1376   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   8 |         PX RECEIVE                       |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   9 |          PX SEND BROADCAST               | :TQ10001     |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | P->P | BROADCAST  |
|  10 |           NESTED LOOPS                   |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  11 |            BUFFER SORT                   |              |      |       |            |          | Q1,01 | PCWC |            |
|  12 |             PX RECEIVE                   |              |      |       |            |          | Q1,01 | PCWP |            |
|  13 |              PX SEND BROADCAST           | :TQ10000     |      |       |            |          |       | S->P | BROADCAST  |
|  14 |               TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |    1 |    27 |     1   (0)| 00:00:01 |       |      |            |
|* 15 |                INDEX UNIQUE SCAN         | DEPT_INFO_PK |    1 |       |     1   (0)| 00:00:01 |       |      |            |
|  16 |            PX BLOCK ITERATOR             |              |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWC |            |
|* 17 |             TABLE ACCESS STORAGE FULL    | EMP_DEPT     |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  18 |         PX BLOCK ITERATOR                |              | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWC |            |
|* 19 |          TABLE ACCESS STORAGE FULL       | EMP          | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("ED"."EMPID"="E"."EMPID")
  15 - access("D"."DEPTNUM"=20)
  17 - storage("ED"."EMPDEPT"=20)
       filter("ED"."EMPDEPT"=20)
  19 - storage(SYS_OP_Bloom_FILTER(:BF0000,"E"."EMPID"))
       filter(SYS_OP_Bloom_FILTER(:BF0000,"E"."EMPID"))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
     60  recursive calls
    174  db block gets
  40753  consistent gets
  17710  physical reads
   2128  redo size
9437983  bytes sent via SQL*Net to client
 183850  bytes received via SQL*Net from client
  16668  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
 250000  rows processed

SQL>

In less than 15 seconds 250,000 rows were returned from a three table join of over 4 million rows; the Bloom filter made a dramatic difference in how this query was processed and provided exceptional performance given the volume of data queried.

Without the storage level execution of the Bloom filter, the query execution time increased by 2.33 seconds, a 16.3 percent increase. For longer execution times, this difference can be significant. It isn’t the Bloom filter that gives Exadata such power with joins, it’s the fact that Exadata can execute it not only at the database level but also at the storage level, something commodity hardware configurations can’t do.

Of course Bloom filters are not exclusive to Exadata, as the following run of the same script on a non-Exadata system shows (only the abbreviated output section is reproduced here):


SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL                                                              
---------- ---------------------------------------- ------------------------- ----------                                                              
   1670281 Fnarm1670281                             PROCUREMENT                     5000                                                              
   1670289 Fnarm1670289                             PROCUREMENT                     6000                                                              
   1670297 Fnarm1670297                             PROCUREMENT                     7000                                                              
   1670305 Fnarm1670305                             PROCUREMENT                     1000                                                              
   1670313 Fnarm1670313                             PROCUREMENT                     2000                                                              
   1670321 Fnarm1670321                             PROCUREMENT                     3000                                                              
   1670329 Fnarm1670329                             PROCUREMENT                     4000                                                              
   1670337 Fnarm1670337                             PROCUREMENT                     5000                                                              
   1670345 Fnarm1670345                             PROCUREMENT                     6000                                                              
   1670353 Fnarm1670353                             PROCUREMENT                     7000                                                              
   1670361 Fnarm1670361                             PROCUREMENT                     1000                                                              
...
   1857369 Fnarm1857369                             PROCUREMENT                     4000                                                              
   1857377 Fnarm1857377                             PROCUREMENT                     5000                                                              
   1857385 Fnarm1857385                             PROCUREMENT                     6000                                                              

250000 rows selected.

Elapsed: 00:00:54.86

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2643012915                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------                    
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |                    
----------------------------------------------------------------------------------------------------------------------------------                    
|   0 | SELECT STATEMENT                     |              |   273K|    26M|  1378   (1)| 00:00:01 |        |      |            |                    
|   1 |  PX COORDINATOR                      |              |       |       |            |          |        |      |            |                    
|   2 |   PX SEND QC (RANDOM)                | :TQ10002     |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |                    
|*  3 |    HASH JOIN                         |              |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   4 |     PX RECEIVE                       |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   5 |      PX SEND BROADCAST               | :TQ10001     |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | P->P | BROADCAST  |                    
|   6 |       NESTED LOOPS                   |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|   7 |        BUFFER SORT                   |              |       |       |            |          |  Q1,01 | PCWC |            |                    
|   8 |         PX RECEIVE                   |              |       |       |            |          |  Q1,01 | PCWP |            |                    
|   9 |          PX SEND BROADCAST           | :TQ10000     |       |       |            |          |        | S->P | BROADCAST  |                    
|  10 |           TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |     1 |    27 |     1   (0)| 00:00:01 |        |      |            |                    
|* 11 |            INDEX UNIQUE SCAN         | DEPT_INFO_PK |     1 |       |     1   (0)| 00:00:01 |        |      |            |                    
|  12 |        PX BLOCK ITERATOR             |              |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWC |            |                    
|* 13 |         TABLE ACCESS FULL            | EMP_DEPT     |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|  14 |     PX BLOCK ITERATOR                |              |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWC |            |                    
|* 15 |      TABLE ACCESS FULL               | EMP          |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
----------------------------------------------------------------------------------------------------------------------------------                    
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   3 - access("ED"."EMPID"="E"."EMPID")                                                                                                               
  11 - access("D"."DEPTNUM"=20)                                                                                                                       
  13 - filter("ED"."EMPDEPT"=20)                                                                                                                      
  15 - filter(SYS_OP_Bloom_FILTER(:BF0000,"E"."EMPID"))                                                                                               
                                                                                                                                                      
Note                                                                                                                                                  
-----                                                                                                                                                 
   - dynamic sampling used for this statement (level=2)                                                                                               


Statistics
----------------------------------------------------------                                                                                            
         33  recursive calls                                                                                                                          
        139  db block gets                                                                                                                            
      36224  consistent gets                                                                                                                          
      17657  physical reads                                                                                                                           
          0  redo size                                                                                                                                
    9526012  bytes sent via SQL*Net to client                                                                                                         
     183846  bytes received via SQL*Net from client                                                                                                   
      16668  SQL*Net roundtrips to/from client                                                                                                        
          5  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
     250000  rows processed                                                                                                                           

SQL>

The only differences between the two plans involve the storage component of Exadata. Notice also the vast difference in elapsed times between Exadata and non-Exadata systems; Exadata ran the query returning 250,000 rows in just over 14 seconds and the non-Exadata system returned the same volume of data in just under 55 seconds. Both times are substantial improvements over not using a Bloom filter so any hardware running 11.2.0.2 or later will benefit from Bloom filters.

Of course there is a ‘gotcha’ with Bloom filters, and Oracle recognizes this as a bug in releases older than 12.1.0.2. Hash left outer joins, hash right outer joins and hash full outer joins do not use a Bloom Filter and for those interested the unpublished bug number is 17716301. MOS document 1919508.1 reports details on this behavior including an example. As reported in that document the patch can be backported to earlier releases when an SR is filed requesting it.

Bloom filters are a welcome addition to both Exadata and non-Exadata installations as they can improve query performance with parallel queries, as evidenced here. Now that you know how to spot when they are used you may find they are utilized more often than you expected.

It’s a bloomin’ miracle.

August 17, 2014

“You’re A Real ACE”

Filed under: General — dfitzjarrell @ 13:40

Recently I was declared, for want of a better word, an Oracle ACE. I was nominated by someone I respect and admire; just the nomination itself was enough, really, to make my day. When I received notification that I had been selected I was truly surprised. I immediately thanked my nominator, who told me:


                     "Still, it is just a nomination form, and you did all the work that
                     made it so impressive. Congratulations!"

Honestly, it wasn’t, and isn’t, work. It’s a pleasure to share my knowledge with others, which is why I blog, write for http://www.databasejournal.com twice a month and wrote “Oracle Exadata Survival Guide”.

Being an Oracle ACE is an honor, but it’s also a responsibility. What got me here was writing and participating in discussion groups, and that won’t change. Knowledge is to be shared, not hoarded. What you know does no one else any good if you don’t share that knowledge and experience. If Da Vinci had kept his notes to himself, if Newton hadn’t published his Laws of Motion, if Copernicus has kept quiet our world may have been quite different. It’s because these people had the foresight to look deeper into our world and then tell us what they found that puts us where we are today. It’s only right that we, as beneficiaries of the knowledge others before us shared, share our knowledge no matter how unimportant it may seem. Someone, somewhere, will find it useful and will be grateful that we did.

It may be that our knowledge is gained equally by insight and by mistakes. I know that I’ve made mistakes I’ve learned from, and I know others have done the same. Norton Juster, an architect who wrote a wonderful book named “The Phantom Tollbooth”, said it best:


                     "You may not see it now," said the Princess of Pure Reason, looking
                     knowingly at Milo's puzzled face, "but whatever we learn has a
                     purpose and whatever we do affects everything and everyone else, if
                     even in the tiniest way. Why, when a housefly flaps his wings, a
                     breeze goes round the world; when a speck of dust falls to the
                     ground, the entire planet weighs a little more; and when you stamp
                     your foot, the earth moves slightly off its course. Whenever you
                     laugh, gladness spreads like the ripples in a pond; and whenever
                     you're sad, no one anywhere can be really happy. And it's much the
                     same thing with knowledge, for whenever you learn something new,
                     the whole world becomes that much richer."

                     "And remember, also," added the Princess of Sweet Rhyme, "that many
                     places you would like to see are just off the map and many things 
                     you want to know are just out of sight or a little beyond your 
                     reach. But someday you'll reach them all, for what you learn today, 
                     for no reason at all, will help you discover all the wonderful 
                     secrets of tomorrow."

I think this sums up being an Oracle ACE better than any words I could put on a page. I keep this quote with me, to remind me that mistakes aren’t the end of the world if you learn from them and move on. And what I learn from those mistakes may help others so they don’t make those same mistakes.

I love sharing what I know; I’ve been doing it for over 16 years now, in various forums, some that are no longer as popular as they once were. I never realized how far my commentary reached until I became an Oracle ACE; I have received congratulations and comments that I never expected, mostly of the ‘it’s about time’ sentiment. Simply because you don’t see the gratitude of others who benefit from your knowledge doesn’t mean that gratitude doesn’t exist. I see now that it does, and I am humbled by it.

“Oracle ACE”. To be among those who have been graced by this honor gives me pause. It’s a good pause, awesome if I may be allowed to say it. And it’s an honor that will only make me work harder so that I can live up to such high standards.

Now, back to our regularly scheduled programming.

July 22, 2014

That’s … Huge!

Filed under: Exadata,General,Performance — dfitzjarrell @ 14:27

Recently I’ve noticed the occasional thread in Oracle newsgroups and lists asking about hugepages support in Linux, including ‘best practices’ for hugepages configuration. This information is out on that ‘world-wide web’ in various places; I’d rather put a lot of that information in this post to provide an easier way to get to it. I’ll cover what hugepages are, what they do, what they can’t do and how best to allocate them for your particular installation. Let’s get started.

“Normal” memory pages in Linux are 4 KB in size and are allocated as needed where the memory map will allow so they are likely not contiguous. Hugepages, in comparison, are 2 MB pages locked in memory and are allocated in a contiguous ‘chunk'; these are allocated at boot time using a parameter in the /etc/sysctl.conf file named vm.nr_hugepages for RedHat Linux kernel 2.6 and a parameter named vm.hugetlb_pool for RedHat Linux kernel 2.4. You must remember that each page is 2 MB is size, as this affects how many hugepages you need to allocate to cover all of the SGAs of running Oracle databases. Set this too high and your system might not boot; set this too low and you won’t get the benefit of using hugepages in one or more Oracle instances. [When use_large_pages is set to ONLY and insufficient hugepages are available the database won't start; changing use_large_pages to TRUE allows the database to start but it won't be using hugepages, which results in more paging activity, lowering performance.] Since hugepages are contiguous it stands to reason that performance will improve since Oracle won’t need to access memory segments across the entire range of installed memory. Hugepages also reduce the size of ‘page tables’ by reducing the overhead for page tables for each connection to the database. Page tables are part of the Linux memory management, created and used to map a virtual address to a physical memory location. For SGAs 100 MB and greater using hugepages is probably a good idea, and that idea gets better as the SGA size increases. Kevin Closson reported back in 2009 on the page table overhead for an Oracle database with an 8000 MB SGA running 500 simultaneous connections. The page table overhead reported was startling — almost 6.8 GB of memory used for page tables when hugepages were not configured. He ran the same test again, on the same Oracle database, only that test was run after hugepages were allocated and used by the Oracle database. The page table overhead was reduced to slightly less than 23 MB. That is a dramatic change in memory usage, one that can definitely improve performance.

Hugepages are available for IPC (Inter-Process Communication) shared memory; this is the ‘standard’ shared memory model (starting with UNIX System V) allowing multiple processes to access the same shared memory segment. There is also another form of shared memory segment, the memory-mapped file, and currently such segments are not supported by hugepages. Oracle, on Linux, gives you a choice of using hugepages or memory-mapped files and you implement that choice by selecting to use (or not use) Automatic Memory Management (AMM). Choosing AMM disallows hugepages since AMM uses memory-mapped files located in /dev/shm; you cannot use AMM and hugepages together. For databases with SGAs up to 100 MB using AMM is probably a good idea; a script to determine how many hugepages to allocate won’t consider SGAs of less than 100 MB. For Oracle databases with larger SGAs hugepages is, in my opinion, the choice to make as you can significantly reduce shared memory overhead where page tables are concerned.

Calculating how many hugepages to allocate has been a difficult task in the past; Oracle supplies a script that can calculate that value for you provided you have all of the databases that you want to use hugepages running; the script from Oracle won’t consider databases you have created that aren’t up and available as it uses the ipcs -m command to return the allocated shared memory segments. The script is available from My Oracle Support in Doc ID 401749.1. It generates a list of IPC shared memory segment sizes owned by Oracle and computes the total number of hugepages required. It outputs the result to be used in setting the vm.nr_hugepages parameter in /etc/sysctl.conf. Once that setting is modified the system has to be rebooted before the changes are implemented. Sample output from that script is shown below:


This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m


Press Enter to proceed...

Recommended setting: vm.nr_hugepages = 25240

For Linux RedHat kernel 2.6 this is the number of hugepages, not the overall size of the memory allocated for those hugepages. For RedHat kernel 2.4 a different parameter, vm.hugetlb_pool, is set to the actual size that pool needs to be. Fortunately the script knows which kernel is running and reports the correct parameter and value to set.

After you have hugepages allocated you need to tell Oracle to use them exclusively. An init parameter, use_large_pages, should be set to ONLY to keep Oracle from allocating conventionally managed memory segments for the SGA. If your hugepages calculations are correct then all of the databases on your database server will start without error. Allocate too few hugepages and databases with use_large_pages set to ONLY won’t start after all of the existing hugepages are used. Setting use_large_pages to TRUE will allow a database to start without using hugepages but that’s not what you want to get better performance from Oracle.

Earlier I mentioned that allocating too many hugepages can be worse than allocating too few hugepages. Since each hugepage consumes 2 MB of space multiplying the vm.nr_hugepages setting by 2097152 will show how much memory is occupied by hugepages. It is possible to try to allocate all of the system memory as hugepages resulting in a system that will not boot in multi-user mode. The kernel isn’t configured to use hugepages; the more hugepages that are allocated the less conventionally managed memory is available. Take it all away and there is no place for the operating system to start necessary processes after the kernel is loaded at boot time. Using single-user mode will allow the system adminstrator to modify the offending entry and change it to a more reasonable value so the system can be rebooted. Of course you don’t want to end up in that situation; manually calculating such a value can result in errors so use the Oracle-supplied script to calculate the hugepages value you actually need.

Exadata systems run best when hugepages are allocated and used exclusively. The exachk script checks for such settings in each running database on the system and reports those that do not have use_large_pages set to ONLY. It also checks if the currently configured hugepages setting meets or exceeds the estimated number of hugepages required by all running instances on the given database (compute) node. If this check fails it most likely will fail for all available nodes on the system. Failing this check isn’t the end of the world but it does mean that the failing databases aren’t using memory as efficiently as they could. For each node run the script, make the reported changes then reboot. Before starting any of the databases ensure that all have use_large_pages set to ONLY; start the databases across the cluster using srvctl. The next run ef exachk should report that the hugepages check passed.

RAC databases need to be configured the same way Exadata databases are configured (since Exadata databases are RAC databases). Modify kernel parameters on every database server in the cluster, then reboot. Modify the spfile for each database in the cluster then use srvctl to start those databases cluster-wide. Presuming the hugepages settings are correct across the cluster there should be no error when starting the databases.

Using hugepages for Oracle databases on Linux is, to me, the only way to go if you want to increase database performance. Of course, if you have loads memory to spare you may not see the need. Most systems I’ve worked on don’t have that luxury so being as ‘frugal’ with memory as you can is good. Making the most of available resources is key and hugepages on Linux can definitely help with respect to Oracle.

Size does matter.

July 15, 2014

It Pays To Be Smart

Filed under: Exadata,General — dfitzjarrell @ 09:40

Exadata is a powerful system, able to provide exceptional performance. Much of this peformance is due to Smart Scans, Exadata’s mechanism for shifting the workload to the storage cells and having them filter and reduce the amount of data the database servers must process. Not every Smart Scan that starts ends up completing, though. Oracle may decide that a different path, one that doesn’t include a Smart Scan, may be more efficient. Certain conditions must be met and there is a metric that records Smart Scan starts and another that records how many of those starts actually execute Smart Scans. Let’s look at why Oracle may decide a Smart Scan isn’t the ‘smartest’ route to the data and which metrics you can use to see how many Smart Scans end up using another execution path.

Three conditions can trigger Oracle foregoing a smart scan: setting optimizer_mode to either FIRST_ROWS or FIRST_ROWS_n or executing a query using the ‘where rownum’ predicate. Oracle originally opts to do the Smart Scan then can perform a few block I/O operations to see if that pathway can satisfy the query more efficiently. When that happens a metric, cell num fast response sessions, is incremented. If the block I/O is indeed faster Oracle foregoes the Smart Scan and completes the operation with conventional block I/O. If the block I/O proves that correct results won’t be returned to the calling session in a timely fashion Oracle resumes the Smart Scan and continues processing. Simply because this counter is incremented does not mean, by itself, that Oracle didn’t execute a Smart Scan. You’ll need another metric for that.

When Oracle decides to return to Smart Scan execution another metric is incremented, cell num fast response sessions continuing to smart scan. The same conditions that triggered the previous metric also apply here. Thus when Oracle decides to forego a Smart Scan only the previous metric, cell num fast response sessions, is incremented.

It’s fairly easy to see how many Smart Scans followed through and how many didn’t with the following query:


select a.value smart_scan_started, b.value smart_scan_cont, a.value - b.value no_smart_scan
from v$sysstat a, v$sysstat b
where a.statistic#=262
and b.statistic#=263
/

You may find that none of the Smart Scans that were started actually went on to finish:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
              1222               0          1222

Or you may find that all of them finished:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
               407             407             0

You may also find that some did, and some didn’t, continue:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
                53              27            26

What is important to note is that simply because Oracle started a Smart Scan then chose to not continue on with it doesn’t indicate there is any problem with your Exadata system. On the contrary it indicates that Exadata is doing exactly what it’s supposed to do, efficiently return query results to the calling session. That may be through a Smart Scan, it may be through conventional I/O. Think of it as adjusting your route to a destination to avoid heavy traffic. If Oracle can get the correct results through a few conventional I/O operations it will do so; if a Smart Scan is necessary Oracle can again ‘change gears’ and resume it to ensure correct data is sent to the calling session. It all comes down to using the right tool for the job.

Smart Scans are great when you need them; it’s also good to know that Oracle, on Exadata, can decide when to use them and when it’s best not to. It’s also good to realize that when that happens it’s not because of a problem with Exadata. It’s simply Oracle choosing the right method to get the correct data. Sometimes the shorter path is the better path.

Robert Frost may have said it best:


 
"Two roads diverged in a wood, and I—  
I took the one less traveled by,  
And that has made all the difference." 

Oracle, on Exadata, may take the ‘road less traveled by’ when it comes to choosing a Smart Scan. That’s normally nothing to worry about as Oracle is selecting the most efficient path to return the data.

Anyone for a stroll?

July 1, 2014

What A Performance!

Filed under: General,Performance,stats — dfitzjarrell @ 10:09

Performance is one of the big watchwords in tht IT industry; nowhere is it more often used than in the database arena, and it’s usually not a good sign. End users know nothing, really, of the inner workings of the database yet they are the first to point the ‘finger of blame’ at the database when the processing speed slows or queries take longer than they would like to return data. [For end users performance is time, and the longer a process takes the more dismal performance seems to be.] Yes, there ARE times when the database is at fault — ‘missing’ indexes or stale statistics can cause execution plans to change, as can inserts and updates to heavily transacted tables (which many times goes back to stale statistics since they now no longer accurately reflect the data and its distribution in the table). Bugs in the database software can also cause issues by causing the gathered statistics to miscalculate values or generate ‘useless’ frequency histograms, such as on single-valued not null columns. Such histograms can direct the optimizer to choose a path it wouldn’t normally choose when the statistics and histograms contain useful data.

Here is where the problem can get complicated: it may not be the database at fault when performance is down. Yes, it’s the database server you’re geting data from. Yes, things were running just fine until today. By ‘logical’ deduction it’s the database at fault, right? Not necessarily. The database also has to deal with the operating system, the storage system, the available memory and the network to get its work done. Problems in any of those areas are likely to adversely affect database performance. So it’s not necesarily the database causing the problem. How do you know where the problem originates? Oracle does provide some tools to help in assessing where a problem is likely to be, but you may also need to involve System Administrators, Network Administrators and/or Storage Administrators to fully investigate some issues. Let’s take a look at the Oracle supplied tools, along with some operating system utilities that can be used to diagnose performance issues.

At the first sign of trouble many look to the DBA to investigate the database for bottlenecks or other performance-robbing possibilities that surely must exist. Oracle provides at least two tools in releases 10.1 and later to investigate the database for performance problems — the AWR report, a general overall ‘picture’ of the database ‘health’ that can give direction to such a quest and the ASH report, a more detailed ‘picture’ for a given window of activity that provides more detailed wait information. These utilities should be used at the first report of performance issues to determine if the problem is database-related or caused by another area, such as the operating system or network. Starting with the AWR (Automatic Workload Repository) report the two sections to begin an investigation, in my estimation, are Wait Events Statistics and SQL Statistics. The first provides a system-wide view of the waits encountered and the times those waits consumed. Background Wait Events and the Wait Event Histogram sections can provide some insight into what was occurring during the report window; waits to watch are I/O related waits (reads, writes) which can signal issues with a query plan or issues with the storage (issues not addressable by the DBA). The SQL Statistics list SQL statements in order of several areas, including overall elapsed time, consumed CPU time, I/O wait time and physical reads. The SQL statistics can usually point to ‘bad’ plans, making performance problems a database issue because such plans are the result of bad or stale statistics. In 11.2 and later releases dynamic sampling can mitigate missing statistics by automatically sampling the data prior to generating a query execution plan. We won’t get into the details of using these reports to effect database tuning procedures as that’s a topic for another article. The intent here is to provide a means to determine IF the problem is a database issue or if it’s caused by outside ‘forces’. For releases earlier than 10.1 Oracle provides Statspack, the pre-cursor to AWR and the next step after the original database snapshot utilities utlbstat and utlestst. This utiility can provide a very good snapshot of database activity, although not quite as robust as the AWR report. It is still available in 11.2 if you would like to install it. AWR and ASH are utilities which are part of the Diagnostic and Tuning Pack for OEM, and thus need to be licensed if they are to be used. Statspack doesn’t require such a license.

The “Top 5 Timed Foreground Events” section gives an overview of the wait events which consumed the most time during the report window:

Top 5 Timed Foreground Events

Event Waits Time(s) Avg wait (ms) % DB time Wait Class
DB CPU   0 82.42  
log file sync 8 0 11 30.72 Commit
Disk file operations I/O 1 0 1 0.33 User I/O
os thread startup 0 0   0.00 Concurrency
log file single write 0 0   0.00 System I/O

Host CPU (CPUs: 4 Cores: 2 Sockets: 1)

This provides a quick look at the ‘worst offenders’ in the wait event category (although they may not be performance-robbing events). Start here then work down to the Operating System Statistics section:

Operating System Statistics – Detail

Snap Time Load %busy %user %sys %idle %iowait
20-Jun 07:04:42 0.00          
20-Jun 08:00:45 0.00 7.12 4.65 2.48 92.88 0.00

The Wait Event Histogram Detail can be very informative:

Wait Event Histogram Detail (64 msec to 2 sec)

  • Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
  • Units for % of Total Waits: ms is milliseconds s is 1024 milliseconds (approximately 1 second)
  • % of Total Waits: total waits for all wait classes, including Idle
  • % of Total Waits: value of .0 indicates value was <.05%; value of null is truly 0
  • Ordered by Event (only non-idle events are displayed)
    % of Total Waits
Event Waits 64ms to 2s <32ms <64ms <1/8s <1/4s <1/2s <1s <2s >=2s
ADR block file read 3 81.3 12.5 6.3          
Disk file operations I/O 1 98.2 1.8            
control file parallel write 15 98.7 .8 .3 .1        
control file sequential read 139 94.0 4.7 .8 .3 .1   .0  
db file parallel read 2   50.0   50.0        
db file parallel write 185 91.4 5.7 1.8 .8 .1 .2    
db file scattered read 63 93.5 5.0 1.2 .3        
db file sequential read 142 94.2 4.5 1.2 .1 .0      
direct path read 1     100.0          
enq: CR – block range reuse ckpt 16 82.2 15.6   1.1   1.1    
log file parallel write 82 92.5 4.6 2.6 .3 .1      
log file switch completion 2     50.0 50.0        
log file sync 1 90.0 10.0            

Wait Event Histogram Detail (4 sec to 2 min)

No data exists for this section of the report.

Wait Event Histogram Detail (4 min to 1 hr)

No data exists for this section of the report.

Service Statistics

  • ordered by DB Time
Service Name DB Time (s) DB CPU (s) Physical Reads (K) Logical Reads (K)
SYS$USERS 0 0 0 2
SYS$BACKGROUND 0 0 8 436
smedley 0 0 0 0
smedleyXDB 0 0 0 0

Such histograms can show where the preponderance of waits lie making it easier to determine if they are problematic.

Next on the list is the SQL Statistics report. There are several sections, the one shown is ordered by elapsed time:

SQL ordered by Elapsed Time

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
  • %Total – Elapsed Time as a percentage of Total DB time
  • %CPU – CPU Time as a percentage of Elapsed Time
  • %IO – User I/O Time as a percentage of Elapsed Time
  • Captured SQL account for 1.1E+04% of Total DB Time (s): 0
  • Captured PL/SQL account for 2.3E+03% of Total DB Time (s): 0
Elapsed Time (s) Executions Elapsed Time per Exec (s) %Total %CPU %IO SQL Id SQL Module SQL Text
5.88 1 5.88 2070.31 15.66 79.60 acc988uzvjmmt   delete from WRH$_SYSMETRIC_HIS…
4.60 22 0.21 1618.80 27.16 64.39 70vs1d7ywk5m0 MMON_SLAVE begin dbms_stats.copy_table_st…
2.49 44 0.06 876.07 10.66 83.39 2mp99nzd9u1qp   delete from histgrm$ where obj…
2.14 769 0.00 753.49 4.38 96.01 db78fxqxwxt7r   select /*+ rule */ bucket, end…
1.57 212 0.01 551.64 21.91 74.49 43ruajmwv9fus   insert into histgrm$(obj#, int…
1.42 1 1.42 501.54 7.67 86.00 0sfsrf0qzvsjv   delete from WRH$_SYSMETRIC_SUM…
1.19 22 0.05 418.37 2.63 93.41 d89c1mh5pvbkz MMON_SLAVE SELECT /*+ rule */ U.NAME, OT….
0.83 1 0.83 291.84 13.18 77.73 2whm2vvjb98k7   delete from WRH$_SQL_PLAN tab …
0.80 22 0.04 281.90 25.34 67.82 8sd37ztuwkyyc MMON_SLAVE begin dbms_swrf_internal.massa…
0.66 1 0.66 231.35 2.38 98.81 28bm5y53nujvz   delete from WRH$_RESOURCE_LIMI…
0.54 1 0.54 188.64 0.00 91.89 4y3yvmbav7uwb   delete /*+ UNNEST(@SEL$2) UNNE…
0.53 22 0.02 185.34 0.00 96.67 63kf3an2j0pkc   insert into sys.wri$_optstat_h…
0.51 1 0.51 179.21 9.20 75.31 96w2cbx7ntmtv   delete from WRH$_BG_EVENT_SUMM…
0.50 1 0.50 175.27 6.27 94.54 31sdu97k3baq1   delete from WRH$_SHARED_POOL_A…
0.49 22 0.02 171.58 12.81 66.96 480utgnq0t6mf MMON_SLAVE begin dbms_stats.unlock_partit…
0.44 1 0.44 156.14 17.60 80.45 1h7zt6jks80pa   delete from WRH$_ENQUEUE_STAT …
0.42 1 0.42 146.27 15.03 94.38 bzscyq07w79ab   delete /*+ dynamic_sampling(4)…
0.38 2,361 0.00 133.20 24.75 80.65 96g93hntrzjtr   select /*+ rule */ bucket_cnt,…
0.35 1 0.35 123.71 4.44 94.32 9v9n97qj8z1dg   delete /*+ dynamic_sampling(4)…
0.35 22 0.02 122.04 27.02 66.45 2rn08n06qr261   update histgrm$ set obj# = dec…
0.34 1 0.34 121.06 0.00 96.73 6ajkhukk78nsr   begin prvt_hdm.auto_execute( :…
0.31 1 0.31 107.67 0.00 97.51 48ffykkmk7sgc   delete /*+ dynamic_sampling(4)…
0.28 1 0.28 99.65 0.00 79.94 350myuyx0t1d6   insert into wrh$_tablespace_st…
0.27 375 0.00 94.36 17.47 87.22 bvn085xhxfn9f   insert into sys.wri$_optstat_h…
0.26 161 0.00 90.72 6.06 90.14 74anujtt8zw4h   select o.owner#, o.name, o.nam…
0.25 1 0.25 86.79 6.33 95.67 47x8f8z2hx1a2   delete from WRH$_MUTEX_SLEEP t…
0.24 1 0.24 85.22 6.45 92.64 az8cj7dfnpp45   delete from WRH$_MEM_DYNAMIC_C…
0.23 1 0.23 81.60 6.73 41.10 3mqvkt9as1phq   merge /*+ dynamic_sampling(mm…
0.22 1 0.22 76.17 0.00 87.96 du2bqfpfj9au2   delete from WRH$_LIBRARYCACHE …
0.21 1 0.21 73.67 14.92 93.36 b7ba0zt64wtc7   delete from WRH$_RSRC_CONSUMER…
0.20 44 0.00 71.59 7.68 71.51 as3uq6ggb3gx6   delete from hist_head$ where o…
0.20 89 0.00 71.52 30.73 92.39 3ktacv9r56b51   select owner#, name, namespace…
0.20 10 0.02 71.14 0.00 99.04 c6awqs517jpj0   select /*+ index(idl_char$ i_i…
0.20 1,098 0.00 70.92 23.24 58.47 3c1kubcdjnppq   update sys.col_usage$ set equa…
0.19 22 0.01 68.12 56.47 8.79 9dy457uf4uxzd MMON_SLAVE SELECT /*+ all_rows */ ‘"’||UI…
0.19 1 0.19 67.57 0.00 0.00 f318xdxdn0pdc   insert into wrh$_log (snap_id,…
0.19 113 0.00 67.01 0.00 84.34 59vjj34vugaav   delete from obj$ where obj# = …
0.18 10 0.02 64.60 0.00 96.35 cvn54b7yz0s8u   select /*+ index(idl_ub1$ i_id…
0.18 375 0.00 63.34 43.38 53.87 95mpkn5xz9001   insert into hist_head$(obj#, i…
0.18 22 0.01 62.59 0.00 71.91 dr277b6yv83uy MMON_SLAVE SELECT COLUMN_NAME FROM DBA_PA…
0.18 671 0.00 62.17 17.68 0.00 5ms6rbzdnq16t   select job, nvl2(last_date, 1,…
0.18 1 0.18 61.81 8.89 96.19 7f8bfqbjxdssr   delete from WRH$_LOG tab where…
0.17 1 0.17 61.43 17.89 95.77 7g732rx16j8jc   insert into WRH$_SERVICE_STAT …
0.16 48 0.00 56.37 48.74 48.27 4h8danbxj5nz6   begin dbms_utility.validate(:b…
0.16 1 0.16 54.88 80.11 0.00 dayq182sk41ks   insert into wrh$_memory_target…
0.15 252 0.00 54.58 30.21 39.65 7ng34ruy5awxq   select i.obj#, i.ts#, i.file#,…
0.15 1 0.15 54.51 80.65 0.00 bm2pwrpcr8ru6   select sga_size s, sga_size_fa…
0.15 1 0.15 51.42 106.87 0.00 bunssq950snhf   insert into wrh$_sga_target_ad…
0.14 1 0.14 49.59 0.00 93.41 084z6qq3wwkdc   delete from WRH$_SQL_WORKAREA_…
0.14 1 0.14 48.68 0.00 90.79 17g3cqghntwb6   delete from WRH$_PGA_TARGET_AD…
0.13 22 0.01 44.60 0.00 35.38 7kc5p5bw4uz55   update tabpart$ set dataobj# =…
0.12 1 0.12 41.02 26.79 91.44 83hpfc7nhjpvg   delete from WRH$_IOSTAT_FILETY…
0.12 22 0.01 40.72 121.45 16.37 267s83vd49cfa MMON_SLAVE SELECT PARTITION_NAME FROM DBA…
0.11 66 0.00 40.29 13.64 87.09 9fg7fjbwmv7v8   insert into sys.wri$_optstat_t…
0.11 1 0.11 39.34 13.97 65.25 9xcfxn794wp5h   select 1 from WRH$_LATCH_CHILD…
0.11 1 0.11 37.29 0.00 98.77 3hy3nssu9shxj   select /*+ FIRST_ROWS(1) PARAL…
0.10 6 0.02 36.11 0.00 63.61 9wncfacx0nj9h   insert into smon_scn_time (thr…
0.10 2 0.05 35.13 0.00 1.98 934ur8r7tqbjx   SELECT DBID FROM V$DATABASE
0.10 1 0.10 33.93 0.00 74.29 1v44r7vam2wbt   delete from WRH$_IOSTAT_FUNCTI…
0.10 1 0.10 33.68 0.00 96.85 cs78htw2dhb0x   delete from WRH$_IC_DEVICE_STA…
0.09 66 0.00 33.17 0.00 92.58 gc7b0drtzbyc6   select max(intcol#) from hist_…
0.09 89 0.00 33.07 0.00 92.64 8swypbbr0m372   select order#, columns, types …
0.09 1 0.09 32.97 16.66 86.94 8h77nkt8bnukh   delete from WRH$_IOSTAT_DETAIL…
0.09 1 0.09 31.42 17.49 83.52 7gfhbhxrxx8wf   delete /*+ dynamic_sampling(4)…

This section can help determine which queries may be creating performance issues; using the sql_id and text you can retrive the plans to see if a rewrite may improve the situation. It can also show where statistics may not be current, although it would also be a good idea to run a 10053 trace against each suspect query to report what statistics the optimizer is using.

The ASH, or Active Session History, report digs deeper into important wait events and timings, providing a better picture of the wait activity during the requested window; a section of this report is shown below:

SQL ID Planhash Sampled # of Executions % Activity Event % Event Top Row Source % RwSrc SQL Text
572fbaj0fdw2b 1648993636 1 7.69 CPU + Wait for CPU 5.77 SELECT STATEMENT 5.77 select output from table(dbms_…
572fbaj0fdw2b 1648993636 1 7.69 db file sequential read 1.92 ** Row Source Not Available ** 1.92
6ajkhukk78nsr   1 3.85 CPU + Wait for CPU 3.85 ** Row Source Not Available ** 3.85 begin prvt_hdm.auto_execute( :…
1bx8mgs8by25x 2223429142 1 1.92 CPU + Wait for CPU 1.92 HASH JOIN 1.92 select coalesce( p1s.parameter…
2asgk01xtb6p0 3702291220 1 1.92 control file sequential read 1.92 FIXED TABLE – FULL 1.92 select d.dbid dbid , d.name db…
317v5hnvvd49h 1656119837 1 1.92 CPU + Wait for CPU 1.92 UPDATE 1.92 UPDATE wrh$_seg_stat_obj ob S…

Again, use this report to initially determine if the problem is database-related; if that is the case then this report should provide a wealth of data aimed at directing your database/query tuning efforts. If there is no indication of database problems then other areas are likely at fault. Waits to examine from both reports include SQL*Net more data to client, Disk file operations I/O and os thread startup. The first wait in that list an be an indicator of network problems, the second storage issues (array problems, pending disk failures) or CPU issues (excessive time for O/S thread startups can indicate over-taxed CPUs which may be caused by other, non-database processes).

After examining reports generated by AWR, ASH or Statspack you find the database is not the source of the performance issue you may be able to discern which area outside of the database is the likely suspect. SQL*Net more data to client waits, if the time per wait is relatively high, can indicate problems in the network layer. Utilities such as netstat can show configured network interfaces and network traffic data on those interfaces. Example output is shown below:

$ netstat -ginw
Kernel Interface table
Iface       MTU   Met      RX-OK RX-ERR RX-DRP RX-OVR      TX-OK TX-ERR TX-DRP TX-OVR Flg
bondeth0     1500   0 5084620269      0      0      0 3923352780      0      0      0 BMmRU
bondeth0:1   1500   0      - no statistics available -                                BMmRU
bondeth0:3   1500   0      - no statistics available -                                BMmRU
bondeth0:4   1500   0      - no statistics available -                                BMmRU
eth0         1500   0   42182099      0      0      0   21397203      0      0      0 BMRU
eth4         1500   0     305021      0      0      0     281488      0      0      0 BMsRU
eth5         1500   0 5084315248      0      0      0 3923071292      0      0      0 BMsRU
ib0         64000   0 2419856775      0      1      0 2178061942      3    271      0 BMsRU
ib1         64000   0   21305668      0      0      0          0      0      0      0 BMsRU
lo          16436   0 4495752311      0      0      0 4495752311      0      0      0 LRU
$

The utility reports transmit and receive statistics, invluding dropped packets and packet errors. Running netstat -ginw may show which local interface is having issues; send such output to the Network Administrator so he or she can interpret the results and take appropriate action.

What if the problem isn’t the database, and it isn’t the network? The next area that can adversely affect performance is the storage tier. Statspack and ASH/AWR reports also provide disk read and write wait information. Again, if the wait time for an individual read or write wait is high (for example, a single wait over 300 milliseconds) it may be the storage that’s at fault. As disks age their performance can deteriorate as the alignment the platter had when new is compromised by wear and tear around the spindle. Firmware can also be an issue, especially when disks are upgraded and the firmware is not. This is not an area for the DBA to assess, but it is one he or she can report to the Storage Administrator for further investigation.

Network-related issues may also be a problem with either the local network interface card (NIC) or the cabling. This is, again, a task for the Network Administrator to assess and address, but you may be able to ‘point’ him or her in the right direction with output from the ping utility. Provided you have two or more systems that can ‘see’ the same endpoint a ping comparison may show network problems local to the database server. For example if server A can ping server C with response times less than 20 milliseconds and server B, using the same subnet, reports ping times in the 100 millisecond or greater range this could indicate either cabling issues or a NIC that may be misconfigured or failing. This isn’t likely to be valid if the two servers are on different subnets which is why it’s necessary to get the Network Administrator involved as soon as you suspect a network issue. He or she can provide needed configuration data and can better analyze and diagnose such problems.

And yet another area where performance can suffer is at the operating system level. A system starved for CPU, memory or both because of a temporarily high load can severly slow down database processes along with many other non-database-related services. The System Administrator may need to be called in on the issue to report what processes were running during a particular period when response time was slow. Remember that the end users equate performance with response time; a large, resource-intensive report that runs once a month can seriously slow down other database activity during that same period. System backups can also consume resources normally available for user and database processes. Knowing the period of poor performance can greatly assist the System Administrator investigating the issue. If it’s a defined window, for example from 8 PM to Midnight on Tuesdays, the System Administrator can track down the process causing the slowdown by monitoring the system activity during that period. It may be a process that can’t be stopped or rescheduled, but at least the end users will know what is causing their normally fast transactions to run much slower than usual. As mentioned before when you know it’s not a database issue get the Administrators of the suspected systems involved, because the earlier you get more eyes on the problem the faster the cause can be found and a solution put in place.

The DBA is the ‘go-to’ person when the database is to blame for performance issues (long run times for specific queries, as an example). But, when the DBA finds nothing awry in the database, other parties may need to get involved. Again, just because it seems to be the database at fault doesn’t mean that the database IS at fault; it could be other, related areas affecting performance. Knowing what database reports can be of use is the first step in being able to assess a situation so other areas, such as network and storage, can be brought in if need be. Knowing who to involve is just as important as knowing that a performance issue exists. Communication is key in these situations; a network or storage administrator may ask for additional information regarding the problem; this is where the AWR and/or Statspack reports may be helpful. Learn how to generate these reports now (if you don’t already know) so that when you need them you can concentrate on the issues at hand rather than how to generate a report.

Performance is a term that can have a broad interpretation, depending upon who is doing the interpreting. End users may more often report performance issues when a report that ran fine yesterday now takes much longer to complete because all they may see is their process running slowly. Knowing when the problem is the database and when it isn’t is key to getting such issues diagnosed and resolved. Hopefully this article has given you some tools to help in that regard; use them wisely and communicate well so that you can be part of the solution by knowing who else to involve when it isn’t the database that is at fault.

Now, on with the show!

June 6, 2014

“A Ponderous Chain”

Filed under: General — dfitzjarrell @ 11:59

Chained and migrated rows in an Oracle database can seriously impact I/O performance, especially with Exadata. In a ‘normal’ Oracle database [read that as "non-Exadata, non-ASM" database] chained rows, in small numbers where that usually means 1% or less of the total rows in a table, are generally a nuisance. The performance hit they generate in such a situation is small and may not even be noticed by the end users. Add more chained/migrated rows to that mix and the I/O work increase starts becoming noticeable. How do these rows get created, what is the problem they cause and how can you mitigate the issue? Let’s look into what chained/migrated rows are, what can cause them and how to deal with them.

Chained and migrated rows are a unique lot — they exist in two or more data blocks and are linked by the “head” rowid, usually the rowid of the original row. The chained piece or pieces come into being when a row is updated and the data block containing it can’t store the entire row any more. A migrated row is one that is moved entirely to a new block with a ‘pointer’ left behind associated with the original rowid. Let’s try to draw a rather simplified picture to illustrate that concept:

Original row -- Data fits into the available space in the block

       ------------------------------------------------------
000001:|Bonsai Bob|14734|23-NOV-2013|343.27|Duluth|Dept 300 |
       ------------------------------------------------------

Update is made -- Data now takes more space than block has available

       ------------------------------------------------------
000001:|Samurai Sam The Singing Elf|14734|23-NOV-2013|343.27|Duluth|Dept 300 |
       ------------------------------------------------------

Data no longer 'fits' in one block, it's divided between blocks --

       ------------------------------------------------------
000001:|Samurai Sam The Singing Elf|14734|23-NOV-2013|000017|
       ------------------------------------------------------
000017:|343.27|Duluth|Dept 300                              |
       ------------------------------------------------------

Notice there is a ‘pointer’ to the next piece of the row at the end of the original row. This occurs for every ‘split’ this row takes to get the data to ‘fit’ in the table. It’s also possible for a row to be migrated, in its entirety, to an empty block to minimize the trips Oracle has to take to retrieve the data; this occurs when the original row was small and occupied minimal space in a data block. Updates cause it to expand in length and make Oracle relocate the data to another block or blocks with more available space:

Original row -- Data occupies available space

       ------------------------------------------------------
000009:||Bonsai Bob|14734|23-NOV-2013|
       ------------------------------------------------------

Update is made -- Data no longer fits

       ------------------------------------------------------
000009:|Samurai Sam The Singing Elf|14734|23-NOV-2013|343.27|Duluth|Dept 300 |
       ------------------------------------------------------

Data is divided between blocks --

       ------------------------------------------------------
000009:||000019|
       ------------------------------------------------------
000019:|Samurai Sam The Singing Elf|14734|23-NOV-2013|000037|
       ------------------------------------------------------
000037:|343.27|Duluth|Dept 300                              |
       ------------------------------------------------------

The original rowid now has a pointer to the new location of the modified data. That migrated location may also be chained to an additional block because the length of the modified data eventually exceeds the data block size.

What does this mean for performance? It depends, really, on the overall number of rows in the table or schema that are chained or migrated. For a database not on Exadata, or one not using ASM, each chained/migrated row requires at least two fetches to return the data, which at least doubles the work Oracle has to do to process that row. For one, ten or one hundred rows this may be a very small decrease in performance. When 10% or more of the data in a table is chained/migrated I/O processing slows down, and this can happen for tables where the row length can exceed the block size. Without intervention Oracle usually sets the block size to 8196, or 8K, bytes. After taking into consideration any ‘overhead’ introduced by Oracle the maximum available space in an 8K block is rougly 7900 to 8000 bytes. Any table row longer than that will create a chained row even for an empty block. This can be changed at database creation or by creating tablespaces using a larger block size. [Smaller block sizes can also be used but would not benefit any process to reduce row chaining.] The largest possible block size supported by Oracle is 32K so if a table row exceeds that length when fully populated chained rows are guaranteed to exist. The more ‘passes’ Oracle must make to fetch a row the longer the I/O waits can be resulting in dimished performance so it’s best to minimize their occurrence.

Exadata and ASM present a different scenario. Since a diskgroup is striped across all available disks associated with that group the pieces of a chained row can exist on different disks, and, for Exadata, different storage cells. For a ‘regular’ database using ASM the performance can be slower because not only is Oracle needing to visit different blocks to return data it may also have to visit different disks as well, introducing additional latency. On top of that for an Exadata database chained rows, if not on the same disk or accessed by the same storage cell, causes Oracle to revert to regular block I/O, eliminating any chance of using a Smart Scan and any of the optimizations a Smart Scan would provide.

The question to answer is “How do I eliminate, or at least minimize, chained and migrated rows?” The first, and most effective, way is to design tables with the database block size in mind, taking care to not exceed the available space in the block when a row is fully populated and all variable-length columns contain data sized at the maximum defined length. It is possible to increase that available space somewhat, by setting two parameters, PCTFREE and PCTUSED, to tell Oracle how to allocate the datablock space. PCTUSED defines how much of the block can be used for data and governs when a block is taken off of the ‘available’ list. PCTFREE reserves the declared percentage for updates to existing data in the block. If rows are being constantly updated, such as in a financial system, it might be best to set PCTUSED to a smaller value than the default (which is 60) and PCTFREE to 100-PCTUSED, to ensure more space is available for updates to existing data. I have seen at least one system where PCTUSED was set to 1 and PCTFREE set to 99, allowing 1 row per block with plenty of room for updates. In that system the existence of chained rows dropped significantly. Please note that making such a change to an existing table using ‘alter table … ‘ won’t affect the currently populated data blocks. To do that in Oracle releases 9.0.1.x and later you’ll need to perform a move operation, to rebuild the table with the new PCTUSED and PCTFREE settings:


SQL> alter table empdrop move pctfree 99 pctused 1;

Table altered.

SQL>

Unfortunately such a move can increase the table size considerably since the rows per block has now been reduced to a much smaller value. [If you're still in the 'dark ages', using Oracle 8.1 or earlier, you'll need to export the table, pre-create it with a script modified to set the PCTUSED and PCTFREE 'properly' then import the data ignoring the table create error.] This is not a task to be taken lightly, nor one to use ‘canned’ scripts and ‘best practice’ values as each implementation is different. Simply because one table benefitted from such drastic settings for PCTUSED and PCTFREE doesn’t mean EVERY table will need such a dramatic change. Base your changes on the incidence of chained rows and the maximum row length as compared to the database block size.

“How do I know if I have chained rows?”, you ask. Oracle keeps a system statistic, ‘table fetch continued row’, that increments each time a continued/migrated row is fetched. It doesn’t tell you WHERE those rows are, however the ANALYZE TABLE command is still available to provide that information. It can take a while to run on a large table but by default it will populate a table named, brilliantly enough, CHAINED_ROWS, designed to contain all of the information you could need to identify which rows are chained in a table. There is no way to gather this information at the schema level (DBMS_STATS doesn’t have any procedure or function to gather that data) but you can create a script to do the deed. There is some preparation involved, namely creating the CHAINED_ROWS table, but Oracle has conveniently provided a script for that named utlchain.sql, located in $ORACLE_HOME/rdbms/admin:


rem
rem $Header: utlchain.sql 07-may-96.19:40:01 sbasu Exp $
rem
Rem Copyright (c) 1990, 1995, 1996, 1998 by Oracle Corporation
Rem NAME
REM    UTLCHAIN.SQL
Rem  FUNCTION
Rem    Creates the default table for storing the output of the
Rem    analyze list chained rows command
Rem  NOTES
Rem  MODIFIED
Rem     syeung     06/17/98  - add subpartition_name
Rem     mmonajje   05/21/96 -  Replace timestamp col name with analyze_timestam
Rem     sbasu      05/07/96 -  Remove echo setting
Rem     ssamu      08/14/95 -  merge PTI with Objects
Rem     ssamu      07/24/95 -  add field for partition name
Rem     glumpkin   10/19/92 -  Renamed from CHAINROW.SQL
Rem     ggatlin    03/09/92 -  add set echo on
Rem     rlim       04/29/91 -         change char to varchar2
Rem   Klein      01/10/91 - add owner name for chained rows
Rem   Klein      12/04/90 - Creation
Rem

create table CHAINED_ROWS (
  owner_name         varchar2(30),
  table_name         varchar2(30),
  cluster_name       varchar2(30),
  partition_name     varchar2(30),
  subpartition_name  varchar2(30),
  head_rowid         rowid,
  analyze_timestamp  date
);

Oracle also provides another version of that table in the $ORACLE_HOME/rdbms/admin/utlchn1.sql script, that uses the UROWID datatype which supports both Oracle and non-Oracle database rowid values:


Rem
Rem $Header: utlchn1.sql 24-jun-99.07:57:57 echong Exp $
Rem
Rem utlchn1.sql
Rem
Rem  Copyright (c) Oracle Corporation 1998, 1999. All Rights Reserved.
Rem
Rem    NAME
Rem      utlchn1.sql - 
Rem
Rem    DESCRIPTION
Rem      
Rem
Rem    NOTES
Rem      
Rem
Rem    MODIFIED   (MM/DD/YY)
Rem    echong      06/24/99 - rename
Rem    syeung      06/22/98 - add subpartition_name
Rem    echong      06/05/98 - chained rows table with urowid type
Rem    echong      06/05/98 - Created
Rem

create table CHAINED_ROWS (
  owner_name         varchar2(30),
  table_name         varchar2(30),
  cluster_name       varchar2(30),
  partition_name     varchar2(30),
  subpartition_name  varchar2(30),
  head_rowid         urowid,
  analyze_timestamp  date
);

Generally speaking you probably won’t need the table using the UROWID data type but either table can be used to report chained rows in Oracle databases.

Once the CHAINED_ROWS table is created it’s a simple task to generate a script to analyze all of the tables in a given schema for chained rows:


select 'analyze table '||owner||'.'||table_name||' list chained rows;'
from dba_tables
where owner = upper('&1');

Spool the output from that query to a file then execute it from the SQL> prompt. Remember that by default the command inserts data into the CHAINED ROWS table. You can, if you’re feeling adventurous, create a similar table with a different name, and have ANALYZE TABLE insert the chained row data into your table by telling Oracle where you want the data to go. For example if you decide to create a table named BAD_JUJU with the same structure as the CHAINED_ROWS table you can do this:


select 'analyze table '||owner||'.'||table_name||' list chained rows into bad_juju;'
from dba_tables
where owner = upper('&1');

You can then query your BAD_JUJU table for chained row information.

If the ‘table fetch continued row’ statistic reports 0 then you have nothing to worry about and nothing to ‘fix':


SQL> select value from v$sysstat where name = 'table fetch continued row';

     VALUE
----------
         0

SQL>

In reality that isn’t likely to happen as some of the SYS-owned tables can, and do, contain chained rows. This is the result from a small, personal database I have on my laptop:


SQL> select value from v$sysstat where name = 'table fetch continued row';

     VALUE
----------
       778

SQL>

Let’s see where those chained rows are:


SQL> select owner_name, table_name, count(*) chain_ct
  2  from bad_juju
  3  group by owner_name, table_name;

OWNER_NAME                     TABLE_NAME                       CHAIN_CT
------------------------------ ------------------------------ ----------
SYS                            SMON_SCN_TIME                           1
SYS                            TABPART$                                1
SYS                            WRI$_DBU_FEATURE_USAGE                  5
APEX_030200                    WWV_FLOW_STEPS                         10
SYS                            IDL_CHAR$                             165
SYS                            METASTYLESHEET                        113
APEX_030200                    WWV_FLOW_TEMPLATES                     18
MDSYS                          SDO_DATUMS                              2
SYS                            COLLECTION$                             2
SYS                            IDL_UB1$                             6683
APEX_030200                    WWV_FLOW_STEP_PROCESSING               57
MDSYS                          SDO_ELLIPSOIDS                          8
SYS                            HIST_HEAD$                            265
SYS                            USER$                                   1
MDSYS                          SDO_COORD_OPS                          45
SYS                            IDL_SB4$                               31
SYS                            IND$                                    4
SYS                            METHOD$                                 3
SYS                            RESULT$                                 3
SYS                            TRIGGER$                                1
APEX_030200                    WWV_FLOW_PAGE_PLUG_TEMPLATES            1
SYS                            PARAMETER$                             13
SYS                            TAB$                                    2
SYS                            VIEW$                                  67
APEX_030200                    WWV_FLOW_PAGE_PLUGS                    36
MDSYS                          SDO_COORD_OP_PARAM_VALS               116
MDSYS                          SDO_COORD_OP_PATHS                     53
SYS                            ATTRIBUTE$                             29
SYS                            IDL_UB2$                              837
SYS                            JAVA$MC$                              124

30 rows selected.

SQL>

Notice that no user tables or vendor-supplied application schemas have chained rows, only Oracle-supplied tables, so do not be confused if you analyze the tables in a given schema and find there are no chained rows to address. Is it a good idea to change PCTFREE and PCTUSED on those Oracle-supplied tables? It’s your decision to make; if recursive sql calls to these tables are consuming large amounts of time it may be beneficial to change those values. I really suspect it’s not that big of a performance hit and you’d be better off addressing chained rows, should they exist, in any application schemas present in your database.

With Exadata chained rows do more than simply increase the I/O workload, they can shift it from the storage cells to the database server by reverting to regular block I/O rather than using a Smart Scan. Since Smart Scans can’t process chained rows that span disks and/or storage cells this eliminates any use of Storage Indexes, Predicate Filtering and Column Projection. There may be chained rows that can be processed by a Smart Scan (where the row is chained on the same disk accessed by the same storage cell) but those are very likely rare occurrences. Exadata provides three statistics regarding chained rows and their processing — ‘chained rows processed by cell’, ‘chained rows rejected by cell’ and ‘chained rows skipped by cell’. These can report the number of rows in each ‘category’ from the time the system was started. Each statistic, and what it reports, is listed below:


chained rows processed by cell

This statistic, chained rows processed by cell, reports on the chained rows that were processed within a storage cell, that is, where the row pieces (or at least the ones Oracle needed to
return results) are located within the storage accessed by a given storage cell.  This is known as inter-block chaining, where the row piece is located in the same data block as the head 
row.  This is a special case of row chaining for rows with more than 255 columns; the next piece is in the same block as the head and can be fetched with no additional effort.

chained rows rejected by cell

This counter records the chained rows where the next piece is not in the same block or cell, as described in the previous section.  This statistic is incremented in, apparently,
special cases as it is not incremented very often in the systems I have dealt with.  When this is incremented the Smart Scan fell back to regular block I/O to process the row.

chained rows skipped by cell

This is the statistic most often incremented when a Smart Scan needs to revert back to regular block I/O.  Again it's incremented when a chained row is found and the remaining pieces reside
across the entire storage stripe, spanning storage cells in the process.  It isn't clear when the previous counter or this counter should be incremented in such situations; I have found that
this counter is the one most often incremented when Smart Scans revert to regular block I/O to process chained rows.

The same methods for non-Exadata databases can be applied to those on Exadata; since ASM stripes the diskgroup across all available disks it may not be possible to correct every chained row that exists. That does not mean that the chained/migrated row situation should not be addressed, but remember that the storage volume on Exadata systems is often times far larger than on conventional databases using commodity hardware and that fact can make any process intended to reduce or eliminate chained rows take longer. If the performance hit from the chained rows is large enough any effort to reduce that I/O workload is worth the time and the trouble it may take to implement.

Chained and migrated rows can be a bother, and can introduce performance issues when dealing with large volumes of them. Sometimes it’s not possible to eliminate them because the row length exceeds even the largest available database block size. But there are ways to mitigate their creation for tables where the row length easily fits within the database block, either by altering how the block space is allocated using PCTUSED and PCTFREE, moving the table or tables to a tablespace having a larger block size or by moving the table, in place, to relocate the rows and reduce, or eliminate, the existing chaining. This isn’t a quick operation, and it certainly should not be done online especially in production databases, but the problem, in many cases, can be successfully addressed.

It’s a situation worth pondering.

Next Page »

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 706 other followers