Oracle Tips and Tricks — David Fitzjarrell

November 14, 2014

In Deference

Filed under: General — dfitzjarrell @ 08:08

An interesting ‘problem’ surfaced a while ago: a user with zero quota on every tablespace could successfully create tables. Of course, once it was time to insert data the inserts failed, which confused the user who had created the tables. The ‘problem’ stems from enabling deferred segment creation in the database. Let’s see how that can create a confusing situation.

Deferred segment creation allows tables and indexes to be created without physical segments, which can be a double-edged sword: objects can be created even if there is no space available for them in the specified tablespace. Just as with the user who has no quota on the desired tablespace, a false sense of security can result when the database reports ‘Table created’ or ‘Index created’. The data dictionary contains the object information, so a query of DBA_TABLES, DBA_INDEXES or any of the %TAB% and %IND% tables and views will return the expected information. Querying DBA_SEGMENTS, however, may fail to return data since an object may not have any segments associated with it:


SQL> --
SQL> -- Prove the table exists
SQL> --
SQL> select table_name, tablespace_name
  2  from user_tables;

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
KRONK                          USERS

SQL>
SQL> --
SQL> -- Now prove it doesn't exist
SQL> --
SQL> select segment_name, tablespace_name
  2  from user_segments;

no rows selected

SQL>
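
In 11.2 and later releases the data dictionary also flags this condition directly: USER_TABLES (and DBA_TABLES) include a SEGMENT_CREATED column, so a quick check for segmentless tables can be as simple as:


select table_name, tablespace_name, segment_created
from user_tables
where segment_created = 'NO';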

Watching a session create a table owned by a user who has zero quota on every tablespace reveals this possibly confusing behavior:


SQL> 
SQL> --
SQL> -- See if deferred segment creation is enabled
SQL> --
SQL> show parameter defer

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation            boolean     TRUE
SQL> 
SQL> --
SQL> -- We see that it is
SQL> --
SQL> -- Show quotas for user BLORPOFORST
SQL> --
SQL> @get_user_quotas.sql blorpoforst
SQL> with ttlbytes as (
  2  	select tablespace_name, sum(bytes) ttl_bytes, sum(blocks) ttl_blocks
  3  	from dba_data_files
  4  	group by tablespace_name
  5  ),
  6  userquotas as(
  7  	select
  8  	TABLESPACE_NAME,
  9  	USERNAME,
 10  	bytes,
 11  	nvl(max_bytes,0) max_bytes,
 12  	nvl(blocks,0) blocks,
 13  	nvl(max_blocks,0) MAX_BLOCKS
 14  	from dba_ts_quotas
 15  	where username = upper('&&1')
 16  )
 17  select
 18  s.tablespace_name,
 19  nvl(q.username, upper('&&1')) username,
 20  nvl(BYTES,0) bytes,
 21  case when nvl(q.MAX_BYTES,0) = -1 then s.ttl_bytes else nvl(q.max_bytes,0) end max_bytes,
 22  nvl(blocks, 0) blocks,
 23  case when nvl(q.MAX_BLOCKS,0) = -1 then s.ttl_blocks else nvl(q.max_blocks,0) end max_blocks,
 24  case when nvl(q.bytes,0) > 0 and abs(nvl(q.max_bytes,0) - nvl(q.bytes,0)) < 102400 then 'ALERT'
 25  	  when nvl(q.bytes,0) > 0 and abs(nvl(q.max_bytes,0) - nvl(q.bytes,0)) between 102400 and 10240000 then 'WARNING'
 26  	  else 'OK' end status
 27  from userquotas q full outer join ttlbytes s
 28  	   on (q.tablespace_name = s.tablespace_name)
 29  order by q.username, s.tablespace_name;
old  15:    where username = upper('&&1')
new  15:    where username = upper('blorpoforst')
old  19: nvl(q.username, upper('&&1')) username,
new  19: nvl(q.username, upper('blorpoforst')) username,

TABLESPACE_NAME                USERNAME                            BYTES  MAX_BYTES     BLOCKS MAX_BLOCKS STATUS
------------------------------ ------------------------------ ---------- ---------- ---------- ---------- -------
DFLT                           BLORPOFORST                             0          0          0          0 OK
INDX                           BLORPOFORST                             0          0          0          0 OK
SYSAUX                         BLORPOFORST                             0          0          0          0 OK
SYSTEM                         BLORPOFORST                             0          0          0          0 OK
UNDOTBS1                       BLORPOFORST                             0          0          0          0 OK
USERS                          BLORPOFORST                             0          0          0          0 OK

6 rows selected.

SQL> 
SQL> 
SQL> --
SQL> -- Absolutely no quotas on any tablespace
SQL> --
SQL> -- Try to create a table, which should fail
SQL> --
SQL> -- Succeeds due to deferred segment creation
SQL> --
SQL> create table kronk(ipso number, egrud varchar2(40))
  2  tablespace users;

Table created.

SQL> 
SQL> --
SQL> -- This will be confusing to users creating tables
SQL> --
SQL> -- 'Table' creates, but no data can be added
SQL> --
SQL> insert into kronk(ipso, egrud)
  2  values (1, 'Insert fails!!!!!');
insert into kronk(ipso, egrud)
            *
ERROR at line 1:
ORA-01950: no privileges on tablespace 'USERS'


SQL> 
SQL> --
SQL> -- Deferred segment creation allows the 'object' to
SQL> -- exist in the data dictionary even though the quota
SQL> -- prohibits any actual segments being created
SQL> --
SQL> -- Try this again, with an additional parameter
SQL> -- to the create table statement
SQL> --
SQL> drop table kronk purge;

Table dropped.

SQL> 
SQL> create table kronk(ipso number, egrud varchar2(40))
  2  segment creation immediate
  3  tablespace users;
create table kronk(ipso number, egrud varchar2(40))
*
ERROR at line 1:
ORA-01950: no privileges on tablespace 'USERS'


SQL> 
SQL> 
SQL> --
SQL> -- Now the table create fails, as expected
SQL> -- because segment creation was forced
SQL> --
SQL> -- Another way to do this is to disable
SQL> -- deferred segment creation at the session
SQL> -- level
SQL> --
SQL> -- This can also be done at the system level
SQL> -- without restarting the database
SQL> --
SQL> alter session set deferred_segment_creation=false;

Session altered.

SQL> 
SQL> --
SQL> -- Now the original create table statement will fail
SQL> --
SQL> create table kronk(ipso number, egrud varchar2(40))
  2  tablespace users;
create table kronk(ipso number, egrud varchar2(40))
*
ERROR at line 1:
ORA-01950: no privileges on tablespace 'USERS'


SQL> 
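
The session-level change above lasts only for the current session; the parameter is also dynamically modifiable database-wide, so a sketch of the system-level equivalent (no restart required) would be:


alter system set deferred_segment_creation=false scope=both;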

Besides confusing any users creating tables, deferred segment creation can also take a DBA by surprise if he or she, following habits formed in earlier releases of Oracle, checks for object existence by querying the DBA_SEGMENTS view:


SQL> show parameter defer

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation            boolean     TRUE
SQL>
SQL> create table yazzoo(plunk number, geepo varchar2(40), alpinga date)
  2  tablespace users;

Table created.

SQL>
SQL> select table_name, tablespace_name
  2  from user_tables;

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
YAZZOO                         USERS
DUMMY                          USERS
SALGRADE                       USERS
BONUS                          USERS
DEPT                           USERS
EMP                            USERS

6 rows selected.

SQL>
SQL> select segment_name, tablespace_name
  2  from user_segments;

SEGMENT_NAME                                                                      TABLESPACE_NAME
--------------------------------------------------------------------------------- ------------------------------
DEPT                                                                              USERS
DUMMY                                                                             USERS
EMP                                                                               USERS
SALGRADE                                                                          USERS

SQL>

Notice that two tables, the one just created (YAZZOO) and one of the demonstration tables (BONUS), have no segments even though they exist in USER_TABLES. I know of DBAs who use DBA_SEGMENTS to check objects, and with deferred segment creation that may provide a list that doesn’t include every table a user has created. With deferred segment creation enabled, older scripts that relied on DBA_SEGMENTS to report on all objects need to be modified to use DBA_TABLES and DBA_INDEXES so that all tables and indexes are returned regardless of whether segments have been created; the older scripts can also be modified to show tables and indexes which have no segments, as these examples using USER_TABLES, USER_INDEXES and USER_SEGMENTS illustrate:


SQL> select table_name, tablespace_name, nvl(ttlbytes,0) ttlbytes, nvl(ttlext,0) ttlext
  2  from
  3  (select t.table_name, t.tablespace_name, sum(s.bytes) ttlbytes, sum(s.extents) ttlext
  4  from user_tables t left outer join user_segments s on (s.segment_name = t.table_name)
  5  group by t.table_name, t.tablespace_name);

TABLE_NAME                     TABLESPACE_NAME                  TTLBYTES     TTLEXT
------------------------------ ------------------------------ ---------- ----------
DUMMY                          USERS                               65536          1
EMP                            USERS                               65536          1
DEPT                           USERS                               65536          1
BONUS                          USERS                                   0          0
YAZZOO                         USERS                                   0          0
SALGRADE                       USERS                               65536          1

6 rows selected.

SQL>
SQL> select index_name, tablespace_name, nvl(ttlbytes,0) ttlbytes, nvl(ttlext,0) ttlext
  2  from
  3  (select t.index_name, t.tablespace_name, sum(s.bytes) ttlbytes, sum(s.extents) ttlext
  4  from user_indexes t left outer join user_segments s on (s.segment_name = t.index_name)
  5  group by t.index_name, t.tablespace_name);

INDEX_NAME                     TABLESPACE_NAME                  TTLBYTES     TTLEXT
------------------------------ ------------------------------ ---------- ----------
SALGRD_IDX                     INDX                                65536          1
YAZZOO_IDX                     INDX                                    0          0
DEPT_IDX                       INDX                                65536          1
EMP_IDX                        INDX                                65536          1
BONUS_IDX                      INDX                                    0          0

SQL>
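
If segments need to be materialized after the fact (so that, for example, older DBA_SEGMENTS-based scripts report the table), releases 11.2.0.2 and later provide DBMS_SPACE_ADMIN.MATERIALIZE_DEFERRED_SEGMENTS. A hedged sketch, assuming the demonstration schema here is BING, and remembering that the owner still needs quota on the tablespace for the call to succeed:


exec dbms_space_admin.materialize_deferred_segments('BING', 'YAZZOO')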

Deferred segment creation can be very helpful when creating objects that may not be used immediately; it can also create problems when those objects are created by a user account with no quota on a tablespace, or when the available space in a tablespace would prevent a segment from being created. The problem won’t become apparent until the first insert fails because of a missing quota or insufficient space. The DBA should get used to checking DBA_TABLES, DBA_INDEXES and DBA_SEGMENTS to verify that an object can contain data, and users who create tables should run a test insert after the table is created to prevent any surprises later. That same test insert also covers indexes; if an index is the ‘offending’ object the test insert will reveal it:


SQL> insert into kronk(ipso, egrud)
  2  values (1, 'Insert fails!!!!!');
insert into kronk(ipso, egrud)
            *
ERROR at line 1:
ORA-01950: no privileges on tablespace 'INDX'


SQL>

Deferred segment creation is, for the most part, a good thing as it allows tables and indexes to be created even when space isn’t immediately available. (Of course it’s not a good idea to create tables and indexes when it’s known there is insufficient space, but in the course of table/index maintenance space may be freed by dropping old versions of a table or by archiving and purging old data no longer needed for daily operation.) A good example for deferred segment creation is migrating a table to a new definition; deferring segment creation allows the new table to grow as the data is inserted rather than allocating a large initial extent, some of which may remain unused. Another benefit is that the new table may initially be smaller than the original even though the number of rows has not decreased. Deferred segment creation can also be bypassed, either by setting the deferred_segment_creation parameter to FALSE or by explicitly coding SEGMENT CREATION IMMEDIATE in the create table/create index statement, as shown in the examples above.
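
The clause works in the other direction as well: even with deferred_segment_creation set to FALSE, an individual table can request deferral explicitly. A minimal sketch, reusing the example table:


create table kronk(ipso number, egrud varchar2(40))
segment creation deferred
tablespace users;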

How to manage segment creation is a decision that should not be made without serious discussion between the development teams and the DBAs. Development can generate a lot of create table activity in a database, creating various incarnations of a table before deciding on the final structure (a task usually left to the architects as an exercise on paper before any code is written and executed). It’s also possible that the development team isn’t aware of deferred segment creation; this can result in the ‘surprise’ reported at the beginning of this article when the first attempt to use the created table is made. It may be better to see the error when the create statement is executed than later, when data cannot be inserted because segments cannot be created.

Should you use deferred segment creation? That choice is one both the DBA team and development team need to make. Look at all aspects of the argument then decide accordingly. Remember that the ‘right’ answer is the one that’s suited to your environment. One size does not fit all. Choose wisely for the best fit for your environment.

October 22, 2014

Memory Improvement

Filed under: General — dfitzjarrell @ 09:37

Oracle 12c (version 12.1.0.2) offers the option of using in-memory processing to speed things along. Called the In-memory option, it’s installed when you install the 12.1.0.2 software. Using it can make a considerable difference in processing speed, provided you have sufficient resources (RAM) available. Let’s revisit an older example, on Bloom filters, and see if Oracle processes things any faster in-memory.

Looking again at the Bloom filter example using 11.2.0.3 on a Windows installation, it’s evident that the Bloom filter provides an increased level of performance (only the abbreviated output section is reproduced here):


SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL                                                              
---------- ---------------------------------------- ------------------------- ----------                                                              
   1670281 Fnarm1670281                             PROCUREMENT                     5000                                                              
   1670289 Fnarm1670289                             PROCUREMENT                     6000                                                              
   1670297 Fnarm1670297                             PROCUREMENT                     7000                                                              
   1670305 Fnarm1670305                             PROCUREMENT                     1000                                                              
   1670313 Fnarm1670313                             PROCUREMENT                     2000                                                              
   1670321 Fnarm1670321                             PROCUREMENT                     3000                                                              
   1670329 Fnarm1670329                             PROCUREMENT                     4000                                                              
   1670337 Fnarm1670337                             PROCUREMENT                     5000                                                              
   1670345 Fnarm1670345                             PROCUREMENT                     6000                                                              
   1670353 Fnarm1670353                             PROCUREMENT                     7000                                                              
   1670361 Fnarm1670361                             PROCUREMENT                     1000                                                              
...
   1857369 Fnarm1857369                             PROCUREMENT                     4000                                                              
   1857377 Fnarm1857377                             PROCUREMENT                     5000                                                              
   1857385 Fnarm1857385                             PROCUREMENT                     6000                                                              

250000 rows selected.

Elapsed: 00:00:54.86

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2643012915                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------                    
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |                    
----------------------------------------------------------------------------------------------------------------------------------                    
|   0 | SELECT STATEMENT                     |              |   273K|    26M|  1378   (1)| 00:00:01 |        |      |            |                    
|   1 |  PX COORDINATOR                      |              |       |       |            |          |        |      |            |                    
|   2 |   PX SEND QC (RANDOM)                | :TQ10002     |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |                    
|*  3 |    HASH JOIN                         |              |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   4 |     PX RECEIVE                       |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   5 |      PX SEND BROADCAST               | :TQ10001     |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | P->P | BROADCAST  |                    
|   6 |       NESTED LOOPS                   |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|   7 |        BUFFER SORT                   |              |       |       |            |          |  Q1,01 | PCWC |            |                    
|   8 |         PX RECEIVE                   |              |       |       |            |          |  Q1,01 | PCWP |            |                    
|   9 |          PX SEND BROADCAST           | :TQ10000     |       |       |            |          |        | S->P | BROADCAST  |                    
|  10 |           TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |     1 |    27 |     1   (0)| 00:00:01 |        |      |            |                    
|* 11 |            INDEX UNIQUE SCAN         | DEPT_INFO_PK |     1 |       |     1   (0)| 00:00:01 |        |      |            |                    
|  12 |        PX BLOCK ITERATOR             |              |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWC |            |                    
|* 13 |         TABLE ACCESS FULL            | EMP_DEPT     |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|  14 |     PX BLOCK ITERATOR                |              |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWC |            |                    
|* 15 |      TABLE ACCESS FULL               | EMP          |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
----------------------------------------------------------------------------------------------------------------------------------                    
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   3 - access("ED"."EMPID"="E"."EMPID")                                                                                                               
  11 - access("D"."DEPTNUM"=20)                                                                                                                       
  13 - filter("ED"."EMPDEPT"=20)                                                                                                                      
   15 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))
                                                                                                                                                      
Note                                                                                                                                                  
-----                                                                                                                                                 
   - dynamic sampling used for this statement (level=2)                                                                                               


Statistics
----------------------------------------------------------                                                                                            
         33  recursive calls                                                                                                                          
        139  db block gets                                                                                                                            
      36224  consistent gets                                                                                                                          
      17657  physical reads                                                                                                                           
          0  redo size                                                                                                                                
    9526012  bytes sent via SQL*Net to client                                                                                                         
     183846  bytes received via SQL*Net from client                                                                                                   
      16668  SQL*Net roundtrips to/from client                                                                                                        
          5  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
     250000  rows processed                                                                                                                           

SQL>

Now let’s get Oracle’s In-memory option working. By default it’s disabled; the initialization parameter inmemory_size is set to 0. To get In-memory configured and working three steps need to be performed (a sketch of the commands follows the list):


1. Set inmemory_size to a non-zero value; in the database used for this example it's been set to 576M.  As this is a parameter that cannot be modified directly
   it will need to be changed using scope=spfile.
2. If desired increase the sga_max_size by the value that inmemory_size has been set to.  This is not a requirement but remember that any memory allocated
   to the in-memory database will be taken from the SGA allocation.
3. Restart the database to effect the changes.
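
A minimal sketch of those steps as SQL*Plus commands, using the 576M figure from this example (the sga_max_size value is a placeholder; substitute your current SGA size plus 576M):


alter system set inmemory_size = 576M scope=spfile;
alter system set sga_max_size = 4672M scope=spfile;   -- placeholder: existing sga_max_size + 576M
shutdown immediate
startup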

One caveat on this is calculating how much memory will be needed to provide a workable In-memory configuration; it’s similar to configuring the KEEP pool (a task from earlier releases) in that you should allocate sufficient memory to contain all of the tables that will be used, to prevent delays in loading the data into the IM column store. Now you have the In-memory option configured and ready to use. Alter a table to use inmemory and you will start seeing the possible benefits of this configuration. By default the following in-memory options are set:


  PRIORITY NONE -- Oracle controls when table data is populated in the IM column store.  Population may be delayed if memory is needed for other processes.
                   Other values are LOW, MEDIUM, HIGH and CRITICAL.  As expected a higher PRIORITY loads before any lower priority.
  MEMCOMPRESS FOR QUERY LOW -- This option provides the best query performance and the least compression.
  DISTRIBUTE AUTO -- Used for RAC this controls how the data is distributed between nodes.  BY ROWID RANGE, BY PARTITION and BY SUBPARTITION are other options.
  NO DUPLICATE -- Another RAC parameter, this controls how many nodes will have duplicated data.  DUPLICATE involves one additional node for a total of two (so in a two-node
                  RAC configuration DUPLICATE and DUPLICATE ALL provide the same duplication); DUPLICATE ALL duplicates the table data across all available RAC nodes.
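
Any of these defaults can be overridden in the INMEMORY clause when a table is created or altered; a brief sketch using the EMP table from the example that follows:


alter table emp inmemory memcompress for query low priority high;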

Re-writing the Bloom filter example to take advantage of this new feature allows us to see how it will perform:


SQL>
SQL> --
SQL> -- Create sample tables
SQL> --
SQL> create table emp(
  2  	     empid   number,
  3  	     empnm   varchar2(40),
  4  	     empsal  number,
  5  	     empssn  varchar2(12),
  6  	     constraint emp_pk primary key (empid)
  7  ) inmemory;

Table created.

SQL>
SQL> create table emp_dept(
  2  	     empid   number,
  3  	     empdept number,
  4  	     emploc  varchar2(60),
  5  	     constraint emp_dept_pk primary key(empid)
  6  ) inmemory;

Table created.

SQL>
SQL> create table dept_info(
  2  	     deptnum number,
  3  	     deptnm  varchar2(25),
  4  	     constraint dept_info_pk primary key(deptnum)
  5  ) inmemory;

Table created.

SQL>
SQL> --
SQL> -- Load sample tables with data
SQL> --
SQL> begin
  2  	     for i in 1..2000000 loop
  3  		     insert into emp
  4  		     values(i, 'Fnarm'||i, (mod(i, 7)+1)*1000, mod(i,10)||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||mod(i,10)||mod(i,10));
  5  		     insert into emp_dept
  6  		     values(i, (mod(i,8)+1)*10, 'Zanzwalla'||(mod(i,8)+1)*10);
  7  		     commit;
  8  	     end loop;
  9  	     insert into dept_info
 10  	     select distinct empdept, case when empdept = 10 then 'SALES'
 11  					   when empdept = 20 then 'PROCUREMENT'
 12  					   when empdept = 30 then 'HR'
 13  					   when empdept = 40 then 'RESEARCH'
 14  					   when empdept = 50 then 'DEVELOPMENT'
 15  					   when empdept = 60 then 'EMPLOYEE RELATIONS'
 16  					   when empdept = 70 then 'FACILITIES'
 17  					   when empdept = 80 then 'FINANCE' end
 18  	     from emp_dept;
 19
 20  end;
 21  /

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Gather statistics
SQL> --
SQL> exec dbms_stats.gather_schema_stats('BING')

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Run join query using bloom filter
SQL> --

SQL> set autotrace on
SQL> set timing on
SQL>
SQL> select /*+ bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL
---------- ---------------------------------------- ------------------------- ----------
   1548633 Fnarm1548633                             PROCUREMENT                     3000
   1548641 Fnarm1548641                             PROCUREMENT                     4000
   1548649 Fnarm1548649                             PROCUREMENT                     5000
 ...
    897969 Fnarm897969                              PROCUREMENT                     3000
    897977 Fnarm897977                              PROCUREMENT                     4000
    897985 Fnarm897985                              PROCUREMENT                     5000
    897993 Fnarm897993                              PROCUREMENT                     6000
    898001 Fnarm898001                              PROCUREMENT                     7000

250000 rows selected.

Elapsed: 00:00:25.67

Execution Plan
----------------------------------------------------------
Plan hash value: 2613779428

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |   250K|    10M|       |  3863   (2)| 00:00:01 |
|*  1 |  HASH JOIN                    |           |   250K|    10M|  8304K|  3863   (2)| 00:00:01 |
|   2 |   JOIN FILTER CREATE          | :BF0000   |   250K|    10M|       |  3863   (2)| 00:00:01 |
|*  3 |    HASH JOIN                  |           |   250K|  5371K|       |    94  (23)| 00:00:01 |
|*  4 |     TABLE ACCESS INMEMORY FULL| DEPT_INFO |     1 |    13 |       |     1   (0)| 00:00:01 |
|*  5 |     TABLE ACCESS INMEMORY FULL| EMP_DEPT  |   250K|  2197K|       |    92  (22)| 00:00:01 |
|   6 |   JOIN FILTER USE             | :BF0000   |  2000K|    41M|       |   137  (17)| 00:00:01 |
|*  7 |    TABLE ACCESS INMEMORY FULL | EMP       |  2000K|    41M|       |   137  (17)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ED"."EMPID"="E"."EMPID")
   3 - access("ED"."EMPDEPT"="D"."DEPTNUM")
   4 - inmemory("D"."DEPTNUM"=20)
       filter("D"."DEPTNUM"=20)
   5 - inmemory("ED"."EMPDEPT"=20)
       filter("ED"."EMPDEPT"=20)
   7 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))

Note
-----
   - this is an adaptive plan


Statistics
----------------------------------------------------------
         14  recursive calls
          0  db block gets
         26  consistent gets
          0  physical reads
          0  redo size
    9659722  bytes sent via SQL*Net to client
     183877  bytes received via SQL*Net from client
      16668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     250000  rows processed

SQL>

Notice that the Bloom filter is still used, and in conjunction with the In-memory option the execution time was considerably shorter than in the original example on the 11.2.0.3 Windows server. Notice also that, since the tables were populated in the IM column store, the physical reads and memory sorts both dropped to 0 as all processing was done in memory. If you have the resources available, using the In-memory option in Oracle 12.1.0.2 might improve performance significantly.

Bloom filters plus the In-memory database option with Oracle 12.1.0.2 can improve query performance dramatically, and that can make for very happy end users. It’s a bit of a chore to get In-memory configured (the hardest part being the available memory calculations) but it appears to be worth every bit of time spent to get it working.

We could all use a little memory improvement now and then.

October 7, 2014

To Skip, Or Not To Skip

Filed under: General — dfitzjarrell @ 14:02

An interesting issue presented itself just recently with a logical standby database I manage. The database is used for generating reports and the client wanted to skip all DML activity for a given schema as it wasn’t necessary for reporting purposes. I had done this in version 10.2.0.x; it was a simple procedure on a low-traffic database:


alter database stop logical standby apply;
exec dbms_logstdby.skip('DML','','%')
alter database start logical standby apply;

Pretty simple stuff, really. In the 10.2.0.x standby configuration I had been lucky that no DDL changes for the specified schema had occurred, so missing a step wasn’t obvious as it didn’t impact the apply process. For a while the 11.2 standby was happily applying transactions and skipping DML for the specified schema. Overnight, however, the apply process shut down for no immediately apparent reason. The schema in question had all of its DML ignored so why did apply stop? Looking into the alert log provided the following set of errors:


ORA-26808: Apply process AS01 died unexpectedly.
ORA-02149: Specified partition does not exist
LOGSTDBY Apply process AS03 server id=3 pid=85 OS id=27432 stopped

This required further investigation. Checking the primary database I found 594 partitions for tables owned by the schema that was supposed to be skipped; on the logical standby I found 43 fewer partitions for that same schema. So it appeared that the actions I had taken the day before, to skip all DML activity for this schema, weren’t enough. Pondering the issue I went to the Oracle online documentation and eventually found a step I had missed: skipping the schema DML wasn’t enough, the DDL also had to be ignored, and doing that required the following set of instructions:


ALTER DATABASE ABORT LOGICAL STANDBY APPLY;
EXECUTE DBMS_LOGSTDBY.SKIP(stmt => 'SCHEMA_DDL',-
     schema_name => '', -
     object_name => '%');
EXECUTE DBMS_LOGSTDBY.SKIP ('DML', '', '%');
ALTER DATABASE START LOGICAL STANDBY APPLY IMMEDIATE;  

Using the abort command does two things: it stops the apply process and it rolls back any pending transactions. This is necessary so that ‘offending’ transactions, the ones that throw the listed error, get rolled back; this ‘resets’ the transaction ‘pointer’ for Log Miner so that when the apply process is restarted it can skip the transactions throwing the errors. Notice the second step in the skip process:


EXECUTE DBMS_LOGSTDBY.SKIP(stmt => 'SCHEMA_DDL',-
     schema_name => '', -
     object_name => '%');

This is necessary to tell Oracle to completely skip any DDL for the specified schema and keeps transaction processing going even if there are DDL changes on the primary database for the schema you want to skip. Since I hadn’t executed the skip procedure twice, with the two sets of slightly different parameters, only half of the job was done; it wasn’t apparent until partitions were added to the specified schema on the primary that the skip wasn’t processed completely. Once that missing step had been executed, and the original skip statement was re-submitted, the apply process started applying transactions and the standby lag started getting shorter and shorter.
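
To verify which skip rules are actually registered on the standby (and catch a half-done configuration before it bites), the rules can be queried from the DBA_LOGSTDBY_SKIP view; a minimal sketch:


select owner, statement_opt, name
from dba_logstdby_skip
order by owner, statement_opt;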

It pays to read the documentation when performing unfamiliar tasks or what seem to be familiar tasks in a new release. Missed steps can create issues that can seem to be mysterious; verifying the required steps is necessary to ensure a task is completed successfully. What you may have given the least thought to may be the very thing that stops you in your tracks.

Skip, skip, skip to my Lou.

September 25, 2014

Map Reading

Filed under: General — dfitzjarrell @ 11:55

Consider the following concept: When you are born you are issued a map showing the direction your life will take. Along the way people will come into your life and help you make sense of parts of that map. You may not know these people at the time but they will be important in establishing where you are to be headed and possibly what you should be doing. Eventually you can read the entire map and know where you should go; hopefully you’ll already be there.

Continuing with that premise there have been four extremely influential people who have come into my life to help me read my map and set the direction that landed me here, as an Oracle DBA. I would like to acknowledge these people here; they will be listed in ‘order of appearance’, to take a phrase from the television industry. Any one of them would have been very influential; it’s the fact that they all entered my life at some point and helped guide me along the way that I find important.

If it weren’t for Janet Hamilton, Ph.D., I probably never would have stayed with my plans for a college degree in Chemistry. Janet is a one of a kind person, extremely intelligent yet willing and able to share her knowledge. She taught both Freshman Inorganic Chemistry and Sophomore Organic Chemistry so those of us who were Chemistry majors, Pre-Med or Nursing students had her for two full years. She also taught the Organic Chemistry labs so many of us saw a lot of her each week. She was always willing to help during her office hours and I took advantage of that as often as I could. She was passionate about Chemistry and instilled that passion in us. What she taught me was this: knowledge is best when it’s shared. She was a shining example of this every day and fueled in me a desire to share what I know.

Next in line was Cora Schweitzer, a lovely woman who taught Advanced World Literature and did it with a passion and flair I have yet to experience again. Unlike many English teachers Cora brought the subject to life and did it on a daily basis. I could not wait to go to class and experience her knowledge and wisdom. I learned from her that passion is a key to workplace success and happiness, and that you should do what you love as a career. That is sometimes a difficult task to complete as thought, logic and passion can ‘collide’; doing what you love may not be the most financially advantageous direction to take but it can make a difference in how your life is spent. Sometimes it’s better to make a living doing something you are passionate about than it is to go for the financial success, especially if financial success leads to a great deal of stress. One day in class Cora admitted she would sit near her office door waiting for the sound of others leaving the building. She would then call out, “Going home?” in hopes of getting a ride after her day was done. I made regular pilgrimages to her office soon after that to offer to drive her home. We had lovely conversations while I drove, and I learned even more how interesting and fascinating she was. Sadly Cora died in 1982; she left this world a better place than she found it and I am truly grateful to have been a part of her wonderful life.

Eunice Bonar taught me Quantitative Analysis, and did it like no other person could. She was a stickler for details, but that’s how you had to be when performing quantitative procedures and tests. She showed me the beauty of the Q-test and the satisfaction of producing repeatable results through care and planning. No detail was too small to overlook; she was preparing us for work as Analytical Chemists, a Chemistry career based on precision and accuracy. And she taught us the difference between the two as you can have good precision and poor accuracy, good accuracy and poor precision, or good precision and good accuracy. Regardless of the tasks at hand Eunice made the work interesting and fun, at least to me. Just like my grandfather taught me years before Eunice further instilled in me the desire to do my best at whatever task lies ahead; if you can’t sign your name to it, it isn’t worth doing.

Last, but certainly not least, is Dennis Marynick, who taught Physical Chemistry. This is the toughest subject a Chemistry or Chemical Engineering major would take. Bumper stickers were sold by the local American Chemical Society chapter stating “Honk If You Passed P-Chem!”. The subject matter was tough but Dennis could make it interesting. He could translate all of his years of Doctoral research into concepts that promoted understanding. True, there were topics that he just couldn’t work his magic on but, for the most part, he made Physical Chemistry less intimidating, possibly because he, too, loved what he did for a living. He also never stopped learning, not because he had to but because he wanted to. That is an important concept that I integrated into my life.

What I learned from these four outstanding people is this: never stop learning, do what you are passionate about and share what you know, no matter how unimportant you think it might be. Knowledge is power, and sharing what you know can empower others. Because of these people I have become who I am today, a person constantly learning new things and sharing that knowledge as often as I can.

Each of us has at least one person who has given us direction and purpose; remember these people and what they did for you. Maybe some day someone will remember you for your positive contributions to their life.

September 23, 2014

“ID, Please”

Filed under: General — dfitzjarrell @ 10:13

In a forum I frequent the following question was raised:

We see some sqls that have been running for a long time in our prod DB, 11.2. when we check the v$session, i see null sql ids for them.
The sql_address shows up as 00. I would like to know what these sqls are doing because they correspond to processes that are burning up
the CPU and driving the usage to > 85%.

An interesting question, indeed. What are those NULL sql_ids doing, and are they actually NULL? To find out I set up an experiment using both a long series of insert statements and a PL/SQL block fetching 100,000 rows at a time. The results were surprising; let’s look at the example code and see what happened when it ran. Here is the code:


set echo on serveroutput on size 1000000

create table emp (empid number not null,
                  empname varchar2(40),
                  deptno        number);

set timing on

begin
        for i in 1..5000000 loop
                insert into emp
                values (i, 'Smorthorper'||i, mod(i, 40)+1);
        end loop;

        commit;
end;
/

set timing off

exec dbms_stats.gather_schema_stats('BING')

set timing on
declare
        type empcoltyp is table of emp%rowtype;
        emp_c empcoltyp;
        recct   number:=0;

        cursor get_emp_data is
        select * from emp;

begin
        open get_emp_data;
        loop
        fetch get_emp_data bulk collect into emp_c limit 100000;
        exit when get_emp_data%notfound;

        for i in 1..emp_c.count loop
                recct:=recct+1;
        end loop;
        dbms_output.put_line('Total records processed: '||recct);
        dbms_lock.sleep(3);
        end loop;
end;
/

set timing off

It’s a fairly simple script, really. I used the bulk collect and sleep to create a long-running fetch cycle to see if that displayed the observed behavior. Now, let’s run the script and see how the sql_id column in V$SESSION behaved. I used two sessions, one running the script and one to check the sql_id values from V$SESSION.  Session 1 generated the following output for the long-running insert:


SQL> create table emp (empid number not null,
  2                    empname varchar2(40),
  3                    deptno        number);

Table created.

SQL>
SQL> set timing on
SQL>
SQL> begin
  2          for i in 1..5000000 loop
  3                  insert into emp
  4                  values (i, 'Smorthorper'||i, mod(i, 40)+1);
  5          end loop;
  6
  7          commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Elapsed: 00:02:43.27
SQL>
SQL> set timing off

Session 2, querying V$SESSION, reported the following behavior for the sql_id column:
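
The transcript that follows shows only the '/' re-executions of the buffered query; a plausible reconstruction of that query, assuming the test user is BING, would be:


select sid, serial#, sql_id, prev_sql_id
from v$session
where username = 'BING';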


SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469               gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469               gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469               gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        782        8469 gc37k1290dm1c gc37k1290dm1c

SQL>

Notice that occasionally the sql_id becomes NULL, then returns to its previous value, as reported in prev_sql_id. When long-running SQL statements are being processed it’s possible for the sql_id column to go NULL, and the sql_address to be reported as 00, even though the statement that was started minutes or even hours ago is still running. In such cases prev_sql_id holds the sql_id for the currently running SQL. This doesn’t hold true for PL/SQL blocks, even with a sleep added. Session 1 reports the PL/SQL block and its apparent progress:


SQL> set timing on
SQL> declare
  2          type empcoltyp is table of emp%rowtype;
  3          emp_c empcoltyp;
  4          recct   number:=0;
  5
  6          cursor get_emp_data is
  7          select * from emp;
  8
  9  begin
 10          open get_emp_data;
 11          loop
 12          fetch get_emp_data bulk collect into emp_c limit 100000;
 13          exit when get_emp_data%notfound;
 14
 15          for i in 1..emp_c.count loop
 16                  recct:=recct+1;
 17          end loop;
 18          dbms_output.put_line('Total records processed: '||recct);
 19          dbms_lock.sleep(3);
 20          end loop;
 21  end;
 22  /
Total records processed: 100000
Total records processed: 200000
Total records processed: 300000
Total records processed: 400000
Total records processed: 500000
Total records processed: 600000
Total records processed: 700000
Total records processed: 800000
Total records processed: 900000
Total records processed: 1000000
Total records processed: 1100000
Total records processed: 1200000
Total records processed: 1300000
Total records processed: 1400000
Total records processed: 1500000
Total records processed: 1600000
Total records processed: 1700000
Total records processed: 1800000
Total records processed: 1900000
Total records processed: 2000000
Total records processed: 2100000
Total records processed: 2200000
Total records processed: 2300000
Total records processed: 2400000
Total records processed: 2500000
Total records processed: 2600000
Total records processed: 2700000
Total records processed: 2800000
Total records processed: 2900000
Total records processed: 3000000
Total records processed: 3100000
Total records processed: 3200000
Total records processed: 3300000
Total records processed: 3400000
Total records processed: 3500000
Total records processed: 3600000
Total records processed: 3700000
Total records processed: 3800000
Total records processed: 3900000
Total records processed: 4000000
Total records processed: 4100000
Total records processed: 4200000
Total records processed: 4300000
Total records processed: 4400000
Total records processed: 4500000
Total records processed: 4600000
Total records processed: 4700000
Total records processed: 4800000
Total records processed: 4900000
Total records processed: 5000000

PL/SQL procedure successfully completed.

Elapsed: 00:02:34.16
SQL>
SQL> set timing off

Session 2 again is used to display the sql_id throughout the PL/SQL run; notice the sql_id does not go NULL at any time during the run [the output has been abbreviated to reduce the large number of repeated lines]:


SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

...
SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

...
SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

...
SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

SQL> /

        SID     SERIAL# SQL_ID        PREV_SQL_ID
----------- ----------- ------------- -------------
        717        6367 gjcn0r825u7wz 9babjv8yq8ru3

SQL>

At no time did the sql_id display as NULL during the PL/SQL execution. I suspect that is because the repeated fetch operations, each limited to 1/50th of the total data, kept the cursor, and its associated SQL statement, active for the entire duration of the run.

It would appear (and I think I have a valid test case as proof) that during long-running SQL statements the sql_id can be reported as NULL even though the statement is still executing. In such cases the prev_sql_id usually provides the identifier for the currently running SQL. Of course not all NULL sql_ids are associated with long-running statements, and V$SESSION_LONGOPS does not, by default, report every long-running statement or operation to hit a database. The ‘long-running’ criterion, according to Oracle, is any operation that runs for more than 6 seconds. A partial list of the operations that are recorded follows:


•Table scan
•Index Fast Full Scan
•Hash join
•Sort/Merge
•Sort Output
•Rollback
•Gather Table Index Statistics

NESTED LOOP JOINS are not automatically recorded in V$SESSION_LONGOPS although HASH JOINS are. It is possible to circumvent this by using the DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS procedure to register your long-running operation with the view. Modifying the previous code to make V$SESSION_LONGOPS report interim activity:


declare
        type empcoltyp is table of emp%rowtype;
        emp_c   empcoltyp;
        recct   number:=0;
        r_idx   BINARY_INTEGER;  -- row index returned by set_session_longops_nohint
        l_no    BINARY_INTEGER;  -- slot number, maintained across set_session_longops calls
        ttlwrk  number;          -- total units of work to report
        so_far  number;          -- units of work completed so far
        obj     BINARY_INTEGER;  -- object_id of EMP, reported in v$session_longops

        cursor get_emp_data is
        select * from emp;

begin
        r_idx := dbms_application_info.set_session_longops_nohint;
        so_far := 0;
        ttlwrk := 50;
        select object_id
        into obj
        from user_objects
        where object_name = 'EMP';

        open get_emp_data;
        loop
        fetch get_emp_data bulk collect into emp_c limit 100000;
        exit when get_emp_data%notfound;

        while so_far < 50 loop
             for i in 1..emp_c.count loop
                recct:=recct+1;
             end loop;
             dbms_output.put_line('Total records processed: '||recct);
             dbms_lock.sleep(3);
             so_far := so_far + 1;
             dbms_application_info.set_session_longops(r_idx, l_no, 'Bounded fetch ', obj, 0, so_far, ttlwrk, 'EMP', 'fetches');
             end loop;
        end loop;
end;
/

V$SESSION_LONGOPS now reports progress for the bulk collect fetch operations:


SQL> select message from v$session_longops where sofar <> totalwork;

MESSAGE
-----------------------------------------------------------------
Bounded fetch : EMP 113315: 1 out of 50 fetches done

SQL> /

MESSAGE
-----------------------------------------------------------------
Bounded fetch : EMP 113315: 2 out of 50 fetches done

SQL> /

MESSAGE
-----------------------------------------------------------------
Bounded fetch : EMP 113315: 3 out of 50 fetches done

SQL>
...

Oracle can exhibit ‘strange’ behavior with the sql_id values in V$SESSION, but just because the sql_id is NULL doesn’t mean Oracle isn’t processing a long-running statement; it’s likely processing the same SQL reported in prev_sql_id, so you can look to that column to find (possibly) active SQL. As a side note, you can also get V$SESSION_LONGOPS to report on any operation you consider to be long-running, using the example code as a guide. Obviously you can’t get V$SESSION_LONGOPS to report on ‘canned’ application operations if they don’t meet the reporting criteria (that would require the vendor to modify their code, not an easy task to accomplish). The idea to remember is that the sql_id for some long-running operations isn’t lost when the sql_id value goes NULL; it has simply moved to the prev_sql_id column.

I need to see your ID, please.

September 8, 2014

Blooms

Filed under: Exadata,General,Performance — dfitzjarrell @ 09:41

In Oracle releases 10.2.0.x and later, join processing can be made more efficient by the use of Bloom filters, primarily to reduce traffic between parallel query slaves. What is a Bloom filter? Named after Burton Howard Bloom, who came up with the concept in 1970, it’s an efficient data structure used to quickly determine whether an element has a high probability of being a member of a given set. It’s based on a bit array that allows for rapid searches and returns one of two results: either the element is probably in the set (which can produce false positives) or the element is definitely not in the set. The filter cannot produce false negatives, and the incidence of false positives is relatively rare. Another advantage of Bloom filters is their small size relative to other data structures used for similar purposes (self-balancing binary search trees, hash tables, or linked lists). The possibility of false positives necessitates an additional filter to eliminate them from the results, yet that filter doesn’t add appreciably to the processing time and, therefore, goes relatively unnoticed.
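
As a toy illustration of the concept (and emphatically not Oracle’s internal implementation), the PL/SQL sketch below marks two hash positions per member in a small bit array using DBMS_UTILITY.GET_HASH_VALUE; a probe that finds either position unset is definitely not a member, while one that finds both set is only probably a member:


set serveroutput on

declare
    -- toy Bloom filter: a sparse "bit array" marked by two hash functions
    type bitarr is table of pls_integer index by pls_integer;
    bits  bitarr;
    sz    constant pls_integer := 1024;   -- bit positions per hash function

    procedure add_member(p_val in varchar2) is
    begin
        -- two different bases act as two independent hash functions
        bits(dbms_utility.get_hash_value(p_val, 1, sz)) := 1;
        bits(dbms_utility.get_hash_value(p_val, 5000, sz)) := 1;
    end;

    function membership(p_val in varchar2) return varchar2 is
    begin
        if bits.exists(dbms_utility.get_hash_value(p_val, 1, sz))
           and bits.exists(dbms_utility.get_hash_value(p_val, 5000, sz)) then
            return 'probably in the set';      -- false positives possible
        end if;
        return 'definitely NOT in the set';    -- false negatives impossible
    end;
begin
    add_member('PROCUREMENT');
    add_member('SALES');
    dbms_output.put_line('SALES is '||membership('SALES'));
    dbms_output.put_line('FACILITIES is '||membership('FACILITIES'));
end;
/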

When Bloom filters are used for a query on an Exadata system, the filter predicate and the storage predicate will list the SYS_OP_BLOOM_FILTER function as being called. This function includes the additional filter to eliminate any false positives that could be returned. It’s the storage predicate that provides the real power of the Bloom filter on Exadata. Using a Bloom filter to pre-join the tables at the storage server level reduces the volume of data the database servers need to process and can significantly reduce the execution time of the given query.

An example of Bloom filters in action follows; a script named Bloom_fltr_ex.sql was executed to generate this output on an Exadata system; the script can be emailed on request:


SQL> --
SQL> -- Create sample tables
SQL> --
SQL> -- Create them parallel, necessary 
SQL> -- to get a Smart Scan on these tables
SQL> --
SQL> create table emp(
  2  	empid   number,
  3  	empnm   varchar2(40),
  4  	empsal  number,
  5  	empssn  varchar2(12),
  6  	constraint emp_pk primary key (empid)
  7  ) parallel 4;

Table created.

SQL>
SQL> create table emp_dept(
  2  	empid   number,
  3  	empdept number,
  4  	emploc  varchar2(60),
  5    constraint emp_dept_pk primary key(empid)
  6  ) parallel 4;

Table created.

SQL>
SQL> create table dept_info(
  2  	deptnum number,
  3  	deptnm  varchar2(25),
  4    constraint dept_info_pk primary key(deptnum)
  5  ) parallel 4;

Table created.

SQL>
SQL> --
SQL> -- Load sample tables with data
SQL> --
SQL> begin
  2  	     for i in 1..2000000 loop
  3  		     insert into emp
  4  		     values(i, 'Fnarm'||i, (mod(i, 7)+1)*1000, mod(i,10)||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||'-'||mod(i,10)||mod(i,10)||mod(i,10)||mod(i,10));
  5  		     insert into emp_dept
  6  		values(i, (mod(i,8)+1)*10, 'Zanzwalla'||(mod(i,8)+1)*10);
  7  		     commit;
  8  	     end loop;
  9  	     insert into dept_info
 10  	     select distinct empdept, case when empdept = 10 then 'SALES'
 11  					   when empdept = 20 then 'PROCUREMENT'
 12  					   when empdept = 30 then 'HR'
 13  					   when empdept = 40 then 'RESEARCH'
 14  					   when empdept = 50 then 'DEVELOPMENT'
 15  					   when empdept = 60 then 'EMPLOYEE RELATIONS'
 16  					   when empdept = 70 then 'FACILITIES'
 17  					   when empdept = 80 then 'FINANCE' end
 18  	     from emp_dept;
 19  
 20  end;
 21  /

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Run join query using Bloom filter
SQL> --
SQL> -- Generate execution plan to prove Bloom
SQL> -- filter usage
SQL> --
SQL> -- Also report query execution time
SQL> --
SQL> set autotrace on
SQL> set timing on
SQL>
SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20
  4  group by e.empid, e.empnm, d.deptnm, e.empsal;

     EMPID EMPNM                  DEPTNM                        EMPSAL       EMPS
---------- ---------------------- ------------------------- ---------- ----------
    904505 Fnarm904505            PROCUREMENT                     1000          1
    907769 Fnarm907769            PROCUREMENT                     3000          1
    909241 Fnarm909241            PROCUREMENT                     5000          1
    909505 Fnarm909505            PROCUREMENT                     3000          1
    909641 Fnarm909641            PROCUREMENT                     6000          1
    910145 Fnarm910145            PROCUREMENT                     6000          1
...
    155833 Fnarm155833            PROCUREMENT                     7000          1
    155905 Fnarm155905            PROCUREMENT                     2000          1
    151081 Fnarm151081            PROCUREMENT                     1000          1
    151145 Fnarm151145            PROCUREMENT                     2000          1

250000 rows selected.

Elapsed: 00:00:14.27

Execution Plan
----------------------------------------------------------
Plan hash value: 2313925751

------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name         | Rows | Bytes | Cost (%CPU)| Time     |   TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |              |  218K|    21M|  1378   (1)| 00:00:01 |       |      |            |
|   1 |  PX COORDINATOR                          |              |      |       |            |          |       |      |            |
|   2 |   PX SEND QC (RANDOM)                    | :TQ10003     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY                         |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   4 |     PX RECEIVE                           |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,03 | PCWP |            |
|   5 |      PX SEND HASH                        | :TQ10002     |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | P->P | HASH       |
|   6 |       HASH GROUP BY                      |              |  218K|    21M|  1378   (1)| 00:00:01 | Q1,02 | PCWP |            |
|*  7 |        HASH JOIN                         |              |  218K|    21M|  1376   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   8 |         PX RECEIVE                       |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,02 | PCWP |            |
|   9 |          PX SEND BROADCAST               | :TQ10001     |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | P->P | BROADCAST  |
|  10 |           NESTED LOOPS                   |              |  218K|    11M|   535   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  11 |            BUFFER SORT                   |              |      |       |            |          | Q1,01 | PCWC |            |
|  12 |             PX RECEIVE                   |              |      |       |            |          | Q1,01 | PCWP |            |
|  13 |              PX SEND BROADCAST           | :TQ10000     |      |       |            |          |       | S->P | BROADCAST  |
|  14 |               TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |    1 |    27 |     1   (0)| 00:00:01 |       |      |            |
|* 15 |                INDEX UNIQUE SCAN         | DEPT_INFO_PK |    1 |       |     1   (0)| 00:00:01 |       |      |            |
|  16 |            PX BLOCK ITERATOR             |              |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWC |            |
|* 17 |             TABLE ACCESS STORAGE FULL    | EMP_DEPT     |  218K|  5556K|   534   (1)| 00:00:01 | Q1,01 | PCWP |            |
|  18 |         PX BLOCK ITERATOR                |              | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWC |            |
|* 19 |          TABLE ACCESS STORAGE FULL       | EMP          | 1657K|    75M|   839   (1)| 00:00:01 | Q1,02 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("ED"."EMPID"="E"."EMPID")
  15 - access("D"."DEPTNUM"=20)
  17 - storage("ED"."EMPDEPT"=20)
       filter("ED"."EMPDEPT"=20)
  19 - storage(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
     60  recursive calls
    174  db block gets
  40753  consistent gets
  17710  physical reads
   2128  redo size
9437983  bytes sent via SQL*Net to client
 183850  bytes received via SQL*Net from client
  16668  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
 250000  rows processed

SQL>

In less than 15 seconds, 250,000 rows were returned from a three-table join of over 4 million rows; the Bloom filter made a dramatic difference in how this query was processed and provided exceptional performance given the volume of data queried.

Without storage-level execution of the Bloom filter, the query execution time increased by 2.33 seconds, a 16.3 percent increase; for longer execution times that difference can be significant. It isn’t the Bloom filter alone that gives Exadata such power with joins, it’s the fact that Exadata can execute it not only at the database level but also at the storage level, something commodity hardware configurations can’t do.

Of course Bloom filters are not exclusive to Exadata, as the following run of the same script on a non-Exadata system shows (only the abbreviated output section is reproduced here):


SQL> select /*+ Bloom join 2 parallel 2 use_hash(emp emp_dept) */ e.empid, e.empnm, d.deptnm, e.empsal	  --, count(*) emps
  2  from emp e join emp_dept ed on (ed.empid = e.empid) join dept_info d on (ed.empdept = d.deptnum)
  3  where ed.empdept = 20;

     EMPID EMPNM                                    DEPTNM                        EMPSAL                                                              
---------- ---------------------------------------- ------------------------- ----------                                                              
   1670281 Fnarm1670281                             PROCUREMENT                     5000                                                              
   1670289 Fnarm1670289                             PROCUREMENT                     6000                                                              
   1670297 Fnarm1670297                             PROCUREMENT                     7000                                                              
   1670305 Fnarm1670305                             PROCUREMENT                     1000                                                              
   1670313 Fnarm1670313                             PROCUREMENT                     2000                                                              
   1670321 Fnarm1670321                             PROCUREMENT                     3000                                                              
   1670329 Fnarm1670329                             PROCUREMENT                     4000                                                              
   1670337 Fnarm1670337                             PROCUREMENT                     5000                                                              
   1670345 Fnarm1670345                             PROCUREMENT                     6000                                                              
   1670353 Fnarm1670353                             PROCUREMENT                     7000                                                              
   1670361 Fnarm1670361                             PROCUREMENT                     1000                                                              
...
   1857369 Fnarm1857369                             PROCUREMENT                     4000                                                              
   1857377 Fnarm1857377                             PROCUREMENT                     5000                                                              
   1857385 Fnarm1857385                             PROCUREMENT                     6000                                                              

250000 rows selected.

Elapsed: 00:00:54.86

Execution Plan
----------------------------------------------------------                                                                                            
Plan hash value: 2643012915                                                                                                                           
                                                                                                                                                      
----------------------------------------------------------------------------------------------------------------------------------                    
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |                    
----------------------------------------------------------------------------------------------------------------------------------                    
|   0 | SELECT STATEMENT                     |              |   273K|    26M|  1378   (1)| 00:00:01 |        |      |            |                    
|   1 |  PX COORDINATOR                      |              |       |       |            |          |        |      |            |                    
|   2 |   PX SEND QC (RANDOM)                | :TQ10002     |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |                    
|*  3 |    HASH JOIN                         |              |   273K|    26M|  1378   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   4 |     PX RECEIVE                       |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
|   5 |      PX SEND BROADCAST               | :TQ10001     |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | P->P | BROADCAST  |                    
|   6 |       NESTED LOOPS                   |              |   273K|    13M|   536   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|   7 |        BUFFER SORT                   |              |       |       |            |          |  Q1,01 | PCWC |            |                    
|   8 |         PX RECEIVE                   |              |       |       |            |          |  Q1,01 | PCWP |            |                    
|   9 |          PX SEND BROADCAST           | :TQ10000     |       |       |            |          |        | S->P | BROADCAST  |                    
|  10 |           TABLE ACCESS BY INDEX ROWID| DEPT_INFO    |     1 |    27 |     1   (0)| 00:00:01 |        |      |            |                    
|* 11 |            INDEX UNIQUE SCAN         | DEPT_INFO_PK |     1 |       |     1   (0)| 00:00:01 |        |      |            |                    
|  12 |        PX BLOCK ITERATOR             |              |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWC |            |                    
|* 13 |         TABLE ACCESS FULL            | EMP_DEPT     |   273K|  6947K|   535   (1)| 00:00:01 |  Q1,01 | PCWP |            |                    
|  14 |     PX BLOCK ITERATOR                |              |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWC |            |                    
|* 15 |      TABLE ACCESS FULL               | EMP          |  2099K|    96M|   840   (1)| 00:00:01 |  Q1,02 | PCWP |            |                    
----------------------------------------------------------------------------------------------------------------------------------                    
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   3 - access("ED"."EMPID"="E"."EMPID")                                                                                                               
  11 - access("D"."DEPTNUM"=20)                                                                                                                       
  13 - filter("ED"."EMPDEPT"=20)                                                                                                                      
  15 - filter(SYS_OP_BLOOM_FILTER(:BF0000,"E"."EMPID"))
                                                                                                                                                      
Note                                                                                                                                                  
-----                                                                                                                                                 
   - dynamic sampling used for this statement (level=2)                                                                                               


Statistics
----------------------------------------------------------                                                                                            
         33  recursive calls                                                                                                                          
        139  db block gets                                                                                                                            
      36224  consistent gets                                                                                                                          
      17657  physical reads                                                                                                                           
          0  redo size                                                                                                                                
    9526012  bytes sent via SQL*Net to client                                                                                                         
     183846  bytes received via SQL*Net from client                                                                                                   
      16668  SQL*Net roundtrips to/from client                                                                                                        
          5  sorts (memory)                                                                                                                           
          0  sorts (disk)                                                                                                                             
     250000  rows processed                                                                                                                           

SQL>

The only differences between the two plans involve the storage component of Exadata. Notice also the vast difference in elapsed times between the Exadata and non-Exadata systems: Exadata returned 250,000 rows in just over 14 seconds while the non-Exadata system returned the same volume of data in just under 55 seconds. Both times are substantial improvements over not using a Bloom filter, so any hardware running 11.2.0.2 or later can benefit from Bloom filters.

Of course there is a ‘gotcha’ with Bloom filters, one Oracle recognizes as a bug in releases older than 12.1.0.2: hash left outer joins, hash right outer joins and hash full outer joins do not use a Bloom filter. For those interested, the unpublished bug number is 17716301; MOS document 1919508.1 reports details on this behavior, including an example. As noted in that document, the patch can be backported to earlier releases when an SR is filed requesting it.

Bloom filters are a welcome addition to both Exadata and non-Exadata installations as they can improve query performance with parallel queries, as evidenced here. Now that you know how to spot when they are used you may find they are utilized more often than you expected.

It’s a bloomin’ miracle.

August 17, 2014

“You’re A Real ACE”

Filed under: General — dfitzjarrell @ 13:40

Recently I was declared, for want of a better word, an Oracle ACE. I was nominated by someone I respect and admire; just the nomination itself was enough, really, to make my day. When I received notification that I had been selected I was truly surprised. I immediately thanked my nominator, who told me:


                     "Still, it is just a nomination form, and you did all the work that
                     made it so impressive. Congratulations!"

Honestly, it wasn’t, and isn’t, work. It’s a pleasure to share my knowledge with others, which is why I blog, write for http://www.databasejournal.com twice a month and wrote “Oracle Exadata Survival Guide”.

Being an Oracle ACE is an honor, but it’s also a responsibility. What got me here was writing and participating in discussion groups, and that won’t change. Knowledge is to be shared, not hoarded; what you know does no one else any good if you don’t share that knowledge and experience. If Da Vinci had kept his notes to himself, if Newton hadn’t published his Laws of Motion, if Copernicus had kept quiet, our world might have been quite different. It’s because these people had the foresight to look deeper into our world and then tell us what they found that we are where we are today. It’s only right that we, as beneficiaries of the knowledge others before us shared, share our own knowledge no matter how unimportant it may seem. Someone, somewhere, will find it useful and will be grateful that we did.

It may be that our knowledge is gained equally by insight and by mistakes. I know that I’ve made mistakes I’ve learned from, and I know others have done the same. Norton Juster, an architect who wrote a wonderful book named “The Phantom Tollbooth”, said it best:


                     "You may not see it now," said the Princess of Pure Reason, looking
                     knowingly at Milo's puzzled face, "but whatever we learn has a
                     purpose and whatever we do affects everything and everyone else, if
                     even in the tiniest way. Why, when a housefly flaps his wings, a
                     breeze goes round the world; when a speck of dust falls to the
                     ground, the entire planet weighs a little more; and when you stamp
                     your foot, the earth moves slightly off its course. Whenever you
                     laugh, gladness spreads like the ripples in a pond; and whenever
                     you're sad, no one anywhere can be really happy. And it's much the
                     same thing with knowledge, for whenever you learn something new,
                     the whole world becomes that much richer."

                     "And remember, also," added the Princess of Sweet Rhyme, "that many
                     places you would like to see are just off the map and many things 
                     you want to know are just out of sight or a little beyond your 
                     reach. But someday you'll reach them all, for what you learn today, 
                     for no reason at all, will help you discover all the wonderful 
                     secrets of tomorrow."

I think this sums up being an Oracle ACE better than any words I could put on a page. I keep this quote with me, to remind me that mistakes aren’t the end of the world if you learn from them and move on. And what I learn from those mistakes may help others so they don’t make those same mistakes.

I love sharing what I know; I’ve been doing it for over 16 years now, in various forums, some that are no longer as popular as they once were. I never realized how far my commentary reached until I became an Oracle ACE; I have received congratulations and comments that I never expected, mostly of the ‘it’s about time’ sentiment. Simply because you don’t see the gratitude of others who benefit from your knowledge doesn’t mean that gratitude doesn’t exist. I see now that it does, and I am humbled by it.

“Oracle ACE”. To be among those who have been graced by this honor gives me pause. It’s a good pause, awesome if I may be allowed to say it. And it’s an honor that will only make me work harder so that I can live up to such high standards.

Now, back to our regularly scheduled programming.

July 22, 2014

That’s … Huge!

Filed under: Exadata,General,Performance — dfitzjarrell @ 14:27

Recently I’ve noticed the occasional thread in Oracle newsgroups and lists asking about hugepages support in Linux, including ‘best practices’ for hugepages configuration. This information is out on that ‘world-wide web’ in various places; I’d rather put a lot of that information in this post to provide an easier way to get to it. I’ll cover what hugepages are, what they do, what they can’t do and how best to allocate them for your particular installation. Let’s get started.

“Normal” memory pages in Linux are 4 KB in size and are allocated as needed wherever the memory map allows, so they are likely not contiguous. Hugepages, in comparison, are 2 MB pages locked in memory and allocated in a contiguous ‘chunk’; they are allocated at boot time using a parameter in the /etc/sysctl.conf file named vm.nr_hugepages for RedHat Linux kernel 2.6 or a parameter named vm.hugetlb_pool for RedHat Linux kernel 2.4. You must remember that each page is 2 MB in size, as this affects how many hugepages you need to allocate to cover all of the SGAs of running Oracle databases. Set this too high and your system might not boot; set it too low and you won’t get the benefit of using hugepages in one or more Oracle instances. [When use_large_pages is set to ONLY and insufficient hugepages are available the database won't start; changing use_large_pages to TRUE allows the database to start but it won't be using hugepages, which results in more paging activity, lowering performance.]

Since hugepages are contiguous it stands to reason that performance will improve, since Oracle won’t need to access memory segments across the entire range of installed memory. Hugepages also reduce the size of ‘page tables’ by reducing the page table overhead for each connection to the database. Page tables are part of Linux memory management, created and used to map a virtual address to a physical memory location. For SGAs of 100 MB and greater using hugepages is probably a good idea, and that idea gets better as the SGA size increases. Kevin Closson reported back in 2009 on the page table overhead for an Oracle database with an 8000 MB SGA running 500 simultaneous connections. The page table overhead reported was startling: almost 6.8 GB of memory used for page tables when hugepages were not configured. He ran the same test again on the same Oracle database, only after hugepages were allocated and used by the database; the page table overhead was reduced to slightly less than 23 MB. That is a dramatic change in memory usage, one that can definitely improve performance.
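For a quick check of what’s currently allocated and in use, /proc/meminfo reports the hugepage counters; the figures shown here are illustrative only:

$ grep -i huge /proc/meminfo
HugePages_Total:   25240
HugePages_Free:     1024
HugePages_Rsvd:      960
HugePages_Surp:        0
Hugepagesize:       2048 kB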

Hugepages are available for IPC (Inter-Process Communication) shared memory; this is the ‘standard’ shared memory model (starting with UNIX System V) allowing multiple processes to access the same shared memory segment. There is also another form of shared memory segment, the memory-mapped file, and currently such segments are not supported by hugepages. Oracle, on Linux, gives you a choice of using hugepages or memory-mapped files and you implement that choice by selecting to use (or not use) Automatic Memory Management (AMM). Choosing AMM disallows hugepages since AMM uses memory-mapped files located in /dev/shm; you cannot use AMM and hugepages together. For databases with SGAs up to 100 MB using AMM is probably a good idea; a script to determine how many hugepages to allocate won’t consider SGAs of less than 100 MB. For Oracle databases with larger SGAs hugepages is, in my opinion, the choice to make as you can significantly reduce shared memory overhead where page tables are concerned.
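To check which memory model a given database is using, a quick sketch against V$PARAMETER shows the relevant settings; a non-zero memory_target means AMM is in effect and hugepages are off the table:

-- a non-zero memory_target indicates AMM, which precludes hugepages
select name, value
from v$parameter
where name in ('memory_target', 'sga_target', 'use_large_pages');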

Calculating how many hugepages to allocate has been a difficult task in the past; Oracle now supplies a script that can calculate that value for you, provided all of the databases you want to use hugepages for are running. The script won’t consider databases that aren’t up and available since it uses the ipcs -m command to return the allocated shared memory segments. The script is available from My Oracle Support in Doc ID 401749.1. It generates a list of IPC shared memory segment sizes owned by Oracle and computes the total number of hugepages required, outputting the result to be used in setting the vm.nr_hugepages parameter in /etc/sysctl.conf. Once that setting is modified the system has to be rebooted for the change to take effect. Sample output from that script is shown below:


This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m


Press Enter to proceed...

Recommended setting: vm.nr_hugepages = 25240

For Linux RedHat kernel 2.6 this is the number of hugepages, not the overall size of the memory allocated for those hugepages. For RedHat kernel 2.4 a different parameter, vm.hugetlb_pool, is set to the actual size that pool needs to be. Fortunately the script knows which kernel is running and reports the correct parameter and value to set.

After you have hugepages allocated you need to tell Oracle to use them exclusively. An init parameter, use_large_pages, should be set to ONLY to keep Oracle from allocating conventionally managed memory segments for the SGA. If your hugepages calculations are correct then all of the databases on your database server will start without error. Allocate too few hugepages and databases with use_large_pages set to ONLY won’t start after all of the existing hugepages are used. Setting use_large_pages to TRUE will allow a database to start without using hugepages but that’s not what you want to get better performance from Oracle.
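The parameter isn’t dynamic, so a sketch of the change (assuming an spfile is in use; the sid clause covers all RAC instances) looks like this, followed by an instance restart:

SQL> alter system set use_large_pages=ONLY scope=spfile sid='*';

System altered.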

Earlier I mentioned that allocating too many hugepages can be worse than allocating too few hugepages. Since each hugepage consumes 2 MB of space, multiplying the vm.nr_hugepages setting by 2097152 will show how much memory is occupied by hugepages. It is possible to try to allocate all of the system memory as hugepages, resulting in a system that will not boot in multi-user mode. The kernel itself isn’t configured to use hugepages, so the more hugepages that are allocated the less conventionally managed memory is available. Take it all away and there is no place for the operating system to start necessary processes after the kernel is loaded at boot time. Using single-user mode will allow the system administrator to modify the offending entry and change it to a more reasonable value so the system can be rebooted. Of course you don’t want to end up in that situation; manually calculating such a value can result in errors, so use the Oracle-supplied script to calculate the hugepages value you actually need.
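As a worked example using the recommended setting reported by the script above, the memory locked by hugepages is simple to compute; roughly 49.3 GB here would no longer be available to the kernel and conventional processes:

$ echo "$((25240 * 2)) MB locked in hugepages"
50480 MB locked in hugepages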

Exadata systems run best when hugepages are allocated and used exclusively. The exachk script checks for such settings in each running database on the system and reports those that do not have use_large_pages set to ONLY. It also checks whether the currently configured hugepages setting meets or exceeds the estimated number of hugepages required by all running instances on the given database (compute) node; if this check fails it most likely will fail for all available nodes on the system. Failing this check isn’t the end of the world but it does mean that the failing databases aren’t using memory as efficiently as they could. For each node run the script, make the reported changes, then reboot. Before starting any of the databases ensure that all have use_large_pages set to ONLY; start the databases across the cluster using srvctl. The next run of exachk should report that the hugepages check passed.

RAC databases need to be configured the same way Exadata databases are configured (since Exadata databases are RAC databases). Modify kernel parameters on every database server in the cluster, then reboot. Modify the spfile for each database in the cluster then use srvctl to start those databases cluster-wide. Presuming the hugepages settings are correct across the cluster there should be no error when starting the databases.

Using hugepages for Oracle databases on Linux is, to me, the only way to go if you want to increase database performance. Of course, if you have loads of memory to spare you may not see the need. Most systems I’ve worked on don’t have that luxury, so being as ‘frugal’ with memory as you can is good. Making the most of available resources is key, and hugepages on Linux can definitely help with respect to Oracle.

Size does matter.

July 15, 2014

It Pays To Be Smart

Filed under: Exadata,General — dfitzjarrell @ 09:40

Exadata is a powerful system, able to provide exceptional performance. Much of this performance is due to Smart Scans, Exadata’s mechanism for shifting the workload to the storage cells and having them filter and reduce the amount of data the database servers must process. Not every Smart Scan that starts ends up completing, though; Oracle may decide that a different path, one that doesn’t include a Smart Scan, is more efficient. Certain conditions must be met, and there is a metric that records Smart Scan starts along with another that records how many of those starts actually continue as Smart Scans. Let’s look at why Oracle may decide a Smart Scan isn’t the ‘smartest’ route to the data and which metrics you can use to see how many Smart Scans end up using another execution path.

Three conditions can trigger Oracle to forego a Smart Scan: setting optimizer_mode to FIRST_ROWS, setting it to FIRST_ROWS_n, or executing a query using a ‘where rownum’ predicate. Oracle initially opts to do the Smart Scan, then can perform a few block I/O operations to see if that pathway can satisfy the query more efficiently. When that happens a metric, cell num fast response sessions, is incremented. If the block I/O is indeed faster Oracle foregoes the Smart Scan and completes the operation with conventional block I/O. If the block I/O proves that correct results won’t be returned to the calling session in a timely fashion, Oracle resumes the Smart Scan and continues processing. Simply because this counter is incremented does not mean, by itself, that Oracle didn’t execute a Smart Scan. You’ll need another metric for that.
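As an illustration, a top-n query of this general shape (borrowing the sample tables from the Bloom filter example earlier) is the kind of statement that starts down the ‘fast response’ path:

-- a 'where rownum' predicate of this shape can trigger the fast response check
select empid, empnm
from emp
where rownum <= 10;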

When Oracle decides to return to Smart Scan execution another metric is incremented, cell num fast response sessions continuing to smart scan. The same conditions that triggered the previous metric also apply here. Thus when Oracle decides to forego a Smart Scan only the previous metric, cell num fast response sessions, is incremented.

It’s fairly easy to see how many Smart Scans followed through and how many didn’t with the following query:


-- join on the statistic names; statistic# values can change between releases
select a.value smart_scan_started, b.value smart_scan_cont, a.value - b.value no_smart_scan
from v$sysstat a, v$sysstat b
where a.name = 'cell num fast response sessions'
and b.name = 'cell num fast response sessions continuing to smart scan'
/

You may find that none of the Smart Scans that were started actually went on to finish:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
              1222               0          1222

Or you may find that all of them finished:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
               407             407             0

You may also find that some did, and some didn’t, continue:


SMART_SCAN_STARTED SMART_SCAN_CONT NO_SMART_SCAN
------------------ --------------- -------------
                53              27            26

What is important to note is that simply because Oracle started a Smart Scan then chose not to continue with it doesn’t indicate there is any problem with your Exadata system. On the contrary, it indicates that Exadata is doing exactly what it’s supposed to do: efficiently return query results to the calling session. That may be through a Smart Scan or it may be through conventional I/O; think of it as adjusting your route to a destination to avoid heavy traffic. If Oracle can get the correct results through a few conventional I/O operations it will do so; if a Smart Scan is necessary Oracle can again ‘change gears’ and resume it to ensure correct data is sent to the calling session. It all comes down to using the right tool for the job.

Smart Scans are great when you need them; it’s also good to know that Oracle, on Exadata, can decide when to use them and when it’s best not to, and to realize that when that happens it’s not because of a problem with Exadata. It’s simply Oracle choosing the right method to get the correct data. Sometimes the shorter path is the better path.

Robert Frost may have said it best:


 
"Two roads diverged in a wood, and I—  
I took the one less traveled by,  
And that has made all the difference." 

Oracle, on Exadata, may take the ‘road less traveled by’ when it comes to choosing a Smart Scan. That’s normally nothing to worry about as Oracle is selecting the most efficient path to return the data.

Anyone for a stroll?

July 1, 2014

What A Performance!

Filed under: General,Performance,stats — dfitzjarrell @ 10:09

Performance is one of the big watchwords in the IT industry; nowhere is it more often used than in the database arena, and it’s usually not a good sign. End users know nothing, really, of the inner workings of the database yet they are the first to point the ‘finger of blame’ at the database when the processing speed slows or queries take longer than they would like to return data. [For end users performance is time, and the longer a process takes the more dismal performance seems to be.] Yes, there ARE times when the database is at fault — ‘missing’ indexes or stale statistics can cause execution plans to change, as can inserts and updates to heavily transacted tables (which many times goes back to stale statistics, since they no longer accurately reflect the data and its distribution in the table). Bugs in the database software can also cause issues by causing the gathered statistics to miscalculate values or generate ‘useless’ frequency histograms, such as on single-valued not null columns. Such histograms can direct the optimizer to choose a path it wouldn’t normally choose were the statistics and histograms to contain useful data.

Here is where the problem can get complicated: it may not be the database at fault when performance is down. Yes, it’s the database server you’re getting data from. Yes, things were running just fine until today. By ‘logical’ deduction it’s the database at fault, right? Not necessarily. The database also has to deal with the operating system, the storage system, the available memory and the network to get its work done. Problems in any of those areas are likely to adversely affect database performance. So it’s not necessarily the database causing the problem. How do you know where the problem originates? Oracle does provide some tools to help in assessing where a problem is likely to be, but you may also need to involve System Administrators, Network Administrators and/or Storage Administrators to fully investigate some issues. Let’s take a look at the Oracle-supplied tools, along with some operating system utilities that can be used to diagnose performance issues.

At the first sign of trouble many look to the DBA to investigate the database for bottlenecks or other performance-robbing possibilities that surely must exist. Oracle provides at least two tools in releases 10.1 and later to investigate the database for performance problems: the AWR report, a general overall ‘picture’ of database ‘health’ that can give direction to such a quest, and the ASH report, a more detailed ‘picture’ of a given window of activity with an emphasis on wait information. These utilities should be used at the first report of performance issues to determine if the problem is database-related or caused by another area, such as the operating system or network.

Starting with the AWR (Automatic Workload Repository) report, the two sections to begin an investigation with, in my estimation, are Wait Events Statistics and SQL Statistics. The first provides a system-wide view of the waits encountered and the times those waits consumed. The Background Wait Events and Wait Event Histogram sections can provide some insight into what was occurring during the report window; waits to watch are the I/O-related waits (reads, writes), which can signal issues with a query plan or issues with the storage (issues not addressable by the DBA). The SQL Statistics list SQL statements ordered by several criteria, including overall elapsed time, consumed CPU time, I/O wait time and physical reads. The SQL statistics can usually point to ‘bad’ plans, making performance problems a database issue because such plans are the result of bad or stale statistics. In 11.2 and later releases dynamic sampling can mitigate missing statistics by automatically sampling the data prior to generating a query execution plan. We won’t get into the details of using these reports to effect database tuning procedures, as that’s a topic for another article; the intent here is to provide a means to determine IF the problem is a database issue or if it’s caused by outside ‘forces’.

For releases earlier than 10.1 Oracle provides Statspack, the precursor to AWR and the next step after the original database snapshot utilities, utlbstat and utlestat. This utility can provide a very good snapshot of database activity, although not quite as robust as the AWR report, and it is still available in 11.2 if you would like to install it. AWR and ASH are utilities which are part of the Diagnostic and Tuning Pack for OEM, and thus need to be licensed if they are to be used; Statspack doesn’t require such a license.
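If you haven’t generated these reports before, the interactive scripts shipped under $ORACLE_HOME/rdbms/admin prompt for the snapshot range or time window and the output format; as a sketch:

SQL> @?/rdbms/admin/awrrpt.sql    -- AWR report for a chosen snapshot range
SQL> @?/rdbms/admin/ashrpt.sql    -- ASH report for a chosen time window
SQL> @?/rdbms/admin/spreport.sql  -- Statspack report, where Statspack is installed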

The “Top 5 Timed Foreground Events” section gives an overview of the wait events which consumed the most time during the report window:

Top 5 Timed Foreground Events

Event                     Waits  Time(s)  Avg wait (ms)  % DB time  Wait Class
------------------------- -----  -------  -------------  ---------  -----------
DB CPU                                 0                     82.42
log file sync                 8        0             11      30.72  Commit
Disk file operations I/O      1        0              1       0.33  User I/O
os thread startup             0        0                      0.00  Concurrency
log file single write         0        0                      0.00  System I/O

Host CPU (CPUs: 4 Cores: 2 Sockets: 1)

This provides a quick look at the ‘worst offenders’ in the wait event category (although they may not be performance-robbing events). Start here then work down to the Operating System Statistics section:

Operating System Statistics – Detail

Snap Time        Load  %busy  %user  %sys  %idle  %iowait
---------------  ----  -----  -----  ----  -----  -------
20-Jun 07:04:42  0.00
20-Jun 08:00:45  0.00   7.12   4.65  2.48  92.88     0.00

The Wait Event Histogram Detail can be very informative:

Wait Event Histogram Detail (64 msec to 2 sec)

  • Units for Total Waits column: K is 1000, M is 1000000, G is 1000000000
  • Units for % of Total Waits: ms is milliseconds; s is 1024 milliseconds (approximately 1 second)
  • % of Total Waits: total waits for all wait classes, including Idle
  • % of Total Waits: value of .0 indicates value was <.05%; value of null is truly 0
  • Ordered by Event (only non-idle events are displayed)

                                       Waits  ----------------------- % of Total Waits -----------------------
Event                             64ms to 2s   <32ms  <64ms  <1/8s  <1/4s  <1/2s    <1s    <2s   >=2s
--------------------------------  ----------  ------ ------ ------ ------ ------ ------ ------ ------
ADR block file read                        3    81.3   12.5    6.3
Disk file operations I/O                   1    98.2    1.8
control file parallel write               15    98.7     .8     .3     .1
control file sequential read             139    94.0    4.7     .8     .3     .1            .0
db file parallel read                      2           50.0          50.0
db file parallel write                   185    91.4    5.7    1.8     .8     .1     .2
db file scattered read                    63    93.5    5.0    1.2     .3
db file sequential read                  142    94.2    4.5    1.2     .1     .0
direct path read                           1                 100.0
enq: CR – block range reuse ckpt          16    82.2   15.6           1.1           1.1
log file parallel write                   82    92.5    4.6    2.6     .3     .1
log file switch completion                 2                  50.0   50.0
log file sync                              1    90.0   10.0

Wait Event Histogram Detail (4 sec to 2 min)

No data exists for this section of the report.

Wait Event Histogram Detail (4 min to 1 hr)

No data exists for this section of the report.

Service Statistics

  • ordered by DB Time
Service Name    DB Time (s)  DB CPU (s)  Physical Reads (K)  Logical Reads (K)
--------------  -----------  ----------  ------------------  -----------------
SYS$USERS                 0           0                   0                  2
SYS$BACKGROUND            0           0                   8                436
smedley                   0           0                   0                  0
smedleyXDB                0           0                   0                  0

Such histograms can show where the preponderance of waits lie, making it easier to determine if they are problematic.

Next on the list is the SQL Statistics report. There are several sections; the one shown is ordered by elapsed time:

SQL ordered by Elapsed Time

  • Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
  • % Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
  • %Total – Elapsed Time as a percentage of Total DB time
  • %CPU – CPU Time as a percentage of Elapsed Time
  • %IO – User I/O Time as a percentage of Elapsed Time
  • Captured SQL account for 1.1E+04% of Total DB Time (s): 0
  • Captured PL/SQL account for 2.3E+03% of Total DB Time (s): 0
Elapsed          Per Exec
Time (s)  Execs  Time (s)   %Total    %CPU     %IO  SQL Id         SQL Module  SQL Text
--------  -----  --------  -------  ------  ------  -------------  ----------  --------
    5.88      1      5.88  2070.31   15.66   79.60  acc988uzvjmmt              delete from WRH$_SYSMETRIC_HIS…
    4.60     22      0.21  1618.80   27.16   64.39  70vs1d7ywk5m0  MMON_SLAVE  begin dbms_stats.copy_table_st…
    2.49     44      0.06   876.07   10.66   83.39  2mp99nzd9u1qp              delete from histgrm$ where obj…
    2.14    769      0.00   753.49    4.38   96.01  db78fxqxwxt7r              select /*+ rule */ bucket, end…
    1.57    212      0.01   551.64   21.91   74.49  43ruajmwv9fus              insert into histgrm$(obj#, int…
    1.42      1      1.42   501.54    7.67   86.00  0sfsrf0qzvsjv              delete from WRH$_SYSMETRIC_SUM…
    1.19     22      0.05   418.37    2.63   93.41  d89c1mh5pvbkz  MMON_SLAVE  SELECT /*+ rule */ U.NAME, OT….
    0.83      1      0.83   291.84   13.18   77.73  2whm2vvjb98k7              delete from WRH$_SQL_PLAN tab …
    0.80     22      0.04   281.90   25.34   67.82  8sd37ztuwkyyc  MMON_SLAVE  begin dbms_swrf_internal.massa…
    0.66      1      0.66   231.35    2.38   98.81  28bm5y53nujvz              delete from WRH$_RESOURCE_LIMI…
    0.54      1      0.54   188.64    0.00   91.89  4y3yvmbav7uwb              delete /*+ UNNEST(@SEL$2) UNNE…
    0.53     22      0.02   185.34    0.00   96.67  63kf3an2j0pkc              insert into sys.wri$_optstat_h…
    0.51      1      0.51   179.21    9.20   75.31  96w2cbx7ntmtv              delete from WRH$_BG_EVENT_SUMM…
    0.50      1      0.50   175.27    6.27   94.54  31sdu97k3baq1              delete from WRH$_SHARED_POOL_A…
    0.49     22      0.02   171.58   12.81   66.96  480utgnq0t6mf  MMON_SLAVE  begin dbms_stats.unlock_partit…
    0.44      1      0.44   156.14   17.60   80.45  1h7zt6jks80pa              delete from WRH$_ENQUEUE_STAT …
    0.42      1      0.42   146.27   15.03   94.38  bzscyq07w79ab              delete /*+ dynamic_sampling(4)…
    0.38  2,361      0.00   133.20   24.75   80.65  96g93hntrzjtr              select /*+ rule */ bucket_cnt,…
    0.35      1      0.35   123.71    4.44   94.32  9v9n97qj8z1dg              delete /*+ dynamic_sampling(4)…
    0.35     22      0.02   122.04   27.02   66.45  2rn08n06qr261              update histgrm$ set obj# = dec…
    0.34      1      0.34   121.06    0.00   96.73  6ajkhukk78nsr              begin prvt_hdm.auto_execute( :…
    0.31      1      0.31   107.67    0.00   97.51  48ffykkmk7sgc              delete /*+ dynamic_sampling(4)…
    0.28      1      0.28    99.65    0.00   79.94  350myuyx0t1d6              insert into wrh$_tablespace_st…
    0.27    375      0.00    94.36   17.47   87.22  bvn085xhxfn9f              insert into sys.wri$_optstat_h…
    0.26    161      0.00    90.72    6.06   90.14  74anujtt8zw4h              select o.owner#, o.name, o.nam…
    0.25      1      0.25    86.79    6.33   95.67  47x8f8z2hx1a2              delete from WRH$_MUTEX_SLEEP t…
    0.24      1      0.24    85.22    6.45   92.64  az8cj7dfnpp45              delete from WRH$_MEM_DYNAMIC_C…
    0.23      1      0.23    81.60    6.73   41.10  3mqvkt9as1phq              merge /*+ dynamic_sampling(mm…
    0.22      1      0.22    76.17    0.00   87.96  du2bqfpfj9au2              delete from WRH$_LIBRARYCACHE …
    0.21      1      0.21    73.67   14.92   93.36  b7ba0zt64wtc7              delete from WRH$_RSRC_CONSUMER…
    0.20     44      0.00    71.59    7.68   71.51  as3uq6ggb3gx6              delete from hist_head$ where o…
    0.20     89      0.00    71.52   30.73   92.39  3ktacv9r56b51              select owner#, name, namespace…
    0.20     10      0.02    71.14    0.00   99.04  c6awqs517jpj0              select /*+ index(idl_char$ i_i…
    0.20  1,098      0.00    70.92   23.24   58.47  3c1kubcdjnppq              update sys.col_usage$ set equa…
    0.19     22      0.01    68.12   56.47    8.79  9dy457uf4uxzd  MMON_SLAVE  SELECT /*+ all_rows */ ‘"’||UI…
    0.19      1      0.19    67.57    0.00    0.00  f318xdxdn0pdc              insert into wrh$_log (snap_id,…
    0.19    113      0.00    67.01    0.00   84.34  59vjj34vugaav              delete from obj$ where obj# = …
    0.18     10      0.02    64.60    0.00   96.35  cvn54b7yz0s8u              select /*+ index(idl_ub1$ i_id…
    0.18    375      0.00    63.34   43.38   53.87  95mpkn5xz9001              insert into hist_head$(obj#, i…
    0.18     22      0.01    62.59    0.00   71.91  dr277b6yv83uy  MMON_SLAVE  SELECT COLUMN_NAME FROM DBA_PA…
    0.18    671      0.00    62.17   17.68    0.00  5ms6rbzdnq16t              select job, nvl2(last_date, 1,…
    0.18      1      0.18    61.81    8.89   96.19  7f8bfqbjxdssr              delete from WRH$_LOG tab where…
    0.17      1      0.17    61.43   17.89   95.77  7g732rx16j8jc              insert into WRH$_SERVICE_STAT …
    0.16     48      0.00    56.37   48.74   48.27  4h8danbxj5nz6              begin dbms_utility.validate(:b…
    0.16      1      0.16    54.88   80.11    0.00  dayq182sk41ks              insert into wrh$_memory_target…
    0.15    252      0.00    54.58   30.21   39.65  7ng34ruy5awxq              select i.obj#, i.ts#, i.file#,…
    0.15      1      0.15    54.51   80.65    0.00  bm2pwrpcr8ru6              select sga_size s, sga_size_fa…
    0.15      1      0.15    51.42  106.87    0.00  bunssq950snhf              insert into wrh$_sga_target_ad…
    0.14      1      0.14    49.59    0.00   93.41  084z6qq3wwkdc              delete from WRH$_SQL_WORKAREA_…
    0.14      1      0.14    48.68    0.00   90.79  17g3cqghntwb6              delete from WRH$_PGA_TARGET_AD…
    0.13     22      0.01    44.60    0.00   35.38  7kc5p5bw4uz55              update tabpart$ set dataobj# =…
    0.12      1      0.12    41.02   26.79   91.44  83hpfc7nhjpvg              delete from WRH$_IOSTAT_FILETY…
    0.12     22      0.01    40.72  121.45   16.37  267s83vd49cfa  MMON_SLAVE  SELECT PARTITION_NAME FROM DBA…
    0.11     66      0.00    40.29   13.64   87.09  9fg7fjbwmv7v8              insert into sys.wri$_optstat_t…
    0.11      1      0.11    39.34   13.97   65.25  9xcfxn794wp5h              select 1 from WRH$_LATCH_CHILD…
    0.11      1      0.11    37.29    0.00   98.77  3hy3nssu9shxj              select /*+ FIRST_ROWS(1) PARAL…
    0.10      6      0.02    36.11    0.00   63.61  9wncfacx0nj9h              insert into smon_scn_time (thr…
    0.10      2      0.05    35.13    0.00    1.98  934ur8r7tqbjx              SELECT DBID FROM V$DATABASE
    0.10      1      0.10    33.93    0.00   74.29  1v44r7vam2wbt              delete from WRH$_IOSTAT_FUNCTI…
    0.10      1      0.10    33.68    0.00   96.85  cs78htw2dhb0x              delete from WRH$_IC_DEVICE_STA…
    0.09     66      0.00    33.17    0.00   92.58  gc7b0drtzbyc6              select max(intcol#) from hist_…
    0.09     89      0.00    33.07    0.00   92.64  8swypbbr0m372              select order#, columns, types …
    0.09      1      0.09    32.97   16.66   86.94  8h77nkt8bnukh              delete from WRH$_IOSTAT_DETAIL…
    0.09      1      0.09    31.42   17.49   83.52  7gfhbhxrxx8wf              delete /*+ dynamic_sampling(4)…

This section can help determine which queries may be creating performance issues; using the sql_id and text you can retrieve the plans to see if a rewrite may improve the situation. It can also show where statistics may not be current, although it would also be a good idea to run a 10053 trace against each suspect query to report which statistics the optimizer is using.
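For example, taking a sql_id from the listing above, DBMS_XPLAN can pull the stored plan either from the AWR or from the shared pool (the usual Diagnostic Pack licensing applies to AWR data):

-- plan as captured in the AWR
select * from table(dbms_xplan.display_awr('acc988uzvjmmt'));

-- plan for the cursor still in the shared pool
select * from table(dbms_xplan.display_cursor('acc988uzvjmmt', null, 'TYPICAL'));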

The ASH, or Active Session History, report digs deeper into important wait events and timings, providing a better picture of the wait activity during the requested window; a section of this report is shown below:

                           Sampled
                           # of        %
SQL ID         Planhash    Executions  Activity  Event                         % Event  Top Row Source                  % RwSrc  SQL Text
-------------  ----------  ----------  --------  ----------------------------  -------  ------------------------------  -------  --------
572fbaj0fdw2b  1648993636           1      7.69  CPU + Wait for CPU               5.77  SELECT STATEMENT                   5.77  select output from table(dbms_…
572fbaj0fdw2b  1648993636           1      7.69  db file sequential read          1.92  ** Row Source Not Available **     1.92
6ajkhukk78nsr                       1      3.85  CPU + Wait for CPU               3.85  ** Row Source Not Available **     3.85  begin prvt_hdm.auto_execute( :…
1bx8mgs8by25x  2223429142           1      1.92  CPU + Wait for CPU               1.92  HASH JOIN                          1.92  select coalesce( p1s.parameter…
2asgk01xtb6p0  3702291220           1      1.92  control file sequential read     1.92  FIXED TABLE – FULL                 1.92  select d.dbid dbid , d.name db…
317v5hnvvd49h  1656119837           1      1.92  CPU + Wait for CPU               1.92  UPDATE                             1.92  UPDATE wrh$_seg_stat_obj ob S…

Again, use this report to initially determine if the problem is database-related; if that is the case then this report should provide a wealth of data aimed at directing your database/query tuning efforts. If there is no indication of database problems then other areas are likely at fault. Waits to examine from both reports include SQL*Net more data to client, Disk file operations I/O and os thread startup. The first wait in that list can be an indicator of network problems; the second can point to storage issues (array problems, pending disk failures); the third can signal CPU issues (excessive time for O/S thread startups can indicate over-taxed CPUs, which may be caused by other, non-database processes).

If, after examining reports generated by AWR, ASH or Statspack, you find the database is not the source of the performance issue, you may be able to discern which area outside of the database is the likely suspect. SQL*Net more data to client waits, if the time per wait is relatively high, can indicate problems in the network layer. Utilities such as netstat can show configured network interfaces and network traffic data on those interfaces. Example output is shown below:

$ netstat -ginw
Kernel Interface table
Iface       MTU   Met      RX-OK RX-ERR RX-DRP RX-OVR      TX-OK TX-ERR TX-DRP TX-OVR Flg
bondeth0     1500   0 5084620269      0      0      0 3923352780      0      0      0 BMmRU
bondeth0:1   1500   0      - no statistics available -                                BMmRU
bondeth0:3   1500   0      - no statistics available -                                BMmRU
bondeth0:4   1500   0      - no statistics available -                                BMmRU
eth0         1500   0   42182099      0      0      0   21397203      0      0      0 BMRU
eth4         1500   0     305021      0      0      0     281488      0      0      0 BMsRU
eth5         1500   0 5084315248      0      0      0 3923071292      0      0      0 BMsRU
ib0         64000   0 2419856775      0      1      0 2178061942      3    271      0 BMsRU
ib1         64000   0   21305668      0      0      0          0      0      0      0 BMsRU
lo          16436   0 4495752311      0      0      0 4495752311      0      0      0 LRU
$

The utility reports transmit and receive statistics, including dropped packets and packet errors. Running netstat -ginw may show which local interface is having issues; send such output to the Network Administrator so he or she can interpret the results and take appropriate action.

What if the problem isn’t the database, and it isn’t the network? The next area that can adversely affect performance is the storage tier. Statspack and ASH/AWR reports also provide disk read and write wait information. Again, if the wait time for an individual read or write wait is high (for example, a single wait over 300 milliseconds) it may be the storage that’s at fault. As disks age their performance can deteriorate as the alignment the platter had when new is compromised by wear and tear around the spindle. Firmware can also be an issue, especially when disks are upgraded and the firmware is not. This is not an area for the DBA to assess, but it is one he or she can report to the Storage Administrator for further investigation.

Network-related issues may also be a problem with either the local network interface card (NIC) or the cabling. This is, again, a task for the Network Administrator to assess and address, but you may be able to ‘point’ him or her in the right direction with output from the ping utility. Provided you have two or more systems that can ‘see’ the same endpoint a ping comparison may show network problems local to the database server. For example if server A can ping server C with response times less than 20 milliseconds and server B, using the same subnet, reports ping times in the 100 millisecond or greater range this could indicate either cabling issues or a NIC that may be misconfigured or failing. This isn’t likely to be valid if the two servers are on different subnets which is why it’s necessary to get the Network Administrator involved as soon as you suspect a network issue. He or she can provide needed configuration data and can better analyze and diagnose such problems.
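As a rough sketch of that comparison (the host name and timings below are placeholders, not real measurements), run the same ping from each server and compare the rtt averages:

$ ping -c 10 serverc
...
--- serverc ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9012ms
rtt min/avg/max/mdev = 0.214/0.347/0.578/0.101 ms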

And yet another area where performance can suffer is at the operating system level. A system starved for CPU, memory or both because of a temporarily high load can severely slow down database processes along with many other non-database-related services. The System Administrator may need to be called in on the issue to report what processes were running during a particular period when response time was slow. Remember that the end users equate performance with response time; a large, resource-intensive report that runs once a month can seriously slow down other database activity during that same period. System backups can also consume resources normally available for user and database processes. Knowing the period of poor performance can greatly assist the System Administrator investigating the issue. If it’s a defined window, for example from 8 PM to Midnight on Tuesdays, the System Administrator can track down the process causing the slowdown by monitoring the system activity during that period. It may be a process that can’t be stopped or rescheduled, but at least the end users will know what is causing their normally fast transactions to run much slower than usual. As mentioned before, when you know it’s not a database issue get the Administrators of the suspected systems involved, because the earlier you get more eyes on the problem the faster the cause can be found and a solution put in place.

The DBA is the ‘go-to’ person when the database is to blame for performance issues (long run times for specific queries, for example). But when the DBA finds nothing awry in the database, other parties may need to get involved. Again, just because it seems to be the database at fault doesn’t mean that the database IS at fault; it could be other, related areas affecting performance. Knowing which database reports can be of use is the first step in being able to assess a situation so other areas, such as network and storage, can be brought in if need be. Knowing who to involve is just as important as knowing that a performance issue exists. Communication is key in these situations; a network or storage administrator may ask for additional information regarding the problem, and this is where the AWR and/or Statspack reports may be helpful. Learn how to generate these reports now (if you don’t already know) so that when you need them you can concentrate on the issues at hand rather than on how to generate a report.

Performance is a term that can have a broad interpretation, depending upon who is doing the interpreting. End users may more often report performance issues when a report that ran fine yesterday now takes much longer to complete because all they may see is their process running slowly. Knowing when the problem is the database and when it isn’t is key to getting such issues diagnosed and resolved. Hopefully this article has given you some tools to help in that regard; use them wisely and communicate well so that you can be part of the solution by knowing who else to involve when it isn’t the database that is at fault.

Now, on with the show!
