Oracle Tips and Tricks — David Fitzjarrell

March 1, 2017

Return To Sender

Filed under: General — dfitzjarrell @ 16:06

"The most important reason for going from one place to another is to see what's in between."
-- Norton Juster, The Phantom Tollbooth

Recently in an Oracle forum a question resurfaced regarding enabling row movement for tables. The posted question, from five years ago, asked if row movement was safe and if there could be any ‘undesired impact on application data’. The answer to the first part of that question is ‘yes’ (it’s safe because Oracle, except under extreme conditions, won’t lose your data) and the answer to the second part is also ‘yes’. That may be confusing so let’s look at what the end result could be.

Data rows are uniquely identified by a construct known far and wide as the ROWID. ROWIDs contain a wealth of information as to the location of a given row; the file number, the block number and row number are all encoded in this curious value. Updates can change pretty much everything in a row except the ROWID and primary key values (and, yes, there’s a way to change PK values but it involves deleting and inserting the row — Oracle does this when, for some bizarre reason known only to the user making the change, a PK value is updated). The ONLY way to change a ROWID value is to physically move the row, which is what enabling row movement will allow. This is undesirable for the reasons listed below:


	* Applications coded to store ROWID values can fail as the data that was once in Location A is now in Location B.
	* Indexes will become invalid or unusable, requiring that they be rebuilt.
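As an aside, the components encoded in a ROWID can be unpacked with the DBMS_ROWID package; a minimal sketch (the table name is illustrative):

select dbms_rowid.rowid_relative_fno(rowid) file_no,
       dbms_rowid.rowid_block_number(rowid) block_no,
       dbms_rowid.rowid_row_number(rowid)   row_no
  from my_table
 where rownum = 1;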

Storing ROWID values in application tables isn’t the wisest of choices a developer can make. Exporting from the source and importing into a new destination will render those stored ROWID values useless. Cloning the database via RMAN will do the same thing since ROWID values are unique only within the database where they are generated; they do not transport across servers or platforms. Consider two imaginary countries, Blaggoflerp and Snormaflop. Each is unique in geography so that locations found in Blaggoflerp are not found in Snormaflop, with the reverse also being true. If the traveler has but one map, of Blaggoflerp, and tries to use that to navigate Snormaflop, our traveler will become hopelessly lost and confused. Enable row movement on a table where indexes are present, an application stores ROWIDs for easy data access, or both, and Oracle starts singing that old Elvis Presley hit, written by Winfield Scott, “Return To Sender”:


Return to sender, address unknown.
No such person, no such zone.

Don’t misunderstand: the data is STILL in the table, it’s just been moved from its original location and left no forwarding address. It’s possible that new data now occupies the space where that trusty old row used to live, so the application doesn’t break but instead returns unexpected results because the values that were once at that location are no longer there. And any indexes that referenced that row’s original ROWID are now invalidated, making them useless until manual intervention is employed to rebuild them.
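To see the relocation in action, here is a minimal sketch, assuming a tablespace using automatic segment space management (the table name and sizes are illustrative):

create table t_move ( id number primary key, filler varchar2(100) );

insert into t_move
  select level, rpad('x',100,'x')
    from dual
 connect by level <= 10000;
commit;

-- record the current physical address of one row
select rowid from t_move where id = 10000;

-- free space at the front of the segment, then let Oracle move rows
delete from t_move where id <= 9000;
commit;

alter table t_move enable row movement;
alter table t_move shrink space;

-- the same logical row may now report a different ROWID
select rowid from t_move where id = 10000;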


"Since you got here by not thinking, it seems reasonable to expect that, in order to get out, you must start thinking."
-- Norton Juster, The Phantom Tollbooth

Maybe it’s not that the DBA didn’t think about the act before he or she did it; it might simply be that he or she didn’t think far enough ahead about the results of such an act to make a reasonable choice. Changes to a database can affect downstream processing; failing to consider the ripple effect of such changes can be disastrous, indeed. It isn’t enough in this day and age to treat the database as a ‘lone ranger’; many systems can depend on a single database, and essentially haphazard changes can stop them in their tracks.

There may be times when enabling row movement is necessary; changing a partitioning key is one of them. Granted, making such changes on a partitioned table will be part of a larger outage window where the local and global indexes can be maintained, so the impact will be positive, not negative. Absent such tasks (ones where row movement would be necessary) it’s not recommended to enable row movement as it will certainly break things, especially things no one was expecting because of a lack of knowledge of the affected systems.
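That legitimate case looks something like this; a sketch assuming a table range-partitioned on SALE_DATE (the names are illustrative):

-- without row movement an update that crosses a partition boundary fails:
-- ORA-14402: updating partition key column would cause a partition change
alter table sales enable row movement;

update sales
   set sale_date = add_months(sale_date, 12)
 where sale_id = 42;
commit;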

It’s not always good to go travelling.

January 27, 2017

“Back It Up!!!”

Filed under: General — dfitzjarrell @ 14:21

“Expectations is the place you must always go to before you get to where you're going.” 
― Norton Juster, The Phantom Tollbooth   

In a recent post in the Oracle General database forum the following question was asked:


Hi,

I have 3 schema's of almost equal sizes, off which only two are creating the backup's (Hot backup). One of the schema's is not creating 
any backup, by not creating I mean, the backup file that is generated size is too small than the other two files.

The size of the other two backup files is almost 20 GB while the third one is only 54 Bytes !!!

Below are the commands I am using for backup,

alter tablespace SCHEMA begin backup; ( DB level )
tar cpvf - SCHEMA_TS | compress -f > backup.tar.Z ( OS level )

The DB (Oracle Database 11g) is in Archive log mode and no error is being thrown while running the above commands.

Could you please help me in solving this issue.

Any reference related to this would also be of great help.

Thanks in advance !!

There are issues with that sequence of statements, the first being calling it a ‘backup’. It’s highly likely that, after the tablespace files are restored, recovery will fail and the database will be left in an unusable state. The obvious omission is the archivelogs: nowhere in that command sequence is there any statement using tar to copy the archivelogs generated before, during and after that ‘backup’ is completed; apparently the entire script was not posted to the thread, so any additional steps it might execute were not available to view. Since no recovery testing is reported (if such a script exists its contents were not presented), it’s very possible that this ‘backup’ is taken on faith, and unfortunately faith isn’t going to be of much help here.

Yet another problem is the lack of any query to determine the actual datafiles associated with the given tablespace; a ‘backup’ missing that important information means that not all required datafiles will be copied, making the tablespace incomplete and Oracle unable to recover it. This again leads to a down database with no hope of opening.
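Identifying those datafiles is a simple dictionary query; a minimal sketch (the tablespace name is illustrative):

select file_name
  from dba_data_files
 where tablespace_name = 'SCHEMA';

Any file in that list that doesn’t make it into the tar archive renders the whole set unrestorable.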

It was suggested several times in the thread that the poster stop using this ‘backup’ and move to RMAN to create dependable, reliable and recoverable backups. Why this method was in use was explained with this post:


I am new to Oracle DB and the guy who worked before me, wrote a (backup) script where he just created a tar of the table space files.

which leads one to wonder how this DBA thought he or she would restore these tablespaces to a useful and usable state should the time come. The poster added:


I want to use RMAN but thought of solving this issuse first (worst case scenario) and then create a new backup script using RMAN.

Honestly this is not the problem that needs to be solved; the problem is generating a reliable backup, and RMAN has been proven time and again as the tool for that job. Further discussion led to the realization that not all files were being sent to tar, which explained the size discrepancy but didn’t truly address the recoverability issue. Anyone can take a so-called ‘backup’ using any number of tools and operating system utilities; it’s restoring and recovering from those ‘backups’ that tells the tale of success or failure, and failure in restoring and recovering a production database isn’t an option.
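For comparison, a minimal RMAN sketch, assuming the database runs in archivelog mode and channels and retention are configured to site standards (this is an illustration, not the poster’s script):

run {
  backup database plus archivelog;
  backup current controlfile;
}

A backup taken this way captures the datafiles, the archivelogs and the controlfile in one pass, and it can be exercised with RESTORE DATABASE VALIDATE long before a real recovery is ever needed.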

Sometimes you don’t get what you expect.

December 21, 2016

“Do You Have A Key?”

Filed under: General — dfitzjarrell @ 11:27

“Don't you know anything at all about numbers?"
"Well, I don't think they're very important," snapped Milo, too embarrassed to admit the truth.
"NOT IMPORTANT!" roared the Dodecahedron, turning red with fury. "Could you have tea for two without the two — or three blind mice 
without the three? Would there be four corners of the earth if there weren't a four? And how would you sail the seven seas without a seven?"
"All I meant was—" began Milo, but the Dodecahedron, overcome with emotion and shouting furiously, carried right on.
"If you had high hopes, how would you know how high they were? And did you know that narrow escapes come in all different widths? 
Would you travel the whole wide world without ever knowing how wide it was? And how could you do anything at long last," he concluded, 
waving his arms over his head, "without knowing how long the last was? Why, numbers are the most beautiful and valuable things in the world. 
Just follow me and I'll show you." He turned on his heel and stalked off into the cave.” 
-- Norton Juster, The Phantom Tollbooth

How best to generate a primary key is a discussion that seems to never get an answer suitable to everyone. Sequences are a popular choice, as are other forms of generated or “artificial” keys. In addition Oracle provides a function named sys_guid() that generates globally unique identifiers which can be used for a number of purposes, one of which is, as one might expect, a generated primary key. Occasionally threads in the Oracle forums debate the merits and issues of using such global unique identifiers as primary keys; one current discussion asked if using sys_guid() was faster or slower than using a sequence, and on a Linux system generating sys_guid() values was shown to be faster. The example code, slightly modified, is presented here as run on Oracle 12.1.0.2 on Windows. Both 12c configurations were used (standard and container) to see if any differences appeared, and since both runs provided similar results the non-container results are shown here. Timing was set on to record and display the elapsed time for each set of serial tests; parallel tests were also completed and log table entries report the elapsed time for those runs. Separate tables were used for each set of tests; the results produced are shown below. The test begins with a table and a sequence created with the default settings:
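For reference, sys_guid() returns a RAW(16) value, displayed as 32 hexadecimal digits; a quick illustration (the output differs on every call):

select sys_guid() from dual;

That RAW(16) return type is why the sys_guid() test table later in this post defines its primary key column as raw(16).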


SQL> -- default 20 cache
SQL> create sequence seq1;

Sequence created.

SQL>
SQL> create table t_seq
  2  	 ( id	  number(9) primary key
  3  	 , filler varchar2(1000)
  4  	 );

Table created.

SQL>

The next step inserts 999,999 records into the t_seq table; the execution plan and run statistics are shown below:


SQL> insert into t_seq
  2    select seq1.nextval ,'sdfsf' from dual connect by level < 1000000;

999999 rows created.

Elapsed: 00:00:10.19

Execution Plan
----------------------------------------------------------
Plan hash value: 3365622274

--------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | INSERT STATEMENT               |       |     1 |     2   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL       | T_SEQ |       |            |          |
|   2 |   SEQUENCE                     | SEQ1  |       |            |          |
|*  3 |    CONNECT BY WITHOUT FILTERING|       |       |            |          |
|   4 |     FAST DUAL                  |       |     1 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(LEVEL<1000000)


Statistics
----------------------------------------------------------
      50149  recursive calls
     227827  db block gets
      59085  consistent gets
          0  physical reads
  113671864  redo size
        855  bytes sent via SQL*Net to client
        893  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
     999999  rows processed

SQL>
SQL> drop table t_seq;

Table dropped.

SQL> drop sequence seq1;

Sequence dropped.

SQL>

The amount of redo generated is large, a result of using the sequence. Another sequence test was executed using a sequence created with a larger cache value. Before each run the user was re-connected to reset the session statistics:


SQL> connect bing/!@#!@#
Connected.
SQL> create sequence seq1 cache 10000;

Sequence created.

SQL>
SQL> create table t_seq
  2  	 ( id	  number(9) primary key
  3  	 , filler varchar2(1000)
  4  	 );

Table created.

SQL>

The same insert statement was executed using the sequence having the larger cache; the execution plan and session statistics are shown below:


SQL> insert into t_seq
  2    select seq1.nextval ,'sdfsf' from dual connect by level < 1000000;

999999 rows created.

Elapsed: 00:00:05.24

Execution Plan
----------------------------------------------------------
Plan hash value: 3365622274

--------------------------------------------------------------------------------
| Id  | Operation                      | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | INSERT STATEMENT               |       |     1 |     2   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL       | T_SEQ |       |            |          |
|   2 |   SEQUENCE                     | SEQ1  |       |            |          |
|*  3 |    CONNECT BY WITHOUT FILTERING|       |       |            |          |
|   4 |     FAST DUAL                  |       |     1 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(LEVEL<1000000)


Statistics
----------------------------------------------------------
        249  recursive calls
      77911  db block gets
       9188  consistent gets
          1  physical reads
   79744836  redo size
        854  bytes sent via SQL*Net to client
        893  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
     999999  rows processed

SQL>

Using the larger sequence cache reduced the redo size by 33,927,028 bytes, which cut the execution time roughly in half. On to the sys_guid() portion of the serial testing, with a new table created and a new connection established:


SQL> connect bing/!@#!@#
Connected.
SQL> create table t_raw
  2  	 ( id	  raw(16) primary key
  3  	 , filler varchar2(1000)
  4  	 );

Table created.

SQL>
SQL> insert into t_raw
  2    select sys_guid(),'sdfsf' from dual connect by level < 1000000;

999999 rows created.

Elapsed: 00:00:54.15

Execution Plan
----------------------------------------------------------
Plan hash value: 1236776825

-------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | INSERT STATEMENT              |       |     1 |     2   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL      | T_RAW |       |            |          |
|*  2 |   CONNECT BY WITHOUT FILTERING|       |       |            |          |
|   3 |    FAST DUAL                  |       |     1 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(LEVEL<1000000)


Statistics
----------------------------------------------------------
       1442  recursive calls
    2956342  db block gets
      23736  consistent gets
         13  physical reads
  375573628  redo size
        854  bytes sent via SQL*Net to client
        890  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
         34  sorts (memory)
          0  sorts (disk)
     999999  rows processed

SQL>

A benefit of using the sequence with the larger cache size is that the redo size is less than a quarter of that generated using the sys_guid() call. Additionally, on Windows, the execution took more than 10 times longer than using a sequence with a large cache value. Returning to the default cache size for the sequence, a PL/SQL loop is used to return the generated values of the sequence; since serveroutput is not turned on the time to display the values isn’t included in the execution time (and as a result the values aren’t shown). The elapsed time to run the block is found at the end of the execution as well as the redo statistics for the session:



SQL> connect bing/!@#!@#
Connected.
SQL> create sequence seq1 ;

Sequence created.

SQL>
SQL> set timing on
SQL> declare
  2    x number(38);
  3    function sf return number is
  4    begin
  5  	 return seq1.nextval;
  6    end;
  7  begin
  8    for i in 1..100000 loop
  9  	 x := sf;
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.23
SQL>
SQL> select m.statistic#, n.name, m.value
  2  from v$mystat m, v$statname n
  3  where m.statistic# = n.statistic#
  4  and n.name like '%redo%'
  5  and m.value > 0;

STATISTIC# NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       257 redo entries                                                          10014
       258 redo size                                                           3620676
       302 redo subscn max counts                                                    1
       307 redo synch time (usec)                                                  251
       308 redo synch time overhead (usec)                                         553
       309 redo synch time overhead count (  2ms)                                    2
       314 redo synch writes                                                         3
       323 redo write info find                                                      2

8 rows selected.

Elapsed: 00:00:00.00

SQL>

The reported redo size is smaller, but there were no inserts into a table performed in this test. A similar test was run using a sequence with a cache value of 10000:


SQL> drop sequence seq1;

Sequence dropped.

SQL> connect bing/!@#!@#
Connected.
SQL> create sequence seq1 cache 10000;

Sequence created.

SQL>
SQL> set timing on
SQL> declare
  2    x number(38);
  3    function sf return number is
  4    begin
  5  	 return seq1.nextval;
  6    end;
  7  begin
  8    for i in 1..100000 loop
  9  	 x := sf;
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.19
SQL>
SQL> select m.statistic#, n.name, m.value
  2  from v$mystat m, v$statname n
  3  where m.statistic# = n.statistic#
  4  and n.name like '%redo%'
  5  and m.value > 0;

STATISTIC# NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       257 redo entries                                                             34
       258 redo size                                                             11940
       307 redo synch time (usec)                                                  110
       308 redo synch time overhead (usec)                                      303802
       309 redo synch time overhead count (  2ms)                                    1
       313 redo synch time overhead count (inf)                                      1
       314 redo synch writes                                                         3
       323 redo write info find                                                      2

8 rows selected.

Elapsed: 00:00:00.00
SQL>

As shown in a prior test the redo statistics show a smaller redo size for the larger cache. On to the sys_guid() test:


SQL> connect bing/!@#!@#
Connected.
SQL> declare
  2    x raw(16);
  3    function sf return varchar2 is
  4    begin
  5  	 return sys_guid();
  6    end;
  7  begin
  8    for i in 1..100000 loop
  9  	 x := sf;
 10    end loop;
 11  end;
 12  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:04.70
SQL>
SQL> select m.statistic#, n.name, m.value
  2  from v$mystat m, v$statname n
  3  where m.statistic# = n.statistic#
  4  and n.name like '%redo%'
  5  and m.value > 0;

STATISTIC# NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       257 redo entries                                                              6
       258 redo size                                                              1476
       302 redo subscn max counts                                                    1
       307 redo synch time (usec)                                                    1
       308 redo synch time overhead (usec)                                         575
       309 redo synch time overhead count (  2ms)                                    1
       314 redo synch writes                                                         1
       323 redo write info find                                                      1

8 rows selected.

Elapsed: 00:00:00.00
SQL>

Now, absent the insert, the redo generation is much less. The execution time, however, is at least twice as long as it was for the large-cache sequence, so on a Windows-based Oracle installation using a sequence takes much less time than calling sys_guid(). Parallel execution might be faster and the results might be reversed, so further testing is necessary. These tests begin with a sequence created with the default cache value but starting at 100000000 (to more accurately reflect real production conditions) and multiple tables created with the CACHE option, which places their blocks at the most-recently-used end of the buffer cache to speed access and delay aging:


SQL> -- SEQUENCE MULTI SESSION TEST
SQL> drop table t_seq;

Table dropped.

Elapsed: 00:00:00.02
SQL> drop table tmp1;

Table dropped.

Elapsed: 00:00:00.01
SQL> drop table tmp2;

Table dropped.

Elapsed: 00:00:00.01
SQL> drop table tmp3;

Table dropped.

Elapsed: 00:00:00.01
SQL> drop table tmp4;

Table dropped.

Elapsed: 00:00:00.01
SQL>
SQL> create table tmp1 cache as select 1 dummy from dual connect by level < 1000000;

Table created.

SQL> create table tmp2 cache as select 1 dummy from tmp1;

Table created.

Elapsed: 00:00:00.54
SQL> create table tmp3 cache as select 1 dummy from tmp1;

Table created.

Elapsed: 00:00:00.61
SQL> create table tmp4 cache as select 1 dummy from tmp1;

Table created.

Elapsed: 00:00:00.49
SQL>
SQL> drop sequence seq1 ;

Sequence dropped.

Elapsed: 00:00:00.00
SQL> create sequence seq1 start with 100000000 ;

Sequence created.

Elapsed: 00:00:00.00
SQL>
SQL> create table t_seq
  2  	 ( id	  number(9) primary key
  3  	 , filler varchar2(1000)
  4  	 );

Table created.

Elapsed: 00:00:00.01
SQL>
SQL> alter system switch logfile;

System altered.

Elapsed: 00:00:01.29
SQL> alter system checkpoint;

System altered.

Elapsed: 00:00:00.20
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.13
SQL>
SQL> select /*+ full(tmp1) */ count(*) from tmp1;

  COUNT(*)
----------
    999999

Elapsed: 00:00:00.05
SQL> select /*+ full(tmp2) */ count(*) from tmp2;

  COUNT(*)
----------
    999999

Elapsed: 00:00:00.04
SQL> select /*+ full(tmp3) */ count(*) from tmp3;

  COUNT(*)
----------
    999999

Elapsed: 00:00:00.04
SQL> select /*+ full(tmp4) */ count(*) from tmp4;

  COUNT(*)
----------
    999999

Elapsed: 00:00:00.04
SQL>
SQL> drop table tmp_log;

Table dropped.

Elapsed: 00:00:00.04
SQL> create table tmp_log(mydata varchar2(4000), optime timestamp);

Table created.

Elapsed: 00:00:00.01
SQL>
SQL> create or replace PROCEDURE    sp_log(p varchar2) as
  2    PRAGMA AUTONOMOUS_TRANSACTION;
  3  begin
  4    insert into tmp_log values (p , systimestamp);
  5    commit;
  6  end;
  7  /

Procedure created.

Elapsed: 00:00:00.03
SQL>
SQL> show errors
No errors.
SQL>
SQL> create or replace procedure sp_test_seq(p number) as
  2  begin
  3    sp_log('JOB ' || p || ' BEGIN');
  4
  5    if p = 1 then
  6  	 insert  into t_seq
  7  	   select seq1.nextval ,'sdfsf' from tmp1;
  8    elsif p = 2 then
  9  	 insert into t_seq
 10  	   select seq1.nextval ,'sdfsf' from tmp2;
 11    elsif p = 3 then
 12  	 insert  into t_seq
 13  	   select seq1.nextval ,'sdfsf' from tmp3;
 14    elsif p = 4 then
 15  	 insert into t_seq
 16  	   select seq1.nextval ,'sdfsf' from tmp4;
 17    end if;
 18    commit;
 19
 20    sp_log('JOB ' || p || ' END');
 21  end;
 22  /

Procedure created.

Elapsed: 00:00:00.02
SQL>
SQL> show errors
No errors.
SQL>
SQL> declare
  2    x_time date := sysdate + 1/1440;
  3  begin
  4
  5
  6    dbms_scheduler.create_job(job_name => 'TEST_SEQ1',
  7  				 job_type => 'PLSQL_BLOCK',
  8  				 job_action => 'begin sp_test_seq(1); end;',
  9  				 enabled=> true,
 10  				 start_date=> x_time
 11  			       );
 12    dbms_scheduler.create_job(job_name => 'TEST_SEQ2',
 13  				 job_type => 'PLSQL_BLOCK',
 14  				 job_action => 'begin sp_test_seq(2); end;',
 15  				 enabled=> true,
 16  				 start_date=> x_time
 17  			       );
 18    dbms_scheduler.create_job(job_name => 'TEST_SEQ3',
 19  				 job_type => 'PLSQL_BLOCK',
 20  				 job_action => 'begin sp_test_seq(3); end;',
 21  				 enabled=> true,
 22  				 start_date=> x_time
 23  			       );
 24    dbms_scheduler.create_job(job_name => 'TEST_SEQ4',
 25  				 job_type => 'PLSQL_BLOCK',
 26  				 job_action => 'begin sp_test_seq(4); end;',
 27  				 enabled=> true,
 28  				 start_date=> x_time
 29  			       );
 30  end;
 31  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.02
SQL>
SQL> select job_name, start_date from user_scheduler_jobs where job_name like 'TEST%';

JOB_NAME     START_DATE
------------ ---------------------------------------------------------------------------
TEST_SEQ1    27-NOV-16 01.46.47.000000 PM -07:00
TEST_SEQ2    27-NOV-16 01.46.47.000000 PM -07:00
TEST_SEQ3    27-NOV-16 01.46.47.000000 PM -07:00
TEST_SEQ4    27-NOV-16 01.46.47.000000 PM -07:00

Elapsed: 00:00:00.00
SQL>
SQL> exec dbms_lock.sleep(120)

PL/SQL procedure successfully completed.

Elapsed: 00:02:00.00
SQL>
SQL> select * from
  2  (select mydata, optime, lead(optime) over (order by mydata) optime_end, lead(optime) over (order by mydata) - optime elapsed
  3  from tmp_log)
  4  where mydata like '%BEGIN%'
  5  /

MYDATA          OPTIME                       OPTIME_END                   ELAPSED
--------------- ---------------------------- ---------------------------- ----------------------------
JOB 1 BEGIN     27-NOV-16 01.46.50.233000 PM 27-NOV-16 01.47.28.113000 PM +000000000 00:00:37.880000
JOB 2 BEGIN     27-NOV-16 01.46.50.234000 PM 27-NOV-16 01.47.27.904000 PM +000000000 00:00:37.670000
JOB 3 BEGIN     27-NOV-16 01.46.50.235000 PM 27-NOV-16 01.47.28.169000 PM +000000000 00:00:37.934000
JOB 4 BEGIN     27-NOV-16 01.46.50.244000 PM 27-NOV-16 01.47.28.121000 PM +000000000 00:00:37.877000

Elapsed: 00:00:00.00
SQL>

Parallel execution takes longer on Windows, possibly because of the underlying architecture, but each of the four concurrent inserts consumed about 38 seconds. To see if concurrent processes using the sys_guid() call might end up faster we set up the test again, this time using the sys_guid() call:


SQL> -- SYS_GUID MULTI SESSION TEST
SQL> drop table t_raw;

Table dropped.

Elapsed: 00:00:00.01
SQL> drop table tmp1;

Table dropped.

Elapsed: 00:00:00.01
SQL> drop table tmp2;

Table dropped.

Elapsed: 00:00:00.01
SQL> drop table tmp3;

Table dropped.

Elapsed: 00:00:00.01
SQL> drop table tmp4;

Table dropped.

Elapsed: 00:00:00.01
SQL>
SQL> create table tmp1 cache as select 1 dummy from dual connect by level < 1000000;

Table created.

SQL> create table tmp2 cache as select 1 dummy from tmp1;

Table created.

Elapsed: 00:00:00.62
SQL> create table tmp3 cache as select 1 dummy from tmp1;

Table created.

Elapsed: 00:00:00.57
SQL> create table tmp4 cache as select 1 dummy from tmp1;

Table created.

Elapsed: 00:00:00.48
SQL>
SQL> create table t_raw
  2  	 ( id	  raw(16) primary key
  3  	 , filler varchar2(1000)
  4  	 );

Table created.

Elapsed: 00:00:00.01
SQL>
SQL> alter system switch logfile;

System altered.

Elapsed: 00:00:03.02
SQL> alter system checkpoint;

System altered.

Elapsed: 00:00:00.17
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.34
SQL>
SQL> select /*+ full(tmp1) */ count(*) from tmp1; -- to make sure table is in buffer_cache
  2  select /*+ full(tmp2) */ count(*) from tmp2; -- to make sure table is in buffer_cache
  3  select /*+ full(tmp3) */ count(*) from tmp3; -- to make sure table is in buffer_cache
  4  select /*+ full(tmp4) */ count(*) from tmp4; -- to make sure table is in buffer_cache
  5
SQL> drop table tmp_log;

Table dropped.

Elapsed: 00:00:00.03
SQL> create table tmp_log(mydata varchar2(4000), optime timestamp);

Table created.

Elapsed: 00:00:00.01
SQL>
SQL> create or replace PROCEDURE    sp_log(p varchar2) as
  2    PRAGMA AUTONOMOUS_TRANSACTION;
  3  begin
  4    insert into tmp_log values (p , systimestamp);
  5    commit;
  6  end;
  7  /

Procedure created.

Elapsed: 00:00:00.03
SQL>
SQL> show errors
No errors.
SQL>
SQL> create or replace procedure sp_test_guid(p number) as
  2  begin
  3    sp_log('JOB ' || p || ' BEGIN');
  4
  5    if p = 1 then
  6  	 insert  into t_raw
  7  	   select sys_guid() ,'sdfsf' from tmp1;
  8    elsif p = 2 then
  9  	 insert  into t_raw
 10  	   select sys_guid() ,'sdfsf' from tmp2;
 11    elsif p = 3 then
 12  	 insert  into t_raw
 13  	   select sys_guid() ,'sdfsf' from tmp3;
 14    elsif p = 4 then
 15  	 insert into t_raw
 16  	   select sys_guid() ,'sdfsf' from tmp4;
 17    end if;
 18    commit;
 19
 20    sp_log('JOB ' || p || ' END');
 21  end;
 22  /

Procedure created.

Elapsed: 00:00:00.02
SQL>
SQL> show errors
No errors.
SQL>
SQL> declare
  2    x_time date := sysdate + 1/1440;
  3  begin
  4
  5    dbms_scheduler.create_job(job_name => 'TEST_GUID1',
  6  				 job_type => 'PLSQL_BLOCK',
  7  				 job_action => 'begin sp_test_guid(1); end;',
  8  				 enabled=> true,
  9  				 start_date=> x_time
 10  			       );
 11    dbms_scheduler.create_job(job_name => 'TEST_GUID2',
 12  				 job_type => 'PLSQL_BLOCK',
 13  				 job_action => 'begin sp_test_guid(2); end;',
 14  				 enabled=> true,
 15  				 start_date=> x_time
 16  			       );
 17    dbms_scheduler.create_job(job_name => 'TEST_GUID3',
 18  				 job_type => 'PLSQL_BLOCK',
 19  				 job_action => 'begin sp_test_guid(3); end;',
 20  				 enabled=> true,
 21  				 start_date=> x_time
 22  			       );
 23    dbms_scheduler.create_job(job_name => 'TEST_GUID4',
 24  				 job_type => 'PLSQL_BLOCK',
 25  				 job_action => 'begin sp_test_guid(4); end;',
 26  				 enabled=> true,
 27  				 start_date=> x_time
 28  			       );
 29  end;
 30  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
SQL>
SQL> select job_name, start_date from user_scheduler_jobs where job_name like 'TEST%';

JOB_NAME     START_DATE
------------ ---------------------------------------------------------------------------
TEST_GUID1   27-NOV-16 01.48.53.000000 PM -07:00
TEST_GUID2   27-NOV-16 01.48.53.000000 PM -07:00
TEST_GUID3   27-NOV-16 01.48.53.000000 PM -07:00
TEST_GUID4   27-NOV-16 01.48.53.000000 PM -07:00

Elapsed: 00:00:00.00
SQL>
SQL> exec dbms_lock.sleep(180);

PL/SQL procedure successfully completed.

Elapsed: 00:03:00.00
SQL>
SQL> select * from
  2  (select mydata, optime, lead(optime) over (order by mydata) optime_end, lead(optime) over (order by mydata) - optime elapsed
  3  from tmp_log)
  4  where mydata like '%BEGIN%'
  5  /

MYDATA          OPTIME                       OPTIME_END                   ELAPSED
--------------- ---------------------------- ---------------------------- ----------------------------
JOB 1 BEGIN     27-NOV-16 01.48.54.228000 PM 27-NOV-16 01.50.49.312000 PM +000000000 00:01:55.084000
JOB 2 BEGIN     27-NOV-16 01.48.54.236000 PM 27-NOV-16 01.50.46.200000 PM +000000000 00:01:51.964000
JOB 3 BEGIN     27-NOV-16 01.48.54.245000 PM 27-NOV-16 01.50.47.742000 PM +000000000 00:01:53.497000
JOB 4 BEGIN     27-NOV-16 01.48.54.267000 PM 27-NOV-16 01.50.48.966000 PM +000000000 00:01:54.699000

Elapsed: 00:00:00.00
SQL>
SQL> set timing off echo off linesize 80 trimspool off

Table dropped.


Table dropped.


Sequence dropped.

Generating sys_guid() values is not faster when run in parallel in a Windows environment; each process ran almost two minutes before completing, roughly three times longer than the parallel sequence executions and twice as long as the serial sys_guid() run.

The redo size using the sys_guid() call and insert statements was consistent regardless of the operating system used (Windows or Linux) and was larger than that generated using a sequence and insert statements; absent any DML the sys_guid() call generated far less redo. Sequence cache size can affect the redo generation, as a larger cache generates smaller amounts of redo, and the redo generation in turn affects the execution time with a sequence. Without fail, on Windows, using sys_guid() takes longer. This is one area where testing on the operating system you are using is a must, since Oracle on Linux can, and does, perform much differently than Oracle on Windows. Before you decide to change your primary key strategy to sys_guid(), test to see how it performs; you may be surprised at the results.

It would seem that 42 isn’t the only important number.

November 21, 2016

Taking Things For Granted

Filed under: General — dfitzjarrell @ 09:54

"Sometimes I find the best way of getting from one place to another is simply to erase everything and begin again."
-- Norton Juster, The Phantom Tollbooth

In one of the Oracle forums a question was asked regarding revoking selected privileges from the DBA role. Unfortunately for the person posting the question the answer is a resounding “No”; granting a role grants all privileges assigned to that role and there is no “picking and choosing” as if you were in a cafeteria. Roles are designed (or should be, at least) to grant all necessary privileges a user would need to access objects that role will use. And roles supplied by Oracle are designed for the jobs they are named after, such as DBA. Changing the role affects functionality and can seriously impact those granted that role. Let’s look at that in a bit more detail.

Oracle provides pre-configured roles with every installation of the database, and the list can vary based on the options you choose to install. A partial list of these roles from 12.1.0.2 is shown below (remember, this is not an exhaustive list):
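A query along these lines would produce that list (the column formatting is assumed):

column role format a30

select role, oracle_maintained o
  from dba_roles;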


ROLE                           O
------------------------------ -
CONNECT                        Y
RESOURCE                       Y
DBA                            Y
AUDIT_ADMIN                    Y
AUDIT_VIEWER                   Y
SELECT_CATALOG_ROLE            Y
EXECUTE_CATALOG_ROLE           Y
DELETE_CATALOG_ROLE            Y
CAPTURE_ADMIN                  Y
EXP_FULL_DATABASE              Y
IMP_FULL_DATABASE              Y
CDB_DBA                        Y
PDB_DBA                        Y
RECOVERY_CATALOG_OWNER         Y
LOGSTDBY_ADMINISTRATOR         Y
DBFS_ROLE                      Y
GSMUSER_ROLE                   Y
AQ_ADMINISTRATOR_ROLE          Y
AQ_USER_ROLE                   Y
DATAPUMP_EXP_FULL_DATABASE     Y
DATAPUMP_IMP_FULL_DATABASE     Y
ADM_PARALLEL_EXECUTE_TASK      Y
PROVISIONER                    Y
XS_RESOURCE                    Y
XS_SESSION_ADMIN               Y
XS_NAMESPACE_ADMIN             Y
XS_CACHE_ADMIN                 Y
GATHER_SYSTEM_STATISTICS       Y
OPTIMIZER_PROCESSING_RATE      Y
GSMADMIN_ROLE                  Y
RECOVERY_CATALOG_USER          Y
EM_EXPRESS_BASIC               Y
EM_EXPRESS_ALL                 Y
SCHEDULER_ADMIN                Y
HS_ADMIN_SELECT_ROLE           Y
HS_ADMIN_EXECUTE_ROLE          Y
HS_ADMIN_ROLE                  Y
GLOBAL_AQ_USER_ROLE            Y
OEM_ADVISOR                    Y
OEM_MONITOR                    Y
XDBADMIN                       Y
XDB_SET_INVOKER                Y
AUTHENTICATEDUSER              Y
XDB_WEBSERVICES                Y
XDB_WEBSERVICES_WITH_PUBLIC    Y
XDB_WEBSERVICES_OVER_HTTP      Y
GSM_POOLADMIN_ROLE             Y
GDS_CATALOG_SELECT             Y
WM_ADMIN_ROLE                  Y
JAVAUSERPRIV                   Y
JAVAIDPRIV                     Y
JAVASYSPRIV                    Y
JAVADEBUGPRIV                  Y
EJBCLIENT                      Y
JMXSERVER                      Y
JAVA_ADMIN                     Y
JAVA_DEPLOY                    Y
CTXAPP                         Y
ORDADMIN                       Y
OLAP_XS_ADMIN                  Y
OLAP_DBA                       Y
OLAP_USER                      Y
SPATIAL_WFS_ADMIN              Y
WFS_USR_ROLE                   Y
SPATIAL_CSW_ADMIN              Y
CSW_USR_ROLE                   Y
LBAC_DBA                       Y
APEX_ADMINISTRATOR_ROLE        Y
APEX_GRANTS_FOR_NEW_USERS_ROLE Y
DV_SECANALYST                  Y
DV_MONITOR                     Y
DV_ADMIN                       Y
DV_OWNER                       Y
DV_ACCTMGR                     Y
DV_PUBLIC                      Y
DV_PATCH_ADMIN                 Y
DV_STREAMS_ADMIN               Y
DV_GOLDENGATE_ADMIN            Y
DV_XSTREAM_ADMIN               Y
DV_GOLDENGATE_REDO_ACCESS      Y
DV_AUDIT_CLEANUP               Y
DV_DATAPUMP_NETWORK_LINK       Y
DV_REALM_RESOURCE              Y
DV_REALM_OWNER                 Y

The ‘O’ header is for the ORACLE_MAINTAINED column, which indicates the role is supplied by Oracle. [This is a new column in the DBA_ROLES view for 12.1; earlier releases do not have this column in the view definition.] That list has 84 different roles, all created when your database was created. What privileges do these roles have? That’s a question answered by the ROLE_SYS_PRIVS and ROLE_TAB_PRIVS views; let’s look at the DBA role and see what Oracle deems as necessary system privileges to be an effective DBA:
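The system privilege list below presumably comes from a query much like this sketch:

select privilege
  from role_sys_privs
 where role = 'DBA';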


PRIVILEGE
----------------------------------------
CREATE SESSION
ALTER SESSION
DROP TABLESPACE
BECOME USER
DROP ROLLBACK SEGMENT
SELECT ANY TABLE
INSERT ANY TABLE
UPDATE ANY TABLE
DROP ANY INDEX
SELECT ANY SEQUENCE
CREATE ROLE
EXECUTE ANY PROCEDURE
ALTER PROFILE
CREATE ANY DIRECTORY
CREATE ANY LIBRARY
EXECUTE ANY LIBRARY
ALTER ANY INDEXTYPE
DROP ANY INDEXTYPE
DEQUEUE ANY QUEUE
EXECUTE ANY EVALUATION CONTEXT
EXPORT FULL DATABASE
CREATE RULE
ALTER ANY SQL PROFILE
ADMINISTER ANY SQL TUNING SET
CHANGE NOTIFICATION
DROP ANY EDITION
DROP ANY MINING MODEL
ALTER ANY MINING MODEL
ALTER ANY CUBE DIMENSION
CREATE CUBE
DROP ANY CUBE BUILD PROCESS
USE ANY SQL TRANSLATION PROFILE
CREATE PLUGGABLE DATABASE
ALTER ROLLBACK SEGMENT
DELETE ANY TABLE
ALTER DATABASE
FORCE ANY TRANSACTION
ALTER ANY PROCEDURE
DROP ANY TRIGGER
DROP ANY MATERIALIZED VIEW
UNDER ANY TYPE
ALTER ANY LIBRARY
CREATE DIMENSION
DEBUG ANY PROCEDURE
CREATE RULE SET
ALTER ANY RULE SET
ANALYZE ANY DICTIONARY
ALTER ANY EDITION
CREATE ANY ASSEMBLY
ALTER ANY CUBE
SELECT ANY CUBE
DROP ANY MEASURE FOLDER
RESTRICTED SESSION
CREATE TABLESPACE
ALTER TABLESPACE
CREATE USER
ALTER USER
LOCK ANY TABLE
CREATE VIEW
DROP ANY VIEW
GRANT ANY ROLE
CREATE TRIGGER
CREATE TYPE
EXECUTE ANY OPERATOR
CREATE ANY DIMENSION
ALTER ANY DIMENSION
CREATE ANY OUTLINE
ADMINISTER DATABASE TRIGGER
RESUMABLE
FLASHBACK ANY TABLE
CREATE ANY RULE SET
EXECUTE ANY RULE SET
IMPORT FULL DATABASE
EXECUTE ANY RULE
EXECUTE ANY PROGRAM
CREATE ANY EDITION
CREATE ASSEMBLY
ALTER ANY ASSEMBLY
CREATE CUBE DIMENSION
CREATE ANY CUBE BUILD PROCESS
UPDATE ANY CUBE DIMENSION
EM EXPRESS CONNECT
SET CONTAINER
ALTER ANY MEASURE FOLDER
CREATE ANY TABLE
CREATE ANY INDEX
CREATE ANY SEQUENCE
ALTER ANY ROLE
ANALYZE ANY
DROP ANY LIBRARY
CREATE ANY OPERATOR
CREATE INDEXTYPE
UNDER ANY TABLE
DROP ANY DIMENSION
SELECT ANY DICTIONARY
GRANT ANY OBJECT PRIVILEGE
CREATE EVALUATION CONTEXT
CREATE ANY EVALUATION CONTEXT
DROP ANY EVALUATION CONTEXT
CREATE ANY RULE
CREATE JOB
CREATE ANY JOB
CREATE MINING MODEL
INSERT ANY CUBE DIMENSION
DROP ANY CUBE
UPDATE ANY CUBE BUILD PROCESS
EXEMPT DML REDACTION POLICY
READ ANY TABLE
ALTER SYSTEM
AUDIT SYSTEM
CREATE ROLLBACK SEGMENT
DROP ANY TABLE
COMMENT ANY TABLE
REDEFINE ANY TABLE
CREATE CLUSTER
ALTER ANY INDEX
DROP PUBLIC DATABASE LINK
CREATE PROFILE
ALTER ANY MATERIALIZED VIEW
ALTER ANY TYPE
DROP ANY TYPE
UNDER ANY VIEW
EXECUTE ANY INDEXTYPE
DROP ANY CONTEXT
ALTER ANY OUTLINE
ADMINISTER RESOURCE MANAGER
MANAGE SCHEDULER
MANAGE FILE GROUP
CREATE ANY MINING MODEL
SELECT ANY MINING MODEL
CREATE ANY MEASURE FOLDER
DELETE ANY MEASURE FOLDER
CREATE ANY SQL TRANSLATION PROFILE
CREATE ANY CREDENTIAL
EXEMPT DDL REDACTION POLICY
SELECT ANY MEASURE FOLDER
SELECT ANY CUBE BUILD PROCESS
ALTER ANY CUBE BUILD PROCESS
CREATE TABLE
BACKUP ANY TABLE
CREATE ANY CLUSTER
DROP ANY SYNONYM
DROP PUBLIC SYNONYM
CREATE ANY VIEW
CREATE SEQUENCE
ALTER ANY SEQUENCE
FORCE TRANSACTION
CREATE PROCEDURE
CREATE ANY PROCEDURE
ALTER RESOURCE COST
DROP ANY DIRECTORY
CREATE ANY TYPE
ALTER ANY OPERATOR
CREATE ANY INDEXTYPE
ENQUEUE ANY QUEUE
ON COMMIT REFRESH
DEBUG CONNECT SESSION
DROP ANY RULE SET
EXECUTE ANY CLASS
MANAGE ANY FILE GROUP
EXECUTE ANY ASSEMBLY
EXECUTE ASSEMBLY
COMMENT ANY MINING MODEL
CREATE ANY CUBE DIMENSION
DELETE ANY CUBE DIMENSION
SELECT ANY CUBE DIMENSION
DROP ANY SQL TRANSLATION PROFILE
CREATE CREDENTIAL
ALTER ANY TABLE
DROP ANY CLUSTER
CREATE SYNONYM
CREATE PUBLIC SYNONYM
DROP ANY SEQUENCE
DROP ANY ROLE
AUDIT ANY
DROP ANY PROCEDURE
CREATE ANY TRIGGER
ALTER ANY TRIGGER
DROP PROFILE
GRANT ANY PRIVILEGE
CREATE LIBRARY
CREATE OPERATOR
DROP ANY OUTLINE
MERGE ANY VIEW
ADMINISTER SQL TUNING SET
UPDATE ANY CUBE
INSERT ANY MEASURE FOLDER
ADMINISTER SQL MANAGEMENT OBJECT
CREATE SQL TRANSLATION PROFILE
LOGMINING
MANAGE TABLESPACE
DROP USER
ALTER ANY CLUSTER
CREATE ANY SYNONYM
CREATE DATABASE LINK
CREATE PUBLIC DATABASE LINK
CREATE MATERIALIZED VIEW
CREATE ANY MATERIALIZED VIEW
EXECUTE ANY TYPE
DROP ANY OPERATOR
QUERY REWRITE
GLOBAL QUERY REWRITE
MANAGE ANY QUEUE
CREATE ANY CONTEXT
ALTER ANY EVALUATION CONTEXT
ALTER ANY RULE
DROP ANY RULE
ADVISOR
SELECT ANY TRANSACTION
DROP ANY SQL PROFILE
CREATE ANY SQL PROFILE
READ ANY FILE GROUP
CREATE EXTERNAL JOB
DROP ANY ASSEMBLY
DROP ANY CUBE DIMENSION
CREATE ANY CUBE
CREATE MEASURE FOLDER
CREATE CUBE BUILD PROCESS
ALTER ANY SQL TRANSLATION PROFILE
FLASHBACK ARCHIVE ADMINISTER

220 system privileges are required to be an effective DBA for an Oracle database, an impressive list, indeed. Of course, once a user is granted the DBA role he or she gets ALL of those system privileges, and since the role is the only direct grant providing those privileges the list cannot be modified by selectively revoking one or more of them:


SQL> grant DBA to blorpo identified by gussyflorp;

Grant succeeded.

SQL> revoke select any transaction from blorpo;
revoke select any transaction from blorpo
*
ERROR at line 1:
ORA-01952: system privileges not granted to 'BLORPO'


SQL>

Yes, the user DOES have that privilege, albeit indirectly. It’s indirect because it’s the ROLE that was granted that privilege, among others, and no attempt was made to revoke the role from the user. It’s a ‘package deal’; you grant a role to a user and it’s all or nothing, and even though it behaves like the user has the privileges granted directly that’s not the case.
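The data dictionary confirms the indirection; a minimal sketch of the two views involved (run as a privileged user):

select privilege
  from dba_sys_privs
 where grantee = 'BLORPO';   -- likely no rows: nothing granted directly

select granted_role
  from dba_role_privs
 where grantee = 'BLORPO';   -- DBA: the privileges arrive through the role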

You could, of course, get all of the privileges the DBA role has (both system and table) and create a script to grant each individual privilege to the desired user. It would be a LONG script, and such grants require attention from the DBA granting them to ensure they are both current and not being abused. In that case individual privileges can be revoked, which would be a maintenance nightmare for the DBA having to keep track of which user has which set of privileges. Another option presents itself: creating a new role with only the privileges the DBA wants to assign to a user. The privilege list for DBA could be shortened to create, say, a DB_OPERATOR or DBO role. Such privileges would depend upon the job description; creating such a role would make granting such access easier and make maintenance simpler, since when the role grants are changed those who are granted that role have their privileges adjusted the next time they log in.
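A minimal sketch of such a reduced role; the privilege list here is purely illustrative and would be driven by the actual job description:

create role db_operator;

grant create session        to db_operator;
grant select any dictionary to db_operator;
grant manage tablespace     to db_operator;
grant alter system          to db_operator;

grant db_operator to blorpo;

Adjusting the role later (granting to or revoking from DB_OPERATOR) changes what every grantee can do without touching the individual accounts.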

Roles make granting privileges very easy and straightforward, provided the role is properly created and maintained. Roles also make it impossible to “pick and choose” privileges a user should have. It’s an “all or nothing” proposition and there’s no way around that when using a pre-defined role.

Sometimes you just need to begin again.

November 17, 2016

“That STILL Ain’t Right”

Filed under: General — dfitzjarrell @ 07:45

"If you want sense, you'll have to make it yourself."
- Norton Juster, The Phantom Tollbooth

In a recent post to an Oracle forum a query was presented and a tuning request was made. It appears that the query was taking 20 hours to complete. Through further interrogation and responses it was discovered that the dates were being stored in a VARCHAR2 column and implicit date conversion was being used. To show how much of a problem this can cause the following example was created; notice the results returned and the execution plans generated for each run of the query, once with the table defined in the manner the original poster described and once with the date column using the DATE datatype. We begin:


SQL>
SQL> create table datetst(
  2  myid    number,
  3  res_value	varchar2(3),
  4  mydt    varchar2(20));

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..10000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', to_char(sysdate+i, 'MM/DD/RRRR'));
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', to_char(sysdate+i, 'MM/DD/RRRR'));
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>

Let’s now run a query using conditions similar to the posted query and see what Oracle returns:


SQL>
SQL>
SQL> select *
  2  from datetst
  3  where mydt <= sysdate +230;
where mydt <= sysdate +230
      *
ERROR at line 3:
ORA-01843: not a valid month


SQL>

As expected the implicit date conversion failed; modifying the query to explicitly convert the date to a string for comparison produces ‘interesting’ results:


SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= to_char(sysdate +230, 'MM/DD/RRRR');

      MYID RES MYDT
---------- --- --------------------
       920 PUR 05/26/2019
       943 PUR 06/18/2019
      1150 PUR 01/11/2020
      1173 PUR 02/03/2020
      1196 PUR 02/26/2020
      1219 PUR 03/20/2020
      1242 PUR 04/12/2020
      1265 PUR 05/05/2020
      1288 PUR 05/28/2020
      1311 PUR 06/20/2020
        46 PUR 01/02/2017
        69 PUR 01/25/2017
        92 PUR 02/17/2017
       115 PUR 03/12/2017
       138 PUR 04/04/2017
       161 PUR 04/27/2017
       184 PUR 05/20/2017
       207 PUR 06/12/2017
       230 PUR 07/05/2017
       414 PUR 01/05/2018
       437 PUR 01/28/2018
       460 PUR 02/20/2018
       483 PUR 03/15/2018
       506 PUR 04/07/2018
       529 PUR 04/30/2018
       552 PUR 05/23/2018
       575 PUR 06/15/2018
       782 PUR 01/08/2019
       805 PUR 01/31/2019
       828 PUR 02/23/2019
       851 PUR 03/18/2019
       874 PUR 04/10/2019
       897 PUR 05/03/2019
      2622 PUR 01/22/2024
      2645 PUR 02/14/2024
      2668 PUR 03/08/2024
      2691 PUR 03/31/2024
      2714 PUR 04/23/2024
      3335 PUR 01/04/2026
      3358 PUR 01/27/2026
      3381 PUR 02/19/2026
      3404 PUR 03/14/2026
      3427 PUR 04/06/2026
      3450 PUR 04/29/2026
      3473 PUR 05/22/2026
      3496 PUR 06/14/2026
      3703 PUR 01/07/2027
      3726 PUR 01/30/2027
      3749 PUR 02/22/2027
      3772 PUR 03/17/2027
      3795 PUR 04/09/2027
      3818 PUR 05/02/2027
      3841 PUR 05/25/2027
      3864 PUR 06/17/2027
      2737 PUR 05/16/2024
      2760 PUR 06/08/2024
      2783 PUR 07/01/2024
      2967 PUR 01/01/2025
      2990 PUR 01/24/2025
      3013 PUR 02/16/2025
      3036 PUR 03/11/2025
      3059 PUR 04/03/2025
      3082 PUR 04/26/2025
      3105 PUR 05/19/2025
      3128 PUR 06/11/2025
      3151 PUR 07/04/2025
      1518 PUR 01/13/2021
      1541 PUR 02/05/2021
      1564 PUR 02/28/2021
      1587 PUR 03/23/2021
      1610 PUR 04/15/2021
      1633 PUR 05/08/2021
      1656 PUR 05/31/2021
      1679 PUR 06/23/2021
      1886 PUR 01/16/2022
      1909 PUR 02/08/2022
      1932 PUR 03/03/2022

      MYID RES MYDT
---------- --- --------------------
      1955 PUR 03/26/2022
      1978 PUR 04/18/2022
      2001 PUR 05/11/2022
      2024 PUR 06/03/2022
      2047 PUR 06/26/2022
      2254 PUR 01/19/2023
      2277 PUR 02/11/2023
      2300 PUR 03/06/2023
      2323 PUR 03/29/2023
      2346 PUR 04/21/2023
      2369 PUR 05/14/2023
      2392 PUR 06/06/2023
      2415 PUR 06/29/2023
      5543 PUR 01/21/2032
      5566 PUR 02/13/2032
      5589 PUR 03/07/2032
      5612 PUR 03/30/2032
      5635 PUR 04/22/2032
      5658 PUR 05/15/2032
      5681 PUR 06/07/2032
      5704 PUR 06/30/2032
      5911 PUR 01/23/2033
      5934 PUR 02/15/2033
      5957 PUR 03/10/2033
      5980 PUR 04/02/2033
      6003 PUR 04/25/2033
      6026 PUR 05/18/2033
      4071 PUR 01/10/2028
      4094 PUR 02/02/2028
      4117 PUR 02/25/2028
      4140 PUR 03/19/2028
      4163 PUR 04/11/2028
      4186 PUR 05/04/2028
      4209 PUR 05/27/2028
      4232 PUR 06/19/2028
      4439 PUR 01/12/2029
      4462 PUR 02/04/2029
      4485 PUR 02/27/2029
      4508 PUR 03/22/2029
      4531 PUR 04/14/2029
      4554 PUR 05/07/2029
      4577 PUR 05/30/2029
      4600 PUR 06/22/2029
      4807 PUR 01/15/2030
      4830 PUR 02/07/2030
      4853 PUR 03/02/2030
      4876 PUR 03/25/2030
      4899 PUR 04/17/2030
      4922 PUR 05/10/2030
      4945 PUR 06/02/2030
      4968 PUR 06/25/2030
      5175 PUR 01/18/2031
      5198 PUR 02/10/2031
      5221 PUR 03/05/2031
      5244 PUR 03/28/2031
      5267 PUR 04/20/2031
      5290 PUR 05/13/2031
      5313 PUR 06/05/2031
      5336 PUR 06/28/2031
      6992 PUR 01/09/2036
      7015 PUR 02/01/2036
      7038 PUR 02/24/2036
      7061 PUR 03/18/2036
      7084 PUR 04/10/2036
      7107 PUR 05/03/2036
      7130 PUR 05/26/2036
      7153 PUR 06/18/2036
      7866 PUR 06/01/2038
      7889 PUR 06/24/2038
      8096 PUR 01/17/2039
      8119 PUR 02/09/2039
      8142 PUR 03/04/2039
      8165 PUR 03/27/2039
      8188 PUR 04/19/2039
      8211 PUR 05/12/2039
      8234 PUR 06/04/2039
      8257 PUR 06/27/2039

      MYID RES MYDT
---------- --- --------------------
      7360 PUR 01/11/2037
      7383 PUR 02/03/2037
      7406 PUR 02/26/2037
      7429 PUR 03/21/2037
      7452 PUR 04/13/2037
      7475 PUR 05/06/2037
      7498 PUR 05/29/2037
      7521 PUR 06/21/2037
      7728 PUR 01/14/2038
      7751 PUR 02/06/2038
      7774 PUR 03/01/2038
      7797 PUR 03/24/2038
      7820 PUR 04/16/2038
      7843 PUR 05/09/2038
      6049 PUR 06/10/2033
      6072 PUR 07/03/2033
      6256 PUR 01/03/2034
      6279 PUR 01/26/2034
      6302 PUR 02/18/2034
      6325 PUR 03/13/2034
      6348 PUR 04/05/2034
      6371 PUR 04/28/2034
      6394 PUR 05/21/2034
      6417 PUR 06/13/2034
      6624 PUR 01/06/2035
      6647 PUR 01/29/2035
      6670 PUR 02/21/2035
      6693 PUR 03/16/2035
      6716 PUR 04/08/2035
      6739 PUR 05/01/2035
      6762 PUR 05/24/2035
      6785 PUR 06/16/2035
      9982 PUR 03/17/2044
      8464 PUR 01/20/2040
      8487 PUR 02/12/2040
      8510 PUR 03/06/2040
      8533 PUR 03/29/2040
      8556 PUR 04/21/2040
      8579 PUR 05/14/2040
      8602 PUR 06/06/2040
      8625 PUR 06/29/2040
      8832 PUR 01/22/2041
      8855 PUR 02/14/2041
      8878 PUR 03/09/2041
      8901 PUR 04/01/2041
      8924 PUR 04/24/2041
      8947 PUR 05/17/2041
      8970 PUR 06/09/2041
      8993 PUR 07/02/2041
      9177 PUR 01/02/2042
      9200 PUR 01/25/2042
      9223 PUR 02/17/2042
      9246 PUR 03/12/2042
      9269 PUR 04/04/2042
      9292 PUR 04/27/2042
      9315 PUR 05/20/2042
      9338 PUR 06/12/2042
      9545 PUR 01/05/2043
      9568 PUR 01/28/2043
      9591 PUR 02/20/2043
      9614 PUR 03/15/2043
      9637 PUR 04/07/2043
      9660 PUR 04/30/2043
      9683 PUR 05/23/2043
      9706 PUR 06/15/2043
      9913 PUR 01/08/2044
      9936 PUR 01/31/2044
      9959 PUR 02/23/2044

222 rows selected.

SQL>

The query should have returned no more than 10 rows that met the criteria, yet it returned 222. Looking at the plan we see:


SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6qatrtphp5wjt, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <=
to_char(sysdate +230, 'MM/DD/RRRR')

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      |   222 |  6216 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX |   434 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=TO_CHAR(SYSDATE@!+230,'MM/DD/RRRR'))
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


25 rows selected.

SQL>

Oracle found 222 rows that ‘matched’ the conditions, illustrating a problem of using an incorrect datatype; Oracle can’t know these are dates and compares them as character strings, creating a result set that is much larger than it should be.
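A minimal illustration of that character-by-character comparison (the values are illustrative):

select case
          when '12/31/2016' <= '05/26/2019' then 'earlier'
          else 'later'
       end as string_compare
  from dual;

The query returns ‘later’: since ‘1’ sorts after ‘0’, the chronologically earlier date compares as the ‘later’ string, which is exactly the kind of mismatch scattered through the 222-row result set above. We drop the table and start over: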


SQL>
SQL> drop table datetst purge;

Table dropped.

SQL>
SQL> create table datetst(
  2  myid    number,
  3  res_value varchar2(3),
  4  mydt    date);

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..10000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', sysdate+i);
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', sysdate+i);
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>

We now run the original query (without the explicit conversion, since it is no longer needed) and examine the results:


SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= sysdate +230;

      MYID RES MYDT
---------- --- ---------
        23 PUR 10-DEC-16
        46 PUR 02-JAN-17
        69 PUR 25-JAN-17
        92 PUR 17-FEB-17
       115 PUR 12-MAR-17
       138 PUR 04-APR-17
       161 PUR 27-APR-17
       184 PUR 20-MAY-17
       207 PUR 12-JUN-17
       230 PUR 05-JUL-17

10 rows selected.

SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0m2c2mv7zhx49, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <= sysdate +230

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      |    10 |   250 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX |   434 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=SYSDATE@!+230)
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


24 rows selected.

SQL>

Oracle now found the 10 rows we sought using the conditions we specified because the date data was correctly stored as a DATE datatype. Using VARCHAR2 returned 222 rows instead of 10, a 2,120 percent increase, and that was for a 10,000-row table. Let’s re-run the example with 1,000,000 rows and see what numbers Oracle produces:


SQL>
SQL> create table datetst(
  2  myid    number,
  3  res_value	varchar2(3),
  4  mydt    varchar2(20));

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..1000000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', to_char(sysdate+i, 'MM/DD/RRRR'));
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', to_char(sysdate+i, 'MM/DD/RRRR'));
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>
SQL> select *
  2  from datetst
  3  where mydt <= sysdate +230;
where mydt <= sysdate +230
      *
ERROR at line 3:
ORA-01843: not a valid month


SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= to_char(sysdate +230, 'MM/DD/RRRR');

      MYID RES MYDT
---------- --- --------------------
       920 PUR 05/26/2019
       943 PUR 06/18/2019
      1150 PUR 01/11/2020
      1173 PUR 02/03/2020
      1196 PUR 02/26/2020
      1219 PUR 03/20/2020
      1242 PUR 04/12/2020
      1265 PUR 05/05/2020
      1288 PUR 05/28/2020
      1311 PUR 06/20/2020
        46 PUR 01/02/2017
        69 PUR 01/25/2017
        92 PUR 02/17/2017
       115 PUR 03/12/2017
       138 PUR 04/04/2017
       161 PUR 04/27/2017
       184 PUR 05/20/2017
       207 PUR 06/12/2017
       230 PUR 07/05/2017
       414 PUR 01/05/2018
       437 PUR 01/28/2018
       460 PUR 02/20/2018
       483 PUR 03/15/2018
       506 PUR 04/07/2018
       529 PUR 04/30/2018
       552 PUR 05/23/2018
       575 PUR 06/15/2018
       782 PUR 01/08/2019
       805 PUR 01/31/2019
       828 PUR 02/23/2019
       851 PUR 03/18/2019
       874 PUR 04/10/2019
       897 PUR 05/03/2019
      2622 PUR 01/22/2024
      2645 PUR 02/14/2024
      2668 PUR 03/08/2024
      2691 PUR 03/31/2024
      2714 PUR 04/23/2024
      3335 PUR 01/04/2026
      3358 PUR 01/27/2026
      3381 PUR 02/19/2026
      3404 PUR 03/14/2026
      3427 PUR 04/06/2026
      3450 PUR 04/29/2026
      3473 PUR 05/22/2026
      3496 PUR 06/14/2026
      3703 PUR 01/07/2027
      3726 PUR 01/30/2027
      3749 PUR 02/22/2027
      3772 PUR 03/17/2027
      3795 PUR 04/09/2027
      3818 PUR 05/02/2027
      3841 PUR 05/25/2027
      3864 PUR 06/17/2027
      2737 PUR 05/16/2024
      2760 PUR 06/08/2024
      2783 PUR 07/01/2024
      2967 PUR 01/01/2025
      2990 PUR 01/24/2025
      3013 PUR 02/16/2025
      3036 PUR 03/11/2025
      3059 PUR 04/03/2025
      3082 PUR 04/26/2025
      3105 PUR 05/19/2025
      3128 PUR 06/11/2025
      3151 PUR 07/04/2025
      1518 PUR 01/13/2021
      1541 PUR 02/05/2021
      1564 PUR 02/28/2021
      1587 PUR 03/23/2021
      1610 PUR 04/15/2021
      1633 PUR 05/08/2021
      1656 PUR 05/31/2021
      1679 PUR 06/23/2021
      1886 PUR 01/16/2022
      1909 PUR 02/08/2022
      1932 PUR 03/03/2022
...
    986309 PUR 04/21/4717
    986332 PUR 05/14/4717
    986355 PUR 06/06/4717
    986378 PUR 06/29/4717
    990955 PUR 01/09/4730
    990978 PUR 02/01/4730
    991001 PUR 02/24/4730
    991024 PUR 03/19/4730
    995394 PUR 03/06/4742
    995417 PUR 03/29/4742
    995440 PUR 04/21/4742
    995463 PUR 05/14/4742
    995486 PUR 06/06/4742
    995509 PUR 06/29/4742
    986585 PUR 01/22/4718
    986608 PUR 02/14/4718
    986631 PUR 03/09/4718
    986654 PUR 04/01/4718
    986677 PUR 04/24/4718
    991047 PUR 04/11/4730
    991070 PUR 05/04/4730
    991093 PUR 05/27/4730
    991116 PUR 06/19/4730
    991323 PUR 01/12/4731
    995716 PUR 01/22/4743
    995739 PUR 02/14/4743
    995762 PUR 03/09/4743
    995785 PUR 04/01/4743
    995808 PUR 04/24/4743
    995831 PUR 05/17/4743
    995854 PUR 06/09/4743
    995877 PUR 07/02/4743

22054 rows selected.

SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6qatrtphp5wjt, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <=
to_char(sysdate +230, 'MM/DD/RRRR')

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |   158 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      | 25757 |   704K|   158   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX | 49112 |       |   158   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=TO_CHAR(SYSDATE@!+230,'MM/DD/RRRR'))
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


25 rows selected.

SQL>
SQL> drop table datetst purge;

Table dropped.

SQL>

SQL> create table datetst(
  2  myid    number,
  3  res_value varchar2(3),
  4  mydt    date);

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..1000000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', sysdate+i);
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', sysdate+i);
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= sysdate +230;

      MYID RES MYDT
---------- --- ---------
        23 PUR 10-DEC-16
        46 PUR 02-JAN-17
        69 PUR 25-JAN-17
        92 PUR 17-FEB-17
       115 PUR 12-MAR-17
       138 PUR 04-APR-17
       161 PUR 27-APR-17
       184 PUR 20-MAY-17
       207 PUR 12-JUN-17
       230 PUR 05-JUL-17

10 rows selected.

SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0m2c2mv7zhx49, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <= sysdate +230

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |   148 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      |    35 |   875 |   148   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX | 46482 |       |   148   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=SYSDATE@!+230)
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


24 rows selected.

SQL>

With 1,000,000 rows of data Oracle inflated the original 10-row result set to 22,054 rows, a whopping 220,440 percent increase. As data volumes grow the result set will balloon along with them, producing extremely long query times and vast numbers of incorrect results, something I doubt the original poster had counted on.

Using the correct datatype is essential to ensuring Oracle can do its job properly and return reliable result sets. Storing data in a format that doesn’t reflect the actual data type can be disastrous, as illustrated here. It pays, when writing or purchasing applications, to verify that the proper datatype is in force for the columns being used.
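
For systems already storing string ‘dates’ the cure need not be drastic; below is a minimal sketch of one conversion approach (the column names match this example, and the statements assume every stored value really is a valid ‘MM/DD/RRRR’ string):

-- add a proper DATE column, convert the strings, then swap the columns
alter table datetst add (mydt_dt date);
update datetst set mydt_dt = to_date(mydt, 'MM/DD/RRRR');
commit;
alter table datetst drop column mydt;
alter table datetst rename column mydt_dt to mydt;

Any indexes or constraints on the old column would need to be re-created, and TO_DATE will raise an error on any stray value that isn’t a valid date, which is, of course, part of the problem being fixed.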

It only makes sense.

November 14, 2016

“That’s Not What I Wanted.”

Filed under: General — dfitzjarrell @ 11:40

"Let me try once more," Milo said in an effort to explain. "In other words--"
"You mean you have other words?" cried the bird happily. "Well, by all means, 
use them. You're certainly not doing very well with the ones you have now."
-- Norton Juster, The Phantom Tollbooth

A recent question in the Oracle database forums has a user wanting to open his physical standby database in READ WRITE mode but still keep it as a physical standby. He found this blog post, followed the instructions and did, indeed, get what used to be his standby database opened in READ WRITE mode. What the blogger failed to mention was that his instructions performed a failover of the standby to primary, and did so in a manner that Oracle does not recommend.

To be honest there isn’t a way to open a physical standby database READ WRITE, keep the primary as a primary and maintain the physical standby as a physical standby. Active Data Guard allows the physical standby to be opened READ ONLY while continuing to apply redo to keep the standby in synch with the primary. (This option has extra cost as it requires additional licensing.) Once a failover has occurred the only recourse is to rebuild the standby database; it can’t be simply ‘brought back’ at the will and whim of the DBA.
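
For those licensed for Active Data Guard the READ ONLY route looks roughly like this on the standby (a sketch, not a substitute for the documentation):

alter database recover managed standby database cancel;
alter database open read only;
alter database recover managed standby database using current logfile disconnect from session;

The standby remains a physical standby, redo apply continues and queries can run against it; what it will not do is accept writes.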

What was suggested by the author of that blog post was to use:


alter database activate standby database;

which WILL eventually open the standby READ WRITE, failing it over to be the new primary in the process. Yet Oracle, in the documentation, clearly does not recommend using this option:


ACTIVATE [PHYSICAL|LOGICAL] STANDBY DATABASE [FINISH APPLY]

Performs a failover. The standby database must be mounted before it can be activated with this statement.

Note: Do not use the ALTER DATABASE ACTIVATE STANDBY DATABASE statement to failover because it causes
data loss. Instead, use the following best practices:

For physical standby databases, use the ALTER DATABASE RECOVER MANAGED STANDBY DATABASE statement 
with the FINISH keyword to perform the role transition as quickly as possible with little or no data loss
and without rendering other standby databases unusable.

For logical standby databases, use the ALTER DATABASE PREPARE TO SWITCHOVER and
ALTER DATABASE COMMIT TO SWITCHOVER statements.

Notice the warning given by Oracle:


Note: Do not use the ALTER DATABASE ACTIVATE STANDBY DATABASE statement to failover because it causes
data loss.
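
When a failover truly is intended, because the primary is lost, the documented route for a physical standby is closer to this sketch:

alter database recover managed standby database finish;
alter database commit to switchover to primary with session shutdown;
alter database open;

That sequence applies as much of the remaining redo as possible before the role change, which is why Oracle recommends it over ACTIVATE STANDBY DATABASE.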

A far better option, although not permanent, would be to execute a switchover, where the primary and physical standby switch roles for a period of time before restoring the physical standby to its original role as PHYSICAL STANDBY. This takes more work to accomplish but it is reversible. An example of a switchover is shown below:


Ensure that no gaps exist in the redo apply:

SQL> select status, gap_status
  2  from v$archive_dest_status
  3  where dest_id = 2;
 
STATUS    GAP_STATUS
--------- ------------------------
VALID     NO GAP
 
SQL>

Make certain that the temp file name or names on the standby match those on the primary.  On the primary execute:

SQL> select file_name from dba_temp_files;
 
FILE_NAME
--------------------------------------------------------------------------------
/u01/app/oradata/yumpo/temp01.dbf
 
SQL>

and on the standby execute:

SQL> select name from v$tempfile;
 
NAME
--------------------------------------------------------------------------------
/u01/app/oradata/yumpo/temp01.dbf
 
SQL> 

There may be a delay configured for redo apply on the standby; to determine if a delay is configured execute:

SQL> select delay_mins from v$archive_dest where dest_id = 2; 
 
DELAY_MINS
----------
         0
 
SQL>

If DELAY_MINS is greater than 0 the delay can be removed by executing:

 ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;

Check the SWITCHOVER_STATUS on the primary:

SQL> select switchover_status from v$database;
 
SWITCHOVER_STATUS
--------------------
TO STANDBY
 
SQL>

Valid values are TO STANDBY and SESSIONS ACTIVE.  Any other value signals that the redo transport 
isn't functioning and the switchover will fail.  If the status is SESSIONS ACTIVE query the
V$SESSION view for active sessions:

SQL> select sid, process, program
  2  from v$session where type = 'USER';

       SID PROCESS                  PROGRAM
---------- ------------------------ ---------------------------------------
         6 16952:11396              sqlplus.exe
        17 15728                    ORACLE.EXE (J002)
        18 10668                    ORACLE.EXE (J006)
       131 13196                    ORACLE.EXE (J003)
       134 16576:16772              sqlplus.exe
       135 9720                     ORACLE.EXE (J007)
       244 17404                    ORACLE.EXE (J004)
       248 17844                    ORACLE.EXE (J008)
       254 12992                    ORACLE.EXE (J000)
       364 8280                     ORACLE.EXE (J005)
       366 2320                     ORACLE.EXE (J001)

11 rows selected.

SQL>

When everything looks good execute the switchover; if the status is TO STANDBY the command is:

SQL> alter database commit to switchover to physical standby;
 
Database altered.

SQL>

If the status is SESSIONS ACTIVE you should modify the command as shown below:

SQL> alter database commit to switchover to physical standby with session shutdown;
 
Database altered.
 
SQL>

Shut down the 'new' standby and then mount it:

SQL> shutdown immediate
ORA-01507: database not mounted
 
 
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
 
...
Database mounted.
SQL> 

If all has gone according to plan the alert log on the 'new' standby should report:

Successful mount of redo thread 1, with mount id ...
Physical Standby Database mounted.

It's now time to check the 'old' standby for SWITCHOVER_STATUS.  The two valid values are  
TO PRIMARY and SESSIONS ACTIVE:

SQL> select switchover_status from v$database;
 
SWITCHOVER_STATUS
--------------------
TO PRIMARY
 
SQL>

Since no sessions are active the WITH SESSION SHUTDOWN clause will not be necessary:

SQL> alter database commit to switchover to primary;
 
Database altered.
 
SQL>

Open the 'new' primary database:

SQL> alter database open;
 
Database altered.
 
SQL>

Looking in the alert log you should see:

Standby became primary SCN: ...
Switchover: Complete - Database mounted as primary
Completed: alter database commit to switchover to primary

Check the LOG_ARCHIVE_DEST_n parameters and set up a new destination, if necessary:


SQL> alter system set log_archive_dest_2='service="yumpo" LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="yumpo" net_timeout=30 valid_for=(all_logfiles,primary_role)' scope = both;

 
System altered.
 
SQL>

Start the redo apply on the 'new' standby:

SQL> alter database recover managed standby database using current logfile disconnect from session;
 
Database altered.
 
SQL>

It's a good idea to verify that the redo transport is working; execute at least two log switches
on the 'new' primary to prove the redo is being transported:

SQL> alter system switch logfile;
 
System altered.
 
SQL> /
 
System altered.
 
SQL> /
 
System altered.
 
SQL> /
 
System altered.
 
SQL> select status, gap_status from v$archive_dest_status where dest_id = 2;
 
STATUS    GAP_STATUS
--------- ------------------------
VALID     NO GAP
 
SQL>

Presuming the status is VALID and there is no gap the switchover is complete.

To reverse the switchover the same steps are applied, starting at the ‘new’ primary and ending with the ‘new’ standby. Should there be no problems the primary database will be returned to the PRIMARY role and the physical standby to the PHYSICAL STANDBY role. In between the switchover steps the original physical standby will temporarily act as the primary database, opened in READ WRITE mode.

If you have Data Guard Broker configured a switchover is considerably easier to execute; you can read the Oracle documentation for a complete example.
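
As a taste of that, with the Broker configured the entire dance above collapses to a single DGMGRL command (the database name here is hypothetical):

DGMGRL> switchover to 'yumpo_stby'

The Broker performs the status checks, the role transitions and the instance restarts on both databases for you.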

Understanding Data Guard/Standby concepts is essential for knowing what can, and cannot, be done. And blog advice, even from this one, should be investigated before taking any action as the title and text may not accurately reflect what the recommended actions actually accomplish. It would be a shame to put yourself in an unrecoverable situation just because you jumped first and looked later.

In other words …

November 13, 2016

“If You Change This, Right Here …”

Filed under: Performance,stats — dfitzjarrell @ 10:47

"changes are so frightening."
-- Norton Juster, The Phantom Tollbooth

It isn’t unusual for Oracle to expand functionality with each new release; a short list includes AWR, ADDM, the SYSAUX tablespace, Flashback Query, Flashback Database and Edition-based Redefinition. Oracle 12.1 is no different: it provides an interesting performance enhancement, the in-memory column store, which can improve performance by making tables memory-resident and thus reducing the physical I/O load (to simplify the description somewhat). In addition to in-memory access to table data Oracle has introduced a new optimization called “Vector Transformation, including Vector Optimization” for in-memory operations. Even when this new optimization isn’t in use there are new code paths that take advantage of this technology. As a result execution plans can change; let’s look at an example, originally provided by Jonathan Lewis, that illustrates this.

(In-memory is fairly easy to configure. Set inmemory_size to a ‘reasonable’ non-zero value and then restart the database:


SQL> alter system set inmemory_size=500m scope=spfile;

System altered.

SQL>

For a complete reference to in-memory configuration read here.)
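
Once the instance is back up, a quick sanity check, one of several possibilities, confirms the column store was actually allocated:

select pool, alloc_bytes, used_bytes
from v$inmemory_area;

Non-zero ALLOC_BYTES values for the 1MB and 64KB pools show the store is in place and ready for segments to be populated.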

Two tables are created, neither one especially noteworthy, an index is created on each table and extended statistics are generated on the (n1,n2) column grouping:


SQL> 
SQL> create table t1
  2  nologging
  3  as
  4  with generator as (
  5  	     select  --+ materialize
  6  		     rownum id
  7  	     from dual
  8  	     connect by
  9  		     level <= ...
...
SQL> create table t2
  2  nologging
  3  as
  4  with generator as (
  5  	     select  --+ materialize
  6  		     rownum id
  7  	     from dual
  8  	     connect by
  9  		     level <= ...
...
SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname          => user,
  4  		     tabname          => 'T1',
  5  		     method_opt       => 'for columns (n1,n2) size 1'
  6  	     );
  7  end;
  8  /
SQL> 

Executing the following query and displaying the execution plan shows no surprises since neither table has been altered to be in-memory:


SQL> 
SQL> select
  2  	     /*+
  3  		     qb_name(main)
  4  	     */
  5  	     count(*)
  6  from    (
  7  	     select
  8  		     /*+ qb_name(inline) */
  9  		     distinct t1.v1, t2.v1
 10  	     from
 11  		     t1,t2
 12  	     where
 13  		     t1.n1 = 50
 14  	     and     t1.n2 = 50
 15  	     and     t2.rand = t1.id
 16  	     )
 17  ;

      9912                                                                                                                                            
SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  a7gb3ncf922zz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select         /*+                 qb_name(main)         */                                                                                           
count(*) from    (         select                 /*+ qb_name(inline)                                                                                 
*/                 distinct t1.v1, t2.v1         from                                                                                                 
t1,t2         where                 t1.n1 = 50         and     t1.n2 =                                                                                
50         and     t2.rand = t1.id         )                                                                                                          
                                                                                                                                                      
Plan hash value: 1718706536                                                                                                                           
                                                                                                                                                      
-------------------------------------------------------------------------------------------------                                                     
| Id  | Operation                               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |                                                     
-------------------------------------------------------------------------------------------------                                                     
|   0 | SELECT STATEMENT                        |       |       |       |  5127 (100)|          |                                                     
|   1 |  SORT AGGREGATE                         |       |     1 |       |            |          |                                                     
|   2 |   VIEW                                  |       | 10001 |       |  5127   (1)| 00:00:01 |                                                     
|   3 |    HASH UNIQUE                          |       | 10001 |   351K|  5127   (1)| 00:00:01 |                                                     
|*  4 |     HASH JOIN                           |       | 10001 |   351K|  5125   (1)| 00:00:01 |                                                     
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100 |  2100 |     3   (0)| 00:00:01 |                                                     
|*  6 |       INDEX RANGE SCAN                  | T1_N1 |   100 |       |     1   (0)| 00:00:01 |                                                     
|   7 |      TABLE ACCESS FULL                  | T2    |  1000K|    14M|  5113   (1)| 00:00:01 |                                                     
-------------------------------------------------------------------------------------------------                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   4 - access("T2"."RAND"="T1"."ID")                                                                                                                  
   5 - filter("T1"."N2"=50)                                                                                                                           
   6 - access("T1"."N1"=50)                                                                                                                           
                                                                                                                                                      
SQL> 

Having the extended statistics allowed the optimizer to estimate the correct cardinality between columns n1 and n2, providing a better initial plan. Now things will get a bit interesting; table T2 is altered and put in-memory:


SQL> 
SQL> alter table t2 inmemory;
SQL> 

The same query is executed again and the plan changes because of T2 now being in-memory, even though the new vector transformation has not been applied:


SQL> 
SQL> select
  2  	     /*+
  3  		     qb_name(main)
  4  	     */
  5  	     count(*)
  6  from    (
  7  	     select
  8  		     /*+ qb_name(inline) */
  9  		     distinct t1.v1, t2.v1
 10  	     from
 11  		     t1,t2
 12  	     where
 13  		     t1.n1 = 50
 14  	     and     t1.n2 = 50
 15  	     and     t2.rand = t1.id
 16  	     )
 17  ;

      9912                                                                                                                                            
SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  a7gb3ncf922zz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select         /*+                 qb_name(main)         */                                                                                           
count(*) from    (         select                 /*+ qb_name(inline)                                                                                 
*/                 distinct t1.v1, t2.v1         from                                                                                                 
t1,t2         where                 t1.n1 = 50         and     t1.n2 =                                                                                
50         and     t2.rand = t1.id         )                                                                                                          
                                                                                                                                                      
Plan hash value: 106371239                                                                                                                            
                                                                                                                                                      
----------------------------------------------------------------------------------------------------                                                  
| Id  | Operation                                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |                                                  
----------------------------------------------------------------------------------------------------                                                  
|   0 | SELECT STATEMENT                         |         |       |       |   223 (100)|          |                                                  
|   1 |  SORT AGGREGATE                          |         |     1 |       |            |          |                                                  
|   2 |   VIEW                                   |         | 10001 |       |   223  (15)| 00:00:01 |                                                  
|   3 |    HASH UNIQUE                           |         | 10001 |   351K|   223  (15)| 00:00:01 |                                                  
|*  4 |     HASH JOIN                            |         | 10001 |   351K|   222  (14)| 00:00:01 |                                                  
|   5 |      JOIN FILTER CREATE                  | :BF0000 |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1      |   100 |  2100 |     3   (0)| 00:00:01 |                                                  
|*  7 |        INDEX RANGE SCAN                  | T1_N1   |   100 |       |     1   (0)| 00:00:01 |                                                  
|   8 |      JOIN FILTER USE                     | :BF0000 |  1000K|    14M|   209  (11)| 00:00:01 |
|*  9 |       TABLE ACCESS INMEMORY FULL         | T2      |  1000K|    14M|   209  (11)| 00:00:01 |                                                  
----------------------------------------------------------------------------------------------------                                                  
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   4 - access("T2"."RAND"="T1"."ID")                                                                                                                  
   6 - filter("T1"."N2"=50)                                                                                                                           
   7 - access("T1"."N1"=50)                                                                                                                           
   9 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
                                                                                                                                                      
SQL> 

The cost of the tablescan was reduced considerably, from 5113 to 209, but that isn’t the most thought-provoking part of the plan: a serial Bloom filter is used for the hash join thanks to the in-memory code path. With a table stored ‘normally’ (not in-memory) a Bloom filter would not be effective; with the in-memory option a Bloom filter can use enhancements such as simultaneous multiple-probe access into table t2 (in this example), in addition to the benefits of avoiding disk access, disk buffer activity and row-by-row data retrieval. All of those avoidances reduce CPU time considerably when compared to a ‘regular’ table, one that is not in-memory.
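
One caveat when testing: the in-memory plan only pays off once the segment has actually been populated into the column store, which by default happens on the first scan after the ALTER TABLE. A quick check of my own against V$IM_SEGMENTS shows whether T2 has made it in:

select segment_name, populate_status, bytes_not_populated
from v$im_segments
where segment_name = 'T2';

A POPULATE_STATUS of COMPLETED with BYTES_NOT_POPULATED at 0 means the full table is memory-resident.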

The in-memory column store is a very nice addition to an already powerful database, made even better by new optimizer transformations that speed data access by providing execution paths not available for tables accessed from disk. ‘Experiment’ with the in-memory option to see what performance improvements it can provide; you might be pleasantly surprised at what you find.

Sometimes unexpected change is good.

October 29, 2016

It’s A Match

Filed under: General — dfitzjarrell @ 10:29

"So many things are possible as long as you don't know they're impossible."
-- Norton Juster, The Phantom Tollbooth

With every new release Oracle provides new features that are very useful, including some you may not have thought of before. In 12.1.0.2 Oracle provides new functionality for finding patterns in a data set; using the MATCH_RECOGNIZE function it’s fairly easy to generate results based on a defined pattern in the data. Let’s look at some examples and see how to use MATCH_RECOGNIZE to find and report user-defined patterns in a data set. The examples provided were taken from the Live SQL website (https://livesql.oracle.com) and from the Oracle documentation (https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8981), where MATCH_RECOGNIZE is more thoroughly explained. That said, let’s go through an introduction to the function and some of what it can accomplish.

These examples begin with a table of click data, similar to what an advertising agency might collect from users clicking on an advertisement. Sessions are defined as being 10 minutes or more apart; all data within a 10-minute window for the same user is considered to be the same session. We create the table and load the data:


SQL> 
SQL> --
SQL> -- Create table and populate
SQL> --
SQL> CREATE TABLE clickdata (tstamp integer, userid varchar2(15));

Table created.

SQL> INSERT INTO clickdata VALUES(1, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(2, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(11, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(12, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(22, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(23, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(32, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(34, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(43, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(44, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(47, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(48, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(53, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(59, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(60, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(63, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(68, 'Sam');

1 row created.

SQL> commit;

Commit complete.

Using a very basic MATCH_RECOGNIZE query we return the timestamp and the user id for every row that falls within a matched session:


SQL> 
SQL> --
SQL> -- Return rows that are part of the same
SQL> -- session for a login
SQL> --
SQL> -- Any session tstamp value within 10
SQL> -- minutes of the prior tstamp is considered
SQL> -- part of the same session
SQL> --
SQL> -- In MATCH_RECOGNIZE the PATTERN is the
SQL> -- sequence or data pattern we are looking for
SQL> --
SQL> -- In this case b maps to any row, which gives
SQL> -- the ability to check for matches
SQL> --
SQL> -- s* states that 0 or more rows must be mapped
SQL> -- to confirm a match
SQL> --
SQL> -- s is defined as the difference between the
SQL> -- current tstamp value and the prior tstamp
SQL> -- value and the difference must be less than
SQL> -- or equal to 10
SQL> --
SQL> -- This will give a list of the activity of the
SQL> -- different continuous sessions
SQL> --
SQL> -- The ALL ROWS PER MATCH returns every matching
SQL> -- row
SQL> --
SQL> 
SQL> SELECT
  2   tstamp,
  3   userid
  4  FROM clickdata MATCH_RECOGNIZE(
  5  	ALL ROWS PER MATCH
  6  	PATTERN (b s*)
  7  	DEFINE
  8  	    s as (s.tstamp - prev(s.tstamp) <= 10)
  9   );

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
         1 Mary                                                                                                                                       
         2 Sam                                                                                                                                        
        11 Mary                                                                                                                                       
        12 Sam                                                                                                                                        
        22 Sam                                                                                                                                        
        23 Mary                                                                                                                                       
        32 Sam                                                                                                                                        
        34 Mary                                                                                                                                       
        43 Sam                                                                                                                                        
        44 Mary                                                                                                                                       
        47 Sam                                                                                                                                        

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
        48 Sam                                                                                                                                        
        53 Mary                                                                                                                                       
        59 Sam                                                                                                                                        
        60 Sam                                                                                                                                        
        63 Mary                                                                                                                                       
        68 Sam                                                                                                                                        

17 rows selected.

Two interesting things are found in this construct: the PATTERN keyword and the DEFINE keyword. PATTERN declares, using Perl-style regular expression syntax, the particular pattern of rows we want to find in the data. DEFINE sets up the definitions for the symbols used in the PATTERN clause. Since b has no conditions on it every row qualifies to start a match; the condition on s states that the difference between the current timestamp and the previous timestamp must be less than or equal to 10. There are 17 rows of data in the table and all 17 are returned by this MATCH_RECOGNIZE query. Modifying that query to find the unique session identifier for each session fails because not enough pieces of the MATCH_RECOGNIZE syntax are included; it ‘fails’ by returning a session id of 1 for every row, and that isn’t true:


SQL> 
SQL> --
SQL> -- The MATCH_NUMBER() Function should return the
SQL> -- session id for each different session
SQL> --
SQL> -- In this case it doesn't as we are missing a
SQL> -- piece of the puzzle
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id
  5  FROM clickdata MATCH_RECOGNIZE(
  6  	MEASURES match_number() as session_id
  7  	ALL ROWS PER MATCH
  8  	PATTERN (b s*)
  9  	DEFINE
 10  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 11   );

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
         1 Mary                     1                                                                                                                 
         2 Sam                      1                                                                                                                 
        11 Mary                     1                                                                                                                 
        12 Sam                      1                                                                                                                 
        22 Sam                      1                                                                                                                 
        23 Mary                     1                                                                                                                 
        32 Sam                      1                                                                                                                 
        34 Mary                     1                                                                                                                 
        43 Sam                      1                                                                                                                 
        44 Mary                     1                                                                                                                 
        47 Sam                      1                                                                                                                 

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
        48 Sam                      1                                                                                                                 
        53 Mary                     1                                                                                                                 
        59 Sam                      1                                                                                                                 
        60 Sam                      1                                                                                                                 
        63 Mary                     1                                                                                                                 
        68 Sam                      1                                                                                                                 

17 rows selected.

Proving Mary has three sessions is fairly easy:


SQL> 
SQL> --
SQL> -- As all records show 1 session this isn't accurate
SQL> -- since Mary has 3 separate sessions according to
SQL> -- the previously established criteria
SQL> --
SQL> SELECT *
  2  FROM clickdata
  3  WHERE userid = 'Mary';

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
         1 Mary                                                                                                                                       
        11 Mary                                                                                                                                       
        23 Mary                                                                                                                                       
        34 Mary                                                                                                                                       
        44 Mary                                                                                                                                       
        53 Mary                                                                                                                                       
        63 Mary                                                                                                                                       

7 rows selected.

The first two records belong to the first session, the third is another session and the remaining records make up the third session. Correcting the MATCH_RECOGNIZE query is a simple matter of partitioning (grouping) by userid and ordering each group by the tstamp values:


SQL> 
SQL> --
SQL> -- To 'fix' this we need to add some instructions
SQL> -- to the MATCH_RECOGNIZE function call
SQL> --
SQL> -- By partitioning  by userid  (collecting the data
SQL> -- based on the userid column) and ordering each
SQL> -- grouping by tstamp the MATCH_RECOGNIZE() call
SQL> -- can  accurately return the session id
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id
  5  FROM clickdata MATCH_RECOGNIZE(
  6  	PARTITION BY userid ORDER BY tstamp
  7  	MEASURES match_number() as session_id
  8  	ALL ROWS PER MATCH
  9  	PATTERN (b s*)
 10  	DEFINE
 11  	    s as (s.tstamp - prev(s.tstamp) <=10)
 12   );

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
         1 Mary                     1                                                                                                                 
        11 Mary                     1                                                                                                                 
        23 Mary                     2                                                                                                                 
        34 Mary                     3                                                                                                                 
        44 Mary                     3                                                                                                                 
        53 Mary                     3                                                                                                                 
        63 Mary                     3                                                                                                                 
         2 Sam                      1                                                                                                                 
        12 Sam                      1                                                                                                                 
        22 Sam                      1                                                                                                                 
        32 Sam                      1                                                                                                                 

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
        43 Sam                      2                                                                                                                 
        47 Sam                      2                                                                                                                 
        48 Sam                      2                                                                                                                 
        59 Sam                      3                                                                                                                 
        60 Sam                      3                                                                                                                 
        68 Sam                      3                                                                                                                 

17 rows selected.

Now the report correctly displays that there were three sessions for each user.
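
The quantifier in the PATTERN clause matters more than it might appear. As a hypothetical variation on the query above (not part of the original example), changing s* to s+ requires at least one continuation row per match, so a single-click session such as Mary’s second session would drop out of the report entirely:

SELECT
 tstamp,
 userid,
 session_id
FROM clickdata MATCH_RECOGNIZE(
   PARTITION BY userid ORDER BY tstamp
   MEASURES match_number() as session_id
   ALL ROWS PER MATCH
   -- s+ demands one or more rows within 10 minutes of their predecessor
   PATTERN (b s+)
   DEFINE
       s as (s.tstamp - prev(s.tstamp) <= 10)
 );

With ALL ROWS PER MATCH only rows that participate in a match are returned, so Mary’s lone click at tstamp 23 would simply disappear from the output.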

Before we go further let’s look at some parts of the MATCH_RECOGNIZE statement by defining the keywords being used. PARTITION BY groups the data by the column or columns specified, similar to the way the analytic functions work. ORDER BY orders the data in each group by the column or columns provided. The MEASURES keyword allows you to define the values you want to return and how to compute them based on the pattern being matched. The MATCH_NUMBER() function returns the number of the pattern match, starting with 1 and increasing incrementally as each pattern match is found. The FIRST and LAST keywords return the first and last values for a given pattern match, and can be used in calculations. Finally, the ALL ROWS PER MATCH directive returns every row that participates in a match. Using these in the following modifications to the MATCH_RECOGNIZE query produces a report of the session number, the number of events in each session, the start time and the session duration:


SQL> 
SQL> --
SQL> -- Let's expand this report
SQL> --
SQL> -- Setting up the MEASURES clause allows us
SQL> -- to return:
SQL> --
SQL> --  * the number of events per session
SQL> --  * the start time of each session
SQL> --  * the session duration
SQL> --
SQL> -- We have already successfully returned the session id
SQL> -- so now we add the count(*) for each group, the
SQL> -- start time for each group and the elapsed time for
SQL> -- each group calculated from the last tstamp for
SQL> -- each group
SQL> --
SQL> -- In one group first and last are the same so the difference
SQL> -- is 0
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id,
  5   no_of_events,
  6   start_time,
  7   session_duration
  8  FROM clickdata MATCH_RECOGNIZE(
  9  	PARTITION BY userid ORDER BY tstamp
 10  	MEASURES match_number() as session_id,
 11  		 count(*) as no_of_events,
 12  		 first(tstamp) start_time,
 13  		 last(tstamp) - first(tstamp) session_duration
 14  	ALL ROWS PER MATCH
 15  	PATTERN (b s*)
 16  	DEFINE
 17  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 18   );

    TSTAMP USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION                                                                        
---------- --------------- ---------- ------------ ---------- ----------------                                                                        
         1 Mary                     1            1          1                0                                                                        
        11 Mary                     1            2          1               10                                                                        
        23 Mary                     2            1         23                0                                                                        
        34 Mary                     3            1         34                0                                                                        
        44 Mary                     3            2         34               10                                                                        
        53 Mary                     3            3         34               19                                                                        
        63 Mary                     3            4         34               29                                                                        
         2 Sam                      1            1          2                0                                                                        
        12 Sam                      1            2          2               10                                                                        
        22 Sam                      1            3          2               20                                                                        
        32 Sam                      1            4          2               30                                                                        

    TSTAMP USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION                                                                        
---------- --------------- ---------- ------------ ---------- ----------------                                                                        
        43 Sam                      2            1         43                0                                                                        
        47 Sam                      2            2         43                4                                                                        
        48 Sam                      2            3         43                5                                                                        
        59 Sam                      3            1         59                0                                                                        
        60 Sam                      3            2         59                1                                                                        
        68 Sam                      3            3         59                9                                                                        

17 rows selected.

As mentioned above using ALL ROWS PER MATCH can produce a long report. Taking this one step further and changing ALL ROWS PER MATCH to ONE ROW PER MATCH produces a shorter, summary report of the activity:


SQL> 
SQL> --
SQL> -- One of the nice aspects of MATCH_RECOGNIZE is the
SQL> -- ability to generate summary reports
SQL> --
SQL> -- The last report was nice but a bit lengthy
SQL> --
SQL> -- Using the ONE ROW PER MATCH directive and removing
SQL> -- the tstamp column from the select statement
SQL> -- allows MATCH_RECOGNIZE to return only the summary
SQL> -- information, making a more compact and, in this case,
SQL> -- usable report
SQL> --
SQL> SELECT
  2   userid,
  3   session_id,
  4   no_of_events,
  5   start_time,
  6   session_duration
  7  FROM clickdata MATCH_RECOGNIZE(
  8  	PARTITION BY userid ORDER BY tstamp
  9  	MEASURES match_number() as session_id,
 10  		 count(*) as no_of_events,
 11  		 first(tstamp) start_time,
 12  		 last(tstamp) - first(tstamp) session_duration
 13  	ONE ROW PER MATCH
 14  	PATTERN (b s*)
 15  	DEFINE
 16  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 17   );

USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION
--------------- ---------- ------------ ---------- ----------------
Mary                     1            2          1               10
Mary                     2            1         23                0
Mary                     3            4         34               29
Sam                      1            4          2               30
Sam                      2            3         43                5
Sam                      3            3         59                9

6 rows selected.

SQL>

Let’s create another table and data set, this time for stock prices for the month of April:


SQL> 
SQL> --
SQL> -- Create table for stock data
SQL> --
SQL> -- Populate the table with data
SQL> -- for April
SQL> --
SQL> CREATE TABLE Ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

Table created.

SQL> 
SQL> INSERT INTO Ticker VALUES('ACME', '01-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '02-Apr-11', 17);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '03-Apr-11', 19);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '04-Apr-11', 21);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '05-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '06-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '07-Apr-11', 15);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '08-Apr-11', 20);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '09-Apr-11', 24);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '10-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '11-Apr-11', 19);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '12-Apr-11', 15);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '13-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '14-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '15-Apr-11', 14);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '16-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '17-Apr-11', 14);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '18-Apr-11', 24);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '19-Apr-11', 23);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '20-Apr-11', 22);

1 row created.

SQL> 
SQL> COMMIT;

Commit complete.

This data has a different pattern structure, where V-shaped patterns are generated by the changing stock prices. Let’s use MATCH_RECOGNIZE to find and report those patterns. This first query will return the total number of patterns found in the data:


SQL> 
SQL> --
SQL> -- Use MATCH_RECOGNIZE to find areas
SQL> -- in the data where the stock price
SQL> -- 'dipped' then rose
SQL> --
SQL> -- The data provided describes three
SQL> -- patterns of dipping and rising prices
SQL> --
SQL> -- The query will return three rows,
SQL> -- one per matching pattern
SQL> --
SQL> -- The only stock symbol in the table
SQL> -- is 'ACME' but this query would return
SQL> -- one row for each pattern matched per
SQL> -- stock symbol
SQL> --
SQL> -- Here we group by symbol and order
SQL> -- by the tstamp values
SQL> --
SQL> -- The MEASURES section defines the
SQL> -- starting timestamp of the pattern,
SQL> -- the lowest value of the group and
SQL> -- the highest value after the low has
SQL> -- been reached
SQL> --
SQL> -- Note that in one case the ending
SQL> -- tstamp for one pattern is the starting
SQL> -- tstamp for the	next pattern
SQL> --
SQL> -- This query introduces a new wrinkle,
SQL> -- AFTER MATCH SKIP TO LAST UP, which
SQL> -- causes the pattern matching for the
SQL> -- next group to begin at the last UP
SQL> -- value from the current pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3  	  PARTITION BY symbol
  4  	  ORDER BY tstamp
  5  	  MEASURES  STRT.tstamp AS start_tstamp,
  6  		    LAST(DOWN.tstamp) AS bottom_tstamp,
  7  		    LAST(UP.tstamp) AS end_tstamp
  8  	  ONE ROW PER MATCH
  9  	  AFTER MATCH SKIP TO LAST UP
 10  	  PATTERN (STRT DOWN+ UP+)
 11  	  DEFINE
 12  	     DOWN AS DOWN.price < PREV(DOWN.price),
 13  	     UP AS UP.price > PREV(UP.price)
 14  	  ) MR
 15  ORDER BY MR.symbol, MR.start_tstamp;

SYMBOL     START_TST BOTTOM_TS END_TSTAM                                                                                                              
---------- --------- --------- ---------                                                                                                              
ACME       05-APR-11 06-APR-11 10-APR-11                                                                                                              
ACME       10-APR-11 12-APR-11 13-APR-11                                                                                                              
ACME       14-APR-11 16-APR-11 18-APR-11                                                                                                              

We expand the report to return more data. The comments explain what is being returned and how the MEASURES are coded:


SQL> 
SQL> --
SQL> -- This next query returns all rows for
SQL> -- each pattern match and returns a bit more
SQL> -- data
SQL> --
SQL> -- There are 5 MEASURES being used
SQL> --
SQL> -- The start timestamp
SQL> --
SQL> -- The bottom of the v-shaped pattern
SQL> --
SQL> -- The top of the v-shaped pattern
SQL> --
SQL> -- The row number to know which values
SQL> -- belong to a given pattern match
SQL> --
SQL> -- The classifier (UP, DOWN, STRT)
SQL> -- to know where the value lies in the
SQL> -- pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3  	  PARTITION BY symbol
  4  	  ORDER BY tstamp
  5  	  MEASURES  STRT.tstamp AS start_tstamp,
  6  		    FINAL LAST(DOWN.tstamp) AS bottom_tstamp,
  7  		    FINAL LAST(UP.tstamp) AS end_tstamp,
  8  		    MATCH_NUMBER() AS match_num,
  9  		    CLASSIFIER() AS var_match
 10  	  ALL ROWS PER MATCH
 11  	  AFTER MATCH SKIP TO LAST UP
 12  	  PATTERN (STRT DOWN+ UP+)
 13  	  DEFINE
 14  	     DOWN AS DOWN.price < PREV(DOWN.price),
 15  	     UP AS UP.price > PREV(UP.price)
 16  	  ) MR
 17  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP    START_TST BOTTOM_TS END_TSTAM  MATCH_NUM VAR_MATCH                           PRICE                                               
---------- --------- --------- --------- --------- ---------- ------------------------------ ----------                                               
ACME       05-APR-11 05-APR-11 06-APR-11 10-APR-11          1 STRT                                   25                                               
ACME       06-APR-11 05-APR-11 06-APR-11 10-APR-11          1 DOWN                                   12                                               
ACME       07-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     15                                               
ACME       08-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     20                                               
ACME       09-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     24                                               
ACME       10-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     25                                               
ACME       10-APR-11 10-APR-11 12-APR-11 13-APR-11          2 STRT                                   25                                               
ACME       11-APR-11 10-APR-11 12-APR-11 13-APR-11          2 DOWN                                   19                                               
ACME       12-APR-11 10-APR-11 12-APR-11 13-APR-11          2 DOWN                                   15                                               
ACME       13-APR-11 10-APR-11 12-APR-11 13-APR-11          2 UP                                     25                                               
ACME       14-APR-11 14-APR-11 16-APR-11 18-APR-11          3 STRT                                   25                                               

SYMBOL     TSTAMP    START_TST BOTTOM_TS END_TSTAM  MATCH_NUM VAR_MATCH                           PRICE                                               
---------- --------- --------- --------- --------- ---------- ------------------------------ ----------                                               
ACME       15-APR-11 14-APR-11 16-APR-11 18-APR-11          3 DOWN                                   14                                               
ACME       16-APR-11 14-APR-11 16-APR-11 18-APR-11          3 DOWN                                   12                                               
ACME       17-APR-11 14-APR-11 16-APR-11 18-APR-11          3 UP                                     14                                               
ACME       18-APR-11 14-APR-11 16-APR-11 18-APR-11          3 UP                                     24                                               

15 rows selected.

Going even further let’s generate a report that displays where the row falls in the pattern (STRT, DOWN, UP), the total number of UP days in the pattern, the total number of days for each pattern, the day number in each pattern, the running price difference and the starting price for the pattern. Remember that the FINAL keyword returns the last value for each pattern:


SQL> 
SQL> --
SQL> -- This query aggregates data and returns
SQL> -- the number of days mapped to the UP pattern
SQL> -- (up_days), the total number of days for
SQL> -- each pattern found, the day number within
SQL> -- each match (a running count that resets when
SQL> -- a new pattern match begins) and the running
SQL> -- price difference between each row in a match
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3    PARTITION BY symbol
  4    ORDER BY tstamp
  5    MEASURES
  6  	 MATCH_NUMBER() AS match_num,
  7  	 CLASSIFIER() AS var_match,
  8  	 FINAL COUNT(UP.tstamp) AS up_days,
  9  	 FINAL COUNT(tstamp) AS total_days,
 10  	 RUNNING COUNT(tstamp) AS cnt_days,
 11  	 price - STRT.price AS price_dif
 12    ALL ROWS PER MATCH
 13    AFTER MATCH SKIP TO LAST UP
 14    PATTERN (STRT DOWN+ UP+)
 15    DEFINE
 16  	 DOWN AS DOWN.price < PREV(DOWN.price),
 17  	 UP AS UP.price > PREV(UP.price)
 18    ) MR
 19  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                         UP_DAYS TOTAL_DAYS   CNT_DAYS  PRICE_DIF      PRICE                                 
---------- --------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------                                 
ACME       05-APR-11          1 STRT                                    4          6          1          0         25                                 
ACME       06-APR-11          1 DOWN                                    4          6          2        -13         12                                 
ACME       07-APR-11          1 UP                                      4          6          3        -10         15                                 
ACME       08-APR-11          1 UP                                      4          6          4         -5         20                                 
ACME       09-APR-11          1 UP                                      4          6          5         -1         24                                 
ACME       10-APR-11          1 UP                                      4          6          6          0         25                                 
ACME       10-APR-11          2 STRT                                    1          4          1          0         25                                 
ACME       11-APR-11          2 DOWN                                    1          4          2         -6         19                                 
ACME       12-APR-11          2 DOWN                                    1          4          3        -10         15                                 
ACME       13-APR-11          2 UP                                      1          4          4          0         25                                 
ACME       14-APR-11          3 STRT                                    2          5          1          0         25                                 

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                         UP_DAYS TOTAL_DAYS   CNT_DAYS  PRICE_DIF      PRICE                                 
---------- --------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------                                 
ACME       15-APR-11          3 DOWN                                    2          5          2        -11         14                                 
ACME       16-APR-11          3 DOWN                                    2          5          3        -13         12                                 
ACME       17-APR-11          3 UP                                      2          5          4        -11         14                                 
ACME       18-APR-11          3 UP                                      2          5          5         -1         24                                 

15 rows selected.

Remember that the original pattern to match was a V-shape, and there were three. Changing the query to look for W shapes, where the end of the previous V shape is the start of the next V shape, we find only one pattern in the data:


SQL> 
SQL> --
SQL> -- This query looks for W shapes, of which
SQL> -- there is one in the data
SQL> --
SQL> -- The query was changed to look for two
SQL> -- consecutive down/up patterns where the
SQL> -- ending value of the first down/up is the
SQL> -- starting value of the next down/up
SQL> -- pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3    PARTITION BY symbol
  4    ORDER BY tstamp
  5    MEASURES
  6  	 MATCH_NUMBER() AS match_num,
  7  	 CLASSIFIER()  AS  var_match,
  8  	 STRT.tstamp AS start_tstamp,
  9  	 FINAL LAST(UP.tstamp) AS end_tstamp
 10    ALL ROWS PER MATCH
 11    AFTER MATCH SKIP TO LAST UP
 12    PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
 13    DEFINE
 14  	 DOWN AS DOWN.price < PREV(DOWN.price),
 15  	 UP AS UP.price > PREV(UP.price)
 16    ) MR
 17  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                      START_TST END_TSTAM      PRICE                                                         
---------- --------- ---------- ------------------------------ --------- --------- ----------                                                         
ACME       05-APR-11          1 STRT                           05-APR-11 13-APR-11         25                                                         
ACME       06-APR-11          1 DOWN                           05-APR-11 13-APR-11         12                                                         
ACME       07-APR-11          1 UP                             05-APR-11 13-APR-11         15                                                         
ACME       08-APR-11          1 UP                             05-APR-11 13-APR-11         20                                                         
ACME       09-APR-11          1 UP                             05-APR-11 13-APR-11         24                                                         
ACME       10-APR-11          1 UP                             05-APR-11 13-APR-11         25                                                         
ACME       11-APR-11          1 DOWN                           05-APR-11 13-APR-11         19                                                         
ACME       12-APR-11          1 DOWN                           05-APR-11 13-APR-11         15                                                         
ACME       13-APR-11          1 UP                             05-APR-11 13-APR-11         25                                                         

9 rows selected.

SQL> 

Why weren’t there two W-shaped patterns? The last V-shape didn’t start at the same point where the prior V-shape ended (it began on 14-Apr, the day after the second V-shape ended on 13-Apr), so the data contains only one W-shaped pattern.
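
As a quick check (a sketch, not part of the original example) we can wrap the earlier ONE ROW PER MATCH query with the LAG() analytic function to flag V-shapes that begin exactly where the previous one ended:

SELECT symbol, start_tstamp, end_tstamp,
       CASE WHEN start_tstamp = LAG(end_tstamp) OVER (PARTITION BY symbol ORDER BY start_tstamp)
            THEN 'continues prior V' ELSE 'standalone' END AS boundary
FROM Ticker MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY tstamp
        MEASURES STRT.tstamp AS start_tstamp,
                 LAST(UP.tstamp) AS end_tstamp
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO LAST UP
        PATTERN (STRT DOWN+ UP+)
        DEFINE DOWN AS DOWN.price < PREV(DOWN.price),
               UP AS UP.price > PREV(UP.price));

Only the second V-shape (10-Apr through 13-Apr) is flagged as continuing the prior one, which is why only a single W-shape is found.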

MATCH_RECOGNIZE is a powerful and useful addition to Oracle 12.1.0.x as it can enable you to define and report patterns in your data. It’s ideal for data warehousing applications and as an analytical tool to report repeating patterns in user data. Not everyone will benefit from MATCH_RECOGNIZE but, for those who do, it’s a welcome change to an already exceptional RDBMS.

Sometimes the impossible is possible.

October 11, 2016

“I Appreciate You”

Filed under: General — dfitzjarrell @ 13:55

Today, October 11, 2016, is OTN Appreciation Day. In honor of that I wanted to post my favorite offering from Oracle — Exadata. Yes, I’m biased but even at that the Exadata platform is an outstanding offering from Oracle. Let’s look at some of the ways Exadata provides performance and value.

Exadata is an engineered system, meaning that all of the components are selected and configured to work together to provide performance. Part of that performance comes from features like Predicate Filtering, Column Projection and Storage Indexes. Add to that the Smart Flash Cache (which has been steadily improved since its inception), faster disks and InfiniBand, and Exadata truly is a winner. And the benefits are not simply numbers on a piece of paper. I have personally witnessed overnight batch payroll processing for one customer reduced from 12 hours to 4 hours, and the only change was migrating the application to Exadata. No other tweaks or adjustments were made to provide that gain. In my book there are examples, run in real time on Exadata machines, proving the performance gain that can be realized. Exadata truly is an awesome product.

I won’t go into great detail since I have already extolled the virtues of Exadata on this blog here, here, to some extent here, here, and here. If you’re interested give one (or more) of those links a visit.

So, thank you, OTN and thank you, Oracle. And the pleasure has been all mine.

September 26, 2016

“My Kingdom For A Histogram”

Filed under: General — dfitzjarrell @ 09:55

"Don't you know anything at all about numbers?"
"Well, I don't think they're very important," snapped Milo, too embarrassed to admit the truth.
"NOT IMPORTANT!" roared the Dodecahedron, turning red with fury.  "Could you have tea for two without the two, or three blind mice
without the three? Would there be four corners of the earth if there weren't a four? And how would you sail the seven seas without
a seven?"
"All I meant was..." began Milo, but the Dodecahedron, overcome with emotion and shouting furiously, carried right on.
"If you had high hopes, how would you know how high they were? And did you know that narrow escapes come in all different widths? 
Would you travel the whole wide world without ever knowing how wide it was? And how could you do anything at long last," he concluded,
waving his arms over his head, "without knowing how long the last was? Why, numbers are the most beautiful and valuable things in the
world.  Just follow me and I'll show you." He turned on his heel and stalked off into the cave.
-- Norton Juster, The Phantom Tollbooth

In one of the current Oracle forums an “old saw” was resurrected, namely that a single-column primary key has no need of a histogram. This presumes that the primary key has no ‘holes’ and covers a continuous range of data. In reality such primary keys are rarely, if ever, unbroken and the range of data may not be continuous. With that in mind it’s not strange to consider a histogram on a single-column primary key, and depending on the release of Oracle in use that histogram may be an old standard (a height-balanced histogram) or a new histogram offered in 12c, the hybrid histogram. Let’s look at an example of how this plays out in 11.2.0.4 and in 12.1.0.2. First, though, let’s look at that new Hybrid histogram.

The Hybrid histogram is a new type of histogram that uses a number of buckets less than the number of distinct values in the table data; it also determines, and utilizes, the repeat frequency of the endpoint values. The bucket size can be adjusted from the initial size based on endpoint values; there are two special buckets which will not change in size, and these are the ‘boundary’ buckets, bucket 0 and the last bucket of the histogram. The Hybrid histogram is based on four apparent ‘rules’, which are:


	1) a value should not be found in more than one bucket
	2) the bucket size is allowed to be extended in order to contain all instances of the same distinct value
	3) adjusted bucket size cannot be less than the original size (not applicable at either end of the data set)
	4) the original number of buckets should not be reduced

One way to create a hybrid histogram is shown in the example below, part of which was provided by Jonathan Lewis; the remainder was adapted from work done by Mohammed Houri. We begin:


SQL>
SQL> --
SQL> -- Create and populate a table so we can
SQL> -- build a hybrid histogram
SQL> --
SQL> -- Thanks to Jonathan Lewis for generating
SQL> -- this table and data set
SQL> --
SQL> create table t1 (id number, n1 number);

Table created.

Elapsed: 00:00:00.01
SQL>
SQL> @InsT1.sql

Elapsed: 00:00:00.00
SQL>

Choosing a number of buckets less than the number of distinct values in the table is one way to create a hybrid histogram:


SQL>
SQL> --
SQL> -- Return the number of distinct values
SQL> -- in column n1
SQL> --
SQL>
SQL> select count(distinct n1) from t1;

COUNT(DISTINCTN1)
-----------------
               37

Elapsed: 00:00:00.00
SQL>

We’ll use 20 buckets for our histogram and DBMS_STATS to create it:


SQL>
SQL> --
SQL> -- Create a hybrid histogram on column n1
SQL> -- by setting the number of buckets less
SQL> -- than the number of distinct values
SQL> --
SQL> BEGIN
  2  	dbms_stats.gather_table_stats
  3  	  (user, 't1', method_opt => 'for columns n1 size 20');
  4  END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.43
SQL>

During the creation of this histogram Oracle places the column values in order to find values that occur in more than one bucket. When such values are found new bucket sizes are computed for all but the boundary buckets, in this case bucket 0 and bucket 19:


SQL>
SQL> --
SQL> -- Four rules exist to create a hybrid histogram, and the
SQL> -- first two are:
SQL> --
SQL> -- a value should not be found in more than one bucket
SQL> -- a bucket size is allowed to be extended in order to contain
SQL> --   all instances of the same value
SQL> --
SQL> -- Compute the 'new' bucket size based on the data and
SQL> -- endpoint values
SQL> --
SQL> -- To illustrate how the bucket size adjustment is
SQL> -- calculated let's look at the value 13 in the data
SQL> --
SQL> -- Oracle 'walks' the ordered list of values until
SQL> -- it finds 13 in two buckets
SQL> --
SQL> -- Oracle extends the bucket where 13 is first found (bucket n)
SQL> -- and moves all other instances of 13 (from bucket n+1)
SQL> -- to this bucket (n)
SQL> --
SQL> -- Moving all instances of 13 to a single bucket
SQL> -- also makes 13 an endpoint value allowing Oracle
SQL> -- to compute the repeat count of that value
SQL> --
SQL> -- This continues on through the values so that
SQL> -- Oracle can place all occurrences of a distinct
SQL> -- value so they don't populate more than one bucket
SQL> --
SQL> SELECT
  2  	     (row_number() over(order by ept_nbr)-1) NumBucket
  3  	     ,ept_nbr
  4  	     ,ept_act_val
  5  	     ,rpt_cnt
  6  	     ,ept_nbr - (lag(ept_nbr,1,0) over(order by ept_nbr)) "new bucket size"
  7  	     ,bucket_size "original bucket_size"
  8  	 FROM
  9  	     (SELECT
 10  		  ah.endpoint_number		ept_nbr
 11  		 ,ah.endpoint_actual_value	ept_act_val
 12  		 ,lag(ah.endpoint_number,1,0) over(order by ah.endpoint_number) ept_lag
 13  		 ,ah.endpoint_repeat_count rpt_cnt
 14  		 ,at.sample_size/at.num_buckets bucket_size
 15  	      FROM
 16  		 user_tab_histograms	  ah
 17  		,user_tab_col_statistics  at
 18  	      WHERE ah.table_name  = at.table_name
 19  	      AND ah.column_name = at.column_name
 20  	      AND ah.table_name  = 'T1'
 21  	      AND ah.column_name = 'N1'
 22  	    ) ORDER BY ept_nbr;

 NUMBUCKET    EPT_NBR EPT_ACT_VAL     RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
         0          1 8                     1               1                    5
         1          6 13                    3               5                    5
         2         12 18                    2               6                    5
         3         20 20                    5               8                    5
         4         26 23                    2               6                    5
         5         32 26                    3               6                    5
         6         38 27                    6               6                    5
         7         44 28                    6               6                    5
         8         50 29                    6               6                    5
         9         58 31                    5               8                    5
        10         69 33                    8              11                    5

 NUMBUCKET    EPT_NBR EPT_ACT_VAL     RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
        11         79 35                    7              10                    5
        12         86 38                    5               7                    5
        13         90 41                    1               4                    5
        14         92 42                    2               2                    5
        15         95 43                    3               3                    5
        16         96 44                    1               1                    5
        17         97 45                    1               1                    5
        18         98 46                    1               1                    5
        19        100 59                    1               2                    5

20 rows selected.

Elapsed: 00:00:00.21
SQL>

Notice that for buckets 1-12 the bucket size increased; for the remaining non-boundary buckets the bucket size should decrease. But according to Rule 3 no bucket can be adjusted to a size smaller than the original bucket size so those buckets remain unadjusted. The bucket size computation in the query results below is based on the sample size divided by the number of buckets and, thus, displays a constant value for all buckets:


SQL>
SQL> --
SQL> -- Given that boundary values are the exception computing the new
SQL> -- bucket sizes shows that for bucket numbers 1-12 the sizes are
SQL> -- the same or larger than the originally computed size (5)
SQL> --
SQL> -- All remaining non-endpoint buckets compute to a smaller size
SQL> -- than the original
SQL> --
SQL> -- Now the third rule appears:
SQL> --
SQL> -- No non-endpoint bucket can be smaller than the originally
SQL> -- computed size
SQL> --
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	     ,ucs.sample_size/ucs.num_buckets bucket_size
  6  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
  7  	 from
  8  	     user_tab_histograms uth
  9  	    ,user_tab_col_statistics ucs
 10  	where
 11  	     uth.table_name   = ucs.table_name
 12  	 and uth.column_name   = ucs.column_name
 13  	 and uth.table_name    = 'T1'
 14  	 and uth.column_name   = 'N1'
 15  	order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
              1 8                                1           5         -4
              6 13                               3           5         -2
             12 18                               2           5         -3
             20 20                               5           5          0
             26 23                               2           5         -3
             32 26                               3           5         -2
             38 27                               6           5          1
             44 28                               6           5          1
             50 29                               6           5          1
             58 31                               5           5          0
             69 33                               8           5          3

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
             79 35                               7           5          2
             86 38                               5           5          0
             90 41                               1           5         -4
             92 42                               2           5         -3
             95 43                               3           5         -2
             96 44                               1           5         -4
             97 45                               1           5         -4
             98 46                               1           5         -4
            100 59                               1           5         -4

20 rows selected.

Elapsed: 00:00:00.19
SQL>

For a Hybrid histogram cardinality estimates are based on three types of column value: popular values, non-popular values that are endpoints, and non-popular values that are not endpoints. Let’s look at how Oracle estimates the cardinality for each case, starting with popular values:


SQL>
SQL> --
SQL> -- Display the endpoint, value, repeat count,
SQL> -- bucket size and the 'popularity'
SQL> --
SQL> -- Popularity is determined by computing the
SQL> -- difference between the frequency of appearance
SQL> -- (repeat count) and the bucket size (sample size/number of buckets)
SQL> -- Positive values are considered 'popular'
SQL> --
SQL> select
  2  	      endpoint_number
  3  	     ,endpoint_actual_value
  4  	     ,endpoint_repeat_count
  5  	     ,bucket_size
  6  	     ,case when Popularity > 0 then 'Pop'
  7  			else 'Non-Pop'
  8  	       end Popularity
  9  	 from
 10  	(
 11  	  select
 12  	      uth.endpoint_number
 13  	     ,uth.endpoint_actual_value
 14  	     ,uth.endpoint_repeat_count
 15  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 16  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 17  	 from
 18  	     user_tab_histograms uth
 19  	    ,user_tab_col_statistics ucs
 20  	where
 21  	     uth.table_name   = ucs.table_name
 22  	 and uth.column_name   = ucs.column_name
 23  	 and uth.table_name    = 'T1'
 24  	 and uth.column_name   = 'N1'
 25  	 )
 26  	order by endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
              1 8                                1           5 Non-Pop
              6 13                               3           5 Non-Pop
             12 18                               2           5 Non-Pop
             20 20                               5           5 Non-Pop
             26 23                               2           5 Non-Pop
             32 26                               3           5 Non-Pop
             38 27                               6           5 Pop
             44 28                               6           5 Pop
             50 29                               6           5 Pop
             58 31                               5           5 Non-Pop
             69 33                               8           5 Pop

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
             79 35                               7           5 Pop
             86 38                               5           5 Non-Pop
             90 41                               1           5 Non-Pop
             92 42                               2           5 Non-Pop
             95 43                               3           5 Non-Pop
             96 44                               1           5 Non-Pop
             97 45                               1           5 Non-Pop
             98 46                               1           5 Non-Pop
            100 59                               1           5 Non-Pop

20 rows selected.

Elapsed: 00:00:00.19
SQL>

Let’s return only the popular values and work through the cardinality estimate Oracle generates:


SQL> --
SQL> -- Using the popularity Oracle estimates the cardinality
SQL> -- by considering the following three types of values:
SQL> --
SQL> -- popular value
SQL> -- non-popular value with an endpoint number
SQL> -- non-popular value not present in the histogram table
SQL> --
SQL> -- Case 1: Popular values
SQL> --
SQL> -- Display the 'popular' values in this data set
SQL> --
SQL> select
  2  	      endpoint_actual_value
  3  	     ,endpoint_repeat_count
  4  	     ,bucket_size
  5  	     ,Popularity
  6  from
  7  (
  8    select
  9  	      endpoint_number
 10  	     ,endpoint_actual_value
 11  	     ,endpoint_repeat_count
 12  	     ,bucket_size
 13  	     ,case when Popularity > 0 then 'Pop'
 14  			else 'Non-Pop'
 15  	       end Popularity
 16  	 from
 17  	(
 18  	  select
 19  	      uth.endpoint_number
 20  	     ,uth.endpoint_actual_value
 21  	     ,uth.endpoint_repeat_count
 22  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 23  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 24  	 from
 25  	     user_tab_histograms uth
 26  	    ,user_tab_col_statistics ucs
 27  	where
 28  	     uth.table_name   = ucs.table_name
 29  	 and uth.column_name   = ucs.column_name
 30  	 and uth.table_name    = 'T1'
 31  	 and uth.column_name   = 'N1'
 32  	 )
 33    )
 34  where Popularity = 'Pop';

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
27                               6           5 Pop
28                               6           5 Pop
29                               6           5 Pop
33                               8           5 Pop
35                               7           5 Pop

Elapsed: 00:00:00.20
SQL>

Looking at one of these popular values (33) let’s return the cardinality estimate Oracle calculated:


SQL> --
SQL> -- Using explain plan the cardinality estimation
SQL> -- can be displayed for one of the 'popular' values
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 33;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> --
SQL> -- The cardinality estimate displayed below was
SQL> -- calculated using the following formula:
SQL> --
SQL> -- E-Rows = ENDPOINT_REPEAT_COUNT * num_rows/sample_size
SQL> -- E-Rows = 8 * 100/100 = 8
SQL> --
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     8 |    24 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=33)

14 rows selected.

Elapsed: 00:00:00.03
SQL>

From the formula found in the above comments our calculated cardinality matches that provided by the optimizer, so our understanding of the Hybrid histogram seems sound. To continue that verification we consider the case where we have a non-popular value that is an endpoint. The formula for that cardinality calculation is: E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). We calculate the NewDensity with a query based on the work of Alberto Dell’Era. First we return the non-popular values that are endpoints:


SQL> --
SQL> -- Case 2: Non-popular values with an endpoint
SQL> --
SQL> -- First return all non-popular values that are an
SQL> -- endpoint
SQL> --
SQL> select
  2  	      endpoint_actual_value
  3  	     ,endpoint_repeat_count
  4  	     ,bucket_size
  5  	     ,Popularity
  6  from
  7  (
  8    select
  9  	      endpoint_number
 10  	     ,endpoint_actual_value
 11  	     ,endpoint_repeat_count
 12  	     ,bucket_size
 13  	     ,case when Popularity > 0 then 'Pop'
 14  			else 'Non-Pop'
 15  	       end Popularity
 16  	 from
 17  	(
 18  	  select
 19  	      uth.endpoint_number
 20  	     ,uth.endpoint_actual_value
 21  	     ,uth.endpoint_repeat_count
 22  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 23  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 24  	 from
 25  	     user_tab_histograms uth
 26  	    ,user_tab_col_statistics ucs
 27  	where
 28  	     uth.table_name   = ucs.table_name
 29  	 and uth.column_name   = ucs.column_name
 30  	 and uth.table_name    = 'T1'
 31  	 and uth.column_name   = 'N1'
 32  	 )
 33    )
 34  where Popularity = 'Non-Pop';

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
8                                1           5 Non-Pop
13                               3           5 Non-Pop
18                               2           5 Non-Pop
20                               5           5 Non-Pop
23                               2           5 Non-Pop
26                               3           5 Non-Pop
31                               5           5 Non-Pop
38                               5           5 Non-Pop
41                               1           5 Non-Pop
42                               2           5 Non-Pop
43                               3           5 Non-Pop

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
44                               1           5 Non-Pop
45                               1           5 Non-Pop
46                               1           5 Non-Pop
59                               1           5 Non-Pop

15 rows selected.

Elapsed: 00:00:00.20
SQL>

Let’s return Oracle’s estimated cardinality for one of these values (45):


SQL> --
SQL> -- Use explain plan to return the estimated cardinality
SQL> --
SQL> -- In this case the estimated cardinality is computed as:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size)
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 45;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |     6 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------

   2 - filter("N1"=45)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Now let’s use our query to return the value for NewDensity:


SQL> --
SQL> -- The NewDensity is computed by an internal algorithm but
SQL> -- we can get a reliable density using the following query
SQL> -- based on work done by Alberto Dell'Era:
SQL> --
SQL> SELECT
  2  	      BktCnt
  3  	     ,PopBktCnt
  4  	     ,PopValCnt
  5  	     ,NDV
  6  	     ,pop_bucketSize
  7  	     ,trunc(((BktCnt-PopBktCnt)/BktCnt)/(NDV-PopValCnt),10) NewDensity
  8  	  FROM
  9  	     (SELECT
 10  		COUNT(1) PopValCnt,
 11  		SUM(endpoint_repeat_count) PopBktCnt,
 12  		ndv,
 13  		BktCnt,
 14  		pop_bucketSize
 15  	      FROM
 16  	       (SELECT
 17  		 (sample_size - num_nulls) BktCnt,
 18  		 num_distinct ndv,
 19  		 num_buckets,
 20  		 density OldDensity,
 21  		 (sample_size-num_nulls)/num_buckets pop_bucketSize
 22  	       FROM user_tab_col_statistics
 23  	       WHERE
 24  		   table_name  = 'T1'
 25  	       AND column_name = 'N1'
 26  	       ),
 27  	       user_histograms
 28  	     WHERE table_name	      = 'T1'
 29  	     AND column_name	      = 'N1'
 30  	     AND endpoint_repeat_count> pop_bucketSize
 31  	     GROUP BY ndv,
 32  	       BktCnt,
 33  	       pop_bucketSize
 34  	     );

    BKTCNT  POPBKTCNT  POPVALCNT        NDV POP_BUCKETSIZE NEWDENSITY
---------- ---------- ---------- ---------- -------------- ----------
       100         33          5         37              5   .0209375

Elapsed: 00:00:00.19
SQL>

We now see if our calculation returned a valid value by ‘plugging’ it into the following equation: E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). The cardinality estimate we compute is:


SQL> --
SQL> -- 'Plugging' this new density into the equation above produces:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/ sample_size)
SQL> -- E-Rows = 100 * greatest (.0209375, 1/100) = 2.09375 ~ 2
SQL> --
SQL> -- which matches the value returned by Oracle
SQL> --
SQL>
SQL> --
SQL> -- To validate these findings use a different non-popular value
SQL> -- that's also an endpoint:
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 43;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     3 |     9 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=43)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Oracle returned 3 as the estimated cardinality and so did our calculation (E-Rows = 100 * greatest (.0209375, 3/100) = 3), more proof that our concept of the Hybrid histogram is correct. One final test will prove how well we understand this histogram, considering non-popular values that are not endpoints. Using our calculated value of NewDensity in this equation: E-Rows = num_rows * NewDensity gives us the following result: E-Rows = 100 * .0209375 = 2.09375 ~ 2. Time to see if our calculated value matches that which Oracle has estimated:


SQL>
SQL> --
SQL> -- Case 3: Non-popular value without an endpoint number
SQL> --
SQL> -- This calculation is fairly simple and straightforward:
SQL> --
SQL> -- E-Rows = num_rows * NewDensity = 100 * .0209375 = 2.09375 ~ 2
SQL> --
SQL>
SQL> explain plan for select count(1) from t1 where n1 = 17;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |     6 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=17)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Our calculated value, as evidenced by the Rows value in the execution plan, matches Oracle’s estimate, giving further evidence that our concept of the Hybrid histogram is sound.
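
To recap, the three cardinality cases worked through above, using the numbers from this example, are:

-- Case 1, popular value (n1 = 33):
--   E-Rows = ENDPOINT_REPEAT_COUNT * num_rows/sample_size = 8 * 100/100 = 8
-- Case 2, non-popular value with an endpoint (n1 = 45):
--   E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT/sample_size)
--          = 100 * greatest(.0209375, 1/100) = 2.09375 ~ 2
-- Case 3, non-popular value without an endpoint (n1 = 17):
--   E-Rows = num_rows * NewDensity = 100 * .0209375 = 2.09375 ~ 2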

As mentioned earlier in this post, Hybrid histograms can exhibit instability in the form of endpoint changes as inserts change the frequency of values in the table. To illustrate this let’s add another 16 to the data and see what happens:


SQL>
SQL> --
SQL> -- Hybrid histograms can display instability in terms of
SQL> -- endpoints and bucket size
SQL> --
SQL> -- To illustrate this we add 16 to the data, increasing
SQL> -- the repeat count to 3, causing this newly added 16
SQL> -- to shift into the previous bucket leaving 17
SQL> -- as the new endpoint
SQL> --
SQL> insert into t1 values (2, 16);

1 row created.

Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
  2  	dbms_stats.gather_table_stats
  3  	  (user, 't1', method_opt => 'for all columns size 20');
  4  END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
SQL>
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	 from
  6  	    user_tab_histograms uth
  7  	    ,user_tables ut
  8  	    ,user_tab_col_statistics ucs
  9  	where
 10  	     uth.table_name    = 'T1'
 11  	    and uth.column_name   = 'N1'
 12  	    and uth.table_name	  = ut.table_name
 13  	    and ut.table_name	  = ucs.table_name
 14  	    and uth.column_name   = ucs.column_name
 15  	   order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
              1 8                                1
              6 13                               3
             11 17                               1 <----
             16 19                               3
             21 20                               5
             27 23                               2
             33 26                               3
             39 27                               6
             45 28                               6
             51 29                               6
             59 31                               5

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
             70 33                               8
             80 35                               7
             87 38                               5
             91 41                               1
             96 43                               3
             97 44                               1
             98 45                               1
             99 46                               1
            101 59                               1

20 rows selected.

Elapsed: 00:00:00.24

The endpoint value of 18 has been replaced by the value 17 because of Rule 1, that no value can span buckets. Before the insert the buckets looked like this:

8 12 12 13 13 13 | 15 16 16 17 18 18 | 19 19 19 20 20 20 20 20 | ...

After the insert the ordered value list looked like this:

8 12 12 13 13 | 13 15 16 16 16 | 17 18 18 19 19 19 ...

After Oracle ‘walks’ through the list to consolidate like values the buckets look like this:

8 12 12 13 13 13 | 15 16 16 16 17 | 18 18 19 19 19 ...

giving a new endpoint value of 17. If we add yet another 16 to the data set we see:


SQL>
SQL> --
SQL> -- Add one more 16 and the endpoint shifts to 16
SQL> --
SQL> insert into t1 values (3, 16);

1 row created.

Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
  2  	   dbms_stats.gather_table_stats
  3  	       (user, 't1', method_opt => 'for all columns size 20');
  4  	 END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	 from
  6  	    user_tab_histograms uth
  7  	    ,user_tables ut
  8  	    ,user_tab_col_statistics ucs
  9  	where
 10  	     uth.table_name    = 'T1'
 11  	    and uth.column_name   = 'N1'
 12  	    and uth.table_name	  = ut.table_name
 13  	    and ut.table_name	  = ucs.table_name
 14  	    and uth.column_name   = ucs.column_name
 15  	   order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
              1 8                                1
              6 13                               3
             11 16                               4 

The endpoint has shifted again and is now 16.

Now that we know what a Hybrid histogram is and what it can do let’s look at another way to generate such a histogram in 12.1.0.2, and this goes back (finally!) to that “old saw” mentioned earlier. Let’s create a table with a single-column primary key and see what Oracle does with it when we run DBMS_STATS.

This example, again by Jonathan Lewis, creates a table and populates the id column (the eventual primary key for the table) in a rather interesting way:


SQL> 
SQL> create table t1
  2  as
  3  with generator as (
  4  	     select  --+ materialize
  5  		     rownum id
  6  	     from dual
  7  	     connect by
  8  		     level <= 1e4
  9  )
 10  select
 11  	     rownum id
 12  from
 13  	     generator;

Table created.

SQL> 
SQL> column maxid1 new_value max1
SQL> column maxid2 new_value max2
SQL> column maxid3 new_value max3
SQL> 
SQL> select min(id), max(id) maxid1
  2  from t1;

   MIN(ID)     MAXID1                                                           
---------- ----------                                                           
         1      10000                                                           

SQL> 
SQL> insert into t1 select id + 1e6 from t1;

10000 rows created.

SQL> 
SQL> select min(id), max(id) maxid2
  2  from t1;

   MIN(ID)     MAXID2                                                           
---------- ----------                                                           
         1    1010000                                                           

SQL> 
SQL> insert into t1 select id + 1e7 from t1;

20000 rows created.

SQL> 
SQL> select min(id), max(id) maxid3
  2  from t1;

   MIN(ID)     MAXID3                                                           
---------- ----------                                                           
         1   11010000                                                           

SQL> 

Notice the values in the column; they run from 1 to 10000, jump to 1000001 and continue to 1010000, then jump again to 10000001 through 10010000 and finally to 11000001 through 11010000. The breaks in the value sequence are what make this data set qualify for a Hybrid histogram in 12.1.0.2; in 11.2.0.4 a height-balanced histogram is created. A quick way to see those breaks is shown below.
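
As a sanity check (a sketch, not part of the original example), the LAG() analytic function can list where the id sequence jumps:

select id, gap
from (select id, id - lag(id) over (order by id) gap from t1)
where gap > 1;

With the breaks confirmed, let's add the primary key to the table: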



SQL> 
SQL> alter table t1 add constraint t1_pk primary key(id);

Table altered.

Now let’s query for data that isn’t in the table:


SQL> 
SQL> select
  2  	     /*+ dynamic_sampling(0) */
  3  	     *
  4  from    t1
  5  where
  6  	     id between 12000 and 13000
  7  ;

no rows selected

SQL> 

Doing nothing special, we generate statistics on the table; Oracle will automatically determine that the PK column needs a histogram and create it:


SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'T1'
  5  	     );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> 
SQL> column column_name format a35
SQL> set linesize 150
SQL> 
SQL> select
  2  	     column_name, sample_size,
  3  	     num_distinct, num_nulls, density,
  4  	     histogram, num_buckets
  5  from
  6  	     user_tab_cols
  7  where
  8  	     table_name = 'T1'
  9  order by
 10  	     column_name
 11  ;

COLUMN_NAME                         SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS                                        
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------                                        
ID                                         5502        40000          0    .000025 HYBRID                  254                                        

SQL> 

To prove that 11.2.0.4 will create a height-balanced histogram let’s run the example there:


SQL> 
SQL> create table t1
  2  as
  3  with generator as (
  4  	     select  --+ materialize
  5  		     rownum id
  6  	     from dual
  7  	     connect by
  8  		     level <= 1e4
  9  )
 10  select
 11  	     rownum id
 12  from
 13  	     generator;

Table created.

SQL> 
SQL> insert into t1 select id + 1e6 from t1;

10000 rows created.

SQL> 
SQL> insert into t1 select id + 1e7 from t1;

20000 rows created.

SQL> 
SQL> alter table t1 add constraint t1_pk primary key(id);

Table altered.

SQL> 
SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'T1'
  5  	     );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> 
SQL> column column_name format a35
SQL> set linesize 150
SQL> 
SQL> select
  2  	     column_name, sample_size,
  3  	     num_distinct, num_nulls, density,
  4  	     histogram, num_buckets
  5  from
  6  	     user_tab_cols
  7  where
  8  	     table_name = 'T1'
  9  order by
 10  	     column_name
 11  ;

COLUMN_NAME                         SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS                                        
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------                                        
ID                                         5509        40000          0 .000024959 HEIGHT BALANCED         254                                        

SQL> 

So it would appear that having a histogram on a single-column primary key isn’t so far-fetched, after all. And we didn’t have to do anything special to get it created.

It all comes down to numbers.
