Oracle Tips and Tricks — David Fitzjarrell

November 21, 2016

Taking Things For Granted

Filed under: General — dfitzjarrell @ 09:54

"Sometimes I find the best way of getting from one place to another is simply to erase everything and begin again."
-- Norton Juster, The Phantom Tollbooth

In one of the Oracle forums a question was asked regarding revoking selected privileges from the DBA role. Unfortunately for the person posting the question the answer is a resounding “No”; granting a role grants all privileges assigned to that role and there is no “picking and choosing” as if you were in a cafeteria. Roles are designed (or should be, at least) to grant all necessary privileges a user would need to access objects that role will use. And roles supplied by Oracle are designed for the jobs they are named after, such as DBA. Changing the role affects functionality and can seriously impact those granted that role. Let’s look at that in a bit more detail.

Oracle provides pre-configured roles with every installation of the database, and the list can vary based on the options you choose to install. A partial list of these roles from 12.1.0.2 is shown below:


ROLE                           O
------------------------------ -
CONNECT                        Y
RESOURCE                       Y
DBA                            Y
AUDIT_ADMIN                    Y
AUDIT_VIEWER                   Y
SELECT_CATALOG_ROLE            Y
EXECUTE_CATALOG_ROLE           Y
DELETE_CATALOG_ROLE            Y
CAPTURE_ADMIN                  Y
EXP_FULL_DATABASE              Y
IMP_FULL_DATABASE              Y
CDB_DBA                        Y
PDB_DBA                        Y
RECOVERY_CATALOG_OWNER         Y
LOGSTDBY_ADMINISTRATOR         Y
DBFS_ROLE                      Y
GSMUSER_ROLE                   Y
AQ_ADMINISTRATOR_ROLE          Y
AQ_USER_ROLE                   Y
DATAPUMP_EXP_FULL_DATABASE     Y
DATAPUMP_IMP_FULL_DATABASE     Y
ADM_PARALLEL_EXECUTE_TASK      Y
PROVISIONER                    Y
XS_RESOURCE                    Y
XS_SESSION_ADMIN               Y
XS_NAMESPACE_ADMIN             Y
XS_CACHE_ADMIN                 Y
GATHER_SYSTEM_STATISTICS       Y
OPTIMIZER_PROCESSING_RATE      Y
GSMADMIN_ROLE                  Y
RECOVERY_CATALOG_USER          Y
EM_EXPRESS_BASIC               Y
EM_EXPRESS_ALL                 Y
SCHEDULER_ADMIN                Y
HS_ADMIN_SELECT_ROLE           Y
HS_ADMIN_EXECUTE_ROLE          Y
HS_ADMIN_ROLE                  Y
GLOBAL_AQ_USER_ROLE            Y
OEM_ADVISOR                    Y
OEM_MONITOR                    Y
XDBADMIN                       Y
XDB_SET_INVOKER                Y
AUTHENTICATEDUSER              Y
XDB_WEBSERVICES                Y
XDB_WEBSERVICES_WITH_PUBLIC    Y
XDB_WEBSERVICES_OVER_HTTP      Y
GSM_POOLADMIN_ROLE             Y
GDS_CATALOG_SELECT             Y
WM_ADMIN_ROLE                  Y
JAVAUSERPRIV                   Y
JAVAIDPRIV                     Y
JAVASYSPRIV                    Y
JAVADEBUGPRIV                  Y
EJBCLIENT                      Y
JMXSERVER                      Y
JAVA_ADMIN                     Y
JAVA_DEPLOY                    Y
CTXAPP                         Y
ORDADMIN                       Y
OLAP_XS_ADMIN                  Y
OLAP_DBA                       Y
OLAP_USER                      Y
SPATIAL_WFS_ADMIN              Y
WFS_USR_ROLE                   Y
SPATIAL_CSW_ADMIN              Y
CSW_USR_ROLE                   Y
LBAC_DBA                       Y
APEX_ADMINISTRATOR_ROLE        Y
APEX_GRANTS_FOR_NEW_USERS_ROLE Y
DV_SECANALYST                  Y
DV_MONITOR                     Y
DV_ADMIN                       Y
DV_OWNER                       Y
DV_ACCTMGR                     Y
DV_PUBLIC                      Y
DV_PATCH_ADMIN                 Y
DV_STREAMS_ADMIN               Y
DV_GOLDENGATE_ADMIN            Y
DV_XSTREAM_ADMIN               Y
DV_GOLDENGATE_REDO_ACCESS      Y
DV_AUDIT_CLEANUP               Y
DV_DATAPUMP_NETWORK_LINK       Y
DV_REALM_RESOURCE              Y
DV_REALM_OWNER                 Y

The ‘O’ header is for the ORACLE_MAINTAINED column, which indicates the role is supplied by Oracle. [This is a new column in the DBA_ROLES view for 12.1; earlier releases do not have this column in the view definition.] That list contains 84 different roles, all created when your database was created. What privileges do these roles have? That question is answered by the ROLE_SYS_PRIVS and ROLE_TAB_PRIVS views; let’s look at the DBA role and see what Oracle deems necessary system privileges for an effective DBA.
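Both listings can be produced with queries along these lines (a sketch; the exact queries weren’t shown, but these documented views return the same data):


select role, oracle_maintained
from dba_roles;

select privilege
from role_sys_privs
where role = 'DBA';

The second query generated the list below: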


PRIVILEGE
----------------------------------------
CREATE SESSION
ALTER SESSION
DROP TABLESPACE
BECOME USER
DROP ROLLBACK SEGMENT
SELECT ANY TABLE
INSERT ANY TABLE
UPDATE ANY TABLE
DROP ANY INDEX
SELECT ANY SEQUENCE
CREATE ROLE
EXECUTE ANY PROCEDURE
ALTER PROFILE
CREATE ANY DIRECTORY
CREATE ANY LIBRARY
EXECUTE ANY LIBRARY
ALTER ANY INDEXTYPE
DROP ANY INDEXTYPE
DEQUEUE ANY QUEUE
EXECUTE ANY EVALUATION CONTEXT
EXPORT FULL DATABASE
CREATE RULE
ALTER ANY SQL PROFILE
ADMINISTER ANY SQL TUNING SET
CHANGE NOTIFICATION
DROP ANY EDITION
DROP ANY MINING MODEL
ALTER ANY MINING MODEL
ALTER ANY CUBE DIMENSION
CREATE CUBE
DROP ANY CUBE BUILD PROCESS
USE ANY SQL TRANSLATION PROFILE
CREATE PLUGGABLE DATABASE
ALTER ROLLBACK SEGMENT
DELETE ANY TABLE
ALTER DATABASE
FORCE ANY TRANSACTION
ALTER ANY PROCEDURE
DROP ANY TRIGGER
DROP ANY MATERIALIZED VIEW
UNDER ANY TYPE
ALTER ANY LIBRARY
CREATE DIMENSION
DEBUG ANY PROCEDURE
CREATE RULE SET
ALTER ANY RULE SET
ANALYZE ANY DICTIONARY
ALTER ANY EDITION
CREATE ANY ASSEMBLY
ALTER ANY CUBE
SELECT ANY CUBE
DROP ANY MEASURE FOLDER
RESTRICTED SESSION
CREATE TABLESPACE
ALTER TABLESPACE
CREATE USER
ALTER USER
LOCK ANY TABLE
CREATE VIEW
DROP ANY VIEW
GRANT ANY ROLE
CREATE TRIGGER
CREATE TYPE
EXECUTE ANY OPERATOR
CREATE ANY DIMENSION
ALTER ANY DIMENSION
CREATE ANY OUTLINE
ADMINISTER DATABASE TRIGGER
RESUMABLE
FLASHBACK ANY TABLE
CREATE ANY RULE SET
EXECUTE ANY RULE SET
IMPORT FULL DATABASE
EXECUTE ANY RULE
EXECUTE ANY PROGRAM
CREATE ANY EDITION
CREATE ASSEMBLY
ALTER ANY ASSEMBLY
CREATE CUBE DIMENSION
CREATE ANY CUBE BUILD PROCESS
UPDATE ANY CUBE DIMENSION
EM EXPRESS CONNECT
SET CONTAINER
ALTER ANY MEASURE FOLDER
CREATE ANY TABLE
CREATE ANY INDEX
CREATE ANY SEQUENCE
ALTER ANY ROLE
ANALYZE ANY
DROP ANY LIBRARY
CREATE ANY OPERATOR
CREATE INDEXTYPE
UNDER ANY TABLE
DROP ANY DIMENSION
SELECT ANY DICTIONARY
GRANT ANY OBJECT PRIVILEGE
CREATE EVALUATION CONTEXT
CREATE ANY EVALUATION CONTEXT
DROP ANY EVALUATION CONTEXT
CREATE ANY RULE
CREATE JOB
CREATE ANY JOB
CREATE MINING MODEL
INSERT ANY CUBE DIMENSION
DROP ANY CUBE
UPDATE ANY CUBE BUILD PROCESS
EXEMPT DML REDACTION POLICY
READ ANY TABLE
ALTER SYSTEM
AUDIT SYSTEM
CREATE ROLLBACK SEGMENT
DROP ANY TABLE
COMMENT ANY TABLE
REDEFINE ANY TABLE
CREATE CLUSTER
ALTER ANY INDEX
DROP PUBLIC DATABASE LINK
CREATE PROFILE
ALTER ANY MATERIALIZED VIEW
ALTER ANY TYPE
DROP ANY TYPE
UNDER ANY VIEW
EXECUTE ANY INDEXTYPE
DROP ANY CONTEXT
ALTER ANY OUTLINE
ADMINISTER RESOURCE MANAGER
MANAGE SCHEDULER
MANAGE FILE GROUP
CREATE ANY MINING MODEL
SELECT ANY MINING MODEL
CREATE ANY MEASURE FOLDER
DELETE ANY MEASURE FOLDER
CREATE ANY SQL TRANSLATION PROFILE
CREATE ANY CREDENTIAL
EXEMPT DDL REDACTION POLICY
SELECT ANY MEASURE FOLDER
SELECT ANY CUBE BUILD PROCESS
ALTER ANY CUBE BUILD PROCESS
CREATE TABLE
BACKUP ANY TABLE
CREATE ANY CLUSTER
DROP ANY SYNONYM
DROP PUBLIC SYNONYM
CREATE ANY VIEW
CREATE SEQUENCE
ALTER ANY SEQUENCE
FORCE TRANSACTION
CREATE PROCEDURE
CREATE ANY PROCEDURE
ALTER RESOURCE COST
DROP ANY DIRECTORY
CREATE ANY TYPE
ALTER ANY OPERATOR
CREATE ANY INDEXTYPE
ENQUEUE ANY QUEUE
ON COMMIT REFRESH
DEBUG CONNECT SESSION
DROP ANY RULE SET
EXECUTE ANY CLASS
MANAGE ANY FILE GROUP
EXECUTE ANY ASSEMBLY
EXECUTE ASSEMBLY
COMMENT ANY MINING MODEL
CREATE ANY CUBE DIMENSION
DELETE ANY CUBE DIMENSION
SELECT ANY CUBE DIMENSION
DROP ANY SQL TRANSLATION PROFILE
CREATE CREDENTIAL
ALTER ANY TABLE
DROP ANY CLUSTER
CREATE SYNONYM
CREATE PUBLIC SYNONYM
DROP ANY SEQUENCE
DROP ANY ROLE
AUDIT ANY
DROP ANY PROCEDURE
CREATE ANY TRIGGER
ALTER ANY TRIGGER
DROP PROFILE
GRANT ANY PRIVILEGE
CREATE LIBRARY
CREATE OPERATOR
DROP ANY OUTLINE
MERGE ANY VIEW
ADMINISTER SQL TUNING SET
UPDATE ANY CUBE
INSERT ANY MEASURE FOLDER
ADMINISTER SQL MANAGEMENT OBJECT
CREATE SQL TRANSLATION PROFILE
LOGMINING
MANAGE TABLESPACE
DROP USER
ALTER ANY CLUSTER
CREATE ANY SYNONYM
CREATE DATABASE LINK
CREATE PUBLIC DATABASE LINK
CREATE MATERIALIZED VIEW
CREATE ANY MATERIALIZED VIEW
EXECUTE ANY TYPE
DROP ANY OPERATOR
QUERY REWRITE
GLOBAL QUERY REWRITE
MANAGE ANY QUEUE
CREATE ANY CONTEXT
ALTER ANY EVALUATION CONTEXT
ALTER ANY RULE
DROP ANY RULE
ADVISOR
SELECT ANY TRANSACTION
DROP ANY SQL PROFILE
CREATE ANY SQL PROFILE
READ ANY FILE GROUP
CREATE EXTERNAL JOB
DROP ANY ASSEMBLY
DROP ANY CUBE DIMENSION
CREATE ANY CUBE
CREATE MEASURE FOLDER
CREATE CUBE BUILD PROCESS
ALTER ANY SQL TRANSLATION PROFILE
FLASHBACK ARCHIVE ADMINISTER

That’s 220 system privileges deemed necessary to be an effective DBA for an Oracle database, an impressive list indeed. Of course, once a user is granted the DBA role he or she receives ALL of those system privileges, and since the role is the only grant conveying them, the list cannot be modified by selectively revoking one or more of those privileges:


SQL> grant DBA to blorpo identified by gussyflorp;

Grant succeeded.

SQL> revoke select any transaction from blorpo;
revoke select any transaction from blorpo
*
ERROR at line 1:
ORA-01952: system privileges not granted to 'BLORPO'


SQL>

Yes, the user DOES have that privilege, albeit indirectly. It’s indirect because it’s the ROLE that was granted that privilege, among others, and no attempt was made to revoke the role from the user. It’s a ‘package deal’: you grant a role to a user and it’s all or nothing, and even though it behaves as if the user had the privileges granted directly, that’s not the case.
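The data dictionary makes the indirection easy to see; a quick check (a sketch using the documented DBA_ROLE_PRIVS and SESSION_PRIVS views):


select granted_role
from dba_role_privs
where grantee = 'BLORPO';

connect blorpo/gussyflorp

select count(*)
from session_privs
where privilege = 'SELECT ANY TRANSACTION';

The first query reports DBA, and the second returns 1: the privilege is active in the session but arrives through the role, not through a direct grant, which is why the revoke failed.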

You could, of course, collect all of the privileges the DBA role has (both system and table) and create a script to grant each one individually to the desired user. It would be a LONG script, and such grants require attention from the granting DBA to ensure they remain current and are not being abused. Individual privileges could then be revoked, but keeping track of which user has which set of privileges quickly becomes a maintenance nightmare. Another option presents itself: create a new role containing only the privileges the DBA wants to assign. The privilege list for DBA could be shortened to create, say, a DB_OPERATOR or DBO role, as sketched below. The privileges would depend upon the job description; creating such a role makes granting access easier and maintenance simpler, since when the role’s grants change everyone granted that role has their privileges adjusted at the next login.
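A minimal sketch of such a role (the privilege list is illustrative only and would be tailored to the actual job description):


create role db_operator;

grant create session, alter session to db_operator;
grant select any dictionary to db_operator;
grant create table, create view, create synonym to db_operator;

grant db_operator to blorpo;

Adjusting access later is then a single change to the role, for example:

revoke create synonym from db_operator;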

Roles make granting privileges very easy and straightforward, provided the role is properly created and maintained. Roles also make it impossible to “pick and choose” privileges a user should have. It’s an “all or nothing” proposition and there’s no way around that when using a pre-defined role.

Sometimes you just need to begin again.

November 17, 2016

“That STILL Ain’t Right”

Filed under: General — dfitzjarrell @ 07:45

"If you want sense, you'll have to make it yourself."
- Norton Juster, The Phantom Tollbooth

In a recent post to an Oracle forum a query was presented along with a tuning request; it appears that the query was taking 20 hours to complete. Further questions and responses revealed that the dates were being stored in a VARCHAR2 column and implicit date conversion was being used. To show how much of a problem this can cause the following example was created; notice the results returned and the execution plans generated for each run of the query, once with the table defined in the manner the original poster described and once with the date column using the DATE datatype. We begin:


SQL>
SQL> create table datetst(
  2  myid    number,
  3  res_value	varchar2(3),
  4  mydt    varchar2(20));

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..10000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', to_char(sysdate+i, 'MM/DD/RRRR'));
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', to_char(sysdate+i, 'MM/DD/RRRR'));
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>

Let’s now run a query using conditions similar to the posted query and see what Oracle returns:


SQL>
SQL>
SQL> select *
  2  from datetst
  3  where mydt <= sysdate +230;
where mydt <= sysdate +230
      *
ERROR at line 3:
ORA-01843: not a valid month


SQL>

As expected the implicit date conversion failed; modifying the query to explicitly convert the date to a string (so the comparison is string-to-string) produces ‘interesting’ results:


SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= to_char(sysdate +230, 'MM/DD/RRRR');

      MYID RES MYDT
---------- --- --------------------
       920 PUR 05/26/2019
       943 PUR 06/18/2019
      1150 PUR 01/11/2020
      1173 PUR 02/03/2020
      1196 PUR 02/26/2020
      1219 PUR 03/20/2020
      1242 PUR 04/12/2020
      1265 PUR 05/05/2020
      1288 PUR 05/28/2020
      1311 PUR 06/20/2020
        46 PUR 01/02/2017
        69 PUR 01/25/2017
        92 PUR 02/17/2017
       115 PUR 03/12/2017
       138 PUR 04/04/2017
       161 PUR 04/27/2017
       184 PUR 05/20/2017
       207 PUR 06/12/2017
       230 PUR 07/05/2017
       414 PUR 01/05/2018
       437 PUR 01/28/2018
       460 PUR 02/20/2018
       483 PUR 03/15/2018
       506 PUR 04/07/2018
       529 PUR 04/30/2018
       552 PUR 05/23/2018
       575 PUR 06/15/2018
       782 PUR 01/08/2019
       805 PUR 01/31/2019
       828 PUR 02/23/2019
       851 PUR 03/18/2019
       874 PUR 04/10/2019
       897 PUR 05/03/2019
      2622 PUR 01/22/2024
      2645 PUR 02/14/2024
      2668 PUR 03/08/2024
      2691 PUR 03/31/2024
      2714 PUR 04/23/2024
      3335 PUR 01/04/2026
      3358 PUR 01/27/2026
      3381 PUR 02/19/2026
      3404 PUR 03/14/2026
      3427 PUR 04/06/2026
      3450 PUR 04/29/2026
      3473 PUR 05/22/2026
      3496 PUR 06/14/2026
      3703 PUR 01/07/2027
      3726 PUR 01/30/2027
      3749 PUR 02/22/2027
      3772 PUR 03/17/2027
      3795 PUR 04/09/2027
      3818 PUR 05/02/2027
      3841 PUR 05/25/2027
      3864 PUR 06/17/2027
      2737 PUR 05/16/2024
      2760 PUR 06/08/2024
      2783 PUR 07/01/2024
      2967 PUR 01/01/2025
      2990 PUR 01/24/2025
      3013 PUR 02/16/2025
      3036 PUR 03/11/2025
      3059 PUR 04/03/2025
      3082 PUR 04/26/2025
      3105 PUR 05/19/2025
      3128 PUR 06/11/2025
      3151 PUR 07/04/2025
      1518 PUR 01/13/2021
      1541 PUR 02/05/2021
      1564 PUR 02/28/2021
      1587 PUR 03/23/2021
      1610 PUR 04/15/2021
      1633 PUR 05/08/2021
      1656 PUR 05/31/2021
      1679 PUR 06/23/2021
      1886 PUR 01/16/2022
      1909 PUR 02/08/2022
      1932 PUR 03/03/2022

      MYID RES MYDT
---------- --- --------------------
      1955 PUR 03/26/2022
      1978 PUR 04/18/2022
      2001 PUR 05/11/2022
      2024 PUR 06/03/2022
      2047 PUR 06/26/2022
      2254 PUR 01/19/2023
      2277 PUR 02/11/2023
      2300 PUR 03/06/2023
      2323 PUR 03/29/2023
      2346 PUR 04/21/2023
      2369 PUR 05/14/2023
      2392 PUR 06/06/2023
      2415 PUR 06/29/2023
      5543 PUR 01/21/2032
      5566 PUR 02/13/2032
      5589 PUR 03/07/2032
      5612 PUR 03/30/2032
      5635 PUR 04/22/2032
      5658 PUR 05/15/2032
      5681 PUR 06/07/2032
      5704 PUR 06/30/2032
      5911 PUR 01/23/2033
      5934 PUR 02/15/2033
      5957 PUR 03/10/2033
      5980 PUR 04/02/2033
      6003 PUR 04/25/2033
      6026 PUR 05/18/2033
      4071 PUR 01/10/2028
      4094 PUR 02/02/2028
      4117 PUR 02/25/2028
      4140 PUR 03/19/2028
      4163 PUR 04/11/2028
      4186 PUR 05/04/2028
      4209 PUR 05/27/2028
      4232 PUR 06/19/2028
      4439 PUR 01/12/2029
      4462 PUR 02/04/2029
      4485 PUR 02/27/2029
      4508 PUR 03/22/2029
      4531 PUR 04/14/2029
      4554 PUR 05/07/2029
      4577 PUR 05/30/2029
      4600 PUR 06/22/2029
      4807 PUR 01/15/2030
      4830 PUR 02/07/2030
      4853 PUR 03/02/2030
      4876 PUR 03/25/2030
      4899 PUR 04/17/2030
      4922 PUR 05/10/2030
      4945 PUR 06/02/2030
      4968 PUR 06/25/2030
      5175 PUR 01/18/2031
      5198 PUR 02/10/2031
      5221 PUR 03/05/2031
      5244 PUR 03/28/2031
      5267 PUR 04/20/2031
      5290 PUR 05/13/2031
      5313 PUR 06/05/2031
      5336 PUR 06/28/2031
      6992 PUR 01/09/2036
      7015 PUR 02/01/2036
      7038 PUR 02/24/2036
      7061 PUR 03/18/2036
      7084 PUR 04/10/2036
      7107 PUR 05/03/2036
      7130 PUR 05/26/2036
      7153 PUR 06/18/2036
      7866 PUR 06/01/2038
      7889 PUR 06/24/2038
      8096 PUR 01/17/2039
      8119 PUR 02/09/2039
      8142 PUR 03/04/2039
      8165 PUR 03/27/2039
      8188 PUR 04/19/2039
      8211 PUR 05/12/2039
      8234 PUR 06/04/2039
      8257 PUR 06/27/2039

      MYID RES MYDT
---------- --- --------------------
      7360 PUR 01/11/2037
      7383 PUR 02/03/2037
      7406 PUR 02/26/2037
      7429 PUR 03/21/2037
      7452 PUR 04/13/2037
      7475 PUR 05/06/2037
      7498 PUR 05/29/2037
      7521 PUR 06/21/2037
      7728 PUR 01/14/2038
      7751 PUR 02/06/2038
      7774 PUR 03/01/2038
      7797 PUR 03/24/2038
      7820 PUR 04/16/2038
      7843 PUR 05/09/2038
      6049 PUR 06/10/2033
      6072 PUR 07/03/2033
      6256 PUR 01/03/2034
      6279 PUR 01/26/2034
      6302 PUR 02/18/2034
      6325 PUR 03/13/2034
      6348 PUR 04/05/2034
      6371 PUR 04/28/2034
      6394 PUR 05/21/2034
      6417 PUR 06/13/2034
      6624 PUR 01/06/2035
      6647 PUR 01/29/2035
      6670 PUR 02/21/2035
      6693 PUR 03/16/2035
      6716 PUR 04/08/2035
      6739 PUR 05/01/2035
      6762 PUR 05/24/2035
      6785 PUR 06/16/2035
      9982 PUR 03/17/2044
      8464 PUR 01/20/2040
      8487 PUR 02/12/2040
      8510 PUR 03/06/2040
      8533 PUR 03/29/2040
      8556 PUR 04/21/2040
      8579 PUR 05/14/2040
      8602 PUR 06/06/2040
      8625 PUR 06/29/2040
      8832 PUR 01/22/2041
      8855 PUR 02/14/2041
      8878 PUR 03/09/2041
      8901 PUR 04/01/2041
      8924 PUR 04/24/2041
      8947 PUR 05/17/2041
      8970 PUR 06/09/2041
      8993 PUR 07/02/2041
      9177 PUR 01/02/2042
      9200 PUR 01/25/2042
      9223 PUR 02/17/2042
      9246 PUR 03/12/2042
      9269 PUR 04/04/2042
      9292 PUR 04/27/2042
      9315 PUR 05/20/2042
      9338 PUR 06/12/2042
      9545 PUR 01/05/2043
      9568 PUR 01/28/2043
      9591 PUR 02/20/2043
      9614 PUR 03/15/2043
      9637 PUR 04/07/2043
      9660 PUR 04/30/2043
      9683 PUR 05/23/2043
      9706 PUR 06/15/2043
      9913 PUR 01/08/2044
      9936 PUR 01/31/2044
      9959 PUR 02/23/2044

222 rows selected.

SQL>

The query should have returned no more than 10 rows that met the criteria, and it returned 222. Looking at the plan we see:


SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6qatrtphp5wjt, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <=
to_char(sysdate +230, 'MM/DD/RRRR')

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      |   222 |  6216 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX |   434 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=TO_CHAR(SYSDATE@!+230,'MM/DD/RRRR'))
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


25 rows selected.

SQL>

Oracle found 222 rows that ‘matched’ the conditions, illustrating the problem with using an incorrect datatype: Oracle can’t know these are dates, so it compares them as ASCII strings, producing a result set much larger than it should be.
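A quick check against DUAL shows why the strings ‘match’; run on the same day as the example, TO_CHAR(SYSDATE+230, 'MM/DD/RRRR') evaluates to '07/05/2017':


select case
            when '05/26/2019' <= to_char(sysdate+230, 'MM/DD/RRRR') then 'matches as a string'
            else 'no match'
       end comparison
from dual;

Compared character by character, '05/26/2019' sorts before '07/05/2017' (the '5' in the second position is less than the '7'), so a date almost two years outside the window passes the filter. We drop the table and start over: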


SQL>
SQL> drop table datetst purge;

Table dropped.

SQL>
SQL> create table datetst(
  2  myid    number,
  3  res_value varchar2(3),
  4  mydt    date);

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..10000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', sysdate+i);
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', sysdate+i);
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>

We now run the original query (with no explicit date conversion, since it’s no longer needed) and examine the results:


SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= sysdate +230;

      MYID RES MYDT
---------- --- ---------
        23 PUR 10-DEC-16
        46 PUR 02-JAN-17
        69 PUR 25-JAN-17
        92 PUR 17-FEB-17
       115 PUR 12-MAR-17
       138 PUR 04-APR-17
       161 PUR 27-APR-17
       184 PUR 20-MAY-17
       207 PUR 12-JUN-17
       230 PUR 05-JUL-17

10 rows selected.

SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0m2c2mv7zhx49, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <= sysdate +230

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |     2 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      |    10 |   250 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX |   434 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=SYSDATE@!+230)
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


24 rows selected.

SQL>

Oracle now found the 10 rows we sought using the conditions we specified because the date data was correctly stored as a DATE datatype. Using VARCHAR2 made the result set more than 22 times larger (a 2,120 percent increase), and that was for a 10,000-row table. Let’s re-run the example with 1,000,000 rows and see what numbers Oracle produces:


SQL>
SQL> create table datetst(
  2  myid    number,
  3  res_value	varchar2(3),
  4  mydt    varchar2(20));

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..1000000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', to_char(sysdate+i, 'MM/DD/RRRR'));
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', to_char(sysdate+i, 'MM/DD/RRRR'));
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>
SQL> select *
  2  from datetst
  3  where mydt <= sysdate +230;
where mydt <= sysdate +230
      *
ERROR at line 3:
ORA-01843: not a valid month


SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= to_char(sysdate +230, 'MM/DD/RRRR');

      MYID RES MYDT
---------- --- --------------------
       920 PUR 05/26/2019
       943 PUR 06/18/2019
      1150 PUR 01/11/2020
      1173 PUR 02/03/2020
      1196 PUR 02/26/2020
      1219 PUR 03/20/2020
      1242 PUR 04/12/2020
      1265 PUR 05/05/2020
      1288 PUR 05/28/2020
      1311 PUR 06/20/2020
        46 PUR 01/02/2017
        69 PUR 01/25/2017
        92 PUR 02/17/2017
       115 PUR 03/12/2017
       138 PUR 04/04/2017
       161 PUR 04/27/2017
       184 PUR 05/20/2017
       207 PUR 06/12/2017
       230 PUR 07/05/2017
       414 PUR 01/05/2018
       437 PUR 01/28/2018
       460 PUR 02/20/2018
       483 PUR 03/15/2018
       506 PUR 04/07/2018
       529 PUR 04/30/2018
       552 PUR 05/23/2018
       575 PUR 06/15/2018
       782 PUR 01/08/2019
       805 PUR 01/31/2019
       828 PUR 02/23/2019
       851 PUR 03/18/2019
       874 PUR 04/10/2019
       897 PUR 05/03/2019
      2622 PUR 01/22/2024
      2645 PUR 02/14/2024
      2668 PUR 03/08/2024
      2691 PUR 03/31/2024
      2714 PUR 04/23/2024
      3335 PUR 01/04/2026
      3358 PUR 01/27/2026
      3381 PUR 02/19/2026
      3404 PUR 03/14/2026
      3427 PUR 04/06/2026
      3450 PUR 04/29/2026
      3473 PUR 05/22/2026
      3496 PUR 06/14/2026
      3703 PUR 01/07/2027
      3726 PUR 01/30/2027
      3749 PUR 02/22/2027
      3772 PUR 03/17/2027
      3795 PUR 04/09/2027
      3818 PUR 05/02/2027
      3841 PUR 05/25/2027
      3864 PUR 06/17/2027
      2737 PUR 05/16/2024
      2760 PUR 06/08/2024
      2783 PUR 07/01/2024
      2967 PUR 01/01/2025
      2990 PUR 01/24/2025
      3013 PUR 02/16/2025
      3036 PUR 03/11/2025
      3059 PUR 04/03/2025
      3082 PUR 04/26/2025
      3105 PUR 05/19/2025
      3128 PUR 06/11/2025
      3151 PUR 07/04/2025
      1518 PUR 01/13/2021
      1541 PUR 02/05/2021
      1564 PUR 02/28/2021
      1587 PUR 03/23/2021
      1610 PUR 04/15/2021
      1633 PUR 05/08/2021
      1656 PUR 05/31/2021
      1679 PUR 06/23/2021
      1886 PUR 01/16/2022
      1909 PUR 02/08/2022
      1932 PUR 03/03/2022
...
    986309 PUR 04/21/4717
    986332 PUR 05/14/4717
    986355 PUR 06/06/4717
    986378 PUR 06/29/4717
    990955 PUR 01/09/4730
    990978 PUR 02/01/4730
    991001 PUR 02/24/4730
    991024 PUR 03/19/4730
    995394 PUR 03/06/4742
    995417 PUR 03/29/4742
    995440 PUR 04/21/4742
    995463 PUR 05/14/4742
    995486 PUR 06/06/4742
    995509 PUR 06/29/4742
    986585 PUR 01/22/4718
    986608 PUR 02/14/4718
    986631 PUR 03/09/4718
    986654 PUR 04/01/4718
    986677 PUR 04/24/4718
    991047 PUR 04/11/4730
    991070 PUR 05/04/4730
    991093 PUR 05/27/4730
    991116 PUR 06/19/4730
    991323 PUR 01/12/4731
    995716 PUR 01/22/4743
    995739 PUR 02/14/4743
    995762 PUR 03/09/4743
    995785 PUR 04/01/4743
    995808 PUR 04/24/4743
    995831 PUR 05/17/4743
    995854 PUR 06/09/4743
    995877 PUR 07/02/4743

22054 rows selected.

SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6qatrtphp5wjt, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <=
to_char(sysdate +230, 'MM/DD/RRRR')

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |   158 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      | 25757 |   704K|   158   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX | 49112 |       |   158   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=TO_CHAR(SYSDATE@!+230,'MM/DD/RRRR'))
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


25 rows selected.

SQL>
SQL> drop table datetst purge;

Table dropped.

SQL>

SQL> create table datetst(
  2  myid    number,
  3  res_value varchar2(3),
  4  mydt    date);

Table created.

SQL>
SQL> create index datetest_idx on datetst(res_value);

Index created.

SQL>
SQL> begin
  2  	     for i in 1..1000000 loop
  3  		     if mod(i,23) = 0 then
  4  		     insert into datetst
  5  		     values(i, 'PUR', sysdate+i);
  6  		     else
  7  		     insert into datetst
  8  		     values(i, 'BID', sysdate+i);
  9  		     end if;
 10  	     end loop;
 11
 12  	     commit;
 13  end;
 14  /

PL/SQL procedure successfully completed.

SQL>
SQL> select *
  2  from datetst
  3  where res_value = 'PUR'
  4  and mydt <= sysdate +230;

      MYID RES MYDT
---------- --- ---------
        23 PUR 10-DEC-16
        46 PUR 02-JAN-17
        69 PUR 25-JAN-17
        92 PUR 17-FEB-17
       115 PUR 12-MAR-17
       138 PUR 04-APR-17
       161 PUR 27-APR-17
       184 PUR 20-MAY-17
       207 PUR 12-JUN-17
       230 PUR 05-JUL-17

10 rows selected.

SQL>
SQL> select * From table(dbms_xplan.display_cursor());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  0m2c2mv7zhx49, child number 0
-------------------------------------
select * from datetst where res_value = 'PUR' and mydt <= sysdate +230

Plan hash value: 3255216368

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |       |       |   148 (100)|          |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DATETST      |    35 |   875 |   148   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | DATETEST_IDX | 46482 |       |   148   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MYDT"<=SYSDATE@!+230)
   2 - access("RES_VALUE"='PUR')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


24 rows selected.

SQL>

With 1,000,000 rows of data Oracle inflated the original 10-row result set to 22,054 rows, a whopping 220,440 percent increase. As the data volumes increase this result set will increase to even larger numbers, resulting in extremely long query times and vast numbers of incorrect results, something I doubt the original poster had counted on.

Using the correct datatype is essential to ensuring Oracle can do its job properly and return reliable result sets. Storing data in a format that doesn’t reflect the actual datatype can be disastrous, as illustrated here. It pays, when writing or purchasing applications, to ensure the proper datatype is in force for the columns being used.
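If you inherit such a table, one escape route is to convert the strings in place; a sketch, assuming every stored value really is in MM/DD/RRRR format (any stray value will raise a conversion error that must be cleaned up first):


alter table datetst add (mydt_converted date);

update datetst
set mydt_converted = to_date(mydt, 'MM/DD/RRRR');

commit;

alter table datetst drop column mydt;
alter table datetst rename column mydt_converted to mydt;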

It only makes sense.

November 14, 2016

“That’s Not What I Wanted.”

Filed under: General — dfitzjarrell @ 11:40

"Let me try once more," Milo said in an effort to explain. "In other words--"
"You mean you have other words?" cried the bird happily. "Well, by all means, 
use them. You're certainly not doing very well with the ones you have now."
-- Norton Juster, The Phantom Tollbooth

A recent question in the Oracle database forums has a user wanting to open his physical standby database in READ WRITE mode but still keep it as a physical standby. He found this blog post, followed the instructions and did, indeed, get what used to be his standby database opened in READ WRITE mode. What the blogger failed to mention was that his instructions performed a failover of the standby to primary, and did so in a manner that Oracle does not recommend.

To be honest there isn’t a way to open a physical standby database READ WRITE, keep the primary as a primary and maintain the physical standby as a physical standby. Active Data Guard allows the physical standby to be opened READ ONLY while continuing to apply redo to keep the standby in sync with the primary. (This option has extra cost as it requires additional licensing.) Once a failover has occurred the only recourse is to rebuild the standby database; it can’t simply be ‘brought back’ at the will and whim of the DBA.
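When Active Data Guard is licensed, opening the standby READ ONLY with redo apply running is a short sequence on the standby (a sketch of the usual steps):


alter database recover managed standby database cancel;

alter database open read only;

alter database recover managed standby database using current logfile disconnect from session;

V$DATABASE.OPEN_MODE on the standby should then report READ ONLY WITH APPLY, and queries can run while redo continues to be applied.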

What was suggested by the author of that blog post was to use:


alter database activate standby database;

which WILL eventually open the standby READ WRITE, failing it over to be the new primary in the process. Yet Oracle, in the documentation, clearly does not recommend using this option:


ACTIVATE [PHYSICAL|LOGICAL] STANDBY DATABASE [FINISH APPLY]

Performs a failover. The standby database must be mounted before it can be activated with this statement.

Note: Do not use the ALTER DATABASE ACTIVATE STANDBY DATABASE statement to failover because it causes
data loss. Instead, use the following best practices:

For physical standby databases, use the ALTER DATABASE RECOVER MANAGED STANDBY DATABASE statement 
with the FINISH keyword to perform the role transition as quickly as possible with little or no data loss
and without rendering other standby databases unusable.

For logical standby databases, use the ALTER DATABASE PREPARE TO SWITCHOVER and
ALTER DATABASE COMMIT TO SWITCHOVER statements.

Notice the warning given by Oracle:


Note: Do not use the ALTER DATABASE ACTIVATE STANDBY DATABASE statement to failover because it causes
data loss.

A far better option, although not permanent, would be to execute a switchover, where the primary and physical standby switch roles for a period of time before restoring the physical standby to its original role as PHYSICAL STANDBY. This takes more work to accomplish but it is reversible. An example of a switchover is shown below:


Ensure that no gaps exist in the redo apply:

SQL> select status, gap_status
  2  from v$archive_dest_status
  3  where dest_id = 2;
 
STATUS    GAP_STATUS
--------- ------------------------
VALID     NO GAP
 
SQL>

Make certain that the temp file name or names on the standby match those on the primary.  On the primary execute:

SQL> select file_name from dba_temp_files;
 
FILE_NAME
--------------------------------------------------------------------------------
/u01/app/oradata/yumpo/temp01.dbf
 
SQL>

and on the standby execute:

SQL> select name from v$tempfile;
 
NAME
--------------------------------------------------------------------------------
/u01/app/oradata/yumpo/temp01.dbf
 
SQL> 

There may be a delay configured for redo apply on the standby; to determine if a delay is configured execute:

SQL> select delay_mins from v$archive_dest where dest_id = 2; 
 
DELAY_MINS
----------
         0
 
SQL>

(If the DELAY_MINS is greater than 0 the delay can be removed by executing:

 ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY;

)

Check the SWITCHOVER_STATUS on the primary:

SQL> select switchover_status from v$database;
 
SWITCHOVER_STATUS
--------------------
TO STANDBY
 
SQL>

Valid values are TO STANDBY and SESSIONS ACTIVE.  Any other value signals that the redo transport 
isn't functioning and the switchover will fail.  If the status is SESSIONS ACTIVE query the
V$SESSION view for active sessions:

SQL> select sid, process, program
  2  from v$session where type = 'USER';

       SID PROCESS                  PROGRAM
---------- ------------------------ ---------------------------------------
         6 16952:11396              sqlplus.exe
        17 15728                    ORACLE.EXE (J002)
        18 10668                    ORACLE.EXE (J006)
       131 13196                    ORACLE.EXE (J003)
       134 16576:16772              sqlplus.exe
       135 9720                     ORACLE.EXE (J007)
       244 17404                    ORACLE.EXE (J004)
       248 17844                    ORACLE.EXE (J008)
       254 12992                    ORACLE.EXE (J000)
       364 8280                     ORACLE.EXE (J005)
       366 2320                     ORACLE.EXE (J001)

11 rows selected.

SQL>

When everything looks good execute the switchover; if the status is TO STANDBY the command is:

SQL> alter database commit to switchover to physical standby;
 
Database altered.

SQL>

If the status is SESSIONS ACTIVE you should modify the command as shown below:

SQL> alter database commit to switchover to physical standby with session shutdown;
 
Database altered.
 
SQL>

Shut down the 'new' standby and then mount it:

SQL> shutdown immediate
ORA-01507: database not mounted
 
 
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.
 
...
Database mounted.
SQL> 

If all has gone according to plan the alert log on the 'new' standby should report:

Successful mount of redo thread 1, with mount id ...
Physical Standby Database mounted.

It's now time to check the 'old' standby for SWITCHOVER_STATUS.  The two valid values are  
TO PRIMARY and SESSIONS ACTIVE:

SQL> select switchover_status from v$database;
 
SWITCHOVER_STATUS
--------------------
TO PRIMARY
 
SQL>

Since no sessions are active the WITH SESSION SHUTDOWN clause will not be necessary:

SQL> alter database commit to switchover to primary;
 
Database altered.
 
SQL>

Open the 'new' primary database:

SQL> alter database open;
 
Database altered.
 
SQL>

Looking in the alert log you should see:

Standby became primary SCN: ...
Switchover: Complete - Database mounted as primary
Completed: alter database commit to switchover to primary

Check the LOG_ARCHIVE_DEST_n parameters and set up a new destination, if necessary:

SQL> alter system set log_archive_dest_2='service="yumpo" LGWR ASYNC NOAFFIRM delay=0 optional compression=disable max_failure=0 max_connections=1 reopen=300 db_unique_name="yumpo" net_timeout=30 valid_for=(all_logfiles,primary_role)' scope = both;
 
System altered.
 
SQL>

Start the redo apply on the 'new' standby:

SQL> alter database recover managed standby database using current logfile disconnect from session;
 
Database altered.
 
SQL>

It's a good idea to verify that the redo transport is working; execute at least two log switches
on the 'new' primary to prove the redo is being transported:

SQL> alter system switch logfile;
 
System altered.
 
SQL> /
 
System altered.
 
SQL> /
 
System altered.
 
SQL>/
 
System altered.
 
SQL> select status, gap_status from v$archive_dest_status where dest_id = 2;
 
STATUS    GAP_STATUS
--------- ------------------------
VALID     NO GAP
 
SQL>

Presuming the status is VALID and there is no gap the switchover is complete.

To reverse the switchover the same steps are applied, starting at the ‘new’ primary and ending with the ‘new’ standby. Should there be no problems the primary database will be returned to the PRIMARY role; likewise the physical standby will return to the PHYSICAL STANDBY role. In between the switchover steps the original physical standby will temporarily be the primary database, opened in READ WRITE mode.

If you have Data Guard Broker configured a switchover is considerably easier to execute; you can read the Oracle documentation for a complete example.
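As a rough illustration (the connection details and database name here are hypothetical), the entire procedure above collapses to a single broker command:


$ dgmgrl sys/password
DGMGRL> switchover to 'yumpo_sby';

The broker performs the pre-checks, executes the role transitions on both databases and restarts instances as required.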

Understanding Data Guard/Standby concepts is essential for knowing what can, and cannot, be done. And blog advice, even from this one, should be investigated before taking any action as the title and text may not accurately reflect what the recommended actions actually accomplish. It would be a shame to put yourself in an unrecoverable situation just because you jumped first and looked later.

In other words …

November 13, 2016

“If You Change This, Right Here …”

Filed under: Performance,stats — dfitzjarrell @ 10:47

"changes are so frightening."
-- Norton Juster, The Phantom Tollbooth

It isn’t unusual for Oracle to expand functionality with each new release; a short list includes AWR, ADDM, the SYSAUX tablespace, Flashback Query, Flashback Database and Edition-based Redefinition. Oracle 12.1 is no different: it provides an interesting performance enhancement, the in-memory column store, which can increase performance by reducing the physical I/O load, making tables memory-resident (to simplify the description somewhat). In addition to in-memory access to table data Oracle has introduced a new optimization called “Vector Transformation, including Vector Optimization” for in-memory operations. Even when this new optimization isn’t in use there are new code paths that take advantage of this technology. As a result execution plans can change; let’s look at an example, originally provided by Jonathan Lewis, that illustrates this.

(In-memory is fairly easy to configure. Set inmemory_size to a ‘reasonable’ non-zero value and then restart the database:


SQL> alter system set inmemory_size=500m scope=spfile;

System altered.

SQL>

For a complete reference to in-memory configuration consult the Oracle documentation.)
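After the restart it’s worth confirming the store was actually allocated; a quick check (a sketch using the documented V$INMEMORY_AREA view):


select pool, alloc_bytes, used_bytes, populate_status
from v$inmemory_area;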

Two tables are created, neither one especially noteworthy (the original listing was garbled, so the definitions below are representative reconstructions; any similar setup reproduces the behavior), an index is created on each table and extended statistics are generated on the (n1,n2) column grouping:


create table t1 nologging as
with generator as (
        select --+ materialize
               rownum id
        from   dual
        connect by level <= 10000
)
select
        rownum              id,
        mod(rownum,100)     n1,
        mod(rownum,100)     n2,
        lpad(rownum,10,'0') v1
from generator;

create table t2 nologging as
with generator as (
        select --+ materialize
               rownum id
        from   dual
        connect by level <= 10000
)
select
        rownum                            id,
        trunc(dbms_random.value(0,10000)) rand,
        lpad(rownum,10,'0')               v1
from   generator g1, generator g2
where  rownum <= 1000000;

create index t1_n1 on t1(n1) nologging;
create index t2_i1 on t2(rand) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname    => user,
                tabname    => 'T1',
                method_opt => 'for columns (n1,n2) size 1'
        );
end;
/
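The column group created by the METHOD_OPT setting can be verified before running the test (a quick check against the documented USER_STAT_EXTENSIONS view):


select extension_name, extension
from user_stat_extensions
where table_name = 'T1';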

Executing the following query and displaying the execution plan shows no surprises since neither table has been altered to be in-memory:


SQL> 
SQL> select
  2  	     /*+
  3  		     qb_name(main)
  4  	     */
  5  	     count(*)
  6  from    (
  7  	     select
  8  		     /*+ qb_name(inline) */
  9  		     distinct t1.v1, t2.v1
 10  	     from
 11  		     t1,t2
 12  	     where
 13  		     t1.n1 = 50
 14  	     and     t1.n2 = 50
 15  	     and     t2.rand = t1.id
 16  	     )
 17  ;

      9912                                                                                                                                            
SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  a7gb3ncf922zz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select         /*+                 qb_name(main)         */                                                                                           
count(*) from    (         select                 /*+ qb_name(inline)                                                                                 
*/                 distinct t1.v1, t2.v1         from                                                                                                 
t1,t2         where                 t1.n1 = 50         and     t1.n2 =                                                                                
50         and     t2.rand = t1.id         )                                                                                                          
                                                                                                                                                      
Plan hash value: 1718706536                                                                                                                           
                                                                                                                                                      
-------------------------------------------------------------------------------------------------                                                     
| Id  | Operation                               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |                                                     
-------------------------------------------------------------------------------------------------                                                     
|   0 | SELECT STATEMENT                        |       |       |       |  5127 (100)|          |                                                     
|   1 |  SORT AGGREGATE                         |       |     1 |       |            |          |                                                     
|   2 |   VIEW                                  |       | 10001 |       |  5127   (1)| 00:00:01 |                                                     
|   3 |    HASH UNIQUE                          |       | 10001 |   351K|  5127   (1)| 00:00:01 |                                                     
|*  4 |     HASH JOIN                           |       | 10001 |   351K|  5125   (1)| 00:00:01 |                                                     
|*  5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1    |   100 |  2100 |     3   (0)| 00:00:01 |                                                     
|*  6 |       INDEX RANGE SCAN                  | T1_N1 |   100 |       |     1   (0)| 00:00:01 |                                                     
|   7 |      TABLE ACCESS FULL                  | T2    |  1000K|    14M|  5113   (1)| 00:00:01 |                                                     
-------------------------------------------------------------------------------------------------                                                     
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   4 - access("T2"."RAND"="T1"."ID")                                                                                                                  
   5 - filter("T1"."N2"=50)                                                                                                                           
   6 - access("T1"."N1"=50)                                                                                                                           
                                                                                                                                                      
SQL> 

Having the extended statistics allowed the optimizer to estimate the correct cardinality between columns n1 and n2, providing a better initial plan. Now things will get a bit interesting; table T2 is altered and put in-memory:


SQL> 
SQL> alter table t2 inmemory;
SQL> 

The same query is executed again and the plan changes because T2 is now in-memory, even though the new vector transformation has not been applied:


SQL> 
SQL> select
  2  	     /*+
  3  		     qb_name(main)
  4  	     */
  5  	     count(*)
  6  from    (
  7  	     select
  8  		     /*+ qb_name(inline) */
  9  		     distinct t1.v1, t2.v1
 10  	     from
 11  		     t1,t2
 12  	     where
 13  		     t1.n1 = 50
 14  	     and     t1.n2 = 50
 15  	     and     t2.rand = t1.id
 16  	     )
 17  ;

      9912                                                                                                                                            
SQL> 
SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  a7gb3ncf922zz, child number 0                                                                                                                 
-------------------------------------                                                                                                                 
select         /*+                 qb_name(main)         */                                                                                           
count(*) from    (         select                 /*+ qb_name(inline)                                                                                 
*/                 distinct t1.v1, t2.v1         from                                                                                                 
t1,t2         where                 t1.n1 = 50         and     t1.n2 =                                                                                
50         and     t2.rand = t1.id         )                                                                                                          
                                                                                                                                                      
Plan hash value: 106371239                                                                                                                            
                                                                                                                                                      
----------------------------------------------------------------------------------------------------                                                  
| Id  | Operation                                | Name    | Rows  | Bytes | Cost (%CPU)| Time     |                                                  
----------------------------------------------------------------------------------------------------                                                  
|   0 | SELECT STATEMENT                         |         |       |       |   223 (100)|          |                                                  
|   1 |  SORT AGGREGATE                          |         |     1 |       |            |          |                                                  
|   2 |   VIEW                                   |         | 10001 |       |   223  (15)| 00:00:01 |                                                  
|   3 |    HASH UNIQUE                           |         | 10001 |   351K|   223  (15)| 00:00:01 |                                                  
|*  4 |     HASH JOIN                            |         | 10001 |   351K|   222  (14)| 00:00:01 |                                                  
|   5 |      JOIN FILTER CREATE                  | :BF0000 |   100 |  2100 |     3   (0)| 00:00:01 |
|*  6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T1      |   100 |  2100 |     3   (0)| 00:00:01 |                                                  
|*  7 |        INDEX RANGE SCAN                  | T1_N1   |   100 |       |     1   (0)| 00:00:01 |                                                  
|   8 |      JOIN FILTER USE                     | :BF0000 |  1000K|    14M|   209  (11)| 00:00:01 |
|*  9 |       TABLE ACCESS INMEMORY FULL         | T2      |  1000K|    14M|   209  (11)| 00:00:01 |                                                  
----------------------------------------------------------------------------------------------------                                                  
                                                                                                                                                      
Predicate Information (identified by operation id):                                                                                                   
---------------------------------------------------                                                                                                   
                                                                                                                                                      
   4 - access("T2"."RAND"="T1"."ID")                                                                                                                  
   6 - filter("T1"."N2"=50)                                                                                                                           
   7 - access("T1"."N1"=50)                                                                                                                           
   9 - inmemory(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"T2"."RAND"))
                                                                                                                                                      
SQL> 

The cost of the tablescan was reduced considerably, from 5113 to 209, but that isn’t the most thought-provoking part of the plan: a serial Bloom filter is now used for the hash join, courtesy of the in-memory code path. With a table stored ‘normally’ (not in-memory) a serial Bloom filter would not be effective; with the in-memory option a Bloom filter can use enhancements such as simultaneous multiple probe access into table t2 (in this example), in addition to the benefits of avoiding disk access, disk buffer activity and row-by-row data retrieval. All of those avoidances reduce CPU time considerably when compared to a ‘regular’ table, one that is not in-memory.
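Before trusting comparisons like this it’s worth confirming the table really was populated into the column store; a quick check (a sketch using the documented V$IM_SEGMENTS view):


select segment_name, populate_status, bytes_not_populated
from v$im_segments;

A POPULATE_STATUS of COMPLETED with 0 bytes not populated means the whole table is in memory; until population finishes Oracle may still read part of the table from disk.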

The in-memory column store is a very nice addition to an already powerful database, made even better by new optimizer transformations that speed data access by providing execution paths not available for tables accessed from disk. ‘Experiment’ with the in-memory option to see what performance improvements it might provide; you might be pleasantly surprised at what you find.

Sometimes unexpected change is good.

October 29, 2016

It’s A Match

Filed under: General — dfitzjarrell @ 10:29

"So many things are possible as long as you don't know they're impossible."
-- Norton Juster, The Phantom Tollbooth

With every new release Oracle provides new features that are very useful, even if you didn’t think of them before. In 12.1.0.2 Oracle provides new functionality for finding pattern matches in a data set; using the MATCH_RECOGNIZE clause it’s fairly easy to generate results based on a defined pattern in the data. Let’s look at some examples and see how to use MATCH_RECOGNIZE to find and report user-defined patterns in a data set. The examples provided were taken from the Live SQL website (https://livesql.oracle.com) and from the Oracle documentation (https://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8981), where MATCH_RECOGNIZE is more thoroughly explained. That being said, let’s go through an introduction to the clause and some of what it can accomplish.

These examples begin with a table of click data, similar to what an advertising agency might collect from users clicking on an advertisement. A click within 10 minutes of the prior click from the same user belongs to the same session; a larger gap starts a new session. We create the table and load the data:


SQL> 
SQL> --
SQL> -- Create table and populate
SQL> --
SQL> CREATE TABLE clickdata (tstamp integer, userid varchar2(15));

Table created.

SQL> INSERT INTO clickdata VALUES(1, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(2, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(11, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(12, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(22, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(23, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(32, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(34, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(43, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(44, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(47, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(48, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(53, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(59, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(60, 'Sam');

1 row created.

SQL> INSERT INTO clickdata VALUES(63, 'Mary');

1 row created.

SQL> INSERT INTO clickdata VALUES(68, 'Sam');

1 row created.

SQL> commit;

Commit complete.

Using a very basic MATCH_RECOGNIZE query we return the timestamp and the user id for each row matching the pattern:


SQL> 
SQL> --
SQL> -- Return rows that are part of the same
SQL> -- session for a login
SQL> --
SQL> -- Any session tstamp value within 10
SQL> -- minutes of the prior tstamp is considered
SQL> -- part of the same session
SQL> --
SQL> -- In MATCH_RECOGNIZE the PATTERN is the
SQL> -- sequence or data pattern we are looking for
SQL> --
SQL> -- In this case b maps to any row, which gives
SQL> -- the ability to check for matches
SQL> --
SQL> -- s* states that 0 or more rows must be mapped
SQL> -- to confirm a match
SQL> --
SQL> -- s is defined as the difference between the
SQL> -- current tstamp value and the prior tstamp
SQL> -- value and the difference must be less than
SQL> -- or equal to 10
SQL> --
SQL> -- This will give a list of the activity of the
SQL> -- different continuous sessions
SQL> --
SQL> -- The ALL ROWS PER MATCH returns every matching
SQL> -- row
SQL> --
SQL> 
SQL> SELECT
  2   tstamp,
  3   userid
  4  FROM clickdata MATCH_RECOGNIZE(
  5  	ALL ROWS PER MATCH
  6  	PATTERN (b s*)
  7  	DEFINE
  8  	    s as (s.tstamp - prev(s.tstamp) <= 10)
  9   );

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
         1 Mary                                                                                                                                       
         2 Sam                                                                                                                                        
        11 Mary                                                                                                                                       
        12 Sam                                                                                                                                        
        22 Sam                                                                                                                                        
        23 Mary                                                                                                                                       
        32 Sam                                                                                                                                        
        34 Mary                                                                                                                                       
        43 Sam                                                                                                                                        
        44 Mary                                                                                                                                       
        47 Sam                                                                                                                                        

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
        48 Sam                                                                                                                                        
        53 Mary                                                                                                                                       
        59 Sam                                                                                                                                        
        60 Sam                                                                                                                                        
        63 Mary                                                                                                                                       
        68 Sam                                                                                                                                        

17 rows selected.

Two interesting things are found in this construct, the PATTERN keyword and the DEFINE keyword. PATTERN defines a generalized pattern, using Perl-style regular expression syntax, to declare a particular pattern in the data. DEFINE sets up the definitions for the symbols used in the PATTERN statement. Since b has no conditions on it every row is returned; the conditions on s state that the difference between the current timestamp and the previous timestamp must be less than or equal to 10. There are 17 rows of data in the table and all 17 are returned with this MATCH_RECOGNIZE query. Modifying that query to find the unique session identifier for each session fails because not enough pieces of the MATCH_RECOGNIZE syntax are included; it ‘fails’ by returning 1 as the session number for every row, and that isn’t accurate:


SQL> 
SQL> --
SQL> -- The MATCH_NUMBER() Function should return the
SQL> -- session id for each different session
SQL> --
SQL> -- In this case it doesn't as we are missing a
SQL> -- piece of the puzzle
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id
  5  FROM clickdata MATCH_RECOGNIZE(
  6  	MEASURES match_number() as session_id
  7  	ALL ROWS PER MATCH
  8  	PATTERN (b s*)
  9  	DEFINE
 10  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 11   );

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
         1 Mary                     1                                                                                                                 
         2 Sam                      1                                                                                                                 
        11 Mary                     1                                                                                                                 
        12 Sam                      1                                                                                                                 
        22 Sam                      1                                                                                                                 
        23 Mary                     1                                                                                                                 
        32 Sam                      1                                                                                                                 
        34 Mary                     1                                                                                                                 
        43 Sam                      1                                                                                                                 
        44 Mary                     1                                                                                                                 
        47 Sam                      1                                                                                                                 

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
        48 Sam                      1                                                                                                                 
        53 Mary                     1                                                                                                                 
        59 Sam                      1                                                                                                                 
        60 Sam                      1                                                                                                                 
        63 Mary                     1                                                                                                                 
        68 Sam                      1                                                                                                                 

17 rows selected.

Proving Mary has three sessions is fairly easy:


SQL> 
SQL> --
SQL> -- As all records show 1 session this isn't accurate
SQL> -- since Mary has 3 separate sessions according to
SQL> -- the previously established criteria
SQL> --
SQL> SELECT *
  2  FROM clickdata
  3  WHERE userid = 'Mary';

    TSTAMP USERID                                                                                                                                     
---------- ---------------                                                                                                                            
         1 Mary                                                                                                                                       
        11 Mary                                                                                                                                       
        23 Mary                                                                                                                                       
        34 Mary                                                                                                                                       
        44 Mary                                                                                                                                       
        53 Mary                                                                                                                                       
        63 Mary                                                                                                                                       

7 rows selected.

The first two records belong to the first session, the third is another session, and the remaining records make up the third session. Correcting the MATCH_RECOGNIZE query is a simple matter of partitioning (grouping) by userid and ordering each group by the tstamp values:


SQL> 
SQL> --
SQL> -- To 'fix' this we need to add some instructions
SQL> -- to the MATCH_RECOGNIZE function call
SQL> --
SQL> -- By partitioning  by userid  (collecting the data
SQL> -- based on the userid column) and ordering each
SQL> -- grouping by tstamp the MATCH_RECOGNIZE() call
SQL> -- can  accurately return the session id
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id
  5  FROM clickdata MATCH_RECOGNIZE(
  6  	PARTITION BY userid ORDER BY tstamp
  7  	MEASURES match_number() as session_id
  8  	ALL ROWS PER MATCH
  9  	PATTERN (b s*)
 10  	DEFINE
 11  	    s as (s.tstamp - prev(s.tstamp) <=10)
 12   );

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
         1 Mary                     1                                                                                                                 
        11 Mary                     1                                                                                                                 
        23 Mary                     2                                                                                                                 
        34 Mary                     3                                                                                                                 
        44 Mary                     3                                                                                                                 
        53 Mary                     3                                                                                                                 
        63 Mary                     3                                                                                                                 
         2 Sam                      1                                                                                                                 
        12 Sam                      1                                                                                                                 
        22 Sam                      1                                                                                                                 
        32 Sam                      1                                                                                                                 

    TSTAMP USERID          SESSION_ID                                                                                                                 
---------- --------------- ----------                                                                                                                 
        43 Sam                      2                                                                                                                 
        47 Sam                      2                                                                                                                 
        48 Sam                      2                                                                                                                 
        59 Sam                      3                                                                                                                 
        60 Sam                      3                                                                                                                 
        68 Sam                      3                                                                                                                 

17 rows selected.

Now the report correctly displays that there were three sessions for each user.

Before we go further let’s look at some parts of the MATCH_RECOGNIZE statement by defining the keywords being used. PARTITION groups the data by the column or columns specified, similar to the way the analytic functions work. ORDER BY orders the data in each group by the column or columns provided. The MEASURES keyword allows you to define the values you want to return and how to compute them based on the pattern being matched. The MATCH_NUMBER() function returns the number of the pattern match, starting with 1 and incrementing as each new match is found. The FIRST and LAST keywords return the first and last values for a given pattern match, and can be used in calculations. Finally, the ALL ROWS PER MATCH directive returns a row for every row that satisfies the match criteria. Using these in the following modifications to the MATCH_RECOGNIZE query produces a report of the session number, the number of events in each session, the start time and the session duration:


SQL> 
SQL> --
SQL> -- Let's expand this report
SQL> --
SQL> -- Setting up the MEASURES clause allows us
SQL> -- to return:
SQL> --
SQL> --  * the number of events per session
SQL> --  * the start time of each session
SQL> --  * the session duration
SQL> --
SQL> -- We have already successfully returned the session id
SQL> -- so now we add the count(*) for each group, the
SQL> -- start time for each group and the elapsed time for
SQL> -- each group calculated from the last tstamp for
SQL> -- each group
SQL> --
SQL> -- In one group first and last are the same so the difference
SQL> -- is 0
SQL> --
SQL> SELECT
  2   tstamp,
  3   userid,
  4   session_id,
  5   no_of_events,
  6   start_time,
  7   session_duration
  8  FROM clickdata MATCH_RECOGNIZE(
  9  	PARTITION BY userid ORDER BY tstamp
 10  	MEASURES match_number() as session_id,
 11  		 count(*) as no_of_events,
 12  		 first(tstamp) start_time,
 13  		 last(tstamp) - first(tstamp) session_duration
 14  	ALL ROWS PER MATCH
 15  	PATTERN (b s*)
 16  	DEFINE
 17  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 18   );

    TSTAMP USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION                                                                        
---------- --------------- ---------- ------------ ---------- ----------------                                                                        
         1 Mary                     1            1          1                0                                                                        
        11 Mary                     1            2          1               10                                                                        
        23 Mary                     2            1         23                0                                                                        
        34 Mary                     3            1         34                0                                                                        
        44 Mary                     3            2         34               10                                                                        
        53 Mary                     3            3         34               19                                                                        
        63 Mary                     3            4         34               29                                                                        
         2 Sam                      1            1          2                0                                                                        
        12 Sam                      1            2          2               10                                                                        
        22 Sam                      1            3          2               20                                                                        
        32 Sam                      1            4          2               30                                                                        

    TSTAMP USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION                                                                        
---------- --------------- ---------- ------------ ---------- ----------------                                                                        
        43 Sam                      2            1         43                0                                                                        
        47 Sam                      2            2         43                4                                                                        
        48 Sam                      2            3         43                5                                                                        
        59 Sam                      3            1         59                0                                                                        
        60 Sam                      3            2         59                1                                                                        
        68 Sam                      3            3         59                9                                                                        

17 rows selected.

As the output above shows, ALL ROWS PER MATCH can produce a lengthy report. Taking this one step further and changing ALL ROWS PER MATCH to ONE ROW PER MATCH produces a shorter, summary report of the activity:


SQL> 
SQL> --
SQL> -- One of the nice aspects of MATCH_RECOGNIZE is the
SQL> -- ability to generate summary reports
SQL> --
SQL> -- The last report was nice but a bit lengthy
SQL> --
SQL> -- Using the ONE ROW PER MATCH directive and removing
SQL> -- the tstamp column from the select statement
SQL> -- allows MATCH_RECOGNIZE to return only the summary
SQL> -- information, making a more compact and, in this case,
SQL> -- usable report
SQL> --
SQL> SELECT
  2   userid,
  3   session_id,
  4   no_of_events,
  5   start_time,
  6   session_duration
  7  FROM clickdata MATCH_RECOGNIZE(
  8  	PARTITION BY userid ORDER BY tstamp
  9  	MEASURES match_number() as session_id,
 10  		 count(*) as no_of_events,
 11  		 first(tstamp) start_time,
 12  		 last(tstamp) - first(tstamp) session_duration
 13  	ONE ROW PER MATCH
 14  	PATTERN (b s*)
 15  	DEFINE
 16  	    s as (s.tstamp - prev(s.tstamp) <= 10)
 17   );

USERID          SESSION_ID NO_OF_EVENTS START_TIME SESSION_DURATION
--------------- ---------- ------------ ---------- ----------------
Mary                     1            2          1               10
Mary                     2            1         23                0
Mary                     3            4         34               29
Sam                      1            4          2               30
Sam                      2            3         43                5
Sam                      3            3         59                9

6 rows selected.

SQL>
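
Since ONE ROW PER MATCH returns one summary row per session, that output can be aggregated like any other row source. As a sketch (using the same session definition as above), this should report each user’s longest session:


SELECT userid, MAX(session_duration) AS longest_session
FROM clickdata MATCH_RECOGNIZE(
	PARTITION BY userid ORDER BY tstamp
	MEASURES last(tstamp) - first(tstamp) AS session_duration
	ONE ROW PER MATCH
	PATTERN (b s*)
	DEFINE
	    s as (s.tstamp - prev(s.tstamp) <= 10)
 )
GROUP BY userid;
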

Let’s create another table and data set, this time for stock prices for the month of April:


SQL> 
SQL> --
SQL> -- Create table for stock data
SQL> --
SQL> -- Populate the table with data
SQL> -- for April
SQL> --
SQL> CREATE TABLE Ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

Table created.

SQL> 
SQL> INSERT INTO Ticker VALUES('ACME', '01-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '02-Apr-11', 17);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '03-Apr-11', 19);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '04-Apr-11', 21);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '05-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '06-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '07-Apr-11', 15);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '08-Apr-11', 20);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '09-Apr-11', 24);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '10-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '11-Apr-11', 19);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '12-Apr-11', 15);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '13-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '14-Apr-11', 25);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '15-Apr-11', 14);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '16-Apr-11', 12);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '17-Apr-11', 14);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '18-Apr-11', 24);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '19-Apr-11', 23);

1 row created.

SQL> INSERT INTO Ticker VALUES('ACME', '20-Apr-11', 22);

1 row created.

SQL> 
SQL> COMMIT;

Commit complete.

This data has a different pattern structure, where V-shaped patterns are generated by the changing stock prices. Let’s use MATCH_RECOGNIZE to find and report those patterns. This first query returns one summary row for each pattern found in the data:


SQL> 
SQL> --
SQL> -- Use MATCH_RECOGNIZE to find areas
SQL> -- in the data where the stock price
SQL> -- 'dipped' then rose
SQL> --
SQL> -- The data provided describes three
SQL> -- patterns of dipping and rising prices
SQL> --
SQL> -- The query will return three rows,
SQL> -- one per matching pattern
SQL> --
SQL> -- The only stock symbol in the table
SQL> -- is 'ACME' but this query would return
SQL> -- one row for each pattern matched per
SQL> -- stock symbol
SQL> --
SQL> -- Here we group by symbol and order
SQL> -- by the tstamp values
SQL> --
SQL> -- The MEASURES section defines the
SQL> -- starting timestamp of the pattern,
SQL> -- the lowest value of the group and
SQL> -- the highest value after the low has
SQL> -- been reached
SQL> --
SQL> -- Note that in one case the ending
SQL> -- tstamp for one pattern is the starting
SQL> -- tstamp for the	next pattern
SQL> --
SQL> -- This query introduces a new wrinkle.
SQL> -- AFTER MATCH SKIP TO LAST UP, which
SQL> -- causes the pattern matching for the
SQL> -- next group to begin at the last UP
SQL> -- value from the current pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3  	  PARTITION BY symbol
  4  	  ORDER BY tstamp
  5  	  MEASURES  STRT.tstamp AS start_tstamp,
  6  		    LAST(DOWN.tstamp) AS bottom_tstamp,
  7  		    LAST(UP.tstamp) AS end_tstamp
  8  	  ONE ROW PER MATCH
  9  	  AFTER MATCH SKIP TO LAST UP
 10  	  PATTERN (STRT DOWN+ UP+)
 11  	  DEFINE
 12  	     DOWN AS DOWN.price < PREV(DOWN.price),
 13  	     UP AS UP.price > PREV(UP.price)
 14  	  ) MR
 15  ORDER BY MR.symbol, MR.start_tstamp;

SYMBOL     START_TST BOTTOM_TS END_TSTAM                                                                                                              
---------- --------- --------- ---------                                                                                                              
ACME       05-APR-11 06-APR-11 10-APR-11                                                                                                              
ACME       10-APR-11 12-APR-11 13-APR-11                                                                                                              
ACME       14-APR-11 16-APR-11 18-APR-11                                                                                                              

We expand the report to return more data. The comments explain what is being returned and how the MEASURES are coded:


SQL> 
SQL> --
SQL> -- This next query returns all rows for
SQL> -- each pattern match and returns a bit more
SQL> -- data
SQL> --
SQL> -- There are 5 MEASURES being used
SQL> --
SQL> -- The start timestamp
SQL> --
SQL> -- The bottom of the v-shaped pattern
SQL> --
SQL> -- The top of the v-shaped pattern
SQL> --
SQL> -- The row number to know which values
SQL> -- belong to a given pattern match
SQL> --
SQL> -- The classifier (UP, DOWN, STRT)
SQL> -- to know where the value lies in the
SQL> -- pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3  	  PARTITION BY symbol
  4  	  ORDER BY tstamp
  5  	  MEASURES  STRT.tstamp AS start_tstamp,
  6  		    FINAL LAST(DOWN.tstamp) AS bottom_tstamp,
  7  		    FINAL LAST(UP.tstamp) AS end_tstamp,
  8  		    MATCH_NUMBER() AS match_num,
  9  		    CLASSIFIER() AS var_match
 10  	  ALL ROWS PER MATCH
 11  	  AFTER MATCH SKIP TO LAST UP
 12  	  PATTERN (STRT DOWN+ UP+)
 13  	  DEFINE
 14  	     DOWN AS DOWN.price < PREV(DOWN.price),
 15  	     UP AS UP.price > PREV(UP.price)
 16  	  ) MR
 17  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP    START_TST BOTTOM_TS END_TSTAM  MATCH_NUM VAR_MATCH                           PRICE                                               
---------- --------- --------- --------- --------- ---------- ------------------------------ ----------                                               
ACME       05-APR-11 05-APR-11 06-APR-11 10-APR-11          1 STRT                                   25                                               
ACME       06-APR-11 05-APR-11 06-APR-11 10-APR-11          1 DOWN                                   12                                               
ACME       07-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     15                                               
ACME       08-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     20                                               
ACME       09-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     24                                               
ACME       10-APR-11 05-APR-11 06-APR-11 10-APR-11          1 UP                                     25                                               
ACME       10-APR-11 10-APR-11 12-APR-11 13-APR-11          2 STRT                                   25                                               
ACME       11-APR-11 10-APR-11 12-APR-11 13-APR-11          2 DOWN                                   19                                               
ACME       12-APR-11 10-APR-11 12-APR-11 13-APR-11          2 DOWN                                   15                                               
ACME       13-APR-11 10-APR-11 12-APR-11 13-APR-11          2 UP                                     25                                               
ACME       14-APR-11 14-APR-11 16-APR-11 18-APR-11          3 STRT                                   25                                               

SYMBOL     TSTAMP    START_TST BOTTOM_TS END_TSTAM  MATCH_NUM VAR_MATCH                           PRICE                                               
---------- --------- --------- --------- --------- ---------- ------------------------------ ----------                                               
ACME       15-APR-11 14-APR-11 16-APR-11 18-APR-11          3 DOWN                                   14                                               
ACME       16-APR-11 14-APR-11 16-APR-11 18-APR-11          3 DOWN                                   12                                               
ACME       17-APR-11 14-APR-11 16-APR-11 18-APR-11          3 UP                                     14                                               
ACME       18-APR-11 14-APR-11 16-APR-11 18-APR-11          3 UP                                     24                                               

15 rows selected.

Going even further let’s generate a report that displays where the row falls in the pattern (STRT, DOWN, UP), the total number of UP days in the pattern, the total number of days for each pattern, the day number in each pattern and the running price difference from the pattern’s starting price. Remember that the FINAL keyword reports the value as of the last row of each pattern match:


SQL> 
SQL> --
SQL> -- This query aggregates data and returns
SQL> -- the number of days mapped to the UP pattern
SQL> -- (up_days), the total number of days each
SQL> -- for each pattern found, the day number within
SQL> -- each match (a running count that resets when
SQL> -- a new pattern match begins) and the running
SQL> -- price difference between each row in a match
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3    PARTITION BY symbol
  4    ORDER BY tstamp
  5    MEASURES
  6  	 MATCH_NUMBER() AS match_num,
  7  	 CLASSIFIER() AS var_match,
  8  	 FINAL COUNT(UP.tstamp) AS up_days,
  9  	 FINAL COUNT(tstamp) AS total_days,
 10  	 RUNNING COUNT(tstamp) AS cnt_days,
 11  	 price - STRT.price AS price_dif
 12    ALL ROWS PER MATCH
 13    AFTER MATCH SKIP TO LAST UP
 14    PATTERN (STRT DOWN+ UP+)
 15    DEFINE
 16  	 DOWN AS DOWN.price < PREV(DOWN.price),
 17  	 UP AS UP.price > PREV(UP.price)
 18    ) MR
 19  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                         UP_DAYS TOTAL_DAYS   CNT_DAYS  PRICE_DIF      PRICE                                 
---------- --------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------                                 
ACME       05-APR-11          1 STRT                                    4          6          1          0         25                                 
ACME       06-APR-11          1 DOWN                                    4          6          2        -13         12                                 
ACME       07-APR-11          1 UP                                      4          6          3        -10         15                                 
ACME       08-APR-11          1 UP                                      4          6          4         -5         20                                 
ACME       09-APR-11          1 UP                                      4          6          5         -1         24                                 
ACME       10-APR-11          1 UP                                      4          6          6          0         25                                 
ACME       10-APR-11          2 STRT                                    1          4          1          0         25                                 
ACME       11-APR-11          2 DOWN                                    1          4          2         -6         19                                 
ACME       12-APR-11          2 DOWN                                    1          4          3        -10         15                                 
ACME       13-APR-11          2 UP                                      1          4          4          0         25                                 
ACME       14-APR-11          3 STRT                                    2          5          1          0         25                                 

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                         UP_DAYS TOTAL_DAYS   CNT_DAYS  PRICE_DIF      PRICE                                 
---------- --------- ---------- ------------------------------ ---------- ---------- ---------- ---------- ----------                                 
ACME       15-APR-11          3 DOWN                                    2          5          2        -11         14                                 
ACME       16-APR-11          3 DOWN                                    2          5          3        -13         12                                 
ACME       17-APR-11          3 UP                                      2          5          4        -11         14                                 
ACME       18-APR-11          3 UP                                      2          5          5         -1         24                                 

15 rows selected.

Remember that the original pattern to match was a V-shape, and there were three. Changing the query to look for W shapes, where the end of one V shape is the start of the next, we find only one such pattern in the data:


SQL> 
SQL> --
SQL> -- This query looks for W shapes, of which
SQL> -- there is one in the data
SQL> --
SQL> -- The query was changed to look for two
SQL> -- consecutive down/up patterns where the
SQL> -- ending value of the first down/up is the
SQL> -- starting value of the next down/up
SQL> -- pattern
SQL> --
SQL> SELECT *
  2  FROM Ticker MATCH_RECOGNIZE (
  3    PARTITION BY symbol
  4    ORDER BY tstamp
  5    MEASURES
  6  	 MATCH_NUMBER() AS match_num,
  7  	 CLASSIFIER()  AS  var_match,
  8  	 STRT.tstamp AS start_tstamp,
  9  	 FINAL LAST(UP.tstamp) AS end_tstamp
 10    ALL ROWS PER MATCH
 11    AFTER MATCH SKIP TO LAST UP
 12    PATTERN (STRT DOWN+ UP+ DOWN+ UP+)
 13    DEFINE
 14  	 DOWN AS DOWN.price < PREV(DOWN.price),
 15  	 UP AS UP.price > PREV(UP.price)
 16    ) MR
 17  ORDER BY MR.symbol, MR.match_num, MR.tstamp;

SYMBOL     TSTAMP     MATCH_NUM VAR_MATCH                      START_TST END_TSTAM      PRICE                                                         
---------- --------- ---------- ------------------------------ --------- --------- ----------                                                         
ACME       05-APR-11          1 STRT                           05-APR-11 13-APR-11         25                                                         
ACME       06-APR-11          1 DOWN                           05-APR-11 13-APR-11         12                                                         
ACME       07-APR-11          1 UP                             05-APR-11 13-APR-11         15                                                         
ACME       08-APR-11          1 UP                             05-APR-11 13-APR-11         20                                                         
ACME       09-APR-11          1 UP                             05-APR-11 13-APR-11         24                                                         
ACME       10-APR-11          1 UP                             05-APR-11 13-APR-11         25                                                         
ACME       11-APR-11          1 DOWN                           05-APR-11 13-APR-11         19                                                         
ACME       12-APR-11          1 DOWN                           05-APR-11 13-APR-11         15                                                         
ACME       13-APR-11          1 UP                             05-APR-11 13-APR-11         25                                                         

9 rows selected.

SQL> 

Why weren’t there two W-shaped patterns? The last V-shape didn’t start at the same point where the prior V-shape ended, so it couldn’t combine with that V-shape to form a second W-shaped pattern in the data. The quick LAG() check below makes the gap easy to see.
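
As a sanity check (separate from the MATCH_RECOGNIZE syntax), a LAG() comparison classifies each day’s movement; 14-Apr is flat relative to 13-Apr, so the final dip begins a day after the prior pattern’s endpoint:


SELECT tstamp, price,
       CASE
	 WHEN price < LAG(price) OVER (ORDER BY tstamp) THEN 'DOWN'
	 WHEN price > LAG(price) OVER (ORDER BY tstamp) THEN 'UP'
	 WHEN LAG(price) OVER (ORDER BY tstamp) IS NULL THEN 'START'
	 ELSE 'FLAT'
       END AS movement
FROM ticker
ORDER BY tstamp;
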

MATCH_RECOGNIZE is a powerful and useful addition to Oracle 12.1.0.x as it enables you to define and report patterns in your data. It’s ideal for data warehousing applications and as an analytical tool to report repeating patterns in user data. Not everyone may benefit from MATCH_RECOGNIZE but, for those who do, it’s a welcome change to an already exceptional RDBMS.

Sometimes the impossible is possible.

October 11, 2016

“I Appreciate You”

Filed under: General — dfitzjarrell @ 13:55

Today, October 11, 2016, is OTN Appreciation Day. In honor of that I wanted to post my favorite offering from Oracle — Exadata. Yes, I’m biased, but even so the Exadata platform is an outstanding offering from Oracle. Let’s look at some of the ways Exadata provides performance and value.

Exadata is an engineered system, meaning that all of the components are selected and configured to work together to provide performance. Part of that performance comes from features like Predicate Filtering, Column Projection and Storage Indexes. Add to that the Smart Flash Cache (which has been steadily improved since its inception), faster disks and Infiniband and Exadata truly is a winner. And the benefits are not simply numbers on a piece of paper. I have personally witnessed overnight batch payroll processing for one customer reduced from 12 hours to 4 hours and the only change was migrating the application to Exadata. No other tweaks or adjustments were made to provide that gain. In my book there are examples, run in real time on Exadata machines, proving the performance gain that can be realized. Exadata truly is an awesome product.

I won’t go into great detail since I have already extolled the virtues of Exadata on this blog here, here, to some extent here, here, and here. If you’re interested give one (or more) of those links a visit.

So, thank you, OTN and thank you, Oracle. And the pleasure has been all mine.

September 26, 2016

“My Kingdom For A Histogram”

Filed under: General — dfitzjarrell @ 09:55

"Don't you know anything at all about numbers?"
"Well, I don't think they're very important," snapped Milo, too embarrassed to admit the truth.
"NOT IMPORTANT!" roared the Dodecahedron, turning red with fury.  "Could you have tea for two without the two, or three blind mice
without the three? Would there be four corners of the earth if there weren't a four? And how would you sail the seven seas without
a seven?"
"All I meant was..." began Milo, but the Dodecahedron, overcome with emotion and shouting furiously, carried right on.
"If you had high hopes, how would you know how high they were? And did you know that narrow escapes come in all different widths? 
Would you travel the whole wide world without ever knowing how wide it was? And how could you do anything at long last," he concluded,
waving his arms over his head, "without knowing how long the last was? Why, numbers are the most beautiful and valuable things in the
world.  Just follow me and I'll show you." He turned on his heel and stalked off into the cave.
-- Norton Juster, The Phantom Tollbooth

In one of the current Oracle forums an “old saw” was resurrected, namely that a single-column primary key has no need of a histogram. This presumes that the primary key has no ‘holes’ in the data and covers a continuous range of values. The reality is that such primary keys are rarely, if ever, unbroken, and the range of data may not be continuous. With that in mind it’s not strange to consider a histogram on a single-column primary key. And depending on the release of Oracle in use that histogram may be an old standard (a height-balanced histogram) or a new histogram offered in 12c, the hybrid histogram. Let’s look at an example of how this plays out in 11.2.0.4 and in 12.1.0.2. First, though, let’s look at that new Hybrid histogram.

The Hybrid histogram is a new type of histogram that uses a number of buckets less than the number of distinct values in the table data; it also determines, and utilizes, the repeat frequency of the endpoint values. The bucket size can be adjusted from the initial size based on endpoint values; there are two special buckets which will not change in size, and these are the ‘boundary’ buckets, bucket 0 and the last bucket of the histogram. The Hybrid histogram is based on four apparent ‘rules’, which are:


	1) a value should not be found in more than one bucket
	2) the bucket size is allowed to be extended in order to contain all instances of the same distinct value
	3) adjusted bucket size cannot be less than the original size (not applicable at either end of the data set)
	4) the original number of buckets should not be reduced

One way to create a hybrid histogram is shown in the example below, part of which was provided by Jonathan Lewis; the remainder of this example was adapted from work done by Mohamed Houri. We begin:


SQL>
SQL> --
SQL> -- Create and populate a table so we can
SQL> -- build a hybrid histogram
SQL> --
SQL> -- Thanks to Jonathan Lewis for generating
SQL> -- this table and data set
SQL> --
SQL> create table t1 (id number, n1 number);

Table created.

Elapsed: 00:00:00.01
SQL>
SQL> @InsT1.sql

Elapsed: 00:00:00.00
SQL>

Choosing a number of buckets less than the number of distinct values in the table is one way to create a hybrid histogram:


SQL>
SQL> --
SQL> -- Return the number of distinct values
SQL> -- in column n1
SQL> --
SQL>
SQL> select count(distinct n1) from t1;

COUNT(DISTINCTN1)
-----------------
               37

Elapsed: 00:00:00.00
SQL>

We’ll use 20 buckets for our histogram and DBMS_STATS to create it:


SQL>
SQL> --
SQL> -- Create a hybrid histogram on column n1
SQL> -- by setting the number of buckets less
SQL> -- than the number of distinct values
SQL> --
SQL> BEGIN
  2  	dbms_stats.gather_table_stats
  3  	  (user, 't1', method_opt => 'for columns n1 size 20');
  4  END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.43
SQL>
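
Since 20 buckets is less than the 37 distinct values, Oracle should have chosen a hybrid histogram in 12.1.0.2. A quick dictionary check (a sketch, assuming the statistics gathered above are current) confirms the histogram type, which should report HYBRID:


select column_name, histogram, num_buckets
from   user_tab_col_statistics
where  table_name = 'T1'
and    column_name = 'N1';
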

During the creation of this histogram Oracle orders the column values, then looks for values that occur in more than one bucket. When such values are found new bucket sizes are computed for all but the boundary buckets, in this case bucket 0 and bucket 19:


SQL>
SQL> --
SQL> -- Four rules exist to create a hybrid histogram, and the
SQL> -- first two are:
SQL> --
SQL> -- a value should not be found in more than one bucket
SQL> -- a bucket size is allowed to be extended in order to contain
SQL> --   all instances of the same value
SQL> --
SQL> -- Compute the 'new' bucket size based on the data and
SQL> -- endpoint values
SQL> --
SQL> -- To illustrate how the bucket size adjustment is
SQL> -- calculated let's look at the value 13 in the data
SQL> --
SQL> -- Oracle 'walks' the ordered list of values until
SQL> -- it finds 13 in two buckets
SQL> --
SQL> -- Oracle extends the bucket where 13 is first found (bucket n)
SQL> -- and moves all other instances of 13 (from bucket n+1)
SQL> -- to this bucket (n)
SQL> --
SQL> -- Moving all instances of 13 to a single bucket
SQL> -- also makes 13 an endpoint value allowing Oracle
SQL> -- to compute the repeat count of that value
SQL> --
SQL> -- This continues on through the values so that
SQL> -- Oracle can place all occurrences of a distinct
SQL> -- value so they don't populate more than one bucket
SQL> --
SQL> SELECT
  2  	     (row_number() over(order by ept_nbr)-1) NumBucket
  3  	     ,ept_nbr
  4  	     ,ept_act_val
  5  	     ,rpt_cnt
  6  	     ,ept_nbr - (lag(ept_nbr,1,0) over(order by ept_nbr)) "new bucket size"
  7  	     ,bucket_size "original bucket_size"
  8  	 FROM
  9  	     (SELECT
 10  		  ah.endpoint_number		ept_nbr
 11  		 ,ah.endpoint_actual_value	ept_act_val
 12  		 ,lag(ah.endpoint_number,1,0) over(order by ah.endpoint_number) ept_lag
 13  		 ,ah.endpoint_repeat_count rpt_cnt
 14  		 ,at.sample_size/at.num_buckets bucket_size
 15  	      FROM
 16  		 user_tab_histograms	  ah
 17  		,user_tab_col_statistics  at
 18  	      WHERE ah.table_name  = at.table_name
 19  	      AND ah.column_name = at.column_name
 20  	      AND ah.table_name  = 'T1'
 21  	      AND ah.column_name = 'N1'
 22  	    ) ORDER BY ept_nbr;

 NUMBUCKET    EPT_NBR EPT_ACT_VAL     RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
         0          1 8                     1               1                    5
         1          6 13                    3               5                    5
         2         12 18                    2               6                    5
         3         20 20                    5               8                    5
         4         26 23                    2               6                    5
         5         32 26                    3               6                    5
         6         38 27                    6               6                    5
         7         44 28                    6               6                    5
         8         50 29                    6               6                    5
         9         58 31                    5               8                    5
        10         69 33                    8              11                    5

 NUMBUCKET    EPT_NBR EPT_ACT_VAL     RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
        11         79 35                    7              10                    5
        12         86 38                    5               7                    5
        13         90 41                    1               4                    5
        14         92 42                    2               2                    5
        15         95 43                    3               3                    5
        16         96 44                    1               1                    5
        17         97 45                    1               1                    5
        18         98 46                    1               1                    5
        19        100 59                    1               2                    5

20 rows selected.

Elapsed: 00:00:00.21
SQL>

Notice that for buckets 1-12 the bucket size stayed the same or increased; for the remaining non-boundary buckets the bucket size would decrease. But according to Rule 3 no bucket can be adjusted to a size smaller than the original bucket size, so those buckets remain unadjusted. The bucket size computation in the query results below is based on the sample size divided by the number of buckets and thus displays a constant value for all buckets:


SQL>
SQL> --
SQL> -- Given that boundary values are the exception computing the new
SQL> -- bucket sizes shows that for bucket numbers 1-12 the sizes are
SQL> -- the same or larger than the originally computed size (5)
SQL> --
SQL> -- All remaining non-endpoint buckets compute to a smaller size
SQL> -- than the original
SQL> --
SQL> -- Now the third rule appears:
SQL> --
SQL> -- No non-endpoint bucket can be smaller than the originally
SQL> -- computed size
SQL> --
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	     ,ucs.sample_size/ucs.num_buckets bucket_size
  6  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
  7  	 from
  8  	     user_tab_histograms uth
  9  	    ,user_tab_col_statistics ucs
 10  	where
 11  	     uth.table_name   = ucs.table_name
 12  	 and uth.column_name   = ucs.column_name
 13  	 and uth.table_name    = 'T1'
 14  	 and uth.column_name   = 'N1'
 15  	order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
              1 8                                1           5         -4
              6 13                               3           5         -2
             12 18                               2           5         -3
             20 20                               5           5          0
             26 23                               2           5         -3
             32 26                               3           5         -2
             38 27                               6           5          1
             44 28                               6           5          1
             50 29                               6           5          1
             58 31                               5           5          0
             69 33                               8           5          3

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
             79 35                               7           5          2
             86 38                               5           5          0
             90 41                               1           5         -4
             92 42                               2           5         -3
             95 43                               3           5         -2
             96 44                               1           5         -4
             97 45                               1           5         -4
             98 46                               1           5         -4
            100 59                               1           5         -4

20 rows selected.

Elapsed: 00:00:00.19
SQL>

For a Hybrid histogram three types of column value exist to base cardinality estimates upon: popular values, non-popular values that are endpoints, and non-popular values that are not endpoints. Let’s look at how Oracle estimates the cardinality for each case, starting with popular values:


SQL>
SQL> --
SQL> -- Display the endpoint, value, repeat count,
SQL> -- bucket size and the 'popularity'
SQL> --
SQL> -- Popularity is determined by computing the
SQL> -- difference between the frequency of appearance
SQL> -- (repeat count) and the bucket size (sample size/number of buckets)
SQL> -- Positive values are considered 'popular'
SQL> --
SQL> select
  2  	      endpoint_number
  3  	     ,endpoint_actual_value
  4  	     ,endpoint_repeat_count
  5  	     ,bucket_size
  6  	     ,case when Popularity > 0 then 'Pop'
  7  			else 'Non-Pop'
  8  	       end Popularity
  9  	 from
 10  	(
 11  	  select
 12  	      uth.endpoint_number
 13  	     ,uth.endpoint_actual_value
 14  	     ,uth.endpoint_repeat_count
 15  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 16  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 17  	 from
 18  	     user_tab_histograms uth
 19  	    ,user_tab_col_statistics ucs
 20  	where
 21  	     uth.table_name   = ucs.table_name
 22  	 and uth.column_name   = ucs.column_name
 23  	 and uth.table_name    = 'T1'
 24  	 and uth.column_name   = 'N1'
 25  	 )
 26  	order by endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
              1 8                                1           5 Non-Pop
              6 13                               3           5 Non-Pop
             12 18                               2           5 Non-Pop
             20 20                               5           5 Non-Pop
             26 23                               2           5 Non-Pop
             32 26                               3           5 Non-Pop
             38 27                               6           5 Pop
             44 28                               6           5 Pop
             50 29                               6           5 Pop
             58 31                               5           5 Non-Pop
             69 33                               8           5 Pop

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
             79 35                               7           5 Pop
             86 38                               5           5 Non-Pop
             90 41                               1           5 Non-Pop
             92 42                               2           5 Non-Pop
             95 43                               3           5 Non-Pop
             96 44                               1           5 Non-Pop
             97 45                               1           5 Non-Pop
             98 46                               1           5 Non-Pop
            100 59                               1           5 Non-Pop

20 rows selected.

Elapsed: 00:00:00.19
SQL>

Let’s return only the popular values and work through the cardinality estimate Oracle generates:


SQL> --
SQL> -- Using the popularity Oracle estimates the cardinality
SQL> -- by considering the following three types of values:
SQL> --
SQL> -- popular value
SQL> -- non-popular value with an endpoint number
SQL> -- non-popular value not present in the histogram table
SQL> --
SQL> -- Case 1: Popular values
SQL> --
SQL> -- Display the 'popular' values in this data set
SQL> --
SQL> select
  2  	      endpoint_actual_value
  3  	     ,endpoint_repeat_count
  4  	     ,bucket_size
  5  	     ,Popularity
  6  from
  7  (
  8    select
  9  	      endpoint_number
 10  	     ,endpoint_actual_value
 11  	     ,endpoint_repeat_count
 12  	     ,bucket_size
 13  	     ,case when Popularity > 0 then 'Pop'
 14  			else 'Non-Pop'
 15  	       end Popularity
 16  	 from
 17  	(
 18  	  select
 19  	      uth.endpoint_number
 20  	     ,uth.endpoint_actual_value
 21  	     ,uth.endpoint_repeat_count
 22  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 23  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 24  	 from
 25  	     user_tab_histograms uth
 26  	    ,user_tab_col_statistics ucs
 27  	where
 28  	     uth.table_name   = ucs.table_name
 29  	 and uth.column_name   = ucs.column_name
 30  	 and uth.table_name    = 'T1'
 31  	 and uth.column_name   = 'N1'
 32  	 )
 33    )
 34  where Popularity = 'Pop';

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
27                               6           5 Pop
28                               6           5 Pop
29                               6           5 Pop
33                               8           5 Pop
35                               7           5 Pop

Elapsed: 00:00:00.20
SQL>

Looking at one of these popular values (33), let’s return the cardinality estimate Oracle calculated:


SQL> --
SQL> -- Using explain plan the cardinality estimation
SQL> -- can be displayed for one of the 'popular' values
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 33;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> --
SQL> -- The cardinality estimate displayed below was
SQL> -- calculated using the following formula:
SQL> --
SQL> -- E-Rows = ENDPOINT_REPEAT_COUNT * num_rows/sample_size
SQL> -- E-Rows = 8 * 100/100 = 8
SQL> --
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     8 |    24 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=33)

14 rows selected.

Elapsed: 00:00:00.03
SQL>
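As a cross-check, the Case 1 arithmetic can be applied to every popular value straight from the dictionary. The following is a sketch that assumes the same T1/N1 statistics used throughout this example, pulling num_rows from user_tab_statistics; for this data set it should simply echo the repeat counts (6, 6, 6, 8 and 7) since num_rows equals sample_size:


--
-- Sketch: apply the Case 1 formula to every popular value
-- E-Rows = ENDPOINT_REPEAT_COUNT * num_rows/sample_size
--
select uth.endpoint_actual_value val,
       round(uth.endpoint_repeat_count * uts.num_rows/ucs.sample_size) est_rows
from   user_tab_histograms     uth,
       user_tab_col_statistics ucs,
       user_tab_statistics     uts
where  uth.table_name   = ucs.table_name
and    uth.column_name  = ucs.column_name
and    uts.table_name   = uth.table_name
and    uth.table_name   = 'T1'
and    uth.column_name  = 'N1'
and    uth.endpoint_repeat_count > ucs.sample_size/ucs.num_buckets;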

From the formula in the comments above our calculated cardinality matches the estimate provided by the optimizer, so our understanding of the Hybrid histogram seems sound. To continue that verification we consider the case of a non-popular value that is an endpoint. The formula for that cardinality calculation is: E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). We calculate NewDensity with a query based on the work of Alberto Dell’Era. First we return the non-popular values that are endpoints:


SQL> --
SQL> -- Case 2: Non-popular values with an endpoint
SQL> --
SQL> -- First return all non-popular values that are an
SQL> -- endpoint
SQL> --
SQL> select
  2  	      endpoint_actual_value
  3  	     ,endpoint_repeat_count
  4  	     ,bucket_size
  5  	     ,Popularity
  6  from
  7  (
  8    select
  9  	      endpoint_number
 10  	     ,endpoint_actual_value
 11  	     ,endpoint_repeat_count
 12  	     ,bucket_size
 13  	     ,case when Popularity > 0 then 'Pop'
 14  			else 'Non-Pop'
 15  	       end Popularity
 16  	 from
 17  	(
 18  	  select
 19  	      uth.endpoint_number
 20  	     ,uth.endpoint_actual_value
 21  	     ,uth.endpoint_repeat_count
 22  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 23  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 24  	 from
 25  	     user_tab_histograms uth
 26  	    ,user_tab_col_statistics ucs
 27  	where
 28  	     uth.table_name   = ucs.table_name
 29  	 and uth.column_name   = ucs.column_name
 30  	 and uth.table_name    = 'T1'
 31  	 and uth.column_name   = 'N1'
 32  	 )
 33    )
 34  where Popularity = 'Non-Pop';

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
8                                1           5 Non-Pop
13                               3           5 Non-Pop
18                               2           5 Non-Pop
20                               5           5 Non-Pop
23                               2           5 Non-Pop
26                               3           5 Non-Pop
31                               5           5 Non-Pop
38                               5           5 Non-Pop
41                               1           5 Non-Pop
42                               2           5 Non-Pop
43                               3           5 Non-Pop

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
44                               1           5 Non-Pop
45                               1           5 Non-Pop
46                               1           5 Non-Pop
59                               1           5 Non-Pop

15 rows selected.

Elapsed: 00:00:00.20
SQL>

Let’s return Oracle’s estimated cardinality for one of these values (45):


SQL> --
SQL> -- Use explain plan to return the estimated cardinality
SQL> --
SQL> -- In this case the estimated cardinality is computed as:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size)
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 45;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |     6 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------

   2 - filter("N1"=45)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Now let’s use our query to return the value for NewDensity:


SQL> --
SQL> -- The NewDensity is computed by an internal algorithm but
SQL> -- we can get a reliable density using the following query
SQL> -- based on work done by Alberto Dell'Era:
SQL> --
SQL> SELECT
  2  	      BktCnt
  3  	     ,PopBktCnt
  4  	     ,PopValCnt
  5  	     ,NDV
  6  	     ,pop_bucketSize
  7  	     ,trunc(((BktCnt-PopBktCnt)/BktCnt)/(NDV-PopValCnt),10) NewDensity
  8  	  FROM
  9  	     (SELECT
 10  		COUNT(1) PopValCnt,
 11  		SUM(endpoint_repeat_count) PopBktCnt,
 12  		ndv,
 13  		BktCnt,
 14  		pop_bucketSize
 15  	      FROM
 16  	       (SELECT
 17  		 (sample_size - num_nulls) BktCnt,
 18  		 num_distinct ndv,
 19  		 num_buckets,
 20  		 density OldDensity,
 21  		 (sample_size-num_nulls)/num_buckets pop_bucketSize
 22  	       FROM user_tab_col_statistics
 23  	       WHERE
 24  		   table_name  = 'T1'
 25  	       AND column_name = 'N1'
 26  	       ),
 27  	       user_histograms
 28  	     WHERE table_name	      = 'T1'
 29  	     AND column_name	      = 'N1'
 30  	     AND endpoint_repeat_count> pop_bucketSize
 31  	     GROUP BY ndv,
 32  	       BktCnt,
 33  	       pop_bucketSize
 34  	     );

    BKTCNT  POPBKTCNT  POPVALCNT        NDV POP_BUCKETSIZE NEWDENSITY
---------- ---------- ---------- ---------- -------------- ----------
       100         33          5         37              5   .0209375

Elapsed: 00:00:00.19
SQL>

We now see if our calculation returned a valid value by ‘plugging’ it into the following equation: E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). The cardinality estimate we compute is:


SQL> --
SQL> -- 'Plugging' this new density into the equation above produces:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/ sample_size)
SQL> -- E-Rows = 100 * greatest (.0209375, 1/100) = 2.09375 ~ 2
SQL> --
SQL> -- which matches the value returned by Oracle
SQL> --
SQL>
SQL> --
SQL> -- To validate these findings use a different non-popular value
SQL> -- that's also an endpoint:
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 43;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     3 |     9 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=43)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Oracle returned 3 as the estimated cardinality and so did our calculation (E-Rows = 100 * greatest(.0209375, 3/100) = 3), more proof that our concept of the Hybrid histogram is correct. One final test will prove how well we understand this histogram: non-popular values that are not endpoints. Using our calculated NewDensity in the equation E-Rows = num_rows * NewDensity gives us: E-Rows = 100 * .0209375 = 2.09375 ~ 2. Time to see if our calculated value matches what Oracle estimated:


SQL>
SQL> --
SQL> -- Case 3: Non-popular value without an endpoint number
SQL> --
SQL> -- This calculation is fairly simple and straightforward:
SQL> --
SQL> -- E-Rows = num_rows * NewDensity = 100 * .0209375 = 2.09375 ~ 2
SQL> --
SQL>
SQL> explain plan for select count(1) from t1 where n1 = 17;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |     6 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=17)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Our calculated value, as evidenced by the Rows value in the execution plan, matches Oracle’s estimate, giving further evidence that our concept of the Hybrid histogram is sound.
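The Case 2 arithmetic can be wrapped into a single query as well. This sketch hard-codes the NewDensity value (.0209375) returned by the Dell’Era-based query above, so it is tied to this data set; a general-purpose version would compute that value inline:


--
-- Sketch: apply the Case 2 formula to every non-popular endpoint value
-- E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT/sample_size)
-- NewDensity (.0209375) comes from the query shown earlier
--
select uth.endpoint_actual_value val,
       round(uts.num_rows *
             greatest(.0209375, uth.endpoint_repeat_count/ucs.sample_size)) est_rows
from   user_tab_histograms     uth,
       user_tab_col_statistics ucs,
       user_tab_statistics     uts
where  uth.table_name   = ucs.table_name
and    uth.column_name  = ucs.column_name
and    uts.table_name   = uth.table_name
and    uth.table_name   = 'T1'
and    uth.column_name  = 'N1'
and    uth.endpoint_repeat_count <= ucs.sample_size/ucs.num_buckets;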

As mentioned earlier in this post Hybrid histograms can exhibit instability in the form of endpoint changes based on the changing frequency of values as inserts are executed against the table. To illustrate this let’s add another 16 to the data and see what happens:


SQL>
SQL> --
SQL> -- Hybrid histograms can display instability in terms of
SQL> -- endpoints and bucket size
SQL> --
SQL> -- To illustrate this we add 16 to the data, increasing
SQL> -- the repeat count to 3, causing this newly added 16
SQL> -- to shift into the previous bucket leaving 17
SQL> -- as the new endpoint
SQL> --
SQL> insert into t1 values (2, 16);

1 row created.

Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
  2  	dbms_stats.gather_table_stats
  3  	  (user, 't1', method_opt => 'for all columns size 20');
  4  END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
SQL>
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	 from
  6  	    user_tab_histograms uth
  7  	    ,user_tables ut
  8  	    ,user_tab_col_statistics ucs
  9  	where
 10  	     uth.table_name    = 'T1'
 11  	    and uth.column_name   = 'N1'
 12  	    and uth.table_name	  = ut.table_name
 13  	    and ut.table_name	  = ucs.table_name
 14  	    and uth.column_name   = ucs.column_name
 15  	   order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
              1 8                                1
              6 13                               3
             11 17                               1 <----
             16 19                               3
             21 20                               5
             27 23                               2
             33 26                               3
             39 27                               6
             45 28                               6
             51 29                               6
             59 31                               5

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
             70 33                               8
             80 35                               7
             87 38                               5
             91 41                               1
             96 43                               3
             97 44                               1
             98 45                               1
             99 46                               1
            101 59                               1

20 rows selected.

Elapsed: 00:00:00.24

The endpoint value of 18 has been replaced by the value 17 because of Rule 2, that no value can span buckets. Before the insert the histogram buckets looked like this: 8 12 12 13 13 13 | 15 16 16 17 18 18 | 19 19 19 20 20 20 20 20 | … After the insert the ordered value list looked like this: 8 12 12 13 13 | 13 15 16 16 16 | 17 18 18 19 19 19 … After Oracle ‘walks’ through the list to consolidate like values the buckets look like this: 8 12 12 13 13 13 | 15 16 16 16 17 | 18 18 19 19 19 … giving a new endpoint value of 17. If we add yet another 16 to the data set we see:


SQL>
SQL> --
SQL> -- Add one more 16 and the endpoint shifts to 16
SQL> --
SQL> insert into t1 values (3, 16);

1 row created.

Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
  2  	   dbms_stats.gather_table_stats
  3  	       (user, 't1', method_opt => 'for all columns size 20');
  4  	 END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	 from
  6  	    user_tab_histograms uth
  7  	    ,user_tables ut
  8  	    ,user_tab_col_statistics ucs
  9  	where
 10  	     uth.table_name    = 'T1'
 11  	    and uth.column_name   = 'N1'
 12  	    and uth.table_name	  = ut.table_name
 13  	    and ut.table_name	  = ucs.table_name
 14  	    and uth.column_name   = ucs.column_name
 15  	   order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
              1 8                                1
              6 13                               3
             11 16                               4 

The endpoint has shifted again and is now 16.
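A quick sanity check after a regather (output omitted here) is to ask the optimizer for its estimate against the new endpoint; with 16 now carrying a repeat count of 4 against a bucket size of roughly 5 it remains non-popular, so the greatest(NewDensity, 4/sample_size) arithmetic from Case 2 should still apply:


--
-- Sanity check after the regather: 16 is still non-popular
-- (repeat count 4 < bucket size ~5), so the Case 2 formula applies
--
explain plan for select count(1) from t1 where n1 = 16;

select * from table(dbms_xplan.display);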

Now that we know what a Hybrid histogram is and what it can do let’s look at another way to generate such a histogram in 12.1.0.2, and this goes back (finally!) to that “old saw” mentioned earlier. Let’s create a table with a single-column primary key and see what Oracle does with it when we run DBMS_STATS.

This example, again by Jonathan Lewis, creates a table and populates the id column (the eventual primary key for the table) in a rather interesting way:


SQL> 
SQL> create table t1
  2  as
  3  with generator as (
  4  	     select  --+ materialize
  5  		     rownum id
  6  	     from dual
  7  	     connect by
  8  		     level <= 1e4
  9  )
 10  select
 11  	     rownum id
 12  from
 13  	     generator
 14  ;

Table created.

SQL> 
SQL> column maxid1 new_value max1
SQL> column maxid2 new_value max2
SQL> column maxid3 new_value max3
SQL> 
SQL> select min(id), max(id) maxid1
  2  from t1;

   MIN(ID)     MAXID1                                                           
---------- ----------                                                           
         1      10000                                                           

SQL> 
SQL> insert into t1 select id + 1e6 from t1;

10000 rows created.

SQL> 
SQL> select min(id), max(id) maxid2
  2  from t1;

   MIN(ID)     MAXID2                                                           
---------- ----------                                                           
         1    1010000                                                           

SQL> 
SQL> insert into t1 select id + 1e7 from t1;

20000 rows created.

SQL> 
SQL> select min(id), max(id) maxid3
  2  from t1;

   MIN(ID)     MAXID3                                                           
---------- ----------                                                           
         1   11010000                                                           

SQL> 

Notice the values in the column: they run from 1 to 10000, jump to 1000001 and continue to 1010000, then jump again to 10000001 through 10010000 and to 11000001 through 11010000. The breaks in the value sequence are what make this data set qualify for a Hybrid histogram in 12.1.0.2; in 11.2.0.4 a height-balanced histogram is created. Let’s add the primary key to the table:



SQL> 
SQL> alter table t1 add constraint t1_pk primary key(id);

Table altered.

Now let’s query for data that isn’t in the table:


SQL> 
SQL> select
  2  	     /*+ dynamic_sampling(0) */
  3  	     *
  4  from    t1
  5  where
  6  	     id between 12000 and 13000
  7  ;

no rows selected

SQL> 

Doing nothing special we generate statistics on the table; because the query above referenced the ID column Oracle will automatically determine that the PK column needs a histogram and create it:


SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'T1'
  5  	     );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> 
SQL> column column_name format a35
SQL> set linesize 150
SQL> 
SQL> select
  2  	     column_name, sample_size,
  3  	     num_distinct, num_nulls, density,
  4  	     histogram, num_buckets
  5  from
  6  	     user_tab_cols
  7  where
  8  	     table_name = 'T1'
  9  order by
 10  	     column_name
 11  ;

COLUMN_NAME                         SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS                                        
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------                                        
ID                                         5502        40000          0    .000025 HYBRID                  254                                        

SQL> 

To prove that 11.2.0.4 will create a height-balanced histogram let’s run the example there:


SQL> 
SQL> create table t1
  2  as
  3  with generator as (
  4  	     select  --+ materialize
  5  		     rownum id
  6  	     from dual
  7  	     connect by
  8  		     level <= 1e4
  9  )
 10  select
 11  	     rownum id
 12  from
 13  	     generator
 14  ;

Table created.

SQL> 

After repeating the inserts, the primary key creation and the no-rows query from the 12.1.0.2 run we gather statistics in the same way:

SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'T1'
  5  	     );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> 
SQL> column column_name format a35
SQL> set linesize 150
SQL> 
SQL> select
  2  	     column_name, sample_size,
  3  	     num_distinct, num_nulls, density,
  4  	     histogram, num_buckets
  5  from
  6  	     user_tab_cols
  7  where
  8  	     table_name = 'T1'
  9  order by
 10  	     column_name
 11  ;

COLUMN_NAME                         SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS                                        
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------                                        
ID                                         5509        40000          0 .000024959 HEIGHT BALANCED         254                                        

SQL> 

So it would appear that having a histogram on a single-column primary key isn’t so far-fetched, after all. And we didn’t have to do anything special to get it created.
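Should a histogram on a primary key be unwanted, one way to keep DBMS_STATS from building it, sketched below, is to pin the column’s bucket count at 1 through a table preference; subsequent gathers honor the preference while the remaining columns stay under the default AUTO sizing:


--
-- Sketch: suppress the histogram on ID, leaving the other
-- columns under the default AUTO sizing
--
begin
	dbms_stats.set_table_prefs(
		ownname => user,
		tabname => 'T1',
		pname   => 'METHOD_OPT',
		pvalue  => 'FOR ALL COLUMNS SIZE AUTO FOR COLUMNS ID SIZE 1');
end;
/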

It all comes down to numbers.

September 6, 2016

A Bloomin’ Mess

Filed under: General — dfitzjarrell @ 09:03

'Why is it,' he said quietly, 'that quite often even the things which are correct just don't seem to be right?'
-- Norton Juster, The Phantom Tollbooth

For some time now Oracle has used bloom filters to speed the processing of select statements; unfortunately in 11.2.0.4 converting a select statement that uses a bloom filter into an ‘insert into … select’ form causes Oracle to abandon the bloom filter altogether. Let’s look at an example provided by Jonathan Lewis that illustrates this; it will be run in 11.2.0.4 and then in 12.1.0.2, and there are differences in how the two versions generate plans for the insert statement.

This is not unusual, according to Oracle Support, as the following bugs are listed as producing this or similar behavior:


Bug 20112932 : BLOOM FILTER IS NOT CHOOSED WHEN EXECUTING INSERT SELECT STATEMENT
11.2.0.4 - Nov 2014: "not a bug".
This cites three more documents:
  Bug 13801198  BLOOM PRUNING/FILTER NOT USED IN DML STATEMENTS 
    11.2.0.3 - March 2012 - Duplicate, base bug: 18949550
    18949550: MERGE QUERY NOT USING BLOOM FILTER
    11.2.0.3 - June 2014: fixed in 12.2 (no patches yet)

  Bug 14325392  BLOOM FILTER NOT SUPPORTED IN PARTITION WISE JOIN 
    11.2.0.3 - July 2012 - fixed in 12.1 
    (but I think this is a PX filter, not a partition filter)

  Bug 17716301  BLOOM FILTER NOT USED FOR OUTER JOIN

    Aug 2014 - hidden bug note. Patches available for 11.2.0.3
    Note 1919508.1 

It appears that bug 14325392 is the cause, and it is listed as fixed in 12.1. Although not completely addressed (as shown later in this post) the behavior is improved to a degree in 12.1.0.2.

First a partitioned table with four hash partitions, containing 100,000 rows, is created; next a 4-row non-partitioned table is created as the driving table for the hash join, along with an empty table to act as the target of the ‘insert into … select’ construct:


SQL> create table pt_hash (
  2  	     id,
  3  	     grp,
  4  	     small_vc,
  5  	     padding
  6  )
  7  nologging
  8  pctfree 90 pctused 10
  9  partition by hash(grp)
 10  (
 11  	     partition p1,
 12  	     partition p2,
 13  	     partition p3,
 14  	     partition p4
 15  )
 16  as
 17  with generator as (
 18  	     select  --+ materialize
 19  		     rownum id
 20  	     from dual
 21  	     connect by
 22  		     level <= 1e4
 23  )
 24  select
 25  	     rownum			id,
 26  	     trunc(rownum/50)		grp,
 27  	     to_char(trunc(rownum/20))	small_vc,
 28  	     rpad('x',100)		padding
 29  from
 30  	     generator	v1,
 31  	     generator	v2
 32  where
 33  	     rownum <= 1e5
 34  ;

Table created.

SQL> 
SQL> create table t1
  2  as
  3  select
  4  	     rownum  id,
  5  	     rownum  n1,
  6  	     lpad(rownum,10) v1,
  7  	     rpad('x',100) padding
  8  from
  9  	     dual
 10  connect by
 11  	     rownum <= 4
 12  ;

Table created.

SQL> 
SQL> create table target (
  2  	     n1      number,
  3  	     id      number
  4  )
  5  ;

Table created.

SQL> 
SQL> set serveroutput off linesize 168 pagesize 0
SQL> alter session set statistics_level = all;

Session altered.

SQL> 

The stage is set; let’s run the basic select and see what plan 11.2.0.4 generates:


SQL> select
  2  	     /*+
  3  		     leading(t1 h1)
  4  		     use_hash(h1)    no_swap_join_inputs(h1)
  5  	     */
  6  	     t1.n1,
  7  	     h1.id
  8  from
  9  	     t1,
 10  	     pt_hash h1
 11  where
 12  	     t1.id between 2 and 3
 13  and     h1.grp = t1.n1
 14  and     h1.id <= 100
 15  ;

[query results truncated in the original post]

SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  3rm369uarustm, child number 0
-------------------------------------
select         /*+                 leading(t1 h1)
use_hash(h1)    no_swap_join_inputs(h1)         */         t1.n1,
  h1.id from         t1,         pt_hash h1 where         t1.id between
2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

[execution plan truncated in the original post]

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

The hash join produces a bloom filter thus reducing the partition access to only the two partitions we need to generate the results. Converting this same select statement into an insert changes the plan:


SQL> insert into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

[insert feedback truncated in the original post]

SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  g693aju7gwm9n, child number 0
-------------------------------------
insert into target(n1, id) select         /*+                 ordered
              use_hash(h1)    no_swap_join_inputs(h1)         */
 t1.id,         h1.id from         t1,         pt_hash h1 where
t1.id between 2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

[execution plan truncated in the original post]

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

The bloom filter mysteriously vanishes, and all four partitions are accessed to produce the same results as the stand-alone query produced earlier. Proving this isn’t ‘one-off’ behaviour witn 11.2.0.4 the transaction is rolled back and the same insert statement is run a second time:


SQL> rollback;

Rollback complete.

SQL> 
SQL> insert /*+ append */ into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

[insert feedback truncated in the original post]

SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  10jb8cf3tx39p, child number 0
-------------------------------------
insert /*+ append */ into target(n1, id) select         /*+
    ordered                 use_hash(h1)    no_swap_join_inputs(h1)
    */         t1.id,         h1.id from         t1,         pt_hash h1
where         t1.id between 2 and 3 and     h1.grp = t1.n1 and
h1.id <= 100

[execution plan truncated in the original post]

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

Again the bloom filter vanishes; apart from the direct-path load step this is essentially the plan generated by the previous insert statement.
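To check whether a cached plan built a bloom filter for partition pruning without dumping the entire plan, a query such as this sketch against V$SQL_PLAN looks for the PART JOIN FILTER CREATE operation (the SQL_IDs are those from the runs above):


--
-- Sketch: find cached plans that create a bloom (partition join) filter
--
select sql_id, id, operation, options, object_name
from   v$sql_plan
where  operation = 'PART JOIN FILTER'
and    sql_id in ('3rm369uarustm','g693aju7gwm9n','10jb8cf3tx39p');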

Moving to Oracle 12.1.0.2 some differing results are obtained. We set up the same tables and data as in the 11.2.0.4 run:


SQL> create table pt_hash (
  2  	     id,
  3  	     grp,
  4  	     small_vc,
  5  	     padding
  6  )
  7  nologging
  8  pctfree 90 pctused 10
  9  partition by hash(grp)
 10  (
 11  	     partition p1,
 12  	     partition p2,
 13  	     partition p3,
 14  	     partition p4
 15  )
 16  as
 17  with generator as (
 18  	     select  --+ materialize
 19  		     rownum id
 20  	     from dual
 21  	     connect by
 22  		     level <= 1e4
 23  )
 24  select
 25  	     rownum			id,
 26  	     trunc(rownum/50)		grp,
 27  	     to_char(trunc(rownum/20))	small_vc,
 28  	     rpad('x',100)		padding
 29  from
 30  	     generator	v1,
 31  	     generator	v2
 32  where
 33  	     rownum <= 1e5
 34  ;

Table created.

SQL> 
SQL> create table t1
  2  as
  3  select
  4  	     rownum  id,
  5  	     rownum  n1,
  6  	     lpad(rownum,10) v1,
  7  	     rpad('x',100) padding
  8  from
  9  	     dual
 10  connect by
 11  	     rownum <= 4
 12  ;

Table created.

SQL> 
SQL> create table target (
  2  	     n1      number,
  3  	     id      number
  4  )
  5  ;

Table created.

SQL> 
SQL> set serveroutput off linesize 168 pagesize 0
SQL> alter session set statistics_level = all;

Session altered.

SQL> alter session set tracefile_identifier='dml_bloom';

Session altered.

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> 
SQL> select
  2  	     /*+
  3  		     leading(t1 h1)
  4  		     use_hash(h1)    no_swap_join_inputs(h1)
  5  	     */
  6  	     t1.n1,
  7  	     h1.id
  8  from
  9  	     t1,
 10  	     pt_hash h1
 11  where
 12  	     t1.id between 2 and 3
 13  and     h1.grp = t1.n1
 14  and     h1.id <= 100
 15  ;

[query results truncated in the original post]

SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  3rm369uarustm, child number 0
-------------------------------------
select         /*+                 leading(t1 h1)
use_hash(h1)    no_swap_join_inputs(h1)         */         t1.n1,
  h1.id from         t1,         pt_hash h1 where         t1.id between
2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

[execution plan truncated in the original post]

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

As in 11.2.0.4 the select statement produces a bloom filter; a 10053 trace on the session shows this for the select and the two insert statement runs:


Consider using bloom filter between T1[T1] and H1[PT_HASH] with ??
kkoBloomFilter: join (lcdn:2 rcdn:5417 jcdn:155 limit:5417)
kkopqSingleJoinBloomNdv:Compute bloom ndv for lfro:T1[T1] and rfro:H1[PT_HASH] swap:no
kkopqSingleJoinBloomNdv: predCnt:#1 col1:(bndv:70 ndv:70) and col2:(bndv:4 ndv:2) creatorNDV:4.0 userNDV:70.0
kkopqComputeBloomNdv: predCnt:1 creatorNdv:4.0 userNdv:70.0 singleTblPred:yes
kkoBloomFilter: join ndv:2 reduction:0.000369 (limit:0.500000)  accepted
Enumerating distribution method (advanced)

It’s time to convert that select to an insert statement and see what 12.1.0.2 does with it. The first run produces the same plan as in 11.2.0.4:


SQL> insert into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

[insert feedback truncated in the original post]

SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  g693aju7gwm9n, child number 0
-------------------------------------
insert into target(n1, id) select         /*+                 ordered
              use_hash(h1)    no_swap_join_inputs(h1)         */
 t1.id,         h1.id from         t1,         pt_hash h1 where
t1.id between 2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

[execution plan truncated in the original post]

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

Remember that the bloom filter section appears in the 10053 trace for every statement executed by this session; the interesting part is that the bloom filter is not used in the first run of the insert. Rolling back the initial insert and running the insert again, this time with the APPEND hint, produces different results:


SQL> rollback;

Rollback complete.

SQL> 
SQL> insert /*+ append */ into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

[insert feedback truncated in the original post]

SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  10jb8cf3tx39p, child number 0
-------------------------------------
insert /*+ append */ into target(n1, id) select         /*+
    ordered                 use_hash(h1)    no_swap_join_inputs(h1)
    */         t1.id,         h1.id from         t1,         pt_hash h1
where         t1.id between 2 and 3 and     h1.grp = t1.n1 and
h1.id <= 100

[execution plan truncated in the original post]

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   6 - filter("H1"."ID"<=100)

The second insert returns to the original plan and cardinalities, and restores the use of the bloom filter. As mentioned earlier Bug 14325392 appears to have been addressed, providing at least one instance where the bloom filter returns to use. That the first insert didn’t use the bloom filter may indicate that the optimizer still has issues choosing it when the conditions aren’t ‘right’. But the fact remains that Oracle 12.1.0.2 can use bloom filters when a select statement that uses them is converted to an ‘insert into … select’ construct. We may have to wait for 12.2 to become available outside of the cloud to know if the problem has been completely resolved.

Knowledge is our armor as DBAs, and the more we have the better prepared we are to do ‘battle’ with optimizer bugs and occasional errant behavior. Knowing what to expect is half the battle; the more we know, the more likely we are to be able to address issues that may arise.

Just because it’s ‘correct’ doesn’t make it ‘right’.

July 21, 2016

On An Average Day

Filed under: General — dfitzjarrell @ 09:18

"There are other advantages," continued the child. "For instance, if one rat were cornered by nine cats,
on the average, each cat would be ten percent rat and the rat would be ninety percent cat. If you happened
to be a rat, you can see how much nicer it would make things."
-- Norton Juster, The Phantom Tollbooth

Recently an interesting question appeared on “Ask Tom” with regard to the ‘average’ of two dates. Which is not to say it’s the ONLY interesting question on “Ask Tom”, but this one caught my attention because of its topic. Why, on earth, would anyone want the ‘average’ of two dates? Digging a bit deeper it became obvious what the original poster wanted to accomplish. Let’s take that trip and see where it leads.

The average person doesn’t usually ask for the average of two dates; averages are computed on salaries, distances, temperatures, volumes, numbers in general, not on dates. If we look at what the ‘average’ would be between two dates, however, we find it’s the date smack-dab in the middle of the two dates supplied. Connor McDonald provided this solution to the question:


SQL> with t as
  2     ( select
  3         date '2015-09-01' d1,
  4         date '2015-12-07' d2
  5      from dual )
  6  select
  7    least(d1,d2) + abs(d1-d2)/2
  8  from t;

LEAST(D1,
---------
19-OCT-15

SQL>

which provides the correct answer given the date range specified. It’s not an ‘average’, really, it’s more like the interval middle value, but it can serve a purpose. Let’s say that Millicent Muggwumpp, super-star billing clerk, needs to determine the middle date for a billing cycle so invoices can be generated and sent to the various customers. The solution Connor provided (with some modification) can provide that information. Let’s ‘hack away’ at Connor’s code and see what we end up with:


SQL> with interval_middle as
  2     ( select
  3         to_date('&&1', 'RRRR-MM-DD') d1,
  4         to_date('&&2', 'RRRR-MM-DD') d2
  5      from dual )
  6  select
  7    least(d1,d2) + abs(d1-d2)/2
  8  from interval_middle;

LEAST(D1,
---------
15-JUN-16

SQL>

Looking at the changes it should be noted that replaceable parameters have been substituted for the hard-coded dates and a more robust explicit date conversion has been written, including the expected date string format. Millicent can now use this script and pass to it any valid date range and return the middle date of that range. [SQL*Plus allows us to shut off the parameter replacement message showing the original code and the current version with the parameters supplied, but we could easily allow that to be displayed:


SQL> with interval_middle as
  2     ( select
  3         to_date('&&1', 'RRRR-MM-DD') d1,
  4         to_date('&&2', 'RRRR-MM-DD') d2
  5      from dual )
  6  select
  7    least(d1,d2) + abs(d1-d2)/2
  8  from interval_middle;
old   3:        to_date('&&1', 'RRRR-MM-DD') d1,
new   3:        to_date('2016-06-01', 'RRRR-MM-DD') d1,
old   4:        to_date('&&2', 'RRRR-MM-DD') d2
new   4:        to_date('2016-06-29', 'RRRR-MM-DD') d2

LEAST(D1,
---------
15-JUN-16

SQL>

if confirmation of the passed-in date values is required.]
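The SQL*Plus switch controlling that substitution display is VERIFY:


SQL> set verify on    -- display the old/new lines for &&1 and &&2
SQL> set verify off   -- suppress them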

Of course Connor’s answer wasn’t the only one provided as Stew Ashton questioned the need for LEAST() and ABS() in the query:


No matter what date you start with, you either subtract a positive number from the later date or subtract a negative number from the earlier date.


alter session set nls_date_format='YYYY-MM-DD HH24:MI';

with t as (
  select date '2016-07-01' d1, date '2016-07-18' d2
  from dual
)
select d1 - (d1 - d2) / 2 result1,
       d2 - (d2 - d1) / 2 result2
from t;

RESULT1          RESULT2
---------------- ----------------
2016-07-09 12:00 2016-07-09 12:00

Modifying this code in the same manner as Connor’s produces:


SQL> with t as (
  2    select to_date('&&1', 'RRRR-MM-DD') d1, to_date('&&2', 'RRRR-MM-DD') d2
  3    from dual
  4  )
  5  select d1 - (d1 - d2) / 2 result1,
  6         d2 - (d2 - d1) / 2 result2
  7  from t;

RESULT1          RESULT2
---------------- ----------------
2016-06-15 00:00 2016-06-15 00:00

SQL>

The same result is produced by both queries, so LEAST() and ABS() aren’t absolutely necessary. Choose the approach that provides a sufficient level of comfort to you.

Averaging dates may not be an obvious operation but it can be useful, if in a limited way. We all look at dates in unusual ways on occasion so what may seem illogical to you, at least on the face of it, may be completely and utterly logical to someone else. Who knows, Aunt Edna may ask you to find the date between Uncle Milton’s and Aunt Haggatha’s birthdays because she wants to ship a plaster cat to Guatemala. Stranger things have happened.

And that’s about average.

July 14, 2016

Size Does Matter

Filed under: General — dfitzjarrell @ 11:07

"You see, to tall men I'm a midget, and to short men I'm a giant;
to the skinny ones I'm a fat man, and to the fat ones I'm a thin man."
-- Norton Juster, The Phantom Tollbooth

Someone in one of the Oracle forums asked the following question:


How to estimate datafile size of tablespace when it is almost full? Could someone please explain?

which is ambiguous to say the least, since no information was provided by the person posting as to system configuration, database size, transaction volume, etc. Several responses later the original poster replied:


My question is how much space to allocate to datafile when tablespace of it reaches almost max size?

which, on the face of it, seems to be a more ‘answerable’ question. Information is still missing, however, so it’s difficult to provide any real solution to the person who asked. Since Oracle doesn’t really monitor tablespace growth (it does monitor file size and space within the datafiles, but those are only part of the required information) it’s up to the DBA (which, presumably, is you) to monitor such growth. It isn’t difficult, but it does take some preparation before any results can be returned. Let’s look at one way tablespace monitoring can be achieved.

The following script sets up a table and a sequence that will be populated by another script running as a scheduled job or task on the database server:


create table tablespace_monitor(
	mon_run	number,
	run_dt	date,
	tablespace_name varchar2(35),
	available  number,
	used	number,
	free	number,
	pct_free number(9,2))
tablespace users;

create sequence tab_mon_seq
start with 1 increment by 1 nocycle nomaxvalue nocache;

Created while connected as SYS as SYSDBA, these objects will hold the data necessary to monitor tablespaces and their growth. The table is populated by the following script:


insert into tablespace_monitor (tablespace_name, available, used, free, pct_free)
(
select tname_a tablespace_name, 
       ttl_avail available, 
       (ttl_avail - ttl_free) used, 
       ttl_free free, 
       round((ttl_free/ttl_avail)*100,2) "PCT FREE"
from
 (select tablespace_name tname_a, sum(user_bytes) ttl_avail
 from dba_data_files
 group by tablespace_name) avail left join
 (select tablespace_name tname_f, sum(bytes) ttl_free
 from dba_free_space
 group by tablespace_name) free on tname_f = tname_a
union
select tablespace_name, 
       sum(bytes_used+bytes_free) ttl_avail,
       sum(bytes_used),
       sum(bytes_free),
       round((sum(bytes_free)/sum(bytes_used+bytes_free))*100,2) pct_free
from v$temp_space_header
group by tablespace_name);

commit;

column next_seq noprint
select tab_mon_seq.nextval next_seq from dual;

update tablespace_monitor
set mon_run = tab_mon_seq.currval,
    run_dt = sysdate
where mon_run is null;

commit;

The provided scripts have been tested on 12.1.0.2 and work as expected. The second script is scheduled to run once daily to provide daily tablespace size data, which will be used by the next script to generate a report:
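One way to schedule the population script from inside the database is DBMS_SCHEDULER. The following is a sketch only: it assumes the insert/update logic above has been wrapped in a stored procedure named POP_TS_MONITOR, and the job name and run time are illustrative:


begin
	dbms_scheduler.create_job(
		job_name        => 'TS_MONITOR_JOB',
		job_type        => 'STORED_PROCEDURE',
		job_action      => 'POP_TS_MONITOR',   -- assumed wrapper procedure
		start_date      => systimestamp,
		repeat_interval => 'FREQ=DAILY; BYHOUR=6',
		enabled         => TRUE,
		comments        => 'Daily tablespace growth snapshot');
end;
/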


set linesize 200 numwidth 13 pagesize 40
column "CONSUMED/FREED (+)" format a20
column "TABLESPACE NAME" format a15

select m1.tablespace_name "TABLESPACE NAME", m2.free "BYTES FREE", m2.pct_free "PCT FREE",
       case when abs(m1.free - m2.free) >= 1024*1024
                 then lpad(to_char(round(abs(m1.free - m2.free)/(1024*1024),2))||
                           case when m2.free > m1.free then ' (+)' end,20)
            when abs(m1.free - m2.free) >= 1024
                 then lpad(to_char(round(abs(m1.free - m2.free)/1024,2))||
                           case when m2.free > m1.free then ' (+)' end,20)
            else lpad(to_char(abs(m1.free - m2.free)),20)
       end "CONSUMED/FREED (+)",
       case when abs(m1.free - m2.free) >= 1024*1024 then 'MB'
            when abs(m1.free - m2.free) >= 1024 then 'KB'
            else null
       end unit,
       case when abs(m1.free - m2.free)/((m2.run_dt - m1.run_dt)*1440) >= 1024*1024
                 then round((abs(m1.free - m2.free)/(1024*1024))/((m2.run_dt - m1.run_dt)*1440),2)
            when abs(m1.free - m2.free)/((m2.run_dt - m1.run_dt)*1440) >= 1024
                 then round((abs(m1.free - m2.free)/1024)/((m2.run_dt - m1.run_dt)*1440),2)
            else round(abs(m1.free - m2.free)/((m2.run_dt - m1.run_dt)*1440),2)
       end "AVG PER MIN",
       case when abs(m1.free - m2.free)/((m2.run_dt - m1.run_dt)*1440) >= 1024*1024 then 'MB'
            when abs(m1.free - m2.free)/((m2.run_dt - m1.run_dt)*1440) >= 1024 then 'KB'
            else null
       end unit_per_min,
       round((m2.run_dt - m1.run_dt)*1440,2) "WINDOW IN MIN",
       to_char(m1.run_dt, 'RRRR-MM-DD HH24:MI:SS') "BEGIN DATE/TIME",
       to_char(m2.run_dt, 'RRRR-MM-DD HH24:MI:SS') "LAST RUN DATE/TIME"
from tablespace_monitor m1, tablespace_monitor m2
where m2.mon_run = (select max(mon_run) from tablespace_monitor)
and m1.mon_run = (select min(mon_run) from tablespace_monitor)
and m2.tablespace_name = m1.tablespace_name

spool &1
/
spool off

set linesize 80

The report pairs the most recent snapshot with the earliest snapshot still in tablespace_monitor and generates the difference, showing the growth since monitoring began (or since the table was last purged); run once per day against a two-snapshot window this reports the growth over the last 24 hours. The population script can, of course, be run more than once per day, but since the report uses only the newest and oldest snapshots it should then be run on a matching schedule so that all of the generated values stored in the tablespace_monitor table are reported.

The report looks like this:


TABLESPACE NAME    BYTES FREE      PCT FREE CONSUMED/FREED (+)   UN   AVG PER MIN UN WINDOW IN MIN BEGIN DATE/TIME     LAST RUN DATE/TIME
--------------- ------------- ------------- -------------------- -- ------------- -- ------------- ------------------- -------------------
INDX                103809024           100                    0                0             9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
SYSAUX               62193664          7.61                 2.19 MB        227.03 KB          9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
SYSTEM               62849024          7.06                    0                0             9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
TEMP                 42991616         68.33                    0                0             9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
UNDOTBS1           1986789376         97.47                   23 MB          2.33 MB          9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
USERS             33880604672         98.69               88 (+) MB          8.92 MB          9.87 2016-07-14 10:00:20 2016-07-14 10:10:12

Over a period of time growth trends can be determined for each tablespace, and space additions can be planned accordingly. Since it’s not likely that the growth patterns will be linear it’s a good idea to make note of reports of exceptionally large space usage and the period of time in which they occur. This should provide a window to plan storage needs and give the System Admins and Storage Admins time to prepare additional storage for use.
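As a rough way to turn those snapshots into a projection, a linear regression over tablespace_monitor can estimate the rate of change of free space. This sketch uses REGR_SLOPE against the table built above; treat the ‘days until full’ figure as a back-of-the-envelope number, not a forecast:


--
-- Sketch: free-space trend per tablespace; the slope is bytes of
-- free space gained (positive) or consumed (negative) per day
--
select tablespace_name,
       round(regr_slope(free, run_dt - trunc(sysdate)),2) free_bytes_per_day,
       case
          when regr_slope(free, run_dt - trunc(sysdate)) < 0 then
               round(max(free) keep (dense_rank last order by run_dt) /
                     abs(regr_slope(free, run_dt - trunc(sysdate))), 1)
       end est_days_until_full
from   tablespace_monitor
group by tablespace_name
order by tablespace_name;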

Storage cannot be managed without usage and growth data yet it seems that some DBAs try to do just that; a S.W.A.G CAN get you there on occasion but more often than not it’s simply glorified guesswork without a crystal ball or Ouija board handy. Regardless of how the end-users see them databases are part of a complex managed data entry and retrieval system that should not be left to chance or ‘sophisticated’ guesswork. Planning and monitoring are necessities in the IT environment; the database should not be left ‘out in the cold’ to suffer on its own.

Whether you use the provided scripts to monitor tablespace growth or decide to write your own, implementing tablespace monitoring should be at or near the top of every DBA’s ‘to-do’ list. Keeping ‘tabs’ on how much space is being used in which tablespace can only help the DBA by ensuring that the users don’t run out of space because usage wasn’t tracked. Emergency space additions take time and interrupt the business flow; scheduling such additions can prevent end-user complaints and will keep management happy because lack of space didn’t stop the flow of transactions. Which means everybody wins.

It does seem that size is important.
