Oracle Tips and Tricks — David Fitzjarrell

September 26, 2016

“My Kingdom For A Histogram”

Filed under: General — dfitzjarrell @ 09:55

"Don't you know anything at all about numbers?"
"Well, I don't think they're very important," snapped Milo, too embarrassed to admit the truth.
"NOT IMPORTANT!" roared the Dodecahedron, turning red with fury.  "Could you have tea for two without the two, or three blind mice
without the three? Would there be four corners of the earth if there weren't a four? And how would you sail the seven seas without
a seven?"
"All I meant was..." began Milo, but the Dodecahedron, overcome with emotion and shouting furiously, carried right on.
"If you had high hopes, how would you know how high they were? And did you know that narrow escapes come in all different widths? 
Would you travel the whole wide world without ever knowing how wide it was? And how could you do anything at long last," he concluded,
waving his arms over his head, "without knowing how long the last was? Why, numbers are the most beautiful and valuable things in the
world.  Just follow me and I'll show you." He turned on his heel and stalked off into the cave.
-- Norton Juster, The Phantom Tollbooth

In one of the current Oracle forums an “old saw” was resurrected, namely that a single-column primary key has no need of a histogram. This presumes the primary key has no ‘holes’ in the data and covers a continuous range of values. In reality such primary keys are rarely, if ever, unbroken, and the range of data may not be continuous. With that in mind it’s not strange to consider a histogram on a single-column primary key, and depending on the release of Oracle in use that histogram may be an old standard (a height-balanced histogram) or a new type offered in 12c, the hybrid histogram. Let’s look at an example of how this plays out in 11.2.0.4 and in 12.1.0.2. First, though, let’s look at that new Hybrid histogram.

The Hybrid histogram is a new type of histogram that uses fewer buckets than there are distinct values in the table data; it also determines, and makes use of, the repeat frequency of the endpoint values. The bucket size can be adjusted from the initial size based on endpoint values; two special buckets will not change in size, and these are the ‘boundary’ buckets, bucket 0 and the last bucket of the histogram. The Hybrid histogram is based on four apparent ‘rules’, which are:

	1) a value should not be found in more than one bucket
	2) the bucket size is allowed to be extended in order to contain all instances of the same distinct value
	3) adjusted bucket size cannot be less than the original size (not applicable at either end of the data set)
	4) the original number of buckets should not be reduced
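
Throughout the examples that follow, the histogram type Oracle actually built can be confirmed from the data dictionary; a quick check (a sketch, using the table and column from the examples below) looks like this:

--
-- Confirm which histogram type Oracle created and how many
-- buckets it used
--
select column_name, histogram, num_buckets, sample_size
from   user_tab_col_statistics
where  table_name  = 'T1'
and    column_name = 'N1';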

One way to create a hybrid histogram is shown in the example below; part of it was provided by Jonathan Lewis, and the remainder was adapted from work done by Mohammed Houri. We begin:


SQL>
SQL> --
SQL> -- Create and populate a table so we can
SQL> -- build a hybrid histogram
SQL> --
SQL> -- Thanks to Jonathan Lewis for generating
SQL> -- this table and data set
SQL> --
SQL> create table t1 (id number, n1 number);

Table created.

Elapsed: 00:00:00.01
SQL>
SQL> @InsT1.sql

Elapsed: 00:00:00.00
SQL>
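
The InsT1.sql script itself isn’t reproduced here; a purely hypothetical stand-in that loads a similar data set (about 100 rows with a skewed distribution of a few dozen distinct n1 values) might look like this:

--
-- Hypothetical stand-in for InsT1.sql: ~100 rows with n1 skewed
-- toward the low end of its range so that some values repeat
-- often enough to become 'popular'
--
insert into t1
select rownum id,
       trunc(8 + 51 * power(dbms_random.value, 2)) n1
from   dual
connect by level <= 100;

commit;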

Choosing a number of buckets less than the number of distinct values in the table is one way to create a hybrid histogram:


SQL>
SQL> --
SQL> -- Return the number of distinct values
SQL> -- in column n1
SQL> --
SQL>
SQL> select count(distinct n1) from t1;

COUNT(DISTINCTN1)
-----------------
               37

Elapsed: 00:00:00.00
SQL>

We’ll use 20 buckets for our histogram and DBMS_STATS to create it:


SQL>
SQL> --
SQL> -- Create a hybrid histogram on column n1
SQL> -- by setting the number of buckets less
SQL> -- than the number of distinct values
SQL> --
SQL> BEGIN
  2  	dbms_stats.gather_table_stats
  3  	  (user, 't1', method_opt => 'for columns n1 size 20');
  4  END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.43
SQL>

During the creation of this histogram Oracle sorts the column values and walks the ordered list looking for values that occur in more than one bucket. When such values are found new bucket sizes are computed for all but the boundary buckets, in this case bucket 0 and bucket 19:


SQL>
SQL> --
SQL> -- Four rules exist to create a hybrid histogram, and the
SQL> -- first two are:
SQL> --
SQL> -- a value should not be found in more than one bucket
SQL> -- a bucket size is allowed to be extended in order to contain
SQL> --   all instances of the same value
SQL> --
SQL> -- Compute the 'new' bucket size based on the data and
SQL> -- endpoint values
SQL> --
SQL> -- To illustrate how the bucket size adjustment is
SQL> -- calculated let's look at the value 13 in the data
SQL> --
SQL> -- Oracle 'walks' the ordered list of values until
SQL> -- it finds 13 in two buckets
SQL> --
SQL> -- Oracle extends the bucket where 13 is first found (bucket n)
SQL> -- and moves all other instances of 13 (from bucket n+1)
SQL> -- to this bucket (n)
SQL> --
SQL> -- Moving all instances of 13 to a single bucket
SQL> -- also makes 13 an endpoint value allowing Oracle
SQL> -- to compute the repeat count of that value
SQL> --
SQL> -- This continues on through the values so that
SQL> -- Oracle can place all occurrences of a distinct
SQL> -- value so they don't populate more than one bucket
SQL> --
SQL> SELECT
  2  	     (row_number() over(order by ept_nbr)-1) NumBucket
  3  	     ,ept_nbr
  4  	     ,ept_act_val
  5  	     ,rpt_cnt
  6  	     ,ept_nbr - (lag(ept_nbr,1,0) over(order by ept_nbr)) "new bucket size"
  7  	     ,bucket_size "original bucket_size"
  8  	 FROM
  9  	     (SELECT
 10  		  ah.endpoint_number		ept_nbr
 11  		 ,ah.endpoint_actual_value	ept_act_val
 12  		 ,lag(ah.endpoint_number,1,0) over(order by ah.endpoint_number) ept_lag
 13  		 ,ah.endpoint_repeat_count rpt_cnt
 14  		 ,at.sample_size/at.num_buckets bucket_size
 15  	      FROM
 16  		 user_tab_histograms	  ah
 17  		,user_tab_col_statistics  at
 18  	      WHERE ah.table_name  = at.table_name
 19  	      AND ah.column_name = at.column_name
 20  	      AND ah.table_name  = 'T1'
 21  	      AND ah.column_name = 'N1'
 22  	    ) ORDER BY ept_nbr;

 NUMBUCKET    EPT_NBR EPT_ACT_VAL     RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
         0          1 8                     1               1                    5
         1          6 13                    3               5                    5
         2         12 18                    2               6                    5
         3         20 20                    5               8                    5
         4         26 23                    2               6                    5
         5         32 26                    3               6                    5
         6         38 27                    6               6                    5
         7         44 28                    6               6                    5
         8         50 29                    6               6                    5
         9         58 31                    5               8                    5
        10         69 33                    8              11                    5

 NUMBUCKET    EPT_NBR EPT_ACT_VAL     RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
        11         79 35                    7              10                    5
        12         86 38                    5               7                    5
        13         90 41                    1               4                    5
        14         92 42                    2               2                    5
        15         95 43                    3               3                    5
        16         96 44                    1               1                    5
        17         97 45                    1               1                    5
        18         98 46                    1               1                    5
        19        100 59                    1               2                    5

20 rows selected.

Elapsed: 00:00:00.21
SQL>

Notice that for buckets 1-12 the bucket size stayed the same or increased; for the remaining non-boundary buckets the computed size would decrease. But according to Rule 3 no bucket can be adjusted to a size smaller than the original bucket size, so those buckets remain unadjusted. The BUCKET_SIZE column in the query results below is simply the sample size divided by the number of buckets and thus displays a constant value for all buckets:


SQL>
SQL> --
SQL> -- Given that boundary values are the exception computing the new
SQL> -- bucket sizes shows that for bucket numbers 1-12 the sizes are
SQL> -- the same or larger than the originally computed size (5)
SQL> --
SQL> -- All remaining non-endpoint buckets compute to a smaller size
SQL> -- than the original
SQL> --
SQL> -- Now the third rule appears:
SQL> --
SQL> -- No non-endpoint bucket can be smaller than the originally
SQL> -- computed size
SQL> --
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	     ,ucs.sample_size/ucs.num_buckets bucket_size
  6  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
  7  	 from
  8  	     user_tab_histograms uth
  9  	    ,user_tab_col_statistics ucs
 10  	where
 11  	     uth.table_name   = ucs.table_name
 12  	 and uth.column_name   = ucs.column_name
 13  	 and uth.table_name    = 'T1'
 14  	 and uth.column_name   = 'N1'
 15  	order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
              1 8                                1           5         -4
              6 13                               3           5         -2
             12 18                               2           5         -3
             20 20                               5           5          0
             26 23                               2           5         -3
             32 26                               3           5         -2
             38 27                               6           5          1
             44 28                               6           5          1
             50 29                               6           5          1
             58 31                               5           5          0
             69 33                               8           5          3

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
             79 35                               7           5          2
             86 38                               5           5          0
             90 41                               1           5         -4
             92 42                               2           5         -3
             95 43                               3           5         -2
             96 44                               1           5         -4
             97 45                               1           5         -4
             98 46                               1           5         -4
            100 59                               1           5         -4

20 rows selected.

Elapsed: 00:00:00.19
SQL>

For a Hybrid histogram three types of column value exist on which to base cardinality estimates: popular values, non-popular values that are endpoints, and non-popular values that are not endpoints. Let’s look at how Oracle estimates the cardinality for each case, starting with popular values:


SQL>
SQL> --
SQL> -- Display the endpoint, value, repeat count,
SQL> -- bucket size and the 'popularity'
SQL> --
SQL> -- Popularity is determined by computing the
SQL> -- difference between the frequency of appearance
SQL> -- (repeat count) and the bucket size (sample size/number of buckets)
SQL> -- Positive values are considered 'popular'
SQL> --
SQL> select
  2  	      endpoint_number
  3  	     ,endpoint_actual_value
  4  	     ,endpoint_repeat_count
  5  	     ,bucket_size
  6  	     ,case when Popularity > 0 then 'Pop'
  7  			else 'Non-Pop'
  8  	       end Popularity
  9  	 from
 10  	(
 11  	  select
 12  	      uth.endpoint_number
 13  	     ,uth.endpoint_actual_value
 14  	     ,uth.endpoint_repeat_count
 15  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 16  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 17  	 from
 18  	     user_tab_histograms uth
 19  	    ,user_tab_col_statistics ucs
 20  	where
 21  	     uth.table_name   = ucs.table_name
 22  	 and uth.column_name   = ucs.column_name
 23  	 and uth.table_name    = 'T1'
 24  	 and uth.column_name   = 'N1'
 25  	 )
 26  	order by endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
              1 8                                1           5 Non-Pop
              6 13                               3           5 Non-Pop
             12 18                               2           5 Non-Pop
             20 20                               5           5 Non-Pop
             26 23                               2           5 Non-Pop
             32 26                               3           5 Non-Pop
             38 27                               6           5 Pop
             44 28                               6           5 Pop
             50 29                               6           5 Pop
             58 31                               5           5 Non-Pop
             69 33                               8           5 Pop

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
             79 35                               7           5 Pop
             86 38                               5           5 Non-Pop
             90 41                               1           5 Non-Pop
             92 42                               2           5 Non-Pop
             95 43                               3           5 Non-Pop
             96 44                               1           5 Non-Pop
             97 45                               1           5 Non-Pop
             98 46                               1           5 Non-Pop
            100 59                               1           5 Non-Pop

20 rows selected.

Elapsed: 00:00:00.19
SQL>

Let’s return only the popular values and work through the cardinality estimate Oracle generates:


SQL> --
SQL> -- Using the popularity Oracle estimates the cardinality
SQL> -- by considering the following three types of values:
SQL> --
SQL> -- popular value
SQL> -- non-popular value with an endpoint number
SQL> -- non-popular value not present in the histogram table
SQL> --
SQL> -- Case 1: Popular values
SQL> --
SQL> -- Display the 'popular' values in this data set
SQL> --
SQL> select
  2  	      endpoint_actual_value
  3  	     ,endpoint_repeat_count
  4  	     ,bucket_size
  5  	     ,Popularity
  6  from
  7  (
  8    select
  9  	      endpoint_number
 10  	     ,endpoint_actual_value
 11  	     ,endpoint_repeat_count
 12  	     ,bucket_size
 13  	     ,case when Popularity > 0 then 'Pop'
 14  			else 'Non-Pop'
 15  	       end Popularity
 16  	 from
 17  	(
 18  	  select
 19  	      uth.endpoint_number
 20  	     ,uth.endpoint_actual_value
 21  	     ,uth.endpoint_repeat_count
 22  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 23  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 24  	 from
 25  	     user_tab_histograms uth
 26  	    ,user_tab_col_statistics ucs
 27  	where
 28  	     uth.table_name   = ucs.table_name
 29  	 and uth.column_name   = ucs.column_name
 30  	 and uth.table_name    = 'T1'
 31  	 and uth.column_name   = 'N1'
 32  	 )
 33    )
 34  where Popularity = 'Pop';

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
27                               6           5 Pop
28                               6           5 Pop
29                               6           5 Pop
33                               8           5 Pop
35                               7           5 Pop

Elapsed: 00:00:00.20
SQL>

Looking at one of these popular values (33) let’s return the cardinality estimate Oracle calculated:


SQL> --
SQL> -- Using explain plan the cardinality estimation
SQL> -- can be displayed for one of the 'popular' values
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 33;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> --
SQL> -- The cardinality estimate displayed below was
SQL> -- calculated using the following formula:
SQL> --
SQL> -- E-Rows = ENDPOINT_REPEAT_COUNT * num_rows/sample_size
SQL> -- E-Rows = 8 * 100/100 = 8
SQL> --
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     8 |    24 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=33)

14 rows selected.

Elapsed: 00:00:00.03
SQL>

From the formula found in the above comments our calculated cardinality matches that provided by the optimizer, so our understanding of the Hybrid histogram seems sound. To continue that verification we consider the case of a non-popular value that is an endpoint. The formula for that cardinality calculation is: E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). We calculate NewDensity with a query based on the work of Alberto Dell’Era. First we return the non-popular values that are endpoints:


SQL> --
SQL> -- Case 2: Non-popular values with an endpoint
SQL> --
SQL> -- First return all non-popular values that are an
SQL> -- endpoint
SQL> --
SQL> select
  2  	      endpoint_actual_value
  3  	     ,endpoint_repeat_count
  4  	     ,bucket_size
  5  	     ,Popularity
  6  from
  7  (
  8    select
  9  	      endpoint_number
 10  	     ,endpoint_actual_value
 11  	     ,endpoint_repeat_count
 12  	     ,bucket_size
 13  	     ,case when Popularity > 0 then 'Pop'
 14  			else 'Non-Pop'
 15  	       end Popularity
 16  	 from
 17  	(
 18  	  select
 19  	      uth.endpoint_number
 20  	     ,uth.endpoint_actual_value
 21  	     ,uth.endpoint_repeat_count
 22  	     ,ucs.sample_size/ucs.num_buckets bucket_size
 23  	     ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
 24  	 from
 25  	     user_tab_histograms uth
 26  	    ,user_tab_col_statistics ucs
 27  	where
 28  	     uth.table_name   = ucs.table_name
 29  	 and uth.column_name   = ucs.column_name
 30  	 and uth.table_name    = 'T1'
 31  	 and uth.column_name   = 'N1'
 32  	 )
 33    )
 34  where Popularity = 'Non-Pop';

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
8                                1           5 Non-Pop
13                               3           5 Non-Pop
18                               2           5 Non-Pop
20                               5           5 Non-Pop
23                               2           5 Non-Pop
26                               3           5 Non-Pop
31                               5           5 Non-Pop
38                               5           5 Non-Pop
41                               1           5 Non-Pop
42                               2           5 Non-Pop
43                               3           5 Non-Pop

ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
44                               1           5 Non-Pop
45                               1           5 Non-Pop
46                               1           5 Non-Pop
59                               1           5 Non-Pop

15 rows selected.

Elapsed: 00:00:00.20
SQL>

Let’s return Oracle’s estimated cardinality for one of these values (45):


SQL> --
SQL> -- Use explain plan to return the estimated cardinality
SQL> --
SQL> -- In this case the estimated cardinality is computed as:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size)
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 45;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |     6 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------

   2 - filter("N1"=45)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Now let’s use our query to return the value for NewDensity:


SQL> --
SQL> -- The NewDensity is computed by an internal algorithm but
SQL> -- we can get a reliable density using the following query
SQL> -- based on work done by Alberto Dell'Era:
SQL> --
SQL> SELECT
  2  	      BktCnt
  3  	     ,PopBktCnt
  4  	     ,PopValCnt
  5  	     ,NDV
  6  	     ,pop_bucketSize
  7  	     ,trunc(((BktCnt-PopBktCnt)/BktCnt)/(NDV-PopValCnt),10) NewDensity
  8  	  FROM
  9  	     (SELECT
 10  		COUNT(1) PopValCnt,
 11  		SUM(endpoint_repeat_count) PopBktCnt,
 12  		ndv,
 13  		BktCnt,
 14  		pop_bucketSize
 15  	      FROM
 16  	       (SELECT
 17  		 (sample_size - num_nulls) BktCnt,
 18  		 num_distinct ndv,
 19  		 num_buckets,
 20  		 density OldDensity,
 21  		 (sample_size-num_nulls)/num_buckets pop_bucketSize
 22  	       FROM user_tab_col_statistics
 23  	       WHERE
 24  		   table_name  = 'T1'
 25  	       AND column_name = 'N1'
 26  	       ),
 27  	       user_histograms
 28  	     WHERE table_name	      = 'T1'
 29  	     AND column_name	      = 'N1'
 30  	     AND endpoint_repeat_count> pop_bucketSize
 31  	     GROUP BY ndv,
 32  	       BktCnt,
 33  	       pop_bucketSize
 34  	     );

    BKTCNT  POPBKTCNT  POPVALCNT        NDV POP_BUCKETSIZE NEWDENSITY
---------- ---------- ---------- ---------- -------------- ----------
       100         33          5         37              5   .0209375

Elapsed: 00:00:00.19
SQL>

We now see if our calculation returned a valid value by ‘plugging’ it into the following equation: E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/ sample_size). The cardinality estimate we compute is:


SQL> --
SQL> -- 'Plugging' this new density into the equation above produces:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/ sample_size)
SQL> -- E-Rows = 100 * greatest (.0209375, 1/100) = 2.09375 ~ 2
SQL> --
SQL> -- which matches the value returned by Oracle
SQL> --
SQL>
SQL> --
SQL> -- To validate these findings use a different non-popular value
SQL> -- that's also an endpoint:
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 43;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     3 |     9 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=43)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Oracle returned 3 as the estimated cardinality and so did our calculation (E-Rows = 100 * greatest(.0209375, 3/100) = 3), more proof that our concept of the Hybrid histogram is correct. One final test will show how well we understand this histogram, this time considering non-popular values that are not endpoints. Using our calculated NewDensity in the equation E-Rows = num_rows * NewDensity gives the following result: E-Rows = 100 * .0209375 = 2.09375 ~ 2. Time to see if our calculated value matches the one Oracle estimated:


SQL>
SQL> --
SQL> -- Case 3: Non-popular value without an endpoint number
SQL> --
SQL> -- This calculation is fairly simple and straightforward:
SQL> --
SQL> -- E-Rows = num_rows * NewDensity = 100 * .0209375 = 2.09375 ~ 2
SQL> --
SQL>
SQL> explain plan for select count(1) from t1 where n1 = 17;

Explained.

Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T1   |     2 |     6 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------

   2 - filter("N1"=17)

14 rows selected.

Elapsed: 00:00:00.02
SQL>

Our calculated value, as evidenced by the Rows value in the execution plan, matches Oracle’s estimate, giving further evidence that our concept of the Hybrid histogram is sound.
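
As a compact recap, all three estimates can be reproduced with the numbers gathered above (num_rows = sample_size = 100, NewDensity = .0209375); this quick check is a sketch, not part of the original script:

--
-- Recap of the three cardinality estimates:
--   popular value (33):               rpt_cnt * num_rows/sample_size
--   non-popular endpoint (45, 43):    num_rows * greatest(NewDensity, rpt_cnt/sample_size)
--   non-popular non-endpoint (17):    num_rows * NewDensity
--
select round(8 * 100/100)                     est_33,
       round(100 * greatest(.0209375, 1/100)) est_45,
       round(100 * greatest(.0209375, 3/100)) est_43,
       round(100 * .0209375)                  est_17
from   dual;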


Hybrid histograms can exhibit instability in the form of endpoint changes as inserts alter the frequency of values in the table. To illustrate this let’s add another 16 to the data and see what happens:


SQL>
SQL> --
SQL> -- Hybrid histograms can display instability in terms of
SQL> -- endpoints and bucket size
SQL> --
SQL> -- To illustrate this we add 16 to the data, increasing
SQL> -- the repeat count to 3, causing this newly added 16
SQL> -- to shift into the previous bucket leaving 17
SQL> -- as the new endpoint
SQL> --
SQL> insert into t1 values (2, 16);

1 row created.

Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
  2  	dbms_stats.gather_table_stats
  3  	  (user, 't1', method_opt => 'for all columns size 20');
  4  END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.08
SQL>
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	 from
  6  	    user_tab_histograms uth
  7  	    ,user_tables ut
  8  	    ,user_tab_col_statistics ucs
  9  	where
 10  	     uth.table_name    = 'T1'
 11  	    and uth.column_name   = 'N1'
 12  	    and uth.table_name	  = ut.table_name
 13  	    and ut.table_name	  = ucs.table_name
 14  	    and uth.column_name   = ucs.column_name
 15  	   order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
              1 8                                1
              6 13                               3
             11 17                               1 <----
             16 19                               3
             21 20                               5
             27 23                               2
             33 26                               3
             39 27                               6
             45 28                               6
             51 29                               6
             59 31                               5

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
             70 33                               8
             80 35                               7
             87 38                               5
             91 41                               1
             96 43                               3
             97 44                               1
             98 45                               1
             99 46                               1
            101 59                               1

20 rows selected.

Elapsed: 00:00:00.24

The endpoint value of 18 has been replaced by the value 17 because of Rule 1, that no value can span buckets (with Rule 2 allowing the receiving bucket to be extended). Before the insert the bucketed values looked like this:

8 12 12 13 13 13 | 15 16 16 17 18 18 | 19 19 19 20 20 20 20 20 | …

After the insert the ordered value list looked like this:

8 12 12 13 13 | 13 15 16 16 16 | 17 18 18 19 19 19 …

After Oracle ‘walks’ through the list to consolidate like values the buckets look like this:

8 12 12 13 13 13 | 15 16 16 16 17 | 18 18 19 19 19 …

giving a new endpoint value of 17.
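
To follow that consolidation by eye you can list the ordered value frequencies Oracle walks when it rebuilds the buckets; this aggregate is a sketch, not part of the original script:

--
-- Show each distinct n1 value and how often it occurs, in the
-- order Oracle walks them when building the histogram
--
select n1, count(*) occurrences
from   t1
group  by n1
order  by n1;

If we add yet another 16 to the data set we see: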


SQL>
SQL> --
SQL> -- Add one more 16 and the endpoint shifts to 16
SQL> --
SQL> insert into t1 values (3, 16);

1 row created.

Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
  2  	   dbms_stats.gather_table_stats
  3  	       (user, 't1', method_opt => 'for all columns size 20');
  4  	 END;
  5  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
SQL>
SQL> select
  2  	      uth.endpoint_number
  3  	     ,uth.endpoint_actual_value
  4  	     ,uth.endpoint_repeat_count
  5  	 from
  6  	    user_tab_histograms uth
  7  	    ,user_tables ut
  8  	    ,user_tab_col_statistics ucs
  9  	where
 10  	     uth.table_name    = 'T1'
 11  	    and uth.column_name   = 'N1'
 12  	    and uth.table_name	  = ut.table_name
 13  	    and ut.table_name	  = ucs.table_name
 14  	    and uth.column_name   = ucs.column_name
 15  	   order by uth.endpoint_number;

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
              1 8                                1
              6 13                               3
             11 16                               4 <-----
             17 19                               3
             22 20                               5
             28 23                               2
             34 26                               3
             40 27                               6
             46 28                               6
             52 29                               6
             60 31                               5

ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
             71 33                               8
             81 35                               7
             88 38                               5
             92 41                               1
             97 43                               3
             98 44                               1
             99 45                               1
            100 46                               1
            102 59                               1

20 rows selected.

Elapsed: 00:00:00.22
SQL>

The endpoint has shifted again and is now 16.

Now that we know what a Hybrid histogram is and what it can do let’s look at another way to generate such a histogram in 12.1.0.2, and this goes back (finally!) to that “old saw” mentioned earlier. Let’s create a table with a single-column primary key and see what Oracle does with it when we run DBMS_STATS.

This example, again by Jonathan Lewis, creates a table and populates the id column (the eventual primary key for the table) in a rather interesting way:


SQL> 
SQL> create table t1
  2  as
  3  with generator as (
  4  	     select  --+ materialize
  5  		     rownum id
  6  	     from dual
  7  	     connect by
  8  		     level <= 1e4
  9  )
 10  select
 11  	     rownum  id
 12  from
 13  	     generator	     v1,
 14  	     generator	     v2
 15  where
 16  	     rownum <= 1e4
 17  ;

Table created.

SQL> 
SQL> column maxid1 new_value max1
SQL> column maxid2 new_value max2
SQL> column maxid3 new_value max3
SQL> 
SQL> select min(id), max(id) maxid1
  2  from t1;

   MIN(ID)     MAXID1                                                           
---------- ----------                                                           
         1      10000                                                           

SQL> 
SQL> insert into t1 select id + 1e6 from t1;

10000 rows created.

SQL> 
SQL> select min(id), max(id) maxid2
  2  from t1;

   MIN(ID)     MAXID2                                                           
---------- ----------                                                           
         1    1010000                                                           

SQL> 
SQL> insert into t1 select id + 1e7 from t1;

20000 rows created.

SQL> 
SQL> select min(id), max(id) maxid3
  2  from t1;

   MIN(ID)     MAXID3                                                           
---------- ----------                                                           
         1   11010000                                                           

SQL> 

Notice the values in the column: they run from 1 to 10000, jump to 1000001 and continue to 1010000, then jump to 10000001 through 10010000 and finally to 11000001 through 11010000. The breaks in the value sequence are what make this data set qualify for a Hybrid histogram in 12.1.0.2; in 11.2.0.4 a height-balanced histogram is created.
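
The four contiguous bands can be verified with a quick query; grouping on trunc(id/1e6) is just a convenient (hypothetical) trick that happens to separate the bands for this particular data set:

--
-- Show the four contiguous id ranges; expect bands 0 (1-10000),
-- 1, 10 and 11, each holding 10000 rows
--
select trunc(id/1e6) band, min(id) low_id, max(id) high_id, count(*) cnt
from   t1
group  by trunc(id/1e6)
order  by band;

Let’s add the primary key to the table: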



SQL> 
SQL> alter table t1 add constraint t1_pk primary key(id);

Table altered.

Now let’s query for data that isn’t in the table:


SQL> 
SQL> select
  2  	     /*+ dynamic_sampling(0) */
  3  	     *
  4  from    t1
  5  where
  6  	     id between 12000 and 13000
  7  ;

no rows selected

SQL> 

Doing nothing special we gather statistics on the table; Oracle automatically determines that the PK column needs a histogram and creates it:


SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'T1'
  5  	     );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> 
SQL> column column_name format a35
SQL> set linesize 150
SQL> 
SQL> select
  2  	     column_name, sample_size,
  3  	     num_distinct, num_nulls, density,
  4  	     histogram, num_buckets
  5  from
  6  	     user_tab_cols
  7  where
  8  	     table_name = 'T1'
  9  order by
 10  	     column_name
 11  ;

COLUMN_NAME                         SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS                                        
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------                                        
ID                                         5502        40000          0    .000025 HYBRID                  254                                        

SQL> 

To prove that 11.2.0.4 will create a height-balanced histogram let’s repeat the entire example there, including the inserts, the primary key and the statistics gathering:


SQL> 
SQL> create table t1
  2  as
  3  with generator as (
  4  	     select  --+ materialize
  5  		     rownum id
  6  	     from dual
  7  	     connect by
  8  		     level <= 1e4
  9  )
 10  select
 11  	     rownum  id
 12  from
 13  	     generator	     v1,
 14  	     generator	     v2
 15  where
 16  	     rownum <= 1e4
 17  ;

Table created.

SQL> 
SQL> insert into t1 select id + 1e6 from t1;

10000 rows created.

SQL> 
SQL> insert into t1 select id + 1e7 from t1;

20000 rows created.

SQL> 
SQL> alter table t1 add constraint t1_pk primary key(id);

Table altered.

SQL> 
SQL> begin
  2  	     dbms_stats.gather_table_stats(
  3  		     ownname	      => user,
  4  		     tabname	      =>'T1'
  5  	     );
  6  end;
  7  /

PL/SQL procedure successfully completed.

SQL> 
SQL> column column_name format a35
SQL> set linesize 150
SQL> 
SQL> select
  2  	     column_name, sample_size,
  3  	     num_distinct, num_nulls, density,
  4  	     histogram, num_buckets
  5  from
  6  	     user_tab_cols
  7  where
  8  	     table_name = 'T1'
  9  order by
 10  	     column_name
 11  ;

COLUMN_NAME                         SAMPLE_SIZE NUM_DISTINCT  NUM_NULLS    DENSITY HISTOGRAM       NUM_BUCKETS                                        
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------                                        
ID                                         5509        40000          0 .000024959 HEIGHT BALANCED         254                                        

SQL> 

So it would appear that having a histogram on a single-column primary key isn’t so far-fetched, after all. And we didn’t have to do anything special to get it created.

It all comes down to numbers.

September 6, 2016

A Bloomin’ Mess

Filed under: General — dfitzjarrell @ 09:03

'Why is it,' he said quietly, 'that quite often even the things which are correct just don't seem to be right?'
-- Norton Juster, The Phantom Tollbooth

For some time now Oracle has used bloom filters to speed the processing of select statements; unfortunately in 11.2.0.4 converting a select statement that uses a bloom filter into an ‘insert into … select’ form causes Oracle to abandon the bloom filter altogether. Let’s look at an example provided by Jonathan Lewis that illustrates this; it will be run in 11.2.0.4 and then in 12.1.0.2, and there are differences in how the two versions generate plans for the insert statement.

According to Oracle support this is not unusual, as the following bugs are listed as producing this or similar behavior:

Bug 20112932 : BLOOM FILTER IS NOT CHOOSED WHEN EXECUTING INSERT SELECT STATEMENT
11.2.0.4 - Nov 2014: "not a bug".
This cites three more documents:
  Bug 13801198  BLOOM PRUNING/FILTER NOT USED IN DML STATEMENTS 
    11.2.0.3 - March 2012 - Duplicate, base bug: 18949550
    18949550: MERGE QUERY NOT USING BLOOM FILTER
    11.2.0.3 - June 2014: fixed in 12.2 (no patches yet)

  Bug 14325392  BLOOM FILTER NOT SUPPORTED IN PARTITION WISE JOIN 
    11.2.0.3 - July 2012 - fixed in 12.1 
    (but I think this is a PX filter, not a partition filter)

  Bug 17716301  BLOOM FILTER NOT USED FOR OUTER JOIN

    Aug 2014 - hidden bug note. Patches available for 11.2.0.3
    Note 1919508.1 

It appears that bug 14325392 is the cause, and it is listed as fixed in 12.1. Although not completely addressed (as proven later in this post) the problem is addressed to a degree in 12.1.0.2.

First a partitioned table with four hash partitions and 100,000 rows is created, along with a 4-row non-partitioned table to drive the hash join and an empty table to receive data via an ‘insert into … select’ construct:


SQL> create table pt_hash (
  2  	     id,
  3  	     grp,
  4  	     small_vc,
  5  	     padding
  6  )
  7  nologging
  8  pctfree 90 pctused 10
  9  partition by hash(grp)
 10  (
 11  	     partition p1,
 12  	     partition p2,
 13  	     partition p3,
 14  	     partition p4
 15  )
 16  as
 17  with generator as (
 18  	     select  --+ materialize
 19  		     rownum id
 20  	     from dual
 21  	     connect by
 22  		     level <= 1e4
 23  )
 24  select
 25  	     rownum			     id,
 26  	     trunc(rownum/50)		     grp,
 27  	     to_char(trunc(rownum/20))	     small_vc,
 28  	     rpad('x',100)		     padding
 29  from
 30  	     generator, generator
 31  where
 32  	     rownum <= 1e5
 33  ;

Table created.

SQL> 
SQL> create table t1
  2  as
  3  select
  4  	     rownum  id,
  5  	     rownum  n1,
  6  	     lpad(rownum,10) v1,
  7  	     rpad('x',100) padding
  8  from
  9  	     dual
 10  connect by
 11  	     rownum <= 4
 12  ;

Table created.

SQL> 
SQL> create table target (
  2  	     n1      number,
  3  	     id      number
  4  )
  5  ;

Table created.

SQL> 
SQL> set serveroutput off linesize 168 pagesize 0
SQL> alter session set statistics_level = all;

Session altered.

SQL> 

The stage is set; let’s run the basic select and see what plan 11.2.0.4 generates:


SQL> select
  2  	     /*+
  3  		     leading(t1 h1)
  4  		     use_hash(h1)    no_swap_join_inputs(h1)
  5  	     */
  6  	     t1.n1,
  7  	     h1.id
  8  from
  9  	     t1,
 10  	     pt_hash h1
 11  where
 12  	     t1.id between 2 and 3
 13  and     h1.grp = t1.n1
 14  and     h1.id <= 100
 15  ;
         2        100

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  3rm369uarustm, child number 0
-------------------------------------
select         /*+                 leading(t1 h1)
use_hash(h1)    no_swap_join_inputs(h1)         */         t1.n1,
  h1.id from         t1,         pt_hash h1 where         t1.id between
2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

Plan hash value: 3022732788

------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |      1 |        |  4617 (100)|       |       |      1 |00:00:00.24 |    8377 |   8135 |       |       |          |
|*  1 |  HASH JOIN                  |         |      1 |    164 |  4617   (1)|       |       |      1 |00:00:00.24 |    8377 |   8135 |  2440K|  2440K|  771K (0)|
|   2 |   PART JOIN FILTER CREATE   | :BF0000 |      1 |      2 |     3   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|*  3 |    TABLE ACCESS FULL        | T1      |      1 |      2 |     3   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|   4 |   PARTITION HASH JOIN-FILTER|         |      1 |   4929 |  4614   (1)|:BF0000|:BF0000|     51 |00:00:00.24 |    8374 |   8135 |       |       |          |
|*  5 |    TABLE ACCESS FULL        | PT_HASH |      2 |   4929 |  4614   (1)|:BF0000|:BF0000|     51 |00:00:00.24 |    8374 |   8135 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "H1"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "H1"@"SEL$1")
      USE_HASH(@"SEL$1" "H1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("H1"."GRP"="T1"."N1")
   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

Note
-----
   - dynamic sampling used for this statement (level=2)


48 rows selected.

SQL> 

The hash join produces a bloom filter, reducing partition access to only the two partitions needed to generate the results. Converting this same select statement into an insert changes the plan:


SQL> insert into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

1 row created.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  g693aju7gwm9n, child number 0
-------------------------------------
insert into target(n1, id) select         /*+                 ordered
              use_hash(h1)    no_swap_join_inputs(h1)         */
 t1.id,         h1.id from         t1,         pt_hash h1 where
t1.id between 2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

Plan hash value: 3736574952

---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |         |      1 |        |  4617 (100)|       |       |      0 |00:00:00.22 |   16713 |   8084 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL |         |      1 |        |            |       |       |      0 |00:00:00.22 |   16713 |   8084 |       |       |          |
|*  2 |   HASH JOIN              |         |      1 |    164 |  4617   (1)|       |       |      1 |00:00:00.22 |   16682 |   8084 |  1969K|  1969K|  494K (0)|
|*  3 |    TABLE ACCESS FULL     | T1      |      1 |      2 |     3   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|   4 |    PARTITION HASH ALL    |         |      1 |   4929 |  4614   (1)|     1 |     4 |    100 |00:00:00.22 |   16679 |   8084 |       |       |          |
|*  5 |     TABLE ACCESS FULL    | PT_HASH |      4 |   4929 |  4614   (1)|     1 |     4 |    100 |00:00:00.22 |   16679 |   8084 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"INS$1")
      FULL(@"INS$1" "TARGET"@"INS$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "H1"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "H1"@"SEL$1")
      USE_HASH(@"SEL$1" "H1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("H1"."GRP"="T1"."N1")
   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

Note
-----
   - dynamic sampling used for this statement (level=2)


50 rows selected.

SQL> 

The bloom filter mysteriously vanishes, and all four partitions are accessed to produce the same results as the stand-alone query produced earlier. Proving this isn’t ‘one-off’ behaviour with 11.2.0.4, the transaction is rolled back and the insert is run a second time, this time with the APPEND hint:


SQL> rollback;

Rollback complete.

SQL> 
SQL> insert /*+ append */ into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

1 row created.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  10jb8cf3tx39p, child number 0
-------------------------------------
insert /*+ append */ into target(n1, id) select         /*+
    ordered                 use_hash(h1)    no_swap_join_inputs(h1)
    */         t1.id,         h1.id from         t1,         pt_hash h1
where         t1.id between 2 and 3 and     h1.grp = t1.n1 and
h1.id <= 100

Plan hash value: 3504255781

-----------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Writes |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT     |         |      1 |        |  4617 (100)|       |       |      0 |00:00:00.02 |   16716 |      1 |       |       |          |
|   1 |  LOAD AS SELECT      |         |      1 |        |            |       |       |      0 |00:00:00.02 |   16716 |      1 |   266K|   266K|  266K (0)|
|*  2 |   HASH JOIN          |         |      1 |    164 |  4617   (1)|       |       |      1 |00:00:00.02 |   16682 |      0 |  1969K|  1969K|  501K (0)|
|*  3 |    TABLE ACCESS FULL | T1      |      1 |      2 |     3   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|   4 |    PARTITION HASH ALL|         |      1 |   4929 |  4614   (1)|     1 |     4 |    100 |00:00:00.02 |   16679 |      0 |       |       |          |
|*  5 |     TABLE ACCESS FULL| PT_HASH |      4 |   4929 |  4614   (1)|     1 |     4 |    100 |00:00:00.02 |   16679 |      0 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"INS$1")
      FULL(@"INS$1" "TARGET"@"INS$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "H1"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "H1"@"SEL$1")
      USE_HASH(@"SEL$1" "H1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("H1"."GRP"="T1"."N1")
   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

Note
-----
   - dynamic sampling used for this statement (level=2)


51 rows selected.

SQL> 

Again the bloom filter vanishes; apart from the direct-path LOAD AS SELECT step this is the same plan generated by the previous run of this insert statement.
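
Since the stand-alone select does use the filter, one purely illustrative (and untested here) workaround in 11.2.0.4 is to drive the insert from a PL/SQL loop, trading the single bulk DML for row-by-row processing:

--
-- Hypothetical workaround sketch: the cursor is optimized as an
-- ordinary select (which, as shown above, keeps the bloom filter)
-- and the rows are inserted one at a time; test before relying on it
--
begin
  for r in (select /*+ leading(t1 h1) use_hash(h1) no_swap_join_inputs(h1) */
                   t1.n1, h1.id
            from   t1, pt_hash h1
            where  t1.id between 2 and 3
            and    h1.grp = t1.n1
            and    h1.id  <= 100)
  loop
    insert into target(n1, id) values (r.n1, r.id);
  end loop;
  commit;
end;
/

With larger row volumes a BULK COLLECT/FORALL variant would cut down on context switching, but neither form is a true substitute for getting the bloom filter back into the single DML statement.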

Moving to Oracle 12.1.0.2 some differing results are obtained. We set up the same tables and data as in the 11.2.0.4 run:


SQL> create table pt_hash (
  2  	     id,
  3  	     grp,
  4  	     small_vc,
  5  	     padding
  6  )
  7  nologging
  8  pctfree 90 pctused 10
  9  partition by hash(grp)
 10  (
 11  	     partition p1,
 12  	     partition p2,
 13  	     partition p3,
 14  	     partition p4
 15  )
 16  as
 17  with generator as (
 18  	     select  --+ materialize
 19  		     rownum id
 20  	     from dual
 21  	     connect by
 22  		     level <= 1e4
 23  )
 24  select
 25  	     rownum			     id,
 26  	     trunc(rownum/50)		     grp,
 27  	     to_char(trunc(rownum/20))	     small_vc,
 28  	     rpad('x',100)		     padding
 29  from
 30  	     generator, generator
 31  where
 32  	     rownum <= 1e5
 33  ;

Table created.

SQL> 
SQL> create table t1
  2  as
  3  select
  4  	     rownum  id,
  5  	     rownum  n1,
  6  	     lpad(rownum,10) v1,
  7  	     rpad('x',100) padding
  8  from
  9  	     dual
 10  connect by
 11  	     rownum <= 4
 12  ;

Table created.

SQL> 
SQL> create table target (
  2  	     n1      number,
  3  	     id      number
  4  )
  5  ;

Table created.

SQL> 
SQL> set serveroutput off linesize 168 pagesize 0
SQL> alter session set statistics_level = all;

Session altered.

SQL> alter session set tracefile_identifier='dml_bloom';

Session altered.

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> 
SQL> select
  2  	     /*+
  3  		     leading(t1 h1)
  4  		     use_hash(h1)    no_swap_join_inputs(h1)
  5  	     */
  6  	     t1.n1,
  7  	     h1.id
  8  from
  9  	     t1,
 10  	     pt_hash h1
 11  where
 12  	     t1.id between 2 and 3
 13  and     h1.grp = t1.n1
 14  and     h1.id <= 100
 15  ;
         2        100

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  3rm369uarustm, child number 0
-------------------------------------
select         /*+                 leading(t1 h1)
use_hash(h1)    no_swap_join_inputs(h1)         */         t1.n1,
  h1.id from         t1,         pt_hash h1 where         t1.id between
2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

Plan hash value: 3022732788

------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |      1 |        |  4531 (100)|       |       |      1 |00:00:00.20 |    8385 |   8279 |       |       |          |
|*  1 |  HASH JOIN                  |         |      1 |    155 |  4531   (1)|       |       |      1 |00:00:00.20 |    8385 |   8279 |  2440K|  2440K|  730K (0)|
|   2 |   PART JOIN FILTER CREATE   | :BF0000 |      1 |      2 |     2   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|*  3 |    TABLE ACCESS FULL        | T1      |      1 |      2 |     2   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|   4 |   PARTITION HASH JOIN-FILTER|         |      1 |   5417 |  4529   (1)|:BF0000|:BF0000|     51 |00:00:00.20 |    8382 |   8279 |       |       |          |
|*  5 |    TABLE ACCESS FULL        | PT_HASH |      2 |   5417 |  4529   (1)|:BF0000|:BF0000|     51 |00:00:00.20 |    8382 |   8279 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "H1"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "H1"@"SEL$1")
      USE_HASH(@"SEL$1" "H1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("H1"."GRP"="T1"."N1")
   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


48 rows selected.

SQL> 

As in 11.2.0.4 the select statement produces a bloom filter; a 10053 trace on the session shows this for the select and the two insert statement runs:


Consider using bloom filter between T1[T1] and H1[PT_HASH] with ??
kkoBloomFilter: join (lcdn:2 rcdn:5417 jcdn:155 limit:5417)
kkopqSingleJoinBloomNdv:Compute bloom ndv for lfro:T1[T1] and rfro:H1[PT_HASH] swap:no
kkopqSingleJoinBloomNdv: predCnt:#1 col1:(bndv:70 ndv:70) and col2:(bndv:4 ndv:2) creatorNDV:4.0 userNDV:70.0
kkopqComputeBloomNdv: predCnt:1 creatorNdv:4.0 userNdv:70.0 singleTblPred:yes
kkoBloomFilter: join ndv:2 reduction:0.000369 (limit:0.500000)  accepted
Enumerating distribution method (advanced)

It’s time to convert that select to an insert statement and see what 12.1.0.2 does with it. The first run produces the same plan as in 11.2.0.4:


SQL> insert into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

1 row created.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  g693aju7gwm9n, child number 0
-------------------------------------
insert into target(n1, id) select         /*+                 ordered
              use_hash(h1)    no_swap_join_inputs(h1)         */
 t1.id,         h1.id from         t1,         pt_hash h1 where
t1.id between 2 and 3 and     h1.grp = t1.n1 and     h1.id <= 100

Plan hash value: 3736574952

---------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT         |         |      1 |        |  4531 (100)|       |       |      0 |00:00:00.20 |   16706 |   8218 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL | TARGET  |      1 |        |            |       |       |      0 |00:00:00.20 |   16706 |   8218 |       |       |          |
|*  2 |   HASH JOIN              |         |      1 |    155 |  4531   (1)|       |       |      1 |00:00:00.20 |   16698 |   8218 |  1969K|  1969K|  509K (0)|
|*  3 |    TABLE ACCESS FULL     | T1      |      1 |      2 |     2   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|   4 |    PARTITION HASH ALL    |         |      1 |   5417 |  4529   (1)|     1 |     4 |    100 |00:00:00.20 |   16695 |   8218 |       |       |          |
|*  5 |     TABLE ACCESS FULL    | PT_HASH |      4 |   5417 |  4529   (1)|     1 |     4 |    100 |00:00:00.20 |   16695 |   8218 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"INS$1")
      FULL(@"INS$1" "TARGET"@"INS$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "H1"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "H1"@"SEL$1")
      USE_HASH(@"SEL$1" "H1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("H1"."GRP"="T1"."N1")
   3 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   5 - filter("H1"."ID"<=100)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


50 rows selected.

SQL> 

Remember that the bloom filter section appears in the 10053 trace for every statement executed by this session; the interesting part is that the bloom filter is not used in the first run of the insert. Rolling back the initial insert and running the statement again, this time with the APPEND hint, produces different results:


SQL> rollback;

Rollback complete.

SQL> 
SQL> insert /*+ append */ into target(n1, id)
  2  select
  3  	     /*+
  4  		     ordered
  5  		     use_hash(h1)    no_swap_join_inputs(h1)
  6  	     */
  7  	     t1.id,
  8  	     h1.id
  9  from
 10  	     t1,
 11  	     pt_hash h1
 12  where
 13  	     t1.id between 2 and 3
 14  and     h1.grp = t1.n1
 15  and     h1.id <= 100
 16  ;

1 row created.

SQL> 
SQL> select * from table(dbms_xplan.display_cursor(null,null,'outline allstats last cost partition'));
SQL_ID  10jb8cf3tx39p, child number 0
-------------------------------------
insert /*+ append */ into target(n1, id) select         /*+
    ordered                 use_hash(h1)    no_swap_join_inputs(h1)
    */         t1.id,         h1.id from         t1,         pt_hash h1
where         t1.id between 2 and 3 and     h1.grp = t1.n1 and
h1.id <= 100

Plan hash value: 3662684207

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Starts | E-Rows | Cost (%CPU)| Pstart| Pstop | A-Rows |   A-Time   | Buffers | Writes |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT             |         |      1 |        |  4531 (100)|       |       |      0 |00:00:00.01 |    8387 |      1 |       |       |          |
|   1 |  LOAD AS SELECT              |         |      1 |        |            |       |       |      0 |00:00:00.01 |    8387 |      1 |  1036K|  1036K| 1036K (0)|
|*  2 |   HASH JOIN                  |         |      1 |    155 |  4531   (1)|       |       |      1 |00:00:00.01 |    8384 |      0 |  1969K|  1969K|  507K (0)|
|   3 |    PART JOIN FILTER CREATE   | :BF0000 |      1 |      2 |     2   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|*  4 |     TABLE ACCESS FULL        | T1      |      1 |      2 |     2   (0)|       |       |      2 |00:00:00.01 |       3 |      0 |       |       |          |
|   5 |    PARTITION HASH JOIN-FILTER|         |      1 |   5417 |  4529   (1)|:BF0000|:BF0000|     51 |00:00:00.01 |    8381 |      0 |       |       |          |
|*  6 |     TABLE ACCESS FULL        | PT_HASH |      2 |   5417 |  4529   (1)|:BF0000|:BF0000|     51 |00:00:00.01 |    8381 |      0 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"INS$1")
      FULL(@"INS$1" "TARGET"@"INS$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      FULL(@"SEL$1" "H1"@"SEL$1")
      LEADING(@"SEL$1" "T1"@"SEL$1" "H1"@"SEL$1")
      USE_HASH(@"SEL$1" "H1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("H1"."GRP"="T1"."N1")
   4 - filter(("T1"."ID">=2 AND "T1"."ID"<=3))
   6 - filter("H1"."ID"<=100)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


52 rows selected.

SQL> 

The second insert returns to the original plan and cardinalities, and restores the use of the bloom filter. As mentioned earlier, Bug 14325392 appears to have been addressed, providing at least one instance where the bloom filter returns to use. That the first insert didn’t use the bloom filter may indicate that the optimizer still has issues deciding to use one when the conditions aren’t ‘right’. But the fact remains that Oracle 12.1.0.2 can use bloom filters when a select statement that uses them is converted to an ‘insert into … select …’ construct. We may have to wait for 12.2 to finally be available outside of the cloud to know if the problem has been completely resolved.

Knowledge is our armor as DBAs, and the more we have the better prepared we are to do ‘battle’ with optimizer bugs and occasional errant behavior. Knowing what to expect is half the battle; the more we know, the more likely we are to be able to address issues that may arise.

Just because it’s ‘correct’ doesn’t make it ‘right’.

July 21, 2016

On An Average Day

Filed under: General — dfitzjarrell @ 09:18

"There are other advantages," continued the child. "For instance, if one rat were cornered by nine cats,
on the average, each cat would be ten percent rat and the rat would be ninety percent cat. If you happened
to be a rat, you can see how much nicer it would make things."
-- Norton Juster, The Phantom Tollbooth

Recently an interesting question appeared on “Ask Tom” with regard to the ‘average’ between two dates. That’s not to say it’s the ONLY interesting question on “Ask Tom”, but the topic of the discussion did catch my attention. Why, on earth, would anyone want the ‘average’ of two dates? Digging a bit deeper it became obvious what the original poster wanted to accomplish. Let’s take that trip and see where it leads.

The average person doesn’t usually ask for the average between two dates; averages are computed on salaries, on distance, on temperatures, on volumes, on numbers, in general, not dates. If we look at what the ‘average’ would be between two dates, however, we find it’s the date smack-dab in the middle of the two dates supplied. Connor McDonald provided this solution to the question:


SQL> with t as
  2     ( select
  3         date '2015-09-01' d1,
  4         date '2015-12-07' d2
  5      from dual )
  6  select
  7    least(d1,d2) + abs(d1-d2)/2
  8  from t;

LEAST(D1,
---------
19-OCT-15

SQL>

which provides the correct answer given the date range specified. It’s not really an ‘average’; it’s more like the middle value of the interval, but it can serve a purpose. Let’s say that Millicent Muggwumpp, super-star billing clerk, needs to determine the middle date for a billing cycle so invoices can be generated and sent to the various customers. The solution Connor provided (with some modification) can provide that information. Let’s ‘hack away’ at Connor’s code and see what we end up with:


SQL> with interval_middle as
  2     ( select
  3         to_date('&&1', 'RRRR-MM-DD') d1,
  4         to_date('&&2', 'RRRR-MM-DD') d2
  5      from dual )
  6  select
  7    least(d1,d2) + abs(d1-d2)/2
  8  from interval_middle;

LEAST(D1,
---------
15-JUN-16

SQL>

Looking at the changes it should be noted that replaceable parameters have been substituted for the hard-coded dates and a more robust explicit date conversion has been written, including the expected date string format. Millicent can now use this script, pass it any valid date range and get back the middle date of that range. [SQL*Plus allows us to shut off the parameter substitution messages showing the original code and the current version with the parameters supplied, but we could easily allow them to be displayed:


SQL> with interval_middle as
  2     ( select
  3         to_date('&&1', 'RRRR-MM-DD') d1,
  4         to_date('&&2', 'RRRR-MM-DD') d2
  5      from dual )
  6  select
  7    least(d1,d2) + abs(d1-d2)/2
  8  from interval_middle;
old   3:        to_date('&&1', 'RRRR-MM-DD') d1,
new   3:        to_date('2016-06-01', 'RRRR-MM-DD') d1,
old   4:        to_date('&&2', 'RRRR-MM-DD') d2
new   4:        to_date('2016-06-29', 'RRRR-MM-DD') d2

LEAST(D1,
---------
15-JUN-16

SQL>

if confirmation of the passed-in date values is required.]
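
For reference, the SQL*Plus setting that controls those ‘old’/‘new’ substitution lines is VERIFY:


-- suppress the old/new substitution messages
set verify off

-- display them, as in the run above
set verify on
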

Of course Connor’s answer wasn’t the only one provided as Stew Ashton questioned the need for LEAST() and ABS() in the query:


No matter what date you start with, you either subtract a positive number from the later date or subtract a negative number from the earlier date.


alter session set nls_date_format='YYYY-MM-DD HH24:MI';

with t as (
  select date '2016-07-01' d1, date '2016-07-18' d2
  from dual
)
select d1 - (d1 - d2) / 2 result1,
       d2 - (d2 - d1) / 2 result2
from t;

RESULT1          RESULT2
---------------- ----------------
2016-07-09 12:00 2016-07-09 12:00

Modifying this code in the same manner as Connor’s produces:


SQL> with t as (
  2    select to_date('&&1', 'RRRR-MM-DD') d1, to_date('&&2', 'RRRR-MM-DD') d2
  3    from dual
  4  )
  5  select d1 - (d1 - d2) / 2 result1,
  6         d2 - (d2 - d1) / 2 result2
  7  from t;

RESULT1          RESULT2
---------------- ----------------
2016-06-15 00:00 2016-06-15 00:00

SQL>

Both queries produce the same result, so LEAST() and ABS() aren’t absolutely necessary. Choose the approach you find more comfortable to read and maintain.
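
As an aside, the same offset arithmetic extends to a true average over any number of dates: subtract a fixed anchor date from each value, average the resulting day counts, then add the average back to the anchor. A minimal sketch, assuming a table named date_list with a date column d (both names hypothetical; any anchor date works since it cancels out):


select date '1970-01-01' + avg(d - date '1970-01-01') as avg_date
from date_list;
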

Averaging dates may not be an obvious operation but it can be useful, if in a limited way. We all look at dates in unusual ways on occasion so what may seem illogical to you, at least on the face of it, may be completely and utterly logical to someone else. Who knows, Aunt Edna may ask you to find the date between Uncle Milton’s and Aunt Haggatha’s birthdays because she wants to ship a plaster cat to Guatemala. Stranger things have happened.

And that’s about average.

July 14, 2016

Size Does Matter

Filed under: General — dfitzjarrell @ 11:07

"You see, to tall men I'm a midget, and to short men I'm a giant;
to the skinny ones I'm a fat man, and to the fat ones I'm a thin man."
-- Norton Juster, The Phantom Tollbooth

Someone in one of the Oracle forums asked the following question:


How to estimate datafile size of tablespace when it is almost full? Could someone please explain?

which is ambiguous to say the least, since no information was provided by the person posting as to system configuration, database size, transaction volume, etc. Several responses later the original poster replied:


My question is how much space to allocate to datafile when tablespace of it reaches almost max size?

which, on the face of it, seems to be a more ‘answerable’ question. Information is still missing, however, so it’s difficult to provide any real solution to the person who asked. Since Oracle doesn’t really monitor tablespace growth (it does monitor file size and space within the datafiles, but those are only part of the required information) it’s up to the DBA (which, presumably, is you) to monitor such growth. It isn’t difficult, but it does take some preparation before any results can be returned. Let’s look at one way tablespace monitoring can be achieved.

The following script sets up a table and a sequence that will be populated by another script running as a scheduled job or task on the database server:


create table tablespace_monitor(
	mon_run	number,
	run_dt	date,
	tablespace_name varchar2(35),
	available  number,
	used	number,
	free	number,
	pct_free number(9,2))
tablespace users;

create sequence tab_mon_seq
start with 1 increment by 1 nocycle nomaxvalue nocache;

Created when connected as SYS as SYSDBA, this table will hold the necessary data to monitor tablespaces and their growth. It will be populated by the following script:


insert into tablespace_monitor (tablespace_name, available, used, free, pct_free)
(
select tname_a tablespace_name, 
       ttl_avail available, 
       (ttl_avail - ttl_free) used, 
       ttl_free free, 
       round((ttl_free/ttl_avail)*100,2) "PCT FREE"
from
 (select tablespace_name tname_a, sum(user_bytes) ttl_avail
 from dba_data_files
 group by tablespace_name) avail left join
 (select tablespace_name tname_f, sum(bytes) ttl_free
 from dba_free_space
 group by tablespace_name) free on tname_f = tname_a
union
select tablespace_name, 
       sum(bytes_used+bytes_free) ttl_avail,
       sum(bytes_used),
       sum(bytes_free),
       round((sum(bytes_free)/sum(bytes_used+bytes_free))*100,2) pct_free
from v$temp_space_header
group by tablespace_name);

commit;

column next_seq noprint
select tab_mon_seq.nextval next_seq from dual;

update tablespace_monitor
set mon_run = tab_mon_seq.currval,
    run_dt = sysdate
where mon_run is null;

commit;
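
How the population script gets scheduled is platform-dependent (cron on unix, Task Scheduler on Windows); a database-side option is DBMS_SCHEDULER. A sketch, assuming the insert/update logic above has been wrapped into a stored procedure named POPULATE_TS_MONITOR (a hypothetical name used here for illustration):


begin
        dbms_scheduler.create_job(
                job_name        => 'TS_MONITOR_JOB',
                job_type        => 'STORED_PROCEDURE',
                job_action      => 'POPULATE_TS_MONITOR',
                start_date      => systimestamp,
                repeat_interval => 'FREQ=DAILY;BYHOUR=6;BYMINUTE=0',
                enabled         => TRUE,
                comments        => 'Daily tablespace growth snapshot');
end;
/
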

The provided scripts have been tested on 12.1.0.2 and work as expected. The second script is scheduled to run once daily to provide daily tablespace size data, which will be used by the next script to generate a report:


set linesize 200 numwidth 13 pagesize 40
column "CONSUMED/FREED (+)" format a20
column "TABLESPACE NAME" format a15

select m1.tablespace_name "TABLESPACE NAME", m2.free "BYTES FREE", m2.pct_free "PCT FREE",
      case when trunc((m1.free - m2.free)/(1024*1024)) > 0 then lpad(to_char(round((m1.free - m2.free)/(1024*1024),2)),20)
            when trunc((m1.free - m2.free)/(1024)) between 1 and 999 then lpad(to_char(round((m1.free - m2.free)/(1024),2)),20)
            when trunc((m1.free - m2.free)/(1024*1024)) < 0 then lpad(to_char(round(abs(m1.free - m2.free)/(1024*1024),2))||' (+)',20)
            when trunc((m1.free - m2.free)/(1024)) between -999 and -1 then lpad(to_char(round(abs(m1.free - m2.free)/(1024),2))||' (+)',20)
            else lpad(to_char(m1.free - m2.free),20) end "CONSUMED/FREED (+)",
       case when trunc((m1.free - m2.free)/(1024*1024)) <> 0 then 'MB'
            when abs(trunc((m1.free - m2.free)/(1024))) between 1 and 999 then 'KB'
            else null end unit,
       case when trunc((((m1.free - m2.free)/(1024*1024))/((m2.run_dt - m1.run_dt)*1440))) > 0 then round((((m1.free - m2.free)/(1024*1024))/((m2.run_dt - m1.run_dt)*1440)),2)
            when trunc((((m1.free - m2.free)/(1024))/((m2.run_dt - m1.run_dt)*1440))) between 1 and 999 then round((((m1.free - m2.free)/(1024))/((m2.run_dt - m1.run_dt)*1440)),2)
            when trunc((((m1.free - m2.free)/(1024*1024))/((m2.run_dt - m1.run_dt)*1440))) < 0 then round(abs(((m1.free - m2.free)/(1024*1024))/((m2.run_dt - m1.run_dt)*1440)),2)
            when trunc((((m1.free - m2.free)/(1024))/((m2.run_dt - m1.run_dt)*1440))) between -999 and -1 then round(abs(((m1.free - m2.free)/(1024))/((m2.run_dt - m1.run_dt)*1440)),2)
            else round((m1.free - m2.free)/((m2.run_dt - m1.run_dt)*1440),2) end "AVG PER MIN",
       case when trunc((((m1.free - m2.free)/(1024*1024))/((m2.run_dt - m1.run_dt)*1440))) <> 0 then 'MB'
            when abs(trunc((((m1.free - m2.free)/(1024))/((m2.run_dt - m1.run_dt)*1440)))) between 1 and 999 then 'KB'
            else null end "UNIT PER MIN",
       round((m2.run_dt - m1.run_dt)*1440,2) "WINDOW IN MIN",
       to_char(m1.run_dt, 'RRRR-MM-DD HH24:MI:SS') "BEGIN DATE/TIME",
       to_char(m2.run_dt, 'RRRR-MM-DD HH24:MI:SS') "LAST RUN DATE/TIME"
from tablespace_monitor m1, tablespace_monitor m2
where m2.mon_run = (select max(mon_run) from tablespace_monitor)
and m1.mon_run = (select min(mon_run) from tablespace_monitor)
and m2.tablespace_name = m1.tablespace_name

spool &1
/
spool off

set linesize 80

The script compares the most recent snapshot with the earliest snapshot still stored in the tablespace_monitor table and generates the difference; purge the older rows so that only the two latest runs remain and, with the population script running once per day, the report shows the growth over the last 24 hours. Of course the population script can be run more than once per day; if it is, it’s recommended that the report also be run more than once per day, to report on all of the generated values stored in the tablespace_monitor table.
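
A housekeeping sketch for that purge, keeping only the two most recent runs (adjust the retention to suit):


delete from tablespace_monitor
where mon_run < (select max(mon_run) - 1 from tablespace_monitor);

commit;
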

The report looks like this:


TABLESPACE NAME    BYTES FREE      PCT FREE CONSUMED/FREED (+)   UN   AVG PER MIN UN WINDOW IN MIN BEGIN DATE/TIME     LAST RUN DATE/TIME
--------------- ------------- ------------- -------------------- -- ------------- -- ------------- ------------------- -------------------
INDX                103809024           100                    0                0             9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
SYSAUX               62193664          7.61                 2.19 MB        227.03 KB          9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
SYSTEM               62849024          7.06                    0                0             9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
TEMP                 42991616         68.33                    0                0             9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
UNDOTBS1           1986789376         97.47                   23 MB          2.33 MB          9.87 2016-07-14 10:00:20 2016-07-14 10:10:12
USERS             33880604672         98.69               88 (+) MB          8.92 MB          9.87 2016-07-14 10:00:20 2016-07-14 10:10:12

Over a period of time growth trends can be determined for each tablespace, and space additions can be planned accordingly. Since it’s not likely that the growth patterns will be linear it’s a good idea to make note of reports of exceptionally large space usage and the period of time in which they occur. This should provide a window to plan storage needs and give the System Admins and Storage Admins time to prepare additional storage for use.

Storage cannot be managed without usage and growth data yet it seems that some DBAs try to do just that; a S.W.A.G CAN get you there on occasion but more often than not it’s simply glorified guesswork without a crystal ball or Ouija board handy. Regardless of how the end-users see them databases are part of a complex managed data entry and retrieval system that should not be left to chance or ‘sophisticated’ guesswork. Planning and monitoring are necessities in the IT environment; the database should not be left ‘out in the cold’ to suffer on its own.

Whether the provided scripts are used to monitor tablespace growth or you decide to write your own, implementing tablespace monitoring should be at or near the top of every DBA’s ‘to-do’ list. Keeping ‘tabs’ on how much space is being used in which tablespace can only help the DBA by ensuring that the users don’t run out of space because usage wasn’t tracked. Emergency space additions take time and interrupt the business flow; scheduling such additions can prevent the end-users from complaining and will keep management happy because lack of space didn’t stop the flow of transactions. Which means everybody wins.
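
When the trend data shows that more space is needed the addition itself is a one-liner; a sketch, with a hypothetical file path and sizes that would be adjusted for the environment at hand:


alter tablespace users
  add datafile '/u02/oradata/orcl/users02.dbf' size 4g
  autoextend on next 512m maxsize 8g;
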

It does seem that size is important.

July 6, 2016

He’s Making A List

Filed under: General — dfitzjarrell @ 10:16

"But I suppose there's a lot to see everywhere, if only you keep your eyes open."
-- Norton Juster, The Phantom Tollbooth

Recently a tweet by Chris Saxon (@chrisrsaxon) with respect to SQL, the language and implementation, started a list of contributions which should cause people to think a bit differently about how the language works and how queries should be structured. The list, found here, is by no stretch of the imagination complete but it is a very good place to begin re-thinking how to write SQL. Let’s look at a few of the listed items a bit further (since the list is likely to grow, not all of the list members can be covered).

Franck Pachot started the list with the following items:


				1. SQL is a declarative language
				2. It is compiled to procedural operations
				3. It operates on row sets

So what, exactly, is a “declarative language”? According to the accepted definition it is:


	... non-procedural and very high-level (4th generation). This means the programmer specifies what
        needs to be done rather than how to do it.

Unlike languages such as C/C++, Pascal, FORTRAN, BASIC, Java, Perl and Python, the SQL language simply requires that you decide what you want done and, in the case of Oracle, the optimizer decides HOW to get it done. That doesn’t mean that you can’t have some input to influence the ‘how’, it simply means you don’t need to think about it. Certainly hints are available to ‘nudge’ the optimizer in the direction more experienced coders may think is ‘best’, but there is no requirement to use them.
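
As a small illustration of that ‘nudge’, a hint only suggests the ‘how’; a sketch, assuming the familiar SCOTT EMP table with its primary key on EMPNO, that asks for a full scan where the index would normally win:


select /*+ full(e) */
       empno, ename
from   emp e
where  empno = 7839;


Remove the hint and the optimizer should go back to the primary key index; either way the result set is the same, which is the point of a declarative language.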

His second item is a reference to PL/SQL, the procedural component of Oracle’s SQL, the Ada-based language extension where procedures, functions, triggers and anonymous procedural blocks are written. Yes, you can go happily along with many tasks and not need PL/SQL but it does come in handy when repeated tasks are necessary or when intermediate feedback (well, sort of) is desired. [Pipelined functions do provide immediate feedback during execution, but they are a special case.] Many PL/SQL examples have been posted here, so look around and they can be found.

His third item was addressed by Jonathan Lewis in an Oracle Forums response; let’s look at the problem that initiated the set-based response and Jonathan Lewis’ offering.

A query and part of the execution plan were posted with the poster asking how to get rid of the table scan. The table has a primary key index so, on the face of it, it is interesting that the primary key wasn’t used.

Looking at the definition for the ‘problem’ table (LF_HOTEL_TEMP) we find only two columns, both of which are in the primary key definition. The tables were created:


SQL> create table LF_HOTEL_TEMP
  2  (
  3    HOTEL_CODE VARCHAR2(4000),
  4    SERVICE_ID NUMBER(3)
  5  )
  6  /

Table created.

SQL>
SQL> create table LF_TS_ROOMTYPE_PROPERTIES
  2  (
  3    HOTEL_CODE    VARCHAR2(20),
  4    ROOM_TYPE     VARCHAR2(500),
  5    BOARD_TYPE    VARCHAR2(500),
  6    ROOM_AMT      FLOAT,
  7    SERVICE_ID    NUMBER,
  8    CURRENCY_CODE VARCHAR2(10)
  9  )
 10  /

Table created.

SQL>
SQL> create table LF_HB_ROOMTYPE_PROPERTIES
  2  (
  3    HOTEL_CODE    VARCHAR2(20),
  4    ROOM_TYPE     VARCHAR2(500),
  5    BOARD_TYPE    VARCHAR2(500),
  6    ROOM_AMT      FLOAT,
  7    SERVICE_ID    NUMBER,
  8    CURRENCY_CODE VARCHAR2(10)
  9  )
 10  /

Table created.

SQL>
SQL> create table LF_GTA_ROOMTYPE_PROPERTIES
  2  (
  3    HOTEL_CODE    VARCHAR2(20),
  4    ROOM_TYPE     VARCHAR2(500),
  5    BOARD_TYPE    VARCHAR2(500),
  6    ROOM_AMT      FLOAT,
  7    SERVICE_ID    NUMBER,
  8    CURRENCY_CODE VARCHAR2(10)
  9  )
 10  /

Table created.

SQL>

and then the primary key and additional indexes were created:


SQL> alter table lf_hotel_temp
  2    add constraint lf_hotel_temp_PK primary key (HOTEL_CODE,service_id)
  3  /

Table altered.

SQL>
SQL> create  index LF_hb_roomtype_prop_IDX on lf_hb_roomtype_properties (HOTEL_CODE)
  2  /

Index created.

SQL>
SQL> create  index LF_ts_roomtype_prop_IDX on lf_ts_roomtype_properties (HOTEL_CODE)
  2  /

Index created.

SQL>
SQL> create index LF_gta_roomtype_prop_IDX on lf_gta_roomtype_properties (HOTEL_CODE)
  2  /

Index created.

SQL>

The original poster reported 278,000 rows in the LF_HOTEL_TEMP table so data was generated to reproduce that number of rows. Next the remaining tables were populated, each with a different subset of the hotel codes so that no single table contained them all:


SQL> begin
  2  	     for i in 1..278000 loop
  3  		     insert into lf_hotel_temp
  4  		     values(i, mod(i,999)+1);
  5  	     end loop;
  6
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL>
SQL> insert into lf_ts_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id < 511;

142058 rows created.

SQL> insert into lf_ts_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id between 313 and 642;

91740 rows created.

SQL> insert into lf_gta_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id between 271 and 823;

153743 rows created.

SQL> insert into lf_hb_roomtype_properties
  2    (hotel_code)
  3  select hotel_code from lf_hotel_temp
  4  where service_id between 571 and 999;

119262 rows created.

SQL> COMMIT;

Commit complete.

SQL>

Let’s look at the original query and its execution plan:


SQL> set autotrace on linesize 140
SQL>
SQL> -- original query
SQL> SELECT a.hotel_code
  2    FROM lf_hotel_temp a
  3  WHERE a.service_id = :p_service_id
  4  	    AND (NOT EXISTS (SELECT *
  5  	       FROM lf_ts_roomtype_properties b
  6  	      WHERE a.hotel_code = b.hotel_code)
  7  	     or NOT EXISTS (SELECT *
  8  	       FROM lf_gta_roomtype_properties b
  9  	      WHERE a.hotel_code = b.hotel_code)
 10  	    or	NOT EXISTS (SELECT *
 11  	       FROM lf_hb_roomtype_properties b
 12  	      WHERE a.hotel_code = b.hotel_code));

HOTEL_CODE
-------------------------------------------------------------------
1998
999
5994
...
243756
235764
238761

278 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4111332730

-----------------------------------------------------------------------------------------------
| Id  | Operation          | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                          |     3 |    33 |   512   (4)| 00:00:01 |
|*  1 |  FILTER            |                          |       |       |            |          |
|*  2 |   TABLE ACCESS FULL| LF_HOTEL_TEMP            |   278 |  3058 |    94  (16)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX  |     1 |     7 |     3   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX |     1 |     7 |     1   (0)| 00:00:01 |
|*  5 |   INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX  |     1 |     7 |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE
              "B"."HOTEL_CODE"=:B1) OR  NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"
              WHERE "B"."HOTEL_CODE"=:B2) OR  NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"
              "B" WHERE "B"."HOTEL_CODE"=:B3))
   2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   3 - access("B"."HOTEL_CODE"=:B1)
   4 - access("B"."HOTEL_CODE"=:B1)
   5 - access("B"."HOTEL_CODE"=:B1)



Statistics
----------------------------------------------------------
         11  recursive calls
          0  db block gets
       2077  consistent gets
        872  physical reads
          0  redo size
       4119  bytes sent via SQL*Net to client
        570  bytes received via SQL*Net from client
         20  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        278  rows processed


SQL>

All but LF_HOTEL_TEMP use the indexes to speed access to the data; this may be due to the fact that LF_HOTEL_TEMP is a two-column table and the primary key index will be larger than the table since it contains the table data and a rowid for each row. The main idea illustrated by this problem involves being able to find a logically equivalent way of writing the SQL. Jonathan Lewis decided that set operations might produce a ‘better’ plan and re-wrote the query as shown below, producing a plan accessing LF_HOTEL_TEMP by the primary key index:


SQL> -- JL solution
SQL> var p_service_id number
SQL> exec :p_service_id := 1

PL/SQL procedure successfully completed.

SQL>
SQL> select  /*+ dynamic_sampling(0) */
  2  	     hotel_code
  3  from    lf_hotel_temp
  4  where   service_id = :p_service_id
  5  minus   (
  6  	     select  hotel_code
  7  	     from    lf_ts_roomtype_properties
  8  	     where   hotel_code is not null
  9  	     intersect
 10  	     select  hotel_code
 11  	     from    lf_gta_roomtype_properties
 12  	     where   hotel_code is not null
 13  	     intersect
 14  	     select  hotel_code
 15  	     from    lf_hb_roomtype_properties
 16  	     where   hotel_code is not null
 17  	     )
 18  ;

HOTEL_CODE
---------------------------------------------------------------
100899
101898
102897
...
999
9990
99900


278 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 775735246

--------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                          |     1 |  2159 |     4 (100)| 00:00:01 |
|   1 |  MINUS                |                          |       |       |            |          |
|   2 |   SORT UNIQUE NOSORT  |                          |     1 |  2015 |     1 (100)| 00:00:01 |
|*  3 |    INDEX FULL SCAN    | LF_HOTEL_TEMP_PK         |     1 |  2015 |     0   (0)| 00:00:01 |
|   4 |   INTERSECTION        |                          |       |       |            |          |
|   5 |    INTERSECTION       |                          |       |       |            |          |
|   6 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     1 (100)| 00:00:01 |
|*  7 |      INDEX FULL SCAN  | LF_TS_ROOMTYPE_PROP_IDX  |     4 |    48 |     0   (0)| 00:00:01 |
|   8 |     SORT UNIQUE NOSORT|                          |     4 |    48 |     1 (100)| 00:00:01 |
|*  9 |      INDEX FULL SCAN  | LF_GTA_ROOMTYPE_PROP_IDX |     4 |    48 |     0   (0)| 00:00:01 |
|  10 |    SORT UNIQUE NOSORT |                          |     4 |    48 |     1 (100)| 00:00:01 |
|* 11 |     INDEX FULL SCAN   | LF_HB_ROOMTYPE_PROP_IDX  |     4 |    48 |     0   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
       filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
   7 - filter("HOTEL_CODE" IS NOT NULL)
   9 - filter("HOTEL_CODE" IS NOT NULL)
  11 - filter("HOTEL_CODE" IS NOT NULL)



Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       5996  consistent gets
       2416  physical reads
     234680  redo size
       4119  bytes sent via SQL*Net to client
        570  bytes received via SQL*Net from client
         20  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        278  rows processed

SQL>

Notice that using set-based processing improved the execution plan. Because of the first item on this list it’s easy to forget that SQL is a set-based implementation; sometimes it takes ‘interesting’ problems to lead us down the set-based pathway.

As the tweet thread progressed Jeff Smith (@thatjeffsmith) offered additional items that, while not specific to SQL, are good to remember nonetheless:


				4. getting results back doesn't mean your code is right.
				5. learning never stops

Number 4 is an important lesson to remember; just because you get data doesn’t necessarily mean that data is right. Many times queries are written and run only to discover that the join conditions aren’t correct or are missing entirely, making the results useless. Always think through the process, and, if possible, manually generate a few rows to know what SHOULD be returned. Comparing those to the query results should show if the logic in the query is correct.
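
A classic example of getting results that are ‘wrong’ without any error being raised is the missing join condition; using the familiar SCOTT demo tables:


-- missing the join predicate: a Cartesian product, one row
-- for every (dept, emp) combination
select d.dname, e.ename
from   dept d, emp e;

-- corrected: only matching rows are returned
select d.dname, e.ename
from   dept d, emp e
where  e.deptno = d.deptno;
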

That learning never stops is one reason blogs are written and read; there’s always something new, different, or changed that makes ‘resting on one’s laurels’ nearly impossible, especially in IT. A day doesn’t go by that I don’t learn something, and, to be honest, if that day ever comes I’ll consider it wasted.

There are more items in that list that I won’t mention here but just because they aren’t commented on doesn’t make them any less important. Read through the list (the link provided should bring up the latest chain of tweets) and ponder those responses. It’s time to think about SQL anew, especially if some of these comments weren’t in your idea of what SQL is and what it does. Perspectives change, and ideas along with them, so never stop learning. The more you know, the more you can grow.

Santa isn’t the only one who makes lists, you know.

June 27, 2016

Examining The Remains

Filed under: General — dfitzjarrell @ 08:14

"Don't you know anything at all about numbers?"
"Well, I don't think they're very important," snapped Milo, too embarrassed to admit the truth.
"NOT IMPORTANT!" roared the Dodecahedron, turning red with fury.  "Could you have tea for two 
without the two, or three blind mice without the three? Would there be four corners of the earth 
if there weren't a four? And how would you sail the seven seas without a seven?"
"All I meant was..." began Milo, but the Dodecahedron, overcome with emotion and shouting furiously,
carried right on.
"If you had high hopes, how would you know how high they were? And did you know that narrow escapes 
come in all different widths? Would you travel the whole wide world without ever knowing how wide it
was? And how could you do anything at long last," he concluded, waving his arms over his head, "without
knowing how long the last was? Why, numbers are the most beautiful and valuable things in the world.
Just follow me and I'll show you." He turned on his heel and stalked off into the cave.
-- Norton Juster, The Phantom Tollbooth

It used to be that a database was, well, a database and it didn’t contain lots of interesting mathematical and analytic functions, just tables, data and basic string and number functions. That’s changed, mainly because of what work we now need to do and what results we need to report; enterprise level databases are both more complicated in terms of extended functionality and easier in terms of end-users being able to generate, with a single function, values that once required complicated calculations outside of the database. It’s a testament to Oracle that, for the most part, these functions provide correct results. Unfortunately it appears that Oracle has ‘dropped the ball’ with what may be one of the simpler supplied functions, REMAINDER().

To start this odyssey a PL/SQL post was provided to illustrate the difference between MOD() and REMAINDER() and, unfortunately, the results from REMAINDER looked rather odd. I’ve provided the example from that post, modified to provide some context in the output. The ‘problem’ is that some of the remainders are negative which flies in the face of the mathematical definition of a remainder, which is:


In mathematics, the remainder is the amount "left over" after performing some computation.  In arithmetic, the remainder
is the integer "left over" after dividing one integer by another to produce an integer quotient (integer division).

The thought in my mind and that of every mathematician on this planet is that a remainder will never be negative. It’s what is ‘left over’ and you can’t ‘owe someone’ a remainder; besides being impossible according to the definition it’s also just not polite. For example, 15 divided by 2 has a remainder of 1, not -1. Let’s run the modified example and see what it returns:


SQL> 
SQL> --
SQL> -- REMAINDER doesn't return the correct remainder for
SQL> -- divisors of odd numbers that are powers of 2
SQL> --
SQL> -- The following example returns correct remainders
SQL> -- for all listed divisors of 15 except the
SQL> -- powers of 2, where the formula generates 16 as the
SQL> -- (n*X) value, thus displaying a negative remainder:
SQL> --
SQL> BEGIN
  2  	DBMS_OUTPUT.put_line ('MOD(15,2):'||MOD (15, 2));
  3  	DBMS_OUTPUT.put_line ('REMAINDER(15,2):'||REMAINDER (15, 2));
  4  	DBMS_OUTPUT.put_line ('MOD(15,3):'||MOD (15, 3));
  5  	DBMS_OUTPUT.put_line ('REMAINDER(15,3):'||REMAINDER (15, 3));
  6  	DBMS_OUTPUT.put_line ('MOD(15,4):'||MOD (15, 4));
  7  	DBMS_OUTPUT.put_line ('REMAINDER(15,4):'||REMAINDER (15, 4));
  8  	DBMS_OUTPUT.put_line ('MOD(15,5):'||MOD (15, 5));
  9  	DBMS_OUTPUT.put_line ('REMAINDER(15,5):'||REMAINDER (15, 5));
 10  	DBMS_OUTPUT.put_line ('MOD(15,6):'||MOD (15, 6));
 11  	DBMS_OUTPUT.put_line ('REMAINDER(15,6):'||REMAINDER (15, 6));
 12  	DBMS_OUTPUT.put_line ('MOD(15,7):'||MOD (15, 7));
 13  	DBMS_OUTPUT.put_line ('REMAINDER(15,7):'||REMAINDER (15, 7));
 14  	DBMS_OUTPUT.put_line ('MOD(15,8):'||MOD (15, 8));
 15  	DBMS_OUTPUT.put_line ('REMAINDER(15,8):'||REMAINDER (15, 8));
 16  END;
 17  /
MOD(15,2):1                                                                     
REMAINDER(15,2):-1                                                              
MOD(15,3):0                                                                     
REMAINDER(15,3):0                                                               
MOD(15,4):3                                                                     
REMAINDER(15,4):-1                                                              
MOD(15,5):0                                                                     
REMAINDER(15,5):0                                                               
MOD(15,6):3                                                                     
REMAINDER(15,6):3                                                               
MOD(15,7):1                                                                     
REMAINDER(15,7):1                                                               
MOD(15,8):7                                                                     
REMAINDER(15,8):-1                                                              

PL/SQL procedure successfully completed.

Why, oh why, does this happen? Oracle takes a ‘shortcut’ to get there, using a formula that, as far as I can tell, isn’t checking all that it needs to check before ‘spitting out’ the answer. That formula is shown below:


					R = m - (n*X)

where R is the remainder, m is the dividend, n is the divisor and X is an integer where n*X should be <= m

Notice what check is missing? For some odd reason Oracle never checks to see if the product (n*X) is less than or equal to m. Since Oracle calculates X by performing ROUND(m/n,0), fractional parts of a quotient of .5 or greater round up to the next highest integer. For the case of 15 divided by powers of 2 the calculated value of X results in (n*X) being equal to 16. Once that happens the remainders all end up as -1 which, by definition, is impossible.
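
Working the formula by hand for the power-of-2 divisors shows exactly where those -1 values come from:


	m = 15, n = 2:  X = ROUND(15/2) = ROUND(7.5)   = 8    (n*X) = 16    R = 15 - 16 = -1
	m = 15, n = 4:  X = ROUND(15/4) = ROUND(3.75)  = 4    (n*X) = 16    R = 15 - 16 = -1
	m = 15, n = 8:  X = ROUND(15/8) = ROUND(1.875) = 2    (n*X) = 16    R = 15 - 16 = -1
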

Fiddle-dee-dee, what are we to do? One option is to use MOD() instead, since the results for MOD() don’t exhibit the same problem. Another option, if production code can’t be changed, is to create another remainder function, possibly in the application schema, with a private synonym calling it ‘REMAINDER’. One example of such a function is shown below:


SQL> 
SQL> --
SQL> -- Create a function to return the correct remainder - does not
SQL> -- generate values greater than the supplied target so the remainders
SQL> -- are not negative
SQL> --
SQL> create or replace function remainder_func(p_num1 in number, p_num2 in number)
  2  return number is
  3  	     v_rmdr  number;
  4  	     v_x     number:=1;
  5  begin
  6  	     v_x := trunc(p_num1/p_num2, 0);
  7  	     v_rmdr := p_num1 - (v_x * p_num2);
  8  	     return(v_rmdr);
  9  end;
 10  /

Function created.

SQL> 

So, let’s see what results are returned, since this function uses TRUNC() instead of ROUND():


SQL> BEGIN
  2  	DBMS_OUTPUT.put_line ('MOD(15,2):'||MOD (15, 2));
  3  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,2):'||REMAINDER_func (15, 2));
  4  	DBMS_OUTPUT.put_line ('MOD(15,3):'||MOD (15, 3));
  5  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,3):'||REMAINDER_func (15, 3));
  6  	DBMS_OUTPUT.put_line ('MOD(15,4):'||MOD (15, 4));
  7  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,4):'||REMAINDER_func (15, 4));
  8  	DBMS_OUTPUT.put_line ('MOD(15,5):'||MOD (15, 5));
  9  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,5):'||REMAINDER_func (15, 5));
 10  	DBMS_OUTPUT.put_line ('MOD(15,6):'||MOD (15, 6));
 11  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,6):'||REMAINDER_func (15, 6));
 12  	DBMS_OUTPUT.put_line ('MOD(15,7):'||MOD (15, 7));
 13  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,7):'||REMAINDER_func (15, 7));
 14  	DBMS_OUTPUT.put_line ('MOD(15,8):'||MOD (15, 8));
 15  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,8):'||REMAINDER_func (15, 8));
 16  END;
 17  /
MOD(15,2):1                                                                     
REMAINDER_func(15,2):1                                                          
MOD(15,3):0                                                                     
REMAINDER_func(15,3):0                                                          
MOD(15,4):3                                                                     
REMAINDER_func(15,4):3                                                          
MOD(15,5):0                                                                     
REMAINDER_func(15,5):0                                                          
MOD(15,6):3                                                                     
REMAINDER_func(15,6):3                                                          
MOD(15,7):1                                                                     
REMAINDER_func(15,7):1                                                          
MOD(15,8):7                                                                     
REMAINDER_func(15,8):7                                                          

PL/SQL procedure successfully completed.

Wow, golly gee whilikers, the results are correct! Of course that should be expected since there is no possibility that (n*X) will be greater than m. Let’s take a side trip and see how ROUND() and TRUNC() provide different results for odd numbers divided by even numbers, using 15 as a ‘test subject’:


SQL> 
SQL> --
SQL> -- ROUND() doesn't work for this example as it
SQL> -- generates values that are powers of 2
SQL> --
SQL> 
SQL> create or replace procedure remainder_test(p_num1 in number, p_num2 in number)
  2  is
  3  	     v_t_rmdr	     number;
  4  	     v_r_rmdr	     number;
  5  	     v_tx    number:=1;
  6  	     v_rx    number:=1;
  7  begin
  8  	     dbms_output.put_line('---------------------------------------------------------------------');
  9  	     v_tx := trunc(p_num1/p_num2, 0);
 10  	     v_rx := round(p_num1/p_num2, 0);
 11  	     v_t_rmdr := p_num1 - (v_tx * p_num2);
 12  	     v_r_rmdr := p_num1 - (v_rx * p_num2);
 13  	     dbms_output.put_line('Rounded:   '||v_rx||' (n*X): '||v_rx*p_num2||' Remainder: '||v_r_rmdr);
 14  	     dbms_output.put_line('Truncated: '||v_tx||' (n*X): '||v_tx*p_num2||' Remainder: '||v_t_rmdr);
 15  	     dbms_output.put_line('---------------------------------------------------------------------');
 16  end;
 17  /

Procedure created.

SQL> 
SQL> BEGIN
  2  	REMAINDER_test (15, 2);
  3  	REMAINDER_test (15, 3);
  4  	REMAINDER_test (15, 4);
  5  	REMAINDER_test (15, 5);
  6  	REMAINDER_test (15, 6);
  7  	REMAINDER_test (15, 7);
  8  	REMAINDER_test (15, 8);
  9  END;
 10  /
---------------------------------------------------------------------           
Rounded:   8 (n*X): 16 Remainder: -1                                            
Truncated: 7 (n*X): 14 Remainder: 1                                             
---------------------------------------------------------------------           
---------------------------------------------------------------------           
Rounded:   5 (n*X): 15 Remainder: 0                                             
Truncated: 5 (n*X): 15 Remainder: 0                                             
---------------------------------------------------------------------           
---------------------------------------------------------------------           
Rounded:   4 (n*X): 16 Remainder: -1                                            
Truncated: 3 (n*X): 12 Remainder: 3                                             
---------------------------------------------------------------------           
---------------------------------------------------------------------           
Rounded:   3 (n*X): 15 Remainder: 0                                             
Truncated: 3 (n*X): 15 Remainder: 0                                             
---------------------------------------------------------------------           
---------------------------------------------------------------------           
Rounded:   3 (n*X): 18 Remainder: -3                                            
Truncated: 2 (n*X): 12 Remainder: 3                                             
---------------------------------------------------------------------           
---------------------------------------------------------------------           
Rounded:   2 (n*X): 14 Remainder: 1                                             
Truncated: 2 (n*X): 14 Remainder: 1                                             
---------------------------------------------------------------------           
---------------------------------------------------------------------           
Rounded:   2 (n*X): 16 Remainder: -1                                            
Truncated: 1 (n*X): 8 Remainder: 7                                              
---------------------------------------------------------------------           

PL/SQL procedure successfully completed.

SQL> 

There’s quite a difference between ROUND() and TRUNC(); to be fair MOD() uses FLOOR() in the calculations to generate MOD values and, as mentioned before, REMAINDER() uses ROUND(), probably in an effort to make the two functions internally different. Using a bit more code the ROUND() option could also work:


SQL> --
SQL> -- Create a function to return the correct remainder - does not
SQL> -- generate values greater than the supplied target so the remainders
SQL> -- are not negative
SQL> --
SQL> -- Uses ROUND() to compute values
SQL> --
SQL> create or replace function remainder_func(p_num1 in number, p_num2 in number)
  2  return number is
  3  	     v_rmdr  number;
  4  	     v_x     number:=1;
  5  begin
  6  	     v_x := round(p_num1/p_num2, 0);
  7  	     if v_x*p_num2 > p_num1 then
  8  		     v_x := v_x -1;
  9  	     end if;
 10  	     v_rmdr := p_num1 - (v_x * p_num2);
 11  	     return(v_rmdr);
 12  end;
 13  /

Function created.

SQL> 
SQL> BEGIN
  2  	DBMS_OUTPUT.put_line ('MOD(15,2):'||MOD (15, 2));
  3  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,2):'||REMAINDER_func (15, 2));
  4  	DBMS_OUTPUT.put_line ('MOD(15,3):'||MOD (15, 3));
  5  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,3):'||REMAINDER_func (15, 3));
  6  	DBMS_OUTPUT.put_line ('MOD(15,4):'||MOD (15, 4));
  7  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,4):'||REMAINDER_func (15, 4));
  8  	DBMS_OUTPUT.put_line ('MOD(15,5):'||MOD (15, 5));
  9  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,5):'||REMAINDER_func (15, 5));
 10  	DBMS_OUTPUT.put_line ('MOD(15,6):'||MOD (15, 6));
 11  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,6):'||REMAINDER_func (15, 6));
 12  	DBMS_OUTPUT.put_line ('MOD(15,7):'||MOD (15, 7));
 13  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,7):'||REMAINDER_func (15, 7));
 14  	DBMS_OUTPUT.put_line ('MOD(15,8):'||MOD (15, 8));
 15  	DBMS_OUTPUT.put_line ('REMAINDER_func(15,8):'||REMAINDER_func (15, 8));
 16  END;
 17  /
MOD(15,2):1                                                                     
REMAINDER_func(15,2):1                                                          
MOD(15,3):0                                                                     
REMAINDER_func(15,3):0                                                          
MOD(15,4):3                                                                     
REMAINDER_func(15,4):3                                                          
MOD(15,5):0                                                                     
REMAINDER_func(15,5):0                                                          
MOD(15,6):3                                                                     
REMAINDER_func(15,6):3                                                          
MOD(15,7):1                                                                     
REMAINDER_func(15,7):1                                                          
MOD(15,8):7                                                                     
REMAINDER_func(15,8):7                                                          

PL/SQL procedure successfully completed.

SQL> 

but I think the first function is slightly more efficient. Additionally the second function essentially ‘re-invents’ FLOOR(), so why not simply use FLOOR() to get the job done:


SQL>
SQL> --
SQL> -- Create a function to return the correct remainder - does not
SQL> -- generate values greater than the supplied target so the remainders
SQL> -- are not negative
SQL> --
SQL> -- Uses FLOOR() to compute values
SQL> --
SQL> create or replace function remainder_func(p_num1 in number, p_num2 in number)
  2  return number is
  3          v_rmdr  number;
  4          v_x     number:=1;
  5  begin
  6          v_x := floor(p_num1/p_num2);
  7          v_rmdr := p_num1 - (v_x * p_num2);
  8          return(v_rmdr);
  9  end;
 10  /

Function created.

SQL>
SQL> BEGIN
  2     DBMS_OUTPUT.put_line ('MOD(15,2):'||MOD (15, 2));
  3     DBMS_OUTPUT.put_line ('REMAINDER_func(15,2):'||REMAINDER_func (15, 2));
  4     DBMS_OUTPUT.put_line ('MOD(15,3):'||MOD (15, 3));
  5     DBMS_OUTPUT.put_line ('REMAINDER_func(15,3):'||REMAINDER_func (15, 3));
  6     DBMS_OUTPUT.put_line ('MOD(15,4):'||MOD (15, 4));
  7     DBMS_OUTPUT.put_line ('REMAINDER_func(15,4):'||REMAINDER_func (15, 4));
  8     DBMS_OUTPUT.put_line ('MOD(15,5):'||MOD (15, 5));
  9     DBMS_OUTPUT.put_line ('REMAINDER_func(15,5):'||REMAINDER_func (15, 5));
 10     DBMS_OUTPUT.put_line ('MOD(15,6):'||MOD (15, 6));
 11     DBMS_OUTPUT.put_line ('REMAINDER_func(15,6):'||REMAINDER_func (15, 6));
 12     DBMS_OUTPUT.put_line ('MOD(15,7):'||MOD (15, 7));
 13     DBMS_OUTPUT.put_line ('REMAINDER_func(15,7):'||REMAINDER_func (15, 7));
 14     DBMS_OUTPUT.put_line ('MOD(15,8):'||MOD (15, 8));
 15     DBMS_OUTPUT.put_line ('REMAINDER_func(15,8):'||REMAINDER_func (15, 8));
 16  END;
 17  /
MOD(15,2):1
REMAINDER_func(15,2):1
MOD(15,3):0
REMAINDER_func(15,3):0
MOD(15,4):3
REMAINDER_func(15,4):3
MOD(15,5):0
REMAINDER_func(15,5):0
MOD(15,6):3
REMAINDER_func(15,6):3
MOD(15,7):1
REMAINDER_func(15,7):1
MOD(15,8):7
REMAINDER_func(15,8):7

PL/SQL procedure successfully completed.

SQL>

FLOOR() returns the largest integer that is not greater than the supplied argument (in this case 15 divided by some number), effectively always rounding down. It’s interesting that Oracle used a different formula to compute REMAINDER() values, one that can (and does) produce negative results.
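
Again by hand, FLOOR() keeps (n*X) at or below m for the divisors that misbehaved with ROUND():


	m = 15, n = 2:  X = FLOOR(15/2) = FLOOR(7.5)   = 7    (n*X) = 14    R = 15 - 14 = 1
	m = 15, n = 4:  X = FLOOR(15/4) = FLOOR(3.75)  = 3    (n*X) = 12    R = 15 - 12 = 3
	m = 15, n = 8:  X = FLOOR(15/8) = FLOOR(1.875) = 1    (n*X) = 8     R = 15 - 8  = 7
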

A discussion ensued on Twitter that ROUND() contained a bug, yet nothing was further from the truth. ROUND() behaves exactly as it should; the issue is that ROUND() can generate a number that, when multiplied by the supplied divisor, is larger than the supplied dividend, producing remainders that are negative. So the ‘bug’, if you will, is in how Oracle implemented REMAINDER() [R=m-(n*X)], and that ‘bug’ can be fixed by coding the function to produce (n*X) values that are always less than or equal to m.

No matter how thoroughly testing is done it seems that one of two problems will rear its ugly head. One is that as hard as you try you can’t generate every bit of data the application will see ‘in the wild’. The other is you can overlook simple examples in favor of the more complex. This is, of course, why we file bug reports with software vendors, because we caught something they didn’t. It is, I suppose, the ‘circle of life’ for software.

Now the remainder is up to you.

June 14, 2016

“It Doesn’t Work”

Filed under: General — dfitzjarrell @ 17:09


“Since you got here by not thinking, it seems reasonable to expect that, in order to get out, you must start thinking.”
— Norton Juster, The Phantom Tollbooth

The Devil is in the details; in support forums and newsgroups, however, the Devil is just as present when the details are either sketchy or non-existent. Volunteers in such forums and newsgroups quite often hear the dreaded words “it doesn’t work”, followed by … nothing. No detail, no explanation, no follow-up, nothing. If you walked in to your doctor and said, with no context, “it hurts” your doctor would be at a loss to make any reasonable attempt at a diagnosis. WHAT hurts? Where? When did it start? All questions your doctor would immediately ask in hopes of finding an answer. Support volunteers have no crystal balls, no Ouija boards, no Tarot cards and are not mind readers, so leaving out important information makes it impossible for them to assess your situation.

Encountering a problem for the first time can be frustrating, that’s understood, so it stands to reason that you may not have much of a clue on how to ask your question. How should you frame questions in such forums? I blogged about that here so I won’t repeat myself. Suffice it to say that the more detail and explanation you can provide to support volunteers the better the responses will be. Also try your best to find an answer before you go to the forums and newsgroups. The more YOU can do to find an answer outside of the forums the more likely you’ll find help in the forums when you really need it.

Members of such forums see all sorts of questions, many of them repeatedly, so supplying as much information as you can when you first ask your question is important. If you ‘miss the mark’, so to speak, when supplying relevant details these volunteers will ask you questions to narrow down the possibilities. Doing so doesn’t mean they don’t know what they are talking about, it means you didn’t make your situation clear enough to generate a usable answer. Another response you might encounter for ambiguous questions is “That depends…”, and that isn’t a red flag of ignorance, it’s the volunteers trying to get you to be more specific with your details. Remember, the more they know about your problem the better able they will be to find you an answer.

Patience is a virtue; forum members don’t sit at their computers every second of every day just waiting for you to ask questions, so you shouldn’t expect immediate responses. Sometimes it may take a day to get an initial response; that shouldn’t worry you. Some forum members tend to answer questions in certain specific areas, and if your question falls into one of those areas those members may be offline when you find the time to ask. Not getting an immediate response in no way indicates you and your question are being ignored, and a delay in responses doesn’t mean you need to post your question again and again. Rest assured you will have the attention of forum members and someone will provide a response, even if it’s to ask for more information. Quality takes time.

It bears repeating that the brilliant souls in these forums and newsgroups are volunteers, not paid support personnel, and they are in these forums because they actually, honestly want to help by sharing knowledge they have gained through years of experience. They are giving their time to help you and others like you and it would be wise to remember that. Having a sense of entitlement has no place in these forums; behaving as though you are owed an answer is probably the surest way to keep those volunteers from helping you. Treat them as you would want to be treated and you may find that these volunteers will go the extra mile to help you.

It’s definitely worth thinking about.

June 6, 2016

“It’s … MUTATING!!!!”

Filed under: General — dfitzjarrell @ 08:59

“The most important reason for going from one place to another is to see what's in between.” 
― Norton Juster, The Phantom Tollbooth   

The mutating table error (ORA-04091) thrown by Oracle can be a familiar sight throughout a DBA’s career, along with the usual question of ‘Why?’. The error itself is not difficult to understand, nor is it hazardous to the table data. Let’s look at why the error is thrown, why it’s not going to create data problems, and how it can be addressed.

The first thing to realize is that the table isn’t actually mutating; the error is thrown because of the read consistency mechanism Oracle employs to guard against ‘dirty reads’, which are reads of uncommitted changes. For every session except the one performing the data modifications, Oracle uses undo to reconstruct the data image as of the starting time of the query, which prevents uncommitted changes from being read. The session performing the modifications can see its own uncommitted changes, but only after each statement completes: insert 10 rows into a table, then query the count, and you’ll see 10 more rows than when you started, while every other session sees only the committed results. The mutating table issue surfaces when a row-level trigger attempts to read or modify the same table being changed by the triggering insert, update or delete statement. Mid-statement the data is in an incompletely modified state: the changes ARE being made, but the statement hasn’t yet completed all of the actions it started, such as firing triggers. Since the insert/update/delete can’t complete until the trigger successfully executes, and executing the trigger would pile changes on top of changes in the middle of the statement, Oracle doesn’t allow it and throws the error.
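A minimal sketch reproduces the error; assuming the standard EMP demo table, a row-level trigger that reads the very table it’s defined on fails as soon as an update fires it:


create or replace trigger emp_mutating_demo
after update of sal on emp
for each row
declare
	v_avg_sal number;
begin
	-- reading the table this trigger is defined on, mid-statement
	select avg(sal) into v_avg_sal from emp;
end;
/

update emp set sal = sal*1.10 where empno = 7369;
-- fails with ORA-04091: table EMP is mutating, trigger/function may not see it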

In a promoted video on YouTube the following statement is made:


“… that you are breaking this rule and stop your table data and schema objects from corruption.”

Trying to modify data that is in the middle of another modification operation won’t corrupt it, since Oracle won’t allow such updates to occur. Depending on the type of trigger involved, Oracle will either try to modify existing data before the requested modifications are processed, or it will process the insert/update and then try to modify the resulting data before the statement ends. In either case, since an active modification is in progress, Oracle won’t allow a second modification from the same session on the incompletely altered data. Were this safety mechanism not in place it would be possible to ‘mangle’ table data into an irreversible state and thus ‘corrupt’ it. Schema object changes, however, are effected through data definition language (DDL) statements, which are both preceded and followed by an implicit commit; DML, such as insert, update and delete statements, cannot corrupt schema object structure.

Getting around such an error can be done in two ways: using a compound trigger, if you’re on Oracle 11g or later, or using the autonomous_transaction pragma. The following example shows how to code a trigger using the pragma to avoid the ORA-04091 error:


create or replace procedure set_comm(p_empno in number, p_sal in number, p_oldsal in number)
is
pragma autonomous_transaction;
begin
	--
	-- award a 10% commission for raises of 10% or more
	--
	if p_sal >= p_oldsal*1.10 then
		update emp
		set comm = p_sal*.10
		where empno = p_empno;
	end if;
	--
	-- an autonomous transaction must commit (or roll back) before
	-- returning, or ORA-06519 is raised
	--
	commit;
end;
/

create or replace trigger after_emp_updt
after update of sal on emp
for each row
begin
		set_comm(:new.empno, :new.sal, :old.sal);
end;
/

The trigger is simple, and uses a procedure declared to execute in an autonomous transaction. An autonomous transaction runs as an independent transaction within the same session, subject to the standard read consistency mechanism Oracle employs, so it cannot see the calling transaction’s uncommitted changes. Because its work is independent of the transaction that fired the trigger, the ORA-04091 error is avoided entirely; do note, though, that an autonomous transaction will block if it tries to modify rows the calling transaction has locked, so the technique is safest when the autonomous code touches different rows or, better still, a different table.
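That caveat is why audit-style logging is the classic use for the pragma; a minimal sketch, assuming a hypothetical SAL_AUDIT table, has the autonomous transaction write to a different table entirely:


create table sal_audit(
empno     number,
old_sal   number,
new_sal   number,
chg_date  date);

create or replace procedure log_sal_change(p_empno in number, p_old in number, p_new in number)
is
pragma autonomous_transaction;
begin
	--
	-- the audit row persists even if the triggering transaction rolls back
	--
	insert into sal_audit
	values (p_empno, p_old, p_new, sysdate);
	commit;
end;
/

Called from a row-level trigger on EMP, this records every salary change without ever touching the mutating table.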

The compound trigger operates a bit differently: it reads the table in its BEFORE STATEMENT section, before any rows have been modified, and caches the results for the row-level sections to use, so there is no in-flight modification for the read to collide with. Let’s look at an example from a previous article to see how this is done:


SQL> create or replace trigger check_raise_on_avg
  2  for update of sal on emp
  3  COMPOUND TRIGGER
  4    Twelve_Percent        constant number:=0.12;
  5
  6    -- Declare collection type and variable:
  7
  8    TYPE Department_Salaries_t  IS TABLE OF Emp.Sal%TYPE
  9                                  INDEX BY VARCHAR2(80);
 10    Department_Avg_Salaries     Department_Salaries_t;
 11    TYPE Sal_t             IS TABLE OF Emp.Sal%TYPE;
 12    Avg_Salaries                Sal_t;
 13    TYPE Deptno_t       IS TABLE OF Emp.Deptno%TYPE;
 14    Department_IDs              Deptno_t;
 15
 16    BEFORE STATEMENT IS
 17    BEGIN
 18      SELECT               AVG(e.Sal), NVL(e.Deptno, -1)
 19        BULK COLLECT INTO  Avg_Salaries, Department_IDs
 20        FROM               Emp e
 21        GROUP BY           e.Deptno;
 22      FOR j IN 1..Department_IDs.COUNT() LOOP
 23        Department_Avg_Salaries(Department_IDs(j)) := Avg_Salaries(j);
 24      END LOOP;
 25    END BEFORE STATEMENT;
 26
 27    AFTER EACH ROW IS
 28    BEGIN
 29      IF :NEW.Sal - :Old.Sal >
 30        Twelve_Percent*Department_Avg_Salaries(:NEW.Deptno)
 31      THEN
 32        Raise_Application_Error(-20000, 'Raise too large');
 33      END IF;
 34    END AFTER EACH ROW;
 35  END Check_Raise_On_Avg;
 36  /

Trigger created.

SQL> select empno, sal from emp;

     EMPNO        SAL
---------- ----------
      7369        800
      7499       1600
      7521       1250
      7566       2975
      7654       1250
      7698       2850
      7782       2450
      7788       3000
      7839       5000
      7844       1500
      7876       1100
      7900        950
      7902       3000
      7934       1300

14 rows selected.

SQL>
SQL> update emp set sal=sal*1.10 where empno = 7369;

1 row updated.

SQL>
SQL> select empno, sal from emp;

     EMPNO        SAL
---------- ----------
      7369        880
      7499       1600
      7521       1250
      7566       2975
      7654       1250
      7698       2850
      7782       2450
      7788       3000
      7839       5000
      7844       1500
      7876       1100
      7900        950
      7902       3000
      7934       1300

14 rows selected.

SQL>

The update succeeds even though the trigger also selects data from the table being modified, made possible by the way Oracle executes the sections of a compound trigger.

The mutating table error is inconvenient, but it really isn’t a danger to your data or your schema objects. It’s a read consistency issue that can be overcome by using autonomous transactions or compound triggers. Knowing that should make the DBA’s life a little bit easier.

Sometimes what’s in between can be interesting.

March 21, 2016

Repeat After Me

Filed under: General — dfitzjarrell @ 08:44

“every time you decide something without having a good reason, you jump to Conclusions whether you like it or not.” 
-- Norton Juster, The Phantom Tollbooth

In an Oracle forum recently the following question was posted:


I have a request for 3 indices as shown below. Does the 1st index suffice for 2 and 3?  Do I need all 3?
 
    CREATE INDEX IDX_ATM_EM_EFF_VER_CURRENT_DEL ON ATM_xxx_SALARY
    (EMPLOYEE_KEY, EFFECTIVE_DT, SALARY_VERSION_NUMBER, IS_CURRENT, IS_DELETED);

    CREATE INDEX IDX_ATM_EM_VER_CURRENT ON ATM_xxx_SALARY
    (EMPLOYEE_KEY, SALARY_VERSION_NUMBER, IS_CURRENT);

    CREATE INDEX IDX_ATM_SAL_CURRENT_DEL ON ATM_xxx_SALARY
    (EMPLOYEE_KEY, IS_DELETED, IS_CURRENT);

Answering such a question in the absence of any additional information isn’t easy, but looking at the columns it’s not unrealistic to make the following presumptions about the data:

        EMPLOYEE_KEY must be not null and unique
        IS_CURRENT is either a YES or NO value
        IS_DELETED is either a YES or NO value    
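
Before relying on presumptions like these it’s worth checking them against the actual data (something we obviously can’t do for the poster’s real table); a quick sketch of the sort of verification queries one might run:


select count(*) total_rows, count(distinct employee_key) distinct_keys
from atm_xxx_salary;

select is_current, is_deleted, count(*)
from atm_xxx_salary
group by is_current, is_deleted;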

The following example, using queries tailored to the indexes to be created, was run in Oracle 11.2.0.4; notice that the first index is used in all three queries:


SQL> --
SQL> -- Create the table
SQL> --
SQL> create table atm_xxx_salary(
  2  employee_key            varchar2(20) not null,
  3  emp_fname               varchar2(20),
  4  emp_lname               varchar2(40),
  5  effective_dt            date,
  6  salary_version_number      number,
  7  is_current              varchar2(3),
  8  is_deleted              varchar2(3));
      
Table created.

SQL>
SQL> --
SQL> -- Load data
SQL> --
SQL> begin
  2          for i in 1..100 loop
  3                  insert into atm_xxx_salary
  4                  values('Employee_'||i, 'Blobbo', 'Fingnestle', sysdate + mod(i, 9), mod(i,11)+1, 'YES','NO');
  5  End loop;
  6          commit;
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL>
SQL> --
SQL> -- Create first index
SQL> --
SQL> CREATE INDEX IDX_ATM_EM_EFF_VER_CURRENT_DEL ON ATM_xxx_SALARY
  2  (EMPLOYEE_KEY, EFFECTIVE_DT, SALARY_VERSION_NUMBER, IS_CURRENT, IS_DELETED);
         
Index created.
         
SQL>
SQL> exec dbms_stats.gather_schema_stats(user);
      
PL/SQL procedure successfully completed.
    
SQL>
SQL> set autotrace on linesize 150
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and effective_dt between sysdate and sysdate+4
  5  and salary_version_number = 7
  6  and is_current = 'YES';

no rows selected

   
       
Execution Plan
----------------------------------------------------------
Plan hash value: 2729507590

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                                |     6 |   288 |     2   (0)| 00:00:01 |
|*  1 |  FILTER                      |                                |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| ATM_XXX_SALARY                 |     6 |   288 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IDX_ATM_EM_EFF_VER_CURRENT_DEL |     6 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
       
Predicate Information (identified by operation id):
---------------------------------------------------
     
   1 - filter(SYSDATE@!+4>=SYSDATE@!)
   3 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "EFFECTIVE_DT">=SYSDATE@! AND
              "SALARY_VERSION_NUMBER"=7 AND "IS_CURRENT"='YES' AND "EFFECTIVE_DT"<=SYSDATE@!+4)
       filter("SALARY_VERSION_NUMBER"=7 AND "EFFECTIVE_DT"=SYSDATE@! AND "IS_CURRENT"='YES')

SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and is_deleted = 'NO'
  5  and is_current = 'YES';
        
EMPLOYEE_KEY         EMP_FNAME            EMP_LNAME                                EFFECTIVE SALARY_VERSION_NUMBER IS_ IS_
-------------------- -------------------- ---------------------------------------- --------- --------------------- --- ---
Employee_10          Blobbo               Fingnestle                               22-MAR-16                    11 YES NO
Employee_100         Blobbo               Fingnestle                               22-MAR-16                     2 YES NO


Execution Plan
----------------------------------------------------------
Plan hash value: 1328304448
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                                |    99 |  4752 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID |  ATM_XXX_SALARY                |    99 |  4752 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | IDX_ATM_EM_EFF_VER_CURRENT_DEL |    99 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
        

Predicate Information (identified by operation id):
---------------------------------------------------
         
   2 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_CURRENT"='YES' AND "IS_DELETED"='NO')
       filter("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')
        
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and is_deleted = 'NO'
  5  and is_current = 'YES';
       
EMPLOYEE_KEY         EMP_FNAME            EMP_LNAME                                EFFECTIVE SALARY_VERSION_NUMBER IS_ IS_
-------------------- -------------------- ---------------------------------------- --------- --------------------- --- ---
Employee_10          Blobbo               Fingnestle                               22-MAR-16                    11 YES NO
Employee_100         Blobbo               Fingnestle                               22-MAR-16                     2 YES NO


Execution Plan
----------------------------------------------------------
Plan hash value: 1328304448
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                                |    99 |  4752 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | ATM_XXX_SALARY                 |    99 |  4752 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | IDX_ATM_EM_EFF_VER_CURRENT_DEL |    99 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
         
Predicate Information (identified by operation id):
---------------------------------------------------
        
   2 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_CURRENT"='YES' AND "IS_DELETED"='NO')
       filter("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')

        
         
SQL>
SQL> set autotrace off

The above appears to answer the question and prove the remaining two indexes are not necessary, but it can’t hurt to test that theory by creating the remaining indexes and running the same set of queries after each index creation. Creating the second index apparently does no good, as the optimizer still prefers the first:

       
SQL> --
SQL> -- Create second index
SQL> --
SQL> CREATE INDEX IDX_ATM_EM_VER_CURRENT ON ATM_xxx_SALARY
  2  (EMPLOYEE_KEY, SALARY_VERSION_NUMBER, IS_CURRENT);
         
Index created.
         
SQL>
SQL>
SQL> set autotrace on
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and effective_dt between sysdate and sysdate+4
  5  and salary_version_number = 7
  6  and is_current = 'YES';
         
no rows selected
        
         
Execution Plan
----------------------------------------------------------
Plan hash value: 2729507590
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                                |     6 |   288 |     2   (0)| 00:00:01 |
|*  1 |  FILTER                       |                                |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | ATM_XXX_SALARY                 |     6 |   288 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | IDX_ATM_EM_EFF_VER_CURRENT_DEL |     6 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
         
Predicate Information (identified by operation id):
---------------------------------------------------
         
   1 - filter(SYSDATE@!+4>=SYSDATE@!)
   3 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "EFFECTIVE_DT">=SYSDATE@! AND
              "SALARY_VERSION_NUMBER"=7 AND "IS_CURRENT"='YES' AND "EFFECTIVE_DT"<=SYSDATE@!+4)
       filter("SALARY_VERSION_NUMBER"=7 AND "EFFECTIVE_DT"=SYSDATE@! AND "IS_CURRENT"='YES')
         
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and is_deleted = 'NO'
  5  and is_current = 'YES';
      
EMPLOYEE_KEY         EMP_FNAME            EMP_LNAME                                EFFECTIVE SALARY_VERSION_NUMBER IS_ IS_
-------------------- -------------------- ---------------------------------------- --------- --------------------- --- ---
Employee_10          Blobbo               Fingnestle                               22-MAR-16                    11 YES NO
Employee_100         Blobbo               Fingnestle                               22-MAR-16                     2 YES NO
         
         
Execution Plan
----------------------------------------------------------
Plan hash value: 1328304448
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                                |    99 |  4752 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | ATM_XXX_SALARY                 |    99 |  4752 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | IDX_ATM_EM_EFF_VER_CURRENT_DEL |    99 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
         
Predicate Information (identified by operation id):
---------------------------------------------------
         
   2 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_CURRENT"='YES' AND "IS_DELETED"='NO')
       filter("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')
         
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and is_deleted = 'NO'
  5  and is_current = 'YES';
      
EMPLOYEE_KEY         EMP_FNAME            EMP_LNAME                                EFFECTIVE SALARY_VERSION_NUMBER IS_ IS_
-------------------- -------------------- ---------------------------------------- --------- --------------------- --- ---
Employee_10          Blobbo               Fingnestle                               22-MAR-16                    11 YES NO
Employee_100         Blobbo               Fingnestle                               22-MAR-16                     2 YES NO
         
         
Execution Plan
----------------------------------------------------------
Plan hash value: 1328304448
         
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                                |    99 |  4752 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | ATM_XXX_SALARY                 |    99 |  4752 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | IDX_ATM_EM_EFF_VER_CURRENT_DEL |    99 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
         
Predicate Information (identified by operation id):
---------------------------------------------------
         
   2 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_CURRENT"='YES' AND "IS_DELETED"='NO')
       filter("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')
         
SQL>
SQL> set autotrace off

Let’s create the third index and see what Oracle does:

       
SQL> --
SQL> -- Create third index
SQL> --
SQL> CREATE INDEX IDX_ATM_SAL_CURRENT_DEL ON ATM_xxx_SALARY
  2  (EMPLOYEE_KEY, IS_DELETED, IS_CURRENT);
         
Index created.
         
SQL>
SQL> set autotrace on
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and effective_dt between sysdate and sysdate+4
  5  and salary_version_number = 7
  6  and is_current = 'YES';
         
no rows selected
         
         
Execution Plan
----------------------------------------------------------
Plan hash value: 2729507590
         
---------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                           | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                                |     6 |   288 |     2   (0)| 00:00:01 |
|*  1 |  FILTER                       |                                |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID | ATM_XXX_SALARY                 |     6 |   288 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | IDX_ATM_EM_EFF_VER_CURRENT_DEL |     6 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------------
         
Predicate Information (identified by operation id):
---------------------------------------------------
         
   1 - filter(SYSDATE@!+4>=SYSDATE@!)
   3 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "EFFECTIVE_DT">=SYSDATE@! AND
              "SALARY_VERSION_NUMBER"=7 AND "IS_CURRENT"='YES' AND "EFFECTIVE_DT"<=SYSDATE@!+4)
       filter("SALARY_VERSION_NUMBER"=7 AND "EFFECTIVE_DT"=SYSDATE@! AND "IS_CURRENT"='YES')
         
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and is_deleted = 'NO'
  5  and is_current = 'YES';
         
EMPLOYEE_KEY         EMP_FNAME            EMP_LNAME                                EFFECTIVE SALARY_VERSION_NUMBER IS_ IS_
-------------------- -------------------- ---------------------------------------- --------- --------------------- --- ---
Employee_10          Blobbo               Fingnestle                               22-MAR-16                    11 YES NO
Employee_100         Blobbo               Fingnestle                               22-MAR-16                     2 YES NO
         
         
Execution Plan
---------------------------------------------------------
Plan hash value: 1562453262
         
-------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |    99 |  4752 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | ATM_XXX_SALARY          |    99 |  4752 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | IDX_ATM_SAL_CURRENT_DEL |    99 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
         
Predicate Information (identified by operation id):
---------------------------------------------------
         
    2 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')
        filter("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')
         
SQL>
SQL> select *
  2  from atm_xxx_salary
  3  where employee_key like 'Emp%10%'
  4  and is_deleted = 'NO'
  5  and is_current = 'YES';
         
EMPLOYEE_KEY         EMP_FNAME            EMP_LNAME                                EFFECTIVE SALARY_VERSION_NUMBER IS_ IS_
-------------------- -------------------- ---------------------------------------- --------- --------------------- --- ---
Employee_10          Blobbo               Fingnestle                               22-MAR-16                    11 YES NO
Employee_100         Blobbo               Fingnestle                               22-MAR-16                     2 YES NO
         
         
Execution Plan
----------------------------------------------------------
Plan hash value: 1562453262
         
-------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |    99 |  4752 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | ATM_XXX_SALARY          |    99 |  4752 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN           | IDX_ATM_SAL_CURRENT_DEL |    99 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
         
Predicate Information (identified by operation id):
---------------------------------------------------
         
   2 - access("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')
       filter("EMPLOYEE_KEY" LIKE 'Emp%10%' AND "IS_DELETED"='NO' AND "IS_CURRENT"='YES')

SQL>
SQL>
SQL> set autotrace off

Notice that the first and third indexes are now used, with the third index chosen for the second and third queries. Remember, though, that the first index alone served all three queries, so given the presumptions made about the nature of the data the second and third indexes appear unnecessary. Of course this is not rigorous testing, and other queries and/or data distributions could produce different results; you really need to test, test and test again before making a final decision. Still, the indications are good that a single index would suffice to provide “speedy” data retrieval.
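
One way to gather that evidence on a real system is Oracle’s index usage monitoring; a sketch, assuming the 11g-style V$OBJECT_USAGE view:


alter index IDX_ATM_EM_VER_CURRENT monitoring usage;
alter index IDX_ATM_SAL_CURRENT_DEL monitoring usage;

--
-- after a representative workload has run; note that USED only
-- records that the optimizer picked the index at least once,
-- not how often
--
select index_name, used, monitoring
from v$object_usage;

alter index IDX_ATM_EM_VER_CURRENT nomonitoring usage;
alter index IDX_ATM_SAL_CURRENT_DEL nomonitoring usage;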

Please don’t make me repeat myself.

February 17, 2016

“That Ain’t Right”

Filed under: General — dfitzjarrell @ 09:46

"Is everyone who lives in Ignorance like you?" asked Milo.
"Much worse," he said longingly. "But I don't live here. I'm from a place very far away called Context." 
-- Norton Juster, The Phantom Tollbooth 

Dates are, well, dates. They are not strings, they are not numbers; they are dates, and to handle them properly the correct data type must be used. I can’t count how many times I’ve seen dates stored as VARCHAR2 strings or as numbers, neither of which treats them properly or orders them correctly. Recently in a forum I frequent this question was asked:


Version Oracle 11 g R2


Hello every body
 

First of All i know the Format is wrong
but as it is a old project i need to extract the report as it is

there is a table with two fields and data as follow

create table Onduty_leave
(
Start_time varchar2(100),
end_time  varchar2(100)
);

insert into onduty_leave (start_time,end_time) values('09.00 am','06.00 pm');

Minutes i am handling differently

Please just tell me how to calculate hours between 09 am to 06 pm 

Yes, this person realizes that the data type is incorrect for the data being stored, but the fact remains that someone thought this was a good idea and implemented it in a production system. Of course it’s possible to provide the results the person posting this question is after, but it would be so much easier had the correct data type been used. Be that as it may, let’s proceed with the current table definition and show what can be done to calculate the desired quantity. Let’s build and populate the table with some simple code:


SQL> 
SQL> --
SQL> -- Create the table with the improper data type
SQL> --
SQL> create table Onduty_leave
  2  (
  3  Start_time varchar2(100),
  4  end_time  varchar2(100)
  5  );

Table created.

SQL> 
SQL> --
SQL> -- Populate the table
SQL> --
SQL> begin
  2  	     for i in 1..10 loop
  3  		     insert into onduty_leave
  4  		     values(sysdate-i, sysdate+i);
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> 
SQL> --
SQL> -- Set the display format to see the date and time
SQL> --
SQL> alter session set nls_date_format = 'DD-MON-RRRR HH24:MI:SS';

Session altered.

SQL> 
SQL> --
SQL> -- Format the strings for display
SQL> --
SQL> column start_time format a25
SQL> column end_time format a25
SQL> 

We are now ready to compute the hours between the two provided dates in each record. Since these are VARCHAR2 strings, it will be necessary to use TO_DATE (and, in a later example, TO_CHAR) to get proper results.


SQL> --
SQL> -- 24 hours per day, find days between and multiply
SQL> -- This uses the entire date string to provide correct
SQL> -- results
SQL> --
SQL> 
SQL> select (to_date(end_time) - to_date(start_time))*24 hours_between
  2  from onduty_leave;

HOURS_BETWEEN                                                                   
-------------                                                                   
           48                                                                   
           96                                                                   
          144                                                                   
          192                                                                   
          240                                                                   
          288                                                                   
          336                                                                   
          384                                                                   
          432                                                                   
          480                                                                   

10 rows selected.

SQL> 

It isn’t rocket science but it is a bit more work than we should be doing to get these answers. The data in the table was generated simply; let’s generate some new data that may more accurately reflect real-world entries into this table:


SQL> --
SQL> -- Purge all current data
SQL> --
SQL> truncate table onduty_leave;

Table truncated.

SQL> 
SQL> --
SQL> -- Populate table with more realistic records
SQL> --
SQL> begin
  2  	     for i in 1..10 loop
  3  		     insert into onduty_leave
  4  		     values(sysdate, sysdate+((25*i)/24));
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> 

Using just the time from each entry let’s compute the new hours difference between them:


SQL> --
SQL> -- Hours between entries
SQL> -- Use only the time portion of the date
SQL> --
SQL> -- This produces wrong results since the date
SQL> -- isn't included in the calculation
SQL> --
SQL> select start_time, end_time,
  2  	    (to_date(to_char(to_date(end_time), 'HH24:MI:SS'), 'HH24:MI:SS') - to_date(to_char(to_date(start_time), 'HH24:MI:SS'), 'HH24:MI:SS'))*24 hours_between
  3  from onduty_leave;

START_TIME                END_TIME                  HOURS_BETWEEN               
------------------------- ------------------------- -------------               
17-FEB-2016 09:12:28      18-FEB-2016 10:12:28                  1               
17-FEB-2016 09:12:28      19-FEB-2016 11:12:28                  2               
17-FEB-2016 09:12:28      20-FEB-2016 12:12:28                  3               
17-FEB-2016 09:12:28      21-FEB-2016 13:12:28                  4               
17-FEB-2016 09:12:28      22-FEB-2016 14:12:28                  5               
17-FEB-2016 09:12:28      23-FEB-2016 15:12:28                  6               
17-FEB-2016 09:12:28      24-FEB-2016 16:12:28                  7               
17-FEB-2016 09:12:28      25-FEB-2016 17:12:28                  8               
17-FEB-2016 09:12:28      26-FEB-2016 18:12:28                  9               
17-FEB-2016 09:12:28      27-FEB-2016 19:12:28                 10               

10 rows selected.

SQL> 

The initial request was flawed by assuming each record would reference a single day; even though this may be true for the actual data the solution cannot rely on such assumptions. Obviously the results from that last query were wrong; let’s use the full date string and see what values we get:


SQL> --
SQL> -- 24 hours per day, find days between and multiply
SQL> --
SQL> -- Use full date string to compute the hours between
SQL> -- entries
SQL> --
SQL> 
SQL> select start_time, end_time,
  2  	    (to_date(end_time) - to_date(start_time))*24 hours_between
  3  from onduty_leave;

START_TIME                END_TIME                  HOURS_BETWEEN               
------------------------- ------------------------- -------------               
17-FEB-2016 09:12:28      18-FEB-2016 10:12:28                 25               
17-FEB-2016 09:12:28      19-FEB-2016 11:12:28                 50               
17-FEB-2016 09:12:28      20-FEB-2016 12:12:28                 75               
17-FEB-2016 09:12:28      21-FEB-2016 13:12:28                100               
17-FEB-2016 09:12:28      22-FEB-2016 14:12:28                125               
17-FEB-2016 09:12:28      23-FEB-2016 15:12:28                150               
17-FEB-2016 09:12:28      24-FEB-2016 16:12:28                175               
17-FEB-2016 09:12:28      25-FEB-2016 17:12:28                200               
17-FEB-2016 09:12:28      26-FEB-2016 18:12:28                225               
17-FEB-2016 09:12:28      27-FEB-2016 19:12:28                250               

10 rows selected.

SQL> 

This looks much better, but it’s still based on date values stored as strings. With the format specified we can successfully order the data for a single month, but when we add data for other months and sort it ‘strange’ things can happen:


SQL> --
SQL> -- Populate table with more records
SQL> --
SQL> begin
  2  	     for i in 11..100 loop
  3  		     insert into onduty_leave
  4  		     values(sysdate, sysdate+((25*i)/24));
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> 
SQL> --
SQL> -- Sort the data
SQL> --
SQL> select *
  2  from onduty_leave
  3  order by 2;

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:04      01-APR-2016 03:29:04                                  
17-FEB-2016 09:29:04      01-MAR-2016 22:29:04                                  
17-FEB-2016 09:29:04      01-MAY-2016 08:29:04                                  
17-FEB-2016 09:29:04      02-APR-2016 04:29:04                                  
17-FEB-2016 09:29:04      02-MAR-2016 23:29:04                                  
17-FEB-2016 09:29:04      02-MAY-2016 09:29:04                                  
17-FEB-2016 09:29:04      03-APR-2016 05:29:04                                  
17-FEB-2016 09:29:04      03-MAY-2016 10:29:04                                  
17-FEB-2016 09:29:04      04-APR-2016 06:29:04                                  
17-FEB-2016 09:29:04      04-MAR-2016 00:29:04                                  
17-FEB-2016 09:29:04      04-MAY-2016 11:29:04                                  
17-FEB-2016 09:29:04      05-APR-2016 07:29:04                                  
17-FEB-2016 09:29:04      05-MAR-2016 01:29:04                                  
17-FEB-2016 09:29:04      05-MAY-2016 12:29:04                                  
17-FEB-2016 09:29:04      06-APR-2016 08:29:04                                  
17-FEB-2016 09:29:04      06-MAR-2016 02:29:04                                  
17-FEB-2016 09:29:04      06-MAY-2016 13:29:04                                  
17-FEB-2016 09:29:04      07-APR-2016 09:29:04                                  
17-FEB-2016 09:29:04      07-MAR-2016 03:29:04                                  
17-FEB-2016 09:29:04      07-MAY-2016 14:29:04                                  
17-FEB-2016 09:29:04      08-APR-2016 10:29:04                                  
17-FEB-2016 09:29:04      08-MAR-2016 04:29:04                                  
...

100 rows selected.

SQL> 

Oracle doesn’t know these are dates, so they are sorted as ASCII text making the order anything but logical, as far as dates go. Let’s drop the orginal table, recreate with the proper date type and try this again:


SQL> --
SQL> -- Drop the original table
SQL> --
SQL> drop table onduty_leave purge;

Table dropped.

SQL> 
SQL> --
SQL> -- Create the table with the proper data type
SQL> --
SQL> create table Onduty_leave
  2  (
  3  Start_time date,
  4  end_time  date
  5  );

Table created.

SQL> 
SQL> --
SQL> -- Populate the table
SQL> --
SQL> begin
  2  	     for i in 1..10 loop
  3  		     insert into onduty_leave
  4  		     values(sysdate, sysdate+((25*i)/24));
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> 
SQL> --
SQL> -- Compute the hours between entries for a given record
SQL> --
SQL> select (end_time - start_time)*24 hours_between
  2  from onduty_leave;

HOURS_BETWEEN                                                                   
-------------                                                                   
           25                                                                   
           50                                                                   
           75                                                                   
          100                                                                   
          125                                                                   
          150                                                                   
          175                                                                   
          200                                                                   
          225                                                                   
          250                                                                   

10 rows selected.

SQL> 
SQL> --
SQL> -- Populate table with more records
SQL> --
SQL> begin
  2  	     for i in 11..100 loop
  3  		     insert into onduty_leave
  4  		     values(sysdate, sysdate+((25*i)/24));
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> 
SQL> --
SQL> -- Sort the data
SQL> --
SQL> select *
  2  from onduty_leave
  3  order by 2;

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      18-FEB-2016 10:29:05                                  
17-FEB-2016 09:29:05      19-FEB-2016 11:29:05                                  
17-FEB-2016 09:29:05      20-FEB-2016 12:29:05                                  
17-FEB-2016 09:29:05      21-FEB-2016 13:29:05                                  
17-FEB-2016 09:29:05      22-FEB-2016 14:29:05                                  
17-FEB-2016 09:29:05      23-FEB-2016 15:29:05                                  
17-FEB-2016 09:29:05      24-FEB-2016 16:29:05                                  
17-FEB-2016 09:29:05      25-FEB-2016 17:29:05                                  
17-FEB-2016 09:29:05      26-FEB-2016 18:29:05                                  
17-FEB-2016 09:29:05      27-FEB-2016 19:29:05                                  
17-FEB-2016 09:29:05      28-FEB-2016 20:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      29-FEB-2016 21:29:05                                  
17-FEB-2016 09:29:05      01-MAR-2016 22:29:05                                  
17-FEB-2016 09:29:05      02-MAR-2016 23:29:05                                  
17-FEB-2016 09:29:05      04-MAR-2016 00:29:05                                  
17-FEB-2016 09:29:05      05-MAR-2016 01:29:05                                  
17-FEB-2016 09:29:05      06-MAR-2016 02:29:05                                  
17-FEB-2016 09:29:05      07-MAR-2016 03:29:05                                  
17-FEB-2016 09:29:05      08-MAR-2016 04:29:05                                  
17-FEB-2016 09:29:05      09-MAR-2016 05:29:05                                  
17-FEB-2016 09:29:05      10-MAR-2016 06:29:05                                  
17-FEB-2016 09:29:05      11-MAR-2016 07:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      12-MAR-2016 08:29:05                                  
17-FEB-2016 09:29:05      13-MAR-2016 09:29:05                                  
17-FEB-2016 09:29:05      14-MAR-2016 10:29:05                                  
17-FEB-2016 09:29:05      15-MAR-2016 11:29:05                                  
17-FEB-2016 09:29:05      16-MAR-2016 12:29:05                                  
17-FEB-2016 09:29:05      17-MAR-2016 13:29:05                                  
17-FEB-2016 09:29:05      18-MAR-2016 14:29:05                                  
17-FEB-2016 09:29:05      19-MAR-2016 15:29:05                                  
17-FEB-2016 09:29:05      20-MAR-2016 16:29:05                                  
17-FEB-2016 09:29:05      21-MAR-2016 17:29:05                                  
17-FEB-2016 09:29:05      22-MAR-2016 18:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      23-MAR-2016 19:29:05                                  
17-FEB-2016 09:29:05      24-MAR-2016 20:29:05                                  
17-FEB-2016 09:29:05      25-MAR-2016 21:29:05                                  
17-FEB-2016 09:29:05      26-MAR-2016 22:29:05                                  
17-FEB-2016 09:29:05      27-MAR-2016 23:29:05                                  
17-FEB-2016 09:29:05      29-MAR-2016 00:29:05                                  
17-FEB-2016 09:29:05      30-MAR-2016 01:29:05                                  
17-FEB-2016 09:29:05      31-MAR-2016 02:29:05                                  
17-FEB-2016 09:29:05      01-APR-2016 03:29:05                                  
17-FEB-2016 09:29:05      02-APR-2016 04:29:05                                  
17-FEB-2016 09:29:05      03-APR-2016 05:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      04-APR-2016 06:29:05                                  
17-FEB-2016 09:29:05      05-APR-2016 07:29:05                                  
17-FEB-2016 09:29:05      06-APR-2016 08:29:05                                  
17-FEB-2016 09:29:05      07-APR-2016 09:29:05                                  
17-FEB-2016 09:29:05      08-APR-2016 10:29:05                                  
17-FEB-2016 09:29:05      09-APR-2016 11:29:05                                  
17-FEB-2016 09:29:05      10-APR-2016 12:29:05                                  
17-FEB-2016 09:29:05      11-APR-2016 13:29:05                                  
17-FEB-2016 09:29:05      12-APR-2016 14:29:05                                  
17-FEB-2016 09:29:05      13-APR-2016 15:29:05                                  
17-FEB-2016 09:29:05      14-APR-2016 16:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      15-APR-2016 17:29:05                                  
17-FEB-2016 09:29:05      16-APR-2016 18:29:05                                  
17-FEB-2016 09:29:05      17-APR-2016 19:29:05                                  
17-FEB-2016 09:29:05      18-APR-2016 20:29:05                                  
17-FEB-2016 09:29:05      19-APR-2016 21:29:05                                  
17-FEB-2016 09:29:05      20-APR-2016 22:29:05                                  
17-FEB-2016 09:29:05      21-APR-2016 23:29:05                                  
17-FEB-2016 09:29:05      23-APR-2016 00:29:05                                  
17-FEB-2016 09:29:05      24-APR-2016 01:29:05                                  
17-FEB-2016 09:29:05      25-APR-2016 02:29:05                                  
17-FEB-2016 09:29:05      26-APR-2016 03:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      27-APR-2016 04:29:05                                  
17-FEB-2016 09:29:05      28-APR-2016 05:29:05                                  
17-FEB-2016 09:29:05      29-APR-2016 06:29:05                                  
17-FEB-2016 09:29:05      30-APR-2016 07:29:05                                  
17-FEB-2016 09:29:05      01-MAY-2016 08:29:05                                  
17-FEB-2016 09:29:05      02-MAY-2016 09:29:05                                  
17-FEB-2016 09:29:05      03-MAY-2016 10:29:05                                  
17-FEB-2016 09:29:05      04-MAY-2016 11:29:05                                  
17-FEB-2016 09:29:05      05-MAY-2016 12:29:05                                  
17-FEB-2016 09:29:05      06-MAY-2016 13:29:05                                  
17-FEB-2016 09:29:05      07-MAY-2016 14:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      08-MAY-2016 15:29:05                                  
17-FEB-2016 09:29:05      09-MAY-2016 16:29:05                                  
17-FEB-2016 09:29:05      10-MAY-2016 17:29:05                                  
17-FEB-2016 09:29:05      11-MAY-2016 18:29:05                                  
17-FEB-2016 09:29:05      12-MAY-2016 19:29:05                                  
17-FEB-2016 09:29:05      13-MAY-2016 20:29:05                                  
17-FEB-2016 09:29:05      14-MAY-2016 21:29:05                                  
17-FEB-2016 09:29:05      15-MAY-2016 22:29:05                                  
17-FEB-2016 09:29:05      16-MAY-2016 23:29:05                                  
17-FEB-2016 09:29:05      18-MAY-2016 00:29:05                                  
17-FEB-2016 09:29:05      19-MAY-2016 01:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      20-MAY-2016 02:29:05                                  
17-FEB-2016 09:29:05      21-MAY-2016 03:29:05                                  
17-FEB-2016 09:29:05      22-MAY-2016 04:29:05                                  
17-FEB-2016 09:29:05      23-MAY-2016 05:29:05                                  
17-FEB-2016 09:29:05      24-MAY-2016 06:29:05                                  
17-FEB-2016 09:29:05      25-MAY-2016 07:29:05                                  
17-FEB-2016 09:29:05      26-MAY-2016 08:29:05                                  
17-FEB-2016 09:29:05      27-MAY-2016 09:29:05                                  
17-FEB-2016 09:29:05      28-MAY-2016 10:29:05                                  
17-FEB-2016 09:29:05      29-MAY-2016 11:29:05                                  
17-FEB-2016 09:29:05      30-MAY-2016 12:29:05                                  

START_TIME                END_TIME                                              
------------------------- -------------------------                             
17-FEB-2016 09:29:05      31-MAY-2016 13:29:05                                  

100 rows selected.

SQL> 

Proper date arithmetic can be performed and the dates sort correctly since Oracle now knows they are dates.

Of course the DATE datatype isn’t the only one that can be used; TIMESTAMP is also a valid choice for date data. Re-working the above example to use the TIMESTAMP datatype, we see:


SQL> create table records(
  2  myid    number,
  3  mydata  varchar2(40),
  4  recorddt	     timestamp);

Table created.

SQL> 
SQL> begin
  2  	     for i in 1..1000 loop
  3  		     insert into records
  4  		     values(i, 'Record '||i, systimestamp - (4*i));
  5  	     end loop;
  6  
  7  	     commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.

SQL> 
SQL> select * from records
  2  where
  3  recorddt > (select add_months(systimestamp, -24 )-1 from dual)
  4  order by recorddt;

      MYID MYDATA                         RECORDDT                              
---------- ------------------------------ ------------------------------        
       182 Record 182                     02-OCT-14 10.12.45.000000 AM          
       181 Record 181                     06-OCT-14 10.12.45.000000 AM          
       180 Record 180                     10-OCT-14 10.12.45.000000 AM          
       179 Record 179                     14-OCT-14 10.12.45.000000 AM          
       178 Record 178                     18-OCT-14 10.12.45.000000 AM          
       177 Record 177                     22-OCT-14 10.12.45.000000 AM          
       176 Record 176                     26-OCT-14 10.12.45.000000 AM          
       175 Record 175                     30-OCT-14 10.12.45.000000 AM          
       174 Record 174                     03-NOV-14 10.12.45.000000 AM          
       173 Record 173                     07-NOV-14 10.12.45.000000 AM          
       172 Record 172                     11-NOV-14 10.12.45.000000 AM          
...
      MYID MYDATA                         RECORDDT                              
---------- ------------------------------ ------------------------------        
        17 Record 17                      23-JUL-16 10.12.45.000000 AM          
        16 Record 16                      27-JUL-16 10.12.45.000000 AM          
        15 Record 15                      31-JUL-16 10.12.45.000000 AM          
        14 Record 14                      04-AUG-16 10.12.45.000000 AM          
        13 Record 13                      08-AUG-16 10.12.45.000000 AM          
        12 Record 12                      12-AUG-16 10.12.45.000000 AM          
        11 Record 11                      16-AUG-16 10.12.45.000000 AM          
        10 Record 10                      20-AUG-16 10.12.45.000000 AM          
         9 Record 9                       24-AUG-16 10.12.45.000000 AM          
         8 Record 8                       28-AUG-16 10.12.45.000000 AM          
         7 Record 7                       01-SEP-16 10.12.45.000000 AM          

      MYID MYDATA                         RECORDDT                              
---------- ------------------------------ ------------------------------        
         6 Record 6                       05-SEP-16 10.12.45.000000 AM          
         5 Record 5                       09-SEP-16 10.12.45.000000 AM          
         4 Record 4                       13-SEP-16 10.12.45.000000 AM          
         3 Record 3                       17-SEP-16 10.12.45.000000 AM          
         2 Record 2                       21-SEP-16 10.12.45.000000 AM          
         1 Record 1                       25-SEP-16 10.12.45.000000 AM          

182 rows selected.

SQL> 
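
One difference worth noting: subtracting one TIMESTAMP from another yields an INTERVAL DAY TO SECOND rather than a number, so the hours arithmetic shown earlier changes shape. A small sketch, using EXTRACT on two hypothetical timestamps:


select extract(day from ts_diff)*24 + extract(hour from ts_diff) hours_between
from (select to_timestamp('18-FEB-2016 10:00:00', 'DD-MON-RRRR HH24:MI:SS')
             - to_timestamp('17-FEB-2016 09:00:00', 'DD-MON-RRRR HH24:MI:SS') ts_diff
      from dual);

The two values are one day and one hour apart, so the query returns 25; a complete version would add the minute and second terms as well.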

Using the correct data type for the data being stored is critical; storing data in the wrong format can create problems for applications and produce incorrect values. Analytic functions can return erroneous output when the data isn’t stored in the correct type, especially functions like lead(), lag() and sum() that partition and order data to provide ‘rolling’ results. And even when you can manipulate mis-typed data into correct answers, it usually takes far more work than the proper data type would require. Keep that in mind when you design tables and applications.
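
As one last illustration of that analytic pitfall, a minimal sketch assuming the original VARCHAR2 version of ONDUTY_LEAVE: ordering by the string column hands lag() the wrong ‘previous’ row whenever the months interleave:


select end_time,
       lag(end_time) over (order by end_time) previous_end
from onduty_leave;

With VARCHAR2 dates the rows order as text, ‘01-APR-2016…’ ahead of ‘01-MAR-2016…’, so each ‘previous’ value is the textual neighbor rather than the chronological one.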

Context is everything.
