"Don't you know anything at all about numbers?"
"Well, I don't think they're very important," snapped Milo, too embarrassed to admit the truth.
"NOT IMPORTANT!" roared the Dodecahedron, turning red with fury. "Could you have tea for two without the two, or three blind mice
without the three? Would there be four corners of the earth if there weren't a four? And how would you sail the seven seas without
a seven?"
"All I meant was..." began Milo, but the Dodecahedron, overcome with emotion and shouting furiously, carried right on.
"If you had high hopes, how would you know how high they were? And did you know that narrow escapes come in all different widths?
Would you travel the whole wide world without ever knowing how wide it was? And how could you do anything at long last," he concluded,
waving his arms over his head, "without knowing how long the last was? Why, numbers are the most beautiful and valuable things in the
world. Just follow me and I'll show you." He turned on his heel and stalked off into the cave.
-- Norton Juster, The Phantom Tollbooth

In one of the current Oracle forums an “old saw” was resurrected, namely that a single-column primary key has no need of a histogram. That presumes the primary key has no ‘holes’ and covers a continuous range of values. In reality such primary keys are rarely, if ever, unbroken, and the range of data may not be continuous. With that in mind it isn’t strange to consider a histogram on a single-column primary key, and depending on the release of Oracle in use that histogram may be an old standard (the height-balanced histogram) or the new histogram offered in 12c, the hybrid histogram. Let’s look at an example of how this plays out in 11.2.0.4 and in 12.1.0.2. First, though, let’s look at that new Hybrid histogram.

The Hybrid histogram is a new type of histogram that uses fewer buckets than there are distinct values in the table data; it also determines, and uses, the repeat frequency of the endpoint values. The bucket size can be adjusted from its initial size based on the endpoint values; two special buckets never change in size, the ‘boundary’ buckets: bucket 0 and the last bucket of the histogram. The Hybrid histogram is based on four apparent ‘rules’:

1) a value should not be found in more than one bucket
2) the bucket size is allowed to be extended in order to contain all instances of the same distinct value
3) adjusted bucket size cannot be less than the original size (not applicable at either end of the data set)
4) the original number of buckets should not be reduced

One way to create a hybrid histogram is shown in the example below; part of it was provided by Jonathan Lewis, and the remainder was adapted from work done by Mohammed Houri. We begin:

SQL>
SQL> --
SQL> -- Create and populate a table so we can
SQL> -- build a hybrid histogram
SQL> --
SQL> -- Thanks to Jonathan Lewis for generating
SQL> -- this table and data set
SQL> --
SQL> create table t1 (id number, n1 number);
Table created.
Elapsed: 00:00:00.01
SQL>
SQL> @InsT1.sql
Elapsed: 00:00:00.00
SQL>

Choosing a number of buckets less than the number of distinct values in the table is one way to create a hybrid histogram:

SQL>
SQL> --
SQL> -- Return the number of distinct values
SQL> -- in column n1
SQL> --
SQL>
SQL> select count(distinct n1) from t1;
COUNT(DISTINCTN1)
-----------------
37
Elapsed: 00:00:00.00
SQL>

We'll use 20 buckets for our histogram and DBMS_STATS to create it:

SQL>
SQL> --
SQL> -- Create a hybrid histogram on column n1
SQL> -- by setting the number of buckets less
SQL> -- than the number of distinct values
SQL> --
SQL> BEGIN
2 dbms_stats.gather_table_stats
3 (user, 't1', method_opt => 'for columns n1 size 20');
4 END;
5 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.43
SQL>

During the creation of this histogram Oracle orders the column values and walks the list to find values that occur in more than one bucket. When such values are found, new bucket sizes are computed for all but the boundary buckets, in this case bucket 0 and bucket 19:

SQL>
SQL> --
SQL> -- Four rules exist to create a hybrid histogram, and the
SQL> -- first two are:
SQL> --
SQL> -- a value should not be found in more than one bucket
SQL> -- a bucket size is allowed to be extended in order to contain
SQL> -- all instances of the same value
SQL> --
SQL> -- Compute the 'new' bucket size based on the data and
SQL> -- endpoint values
SQL> --
SQL> -- To illustrate how the bucket size adjustment is
SQL> -- calculated let's look at the value 13 in the data
SQL> --
SQL> -- Oracle 'walks' the ordered list of values until
SQL> -- it finds 13 in two buckets
SQL> --
SQL> -- Oracle extends the bucket where 13 is first found (bucket n)
SQL> -- and moves all other instances of 13 (from bucket n+1)
SQL> -- to this bucket (n)
SQL> --
SQL> -- Moving all instances of 13 to a single bucket
SQL> -- also makes 13 an endpoint value allowing Oracle
SQL> -- to compute the repeat count of that value
SQL> --
SQL> -- This continues on through the values so that
SQL> -- Oracle can place all occurrences of a distinct
SQL> -- value so they don't populate more than one bucket
SQL> --
SQL> SELECT
2 (row_number() over(order by ept_nbr)-1) NumBucket
3 ,ept_nbr
4 ,ept_act_val
5 ,rpt_cnt
6 ,ept_nbr - (lag(ept_nbr,1,0) over(order by ept_nbr)) "new bucket size"
7 ,bucket_size "original bucket_size"
8 FROM
9 (SELECT
10 ah.endpoint_number ept_nbr
11 ,ah.endpoint_actual_value ept_act_val
12 ,lag(ah.endpoint_number,1,0) over(order by ah.endpoint_number) ept_lag
13 ,ah.endpoint_repeat_count rpt_cnt
14 ,at.sample_size/at.num_buckets bucket_size
15 FROM
16 user_tab_histograms ah
17 ,user_tab_col_statistics at
18 WHERE ah.table_name = at.table_name
19 AND ah.column_name = at.column_name
20 AND ah.table_name = 'T1'
21 AND ah.column_name = 'N1'
22 ) ORDER BY ept_nbr;
NUMBUCKET EPT_NBR EPT_ACT_VAL RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
0 1 8 1 1 5
1 6 13 3 5 5
2 12 18 2 6 5
3 20 20 5 8 5
4 26 23 2 6 5
5 32 26 3 6 5
6 38 27 6 6 5
7 44 28 6 6 5
8 50 29 6 6 5
9 58 31 5 8 5
10 69 33 8 11 5
NUMBUCKET EPT_NBR EPT_ACT_VAL RPT_CNT new bucket size original bucket_size
---------- ---------- ------------ ---------- --------------- --------------------
11 79 35 7 10 5
12 86 38 5 7 5
13 90 41 1 4 5
14 92 42 2 2 5
15 95 43 3 3 5
16 96 44 1 1 5
17 97 45 1 1 5
18 98 46 1 1 5
19 100 59 1 2 5
20 rows selected.
Elapsed: 00:00:00.21
SQL>
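The "new bucket size" column in the output above is just the difference between consecutive endpoint numbers, i.e. the lag() arithmetic from the query. A quick Python sketch of that calculation, using the first few endpoint numbers from the output (an illustration only):

```python
# new bucket size = endpoint_number - lag(endpoint_number, 1, 0),
# i.e. the gap between consecutive endpoint numbers
def new_bucket_sizes(endpoint_numbers):
    prev = 0
    sizes = []
    for e in endpoint_numbers:
        sizes.append(e - prev)
        prev = e
    return sizes

# first few endpoint numbers from the query output above
print(new_bucket_sizes([1, 6, 12, 20, 26]))
# [1, 5, 6, 8, 6] -- matching the "new bucket size" column
```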

Notice that for buckets 1-12 the bucket size stayed the same or increased; for the remaining non-boundary buckets the computed size would decrease. But according to Rule 3 no bucket can be adjusted to a size smaller than the original bucket size, so those buckets remain unadjusted. The bucket size shown in the query results below is simply the sample size divided by the number of buckets and thus displays a constant value for all buckets:

SQL>
SQL> --
SQL> -- Given that boundary values are the exception computing the new
SQL> -- bucket sizes shows that for bucket numbers 1-12 the sizes are
SQL> -- the same or larger than the originally computed size (5)
SQL> --
SQL> -- All remaining non-endpoint buckets compute to a smaller size
SQL> -- than the original
SQL> --
SQL> -- Now the third rule appears:
SQL> --
SQL> -- No non-endpoint bucket can be smaller than the originally
SQL> -- computed size
SQL> --
SQL> select
2 uth.endpoint_number
3 ,uth.endpoint_actual_value
4 ,uth.endpoint_repeat_count
5 ,ucs.sample_size/ucs.num_buckets bucket_size
6 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
7 from
8 user_tab_histograms uth
9 ,user_tab_col_statistics ucs
10 where
11 uth.table_name = ucs.table_name
12 and uth.column_name = ucs.column_name
13 and uth.table_name = 'T1'
14 and uth.column_name = 'N1'
15 order by uth.endpoint_number;
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
1 8 1 5 -4
6 13 3 5 -2
12 18 2 5 -3
20 20 5 5 0
26 23 2 5 -3
32 26 3 5 -2
38 27 6 5 1
44 28 6 5 1
50 29 6 5 1
58 31 5 5 0
69 33 8 5 3
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULARITY
--------------- ------------ --------------------- ----------- ----------
79 35 7 5 2
86 38 5 5 0
90 41 1 5 -4
92 42 2 5 -3
95 43 3 5 -2
96 44 1 5 -4
97 45 1 5 -4
98 46 1 5 -4
100 59 1 5 -4
20 rows selected.
Elapsed: 00:00:00.19
SQL>

For a Hybrid histogram there are three types of column value on which to base cardinality estimates: popular values, non-popular values that are endpoints, and non-popular values that are not endpoints. Let’s look at how Oracle estimates the cardinality for each case, starting with popular values:

SQL>
SQL> --
SQL> -- Display the endpoint, value, repeat count,
SQL> -- bucket size and the 'popularity'
SQL> --
SQL> -- Popularity is determined by computing the
SQL> -- difference between the frequency of appearance
SQL> -- (repeat count) and the bucket size (sample size/number of buckets)
SQL> -- Positive values are considered 'popular'
SQL> --
SQL> select
2 endpoint_number
3 ,endpoint_actual_value
4 ,endpoint_repeat_count
5 ,bucket_size
6 ,case when Popularity > 0 then 'Pop'
7 else 'Non-Pop'
8 end Popularity
9 from
10 (
11 select
12 uth.endpoint_number
13 ,uth.endpoint_actual_value
14 ,uth.endpoint_repeat_count
15 ,ucs.sample_size/ucs.num_buckets bucket_size
16 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
17 from
18 user_tab_histograms uth
19 ,user_tab_col_statistics ucs
20 where
21 uth.table_name = ucs.table_name
22 and uth.column_name = ucs.column_name
23 and uth.table_name = 'T1'
24 and uth.column_name = 'N1'
25 )
26 order by endpoint_number;
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
1 8 1 5 Non-Pop
6 13 3 5 Non-Pop
12 18 2 5 Non-Pop
20 20 5 5 Non-Pop
26 23 2 5 Non-Pop
32 26 3 5 Non-Pop
38 27 6 5 Pop
44 28 6 5 Pop
50 29 6 5 Pop
58 31 5 5 Non-Pop
69 33 8 5 Pop
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
--------------- ------------ --------------------- ----------- -------
79 35 7 5 Pop
86 38 5 5 Non-Pop
90 41 1 5 Non-Pop
92 42 2 5 Non-Pop
95 43 3 5 Non-Pop
96 44 1 5 Non-Pop
97 45 1 5 Non-Pop
98 46 1 5 Non-Pop
100 59 1 5 Non-Pop
20 rows selected.
Elapsed: 00:00:00.19
SQL>
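The Pop/Non-Pop labels above come down to a single comparison; the sketch below restates it in Python (the rule here is inferred from the queries in this article, not an official formula):

```python
# An endpoint is 'popular' when its repeat count exceeds the original
# bucket size (sample_size / num_buckets)
def classify(endpoints, sample_size, num_buckets):
    bucket_size = sample_size / num_buckets
    return {val: ('Pop' if rpt > bucket_size else 'Non-Pop')
            for val, rpt in endpoints}

# (endpoint_actual_value, endpoint_repeat_count) pairs from the output above
labels = classify([(27, 6), (31, 5), (33, 8)], sample_size=100, num_buckets=20)
print(labels)
# {27: 'Pop', 31: 'Non-Pop', 33: 'Pop'} -- a repeat count merely equal to
# the bucket size (value 31) is not popular
```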

Let’s return only the popular values and work through the cardinality estimate Oracle generates:

SQL> --
SQL> -- Using the popularity Oracle estimates the cardinality
SQL> -- by considering the following three types of values:
SQL> --
SQL> -- popular value
SQL> -- non-popular value with an endpoint number
SQL> -- non-popular value not present in the histogram table
SQL> --
SQL> -- Case 1: Popular values
SQL> --
SQL> -- Display the 'popular' values in this data set
SQL> --
SQL> select
2 endpoint_actual_value
3 ,endpoint_repeat_count
4 ,bucket_size
5 ,Popularity
6 from
7 (
8 select
9 endpoint_number
10 ,endpoint_actual_value
11 ,endpoint_repeat_count
12 ,bucket_size
13 ,case when Popularity > 0 then 'Pop'
14 else 'Non-Pop'
15 end Popularity
16 from
17 (
18 select
19 uth.endpoint_number
20 ,uth.endpoint_actual_value
21 ,uth.endpoint_repeat_count
22 ,ucs.sample_size/ucs.num_buckets bucket_size
23 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
24 from
25 user_tab_histograms uth
26 ,user_tab_col_statistics ucs
27 where
28 uth.table_name = ucs.table_name
29 and uth.column_name = ucs.column_name
30 and uth.table_name = 'T1'
31 and uth.column_name = 'N1'
32 )
33 )
34 where Popularity = 'Pop';
ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
27 6 5 Pop
28 6 5 Pop
29 6 5 Pop
33 8 5 Pop
35 7 5 Pop
Elapsed: 00:00:00.20
SQL>

Looking at one of these popular values (33) let’s return the cardinality estimate Oracle calculated:

SQL> --
SQL> -- Using explain plan the cardinality estimation
SQL> -- can be displayed for one of the 'popular' values
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 33;
Explained.
Elapsed: 00:00:00.00
SQL>
SQL> --
SQL> -- The cardinality estimate displayed below was
SQL> -- calculated using the following formula:
SQL> --
SQL> -- E-Rows = ENDPOINT_REPEAT_COUNT * num_rows/sample_size
SQL> -- E-Rows = 8 * 100/100 = 8
SQL> --
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 8 | 24 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
2 - filter("N1"=33)
14 rows selected.
Elapsed: 00:00:00.03
SQL>
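The formula in the comments above is simple enough to replay; this Python sketch illustrates the arithmetic only, not any Oracle internals:

```python
# Case 1: popular value
# E-Rows = ENDPOINT_REPEAT_COUNT * num_rows / sample_size
def popular_card(repeat_count, num_rows, sample_size):
    return round(repeat_count * num_rows / sample_size)

print(popular_card(8, 100, 100))   # 8, matching Rows in the plan for n1 = 33
print(popular_card(6, 100, 100))   # 6, the estimate for n1 = 27, 28 or 29
```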

The cardinality we calculated from the formula in the comments above matches that provided by the optimizer, so our understanding of the Hybrid histogram seems sound. To continue that verification, consider a non-popular value that is an endpoint. The formula for that cardinality calculation is: E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). We calculate NewDensity with a query based on the work of Alberto Dell’Era. First we return the non-popular values that are endpoints:

SQL> --
SQL> -- Case 2: Non-popular values with an endpoint
SQL> --
SQL> -- First return all non-popular values that are an
SQL> -- endpoint
SQL> --
SQL> select
2 endpoint_actual_value
3 ,endpoint_repeat_count
4 ,bucket_size
5 ,Popularity
6 from
7 (
8 select
9 endpoint_number
10 ,endpoint_actual_value
11 ,endpoint_repeat_count
12 ,bucket_size
13 ,case when Popularity > 0 then 'Pop'
14 else 'Non-Pop'
15 end Popularity
16 from
17 (
18 select
19 uth.endpoint_number
20 ,uth.endpoint_actual_value
21 ,uth.endpoint_repeat_count
22 ,ucs.sample_size/ucs.num_buckets bucket_size
23 ,(uth.endpoint_repeat_count - ucs.sample_size/ucs.num_buckets) Popularity
24 from
25 user_tab_histograms uth
26 ,user_tab_col_statistics ucs
27 where
28 uth.table_name = ucs.table_name
29 and uth.column_name = ucs.column_name
30 and uth.table_name = 'T1'
31 and uth.column_name = 'N1'
32 )
33 )
34 where Popularity = 'Non-Pop';
ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
8 1 5 Non-Pop
13 3 5 Non-Pop
18 2 5 Non-Pop
20 5 5 Non-Pop
23 2 5 Non-Pop
26 3 5 Non-Pop
31 5 5 Non-Pop
38 5 5 Non-Pop
41 1 5 Non-Pop
42 2 5 Non-Pop
43 3 5 Non-Pop
ENDPOINT_ACT ENDPOINT_REPEAT_COUNT BUCKET_SIZE POPULAR
------------ --------------------- ----------- -------
44 1 5 Non-Pop
45 1 5 Non-Pop
46 1 5 Non-Pop
59 1 5 Non-Pop
15 rows selected.
Elapsed: 00:00:00.20
SQL>

Let’s return Oracle’s estimated cardinality for one of these values (45):

SQL> --
SQL> -- Use explain plan to return the estimated cardinality
SQL> --
SQL> -- In this case the estimated cardinality is computed as:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/sample_size)
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 45;
Explained.
Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 2 | 6 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
---------------------------------------------------
2 - filter("N1"=45)
14 rows selected.
Elapsed: 00:00:00.02
SQL>

Now let’s use our query to return the value for NewDensity:

SQL> --
SQL> -- The NewDensity is computed by an internal algorithm but
SQL> -- we can get a reliable density using the following query
SQL> -- based on work done by Alberto Dell'Era:
SQL> --
SQL> SELECT
2 BktCnt
3 ,PopBktCnt
4 ,PopValCnt
5 ,NDV
6 ,pop_bucketSize
7 ,trunc(((BktCnt-PopBktCnt)/BktCnt)/(NDV-PopValCnt),10) NewDensity
8 FROM
9 (SELECT
10 COUNT(1) PopValCnt,
11 SUM(endpoint_repeat_count) PopBktCnt,
12 ndv,
13 BktCnt,
14 pop_bucketSize
15 FROM
16 (SELECT
17 (sample_size - num_nulls) BktCnt,
18 num_distinct ndv,
19 num_buckets,
20 density OldDensity,
21 (sample_size-num_nulls)/num_buckets pop_bucketSize
22 FROM user_tab_col_statistics
23 WHERE
24 table_name = 'T1'
25 AND column_name = 'N1'
26 ),
27 user_histograms
28 WHERE table_name = 'T1'
29 AND column_name = 'N1'
30 AND endpoint_repeat_count> pop_bucketSize
31 GROUP BY ndv,
32 BktCnt,
33 pop_bucketSize
34 );
BKTCNT POPBKTCNT POPVALCNT NDV POP_BUCKETSIZE NEWDENSITY
---------- ---------- ---------- ---------- -------------- ----------
100 33 5 37 5 .0209375
Elapsed: 00:00:00.19
SQL>
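The figures just returned can be replayed directly; this Python sketch illustrates the NewDensity arithmetic and the case-2 estimate built from it (an illustration of the formulas only):

```python
# NewDensity = ((BktCnt - PopBktCnt) / BktCnt) / (NDV - PopValCnt)
def new_density(bkt_cnt, pop_bkt_cnt, ndv, pop_val_cnt):
    return ((bkt_cnt - pop_bkt_cnt) / bkt_cnt) / (ndv - pop_val_cnt)

# Case 2: non-popular value with an endpoint
# E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT / sample_size)
def nonpop_endpoint_card(num_rows, density, repeat_count, sample_size):
    return round(num_rows * max(density, repeat_count / sample_size))

d = new_density(bkt_cnt=100, pop_bkt_cnt=33, ndv=37, pop_val_cnt=5)
print(d)                                     # 0.0209375
print(nonpop_endpoint_card(100, d, 1, 100))  # 2, matching the plan for n1 = 45
```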

We now check whether our calculation produced a valid value by ‘plugging’ it into the equation E-Rows = num_rows * greatest(NewDensity, ENDPOINT_REPEAT_COUNT/sample_size). The cardinality estimate we compute is:

SQL> --
SQL> -- 'Plugging' this new density into the equation above produces:
SQL> --
SQL> -- E-Rows = num_rows * greatest (NewDensity, ENDPOINT_REPEAT_COUNT/ sample_size)
SQL> -- E-Rows = 100 * greatest (.0209375, 1/100) = 2.09375 ~ 2
SQL> --
SQL> -- which matches the value returned by Oracle
SQL> --
SQL>
SQL> --
SQL> -- To validate these findings use a different non-popular value
SQL> -- that's also an endpoint:
SQL> --
SQL> explain plan for select count(1) from t1 where n1 = 43;
Explained.
Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 3 | 9 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
2 - filter("N1"=43)
14 rows selected.
Elapsed: 00:00:00.02
SQL>

Oracle returned 3 as the estimated cardinality and so did our calculation (E-Rows = 100 * greatest(.0209375, 3/100) = 3), further proof that our concept of the Hybrid histogram is correct. One final test covers the remaining case, non-popular values that are not endpoints. Using our calculated NewDensity in the equation E-Rows = num_rows * NewDensity gives: E-Rows = 100 * .0209375 = 2.09375 ~ 2. Time to see if our calculated value matches Oracle's estimate:

SQL>
SQL> --
SQL> -- Case 3: Non-popular value without an endpoint number
SQL> --
SQL> -- This calculation is fairly simple and straightforward:
SQL> --
SQL> -- E-Rows = num_rows * NewDensity = 100 * .0209375 = 2.09375 ~ 2
SQL> --
SQL>
SQL> explain plan for select count(1) from t1 where n1 = 17;
Explained.
Elapsed: 00:00:00.00
SQL>
SQL> select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3724264953
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 3 | | |
|* 2 | TABLE ACCESS FULL| T1 | 2 | 6 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
2 - filter("N1"=17)
14 rows selected.
Elapsed: 00:00:00.02
SQL>
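The case-3 arithmetic can likewise be replayed in a short sketch (again, an illustration of the formula only):

```python
# Case 3: non-popular value with no histogram endpoint
# E-Rows = num_rows * NewDensity
def nonpop_noendpoint_card(num_rows, new_density):
    return round(num_rows * new_density)

print(nonpop_noendpoint_card(100, 0.0209375))
# 2, matching the Rows value in the plan for n1 = 17
```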

Our calculated value matches Oracle’s estimate, as evidenced by the Rows value in the execution plan, giving further evidence that our concept of the Hybrid histogram is sound.


Hybrid histograms can exhibit instability in the form of endpoint changes as the frequency of values changes with inserts against the table. To illustrate this let’s add another 16 to the data and see what happens:

SQL>
SQL> --
SQL> -- Hybrid histograms can display instability in terms of
SQL> -- endpoints and bucket size
SQL> --
SQL> -- To illustrate this we add 16 to the data, increasing
SQL> -- the repeat count to 3, causing this newly added 16
SQL> -- to shift into the previous bucket leaving 17
SQL> -- as the new endpoint
SQL> --
SQL> insert into t1 values (2, 16);
1 row created.
Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
2 dbms_stats.gather_table_stats
3 (user, 't1', method_opt => 'for all columns size 20');
4 END;
5 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.08
SQL>
SQL> select
2 uth.endpoint_number
3 ,uth.endpoint_actual_value
4 ,uth.endpoint_repeat_count
5 from
6 user_tab_histograms uth
7 ,user_tables ut
8 ,user_tab_col_statistics ucs
9 where
10 uth.table_name = 'T1'
11 and uth.column_name = 'N1'
12 and uth.table_name = ut.table_name
13 and ut.table_name = ucs.table_name
14 and uth.column_name = ucs.column_name
15 order by uth.endpoint_number;
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
1 8 1
6 13 3
11 17 1 <----
16 19 3
21 20 5
27 23 2
33 26 3
39 27 6
45 28 6
51 29 6
59 31 5
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
70 33 8
80 35 7
87 38 5
91 41 1
96 43 3
97 44 1
98 45 1
99 46 1
101 59 1
20 rows selected.
Elapsed: 00:00:00.24

The endpoint value of 18 has been replaced by the value 17 because of Rules 1 and 2: no value can span buckets, so the bucket where a value first appears is extended to contain all of its occurrences. Before the insert the histogram buckets looked like this:

8 12 12 13 13 13|15 16 16 17 18 18|19 19 19 20 20 20 20 20| …

After the insert the ordered value list tentatively split like this:

8 12 12 13 13|13 15 16 16 16|17 18 18 19 19 19 …

After Oracle ‘walks’ through the list to consolidate like values the buckets look like this:

8 12 12 13 13 13| 15 16 16 16 17 |18 18 19 19 19 …

giving a new endpoint value of 17. If we add yet another 16 to the data set we see:

SQL>
SQL> --
SQL> -- Add one more 16 and the endpoint shifts to 16
SQL> --
SQL> insert into t1 values (3, 16);
1 row created.
Elapsed: 00:00:00.00
SQL>
SQL> BEGIN
2 dbms_stats.gather_table_stats
3 (user, 't1', method_opt => 'for all columns size 20');
4 END;
5 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.03
SQL>
SQL> select
2 uth.endpoint_number
3 ,uth.endpoint_actual_value
4 ,uth.endpoint_repeat_count
5 from
6 user_tab_histograms uth
7 ,user_tables ut
8 ,user_tab_col_statistics ucs
9 where
10 uth.table_name = 'T1'
11 and uth.column_name = 'N1'
12 and uth.table_name = ut.table_name
13 and ut.table_name = ucs.table_name
14 and uth.column_name = ucs.column_name
15 order by uth.endpoint_number;
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
1 8 1
6 13 3
11 16 4 <-----
17 19 3
22 20 5
28 23 2
34 26 3
40 27 6
46 28 6
52 29 6
60 31 5
ENDPOINT_NUMBER ENDPOINT_ACT ENDPOINT_REPEAT_COUNT
--------------- ------------ ---------------------
71 33 8
81 35 7
88 38 5
92 41 1
97 43 3
98 44 1
99 45 1
100 46 1
102 59 1
20 rows selected.
Elapsed: 00:00:00.22
SQL>

The endpoint has shifted again and is now 16.

Now that we know what a Hybrid histogram is and what it can do let’s look at another way to generate such a histogram in 12.1.0.2, and this goes back (finally!) to that “old saw” mentioned earlier. Let’s create a table with a single-column primary key and see what Oracle does with it when we run DBMS_STATS.

This example, again by Jonathan Lewis, creates a table and populates the id column (the eventual primary key for the table) in a rather interesting way:

SQL>
SQL> create table t1
2 as
3 with generator as (
4 select --+ materialize
5 rownum id
6 from dual
7 connect by
8 level <= 1e4
9 )
10 select
11 rownum id
12 from
13 generator v1,
14 generator v2
15 where
16 rownum <= 1e4
17 ;
Table created.
SQL>
SQL> column maxid1 new_value max1
SQL> column maxid2 new_value max2
SQL> column maxid3 new_value max3
SQL>
SQL> select min(id), max(id) maxid1
2 from t1;
MIN(ID) MAXID1
---------- ----------
1 10000
SQL>
SQL> insert into t1 select id + 1e6 from t1;
10000 rows created.
SQL>
SQL> select min(id), max(id) maxid2
2 from t1;
MIN(ID) MAXID2
---------- ----------
1 1010000
SQL>
SQL> insert into t1 select id + 1e7 from t1;
20000 rows created.
SQL>
SQL> select min(id), max(id) maxid3
2 from t1;
MIN(ID) MAXID3
---------- ----------
1 11010000
SQL>
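The load just performed can be replayed in Python to show exactly where the gaps fall; this is a restatement of the insert arithmetic above, nothing more:

```python
# 1e4 rows 1..10000, then a copy shifted by 1e6, then a copy of
# everything shifted by 1e7 -- reproducing the three inserts above
base = list(range(1, 10001))
ids = base + [i + 10**6 for i in base]
ids = ids + [i + 10**7 for i in ids]

print(len(ids), min(ids), max(ids))           # 40000 1 11010000
# the range probed by the query (12000..13000) falls squarely in a gap
print(any(12000 <= i <= 13000 for i in ids))  # False
```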

Notice the values in the column: they run from 1 to 10000, jump to 1000001 and continue to 1010000, then jump again to 10000001 through 10010000 and 11000001 through 11010000. The breaks in the value sequence are what make this data set qualify for a Hybrid histogram in 12.1.0.2; in 11.2.0.4 a height-balanced histogram is created. Let’s add the primary key to the table:


SQL>
SQL> alter table t1 add constraint t1_pk primary key(id);
Table altered.

Now let’s query for data that isn’t in the table:

SQL>
SQL> select
2 /*+ dynamic_sampling(0) */
3 *
4 from t1
5 where
6 id between 12000 and 13000
7 ;
no rows selected
SQL>

Doing nothing special we generate statistics on the table; Oracle will automatically determine that the PK column needs a histogram and create it:

SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname => user,
4 tabname =>'T1'
5 );
6 end;
7 /
PL/SQL procedure successfully completed.
SQL>
SQL> column column_name format a35
SQL> set linesize 150
SQL>
SQL> select
2 column_name, sample_size,
3 num_distinct, num_nulls, density,
4 histogram, num_buckets
5 from
6 user_tab_cols
7 where
8 table_name = 'T1'
9 order by
10 column_name
11 ;
COLUMN_NAME SAMPLE_SIZE NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM NUM_BUCKETS
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------
ID 5502 40000 0 .000025 HYBRID 254
SQL>

To prove that 11.2.0.4 will create a height-balanced histogram let’s run the example there:

SQL>
SQL> create table t1
2 as
3 with generator as (
4 select --+ materialize
5 rownum id
6 from dual
7 connect by
8 level <= 1e4
9 )
10 select
11 rownum id
12 from
13 generator v1,
14 generator v2
15 where
16 rownum <= 1e4
17 ;
Table created.
SQL>
SQL> -- (table populated and primary key added exactly as in the
SQL> -- 12.1.0.2 example above)
SQL>
SQL> begin
2 dbms_stats.gather_table_stats(
3 ownname => user,
4 tabname =>'T1'
5 );
6 end;
7 /
PL/SQL procedure successfully completed.
SQL>
SQL> column column_name format a35
SQL> set linesize 150
SQL>
SQL> select
2 column_name, sample_size,
3 num_distinct, num_nulls, density,
4 histogram, num_buckets
5 from
6 user_tab_cols
7 where
8 table_name = 'T1'
9 order by
10 column_name
11 ;
COLUMN_NAME SAMPLE_SIZE NUM_DISTINCT NUM_NULLS DENSITY HISTOGRAM NUM_BUCKETS
----------------------------------- ----------- ------------ ---------- ---------- --------------- -----------
ID 5509 40000 0 .000024959 HEIGHT BALANCED 254
SQL>

So it would appear that having a histogram on a single-column primary key isn’t so far-fetched, after all. And we didn’t have to do anything special to get it created.

It all comes down to numbers.