"But I suppose there's a lot to see everywhere, if only you keep your eyes open."
-- Norton Juster, The Phantom Tollbooth
Recently a tweet by Chris Saxon (@chrisrsaxon) with respect to SQL, the language and implementation, started a list of contributions which should cause people to think a bit differently about how the language works and how queries should be structured. The list, found here, is by no stretch of the imagination complete but it is a very good place to begin re-thinking how to write SQL. Let’s look at a few of the listed items a bit further (since the list is likekly to grow not all of the list members can be covered).
Franck Pachot start the list with the following items:
1. SQL is a declarative language
2. It is compiled to procedural operations
3. It operates on row sets
So what, exactly, is a “declarative language”? According to the accepted definition it is:
... non-procedural and very high-level (4th generation). This means the programmer specifies what
needs to be done rather than how to do it.
Unlike languages such as C/C++, Pascal, FORTRAN, BASIC, Java, Perl and Python, the SQL language simply requires that you decide what you want done and, in the case of Oracle, the optimizer decides HOW to get it done. That doesn’t mean that you can’t have some input to influence the ‘how’, it simply means you don’t need to think about it. Certainly hints are available to ‘nudge’ the optimizer in the direction more experienced coders may think is ‘best’, but there is no requirement to use them.
His second item is a reference to PL/SQL, the procedural component to Oracle’s SQL*Plus, the ADA-based language extension where procedures, functions, triggers and anonymous procedural blocks are written. Yes, you can go happily along with many tasks and not need PL/SQL but it does come in handy when repeated tasks are necessary or when intermediate feedback (well, sort of) is desired. [Pipelined functions do provide immediate feedback during execution, but they are a special case.] Many PL/SQL examples have been posted here, so look around and they can be found.
His third item was addressed by Jonathan Lewis in an Oracle Forums response; let’s look at the problem that initiated the set-based response and Jonathan Lewis’ offering.
A query and part of the execution plan were posted with the poster asking how to get rid of the table scan. The table has a primary key index so, on the face of it, it is interesting that the primary key wasn’t used.
Looking at the definition for the ‘problem’ table (LF_HOTEL_TEMP) we find only two columns, both of which are in the primary key definition. The tables were created:
SQL> create table LF_HOTEL_TEMP
2 (
3 HOTEL_CODE VARCHAR2(4000),
4 SERVICE_ID NUMBER(3)
5 )
6 /
Table created.
SQL>
SQL> create table LF_TS_ROOMTYPE_PROPERTIES
2 (
3 HOTEL_CODE VARCHAR2(20),
4 ROOM_TYPE VARCHAR2(500),
5 BOARD_TYPE VARCHAR2(500),
6 ROOM_AMT FLOAT,
7 SERVICE_ID NUMBER,
8 CURRENCY_CODE VARCHAR2(10)
9 )
10 /
Table created.
SQL>
SQL> create table LF_HB_ROOMTYPE_PROPERTIES
2 (
3 HOTEL_CODE VARCHAR2(20),
4 ROOM_TYPE VARCHAR2(500),
5 BOARD_TYPE VARCHAR2(500),
6 ROOM_AMT FLOAT,
7 SERVICE_ID NUMBER,
8 CURRENCY_CODE VARCHAR2(10)
9 )
10 /
Table created.
SQL>
SQL> create table LF_GTA_ROOMTYPE_PROPERTIES
2 (
3 HOTEL_CODE VARCHAR2(20),
4 ROOM_TYPE VARCHAR2(500),
5 BOARD_TYPE VARCHAR2(500),
6 ROOM_AMT FLOAT,
7 SERVICE_ID NUMBER,
8 CURRENCY_CODE VARCHAR2(10)
9 )
10 /
Table created.
SQL>
and then the primary key and additional indexes were created:
SQL> alter table lf_hotel_temp
2 add constraint lf_hotel_temp_PK primary key (HOTEL_CODE,service_id)
3 /
Table altered.
SQL>
SQL> create index LF_hb_roomtype_prop_IDX on lf_hb_roomtype_properties (HOTEL_CODE)
2 /
Index created.
SQL>
SQL> create index LF_ts_roomtype_prop_IDX on lf_ts_roomtype_properties (HOTEL_CODE)
2 /
Index created.
SQL>
SQL> create index LF_gta_roomtype_prop_IDX on lf_gta_roomtype_properties (HOTEL_CODE)
2 /
Index created.
SQL>
The original poster reported 278,000 rows in the LF_HOTEL_TEMP table so data was generated to reproduce that number of rows. Next the remaining tables were populated so that none of the three remaining tables shared any data:
SQL> begin
2 for i in 1..278000 loop
3 insert into lf_hotel_temp
4 values(i, mod(i,999)+1);
5 end loop;
6
7 commit;
8 end;
9 /
PL/SQL procedure successfully completed.
SQL>
SQL> insert into lf_ts_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id insert into lf_ts_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id between 313 and 642;
91740 rows created.
SQL> insert into lf_gta_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id between 271 and 823;
153743 rows created.
SQL> insert into lf_hb_roomtype_properties
2 (hotel_code)
3 select hotel_code from lf_hotel_temp
4 where service_id between 571 and 999;
119262 rows created.
SQL> COMMIT;
Commit complete.
SQL>
Let’s look at the original query and its execution plan:
SQL> set autotrace on linesize 140
SQL>
SQL> -- original query
SQL> SELECT a.hotel_code
2 FROM lf_hotel_temp a
3 WHERE a.service_id = : p_service_id
4 AND (NOT EXISTS (SELECT *
5 FROM lf_ts_roomtype_properties b
6 WHERE a.hotel_code = b.hotel_code)
7 or NOT EXISTS (SELECT *
8 FROM lf_gta_roomtype_properties b
9 WHERE a.hotel_code = b.hotel_code)
10 or NOT EXISTS (SELECT *
11 FROM lf_hb_roomtype_properties b
12 WHERE a.hotel_code = b.hotel_code));
HOTEL_CODE
-------------------------------------------------------------------
1998
999
5994
...
243756
235764
238761
278 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 4111332730
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 33 | 512 (4)| 00:00:01 |
|* 1 | FILTER | | | | | |
|* 2 | TABLE ACCESS FULL| LF_HOTEL_TEMP | 278 | 3058 | 94 (16)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | LF_TS_ROOMTYPE_PROP_IDX | 1 | 7 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | LF_GTA_ROOMTYPE_PROP_IDX | 1 | 7 | 1 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | LF_HB_ROOMTYPE_PROP_IDX | 1 | 7 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "LF_TS_ROOMTYPE_PROPERTIES" "B" WHERE
"B"."HOTEL_CODE"=:B1) OR NOT EXISTS (SELECT 0 FROM "LF_GTA_ROOMTYPE_PROPERTIES" "B"
WHERE "B"."HOTEL_CODE"=:B2) OR NOT EXISTS (SELECT 0 FROM "LF_HB_ROOMTYPE_PROPERTIES"
"B" WHERE "B"."HOTEL_CODE"=:B3))
2 - filter("A"."SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
3 - access("B"."HOTEL_CODE"=:B1)
4 - access("B"."HOTEL_CODE"=:B1)
5 - access("B"."HOTEL_CODE"=:B1)
Statistics
----------------------------------------------------------
11 recursive calls
0 db block gets
2077 consistent gets
872 physical reads
0 redo size
4119 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
278 rows processed
SQL>
All but LF_HOTEL_TEMP use the indexes to speed access to the data; this may be due to the fact that LF_HOTEL_TEMP is a two-column table and the priomary key index will be larger than the table since it contains the table data and a rowid for each row. The main idea illustrated by this problem involves being able to find a logically equivalent way of writing the SQL. Jonathan Lewis decided that set operations might produce a ‘better’ plan and re-wrote the query as shown below, producing a plan accessing LF_HOTEL_TEMP by the primary key index:
SQL> -- JL solution
SQL> var p_service_id number
SQL> exec : p_service_id := 1
PL/SQL procedure successfully completed.
SQL>
SQL> select /*+ dynamic_sampling(0) */
2 hotel_code
3 from lf_hotel_temp
4 where service_id = :p_service_id
5 minus (
6 select hotel_code
7 from lf_ts_roomtype_properties
8 where hotel_code is not null
9 intersect
10 select hotel_code
11 from lf_gta_roomtype_properties
12 where hotel_code is not null
13 intersect
14 select hotel_code
15 from lf_hb_roomtype_properties
16 where hotel_code is not null
17 )
18 ;
HOTEL_CODE
---------------------------------------------------------------
100899
101898
102897
...
999
9990
99900
278 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 775735246
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2159 | 4 (100)| 00:00:01 |
| 1 | MINUS | | | | | |
| 2 | SORT UNIQUE NOSORT | | 1 | 2015 | 1 (100)| 00:00:01 |
|* 3 | INDEX FULL SCAN | LF_HOTEL_TEMP_PK | 1 | 2015 | 0 (0)| 00:00:01 |
| 4 | INTERSECTION | | | | | |
| 5 | INTERSECTION | | | | | |
| 6 | SORT UNIQUE NOSORT| | 4 | 48 | 1 (100)| 00:00:01 |
|* 7 | INDEX FULL SCAN | LF_TS_ROOMTYPE_PROP_IDX | 4 | 48 | 0 (0)| 00:00:01 |
| 8 | SORT UNIQUE NOSORT| | 4 | 48 | 1 (100)| 00:00:01 |
|* 9 | INDEX FULL SCAN | LF_GTA_ROOMTYPE_PROP_IDX | 4 | 48 | 0 (0)| 00:00:01 |
| 10 | SORT UNIQUE NOSORT | | 4 | 48 | 1 (100)| 00:00:01 |
|* 11 | INDEX FULL SCAN | LF_HB_ROOMTYPE_PROP_IDX | 4 | 48 | 0 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
filter("SERVICE_ID"=TO_NUMBER(:P_SERVICE_ID))
7 - filter("HOTEL_CODE" IS NOT NULL)
9 - filter("HOTEL_CODE" IS NOT NULL)
11 - filter("HOTEL_CODE" IS NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
5996 consistent gets
2416 physical reads
234680 redo size
4119 bytes sent via SQL*Net to client
570 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
278 rows processed
SQL>
Notice that using set-based processing improved the execution plan. Because of the first item on this list it’s easy to forget that SQL is a set-based implementation; sometimes it takes ‘interesting’ problems to lead us down the set-based pathway.
As the tweet thread progressed Jeff Smith (@thatjeffsmith) offered additional items that, not specific to SQL, are good to remember nonetheless:
4. getting results back doesn't mean your code is right.
5. learning never stops
Number 4 is an important lesson to remember; just because you get data doesn’t necessarily mean that data is right. Many times queries are written and run only to discover that the join conditions aren’t correct or are missing entirely, making the results useless. Always think through the process, and, if possible, manually generate a few rows to know what SHOULD be returned. Comparing those to the query results should show if the logic in the query is correct.
That learning never stops is one reason blogs are written and read; there’s always something new, different, or changed that makes ‘resting on one’s laurels’ nearly impossible, especially in IT. A day doesn’t go by that I don’t learn something, and, to be honest, if that day ever comes I’ll consider it wasted.
There are more items in that list that I won’t mention here but just because they aren’t commented on doesn’t make them any less important. Read through the list (the link provided should bring up the latest chain of tweets) and ponder those responses. It’s time to think about SQL anew, especially if some of these comments weren’t in your idea of what SQL is and what it does. Perspectives change, and ideas along with them, so never stop learning. The more you know, the more you can grow.
Santa isn’t the only one who makes lists, you know.