Oracle Tips and Tricks — David Fitzjarrell

January 14, 2014

You’re Not In The Group

Filed under: General — dfitzjarrell @ 15:59

In yet another Oracle forum (yes, I’m in a number of them) the following question was posed:


I'm reading oracle 9i sql and there is an example of a subquery which I'm trying to write as a group by query.

 select empno, ename, sal
 from emp
 where sal =
 (select min(sal) from emp);

 This works but I don't understand why I can't write this as a group by expression

 select empno, ename, min(sal)
 from emp
 group by empno, ename
 having min(sal) = sal;

 I'd like to understand what is going on inside of oracle that makes the subquery work but the group by fail.

 Thanks.

It is an interesting question, with, I believe, an interesting answer, so let’s dive in and see what makes these two queries ‘tick’.

The first query uses a subquery to return the absolute minimum sal from the emp table:


SQL> select min(sal) from emp;

  MIN(SAL)
----------
       800
SQL>

There is only one reference value for the driving query to use:


SQL> select empno, ename, sal
  2  from emp
  3  where sal =
  4  (select min(sal) from emp);

     EMPNO ENAME             SAL
---------- ---------- ----------
      7369 SMITH             800

SQL>

This is not the case with the initial group by attempt, as it returns every sal value as a minimum due to the group by criteria:


SQL> select empno, ename, min(sal)
  2  from emp
  3  group by empno, ename;

     EMPNO ENAME        MIN(SAL)
---------- ---------- ----------
      7521 WARD             1250
      7959 QUEEN            5000
      7869 JACK             5000
      7566 JONES            2975
      7844 TURNER           1500
      7876 ADAMS            1100
      7499 ALLEN            1600
      7369 SMITH             800
      7782 CLARK            2450
      7839 KING             5000
      7698 BLAKE            2850

     EMPNO ENAME        MIN(SAL)
---------- ---------- ----------
      7900 JAMES             950
      7902 FORD             3000
      7788 SCOTT            3000
      7939 DUKE             5000
      7949 PRINCE           5000
      7654 MARTIN           1250
      7934 MILLER           1300

18 rows selected.

SQL>

[I've added a few rows to the table to help explain another example; for the standard EMP build you should return 14 rows.] Since there is no absolute minimum sal from the group by query, and since a non-grouped column can’t be used as a reference in a HAVING clause:


SQL> select empno, ename, min(sal)
  2  from emp
  3  group by empno, ename
  4  having min(sal) = sal;
having min(sal) = sal
                  *
ERROR at line 4:
ORA-00979: not a GROUP BY expression


SQL>

you will need to re-write the group by query slightly to return the same answer the original query returned:


SQL> select empno, ename, min_sal
  2  from
  3  (select empno, ename, min(sal) min_sal
  4  from emp
  5  group by empno, ename
  6  order by 3)
  7  where rownum = 1;

     EMPNO ENAME         MIN_SAL
---------- ---------- ----------
      7369 SMITH             800

SQL>

Now the group by query works, and, through some ROWNUM magic, returns the same answer as the original, non-group-by query. The key to understanding this is the fact that the group by query did NOT return a single reference value, as min(sal) was evaluated for each of the empno/ename pairs in the table, and all of those pairs are unique. (There would be a real problem in HR if that wasn’t the case for an employee table. True, employee names may not be unique but their employee identifier must be, which makes each pair of values unique, thus the min(sal) returns the sal value for that employee.) The following two queries are equivalent:


SQL> select empno, ename, min(sal)
  2  from emp
  3  group by empno, ename;

     EMPNO ENAME        MIN(SAL)
---------- ---------- ----------
      7521 WARD             1250
      7959 QUEEN            5000
      7869 JACK             5000
      7566 JONES            2975
      7844 TURNER           1500
      7876 ADAMS            1100
      7499 ALLEN            1600
      7369 SMITH             800
      7782 CLARK            2450
      7839 KING             5000
      7698 BLAKE            2850

     EMPNO ENAME        MIN(SAL)
---------- ---------- ----------
      7900 JAMES             950
      7902 FORD             3000
      7788 SCOTT            3000
      7939 DUKE             5000
      7949 PRINCE           5000
      7654 MARTIN           1250
      7934 MILLER           1300

18 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4067220884

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    18 |   594 |     4  (25)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    18 |   594 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| EMP  |    18 |   594 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1195  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         18  rows processed

SQL>
SQL> select empno, ename, sal
  2  from emp;

     EMPNO ENAME             SAL
---------- ---------- ----------
      7369 SMITH             800
      7499 ALLEN            1600
      7521 WARD             1250
      7566 JONES            2975
      7654 MARTIN           1250
      7698 BLAKE            2850
      7782 CLARK            2450
      7788 SCOTT            3000
      7839 KING             5000
      7844 TURNER           1500
      7876 ADAMS            1100

     EMPNO ENAME             SAL
---------- ---------- ----------
      7900 JAMES             950
      7902 FORD             3000
      7934 MILLER           1300
      7939 DUKE             5000
      7949 PRINCE           5000
      7959 QUEEN            5000
      7869 JACK             5000

18 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3956160932

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    18 |   594 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| EMP  |    18 |   594 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          4  recursive calls
          0  db block gets
         17  consistent gets
          0  physical reads
          0  redo size
       1196  bytes sent via SQL*Net to client
        531  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         18  rows processed

SQL>

Notice that the second, non-group-by query is slightly more efficient (as evidenced by the COST column in both plan outputs). Had there been more than one record with the same empno/ename combination, each with a different value for sal, the group by query would, indeed, return a smaller result set. It still wouldn’t return the single row the original query reports, at least not without using order by in an in-line view and ROWNUM like the ‘fixed’ example illustrates.

Understanding the data in a table is key to understanding the queries that return data from it. It wasn’t the query, but the data, that caused a correctly written (meaning no syntax or logic errors) group by query to, well, fail. And understanding what the group by query DOES return allows you to make adjustments in coding and logic which, in turn, results in the correct result being displayed.

Now, let’s rejoin the group.

About these ads

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 665 other followers

%d bloggers like this: