Oracle Tips and Tricks — David Fitzjarrell

September 12, 2023

Externally yours

Filed under: General — dfitzjarrell @ 13:53

Every so often it may be necessary to provide data that can change on a regular basis. Unfortunately, for a “standard” table this can be complicated and tedious, especially if the definition changes. Oracle provides a somewhat unique solution with the external table. Let’s see what that is, how to create one, and how relatively simple it can be to alter the definition and load data.

An external table takes advantage of a tool named SQL*Loader, Oracle’s bulk data loading utility for flat files. Shown below is a script to create just such a table:

CONNECT / AS SYSDBA;
drop table ch33.sec_audit_ext_tbl;
-- create the external table
CREATE TABLE ch33.sec_audit_ext_tbl
(audit_dt varchar2(20),
db varchar2(12),
emp_name varchar2(80),
email varchar2(80),
acct_status varchar2(24),
priv_granted varchar2(45)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_pump_dir
ACCESS PARAMETERS
(
records delimited by newline
badfile data_pump_dir:'loader_tst%a_%p.bad'
logfile data_pump_dir:'loader_tst%a_%p.log'
fields terminated by ';'
missing field values are null
( audit_dt, db, emp_name, email, acct_status, priv_granted
)
)
LOCATION ('user_privs_semicolon.lst')
)
PARALLEL
REJECT LIMIT UNLIMITED;
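For reference, a few lines of user_privs_semicolon.lst might look like the following. The values shown are invented purely for illustration; the actual extract format depends on the audit process producing the file:

```
01-SEP-2023 10:15;PROD1;Smith, John;jsmith@example.com;OPEN;SELECT ANY TABLE
01-SEP-2023 10:15;PROD1;Jones, Mary;mjones@example.com;LOCKED;CREATE SESSION
```

Each semicolon-separated value maps, in order, to a column in the field list of the access parameters.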

It looks … interesting and possibly confusing, so let’s discuss each part in some detail. It starts, as any other table creation script, with the create table statement, including column names and data types:

CREATE TABLE ch33.sec_audit_ext_tbl
(audit_dt varchar2(20),
db varchar2(12),
emp_name varchar2(80),
email varchar2(80),
acct_status varchar2(24),
priv_granted varchar2(45)
)

From here it differs, as it uses features found in SQL*Loader. The next section declares the table as external and provides the information the utility needs to build and populate the table, such as record delimiters, column delimiters, the directory where the source flat file is located, and files to log the build and record bad rows from the source file. Explaining this section further:

 ORGANIZATION EXTERNAL                                                                  <== Declare this an external table
 ( 
   TYPE ORACLE_LOADER                                                                   <== Use SQL*Loader to read and populate data
   DEFAULT DIRECTORY data_pump_dir                                                      <== Oracle directory where files will be read and written
   ACCESS PARAMETERS 
   ( 
     records delimited by newline                                                       <== Newlines define records
     badfile data_pump_dir:'loader_tst%a_%p.bad'                                        <== File where bad records will be recorded    
     logfile data_pump_dir:'loader_tst%a_%p.log'                                        <== Log file
     fields terminated by ';'                                                           <== Column values are separated with semicolons
     missing field values are null                                                      <== "Missing" values are considered NULL
     ( audit_dt, db, emp_name, email, acct_status, priv_granted                         <== columns to populate in each row
     ) 
   ) 
   LOCATION ('user_privs_semicolon.lst')                                                <== File name to load
 ) 
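The DATA_PUMP_DIR directory object usually exists by default in an Oracle database. If a dedicated directory is preferred, it can be created and granted in the usual way; the directory name, OS path, and grantee below are examples, not values from the script above:

```sql
-- create a directory object pointing at the OS location of the flat files
CREATE OR REPLACE DIRECTORY ext_data_dir AS '/u01/app/oracle/ext_data';

-- allow the owning schema to read source files and write log/bad files
GRANT READ, WRITE ON DIRECTORY ext_data_dir TO ch33;
```

The DEFAULT DIRECTORY clause (and the directory prefixes on badfile/logfile) would then reference ext_data_dir instead of data_pump_dir.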

Next is whether to use serial or parallel processing, followed by the reject limit to reach before failing the table creation. Since SQL*Loader is the tool of choice it will follow the usual rules for bulk data loading.
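REJECT LIMIT UNLIMITED accepts any number of bad records; a stricter variant might run serially and cap the rejects instead. This is a sketch of just those closing clauses, and the limit of 100 is an arbitrary example:

```sql
-- serial access; fail if more than 100 source records are rejected
NOPARALLEL
REJECT LIMIT 100;
```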

As this table relies upon the flat file used as the source, the data can be reloaded by simply modifying the file specified in the LOCATION parameter. Using this example, let’s assume there are two possible flat files:

   user_privs_semicolon.lst  
   user_privs_semicolon2.lst  

This table can use one, the other, or both files as the data source. Obviously, changing the file name is the first way to change the table data loaded. The second is to add file names, enclosed in single quotes and separated by commas, to load multiple files at the same time. To make this a bit clearer, presume a numeric date is used as part of the file name:

   user_privs_semicolon_01012023.lst  
   user_privs_semicolon_01312023.lst  

Now we know the extract dates for this data and we can load either file individually or load them both. If the data changes regularly there may be weekly extracts:

   user_privs_semicolon_01012023.lst  
   user_privs_semicolon_01082023.lst  
   user_privs_semicolon_01152023.lst  
   user_privs_semicolon_01222023.lst  
   user_privs_semicolon_01292023.lst  
  ...

Given that frequency of file creation it’s possible to load data for roughly an entire month by defining LOCATION as follows:

   LOCATION ('user_privs_semicolon_01012023.lst',
             'user_privs_semicolon_01082023.lst',
             'user_privs_semicolon_01152023.lst',
             'user_privs_semicolon_01222023.lst',
             'user_privs_semicolon_01292023.lst')

Presuming no errors occur during the load phase of the external table creation all of the rows will appear in the external table. Changing the data is a matter of dropping the current table definition, editing the script to change flat files and executing the modified script.
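Dropping and re-creating the table works, and Oracle also allows the file list of an existing external table to be swapped in place with ALTER TABLE, avoiding a full re-run of the script. A sketch, using the file names from the example above:

```sql
-- point the existing external table at a different week's extract
ALTER TABLE ch33.sec_audit_ext_tbl
  LOCATION ('user_privs_semicolon_01082023.lst');
```

The next query against the table reads the newly specified file.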

External tables build quickly, yet return data a bit more slowly than a conventional table the first time they are accessed, due to the nature of SQL*Loader. Once a session has accessed the table no additional loading is necessary and results return with the same speed as data from permanent, physical tables. As long as the user maintains his or her current session, queries will be speedy. Should the session be disconnected due to inactivity or a network interruption, the table will need to be reloaded. As each session gets a memory-resident copy of the data, many users can access the same data without any impact to query speed.
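Once created, the table is queried like any other. A simple sketch; the 'OPEN' status value is assumed from a typical DBA_USERS-style extract:

```sql
-- external tables are read with ordinary SELECT statements
SELECT db, emp_name, priv_granted
FROM   ch33.sec_audit_ext_tbl
WHERE  acct_status = 'OPEN';
```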

Just as simply, the external table definition can change: if fields are added to or removed from the extract, simple edits to the source script can add or remove columns. Since external tables load the definition into memory they consume no physical storage and leave no “bodies” behind when dropped. For example, if the Oracle username is to be included in the next extract the script would be changed to:

CONNECT / AS SYSDBA;
drop table ch33.sec_audit_ext_tbl;
-- create the external table
CREATE TABLE ch33.sec_audit_ext_tbl
(audit_dt varchar2(20),
db varchar2(12),
username varchar2(35),
emp_name varchar2(80),
email varchar2(80),
acct_status varchar2(24),
priv_granted varchar2(45)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY data_pump_dir
ACCESS PARAMETERS
(
records delimited by newline
badfile data_pump_dir:'loader_tst%a_%p.bad'
logfile data_pump_dir:'loader_tst%a_%p.log'
fields terminated by ';'
missing field values are null
( audit_dt, db, username, emp_name, email, acct_status, priv_granted
)
)
LOCATION ('user_privs_semicolon.lst')
)
PARALLEL
REJECT LIMIT UNLIMITED;

The column is added to the definition and to the list of columns to populate on load. The table is now changed to include the Oracle username value and the column is located exactly where the developer wants it. It really is that simple.

Creating and using external tables is a fairly simple operation once the basic structure of the external table creation command is understood. Simple edits make it convenient to change the table structure, the data loaded, or both, allowing the user to tailor the table or tables as the data and/or query requirements dictate.

Data in external tables cannot be updated using the traditional “UPDATE schplitzenflibber …” syntax as they are read-only objects. Updating data requires that the source file be edited and the table rebuilt/reloaded. This, of course, ensures that the data in an external table cannot be inadvertently changed by an errant insert/update/delete statement:

33prap22(uat001-wl-oradb): SYS > update ch33.sec_audit_ext_tbl set db = 'SNARD';
update ch33.sec_audit_ext_tbl set db = 'SNARD'
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table

Inserts and deletes will be met with the same error, making external tables ideal for storing reference data that cannot be changed by the end user.

Trailing spaces in fields are not removed by a VARCHAR2 declaration, so such spaces must be removed. Edits to a source file can be made after the table is created and will be reflected in the output the next time the table is queried.
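One way to handle this at load time is the trim specification the ORACLE_LOADER access driver supports; LRTRIM strips leading and trailing spaces from each field. A sketch of just the relevant access-parameters clause from the script above:

```sql
fields terminated by ';' lrtrim
```

With this in place the flat file itself does not need to be cleaned of padding spaces.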

External tables aren’t for every situation, but they can be extremely handy when reference data changes frequently and the data is not to be updated by end users. They may take some practice to set up, but they load quickly, can be modified with little effort, occupy no physical storage, and the data can’t be changed by conventional insert/update/delete statements. One downside to external tables is that the data cannot be encrypted: no tablespace contains them, so TDE can’t be used. Oracle Support reports that there is no method to encrypt external table data, so external tables are not a good choice when PHI and/or PII is involved. However, since any PII or PHI can be stored in a conventional table using TDE and encrypted tablespaces, external tables are well-suited for non-sensitive lookup data that references those PII/PHI values.

External tables can be very useful when it’s necessary to augment table data with values not otherwise stored in the database. They can eliminate the tedious task of matching values from flat file to database columns by putting such data into a form SQL*Plus can access, saving time, energy and the patience of the end user. And, the more the end users can do for themselves the less likely they are to go to the DBA. It’s a win/win situation.

And we all like winning.
