Oracle Tips and Tricks — David Fitzjarrell

July 28, 2022

It’s Magic?

Filed under: General — dfitzjarrell @ 16:45

Space may not be the final frontier, but it is one DBAs need to navigate. Oracle does make the task somewhat easier with locally managed tablespaces, and one aspect of that management is the Space Management Coordinator process (SMCO). This process, introduced in version 11.2 and present in all releases since, does much to help manage space in tablespaces, performing several tasks DBAs should be aware of. Let’s look at that process and what it covers.

The SMCO process coordinates the following space management tasks:

  • Performs proactive space allocation and space reclamation
  • Dynamically spawns the secondary worker processes (Wnnn) that space reclamation requires
  • Performs tablespace-level space (extent) pre-allocation.
    Pre-allocation refers to datafile extension, which occurs when a space request (extent allocation) is triggered by inserts or loads into a segment and contiguous space is not currently available in the tablespace. A file extension
    request is initiated and the file is then incremented by the next defined allocation size set for the datafile.

This, of course, requires that datafiles are set to autoextend for the tablespace of interest. SMCO expands the tablespace based on the history of extension activity. Such extensions are split evenly across all datafiles in the tablespace which have not reached their maxsize setting, and are limited to 10% of the full tablespace size per hourly SMCO run. As a reminder, the full tablespace size is the total space across all datafiles in the tablespace at the time the action is taken.
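Which datafiles can actually benefit from this is easy to verify; a query against DBA_DATA_FILES such as the sketch below reports each file’s autoextend status and limits (the tablespace name APP_DATA is a placeholder for the tablespace of interest):

```sql
--
-- Report autoextend status for each datafile in a tablespace;
-- INCREMENT_BY is stored in blocks, so convert it using the
-- database block size
--
select file_name,
       autoextensible,
       bytes/1048576 cur_mb,
       increment_by *
         (select value from v$parameter
          where name = 'db_block_size')/1048576 next_mb,
       maxbytes/1048576 max_mb
from   dba_data_files
where  tablespace_name = 'APP_DATA'
order by file_name;
```

Files reporting AUTOEXTENSIBLE = 'NO', or already at their MAX_MB, are skipped when SMCO splits an extension across the tablespace.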

In addition to the above-mentioned operations the SMCO process is also responsible for the following:

  • Updating block and extent counts in the SEG$ data dictionary table for locally managed tablespaces after adding an extent.
  • SecureFile LOB segment pre-extension.
  • SecureFile LOB segment in-memory dispenser space pre-allocation.
  • SecureFile LOB segment space reclamation (moving free chunks from the uncommitted free space area to the committed free space area).
  • Temporary segment space reclamation.

Although the SMCO process starts at database startup the actual work is done via a related module. KTSJ is the space coordinator module, managing the Wnnn processes that actually perform the work. Querying V$SESSION from a typical running database we see:

SQL> select program, module from v$session
  2  where module = 'KTSJ'
  3  order by 1;

PROGRAM                                          MODULE
------------------------------------------------ ----------------------------------------------------------------
oracle@ip-1-2-3-4 (SMCO)                         KTSJ
oracle@ip-1-2-3-4 (W000)                         KTSJ
oracle@ip-1-2-3-4 (W001)                         KTSJ
oracle@ip-1-2-3-4 (W002)                         KTSJ
oracle@ip-1-2-3-4 (W003)                         KTSJ
oracle@ip-1-2-3-4 (W004)                         KTSJ
oracle@ip-1-2-3-4 (W005)                         KTSJ
oracle@ip-1-2-3-4 (W006)                         KTSJ
oracle@ip-1-2-3-4 (W007)                         KTSJ
oracle@ip-1-2-3-4 (W008)                         KTSJ
oracle@ip-1-2-3-4 (W009)                         KTSJ
oracle@ip-1-2-3-4 (W00A)                         KTSJ
oracle@ip-1-2-3-4 (W00B)                         KTSJ
oracle@ip-1-2-3-4 (W00C)                         KTSJ
oracle@ip-1-2-3-4 (W00D)                         KTSJ
oracle@ip-1-2-3-4 (W00E)                         KTSJ
oracle@ip-1-2-3-4 (W00F)                         KTSJ

17 rows selected.

Sixteen Wnnn processes are spawned from SMCO via KTSJ, and it’s the KTSJ module name that is reported in AWR and ASH reports. This can appear in the following sections of an AWR report:

Latch Miss Sources DB/Inst:BOB/BOB Snaps: 2691-2692
-> only latches with sleeps are shown
-> ordered by name, sleeps desc

Latch Name               Where                       Misses     Sleeps   Sleeps
------------------------ -------------------------- ------- ---------- --------
ksuosstats global area   ksugetosstat                     0         14       14
space background task la ktsj_grab_task                   0      2,455    4,723
space background task la ktsj_detach_task                 0      2,270        6
space background task la ktsj_smco_purgeexpired           0          4        0
unknown latch            kghalo                           0          6        3
unknown latch            kghfrunp: alloc: wait            0          1        0
unknown latch            kghupr1                          0          1        3
                          ------------------------------------------------------

<edited>

Enqueue Type (Request Reason)
------------------------------------------------------------------------------
    Requests    Succ Gets Failed Gets       Waits  Wt Time (s) Av Wt Time(ms)
------------ ------------ ----------- ----------- ------------ --------------
CR-Reuse Block Range (block range reuse ckpt)
          40           40           0           4            0            .00
JG-Job Scheduler1
     189,060      189,060           0           0            0            N/A
JG-Job Scheduler1 (queue lock)
     189,060      189,060           0           0            0            N/A
JG-Job Scheduler1 (q mem clnup lck)
      18,004       18,004           0           0            0            N/A
CF-Controlfile Transaction
       4,462        4,462           0           0            0            N/A
SJ-KTSJ Slave Task Cancel (Slave Task Cancel)
       2,518        2,520           0           0            0            N/A

<edited>


The SMCO process polls for work every hour, so the sleeps are expected. It hands off the work to KTSJ, which then coordinates work between the 16 Wnnn slaves so tablespace space can be managed efficiently. [Other databases may show more Wnnn slaves, some fewer, depending upon the configuration; 16 is what the database used for this example configured.]

As noted earlier these management tasks include pre-allocation of extents and cleanup of temporary extents. Such ‘temporary’ extents can be due to table/index creation (each table or index extent is considered temporary until the creation task has completed), and they reveal themselves in the DBA_EXTENTS view while tables or indexes are being created: the segment_name appears as a numeric value identifying the file and block address of the segment header. Once the operation successfully completes that value is replaced with the actual name the object has been given. Should the operation fail, the KTSJ/Wnnn processes perform the necessary cleanup, recovering the extents allocated for the object that no longer exists. This cleanup occurs rapidly, returning the blocks to the freelists so other sessions/processes can use them if necessary.
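Those in-flight ‘temporary’ extents can be watched with a query along these lines while a large CREATE TABLE or CREATE INDEX is running (again, APP_DATA is a placeholder tablespace name):

```sql
--
-- List extents still registered as TEMPORARY; during a long-running
-- object creation the segment_name shows the numeric file.block
-- address of the segment header rather than the object name
--
select owner, segment_name, segment_type, extent_id, blocks
from   dba_extents
where  segment_type = 'TEMPORARY'
and    tablespace_name = 'APP_DATA'
order by segment_name, extent_id;
```

Run the same query after the creation completes (or fails) and the rows will have either taken on the final object name or disappeared entirely, courtesy of the KTSJ/Wnnn cleanup.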

Notice the “grab_task” and “detach_task” latches; these, as would be expected, allow KTSJ and the Wnnn slaves to take on and complete the storage management operations required for pre-allocation of extents. As this process is designed to be proactive rather than reactive, when Oracle detects the need for additional space in a given tablespace the KTSJ/Wnnn “team” begins allocating the anticipated additional extents based upon the NEXT setting for the datafile or datafiles in question. Being proactive makes the space additions more fluid, reducing any wait time by creating these extents before Oracle has need for them. Remember that Oracle must allocate and format new storage extents before they are available for use; the SMCO/KTSJ/Wnnn architecture streamlines that process by allowing Oracle to allocate and format such storage before it’s actually needed. It may be that the only indication the DBA sees for autoextend activities is the presence of KTSJ entries in an AWR or ASH report.

It is possible to turn off this process, or change its behavior, by setting an initialization parameter, “_enable_space_preallocation”, to one of the following values:

  • 0 to turn off the tablespace pre-extension feature.
  • 1 to enable tablespace extension.
  • 2 to enable segment growth.
  • 3 to enable both 1 and 2.
  • 4 to enable chunk allocation.

As it is a dynamic parameter it can be set using an alter system call and no database restart is required. To turn the process off:

ALTER SYSTEM SET "_ENABLE_SPACE_PREALLOCATION" = 0;

By setting “_enable_space_preallocation”=3 (the default value) the expected behavior is restored. It goes without saying that available space in autoextend datafiles must still be managed by the DBA, and when that space is running low additional datafiles will need to be added manually. The SMCO mechanism can do many things, but expanding tablespace storage isn’t one of them.
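The current setting of a hidden parameter isn’t visible through SHOW PARAMETER, but it can be checked with a query like the following sketch against the X$ fixed tables (which requires connecting as SYS):

```sql
--
-- Display the current value of _enable_space_preallocation and
-- whether it is still at its default; must be run as SYS since
-- the X$ tables are not exposed to other accounts
--
select i.ksppinm parameter,
       v.ksppstvl value,
       v.ksppstdf is_default
from   x$ksppi i, x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm = '_enable_space_preallocation';
```

A value of 3 with IS_DEFAULT reporting TRUE confirms the database is running with the out-of-the-box behavior.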

Tablespace space management for autoextend datafiles has been given a helping hand with the SMCO process by introducing more efficiency in the allocation mechanism. Pre-allocation of extents keeps data flowing smoothly during times of heavy insert and update activity. It’s like having a crystal ball for database storage.

Without the spirits.

July 22, 2022

“We Need An Exterminator???”

Filed under: General — dfitzjarrell @ 07:35

An interesting event can occasionally appear in an ASH report, an event that sounds ominous:

buffer exterminate

Thankfully it’s not a cause for concern. Let’s discuss the automatic memory options Oracle provides to see how such an event can occur.

Oracle offers two automatic memory options — automatic memory management (or AMM) and automatic shared memory management (or ASMM). Both options allow Oracle to self-manage memory allocations within the confines of initialization parameter settings. AMM is configured using the memory_* parameters, but is not available on Linux systems configured to use hugepages because AMM allocates its memory through /dev/shm, which hugepages does not support. ASMM, configured with the sga_* parameters, is available regardless of hugepages settings and restricts the memory management activities to the various SGA components. Regardless of which option is in effect, if Oracle is managing its own memory the above event can occur; automatic memory management of some type is the source of the activities that generate it.

What, then, triggers this event? Oracle’s automatic management of memory areas can shift available memory resources between the various pools, and when one pool needs an increase another pool, usually a buffer pool, decreases. When the default buffer pool needs to shrink Oracle starts the deallocation process for that pool. This requires that all buffers in the affected part of the cache be dereferenced, so that Oracle no longer finds those block addresses there and the memory can be assigned to a different area. The only thing affected is Oracle’s ability to perform logical reads on those blocks; the next time the block addresses for the affected objects are accessed they will be placed into the resized buffer cache and logical reads are restored. It’s a small performance hit while memory areas are resized to meet demand.
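The resize history that drives these shifts can be reviewed directly; a query along these lines shows which components grew and which shrank, and when:

```sql
--
-- Recent SGA resize operations; SHRINK entries for the
-- DEFAULT buffer cache line up with 'buffer exterminate'
-- activity in the ASH report
--
select component, oper_type, oper_mode,
       initial_size, target_size, final_size,
       to_char(start_time, 'DD-MON-YYYY HH24:MI:SS') started
from   v$sga_resize_ops
order by start_time;
```

Matching the STARTED timestamps of DEFAULT buffer cache SHRINK operations against the ASH reporting window confirms whether a resize was responsible for the event.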

As mentioned it’s most likely the default buffer pool that is being reduced. Looking at the various dynamic SGA components we see:
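The listing below can be produced with a query like this against V$SGA_DYNAMIC_COMPONENTS (SQL*Plus column formatting omitted):

```sql
--
-- Current state of the dynamic SGA components, including the
-- last resize operation performed on each
--
select component, current_size, min_size, max_size,
       user_specified_size, oper_count,
       last_oper_type, last_oper_mode,
       to_char(last_oper_time, 'DD-MON-YYYY HH24:MI:SS') last_oper_time,
       granule_size
from   v$sga_dynamic_components;
```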


COMPONENT                                                        CURRENT_SIZE   MIN_SIZE   MAX_SIZE USER_SPECIFIED_SIZE OPER_COUNT LAST_OPER_TYP LAST_OPER LAST_OPER_TIME                GRANULE_SIZE
---------------------------------------------------------------- ------------ ---------- ---------- ------------------- ---------- ------------- --------- ----------------------------- ------------
shared pool                                                        1912602624 1644167168 1912602624                   0          6 GROW          DEFERRED  21-JUL-2022 10:01:34              33554432
large pool                                                          369098752   67108864 3523215360                   0        245 GROW          IMMEDIATE 21-JUL-2022 10:58:00              33554432
java pool                                                            33554432   33554432   67108864                   0          1 SHRINK        DEFERRED  16-JUL-2022 19:57:14              33554432
streams pool                                                        100663296  100663296  100663296           100663296          0 STATIC                                                    33554432
DEFAULT buffer cache                                               1.0402E+10 7247757312 1.0972E+10                   0        258 SHRINK        IMMEDIATE 21-JUL-2022 10:58:00              33554432
KEEP buffer cache                                                           0          0          0                   0          0 STATIC                                                    33554432
RECYCLE buffer cache                                                        0          0          0                   0          0 STATIC                                                    33554432
DEFAULT 2K buffer cache                                                     0          0          0                   0          0 STATIC                                                    33554432
DEFAULT 4K buffer cache                                                     0          0          0                   0          0 STATIC                                                    33554432
DEFAULT 8K buffer cache                                                     0          0          0                   0          0 STATIC                                                    33554432
DEFAULT 16K buffer cache                                                    0          0          0                   0          0 STATIC                                                    33554432
DEFAULT 32K buffer cache                                                    0          0          0                   0          0 STATIC                                                    33554432
Shared IO Pool                                                              0          0          0                   0          6 GROW          IMMEDIATE 15-JUL-2022 20:07:30              33554432
ASM Buffer Cache                                                            0          0          0                   0          0 STATIC                                                    33554432


The last operation to occur on the shared pool and large pool was a GROW, resulting in a SHRINK operation for the default buffer pool. This SHRINK triggered the buffer exterminate as memory was shifted to the shared pool and large pool. An ASH report will report this in multiple areas:

Top User Events           DB/Inst: V33PRAP2/v33prap22  (Jul 21 08:41 to 10:41)

                                                               Avg Active
Event                               Event Class        % Event   Sessions
----------------------------------- --------------- ---------- ----------
direct path read                    User I/O             52.44      13.94
CPU + Wait for CPU                  CPU                  14.44       3.84
db file sequential read             User I/O              6.75       1.80
log file sync                       Commit                5.74       1.53
buffer exterminate                  Other                 4.16       1.11
          -------------------------------------------------------------
...

Top Event P1/P2/P3 Values DB/Inst: V33PRAP2/v33prap22  (Jul 21 08:41 to 10:41)

Event                          % Event  P1 Value, P2 Value, P3 Value % Activity
------------------------------ ------- ----------------------------- ----------
Parameter 1                Parameter 2                Parameter 3
-------------------------- -------------------------- --------------------------
direct path read                 52.44         "15","123891968","16"       0.05
file number                first dba                  block cnt

db file sequential read           6.79                 "1","371","1"       0.01
file#                      block#                     blocks

log file sync                     5.74          "2","3547662223","0"       0.01
buffer#                    sync scn                   NOT DEFINED

buffer exterminate                4.25 "16","200643500","9428339040"       0.27
file#                      block#                     buf_ptr

db file async I/O submit          3.36                  "15","0","0"       1.49
requests                   interrupt                  timeout

          -------------------------------------------------------------
...

Top Client IDs            DB/Inst: V33PRAP2/v33prap22  (Jul 21 08:41 to 10:41)

08:50:00  (10.0 min)    1,506 direct path read                    845    4.41
                              CPU + Wait for CPU                  226    1.18
                              log file sync                       192    1.00
09:00:00  (10.0 min)    1,550 direct path read                    864    4.51
                              CPU + Wait for CPU                  251    1.31
                              db file sequential read             152    0.79
09:10:00  (10.0 min)    1,600 direct path read                    801    4.18
                              CPU + Wait for CPU                  277    1.45
                              db file sequential read             181    0.95
09:20:00  (10.0 min)    1,588 direct path read                    891    4.65
                              CPU + Wait for CPU                  225    1.18
                              enq: KO - fast object checkpoi      161    0.84
09:30:00  (10.0 min)    1,510 direct path read                    901    4.71
                              CPU + Wait for CPU                  233    1.22
                              db file sequential read             103    0.54
09:40:00  (10.0 min)    1,735 direct path read                  1,096    5.73
                              CPU + Wait for CPU                  209    1.09
                              db file sequential read             148    0.77
09:50:00  (10.0 min)    1,523 direct path read                    950    4.96
                              CPU + Wait for CPU                  225    1.18
                              db file sequential read              84    0.44
10:00:00  (10.0 min)    1,380 direct path read                    724    3.78
                              CPU + Wait for CPU                  253    1.32
                              db file sequential read             129    0.67
10:10:00  (10.0 min)    1,332 direct path read                    555    2.90
                              CPU + Wait for CPU                  202    1.06
                              free buffer waits                   184    0.96
10:20:00  (10.0 min)    1,540 direct path read                    682    3.56
                              buffer exterminate                  307    1.60
                              CPU + Wait for CPU                  275    1.44
10:30:00  (10.0 min)    2,395 direct path read                    975    5.09
                              buffer exterminate                  454    2.37
                              free buffer waits                   368    1.92
10:40:00   (1.8 min)      319 direct path read                    138    0.72
                              direct path write temp               64    0.33
                              CPU + Wait for CPU                   61    0.32
          -------------------------------------------------------------

Of all the areas where this is reported, it’s the second section shown here that provides some detail into which datafile and block number were affected. Given these two pieces of data the DBA can determine which object was affected, although it may take a bit of effort to narrow the search criteria. Since the reported block id may not be listed in the DBA_EXTENTS view (block_id in that view reports the initial block of a given extent) the DBA may need to adjust the boundary values for the BETWEEN operator to narrow the search. Patience is a virtue; for the data posted here the final query became:

select segment_name, block_id from dba_extents e, dba_data_files d
where d.file_id = 16
and e.tablespace_name = d.tablespace_name
and e.block_id between 200635000
and 200643500
/

The affected block was associated with an often accessed index in the database, and was replaced in the buffer cache at its next access. Only buffered block reads are affected when this occurs, and the performance ‘hit’ should be negligible.

Seeing a “buffer exterminate” event in an ASH report isn’t as dire as it may first appear; it’s basically Oracle reorganizing the SGA dynamic memory areas to expand one while reducing another. Oftentimes such events go unreported in the ASH output because they occur outside of the requested reporting window. Don’t be alarmed if one does “pop up”; know it’s Oracle doing its job of managing memory so you don’t have to.

Rest assured, this isn’t a bug.
