Oracle Tips and Tricks — David Fitzjarrell

July 12, 2017

Latch, What Latch?!?

Filed under: General — dfitzjarrell @ 13:04

"I know one thing for certain; it is much harder to tell whether you are lost than whether you were lost, 
for, on many occasions, where you are going is exactly where you are. On the other hand, if you often find
that where you've been is not at all where you should have gone, and, since it's much more difficult to
find your way back from someplace you've never left, I suggest you go there immediately and then decide."
-- Norton Juster, The Phantom Tollbooth

Latches can be, well, confusing. Sessions can wait for them, and wait for them, and when those waits become excessive (kind of like at the doctor’s office) performance degrades. To many they appear to be the same as locks, but they aren’t, even though lock waits, when the times get excessive, can also degrade performance. So, what are the differences between latches and locks? Let’s examine both and see where they differ, what can cause latch waits and how latch waits can be investigated.

Latches are similar to locks but they operate on memory to protect code and internal data structures by preventing concurrent access. For example the LRU latches are used when managing the buffer cache, an operation that is restricted to being run by a single process; other processes must wait for the current process to complete its actions on the buffer cache before the next one in line can proceed. The latch holds this structure for the current process to access; when the current process is done the latch is released and the next process in the queue can acquire it. Since only one process at a time can hold a latch ‘popular’ actions can create a long line of processes waiting to obtain the associated latch. Normally latch waits are in milliseconds or fractions of milliseconds so waiting for a latch usually isn’t noticeable.

Let’s look at the differences between a lock and a latch:

    Locks

  • Protect the logical contents of the database object (table, index) from other transactions.
  • Are held for the transaction duration.
  • Provide rollback capability for the associated transaction.
  • Latches

  • Protect the critical sections of the associated internal data structures from other threads.
  • Are held only until the operation completes and then are released.
  • Prevent concurrent access to a memory structure.

Since memory operations aren’t transactions the ability to ‘rollback’ a change isn’t necessary; clearing a buffer so another process can use it is a fast, irreversible operation that isn’t designed to be undone. Contrast that to a lock, which controls access to change data in a table; such actions are transactional in nature and are designed with the ability to undo the changes should the need arise. Locks can also protect a data image while it’s being read which gives rise to the different types of locks that are available — shared, row shared, row shared exclusive, exclusive — that govern who and how the data can be modified. Without exception every lock in an Oracle database allows other sessions to read data in the locked table; the locks are there to control concurrent updates and protect against the possibility of ‘lost’ transactions.

Latches can indicate the existence of a locking problem, as the ‘cursor pin s wait on x’ latch illustrates. This latch, when its wait times become long, usually indicates an issue with a ‘hot block’, a data or index block that is so popular everyone and their Aunt Gertrude are waiting for it. Long waits for this latch usually accompany concurrent sessions inserting data into a table using a sequence; the sequence values will be clustered together and will usually access the same data (for direct-path inserts) or index block and can cause sessions to wait until the current process completes. And since the latch allows one, and only one, process access to it at a time the rest of the interested transactions (and their processes) form a long queue, sort of like a really busy day at the amusement park. In such cases addressing the locking problem also addresses the latching problem.

Various tools are available to investigate latching issues; Tanel Poder has written two scripts, latchprof.sql and latchprofx.sql, that provide a wealth of information about latch wait history. They can be invaluable in investigating latch waits and contention. An example of the output that can be generated is shown below:


-- LatchProfX 2.00 by Tanel Poder ( http://www.tanelpoder.com )

  SID NAME                                HMODE        FUNC                                             OBJECT SQLID               Held       Gets  Held %     Held ms Avg hold ms
----- ----------------------------------- ------------ ---------------------------------------- -------------- ------------- ---------- ---------- ------- ----------- -----------
  144 transaction allocation              exclusive    ktcxbr                                                0 9brz3jsxmf4ts        788         57   78.80     267.920       4.700
  144 DML lock allocation                 exclusive    ktadmc                                                0 9brz3jsxmf4ts        609         59   60.90     207.060       3.509
 1168 shared pool                         exclusive    kghalo                                                0 0zmr8ccx3xpck        177          5   17.70      60.180      12.036
 1562 cache buffers lru chain             exclusive    kcbzgws                                               0 8n2dsqpu94h4a        119          7   11.90      40.460       5.780
 1562 cache buffers chains                exclusive    kcbgtcr: slow path excl                        2403D825 8n2dsqpu94h4a         90          1    9.00      30.600      30.600
 1562 cache buffers lru chain             exclusive    kcbzgws                                               0 gt8xcw52s32ha         40         16    4.00      13.600        .850
 2794 redo copy                           exclusive    kcrfw_redo_gen: nowait                                0 4vs91dcv7u1p6         38          2    3.80      12.920       6.460
 2328 KTILM latch                         shared       ktilmht get                                           0 bnqcqzztx3a41         37          4    3.70      12.580       3.145
 2794 sim partition latch                 exclusive    kcbm_simulate                                         0 4vs91dcv7u1p6         37          1    3.70      12.580      12.580
 1562 KTILM latch                         shared       ktilmht get                                           0 8n2dsqpu94h4a         30          6    3.00      10.200       1.700
 2328 cache buffers chains                exclusive    kcbgcur: fast path excl                          87316B 9zg9qd9bm4spu         26          1    2.60       8.840       8.840
 2412 row cache objects                   exclusive    kqrso                                                 0                       24         15    2.40       8.160        .544
 1413 cache buffers chains                exclusive    kcbgcur: fast path excl                          873294 9zg9qd9bm4spu         22          1    2.20       7.480       7.480
 2412 row cache objects                   exclusive    kqrpre: find obj                                      0                       21         11    2.10       7.140        .649
 1168 row cache objects                   exclusive    kqrpre: find obj                                      0 0zmr8ccx3xpck         20         10    2.00       6.800        .680
 2328 row cache objects                   exclusive    kqrso                                                 0                       20         13    2.00       6.800        .523
 1168 cache buffers chains                exclusive    kcbgcur: fast path excl                          873294 9zg9qd9bm4spu         19          1    1.90       6.460       6.460
 2328 row cache objects                   exclusive    kqrpre: find obj                                      0 5wwkp7spyq2fn         19         11    1.90       6.460        .587
 2412 cache buffers chains                exclusive    kcbgcur: fast path excl                          87260B 9zg9qd9bm4spu         18          1    1.80       6.120       6.120
 1562 KTILM latch                         shared       ktilmht get                                           0 gt8xcw52s32ha         18          5    1.80       6.120       1.224
 1413 cache buffers lru chain             exclusive    kcbzgb                                                0 9zg9qd9bm4spu         18          1    1.80       6.120       6.120
 2794 row cache objects                   exclusive    kqrso                                                 0                       17         13    1.70       5.780        .445
 1168 cache buffers lru chain             exclusive    kcbzgb                                                0 9zg9qd9bm4spu         17          1    1.70       5.780       5.780
 2328 row cache objects                   exclusive    kqrpre: find obj                                      0                       17         10    1.70       5.780        .578
 2328 shared pool                         exclusive    kghalo                                                0                       15          7    1.50       5.100        .729
 2794 row cache objects                   exclusive    kqrpre: find obj                                      0                       14         12    1.40       4.760        .397
 2412 shared pool                         exclusive    kghalo                                                0                       14          4    1.40       4.760       1.190
 2794 row cache objects                   exclusive    kqrpre: find obj                                      0 4k3y3hrxfzh6v         14          8    1.40       4.760        .595
 2794 shared pool                         exclusive    kghalo                                                0 4k3y3hrxfzh6v         13          6    1.30       4.420        .737
 1562 cache buffers lru chain             exclusive    kcbzgws                                               0 149htpw5s5n0c         13          5    1.30       4.420        .884
 2412 row cache objects                   exclusive    kqreqd                                                0                       13          9    1.30       4.420        .491
 2794 row cache objects                   exclusive    kqrso                                                 0 4k3y3hrxfzh6v         11          9    1.10       3.740        .416
 1562 KTILM latch                         shared       ktilmht get                                           0 by143mnp65wgp         10          2    1.00       3.400       1.700
 1562 cache buffers chains                exclusive    kcbzgb: scan from tail. nowait                        0 gt8xcw52s32ha         10          8    1.00       3.400        .425
 2794 cache buffers chains                exclusive    kcbgtcr: slow path excl                          400276 4k3y3hrxfzh6v          9          1     .90       3.060       3.060
 2794 cache buffers lru chain             exclusive    kcbzgws                                               0 4k3y3hrxfzh6v          9          1     .90       3.060       3.060
 2412 row cache objects                   exclusive    kqrpre: find obj                                      0 5wwkp7spyq2fn          9          7     .90       3.060        .437
 2412 simulator lru latch                 exclusive    kcbs_simulate: simulate set                          34 5wwkp7spyq2fn          9          1     .90       3.060       3.060
 2328 row cache objects                   exclusive    kqreqd                                                0                        9          6     .90       3.060        .510
 1168 cache buffers chains                exclusive    kcbgtcr: slow path excl                          402106 0zmr8ccx3xpck          9          1     .90       3.060       3.060
 1562 object queue header operation       exclusive    kcbo_link_q                                           0 gt8xcw52s32ha          9          6     .90       3.060        .510
 2328 KTILM latch                         shared       ktilmht get                                           0 4twyxstjxggc1          9          2     .90       3.060       1.530
 1168 cache buffers lru chain             exclusive    kcbzgws                                               0 0zmr8ccx3xpck          9          1     .90       3.060       3.060
  144 simulator lru latch                 exclusive    kcbs_simulate: simulate set                          3C 9brz3jsxmf4ts          9          1     .90       3.060       3.060
 1168 KTILM latch                         shared       ktilmht get                                           0 24kkd5u3fdf0c          9          1     .90       3.060       3.060
 2328 simulator lru latch                 exclusive    kcbs_simulate: simulate set                          3A 24kkd5u3fdf0c          9          1     .90       3.060       3.060
 2412 simulator lru latch                 exclusive    kcbs_simulate: simulate set                          39 5wwkp7spyq2fn          8          1     .80       2.720       2.720
 2412 cache buffers chains                maybe-shared kcbgtcr: fast path exam                          4003C1 5wwkp7spyq2fn          8          1     .80       2.720       2.720
 2412 cache buffers lru chain             exclusive    kcbzgb                                                0 9zg9qd9bm4spu          8          1     .80       2.720       2.720
 2328 cache buffers lru chain             exclusive    kcbzgb                                                0 9zg9qd9bm4spu          8          1     .80       2.720       2.720

In this example the latchprofx.sql script is being called; it reports the SID of the holding session, the name of the latch, the mode in which the latch is held (notice that almost all of the latches reported are exclusive), the associated kernel function, the object involved (for latches associated with transactional activity), the number of times the latch has been held, the number of times the latch has been gotten successfully, the percentage of gets to holds, the total hold time in milliseconds and the average hold time, again in milliseconds. The report is orderd by the the HELD values to make it easier to see which latches are being held and for how long. Other values can be output depending upon the parameters passed to the script.

As latches control access to memory structures anything that affects how memory is allocated and used can affect them. It’s possible to run afoul of latch contention when memory ‘leaks’ occur (usually caused by code that doesn’t properly manage the memory structures it uses). Common memory ‘leaks’ involve memory areas that are no longer used but are no longer accessible because the code that created the memory structure didn’t free it properly. As available memory slowly decreases from these leaks it can become more difficult to obtain a latch, resulting in queues that are much longer than they would normally be. Another problem that can create the same situation is remote user sessions that ‘die’ at the client but can’t clean up properly, leaving orphan processes holding latches that can’t be released. How such situations are managed is dependent upon the specific conditions surrounding them and resolution in extreme cases could require drastic action. Since these are usually rare occurrences these won’t be discussed further.

AWR and ASH reports are a good place to start when investigating wait activity and knowing whether the waits are lock or latch related; of course those reports require the Diagnostic and Tuning pack license and may not be available if your site isn’t licensed. Statspack reports are also good sources of information to start an investigation and list latch activity in a dedicated report section; a part of that section is reproduced below:


Latch Activity  DB/Inst: FNORG/fnorg  Snaps: 1-2
->"Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
  willing-to-wait latch get requests
->"NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
->"Pct Misses" for both should be very close to 0.0

                                           Pct    Avg   Wait                 Pct
                              Get          Get   Slps   Time       NoWait NoWait
Latch                       Requests      Miss  /Miss    (s)     Requests   Miss
------------------------ -------------- ------ ------ ------ ------------ ------
AQ Background: interrupt              1    0.0             0            0
AQ deq hash table latch               1    0.0             0            0
ASM db client latch                   5    0.0             0            0
ASM map operation hash t              1    0.0             0            0
Change Notification Hash              3    0.0             0            0
Consistent RBA                       40    0.0             0            0
DML lock allocation                 706    0.0             0            0
Event Group Locks                     1    0.0             0            0
File State Object Pool P              1    0.0             0            0
I/O Staticstics latch                 1    0.0             0            0
ILM Stats Stripe Latch                1    0.0             0            0
ILM Stats main anchor la            155    0.0             0            0
ILM access tracking exte              1    0.0             0            0
ILM activity tracking la              1    0.0             0            0
IM area sb latch                      1    0.0             0            0
IM area scb latch                     1    0.0             0            0
IM emb latch                          1    0.0             0            0
IM seg hdr latch                      1    0.0             0            0
IPC stats buffer allocat              1    0.0             0            0
In memory undo latch                  1    0.0             0            0
JS Sh mem access                      1    0.0             0            0
JS queue access latch                 1    0.0             0            0
JS queue state obj latch             84    0.0             0            0
JS slv state obj latch                1    0.0             0            0
KCNIBR - invalid block r              1    0.0             0            0
KFC FX Hash Latch                     1    0.0             0            0
KFC Hash Latch                        1    0.0             0            0
KFCL LE Freelist                      1    0.0             0            0
KGNFS-NFS:SHM structure               1    0.0             0            0
KGNFS-NFS:SVR LIST                    1    0.0             0            0
KJC message pool free li              1    0.0             0            0
KJC message pool pending              1    0.0             0            0
KJCT flow control latch               1    0.0             0            0
KMG MMAN ready and start              2    0.0             0            0
KSFS OFS ctx level paren              1    0.0             0            0
KSFS OFS req layer paren              1    0.0             0            0
KSFS OFS sess layer pare              1    0.0             0            0
KSFS id table parent lat              1    0.0             0            0
KSFS ksfs_node latch                  1    0.0             0            0
KSFSD name cache parent               1    0.0             0            0
KSXR hang analysis phase              1    0.0             0            0
KSXR hang analysis phase              1    0.0             0            0
KTF sga latch                         0                    0            1    0.0
KTILM latch                           1    0.0             0            0
KWQP Prop Status                      2    0.0             0            0
Locator state objects po              1    0.0             0            0
Lsod array latch                      1    0.0             0            0
Memory Management Latch               1    0.0             0            2    0.0
Memory Queue                          1    0.0             0            0
Memory Queue Message Sub              1    0.0             0            0
...

Latch contention occurs with the ‘willing to wait’ latch calls; nowait calls won’t build a queue and will be retried at a later time.

Knowing the difference between a latch and a lock, and that latches can be affected by locks (but in most cases they are not) can make troubleshooting application and database performance a bit easier. Since latches are memory control structures they can be affected by the amount of available memory as well as by the overall activity of the database. Keeping this in mind during such investigations can prevent the DBA from being led into trying to address a lock problem when it’s really a latch affecting performance. Also, having the right tools available can greatly assist in determining where the problem area or areas lie.

Yes, latches can be confusing. Hopefully this has provided guidance and insight into what latches are, how they can affect performance and what tools to use to find additional information.

Now, go back where you weren’t.

Advertisements

2 Comments »

  1. Nice post, latches are one of those problems that always requires a google (and a bit of luck that someone experienced the same problem and solved it!)

    The problem with latches is that they don’t form an orderly queue, the sessions will just spin and retry and hopefully get lucky. You can be first to try for a held latch but be the last to actually get to hold it because some other session tried while you were spinning.

    You mentioned ‘cursor pin s wait on x’ as indicating a hot block, we actually experienced this particular event when parallel queries were taking a while to parse – during that parse, all the PX slaves waited on this event whilst the QC session was parsing the child cursor. A few googles suggests that it’s not just parallel slaves but any time you have another session trying to hard parse the same child cursor.

    Comment by Andrew Sayer — July 12, 2017 @ 15:49 | Reply

  2. Good article.
    AWR and ASH reports require only Diagnostic Pack license, no Tuning Pack license is required. https://docs.oracle.com/cd/E55822_01/DBLIC/options.htm#DBLIC165

    R.

    Comment by vaurob — July 25, 2017 @ 23:03 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: