Oracle Tips and Tricks — David Fitzjarrell

February 22, 2008

The Dreaded ORA-01555

Filed under: General — dfitzjarrell @ 15:50

After looking more closely at the blog I mentioned in my last entry I’m finding more and more that this poor soul has offered in the way of advice that is either incomplete or wrong. One of the more glaring examples is his overly simplistic treatment of the source for ORA-01555 errors, reducing it to inadequately sized undo segments. Yes, I know, that’s ONE cause of the dreaded ORA-01555, but it is not the sole cause. Let’s examine this error, its causes and what you can do about them. (Tom Kyte has written on this topic many times, and his in-depth explanations are, in my opinion, the best. I’ve used those to help me write this.)

There are two basic causes of this error, plus a third special case of the second which occurs quite frequently:

1) The undo segments are too small
2) The application fetches across commits (a design flaw)
3) Delayed block cleanout

[LOBs are handled differently but can still generate an ORA-01555; this is discussed toward the end of this post]

The first two are related to the read consistency mechanism Oracle employs; this involves undo segments to reconstruct data as it appeared at the moment the query started. Since UNDO segments are used to provide both read-consistency and to rollback transactions the dreaded ORA-01555 can appear. It is the first situation which is ‘addressed’ by our ‘poor soul’, and ONLY the first situation. According to his blog this is the sole reason for an ORA-01555. Because of that his suggestions, which are valid for that situation alone, are:

* Adjust the undo_retention parameter (for 9i and higher releases of Oracle)
* Increase the size of the UNDO tablespace
* Increase the size of the undo segments [if using manual UNDO management]

You’ll know if this is the likely cause by perusing the alert log for the currently generated ORA-01555 error; you should find a message similar to this:

ORA-01555 caused by SQL statement below (SQL ID: dxswvyyhkkg43, Query Duration=2347 sec, SCN: 0x0915.5ce46fff):

The offending query text will follow. Notice in the error message the elapsed query time is presented:

... Query Duration=2347 sec, ...

If that query time exceeds your setting for undo_retention you have a possible candidate for the corrective steps listed above. [You can find the session responsible for this by querying V$SESSION to retrieve the SID and SERIAL# where SQL_ID or the PREV_SQL_ID is identified by the reported value:

select sid, serial#
from v$session
where sql_id = ‘dxswvyyhkkg43’
or prev_sql_id = ‘dxswvyyhkkg43’;

] Also, if you have the UNDO tablespace set to guarantee the retention (which is usually not necessary as Oracle will do its best to enforce the undo_retention setting) you could run into this same error as subsequent undo segments won’t be able to acquire allocated but currently unused undo space if the undo_retention time hasn’t been met, so it’s not advisable to guarantee the retention in active, heavily used systems. If, on the other hand, that query time is well within the configured undo retention period you’ll need to look elsewhere for the cause. And this is where he stops, leaving one to believe those are the ONLY solutions necessary. Sadly he’s wrong.

[All of this presumes you’re using automatic undo management. If you choose (or are required by version) to use manual undo management then setting OPTIMAL for any undo/rollback segment can be a mistake as Oracle can, and will, resize a rollback/undo segment to the OPTIMAL setting if it hasn’t recently been used by a transaction. Queries don’t usually generate UNDO (see the delayed block cleanout explanation for an exception) so if you’re in the middle of a long-running query and the undo/rollback segment has been transactionally idle Oracle can unexpectedly shrink it down to OPTIMAL which will obliterate any data along with the starting SCN you may have been using to reconstruct the read-consistent image for your query, and as a result generate an ORA-01555. The solution to that problem is to set the OPTIIMAL value to NULL by issuing

alter rollback segment storage(optimal null);

for every rollback segment which has a non-null OPTIMAL setting.]

Since one can receive this error for fetching across commits let’s look at that for a moment. As the ORA-01555 error indicates to you that the UNDO information is no longer available one wonders why a developer would decide to commit inside a loop, as that’s the surest way to generate an ORA-01555. The reasoning for this tactic is to, hopefully, conserve on the use of UNDO/rollback segments, and, unfortunately, it does just that. I consider the loop the entire transaction; commiting in that loop causes Oracle to complete the initial transaction then begin another, thus freeing the UNDO generated up until that point. ‘Freeing’ the UNDO means that Oracle is now free to overwrite it with the next transaction (which is the next ‘section’ of data you want to modify from the original query). Since you can no longer roll back to the original starting point, and you may need one or more data blocks from that point in time, you generate the “snapshot too old” error. Another aspect of committing in a loop is that when the ORA-01555 occurs you leave the database in an unknown state; part of the updates have completed, others have not. I’d hate to be the one to sort through the committed updates and have to manually roll them back to start over again. Oh, and starting over again, using the same code, invites the same problem that stopped the process the last time, so it becomes a never-ending cycle until the code is corrected.

According to the Oracle documentation if you REALLY want to fetch across commits you should use the ROWID pseudocolumn to mimic the CURRENT OF clause of a SELECT .. FOR UPDATE statement. Select the rowid of each row into a variable of the UROWID type, then use the ROWID to identify the current row during subsequent updates and deletes as shown in the following example:

DECLARE
   CURSOR c1 IS SELECT last_name, job_id, rowid
     FROM employees;
   my_lastname   employees.last_name%TYPE;
   my_jobid      employees.job_id%TYPE;
   my_rowid      UROWID;
BEGIN
   OPEN c1;
   LOOP
      FETCH c1 INTO my_lastname, my_jobid, my_rowid;
      EXIT WHEN c1%NOTFOUND;
      UPDATE employees SET salary = salary * 1.02 WHERE rowid = my_rowid;
      -- this mimics WHERE CURRENT OF c1
      COMMIT;
   END LOOP;
   CLOSE c1;
END;
/

Since the fetched rows are not locked by a FOR UPDATE clause other users might unintentionally overwrite your changes, which is another ‘gotcha’ (remember the first ‘gotcha’ is the ORA-01555). And because the extra space Oracle needs to provide read consistency is not released until the cursor is closed this tactic can slow down processing for large updates. You play your cards, you take your chances.

The third is the probably the worst of the group, as it can occur with only a single user on the system performing a SELECT. Yes, a SELECT. How can a select statement generate UNDO and thus an ORA-01555? It’s called Delayed Block Cleanout and it’s a dastardly event because it cannot be eliminated entirely (and this can occur in large databases with SYS-owned tables such as the export/import tables/views [sys.exu9sto, as an example]). Delayed Block Cleanout is caused by a session accessing a block that’s been recently modified and hasn’t been cleaned out yet; the current session needs to check if the modifying session is still active. If the modifying transaction is not active the current session ‘cleans out’ the block so any subsequent sessions accessing that block won’t need to go through the same ‘song and dance’. The cleanout process reads the UNDO header, and determines if the blocks are marked as committed or not. If the changes are committed the session ‘cleans out’ the block removing any transaction-related information for the modified blocks, which generates redo. It also frees the UNDO segments used to modify those blocks. So how can this generate an ORA-01555? The following conditions must be met:

— Data is modified and committed and the blocks are not cleaned out automatically (because they exceed the 10% limit to the SGA block buffer cache).

— The modified blocks are not accessed by another session and won’t be accessed until the current session does so.

— The current session begins a long-running query and starts at SCN t_a, the SCN to roll back to in order to ensure a read-consistent image.

— During this query other sessions modify and commit changes to the database, but don’t touch the blocks this long-running query needs.

— The transaction tables roll around due to the high number of COMMITS and ‘step on’ the transaction entry for SCN t_a.

— The lowest SCN (call it t_b) is now higher than t_a (the read-consistent SCN of the long-running query) preventing a read-consistent image due to the large number of COMMITS.

Bingo, the ORA-01555 rears its ugly head because the query is now attempting to access a block that has not been modified since the query began, but the high number of commits has overwritten the rollback data. Since Oracle is now unsure of whether this block is usable or not (even though it was the query, not some other transaction, which generated the UNDO), it throws the ORA-01555 error.

To prevent such things from happening after large data loads, batch updates or deletes a good idea would be to run DBMS_STATS to access the blocks and clean them out. It’s also a good idea to run that anyway, since you’ve changed the data and such changes may have altered the ‘picture’ Oracle should see for those tables and indexes.

LOB updates are a different matter entirely, and can also throw the ORA-01555 error for the same reason but using a different mechanism. Before images of LOB data are stored in the segment itself provided that MAXEXTENTS has not been reached or the tablespace containing the LOB segments has not filled to capacity. Additionally Oracle keeps PCTVERSION of the storage allocated for LOBs for before images; if PCTVERSION is set to too low of a value older images will be overwritten and, you guessed it, an ORA-01555 error appears. The PCTVERSION setting reserves that percentage of the total number of chunks of LOB data which have been allocated during the update process. Let’s say you have 400 chunks already allocated, PCTVERSION is 5 and you need to update 33 chunks. 33 additional chunks are allocated for the before images of that data. This transaction succeeds and commits, freeing those 33 chunks for the next update. But, wait, PCTVERSION is set to 5 so 5% of 400 (20) of those chunks can’t be touched to preserve the before images. This leaves 13 chunks of the previous allocation available for the next update transaction. If the next update affects 20 chunks then 7 additional chunks need to be allocated to complete that request. For an active table with LOB data this could continue on and on, where Oracle reuses some chunks from a prior update but also allocates additional chunks, until MAXEXTENTS is reached, the tablespace is filled or the series of update transactions comes to an end. A long-running select against that data will probably need those now-overwritten before images and, because they’re no longer available both an ORA-01555 (“snapshot too old, rollback segment too small”) and an ORA-22924 (“snapshot too old”, relating to the PCTVERSION setting) are raised. This occurrence of an ORA-01555 isn’t corrected using the methods listed previously, it’s corrected by increasing the PCTVERSION for the LOB segment in question:

SQL> alter table lobex modify lob (isalob) (pctversion 10);

Table altered.

SQL>

Verifying the setting has been changed:

SQL> select table_name, column_name, pctversion
  2  from user_lobs
  3  where table_name = 'LOBEX'
  4  /

TABLE_NAME                     COLUMN_NAME          PCTVERSION
------------------------------ -------------------- ----------
LOBEX                          ISALOB                       10

SQL>

So what if you query USER_LOBS and find that PCTVERSION is NULL? The LOB has been created with the RETENTION property set which retains the old versions of the LOB data for a period of time, and you’ll see the RETENTION column of the USER_LOBS view populated with the current UNDO_RETENTION value:

SQL> select table_name, column_name, pctversion, retention
  2  from user_lobs
  3  where pctversion is null
  4  /

TABLE_NAME                     COLUMN_NAME          PCTVERSION  RETENTION
------------------------------ -------------------- ---------- ----------
LOBEX2                         ISALOB                                 900

SQL> 

Should this be the case the solution to curing the ORA-01555 is back to the usual remedy of increasing the UNDO_RETENTION, with the added steps of converting the existing LOBS to PCTVERSION and then back to using the UNDO_RETENTION:

SQL> alter system set undo_retention=28800;

System altered.

SQL> alter table lobex modify lob (isalob) (pctversion 10);

Table altered.

SQL> alter table lobex modify lob (isalob) (retention);

Table altered.

SQL>

Without the conversion to PCTVERSION and back the increase in the UNOD_RETENTION is not passed through to the old LOB segments and the problem remains. You can, of course, leave the given LOB segment set to pctversion and manage it that way. The choice is yours, however managing heavily updated LOB data may be easier with RETENTION set rather than configuring the PCTVERSION, even with having to convert it to pctversion for a moment then reverting it back to retention.

Should you need to trace the ORA-01555 error to find its origin you can set the 1555 event to dump the error stack each time an ORA-01555 is encountered:


alter session set events '1555 trace name errorstack forever, level 3';

It might also be a good idea to set the tracefile_identifier so you can more easily find the trace file just generated:


alter session set tracefile_identifier='BAD_1555';

Once you have the trace file you can examine it yourself or upload it to Oracle support with an SR filed for the issue.

A rather nasty bug (2643723) affects LOB segments in Oracle versions 9.2.0.1, 9.2.0.2 and 9.2.0.3 and LOB segments in version 9.2.0.4 (3213101) that use Auto Segment Space Management (ASSM). Concurrent writes to the same LOB segment can create data corruption in that LOB causing an ORA-01555 to be displayed. Metalink Note 253131.1 describes the problem and provides diagnostic code to determine if the error is due to the segment corruption bug.

The ORA-01555 error may have a simple list of causes, but the solutions may be far from simply adjusting the UNDO segment size or resetting the undo_retention parameter. I wish more people offering ‘advice’ would realize this. There would be a lot less myth and a lot more usable information.

Advertisements

8 Comments »

  1. This post is awaesome and very enlighting. Thank you very much for your work.

    Comment by Coskan Gundogar — June 3, 2009 @ 21:43 | Reply

  2. Thanks for the comprehensive explanation of the causes of ORA-1555

    Comment by Jared — March 17, 2010 @ 16:46 | Reply

  3. Hi we are getting this error while executing specific querys,and we found that the INdex on the specific table causing problem. so we drop and recreate the index and now its working very fine. Need to rebuld index in most of the case will resolve the issueJosephFCCKuwait

    Comment by Anonymous — June 14, 2010 @ 11:03 | Reply

  4. In your specific case it solved your problem; you really cannot generalize this to 'most of the case', as you phrase it. Oracle is too complex a product and there are a number of situations which can generate an ORA-01555 error; yours was that the query was taking far longer than your undo retention could keep undo blocks and in THIS particular case an index was at fault. What if there are no indexes? Index rebuilds won't help as there are none to rebuild. What if the indexes are fine but inappropriate for the queries? Rebuilding them won't help; creating indexes on the appropriate columns might. Broad generalizations are out of place in the Oracle 'world' as too many factors come into play when a problem arises.

    Comment by d_d_f — June 15, 2010 @ 21:23 | Reply

  5. I find myself in the early days of 2012, and without being able to resolver my f***g ORA-01555 error: none of the million things Oracle thinks it is caused by has changed anything in my sad an desolated panorama. I give it up: I have just wasted a lot of time trying one after one solutions proposed. Thanks everybody ;o)

    Comment by Anonymous — January 2, 2012 @ 15:07 | Reply

  6. Please read here:http://blogs.oracle.com/db/entry/troubleshooting_ora_1555 This link provides methods to set the events parameter to capture greater detail from your Oracle database when an ORA-01555 error occurs. Especially note the "Alter session set events '1555 trace name errorstack forever, level 3';" event which will dump the errorstack for each ORA-01555 you generate. You can look through these dumps yourself or upload them to Oracle support for an analyst to examine.

    Comment by d_d_f — January 3, 2012 @ 15:06 | Reply

  7. […] written before about committing inside of a loop but the ORA-01555 isn’t the only error that can cause. Depending upon how the cursor is […]

    Pingback by Feelin’ bad, Feelin’ blue, Gots me a ORA-01002 « Oracle Tips and Tricks — July 23, 2012 @ 08:58 | Reply

  8. […] the UNDO images are overwritten by commits, causing the query to end unsuccessfully with an ORA-01555 error. Re-running the query usually succeeds and returns the now-committed […]

    Pingback by Consistency Is Good | Oracle Tips and Tricks -- David Fitzjarrell — November 9, 2016 @ 06:30 | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: