Oracle Tips and Tricks — David Fitzjarrell

August 23, 2010

Corruption And Mayhem

Filed under: General — dfitzjarrell @ 15:11
Tags:

It may happen that at some point you could encounter block corruption in empty blocks that prevents you from performing either a ‘delete from <some table name>;’ or a ‘truncate table <some table name>;’ because you are met with the following error:


ERROR at line 1: 
ORA-01578: ORACLE data block corrupted (file # 5, block # 4290) 

These are empty blocks; why on earth are they causing you grief? The explanation for that is these blocks fall below the highwater mark because they once contained data and the blanket delete and truncate statements process every block up to the highwater mark regardless of whether they are empty or not.

So you cannot truncate the table or delete everything from it; what do you do to fix this? The first possibility which comes to mind is to use the DBMS_REPAIR package:

DBMS_REPAIR Procedures

Procedure Name      Description 

ADMIN_TABLES        Provides administrative functions (create, drop, purge) for
                    repair or orphan key tables. 
                    Note: These tables are always created in the SYS schema

CHECK_OBJECT        Detects and reports corruptions in a table or index 

DUMP_ORPHAN_KEYS    Reports on index entries that point to rows in corrupt data
                    blocks 

FIX_CORRUPT_BLOCKS  Marks blocks as software corrupt that have been previously
                    identified as corrupt by the CHECK_OBJECT procedure 

REBUILD_FREELISTS   Rebuilds the free lists of the object 

SEGMENT_FIX_STATUS  Provides the capability to fix the corrupted state of a 
                    bitmap entry when segment space management is AUTO

SKIP_CORRUPT_BLOCKS When used, ignores blocks marked corrupt during table and 
                    index scans. If not used, you get error ORA-1578 when
                    encountering blocks marked corrupt. 

The procedures of interest are ADMIN_TABLES, CHECK_OBJECT, FIX_CORRUPT_BLOCKS, DUMP_ORPHAN_KEYS and possibly SKIP_CORRUPT_BLOCKS, in that order. Presuming you have never before used DBMS_REPAIR it will be necessary to execute the ADMIN_TABLES procedure to create the objects and tables necessary for the package to do its job:

BEGIN
DBMS_REPAIR.ADMIN_TABLES (
     TABLE_NAME => 'REPAIR_TABLE',
     TABLE_TYPE => dbms_repair.repair_table,
     ACTION     => dbms_repair.create_action,
     TABLESPACE => 'TOOLS');
END;
/

This creates a table named REPAIR_TABLE in the TOOLS tablespace (not providing a tablespace name the procedure creates these tables in the SYSTEM tablespace). In addition the procedure creates an associated view (named DBA_) that filters any rows referencing objects which no longer exist. Let’s look at the table definition for our REPAIR_TABLE:

DESC REPAIR_TABLE

 Name                         Null?    Type
 ---------------------------- -------- --------------
 OBJECT_ID                    NOT NULL NUMBER
 TABLESPACE_ID                NOT NULL NUMBER
 RELATIVE_FILE_ID             NOT NULL NUMBER
 BLOCK_ID                     NOT NULL NUMBER
 CORRUPT_TYPE                 NOT NULL NUMBER
 SCHEMA_NAME                  NOT NULL VARCHAR2(30)
 OBJECT_NAME                  NOT NULL VARCHAR2(30)
 BASEOBJECT_NAME                       VARCHAR2(30)
 PARTITION_NAME                        VARCHAR2(30)
 CORRUPT_DESCRIPTION                   VARCHAR2(2000)
 REPAIR_DESCRIPTION                    VARCHAR2(200)
 MARKED_CORRUPT               NOT NULL VARCHAR2(10)
 CHECK_TIMESTAMP              NOT NULL DATE
 FIX_TIMESTAMP                         DATE
 REFORMAT_TIMESTAMP                    DATE

These columns will be populated by the CHECK_OBJECT procedure:

SET SERVEROUTPUT ON
DECLARE num_corrupt INT;
BEGIN
num_corrupt := 0;
DBMS_REPAIR.CHECK_OBJECT (
     SCHEMA_NAME => 'ORDER_APP',
     OBJECT_NAME => 'BACKORDERS',
     REPAIR_TABLE_NAME => 'REPAIR_TABLE',
     CORRUPT_COUNT =>  num_corrupt);
DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));
END;
/

The output from that PL/SQL block is one line reporting the total number of corrupt blocks in that table:

number corrupt: 1

Querying REPAIR_TABLE will provide information on the type of corruption and a suggested repair action:

SELECT OBJECT_NAME, BLOCK_ID, CORRUPT_TYPE, MARKED_CORRUPT,
       CORRUPT_DESCRIPTION, REPAIR_DESCRIPTION
     FROM REPAIR_TABLE;

OBJECT_NAME                      BLOCK_ID CORRUPT_TYPE MARKED_COR
------------------------------ ---------- ------------ ----------
CORRUPT_DESCRIPTION
------------------------------------------------------------------------------
REPAIR_DESCRIPTION
------------------------------------------------------------------------------
DEPT                                 4290            1 FALSE
kdbchk: row locked by non-existent transaction
        table=0   slot=0
        lockid=32   ktbbhitc=1
mark block software corrupt

Since the reported block has not yet been marked as corrupt now is the time to extract any data to minimize loss using ‘alter system dump datafile block ;’. For this example the file_id is 5 and the block_id is 4290:

alter session set tracefile_identifier = 'blockdump';

Session altered.

alter system dump datafile 5 block 4290;

System altered.

The data will be found in a tracefile in the defined user_dump_dest and will have the tracefile_identifier text in the file name (this makes the file easier to find). This should allow you to mark the block corrupt and re-insert the data. Depending upon what is corrupting the block the block dump may not be successful thus making data replacement difficult with this procedure; there may be an available export from before the block corruption occurred which will allow the data to be imported into a different schema so that any missing data can be replaced. As this would be a basic ‘insert into .. select … from …’ operation I will not describe it here.

Repairing the block is also a fairly simple task:

SET SERVEROUTPUT ON
DECLARE num_fix INT;
BEGIN 
num_fix := 0;
DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
     SCHEMA_NAME => 'ORDER_APP',
     OBJECT_NAME=> 'BACKORDERS',
     OBJECT_TYPE => dbms_repair.table_object,
     REPAIR_TABLE_NAME => 'REPAIR_TABLE',
     FIX_COUNT=> num_fix);
DBMS_OUTPUT.PUT_LINE('num fix: ' || TO_CHAR(num_fix));
END;
/

This PL/SQL block also reports one line of output:

num fix: 1

indicating that all of the blocks marked corrupt have been marked as such and are no longer available. Querying REPAIR_TABLE again proves that:

SELECT OBJECT_NAME, BLOCK_ID, MARKED_CORRUPT
     FROM REPAIR_TABLE;

OBJECT_NAME                      BLOCK_ID MARKED_COR
------------------------------ ---------- ----------
BACKORDERS                           4290 TRUE

What if DBMS_REPAIR.FIX_CORRUPT_BLOCKS returns a number less than the total number of corrupt blocks reported? It’s likely that you have another underlying problem causing the corruption which needs to be addressed, such has a hardware or firmware issue. In such cases DBMS_REPAIR.FIX_CORRUPT_BLOCKS will fail to repair those blocks.

Now it’s time to find any orphan keys in the index BACKORDER_IDX; these are entries pointing to rows in the corrupt data block. First is to create the ORPHAN_KEY_TABLE:

BEGIN
DBMS_REPAIR.ADMIN_TABLES (
     TABLE_NAME => 'ORPHAN_KEY_TABLE',
     TABLE_TYPE => dbms_repair.orphan_table,
     ACTION     => dbms_repair.create_action,
     TABLESPACE => 'USERS');
END;
/

The orphan key table has the following columns:

DESC ORPHAN_KEY_TABLE

 Name                         Null?    Type
 ---------------------------- -------- -----------------
 SCHEMA_NAME                  NOT NULL VARCHAR2(30)
 INDEX_NAME                   NOT NULL VARCHAR2(30)
 IPART_NAME                            VARCHAR2(30)
 INDEX_ID                     NOT NULL NUMBER
 TABLE_NAME                   NOT NULL VARCHAR2(30)
 PART_NAME                             VARCHAR2(30)
 TABLE_ID                     NOT NULL NUMBER
 KEYROWID                     NOT NULL ROWID
 KEY                          NOT NULL ROWID
 DUMP_TIMESTAMP               NOT NULL DATE

With this table in place we can now discover any orphan keys:

SET SERVEROUTPUT ON
DECLARE num_orphans INT;
BEGIN
num_orphans := 0;
DBMS_REPAIR.DUMP_ORPHAN_KEYS (
     SCHEMA_NAME => 'ORDER_APP',
     OBJECT_NAME => 'BACKORDER_IDX',
     OBJECT_TYPE => dbms_repair.index_object,
     REPAIR_TABLE_NAME => 'REPAIR_TABLE',
     ORPHAN_TABLE_NAME=> 'ORPHAN_KEY_TABLE',
     KEY_COUNT => num_orphans);
DBMS_OUTPUT.PUT_LINE('orphan key count: ' || TO_CHAR(num_orphans));
END;
/

The output reports the number of orphan keys in the index:

orphan key count: 3

Rebuilding the index is the necessary action to match the index entries to the table data. Of course if you can replace the now-missing data that should be done first, and then perform an index rebuild.

Skipping corrupt blocks will prevent any errors from surfacing reporting corrupted blocks and DBMS_REPAIR provides such a procedure:

BEGIN
DBMS_REPAIR.SKIP_CORRUPT_BLOCKS (
     SCHEMA_NAME => 'ORDER_APP',
     OBJECT_NAME => 'BACKORDERS',
     OBJECT_TYPE => dbms_repair.table_object,
     FLAGS => dbms_repair.skip_flag);
END;
/

Querying DBA_TABLES for the BACKORDERS table shows that SKIP_CORRUPT is enabled:

SELECT OWNER, TABLE_NAME, SKIP_CORRUPT FROM DBA_TABLES
    WHERE OWNER = 'ORDER_APP'
    AND TABLE_NAME = 'BACKORDERS';

OWNER                          TABLE_NAME                     SKIP_COR
------------------------------ ------------------------------ --------
ORDER_APP                      BACKORDERS                     ENABLED

and now table scans and index scans will skip over blocks marked corrupt and not report the errors shown at the beginning of this post.

The second possibility is it’s a hardware/firmware issue and that would need to be addressed by your local Administrator (Windows or UNIX) and the storage vendor, if necessary. You can’t fix the problem if the hardware can possibly create more damage; repairing such problems can result in creating a new database or restoring the current database from the most recent backup as the storage media may be reformatted once the controller/firmware issue is corrected (this depends upon how far the media corruption has spread and also on the actions necessary for the hardware/firmware fix). This is where having successfully tested your backup and recovery procedures can save you. And, if for some reason you haven’t a tested backup/restore procedure you DO have, at the very least, a recent export to use as any ‘recovery’ is better than no recovery at all.

A third possibility is a power problem; incorrectly wired connections can destroy data on a disk as ‘unfiltered’ power can make its way to the server since the battery backup/power conditioner is being bypassed by the botched wiring scheme. You’ll need an electrician to fix this one and possibly replacement media as well which puts you at the ‘create or recover the database’ stage mentioned in the previous paragraph. I’ve personally seen this happen so don’t think it’s merely hypothetical. I truly hope you never experience this sort of problem.

In many cases of datafile corruption the DBMS_REPAIR package can mark and bypass the corrupted blocks and restore table and data access. Yes, you’ll most likely need to replace the data in the corrupted blocks but a recent export or a block dump can provide the means to do just that. Believe me it’s nice to know that package exists for you never know when you might need it.

Now, if only all corruption could be fixed this easily…

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: