Oracle Tips and Tricks — David Fitzjarrell

September 20, 2023

Out Of Sorts

Filed under: disaster recovery,General,replication — dfitzjarrell @ 18:27

Occasionally the unthinkable can occur and the DBA can be left with a standby database that is no longer synchronizing with the primary. A plethora of “advice”will soon follow that discovery, most of it much like this:

“Well, ya gotta rebuild it.”

Of course the question to ask is “how far out of synch is the standby>” That question is key in determining how to attack this situation. Let’s go through the two most common occurrences of this and see how to address them.

Let’s start with the most drastic — you’re over 500 logs behind the primary (don’t laugh, I’m aware of at least one site where this occurred). Depending upon how RMAN is configured the DBA may not be able to restore all of the missing logs. At this point it will most likely be just as fast to recreate the standby from the primary using RMAN. Since that process is well-documented it won’t be discussed here. Please note that this is an act to perform when the next method can’t be used.

The standby is a “reasonable” number of logs behind — the fastest and least intrusive way to fix this is to recover the missing archivelogs from RMAN backups. Again this allows RMAN to come to the rescue. Going through the process step by step should make this easy for most DBAs. Let’s begin.

Step one is to determine how many logs comprise the gap that prevents the standby from getting caught up — Oracle 0provides that information in the V$ARCHIVE_GAP view:

THREAD# NUMBER
LOW_SEQUENCE# NUMBER
HIGH_SEQUENCE# NUMBER
CON_ID NUMBER

This view provides the log sequence information for the recovery gap in the form of the lowest sequence missing and the highest sequence missing. Using this view provides the information for the RMAN archive restore command shown below:

run {
allocate channel d1 device type disk;
restore archivelog from logseq &1 until logseq &2;
}

Putting that command into a script allows the DBA to pass in the two sequence values n the RMAN command line:

RMAN > @restore_logs.rman 1000 1230

This will find and restore archivelogs with the provided sequence numbers. Now the question becomes “where do these logs et restored to?” The commonly expected destination is the standby server, which is where these logs eventually need to go. The DBA can to that, but t hen the DBA must also create a script to register those logs with the standby database after they have been restored. A simpler and cleaner approach is to restore those logs to the primary and let the standby configuration handle all of the work. Restoring to the primary writes the restored logs to the configured archive log destination; the ARCH process then performs its usual duties of copying those logs to the standby server and registering them automatically. With this technique all the DBA needs to do is get the sequence range and restore them to the primary — Oracle takes over and completes the tasks at hand and the standby starts recovering as soon as the first restored log is transferred to the standby server.

This behavior can be verified by executing the following query on the standby database:

SELECT *
FROM V$MANAGED_STANDBY
WHERE THREAD# > 0
AND GROUP# <> ‘N/A’;

The output will show all of the logs being applied to the standby, regardless of whether they are archived redo logs or standby redo logs. Repeated execution of that query will show the log sequence begin applied by either LGWR or ARCH. As the log restores progress and the apply process works through them the standby closes the gap and again becomes synchronized with the primary.

This is why at least one additional standby redo log is configured for the standby — depending upon how large the gap has become that “extra” standby redo log wil get used to hold the LGWR data sent to the standby. There is nothing preventing the DBA from configuring MORE than one extra standby redo log, but it is expected that the DBA is actually monitoring the standby and can address a gab before it overflows the standby redo logs already configured. To see how many standby redo logs are involoved the following query ncan be executed:

select group#, thread#, sequence#, used, round(used*100/bytes, 2) pct_full, archived, status, first_change#, first_time, last_change#, last_time
from v$standby_log
where exists (select 1 from v$database where database_role = ‘PHYSICAL STANDBY’);

which produces output similar to:

  GROUP#  THREAD#   SEQUENCE#        USED PCT_FULL ARC STATUS       FIRST_CHANGE# FIRST_TIME              LAST_CHANGE# LAST_TIME

      10    1      300465   271206912    25.86 YES ACTIVE        110812847402 05-SEP-2023 16:46:29        110812981549 05-SEP-2023 16:47:56
      11    1       0       0     0.00 NO  UNASSIGNED
      12    1       0       0     0.00 YES UNASSIGNED
      13    1       0       0     0.00 YES UNASSIGNED
      14    1       0       0     0.00 YES UNASSIGNED

During a gap situation there will likely be several standby redo logs populated simultaneously. As the gap closes these logs will be read and the transactions applied, releasing them to return to UNASSIGNED status.

Once the archivelog restore has started on the primary the entire process can be monitored from the standby server using the two queries provided above. An occasional query of V$ARCHIVE_GAP will show the gap, or gaps, closing as the logs are restored and shipped to the standby. If a gap remains simply repeat the process, using the newest log sequence numbers. In most cases only a single execution of this process will be required to set things right.

When a standby databse is “out of sorts” with the primary most likely all that needs to occur is to restore the missing archivelogs to the primary and let Oracle take over. Rarely will the synchronization be so far “out of whack”: that a rebuild of the standby is required. Provided the DBA has properly configured the backup and recovery process for the database in question it should be a trivial matter to restore the standby to it’s rightful state.

Be prepared. It’s not just for the Boy Scouts.

1 Comment »

  1. […] David Fitzjarrell proffers advice on recovering from a non-synchronizing standby database: […]

    Pingback by Oracle: RMAN and Non-Synchronizing Standby Database – Curated SQL — September 22, 2023 @ 06:10


RSS feed for comments on this post. TrackBack URI

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Blog at WordPress.com.