Tuesday, October 30, 2012

Oracle Data Guard : Synchronous vs. Asynchronous Redo Transport

Data Guard Redo Transport Services coordinate the transmission of redo from a primary database
to the standby database. While the LGWR process on the primary database writes redo to its online redo log files (ORL), a separate Data Guard process called the Log Network Server (LNS) reads from
the redo buffer in the SGA and passes the redo to Oracle Net Services for transmission to the standby database.

Redo records transmitted by the LNS are received at the standby database by another Data Guard process called the Remote File Server (RFS), which writes them to a sequential file called a standby redo log file (SRL).

Synchronous Redo Transport

It is also called the “zero data loss” method, because the LGWR is not allowed to acknowledge that a commit has succeeded until the LNS confirms that the redo needed to recover the transaction has been written to disk at the standby site.

Here is how it works:

1. The user commits a transaction. The LGWR reads the redo record from the log buffer, writes it to the online redo log file, and waits for confirmation from the LNS.

2. The LNS reads the same redo record from the log buffer and transmits it to the standby database using Oracle Net Services. The RFS receives the redo at the standby database and writes it to a standby redo log file.

3. When the RFS receives a write-complete from the disk, it transmits an acknowledgment back to the LNS process on the primary database, which in turn notifies the LGWR that transmission is complete. The LGWR then sends a commit acknowledgment to the user.
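
The three steps above can be sketched as a small simulation. This is only an illustrative model, not Oracle code: the `Primary` and `Standby` classes and their methods are hypothetical names, and the local write and the network send are shown one after the other for readability, although on a real system they overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Hypothetical stand-in for the RFS process and its standby redo log (SRL)."""
    srl: list = field(default_factory=list)

    def receive(self, redo: str) -> bool:
        self.srl.append(redo)  # RFS writes the redo to the SRL
        return True            # acknowledgment returned once the write completes


@dataclass
class Primary:
    """Hypothetical stand-in for LGWR and LNS on the primary database."""
    standby: Standby
    orl: list = field(default_factory=list)
    events: list = field(default_factory=list)

    def commit(self, redo: str) -> bool:
        # Step 1: LGWR writes the redo record to the online redo log.
        self.orl.append(redo)
        self.events.append("LGWR wrote ORL")
        # Step 2: LNS ships the same record to the standby.
        self.events.append("LNS sent redo")
        acked = self.standby.receive(redo)
        # Step 3: only after the standby acknowledges does LGWR
        # report commit success to the user.
        if acked:
            self.events.append("RFS acknowledged")
            self.events.append("commit acknowledged to user")
        return acked


standby = Standby()
primary = Primary(standby)
committed = primary.commit("txn-1")
print(committed, primary.events)
```

The point of the sketch is the ordering: the user never sees a successful commit before the redo is in the standby redo log.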

See the diagram below.

Asynchronous Redo Transport

With asynchronous transport (ASYNC), the LGWR process does not wait for an acknowledgment from the LNS. This creates a near-zero performance impact on the primary database, regardless of the distance between the primary and standby locations.

This behaviour of ASYNC transport enables the primary database to buffer a large amount of redo, called a transport lag, without terminating transmission or impacting availability. The drawback is that if a failure destroys the primary database before the transport lag is reduced to zero, any committed transactions that are part of the transport lag will be lost.

The LGWR will continue to acknowledge commit success to the user even if limited bandwidth prevents the redo of previous transactions from being sent to the standby database immediately.
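
To make the transport lag concrete, here is a minimal sketch under simplifying assumptions. The `AsyncPrimary` class is a hypothetical model, not an Oracle API, and bandwidth is reduced to "how many records the LNS manages to ship".

```python
from collections import deque


class AsyncPrimary:
    """Hypothetical model of a primary database using ASYNC transport."""

    def __init__(self):
        self.orl = []           # redo written locally by LGWR
        self.unsent = deque()   # redo not yet shipped: the transport lag
        self.standby_srl = []   # redo that has reached the standby

    def commit(self, redo):
        self.orl.append(redo)
        self.unsent.append(redo)
        return True  # LGWR acknowledges without waiting for the LNS

    def ship(self, max_records):
        """LNS sends up to max_records, a stand-in for limited bandwidth."""
        for _ in range(min(max_records, len(self.unsent))):
            self.standby_srl.append(self.unsent.popleft())

    def transport_lag(self):
        return len(self.unsent)


primary = AsyncPrimary()
acks = [primary.commit(f"txn-{i}") for i in range(1, 6)]  # five commits, all acknowledged
primary.ship(max_records=3)                               # the network only keeps up with three
lost_if_primary_fails_now = list(primary.unsent)
print(primary.transport_lag(), lost_if_primary_fails_now)
```

All five commits were acknowledged to the user, but only three reached the standby. If the primary were destroyed at this point, the two transactions still in the lag would be lost.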

LNS Behaviour when redo log is flushed 

If the LNS is unable to keep pace and the log buffer is recycled before the redo can be transmitted to the standby, the LNS automatically transitions to reading and sending from the ORL (Data Guard 11g onward). Once the LNS has caught up, it automatically transitions back to reading and sending directly from the log buffer. This is shown in the diagram below.
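
The switch between the two sources can be illustrated with a toy model. Everything here is an assumption made for the example: the `RedoSource` class is hypothetical, the log buffer is modelled as a fixed-size queue that drops its oldest record when full, and redo is counted in whole records rather than bytes.

```python
from collections import deque


class RedoSource:
    """Hypothetical model of where the LNS reads its next redo record from."""

    def __init__(self, buffer_capacity):
        self.log_buffer = deque(maxlen=buffer_capacity)  # oldest entries are recycled
        self.orl = []                                    # keeps every record
        self.next_to_send = 0                            # position of the LNS

    def write(self, redo):
        seq = len(self.orl)
        self.orl.append(redo)
        self.log_buffer.append((seq, redo))

    def send_next(self):
        """Return (source, redo) for the next record, or None if caught up."""
        seq = self.next_to_send
        if seq >= len(self.orl):
            return None
        oldest_in_buffer = self.log_buffer[0][0]
        if oldest_in_buffer <= seq:
            # The record is still in the log buffer: read it from memory.
            source, redo = "log buffer", self.log_buffer[seq - oldest_in_buffer][1]
        else:
            # The buffer was recycled before the record was sent: fall back to the ORL.
            source, redo = "ORL", self.orl[seq]
        self.next_to_send += 1
        return source, redo


src = RedoSource(buffer_capacity=2)
for record in ["r0", "r1", "r2", "r3"]:  # LGWR gets ahead of the LNS
    src.write(record)

shipped = []
while (item := src.send_next()) is not None:
    shipped.append(item)

sources = [s for s, _ in shipped]
sent = [r for _, r in shipped]
print(shipped)
```

In this run the first two records have already been recycled out of the buffer, so they are read from the ORL. Once the LNS has caught up to what the buffer still holds, it reads from the log buffer again.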

Source: Oracle documentation; Oracle Data Guard 11g Handbook.

Further reading:

BASH-DBA: Managing a Physical Standby Database
BASH-DBA: Monitoring Primary and Physical Standby Databases
BASH-DBA: Snapshot standby database
BASH-DBA: Open Physical Standby For Read Write Testing and ...
