Oracle Goldengate High Availability as Oracle ClusterWare Resources

 This Blog contains a summary of the environment, and a detailed list of steps used to create the Oracle GoldenGate CRS resources for HA in an Oracle RAC environment. 


This procedure creates multiple CRS resources for Oracle GoldenGate. The following are the resources:

1.     Resource for a dedicated VIP (one per cluster) for remotely communicating with the active node in the cluster running Oracle GoldenGate

a.     Called “ggvip”

2.     Resource for controlling the Oracle GoldenGate Manager process (mgr)

a.     CRS will only check and start/stop the Oracle GoldenGate Manager process

                                               i.     mgr.prm will need AUTOSTART and AUTORESTART parameters to start all remaining processes

b.     Uses an Action Script which is executed by CRS to manage Oracle GoldenGate (stop/start/check of the manager process)

c.     Called “ggapp”

3.     (optional) Resource for the Oracle GoldenGate monitoring process

a.     Script will follow Oracle GoldenGate processes from node to node (follows the “ggapp” resources)

                                               i.     Most monitoring solutions (including OEM) do not understand Oracle GoldenGate in a clustered configuration, as those solutions are node specific. This option makes the monitoring script cluster aware.

b.     Called “ggmon”


Environment Overview for Oracle GoldenGate


Name:   ggserver.nndc.kp.org

Address: 172.19.130.122

                OGG OS Userid: oracle

               Grid OS Userid: oracle:oinstall

                RDBMS Userid: oracle:oinstall

      ORACLE OGG VIP Resource: ggvip    (DNS Name = ggserver-gg.nndc.kp.org, VIP = 172.18.131.212)

         CRS OGG App Resource: ggapp    (resource controlling OGG manager process)
         CRS OGG Mon Resource: ggmon    (resource controlling OGG monitoring script)
               GGS_HOME: /u01/app/oracle/product/ogg/

               CRS_HOME: /u01/app/11.2.0.4/grid
            ORACLE_HOME: /u01/app/oracle/product/11.2.0.4/dbhome_1


Scripts for Oracle GoldenGate and CRS

The following are the action script(s) used for each CRS resource created. Resource “ggvip” has not action script, all actions are managed by CLusterWare. The remaining two resources are managed via the action scripts.

1.  ggapp

a.  /$GGS_HOME/scripts/gg_action.scr   (implements the start/stop/check/abort/clean callbacks for OGG mgr)

2.  ggmon:

a.  /$GGS_HOME/scripts/ggmon.scr   (implements the start/stop/check/abort/clean callbacks for all OGG processes)


Steps to add CRS Resources for Oracle GoldenGate HA in an Oracle RAC Cluster

1.    Create the required group and user-id for OGG on the remaining nodes in the cluster (if they do not already exist)

a.    Each node must have the same GUID/UID for the GoldenGate user

                                     i.    each node is able to read and modify the files on the shared storage form any node in the fail over

2.    Install Oracle GoldenGate binaries on each node in the cluster (on the local file system for each node)

a.    Can also install the home on the shared storage (if mounted as the same name across all nodes)

                                     i.    Keep in mind, special NFS mount options for OGG for certain file types

1.    See MOS Note: 1232303.1

                                    ii.    Keep in mind the specific DBFS configuration needed if using DBFS as the “shared storage”

3.    Shutdown Oracle GoldenGate and move the following directories to the shared storage

a.    ./dirprm                   (parameter files – all)

b.    ./dirchk                    (process checkpoint files)

c.    ./dirpcs                    (process run PID files)

d.    ./dirdat                    (trail files – if used in dirdat)

e.    ./dirtmp                   (caching to temp files for extract or apply)

f.     ./BR                        (extraction only – Bounded Recovery files)

g.    ./scripts                   (monitoring scripts and log files)

h.    ./dirdef                    (source definition files)

Note:
Try to use relative paths in all Oracle GoldenGate parameter files and commands or command files (OBEY) when pointing to files or directories in the file system.  Relative paths can be used for trail files, report files, parameter files, etc.
Also use default locations for all files, easier to move and use symbolic links to new locations.


See OGG Best Practices presentation.

4.    Create a softlink in $GGS_HOME to each directory location (moved to shared storage). Do this for each directory that needs to be moved to shared storage.

a.    ln –s /zfssa/gold01/PERC/dirprm dirprm

b.    Continue until all directories moved to shared storage are linked

5.    Add the following to the “mgr.prm” file

a.    AUTOSTART ER *

                                     i.    This is needed to instruct the manager to start all processes when the manager starts

6.    Create/assign a new IP to be used as the Virtual IP for the cluster (application VIP)

a.    VIP will be registered as a VIP resource in CRS and managed by CRS

                                     i.    Source systems will use this VIP as the RMTHOST host parameter

b.    Following commands will need the IP (or hostname) of the VIP

7.    Set CRS_HOME to the ORACLE_HOME path for the GI home

a.    Export CRS_HOME=${ORACLE_HOME}

8.    Copy the action script and monitor script to the $CRS_HOME/crs/script on each node in the RAC cluster, ensure the permissions allow the “oracle” owner to execute and modify:

a.    cp “gg_action.scr” and “ggmon.scr”  $CRS_HOME/crs/script

b.    cd $CRS_HOME/crs/script

c.    chmod 750 gg_*

9.    As “root” (with the Grid environment sourced), add the VIP CRS resource and then change the permissions so that other users can execute the resource: (must be added as root – required to add a VIP on the server)

$CRS_HOME/bin/appvipcfg create -network=1 -ip=172.18.131.212 -vipname=ggvip -user=root

$CRS_HOME/bin/crsctl setperm resource ggvip -u user:oracle:r-x

10. As “root” (with the Grid environment sourced), add the OGG Manager CRS resource (ggapp):      

$CRS_HOME/bin/crsctl add resource ggapp -type cluster_resource \
-attr "ACTION_SCRIPT=
$CRS_HOME/crs/script/gg_action.scr, CHECK_INTERVAL=300, \
START_DEPENDENCIES='hard(ggvip) pullup(ggvip)', STOP_DEPENDENCIES='hard(ggvip)', SCRIPT_TIMEOUT=300"

Note:
If your Oracle GoldenGate software owner is not the same as the Oracle Grid infrastructure software owner (“grid”), then you must set the ownership of the application resource to the Oracle GoldenGate software owner. Run this command as root (or as the user who created the ORACLE app resource (root or grid)).  Create resource as root/grid, then change owner (as root/grid):

$CRS_HOME/bin/crsctl setperm resource ggvip -u user:oracle:r-x

$CRS_HOME/bin/crsctl setperm resource ggapp -g oinstall

$CRS_HOME/bin/crsctl setperm resource ggapp -u user:oracle:rwx

11.As root (with the Grid environment sourced), add the OGG Monitor CRS resource (ggmon)

$CRS_HOME/bin/crsctl add resource ggmon -type cluster_resource \
-attr "ACTION_SCRIPT=
$CRS_HOME/crs/script/ggmon.scr, CHECK_INTERVAL=600, \
START_DEPENDENCIES='hard(intermediate:ggapp)', STOP_DEPENDENCIES='hard(ggvip)', SCRIPT_TIMEOUT=240"


Note:
If your Oracle GoldenGate software owner (e.g. “oracle”) is not the same as the Oracle Grid infrastructure software owner (“grid”), then you must set the ownership of the ORACLE application resource to the Oracle GoldenGate software owner. Run this command as root (or as the user who created the ORACLE app resource (root or grid)).  Create resource as root/grid, then change owner (as root/grid):

$CRS_HOME/bin/crsctl setperm resource ggmon -g oinstall
$CRS_HOME/bin/crsctl setperm resource ggmon -u user:oracle:rwx

12. Create a dependency on the CRS resource ggmon to ggapp

a.  ggmon resource needs to always run the same server as “ggapp” for process checks

  crsctl modify resource ggmon -attr “START_DEPENDENCIES='hard(intermediate:ggapp)'”

 

 

13.Change the RMTHOST parameter to use the OGG VIP on all source system pumps for all sources that send data to this GoldenGate instance (may need to use the IP address if the hostname is not resolvable in DNS)


RMTHOST <OGG VIP>, MGRPORT 7809


14.(Optional) To restrict the Oracle CRS resources to specific RAC cluster nodes.
If Oracle GoldenGate is allowed to run from any node in the cluster, this step is not needed.
(allow ggvip, ggapp and ggmon resources to run only on certain cluster nodes):

 

crsctl modify resource ggvip -attr “PLACEMENT=restricted, HOSTING_MEMBERS='host1 host2'”

crsctl modify resource ggapp -attr “PLACEMENT=restricted, HOSTING_MEMBERS='host1 host2'”

crsctl modify resource ggmon -attr “PLACEMENT=restricted, HOSTING_MEMBERS='host1 host2'


Changing the monitoring script execution frequency for GGMON Resource:

GoldenGate monitoring frequency (ggmon resource) can be changed by altering the CRS resource attribute “CHECK_INTERVAL”.

1.  As oracle (with the GRID HOME environment sourced) run the following “crsctl” command to alter the “CHECK_INTERVAL” attribute of the ggmon CRS resource.

          $CRS_HOME/bin/crsctl modify resource ggmon -attr “CHECK_INTERVAL=600

 

Note:
This attribute controls the frequency at which the ‘check’ routine is called by CRS to run the script’s check() routine.
The ‘check’ routine/entry point causes the monitor script to check the GoldenGate processes according to the $GGS_HOME/dirprm/parms file thresholds and configuration.


Commands to Create CRS Resources in the environment:

 


# Run as root (with Grid Home environment)

# Set CRS_HOME to ORACLE_HOME for Grid Infra

# Create VIP resource in CRS configuration

$CRS_HOME/bin/appvipcfg create -network=1 -ip=172.18.131.212 -vipname=ggvip -user=root

 

# Give Grid/RDBMS users ability to read and execute resource
$CRS_HOME/bin/crsctl setperm resource ggvip -u user:oracle:r-x

 

# Create the GGAPP resource – manages the OGG Manager process

$CRS_HOME/bin/crsctl add resource ggapp -type cluster_resource \

-attr "ACTION_SCRIPT=/goldengate/scripts/gg_action.scr, CHECK_INTERVAL=600, \

START_DEPENDENCIES='hard(ggvip) pullup(ggvip)', STOP_DEPENDENCIES='hard(ggvip)', SCRIPT_TIMEOUT=300"

 

# Create the GGMON resource – script-based monitoring

$CRS_HOME/bin/crsctl add resource ggmon -type cluster_resource \

-attr "ACTION_SCRIPT=/goldengate/scripts/ggmon.scr, CHECK_INTERVAL=600, \

START_DEPENDENCIES='hard(intermediate:ggapp)', STOP_DEPENDENCIES='hard(ggvip)', SCRIPT_TIMEOUT=120"

 

# Change ownership/permissions on resource (created as root and must be changed to actual OS owner)

$CRS_HOME/bin/crsctl setperm resource ggapp -o oracle

$CRS_HOME/bin/crsctl setperm resource ggapp -g oinstall

$CRS_HOME/bin/crsctl setperm resource ggapp -u user:oracle:rwx

 

$CRS_HOME/bin/crsctl setperm resource ggmon -o oracle

$CRS_HOME/bin/crsctl setperm resource ggmon -g oinstall

$CRS_HOME/bin/crsctl setperm resource ggmon -u user:oracle:rwx

CRS Resource Attribute Values (after creation):

 

 

[oracle@xnzxddb70xe1 ~]$ crsctl stat res ggvip –f

[oracle@xnzxddb70xe1 ~]$ crsctl stat res ggapp -f

[oracle@xnzxddb70xe1 ~]$ crsctl stat res ggmon –f




 

Review Resources aftr Creating and After failing Over:

 

View where the resources are running by running the command à  “crsctl stat res –t

----------------------------------------------------
Cluster Resources

----------------------------------------------------
ggapp       1        ONLINE  ONLINE       ggserver

ggmon       1        ONLINE  ONLINE       ggserver

ggvip       1        ONLINE  ONLINE       ggserver

 

 

[oracle@ggserver monitor]$ ps -ef | grep mgr

oracle    23394      1  0 22:32 ?        00:00:00 ./mgr PARAMFILE /u01/app/oracle/ggs/uisamsp/dirprm/mgr.prm REPORTFILE /u01/app/oracle/ggs/uisamsp/dirrpt/MGR.rpt PROCESSID MGR PORT 7809

 

 

Resources failed over to node #1 from node #2, run “crsctl relocate res ggvip –f”

 

 

[oracle@ggserver monitor]$

[oracle@ggserver monitor]$ crsctl relocate res ggvip -f

CRS-2673: Attempting to stop 'ggapp' on 'ggserver'

CRS-2673: Attempting to stop 'ggmon' on 'ggserver'

CRS-2677: Stop of 'ggmon' on 'ggserver' succeeded

CRS-2677: Stop of 'ggapp' on 'ggserver' succeeded

CRS-2673: Attempting to stop 'ggvip' on 'ggserver'

CRS-2677: Stop of 'ggvip' on 'ggserver' succeeded

CRS-2672: Attempting to start 'ggvip' on 'xnzxddb70xe1'

CRS-2676: Start of 'ggvip' on 'xnzxddb70xe1' succeeded

CRS-2672: Attempting to start 'ggapp' on 'xnzxddb70xe1'

CRS-2676: Start of 'ggapp' on 'xnzxddb70xe1' succeeded

CRS-2672: Attempting to start 'ggmon' on 'xnzxddb70xe1'

CRS-2676: Start of 'ggmon' on 'xnzxddb70xe1' succeeded

 

 

Review Resources aftr Creating and After failing Over:

 

Run “crsctl stat res –t

----------------------------------------------------
Cluster Resources

----------------------------------------------------
ggapp      1        ONLINE  ONLINE       xnzxddb70xe1

ggmon      1        ONLINE  ONLINE       xnzxddb70xe1

ggvip      1        ONLINE  ONLINE       xnzxddb70xe1

 

 

[oracle@xnzxddb70xe1 ~]$ ps -ef | grep mgr

oracle    58495      1  0 22:30 ?        00:00:00 ./mgr PARAMFILE ./dirprm/mgr.prm REPORTFILE ./dirrpt/MGR.rpt PROCESSID MGR PORT 7809

 

 

 

Useful Administration Commands for CRS:

 


To failover all GoldeGate processes to another node:

ècrsctl relocate res ggvip –f

ècrsctl relocate res ggvip –f –n <node_name>   (force failover to a specific node)


To stop the GG monitoring activity:

ècrsctl relocate res ggmon –f


To stop GG app (manager) for maintenance:

ècrsctl relocate res ggapp –f


To stop GG VIP (OGG VIP) – this is rather needed to stop the IP itself

ècrsctl relocate res ggvip –f

 

To see a resource’s CRS attributes:

ècrsctl relocate res <resource_name> –f

ècrsctl relocate res ggvip –f

ècrsctl relocate res ggmon –f

ècrsctl relocate res ggapp –f


 

Appendix A

  gg_action.scr

The following is the Oracle provided action script (agent script) for controlling the “ggapp” resource in the CRS environment. The default script works and can be modified or customized as needed for any additional processing needed.


#!/bin/sh

#

# gg_action.scr

 

 

# . ~oracle/.profile

 

[ -z "$1" ]&& echo "ERROR!! Usage $0 <start|stop|abort|clean>"&& exit 99

 

 

#specify delay after start before checking for successful start

start_delay_secs=5

 

#Include the Oracle GoldenGate home in the library path to start GGSCI

 

# set the oracle home to the database to ensure Oracle GoldenGate will get

# the right environment settings to be able to connect to the database

 

export GGS_HOME=/u01/ogg

export ORACLE_HOME=/u01/app/oracle/product/11.2.0.2/dbhome_1

export CRS_HOME=/u01/app/11.2.0.2/grid

export LD_LIBRARY_PATH=${ORACLE_HOME}/lib:${GGS_HOME}

 

#Set NLS_LANG otherwise it will default to US7ASCII

export NLS_LANG=American_America.AL32UTF8

 

logfile=/tmp/crs_ggapp.log

#rm ${logfile}

 

###########################

function log

###########################

Log ()

{

        DATETIME=`date +%d/%m/%y-%H:%M:%S`

        echo $DATETIME "goldengate_action.scr>>" $1

        #echo $DATETIME "goldengate_action.scr>>" $1 >> $logfile

        echo $DATETIME "goldengate_action.scr>>" $1 > $logfile

}

 

 

#check_process validates that a manager process is running at the PID

#that Oracle GoldenGate specifies.

 

check_process ()

{

 

        if ( [ -f "${GGS_HOME}/dirpcs/MGR.pcm" ] )

        then

               pid=`cut -f8 "${GGS_HOME}/dirpcs/MGR.pcm"`

 

               if [ ${pid} = `ps -e |grep ${pid} |grep mgr |awk '{ print $1 }'` ]

               then

                  #manager process is running on the PID . exit success

                  #echo "manager process is running on the PID . exit success"> /tmp/check.out

                  echo "ogg manager process is running on the PID . exit wth check success"

                  exit 0

               else

                  #manager process is not running on the PID

                  echo "manager process is not running on the PID" >> /tmp/check.out

                  #echo "ogg manager process is not running on the PID"

                  exit 1

               fi

        else

               #manager is not running because there is no PID file

               #echo "manager is not running because there is no PID file" >> /tmp/check.out

               echo "manager is not running because there is no PID file"

               exit 1

        fi

 

}

 

 

#call_ggsci is a generic routine that executes a ggsci command

 

call_ggsci () {

 

  log "entering call_ggsci"

  ggsci_command=$1

  #log "about to execute $ggsci_command"

  log "id= $USER"

 

  cd ${GGS_HOME}

  ggsci_output=`${GGS_HOME}/ggsci << EOF

  ${ggsci_command}

  exit

  EOF`

  log "got output of : $ggsci_output"

 

}

 

 

# ----------------------------------------------

#unmount_dbfs will unmount the DBFS file system

# ----------------------------------------------

 

unmount_dbfs ()

{

   if ( [ -d ${DBFS_FILE_SYSTEM} ] )

   then

     fusermount -u ${DBFS_MOUNT_POINT}

   fi

}

 

 

case $1 in

 

'start')

    #Updated by Sourav (02/10/201

    # During failover if the ?mgr.pcm? file is not deleted at the node crash

    # then Oracle clusterware won?t start the manager on the new node assuming the                  

    # manager process is still running on the failed node. To get around this issue

    # we will delete the ?mgr.prm? file before starting up the manager on the new

    # node. We will also delete the other process files with pc* extension and to   

    # avoid any file locking issue we will first backup the checkpoint files and then

    # delete them from the dirchk directory.After that we will restore the checkpoint

    # files from  backup to the original location (dirchk directory).    

 

    log "removing *.pc* files from dirpcs directory..."

       rm -f $GGS_HOME/dirpcs/*.pc*

    

    log "creating tmp directory to backup checkpoint file...."

       mkdir $GGS_HOME/dirchk/tmp

 

    log "backing up checkpoint files..."

       cp $GGS_HOME/dirchk/*.cp* $GGS_HOME/dirchk/tmp    

 

    log "Deleting checkpoint files under dirchk......"

       rm -f $GGS_HOME/dirchk/*.cp*  

 

    log "Restore checkpoint files from backup to dirchk directory...."

       cp $GGS_HOME/dirchk/tmp/*.cp* $GGS_HOME/dirchk

 

    log "Deleting tmp directory...."

       rm -r $GGS_HOME/dirchk/tmp

 

    log "starting manager"

    call_ggsci 'start manager'

    #there is a small delay between issuing the start manager command

    #and the process being spawned on the OS . wait before checking

    log "sleeping for start_delay_secs"

    sleep ${start_delay_secs}

    #check whether manager is running and exit accordingly

    check_process

   ;;

 

'stop')

    #attempt a clean stop for all non-manager processes

    call_ggsci 'stop er *'

    #ensure everything is stopped

    call_ggsci 'stop er *!'

    #stop manager without (y/n) confirmation

    call_ggsci 'stop manager!'

    #exit success

    exit 0

   ;;

 

'check')

    check_process

    exit 0

   ;;

 

'clean')

    #attempt a clean stop for all non-manager processes

    call_ggsci 'stop er *'

    #ensure everything is stopped

    call_ggsci 'stop er *!'

    #in case there are lingering processes

    call_ggsci 'kill er *'

    #stop manager without (y/n) confirmation

    call_ggsci 'stop manager!'

    #unmount DBFS

    #unmount_dbfs

    #exit success

    exit 0

   ;;

 

'abort')

 

    #ensure everything is stopped

    call_ggsci 'stop er *!'

    #in case there are lingering processes

    call_ggsci 'kill er *'

    #stop manager without (y/n) confirmation

    call_ggsci 'stop manager!'

    #unmount DBFS

    #unmount_dbfs

    #exit success

    exit 0

   ;;

 

esac


 

Appendix B

ggmon.scr

 

 

A wrapper script to call monitoring that supports the callback required by CRS. This script calls the monitoring script which should already be setup and functioning.

 

 

 

[oracle@xszxddb69xe1 scriptagent_oracle]$ cat /u01/app/11.2.0.4/grid/crs/script/ggmon.scr

 

 

#!/bin/sh

#

# ggmon.scr

 

# . ~oracle/.profile

[ -z "$1" ]&& echo "ERROR!! Usage $0 <start|stop|abort|clean>"&& exit 99

 

#specify delay after start before checking for successful start

start_delay_secs=2

 

#Include the Oracle GoldenGate home in the library path to start GGSCI

# set the oracle home to the database to ensure Oracle GoldenGate will get

# the right environment settings to be able to connect to the database

 

export GGS_HOME=/u01/app/oracle/product/ogg

#export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1

#export CRS_HOME=/u01/app/11.2.0.2/grid

#export LD_LIBRARY_PATH=${ORACLE_HOME}/lib:${GGS_HOME}

export GGNAME="ggmon"

ACTIONFLAG=$1

 

#. $GGS_HOME/dirprm/parms.ini

 

logfile=/tmp/crs_${GGNAME}.log

#rm ${logfile}

 

# --------------------------------

# Functions/Prcoedures

# --------------------------------

 

log

{

        DATETIME=`date +%d/%m/%y-%H:%M:%S`

        echo "$DATETIME ${GGNAME}.scr>> $1"

        echo "$DATETIME ${GGNAME}.scr>> $1" > $logfile

}

 

# ---------------------------------------------

#check_process runs the Perl monitoring script

 

check_process ()

{

 # call PERL script under $GGS_HOME and pass it the GGS_HOME parameter

 perl $GGS_HOME/scripts/gg_monitor.pl $GGS_HOME

}

 

# --------------------

# Main Execution Body

# --------------------

 

case $1 in

'start')

    # call Perl script

    log "[${ACTIONFLAG}] called...calling monitoring script"

    log "touch  $GGS_HOME/scripts/ggmon.running.pid"

    touch $GGS_HOME/scripts/${GGNAME}.running.pid

    log " $GGS_HOME/scripts/${GGNAME}.running.pid exists - creating --> START"

    check_process

    exit 0

   ;;

'stop')

    # nothing to do to stop - monitoring script is stateless, only runs on check/start calls

    log "[${ACTIONFLAG}] called...nothing to do"

    if [ -e  $GGS_HOME/scripts/${GGNAME}.running.pid ]

    then

      log " $GGS_HOME/scripts/${GGNAME}.running.pid exists - deleting --> STOP"

      rm -f $GGS_HOME/scripts/${GGNAME}.running.pid

    fi

    exit 0

   ;;

'check')

    log "[${ACTIONFLAG}] called...calling script - checking  $GGS_HOME/scripts/${GGNAME}.running.pid"

    if [ -e $GGS_HOME/scripts/${GGNAME}.running.pid ]

    then

      log " $GGS_HOME/scripts/${GGNAME}.running.pid exists - run check --> CHECK"

      check_process

      exit 0

    else

      log "$GGS_HOME/scripts/${GGNAME}.running.pid not present - nothing to do"

      exit 1

    fi

   ;;

'clean')

    # nothing to do to clean - monitoring script is stateless

    log "[${ACTIONFLAG}] called...nothing to do"

    exit 0

   ;;

'abort')

    # nothing to do to abort - monitoring script is stateless

    log "[${ACTIONFLAG}] called...nothing to do"

    exit 0

   ;;

esac


You can learn more about oracle database administration: 

Pro Oracle GoldenGate for the DBA

Oracle Security Tools Installation and Configuration

Expert Oracle Exadata 

Oracle Database 12c Release 2 Real Application Clusters Handbook: Concepts, Administration, Tuning & Troubleshooting

Expert Oracle RAC 12c

==========================================================

 

Comments

Popular posts from this blog

Restart Innodb MySQL Cluster after Complete outage(All node Down)

Oracle Block Corruption - Detection and Resolution

Install & Configure MySQL Router - MySQL InnoDB Cluster