Oracle Goldengate High Availability as Oracle ClusterWare Resources
This Blog contains a summary of the environment, and a detailed list of steps used to create the Oracle GoldenGate CRS resources for HA in an Oracle RAC environment.
This procedure creates multiple CRS resources for Oracle GoldenGate. The
following are the resources:
1.
Resource for a dedicated VIP (one per cluster)
for remotely communicating with the active node in the cluster running Oracle GoldenGate
a.
Called “ggvip”
2.
Resource for controlling the Oracle GoldenGate Manager
process (mgr)
a.
CRS will only check and start/stop the Oracle GoldenGate
Manager process
i. mgr.prm will
need AUTOSTART and AUTORESTART parameters to start all remaining processes
b.
Uses an Action Script which is executed by CRS
to manage Oracle GoldenGate (stop/start/check of the manager process)
c.
Called “ggapp”
3.
(optional) Resource for the Oracle GoldenGate monitoring
process
a.
Script will follow Oracle GoldenGate processes
from node to node (follows the “ggapp” resources)
i. Most
monitoring solutions (including OEM) do not understand Oracle GoldenGate in a
clustered configuration, as those solutions are node specific. This option
makes the monitoring script cluster aware.
b. Called “ggmon”
Environment
Overview for Oracle GoldenGate
Name: ggserver.nndc.kp.org
Address: 172.19.130.122
OGG OS Userid: oracle
Grid OS Userid: oracle:oinstall
RDBMS Userid: oracle:oinstall
ORACLE
OGG VIP Resource: ggvip (DNS Name = ggserver-gg.nndc.kp.org,
VIP = 172.18.131.212)
CRS OGG App Resource: ggapp
(resource controlling OGG manager process)
CRS OGG Mon Resource: ggmon (resource controlling OGG monitoring
script)
GGS_HOME: /u01/app/oracle/product/ogg/
CRS_HOME: /u01/app/11.2.0.4/grid
ORACLE_HOME: /u01/app/oracle/product/11.2.0.4/dbhome_1
Scripts
for Oracle GoldenGate and CRS
The following are the action script(s) used for each CRS resource created.
Resource “ggvip” has not action script, all actions are managed by CLusterWare.
The remaining two resources are managed via the action scripts.
1. ggapp
a. /$GGS_HOME/scripts/gg_action.scr (implements the start/stop/check/abort/clean
callbacks for OGG mgr)
2. ggmon:
a. /$GGS_HOME/scripts/ggmon.scr (implements the start/stop/check/abort/clean
callbacks for all OGG processes)
Steps
to add CRS Resources for Oracle GoldenGate HA in an Oracle RAC Cluster
1. Create the
required group and user-id for OGG on the remaining nodes in the cluster (if
they do not already exist)
a. Each node
must have the same GUID/UID for the GoldenGate user
i. each node
is able to read and modify the files on the shared storage form any node in the
fail over
2. Install
Oracle GoldenGate binaries on each node in the cluster (on the local file
system for each node)
a. Can also
install the home on the shared storage (if mounted as the same name across all
nodes)
i. Keep in
mind, special NFS mount options for OGG for certain file types
1. See MOS Note:
1232303.1
ii. Keep in
mind the specific DBFS configuration needed if using DBFS as the “shared
storage”
3. Shutdown
Oracle GoldenGate and move the following directories to the shared storage
a. ./dirprm (parameter files – all)
b. ./dirchk (process checkpoint files)
c. ./dirpcs (process run PID files)
d. ./dirdat (trail files – if used in
dirdat)
e. ./dirtmp (caching to temp files for
extract or apply)
f. ./BR (extraction only – Bounded Recovery
files)
g. ./scripts (monitoring scripts and log
files)
h. ./dirdef (source definition files)
Note:
Try to use relative paths in all Oracle GoldenGate parameter files and commands
or command files (OBEY) when pointing to files or directories in the file
system. Relative paths can be used for
trail files, report files, parameter files, etc.
Also use default locations for all files, easier to move and use symbolic links
to new locations.
See OGG Best Practices presentation.
4. Create a
softlink in $GGS_HOME to each directory location (moved to shared storage). Do
this for each directory that needs to be moved to shared storage.
a. ln –s
/zfssa/gold01/PERC/dirprm dirprm
b. Continue
until all directories moved to shared storage are linked
5. Add the following
to the “mgr.prm” file
a. AUTOSTART
ER *
i. This is
needed to instruct the manager to start all processes when the manager starts
6. Create/assign
a new IP to be used as the Virtual IP for the cluster (application VIP)
a. VIP will
be registered as a VIP resource in CRS and managed by CRS
i. Source
systems will use this VIP as the RMTHOST host parameter
b. Following
commands will need the IP (or hostname) of the VIP
7. Set
CRS_HOME to the ORACLE_HOME path for the GI home
a. Export
CRS_HOME=${ORACLE_HOME}
8. Copy the
action script and monitor script to the $CRS_HOME/crs/script
on each node in the RAC cluster, ensure the permissions allow the “oracle”
owner to execute and modify:
a. cp “gg_action.scr”
and “ggmon.scr” $CRS_HOME/crs/script
b. cd $CRS_HOME/crs/script
c. chmod 750 gg_*
9. As “root” (with
the Grid environment sourced), add the VIP CRS resource and
then change the permissions so that other users can execute the resource: (must
be added as root – required to add a VIP on the server)
$CRS_HOME/bin/appvipcfg
create -network=1 -ip=172.18.131.212 -vipname=ggvip -user=root
$CRS_HOME/bin/crsctl setperm
resource ggvip -u user:oracle:r-x
10. As
“root” (with the Grid environment sourced), add the OGG Manager CRS resource (ggapp):
$CRS_HOME/bin/crsctl
add resource ggapp -type cluster_resource \
-attr "ACTION_SCRIPT=$CRS_HOME/crs/script/gg_action.scr,
CHECK_INTERVAL=300, \
START_DEPENDENCIES='hard(ggvip) pullup(ggvip)', STOP_DEPENDENCIES='hard(ggvip)',
SCRIPT_TIMEOUT=300"
Note:
If your Oracle GoldenGate software owner is not the same as the Oracle Grid
infrastructure software owner (“grid”), then you must set the ownership of the
application resource to the Oracle GoldenGate software owner. Run this command
as root (or as the user who created the ORACLE app resource (root or
grid)). Create resource as root/grid,
then change owner (as root/grid):
$CRS_HOME/bin/crsctl setperm resource ggvip -u user:oracle:r-x
$CRS_HOME/bin/crsctl setperm resource ggapp -g oinstall
$CRS_HOME/bin/crsctl setperm resource ggapp -u user:oracle:rwx
11.As root (with the
Grid environment sourced), add the OGG Monitor CRS resource (ggmon)
$CRS_HOME/bin/crsctl add resource ggmon -type cluster_resource \
-attr "ACTION_SCRIPT=$CRS_HOME/crs/script/ggmon.scr,
CHECK_INTERVAL=600,
\
START_DEPENDENCIES='hard(intermediate:ggapp)', STOP_DEPENDENCIES='hard(ggvip)',
SCRIPT_TIMEOUT=240"
Note:
If your Oracle GoldenGate software owner (e.g. “oracle”) is
not the same as the Oracle Grid infrastructure software owner (“grid”), then
you must set the ownership of the ORACLE application resource to the Oracle
GoldenGate software owner. Run this command as root (or as the user who created
the ORACLE app resource (root or grid)).
Create resource as root/grid, then change owner (as root/grid):
$CRS_HOME/bin/crsctl setperm resource ggmon -g oinstall
$CRS_HOME/bin/crsctl setperm resource ggmon -u user:oracle:rwx
12.
Create a dependency on the CRS resource ggmon to ggapp
a. ggmon resource
needs to always run the same server as “ggapp” for process checks
crsctl modify resource ggmon -attr
“START_DEPENDENCIES='hard(intermediate:ggapp)'”
13.Change the
RMTHOST parameter to use the OGG VIP on all source system pumps for all sources
that send data to this GoldenGate instance (may need to use the IP address if
the hostname is not resolvable in DNS)
RMTHOST <OGG VIP>, MGRPORT 7809
14.(Optional) To restrict the Oracle CRS resources
to specific RAC cluster nodes.
If Oracle GoldenGate is allowed to run from any node in the cluster, this step
is not needed.
(allow ggvip, ggapp and ggmon resources to run only on certain cluster nodes):
crsctl modify resource ggvip -attr “PLACEMENT=restricted,
HOSTING_MEMBERS='host1 host2'”
crsctl modify resource ggapp -attr “PLACEMENT=restricted,
HOSTING_MEMBERS='host1 host2'”
crsctl modify resource ggmon -attr “PLACEMENT=restricted,
HOSTING_MEMBERS='host1 host2'”
Changing the
monitoring script execution frequency for GGMON Resource:
GoldenGate monitoring frequency (ggmon resource) can be changed by
altering the CRS resource attribute “CHECK_INTERVAL”.
1. As oracle
(with the GRID HOME environment sourced) run the following “crsctl” command to
alter the “CHECK_INTERVAL” attribute of the ggmon CRS resource.
$CRS_HOME/bin/crsctl modify resource ggmon -attr “CHECK_INTERVAL=600”
Note:
This attribute controls the frequency at which the ‘check’ routine is called by
CRS to run the script’s check() routine.
The ‘check’ routine/entry point causes the monitor script to check the
GoldenGate processes according to the $GGS_HOME/dirprm/parms file thresholds
and configuration.
Commands to
Create CRS Resources in the environment:
# Run as root (with Grid Home environment)
# Set CRS_HOME to ORACLE_HOME for Grid Infra
# Create VIP resource in CRS configuration
$CRS_HOME/bin/appvipcfg create -network=1 -ip=172.18.131.212
-vipname=ggvip -user=root
# Give Grid/RDBMS users ability to read and execute resource
$CRS_HOME/bin/crsctl setperm resource ggvip -u user:oracle:r-x
# Create the GGAPP resource – manages the OGG Manager process
$CRS_HOME/bin/crsctl add resource ggapp -type cluster_resource \
-attr "ACTION_SCRIPT=/goldengate/scripts/gg_action.scr,
CHECK_INTERVAL=600, \
START_DEPENDENCIES='hard(ggvip) pullup(ggvip)',
STOP_DEPENDENCIES='hard(ggvip)', SCRIPT_TIMEOUT=300"
# Create the GGMON resource – script-based monitoring
$CRS_HOME/bin/crsctl add resource ggmon -type cluster_resource \
-attr "ACTION_SCRIPT=/goldengate/scripts/ggmon.scr,
CHECK_INTERVAL=600, \
START_DEPENDENCIES='hard(intermediate:ggapp)',
STOP_DEPENDENCIES='hard(ggvip)', SCRIPT_TIMEOUT=120"
# Change ownership/permissions on resource (created as root and
must be changed to actual OS owner)
$CRS_HOME/bin/crsctl setperm resource ggapp -o oracle
$CRS_HOME/bin/crsctl setperm resource ggapp -g oinstall
$CRS_HOME/bin/crsctl setperm resource ggapp -u user:oracle:rwx
$CRS_HOME/bin/crsctl setperm resource ggmon -o oracle
$CRS_HOME/bin/crsctl setperm resource ggmon -g oinstall
CRS Resource
Attribute Values (after creation):
[oracle@xnzxddb70xe1
~]$ crsctl stat res ggvip –f
[oracle@xnzxddb70xe1
~]$ crsctl stat res ggapp -f
[oracle@xnzxddb70xe1
~]$ crsctl stat res ggmon –f
Review Resources
aftr Creating and After failing Over:
View where the resources are running by running the command à “crsctl
stat res –t”
----------------------------------------------------
Cluster Resources
----------------------------------------------------
ggapp 1 ONLINE
ONLINE ggserver
ggmon 1 ONLINE
ONLINE ggserver
ggvip 1 ONLINE
ONLINE ggserver
[oracle@ggserver monitor]$ ps -ef | grep mgr
oracle 23394 1
0 22:32 ? 00:00:00 ./mgr
PARAMFILE /u01/app/oracle/ggs/uisamsp/dirprm/mgr.prm REPORTFILE
/u01/app/oracle/ggs/uisamsp/dirrpt/MGR.rpt PROCESSID MGR PORT 7809
Resources failed
over to node #1 from node #2, run “crsctl relocate res ggvip –f”
[oracle@ggserver monitor]$
[oracle@ggserver monitor]$ crsctl relocate res ggvip -f
CRS-2673: Attempting to stop 'ggapp' on 'ggserver'
CRS-2673: Attempting to stop 'ggmon' on 'ggserver'
CRS-2677: Stop of 'ggmon' on 'ggserver' succeeded
CRS-2677: Stop of 'ggapp' on 'ggserver' succeeded
CRS-2673: Attempting to stop 'ggvip' on 'ggserver'
CRS-2677: Stop of 'ggvip' on 'ggserver' succeeded
CRS-2672: Attempting to start 'ggvip' on 'xnzxddb70xe1'
CRS-2676: Start of 'ggvip' on 'xnzxddb70xe1' succeeded
CRS-2672: Attempting to start 'ggapp' on 'xnzxddb70xe1'
CRS-2676: Start of 'ggapp' on 'xnzxddb70xe1' succeeded
CRS-2672: Attempting to start 'ggmon' on 'xnzxddb70xe1'
CRS-2676: Start of 'ggmon' on 'xnzxddb70xe1' succeeded
Review Resources
aftr Creating and After failing Over:
Run “crsctl stat res –t”
----------------------------------------------------
Cluster Resources
----------------------------------------------------
ggapp 1 ONLINE
ONLINE xnzxddb70xe1
ggmon 1 ONLINE
ONLINE xnzxddb70xe1
ggvip 1 ONLINE
ONLINE xnzxddb70xe1
[oracle@xnzxddb70xe1 ~]$ ps -ef | grep mgr
oracle 58495 1
0 22:30 ? 00:00:00 ./mgr
PARAMFILE ./dirprm/mgr.prm REPORTFILE ./dirrpt/MGR.rpt PROCESSID MGR PORT 7809
Useful
Administration Commands for CRS:
To failover all GoldeGate processes to another node:
ècrsctl
relocate res ggvip –f
ècrsctl
relocate res ggvip –f –n <node_name>
(force failover to a specific node)
To stop the GG monitoring activity:
ècrsctl
relocate res ggmon –f
To stop GG app (manager) for maintenance:
ècrsctl
relocate res ggapp –f
To stop GG VIP (OGG VIP) – this is rather needed to stop the IP itself
ècrsctl
relocate res ggvip –f
To see a resource’s CRS attributes:
ècrsctl
relocate res <resource_name> –f
ècrsctl
relocate res ggvip –f
ècrsctl
relocate res ggmon –f
ècrsctl relocate res ggapp –f
Appendix A
gg_action.scr
The following is the Oracle provided action script (agent script)
for controlling the “ggapp” resource in the CRS environment. The default script
works and can be modified or customized as needed for any additional processing
needed.
#!/bin/sh
#
# gg_action.scr
# . ~oracle/.profile
[ -z "$1" ]&& echo "ERROR!! Usage $0
<start|stop|abort|clean>"&& exit 99
#specify delay after start before checking for successful start
start_delay_secs=5
#Include the Oracle GoldenGate home in the library path to start
GGSCI
# set the oracle home to the database to ensure Oracle GoldenGate
will get
# the right environment settings to be able to connect to the
database
export GGS_HOME=/u01/ogg
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.2/dbhome_1
export CRS_HOME=/u01/app/11.2.0.2/grid
export LD_LIBRARY_PATH=${ORACLE_HOME}/lib:${GGS_HOME}
#Set NLS_LANG otherwise it will default to US7ASCII
export NLS_LANG=American_America.AL32UTF8
logfile=/tmp/crs_ggapp.log
#rm ${logfile}
###########################
function log
###########################
Log ()
{
DATETIME=`date
+%d/%m/%y-%H:%M:%S`
echo $DATETIME
"goldengate_action.scr>>" $1
#echo $DATETIME
"goldengate_action.scr>>" $1 >> $logfile
echo $DATETIME
"goldengate_action.scr>>" $1 > $logfile
}
#check_process validates that a manager process is running at the
PID
#that Oracle GoldenGate specifies.
check_process ()
{
if ( [ -f
"${GGS_HOME}/dirpcs/MGR.pcm" ] )
then
pid=`cut -f8
"${GGS_HOME}/dirpcs/MGR.pcm"`
if [ ${pid} =
`ps -e |grep ${pid} |grep mgr |awk '{ print $1 }'` ]
then
#manager
process is running on the PID . exit success
#echo
"manager process is running on the PID . exit success">
/tmp/check.out
echo
"ogg manager process is running on the PID . exit wth check success"
exit 0
else
#manager
process is not running on the PID
echo
"manager process is not running on the PID" >> /tmp/check.out
#echo
"ogg manager process is not running on the PID"
exit 1
fi
else
#manager is
not running because there is no PID file
#echo
"manager is not running because there is no PID file" >> /tmp/check.out
echo
"manager is not running because there is no PID file"
exit 1
fi
}
#call_ggsci is a generic routine that executes a ggsci command
call_ggsci () {
log "entering
call_ggsci"
ggsci_command=$1
#log "about to
execute $ggsci_command"
log "id= $USER"
cd ${GGS_HOME}
ggsci_output=`${GGS_HOME}/ggsci << EOF
${ggsci_command}
exit
EOF`
log "got output of :
$ggsci_output"
}
# ----------------------------------------------
#unmount_dbfs will unmount the DBFS file system
# ----------------------------------------------
unmount_dbfs ()
{
if ( [ -d
${DBFS_FILE_SYSTEM} ] )
then
fusermount -u
${DBFS_MOUNT_POINT}
fi
}
case $1 in
'start')
#Updated by Sourav
(02/10/201
# During failover if the
?mgr.pcm? file is not deleted at the node crash
# then Oracle
clusterware won?t start the manager on the new node assuming the
# manager process is
still running on the failed node. To get around this issue
# we will delete the
?mgr.prm? file before starting up the manager on the new
# node. We will also
delete the other process files with pc* extension and to
# avoid any file locking
issue we will first backup the checkpoint files and then
# delete them from the
dirchk directory.After that we will restore the checkpoint
# files from backup to the original location (dirchk
directory).
log "removing *.pc*
files from dirpcs directory..."
rm -f
$GGS_HOME/dirpcs/*.pc*
log "creating tmp
directory to backup checkpoint file...."
mkdir
$GGS_HOME/dirchk/tmp
log "backing up
checkpoint files..."
cp
$GGS_HOME/dirchk/*.cp* $GGS_HOME/dirchk/tmp
log "Deleting
checkpoint files under dirchk......"
rm -f
$GGS_HOME/dirchk/*.cp*
log "Restore
checkpoint files from backup to dirchk directory...."
cp
$GGS_HOME/dirchk/tmp/*.cp* $GGS_HOME/dirchk
log "Deleting tmp
directory...."
rm -r $GGS_HOME/dirchk/tmp
log "starting
manager"
call_ggsci 'start
manager'
#there is a small delay
between issuing the start manager command
#and the process being
spawned on the OS . wait before checking
log "sleeping for
start_delay_secs"
sleep
${start_delay_secs}
#check whether manager
is running and exit accordingly
check_process
;;
'stop')
#attempt a clean stop
for all non-manager processes
call_ggsci 'stop er *'
#ensure everything is
stopped
call_ggsci 'stop er *!'
#stop manager without
(y/n) confirmation
call_ggsci 'stop
manager!'
#exit success
exit 0
;;
'check')
check_process
exit 0
;;
'clean')
#attempt a clean stop for
all non-manager processes
call_ggsci 'stop er *'
#ensure everything is
stopped
call_ggsci 'stop er *!'
#in case there are
lingering processes
call_ggsci 'kill er *'
#stop manager without
(y/n) confirmation
call_ggsci 'stop
manager!'
#unmount DBFS
#unmount_dbfs
#exit success
exit 0
;;
'abort')
#ensure everything is
stopped
call_ggsci 'stop er *!'
#in case there are
lingering processes
call_ggsci 'kill er *'
#stop manager without
(y/n) confirmation
call_ggsci 'stop
manager!'
#unmount DBFS
#unmount_dbfs
#exit success
exit 0
;;
esac
Appendix B
ggmon.scr
A wrapper script to call monitoring that supports the callback required
by CRS. This script calls the monitoring script which should already be setup
and functioning.
[oracle@xszxddb69xe1 scriptagent_oracle]$ cat
/u01/app/11.2.0.4/grid/crs/script/ggmon.scr
#!/bin/sh
#
# ggmon.scr
# . ~oracle/.profile
[ -z "$1" ]&& echo "ERROR!! Usage $0
<start|stop|abort|clean>"&& exit 99
#specify delay after start before checking for successful start
start_delay_secs=2
#Include the Oracle GoldenGate home in the library path to start
GGSCI
# set the oracle home to the database to ensure Oracle GoldenGate
will get
# the right environment settings to be able to connect to the
database
export GGS_HOME=/u01/app/oracle/product/ogg
#export ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1
#export CRS_HOME=/u01/app/11.2.0.2/grid
#export LD_LIBRARY_PATH=${ORACLE_HOME}/lib:${GGS_HOME}
export GGNAME="ggmon"
ACTIONFLAG=$1
#. $GGS_HOME/dirprm/parms.ini
logfile=/tmp/crs_${GGNAME}.log
#rm ${logfile}
# --------------------------------
# Functions/Prcoedures
# --------------------------------
log
{
DATETIME=`date
+%d/%m/%y-%H:%M:%S`
echo "$DATETIME
${GGNAME}.scr>> $1"
echo "$DATETIME
${GGNAME}.scr>> $1" > $logfile
}
# ---------------------------------------------
#check_process runs the Perl monitoring script
check_process ()
{
# call PERL script under
$GGS_HOME and pass it the GGS_HOME parameter
perl
$GGS_HOME/scripts/gg_monitor.pl $GGS_HOME
}
# --------------------
# Main Execution Body
# --------------------
case $1 in
'start')
# call Perl script
log
"[${ACTIONFLAG}] called...calling monitoring script"
log "touch $GGS_HOME/scripts/ggmon.running.pid"
touch
$GGS_HOME/scripts/${GGNAME}.running.pid
log "
$GGS_HOME/scripts/${GGNAME}.running.pid exists - creating --> START"
check_process
exit 0
;;
'stop')
# nothing to do to stop
- monitoring script is stateless, only runs on check/start calls
log
"[${ACTIONFLAG}] called...nothing to do"
if [ -e $GGS_HOME/scripts/${GGNAME}.running.pid ]
then
log "
$GGS_HOME/scripts/${GGNAME}.running.pid exists - deleting --> STOP"
rm -f
$GGS_HOME/scripts/${GGNAME}.running.pid
fi
exit 0
;;
'check')
log
"[${ACTIONFLAG}] called...calling script - checking $GGS_HOME/scripts/${GGNAME}.running.pid"
if [ -e
$GGS_HOME/scripts/${GGNAME}.running.pid ]
then
log "
$GGS_HOME/scripts/${GGNAME}.running.pid exists - run check --> CHECK"
check_process
exit 0
else
log
"$GGS_HOME/scripts/${GGNAME}.running.pid not present - nothing to do"
exit 1
fi
;;
'clean')
# nothing to do to clean
- monitoring script is stateless
log
"[${ACTIONFLAG}] called...nothing to do"
exit 0
;;
'abort')
# nothing to do to abort
- monitoring script is stateless
log
"[${ACTIONFLAG}] called...nothing to do"
exit 0
;;
esac
You can learn more about oracle database administration:
Pro Oracle GoldenGate for the DBA
Oracle Security Tools Installation and Configuration
==========================================================
Comments
Post a Comment
Please do not enter any spam link in comment Section suggestions are Always Appreciated. Thanks.. !