Patching the Exadata machine is a regular activity for a DMA. There are two ways of patching an Exadata box: Rolling and Non-Rolling. In this blog, we will start with part 1 of the Exadata (Half Rack) Image Upgrade (Rolling).
Step 1: Backup Current Configurations
echo "Executing prechecks specific to cell nodes..."
This is a one-time precheck; the configurations below are collected for all 7 cell nodes in the cell_group file.
#cd /root
echo "" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo "Cell specific prechecks" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Need at least 1.5gb + size of ISO file (approx 3gb total for Jan-July releases) space on / partition of cells to do cell update" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group df -h / >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check all cells are up" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e list cell >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check cell network configuration" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group "/opt/oracle.cellos/ipconf -verify" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Validate cell disks for valid physicalInsertTime (should be no output)" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null' >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check for WriteBack Flash Cache" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e "list cell attributes flashcachemode" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo -e "\n# Check for Flash Cache Compression" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
dcli -l root -g cell_group cellcli -e "list cell attributes flashCacheCompress" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt
echo "Executing prechecks specific to compute nodes..."
This is a one-time precheck; the configurations below are collected for all 4 database nodes in the dbs_group file.
#cd /root
echo "" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo "Compute specific prechecks" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo "************************" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# need 3-5 gb on / partition" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group df -h / >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# need ~40 mb on /boot partition" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group df -h /boot >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# check freespace on /u01 partition" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group df -h /u01 >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Make sure not snaps are still active" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group lvs >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Need at least 1.5gb gb free PE" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group vgdisplay >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Mounted filesystems" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group mount >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Contents of /etc/fstab" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group cat /etc/fstab >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
echo -e "\n# Contents of /etc/exports" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group cat /etc/exports >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
#
echo "Done."
echo -e "\n# Contents of /etc/exports" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
dcli -l root -g dbs_group cat /etc/exports >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt
#
echo "Done."
Step 2: Pre-Validation - Check whether any critical, stateful alerts are left over on the cell servers
cd /root
[root@abcxyzadm01 ~]# dcli -g cell_group -l root "cellcli -e list alerthistory attributes name,beginTime,alertShortName,alertDescription,severity where alerttype=stateful and severity=critical"
All alert history can be dropped at once:
CellCLI> drop alerthistory all
OR
====
To drop an individual alert, use the commands below.
Example :
CellCLI> list alerthistory <6_1>
CellCLI> list alerthistory <6_1> detail
CellCLI> drop alerthistory <6_1>
Only drop alerts that have already been taken care of.
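When several cells report alerts, it helps to see at a glance which hosts are affected. The following is a hedged sketch: the awk filter assumes the saved dcli output uses the usual "<cell>: <fields>" prefix, and the sample lines are illustrative only.

```shell
#!/bin/sh
# Illustrative sample of saved output from:
#   dcli -g cell_group -l root "cellcli -e list alerthistory ... severity=critical"
cat > /tmp/critical_alerts.txt <<'EOF'
xyzceladm03: 6_1 2017-05-01T10:00:00 HardwareAlert critical
xyzceladm05: 9_2 2017-05-02T11:30:00 SoftwareAlert critical
xyzceladm05: 9_3 2017-05-02T11:31:00 SoftwareAlert critical
EOF

# Print each cell that still carries a critical stateful alert, once.
awk -F': ' 'NF > 1 {print $1}' /tmp/critical_alerts.txt | sort -u
```

An empty result means no cell still has an outstanding critical stateful alert.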
Step 3: Precheck
[root@abcxyzadm01 patch_PATCHNUMBER]# ./patchmgr -cells <cell_group> -patch_check_prereq -rolling
-rw-r--r-- 1 root root 10 May 4 15:53 cell_node_1
-rw-r--r-- 1 root root 11 May 4 16:10 cell_node_2
-rw-r--r-- 1 root root 10 May 4 16:11 cell_node_3
-rw-r--r-- 1 root root 10 May 4 16:11 cell_node_4
-rw-r--r-- 1 root root 11 May 4 16:11 cell_node_5
-rw-r--r-- 1 root root 10 May 4 16:11 cell_node_6
-rw-r--r-- 1 root root 10 May 4 16:12 cell_node_7
Note: The listing above shows one group file per cell node. Change the cell_group argument to the file for the node you are patching.
Example : For first cell node
[root@abcxyzadm01 patch_PATCHNUMBER]# ./patchmgr -cells cell_node_1 -patch_check_prereq -rolling
2017-05-08 11:24:52 -0700 [INFO] Disabling /var/log/cellos cleanup on this node for the duration of the patchmgr session.
2017-05-08 11:25:56 -0700 :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...
2017-05-08 11:25:56 -0700 :SUCCESS: DONE: Check cells have ssh equivalence for root user.
2017-05-08 11:26:00 -0700 :Working: DO: Initialize files, check space and state of cell services. Up to 1 minute ...
2017-05-08 11:26:44 -0700 :SUCCESS: DONE: Initialize files, check space and state of cell services.
2017-05-08 11:26:44 -0700 :Working: DO: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction. Up to 40 minutes ...
2017-05-08 11:26:59 -0700 Wait correction of degraded md11 due to md partner size mismatch. Up to 30 minutes.
2017-05-08 11:27:00 -0700 :SUCCESS: DONE: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction.
2017-05-08 11:27:00 -0700 :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...
2017-05-08 11:27:39 -0700 :SUCCESS: DONE: Check prerequisites on all cells.
2017-05-08 11:27:39 -0700 :Working: DO: Execute plugin check for Patch Check Prereq ...
2017-05-08 11:27:39 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 22909764 v1.0. Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:39 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 17854520 v1.3. Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:41 -0700 :WARNING: ACTION REQUIRED: Cells to be upgraded pass version check, however other cells not being upgraded may be at version 11.2.3.1.x or 11.2.3.2.x, exposing the system to bug 17854520. Manually check other cells for version 11.2.3.1.x or 11.2.3.2.x.
2017-05-08 11:27:41 -0700 :INFO: Checking database homes for remote db nodes with oracle-user ssh equivalence to the local system.
2017-05-08 11:27:41 -0700 :INFO: Database homes that exist only on remote nodes must be checked manually.
2017-05-08 11:27:47 -0700 :SUCCESS: Patchmgr plugin complete: Prereq check passed - no exposure to bug 17854520
2017-05-08 11:27:47 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 22468216 v1.0. Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:48 -0700 :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22468216
2017-05-08 11:27:48 -0700 :INFO : Patchmgr plugin start: Prereq check for exposure to bug 24625612 v1.0.
2017-05-08 11:27:48 -0700 :INFO : Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:48 -0700 :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 24625612
2017-05-08 11:27:48 -0700 :SUCCESS: DONE: Execute plugin check for Patch Check Prereq.
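Long patchmgr transcripts like the one above are easier to triage with a quick scan for warnings and failures. A hedged sketch follows; the sample lines mimic the transcript's format, and on the box you would point the grep at the real patchmgr.stdout instead.

```shell
#!/bin/sh
# Two sample lines in the transcript's format, for illustration only.
cat > /tmp/patchmgr.stdout <<'EOF'
2017-05-08 11:27:39 -0700 :SUCCESS: DONE: Check prerequisites on all cells.
2017-05-08 11:27:41 -0700 :WARNING: ACTION REQUIRED: Manually check other cells.
EOF

# Surface anything that needs attention before continuing with the patch.
grep -E ':(WARNING|ERROR|FAILED)' /tmp/patchmgr.stdout
```

Here the scan prints only the ACTION REQUIRED warning, matching the one flagged in the transcript above.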
Restart ILOM on all cell nodes (optional)
dcli -l root -g <cell_group> "ipmitool bmc reset cold"
Note: Replace <cell_group> with the group file for the cell server you are patching, one of the following:
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example
dcli -l root -g cell_node_1 "ipmitool bmc reset cold"
Check the repair times for all mounted disk groups in the Oracle ASM instance and adjust if needed. Note: Set disk_repair_time to 8.5 hours.
sqlplus / as sysasm
select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
NAME VALUE
------------ ----------
DATA_DG 3.6h
DBFS_DG 3.6h
RECO_DG 3.6h
If the repair time is not 8.5 hours, note the current value and the diskgroup names, then adjust with the following statement (replace <diskgroup_name>): alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='8.5h';
Repeat the above statement for each diskgroup as below
alter diskgroup DATA_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='8.5h';
Also increase the rebalance power to 5
alter diskgroup RECO_DG rebalance power 5;
alter diskgroup DATA_DG rebalance power 5;
alter diskgroup DBFS_DG rebalance power 5;
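Rather than typing each ALTER statement by hand, the statements above can be generated into a script for review and then run through sqlplus. This is a hedged sketch; the diskgroup names are the ones from this post and the output path is illustrative.

```shell
#!/bin/sh
# Generate the attribute and rebalance statements for every diskgroup so
# they can be reviewed before running them as sysasm.
for dg in DATA_DG DBFS_DG RECO_DG; do
  echo "alter diskgroup $dg set attribute 'disk_repair_time'='8.5h';"
  echo "alter diskgroup $dg rebalance power 5;"
done > /tmp/pre_patch_asm.sql

cat /tmp/pre_patch_asm.sql
```

On the admin node the generated file could then be run as `sqlplus / as sysasm @/tmp/pre_patch_asm.sql`.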
Check uptime and reboot if needed
cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
dcli -l root -g cell_group "uptime"
[root@abcxyzadm01 ~]# dcli -l root -g cell_group "uptime"
xyzceladm01: 12:51:44 up 222 days, 17:17, 0 users, load average: 1.45, 1.33, 1.35
xyzceladm02: 12:51:44 up 222 days, 17:17, 0 users, load average: 1.64, 1.21, 1.25
xyzceladm03: 12:51:44 up 222 days, 17:18, 0 users, load average: 1.34, 1.45, 1.43
xyzceladm04: 12:51:44 up 222 days, 17:17, 0 users, load average: 0.98, 1.25, 1.35
xyzceladm05: 12:51:44 up 222 days, 17:17, 0 users, load average: 1.10, 1.33, 1.42
xyzceladm06: 12:51:44 up 222 days, 17:17, 0 users, load average: 0.88, 1.21, 1.34
xyzceladm07: 12:51:44 up 222 days, 17:17, 0 users, load average: 0.99, 1.21, 1.30
[root@abcxyzadm01 ~]#
Note: If a cell has been up more than 7 days, reboot each cell in a rolling fashion using the note below.
Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
Note: In our case we need to reboot the cell servers.
cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
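To decide which cells need the rolling reboot, the saved uptime output can be filtered for cells up longer than 7 days. A hedged sketch; the sample lines copy the dcli output format shown above, with illustrative hostnames.

```shell
#!/bin/sh
# Sample of "dcli -l root -g cell_group uptime" output (format as above).
cat > /tmp/uptime.txt <<'EOF'
xyzceladm01: 12:51:44 up 222 days, 17:17, 0 users, load average: 1.45, 1.33, 1.35
xyzceladm02: 12:51:44 up 3 days, 2:10, 0 users, load average: 0.50, 0.40, 0.30
EOF

# Print cells whose uptime exceeds 7 days; field 4 is the day count.
awk '/ up .* days?,/ { if ($4 + 0 > 7) print $1 }' /tmp/uptime.txt
```

Any cell printed by the filter should be rebooted per Doc ID 1188080.1 before patching.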
Step 4: Cleanup space from any previous runs
The -reset_force command is only used the first time the cells are patched to this release. It is not necessary for subsequent cell patching, even after rolling back the patch:
[root@abcxyzadm05 ~]# cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
./patchmgr -cells <cell_group> -reset_force
Note: Always use the -cleanup option before retrying a failed or halted run of the patchmgr utility. In our case we need the -cleanup option, not -reset_force:
[root@abcxyzadm05 ~]# cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
./patchmgr -cells <cell_group> -cleanup
Note: Replace <cell_group> with the group file for the cell server you are patching, one of the following:
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example : ./patchmgr -cells cell_node_1 -cleanup
Step 5: Run prerequisites check
=======================
#cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
./patchmgr -cells <cell_group> -patch_check_prereq -rolling
Note: Replace <cell_group> with the group file for the cell server you are patching, one of the following:
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example : ./patchmgr -cells cell_node_1 -patch_check_prereq -rolling
Step 6: Patch the cell nodes
#cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
dcli -l root -g <cell_group> imageinfo
nohup ./patchmgr -cells <cell_group> -patch -rolling &
Step 7: Check the Progress of the Image Upgrade
Monitor the patch progress
Monitor the ILOM console for each cell being patched. You may want to download the ilom-login.sh script from note 1616791.1 to assist in logging into the ILOMs.
cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
tail -f nohup.out
Step 8: Post Patch Space Cleanup
Cleanup space
./patchmgr -cells <cell_group> -cleanup
Step 9: Post Image Upgrade Validations
==============================
Post Checks
dcli -l root -g <cell_group> imageinfo -version
dcli -l root -g <cell_group> imageinfo -status
dcli -l root -g <cell_group> "uname -r"
dcli -l root -g <cell_group> cellcli -e list cell
dcli -l root -g <cell_group> /opt/oracle.cellos/CheckHWnFWProfile
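The post checks above can be collected into a single report file per run. A hedged sketch follows; GROUP and OUT are illustrative names, and errors are captured into the report rather than aborting, so the report always completes.

```shell
#!/bin/sh
# Run each post check across the patched cell group and keep one report file.
GROUP="${GROUP:-cell_node_1}"      # group file for the cells just patched
OUT="${OUT:-/tmp/postchecks.txt}"

: > "$OUT"                          # start a fresh report
for cmd in "imageinfo -version" "imageinfo -status" "uname -r" \
           "cellcli -e list cell"; do
  echo "== $cmd ==" >>"$OUT"
  # Tolerate unreachable nodes (or a missing dcli) so the report completes.
  dcli -l root -g "$GROUP" "$cmd" >>"$OUT" 2>&1 || true
done
```

Reviewing one file per run makes it easy to diff the post-check output across cell groups.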
Also verify the grid disk status as part of the post checks:
(a) Verify all grid disks have been successfully put online using the following command:
dcli -l root -g cell_group cellcli -e list griddisk attributes name, asmmodestatus
[root@abcxyzadm01 patch_PATCHNUMBER]# dcli -l root -g <cell_group> cellcli -e list griddisk attributes name, asmmodestatus
xyzcel07: DATA_DG_CD_00_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_01_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_02_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_03_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_04_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_05_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_06_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_07_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_08_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_09_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_10_xyzcel07 ONLINE
xyzcel07: DATA_DG_CD_11_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_02_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_03_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_04_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_05_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_06_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_07_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_08_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_09_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_10_xyzcel07 ONLINE
xyzcel07: DBFS_DG_CD_11_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_00_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_01_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_02_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_03_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_04_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_05_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_06_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_07_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_08_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_09_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_10_xyzcel07 ONLINE
xyzcel07: RECO_DG_CD_11_xyzcel07 ONLINE
[root@abcxyzadm01 patch_PATCHNUMBER]#
(b) Wait until asmmodestatus is ONLINE for all grid disks. Each disk will go to a 'SYNCING' state first, then 'ONLINE'. The following is an example of the output:
DATA_CD_00_xyzcel01 ONLINE
DATA_CD_01_xyzcel01 SYNCING
DATA_CD_02_xyzcel01 OFFLINE
DATA_CD_03_xyzcel01 OFFLINE
DATA_CD_04_xyzcel01 OFFLINE
DATA_CD_05_xyzcel01 OFFLINE
DATA_CD_06_xyzcel01 OFFLINE
DATA_CD_07_xyzcel01 OFFLINE
DATA_CD_08_xyzcel01 OFFLINE
DATA_CD_09_xyzcel01 OFFLINE
DATA_CD_10_xyzcel01 OFFLINE
DATA_CD_11_xyzcel01 OFFLINE
(c) Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.
( Please note: this operation uses Fast Mirror Resync operation - which does not trigger an ASM rebalance. The Resync operation restores only the extents that would have been written while the disk was offline.)
Note: In the above command, replace <cell_group> with the group file for the cell server you are patching, one of the following:
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
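The wait in step (b) can be scripted: count how many grid disks report anything other than ONLINE and loop until that count reaches zero. A hedged sketch; the helper is demonstrated against saved output here, while the commented loop shows how it would be used live on the admin node.

```shell
#!/bin/sh
# Count grid disks not yet ONLINE in saved "list griddisk" output.
pending() { grep -cv 'ONLINE$' "$1"; }

# Sample output in the format shown above (illustrative hostnames).
cat > /tmp/griddisk.txt <<'EOF'
xyzcel01: DATA_CD_00_xyzcel01 ONLINE
xyzcel01: DATA_CD_01_xyzcel01 SYNCING
xyzcel01: DATA_CD_02_xyzcel01 OFFLINE
EOF

pending /tmp/griddisk.txt   # prints 2: one SYNCING and one OFFLINE disk

# Live version (admin node only):
# while dcli -l root -g cell_node_1 \
#     "cellcli -e list griddisk attributes name, asmmodestatus" \
#     | grep -qv 'ONLINE$'; do sleep 60; done
```

Synchronization is complete, and the next cell can be patched, only once the pending count is zero.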
Step 10: Post-Execution Steps After Validation of the Image Upgrade
1. Change disk_repair_time back to original value
sqlplus / as sysasm
select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
NAME VALUE
------------------------------ ------------
DATA_DG 8.5h
DBFS_DG 8.5h
RECO_DG 8.5h
Reset the repair time to the original value if it was changed at the start of patching. Replace <diskgroup_name> in the following statement to adjust: alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='<original value>';
alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='3.6h';
Repeat the above statement for each diskgroup as below
alter diskgroup DATA_DG set attribute 'disk_repair_time'='3.6h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='3.6h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='3.6h';
Also set the rebalance power back to 2
alter diskgroup RECO_DG rebalance power 2;
alter diskgroup DATA_DG rebalance power 2;
alter diskgroup DBFS_DG rebalance power 2;
Step 11: Known Issues(In case Image Upgrade Fails)
Additional checks (if there were problems)
cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
cat patchmgr.stdout
cat the _wip_stdout file
ssh <cell-node>
cd /var/log/cellos
grep -i 'fail' validations.log
grep -i 'fail' vldrun*.log
cat validations.log
cat vldrun.upgrade_reimage_boot.log
cat vldrun.first_upgrade_boot.log
cat CheckHWnFWProfile.log
cat cell.bin.install.log
cat cellFirstboot.log
cat exachkcfg.log
cat patch.out.place.sh.log
cat install.sh.log
---------------------------------------------------------------------------------------------------
Rolling Back Successfully Patched Exadata Cells
(This section describes how to roll back successfully patched Exadata Cells. Cells with incomplete or failed patching cannot be rolled back.)
Do not run more than one instance of the patchmgr utility at a time in the deployment.
Check the prerequisites using the following command:
./patchmgr -cells cell_group -rollback_check_prereq [-rolling]
Perform the rollback using the following command:
./patchmgr -cells cell_group -rollback [-rolling]
You can learn about Exadata in detail from the book Expert Oracle Exadata.
==========================================================
Please check our other blogs for Exadata.