Exadata (Half Rack) Image Upgrade (Rolling) Part 1: Cell Server

As a DMA, patching the Exadata machine is a regular activity. There are two ways of patching an Exadata box: Rolling and Non-Rolling. In this blog, we will start with Part 1 of the Exadata (Half Rack) Image Upgrade (Rolling), covering the cell servers.

Step 1: Backup Current Configurations

echo "Executing prechecks specific to cell nodes..."

This is a one-time precheck; the configuration is collected for all 7 cell nodes listed in the cell_group file.
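Before running the commands below, the output directory and the cell_group file (one cell hostname per line) need to exist on the driving node. A minimal setup sketch, assuming the paths used in this post:

mkdir -p /u01/exa_img_upg/prechecks
# cell_group should list all 7 cell hostnames, one per line
cat cell_group
# Quick sanity check that root ssh equivalence to all cells works
dcli -l root -g cell_group hostname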

#cd /root

echo ""  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo "************************" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo "Cell specific prechecks" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo "************************" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo -e "\n# Need at least 1.5gb + size of ISO file (approx 3gb total for Jan-July releases) space on / partition of cells to do cell update"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

dcli -l root -g cell_group df -h / >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo -e "\n# Check all cells are up"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

dcli -l root -g cell_group cellcli -e list cell >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo -e "\n# Check cell network configuration"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

dcli -l root -g cell_group "/opt/oracle.cellos/ipconf -verify" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo -e "\n# Validate cell disks for valid physicalInsertTime (should be no output)"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

dcli -l root -g cell_group cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null' >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo -e "\n# Check for WriteBack Flash Cache"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

dcli -l root -g cell_group cellcli -e "list cell attributes flashcachemode" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

echo -e "\n# Check for Flash Cache Compression"  >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt

dcli -l root -g cell_group cellcli -e "list cell attributes flashCacheCompress" >>/u01/exa_img_upg/prechecks/cell_node_prechecks.txt


echo "Executing prechecks specific to compute nodes..."

This is a one-time precheck; the configuration is collected for all 4 database nodes listed in the dbs_group file.

#cd /root

echo ""  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo "************************" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo "Compute specific prechecks" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo "************************" >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# need 3-5 gb on / partition"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group df -h /  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# need ~40 mb on /boot partition"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group df -h /boot  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# check freespace on /u01 partition"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group df -h /u01  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# Make sure not snaps are still active"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group lvs >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# Need at least 1.5gb gb free PE"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group vgdisplay >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt


echo -e "\n# Mounted filesystems"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group mount  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# Contents of /etc/fstab"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group cat /etc/fstab  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

echo -e "\n# Contents of /etc/exports"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group cat /etc/exports  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

#

echo "Done."

echo -e "\n# Contents of /etc/exports"  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

dcli -l root -g dbs_group cat /etc/exports  >>/u01/exa_img_upg/prechecks/compute_node_prechecks.txt

#

echo "Done."

Step 2: Pre-Validation - Check whether any critical, stateful alerts are left over on the cell servers

cd /root


[root@abcxyzadm01 ~]# dcli -g  cell_group -l root "cellcli -e list alerthistory attributes name,beginTime,alertShortName,alertDescription,severity where alerttype=stateful and severity=critical"

We can drop the entire alert history:
CellCLI> drop alerthistory all

OR
====
To drop an individual alert, use the commands below.
Example :
CellCLI> list alerthistory <6_1>
CellCLI> list alerthistory <6_1>  detail

CellCLI> drop alerthistory <6_1>

Only drop alerts that have already been taken care of.
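To clear reviewed alerts on every cell in one pass, the drop can also be driven through dcli from the compute node. A sketch, to be run only after all alerts have been reviewed:

dcli -g cell_group -l root "cellcli -e drop alerthistory all"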

Step 3: Precheck

[root@abcxyzadm01 patch_PATCHNUMBER]# ./patchmgr -cells <cell_group> -patch_check_prereq -rolling

-rw-r--r-- 1 root   root             10 May  4 15:53 cell_node_1
-rw-r--r-- 1 root   root             11 May  4 16:10 cell_node_2
-rw-r--r-- 1 root   root             10 May  4 16:11 cell_node_3
-rw-r--r-- 1 root   root             10 May  4 16:11 cell_node_4
-rw-r--r-- 1 root   root             11 May  4 16:11 cell_node_5
-rw-r--r-- 1 root   root             10 May  4 16:11 cell_node_6
-rw-r--r-- 1 root   root             10 May  4 16:12 cell_node_7
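Each cell_node_N file above holds a single cell hostname, so patchmgr can be pointed at one cell at a time. If they do not exist yet, they can be generated from the main cell_group file with a small loop (a sketch, assuming cell_group lists one hostname per line):

i=1
while read cell; do
  echo "$cell" > cell_node_$i
  i=$((i+1))
done < cell_group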
Note: Change the <cell_group> name to the one for the cell you are patching.
Example: For the first cell node:
[root@abcxyzadm01 patch_PATCHNUMBER]# ./patchmgr -cells cell_node_1 -patch_check_prereq -rolling
2017-05-08 11:24:52 -0700 [INFO] Disabling /var/log/cellos cleanup on this node for the duration of the patchmgr session.

2017-05-08 11:25:56 -0700        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell ...
2017-05-08 11:25:56 -0700        :SUCCESS: DONE: Check cells have ssh equivalence for root user.
2017-05-08 11:26:00 -0700        :Working: DO: Initialize files, check space and state of cell services. Up to 1 minute ...
2017-05-08 11:26:44 -0700        :SUCCESS: DONE: Initialize files, check space and state of cell services.
2017-05-08 11:26:44 -0700        :Working: DO: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction. Up to 40 minutes ...
2017-05-08 11:26:59 -0700 Wait correction of degraded md11 due to md partner size mismatch. Up to 30 minutes.


2017-05-08 11:27:00 -0700        :SUCCESS: DONE: Copy, extract prerequisite check archive to cells. If required start md11 mismatched partner size correction.
2017-05-08 11:27:00 -0700        :Working: DO: Check prerequisites on all cells. Up to 2 minutes ...
2017-05-08 11:27:39 -0700        :SUCCESS: DONE: Check prerequisites on all cells.
2017-05-08 11:27:39 -0700        :Working: DO: Execute plugin check for Patch Check Prereq ...
2017-05-08 11:27:39 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 22909764 v1.0. Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:39 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 17854520 v1.3. Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:41 -0700 :WARNING: ACTION REQUIRED: Cells to be upgraded pass version check, however other cells not being upgraded may be at version 11.2.3.1.x or 11.2.3.2.x, exposing the system to bug 17854520.  Manually check other cells for version 11.2.3.1.x or 11.2.3.2.x.
2017-05-08 11:27:41 -0700 :INFO: Checking database homes for remote db nodes with oracle-user ssh equivalence to the local system.
2017-05-08 11:27:41 -0700 :INFO: Database homes that exist only on remote nodes must be checked manually.
2017-05-08 11:27:47 -0700 :SUCCESS: Patchmgr plugin complete: Prereq check passed - no exposure to bug 17854520
2017-05-08 11:27:47 -0700 :INFO: Patchmgr plugin start: Prereq check for exposure to bug 22468216 v1.0. Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:48 -0700 :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 22468216
2017-05-08 11:27:48 -0700        :INFO   : Patchmgr plugin start: Prereq check for exposure to bug 24625612 v1.0.
2017-05-08 11:27:48 -0700        :INFO   : Details in logfile /u01/exa_img_upg/CELL/patch_PATCHNUMBER/patchmgr.stdout.
2017-05-08 11:27:48 -0700        :SUCCESS: Patchmgr plugin complete: Prereq check passed for the bug 24625612
2017-05-08 11:27:48 -0700        :SUCCESS: DONE: Execute plugin check for Patch Check Prereq.

Restart ILOM on all cell nodes (optional)
dcli -l root -g <cell_group> "ipmitool bmc reset cold"
Note: Change <cell_group> to the group file for the cell server you are patching, as below:
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example
dcli -l root -g cell_node_1 "ipmitool bmc reset cold"
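The BMC reset takes a few minutes to complete. Before proceeding, you can confirm the ILOM is responding again; a sketch using the standard ipmitool mc info query:

sleep 300   # give the BMC time to come back
dcli -l root -g cell_node_1 "ipmitool mc info"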


Check the repair times for all mounted disk groups in the Oracle ASM instance and adjust if needed. Note: Set disk_repair_time to 8.5 hours.


sqlplus / as sysasm
select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
NAME         VALUE
------------ ----------
DATA_DG      3.6h
DBFS_DG      3.6h
RECO_DG      3.6h
If the repair time is not 8.5 hours, note the current value and the diskgroup names, then replace <diskgroup_name> in the following statement to adjust:
alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='8.5h';
Repeat the above statement for each diskgroup as below
alter diskgroup DATA_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='8.5h';


Also increase the rebalance power to 5

alter diskgroup RECO_DG rebalance power 5;
alter diskgroup DATA_DG rebalance power 5;
alter diskgroup DBFS_DG rebalance power 5;
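Both changes can also be applied and re-verified from a single sqlplus session on the ASM instance. A sketch, assuming the environment is set for the local ASM instance and the diskgroup names from this example:

sqlplus -S / as sysasm <<'EOF'
alter diskgroup DATA_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='8.5h';
alter diskgroup DATA_DG rebalance power 5;
alter diskgroup DBFS_DG rebalance power 5;
alter diskgroup RECO_DG rebalance power 5;
select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a
 where dg.group_number = a.group_number and a.name = 'disk_repair_time';
EOF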

Check uptime and reboot if needed
                                                         
cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
dcli -l root -g cell_group "uptime"

[root@abcxyzadm01 ~]# dcli -l root -g cell_group "uptime"
xyzceladm01: 12:51:44 up 222 days, 17:17,  0 users,  load average: 1.45, 1.33, 1.35
xyzceladm02: 12:51:44 up 222 days, 17:17,  0 users,  load average: 1.64, 1.21, 1.25
xyzceladm03: 12:51:44 up 222 days, 17:18,  0 users,  load average: 1.34, 1.45, 1.43
xyzceladm04: 12:51:44 up 222 days, 17:17,  0 users,  load average: 0.98, 1.25, 1.35
xyzceladm05: 12:51:44 up 222 days, 17:17,  0 users,  load average: 1.10, 1.33, 1.42
xyzceladm06: 12:51:44 up 222 days, 17:17,  0 users,  load average: 0.88, 1.21, 1.34
xyzceladm07: 12:51:44 up 222 days, 17:17,  0 users,  load average: 0.99, 1.21, 1.30
[root@abcxyzadm01 ~]#
Note: If the cells have been up for more than 7 days, reboot each cell in a rolling fashion using the note below.
Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
Note: In our case we need to reboot the cell server
cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
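For reference, the rolling reboot from that note boils down to inactivating the grid disks on one cell, rebooting it, and reactivating the disks. A sketch of the commands involved (run on one cell at a time; confirm asmdeactivationoutcome is 'Yes' for all disks before taking them offline):

# On the cell to be rebooted: verify the disks can be taken offline safely
cellcli -e list griddisk attributes name,asmdeactivationoutcome
# Inactivate all grid disks, then reboot
cellcli -e alter griddisk all inactive
shutdown -r now
# After the cell is back up, reactivate and wait for ONLINE
cellcli -e alter griddisk all active
cellcli -e list griddisk attributes name,asmmodestatus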


Step 4: Cleanup space from any previous runs

The -reset_force command is only used the first time the cells are patched to this release.
It is not necessary for subsequent cell patching, even after rolling back the patch.
In our case we need the -cleanup option, not the -reset_force option.
[root@abcxyzadm05 ~]#cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER

 ./patchmgr -cells <cell_group> -reset_force

Note : Always use the -cleanup option before retrying a failed or halted run of the patchmgr utility.
[root@abcxyzadm05 ~]#cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
  ./patchmgr -cells <cell_group> -cleanup
Note: Change <cell_group> to the group file for the cell you are patching, as below:
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example : ./patchmgr -cells cell_node_1 -cleanup

Step 5: Run prerequisites check
=======================
 #cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
 ./patchmgr -cells <cell_group> -patch_check_prereq -rolling

Note: Change <cell_group> to the group file for the cell you are patching, as below:

cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7
Example : ./patchmgr -cells cell_node_1 -patch_check_prereq -rolling

Step 6: Patch the cell nodes

 #cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
dcli -l root -g <cell_group> imageinfo

 nohup ./patchmgr -cells <cell_group> -patch -rolling &

Step 7: Check the Progress of the Image Upgrade

Monitor the patch progress.
Monitor the ILOM console for each cell being patched. You may want to download the ilom-login.sh script from note 1616791.1 to assist in logging into the ILOMs.
 cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER
 tail -f nohup.out
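Besides tailing nohup.out, the image status on the cell being patched can be polled from the driving node. A simple sketch (cell_node_1 as the example group):

while true; do
  dcli -l root -g cell_node_1 "imageinfo -status; uptime"
  sleep 300
done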

Step 8: Post Patch Space Cleanup

Cleanup space
./patchmgr -cells <cell_group> -cleanup

Step 9: Post Image Upgrade Validations

Post checks:
 dcli -l root -g <cell_group> imageinfo -version
 dcli -l root -g <cell_group> imageinfo -status
 dcli -l root -g <cell_group> "uname -r"
 dcli -l root -g <cell_group> cellcli -e list cell
 dcli -l root -g <cell_group> /opt/oracle.cellos/CheckHWnFWProfile
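Once all cells have been patched, a quick way to confirm they ended up on the same image version is to de-duplicate the dcli output. A sketch, assuming dcli's usual 'hostname: value' output format:

dcli -l root -g cell_group imageinfo -version | awk '{print $2}' | sort -u
# A single line of output means every cell reports the same version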
Also, as part of the post checks, verify the grid disk status:
(a) Verify all grid disks have been successfully put online using the following command:
dcli -l root -g cell_group cellcli -e  list griddisk attributes name, asmmodestatus
[root@abcxyzadm01 patch_PATCHNUMBER]# dcli -l root -g <cell_group> cellcli -e  list griddisk attributes name, asmmodestatus
xyzcel07: DATA_DG_CD_00_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_01_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_02_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_03_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_04_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_05_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_06_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_07_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_08_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_09_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_10_xyzcel07    ONLINE
xyzcel07: DATA_DG_CD_11_xyzcel07    ONLINE
xyzcel07: DBFS_DG_CD_02_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_03_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_04_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_05_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_06_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_07_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_08_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_09_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_10_xyzcel07       ONLINE
xyzcel07: DBFS_DG_CD_11_xyzcel07       ONLINE
xyzcel07: RECO_DG_CD_00_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_01_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_02_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_03_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_04_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_05_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_06_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_07_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_08_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_09_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_10_xyzcel07    ONLINE
xyzcel07: RECO_DG_CD_11_xyzcel07    ONLINE
[root@abcxyzadm01 patch_PATCHNUMBER]#


(b) Wait until asmmodestatus is ONLINE for all grid disks. Each disk will go to a 'SYNCING' state first then 'ONLINE'. The following is an example of the output:
DATA_CD_00_xyzcel01 ONLINE
DATA_CD_01_xyzcel01 SYNCING
DATA_CD_02_xyzcel01 OFFLINE
DATA_CD_03_xyzcel01 OFFLINE
DATA_CD_04_xyzcel01 OFFLINE
DATA_CD_05_xyzcel01 OFFLINE
DATA_CD_06_xyzcel01 OFFLINE
DATA_CD_07_xyzcel01 OFFLINE
DATA_CD_08_xyzcel01 OFFLINE
DATA_CD_09_xyzcel01 OFFLINE
DATA_CD_10_xyzcel01 OFFLINE
DATA_CD_11_xyzcel01 OFFLINE
(c) Oracle ASM synchronization is only complete when all grid disks show asmmodestatus=ONLINE.
(Please note: this operation uses the Fast Mirror Resync operation, which does not trigger an ASM rebalance. The resync restores only the extents that were written while the disk was offline.)
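This wait can be scripted from the driving node. A sketch that polls until no grid disk on the patched cell reports anything other than ONLINE:

while dcli -l root -g cell_node_1 "cellcli -e list griddisk attributes name,asmmodestatus" | grep -qv ONLINE
do
  sleep 60
done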
Note: In the above command, change <cell_group> to the group file for the cell server you are patching, as below:
cell_node_1
cell_node_2
cell_node_3
cell_node_4
cell_node_5
cell_node_6
cell_node_7

Step 10: Post-Execution Steps After Validating the Image Upgrade


1. Change disk_repair_time back to the original value.


sqlplus / as sysasm
select dg.name,a.value from v$asm_diskgroup dg, v$asm_attribute a where dg.group_number=a.group_number and a.name='disk_repair_time';
NAME         VALUE
------------ ----------
DATA_DG      8.5h
DBFS_DG      8.5h
RECO_DG      8.5h
Reset the repair time to the original value if it was changed at the start of patching. Replace <diskgroup_name> and the value in the following statement to adjust:
alter diskgroup <diskgroup_name> set attribute 'disk_repair_time'='3.6h';
Repeat the above statement for each diskgroup as below
alter diskgroup DATA_DG set attribute 'disk_repair_time'='3.6h';
alter diskgroup DBFS_DG set attribute 'disk_repair_time'='3.6h';
alter diskgroup RECO_DG set attribute 'disk_repair_time'='3.6h';


Also set the rebalance power back to 2:

alter diskgroup RECO_DG rebalance power 2;
alter diskgroup DATA_DG rebalance power 2;
alter diskgroup DBFS_DG rebalance power 2;

Step 11: Known Issues (In Case the Image Upgrade Fails)

 Additional checks (if there were problems)

 cd /u01/exa_img_upg/CELL/patch_PATCHNUMBER

 cat  patchmgr.stdout

 cat the _wip_stdout file for the session

 ssh <cell-node>

 cd /var/log/cellos

 grep -i 'fail' validations.log

 grep -i 'fail' vldrun*.log

 cat validations.log

 cat vldrun.upgrade_reimage_boot.log

 cat vldrun.first_upgrade_boot.log

 cat CheckHWnFWProfile.log

 cat cell.bin.install.log

 cat cellFirstboot.log

 cat exachkcfg.log

 cat patch.out.place.sh.log

 cat install.sh.log
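These checks can be batched from the driving node. A sketch that lists, per cell, which of the common cellos logs mention a failure:

for c in $(cat cell_group); do
  echo "== $c =="
  ssh "$c" "grep -il fail /var/log/cellos/validations.log /var/log/cellos/vldrun*.log 2>/dev/null"
done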

---------------------------------------------------------------------------------------------------
 Rolling Back Successfully Patched Exadata Cells
 (This section describes how to roll back successfully patched Exadata Cells. Cells with incomplete or failed patching cannot be rolled back.)
 
 Do not run more than one instance of the patchmgr utility at a time in the deployment.
 
 Check the prerequisites using the following command:
 
 ./patchmgr -cells cell_group -rollback_check_prereq [-rolling]
 
 Perform the rollback using the following command:
 
 ./patchmgr -cells cell_group -rollback [-rolling]

See the next part of this series for the Switch firmware upgrade.

 You can learn about Exadata in detail from the book Expert Oracle Exadata.

 ==========================================================

Please check our other blogs for Exadata
