Exadata Half Rack Image Upgrade (Non-Rolling): Compute Node

Patching the Exadata machine is a regular activity for a DMA (Database Machine Administrator). There are two ways to patch an Exadata box: rolling and non-rolling. In this blog, we continue with part 3 of the Exadata (Half Rack) Image Upgrade (Non-Rolling), which covers the compute nodes.

Precheck: Exadata Image Upgrade

  1. Oracle recommends clearing all stateful alerts on all cell nodes. List the critical stateful alerts first:
    [root@abcxyzadm01 ~]# dcli -g cell_group -l root "cellcli -e list alerthistory attributes name,beginTime,alertShortName,alertDescription,severity where alerttype=stateful and severity=critical"

  2. Review the latest Exachk report and fix any hardware failure it reports before you proceed with patching.
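
On a half rack the alert listing from step 1 can get long, so it helps to summarise it per cell. A minimal sketch, assuming dcli prefixes every output line with "<cellname>: " and that cellcli prints one line per matching alert (nothing when there are none); the function name summarize_cell_alerts is hypothetical:

```shell
# Sketch (hypothetical helper): count critical stateful alerts per cell.
# Assumptions: dcli prefixes each line with "<cellname>: " and cellcli
# prints one line per matching alert (no line when there are none).
summarize_cell_alerts() {
  awk -F': ' 'NF >= 2 && $2 != "" { count[$1]++ }
    END {
      n = 0
      for (cell in count) { printf "%s %d\n", cell, count[cell]; n++ }
      if (n == 0) print "NO_CRITICAL_STATEFUL_ALERTS"
    }' | LC_ALL=C sort
}

# Usage: pipe the dcli/cellcli command from step 1 into the function:
# dcli -g cell_group -l root "cellcli -e list alerthistory ... where alerttype=stateful and severity=critical" | summarize_cell_alerts
```

Any cell that appears in the output still has critical stateful alerts to deal with before patching.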

Compute Node / DB Node / YUM Patch Plan (Non-Rolling)

  1. Check image version

    dcli -l root -g dbs_group imageinfo -version
    dcli -l root -g dbs_group imageinfo -status
    dcli -l root -g dbs_group uname -r
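
Before patching, all DB nodes would normally report the same image version. A sketch that checks this, assuming dcli prints one "<host>: <version>" line per node; check_image_versions_consistent is a hypothetical helper name:

```shell
# Sketch (hypothetical helper): check that every DB node reports the same
# image version. Assumption: input is dcli output of "imageinfo -version",
# one "<host>: <version>" line per node.
# Exit status: 0 = consistent, 1 = mismatch, 2 = no data.
check_image_versions_consistent() {
  awk -F': ' '
    NF >= 2 {
      gsub(/[ \t\r]+/, "", $2)              # drop stray whitespace
      if ($2 != "") hosts[$2] = hosts[$2] " " $1
    }
    END {
      n = 0
      for (v in hosts) { n++; last = v }
      if (n == 0) { print "NO_DATA"; exit 2 }
      if (n == 1) { print "CONSISTENT " last; exit 0 }
      print "MISMATCH"
      for (v in hosts) print v ":" hosts[v]   # version: list of hosts
      exit 1
    }'
}

# Usage: dcli -l root -g dbs_group imageinfo -version | check_image_versions_consistent
```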

  2. Verify dbnodeupdate script version 

    Download the latest version of the dbnodeupdate script from patch 21634633.
    The patch is delivered as p21634633_122110_Linux-x86-64.zip (dbserver.patch.zip), which contains dbnodeupdate.zip and patchmgr for dbnodeupdate orchestration.

    cd /u01/exa_img_upg/YUM
    unzip -o p21634633_122110_Linux-x86-64.zip
    Check the version; it should be at least 5.151022:
    ./dbnodeupdate.sh -V
    ver=$(./dbnodeupdate.sh -V | awk '{print $3}'); if (( $(echo "$ver < 5.151022" | bc -l) )); then echo -e "\nFAIL: dbnodeupdate version too low. Update before proceeding.\n"; elif (( $(echo "$ver > 5.151022" | bc -l) )); then echo -e "\nPASS: dbnodeupdate version OK\n"; else echo -e "\nWARN: dbnodeupdate minimum version ($ver) detected. Check if there is a newer version before proceeding.\n"; fi

    The dbnodeupdate script is updated frequently (sometimes daily). If your copy is not current, download the updated version.
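
The version one-liner above depends on bc, which may not be installed on every node. A hedged alternative that does the same FAIL/WARN/PASS comparison in awk; the function name check_dbnodeupdate_version is hypothetical, and the default minimum of 5.151022 is taken from the one-liner:

```shell
# Sketch (hypothetical helper): same FAIL/WARN/PASS logic as the bc one-liner,
# using awk only. Like the original, it compares versions such as 5.151022
# as decimal numbers.
# check_dbnodeupdate_version <version> [minimum]   (minimum defaults to 5.151022)
check_dbnodeupdate_version() {
  local ver="$1" min="${2:-5.151022}"
  awk -v v="$ver" -v m="$min" 'BEGIN {
    if (v + 0 < m + 0)      print "FAIL: dbnodeupdate version too low. Update before proceeding."
    else if (v + 0 > m + 0) print "PASS: dbnodeupdate version OK"
    else                    print "WARN: minimum version detected. Check for a newer version."
  }'
}

# Usage: check_dbnodeupdate_version "$(./dbnodeupdate.sh -V | awk '{print $3}')"
```

An empty or unparsable version string evaluates to 0 in awk, so it reports FAIL rather than passing silently.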

  3. Check which databases are running before stopping CRS

    /u01/app/19c/grid/bin/crsctl status resource -t -w "TYPE = ora.database.type"
    ps -ef | grep pmon_ | grep -v grep
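
It is handy to save the list of running instances so the post checks can confirm they all came back. A minimal sketch, assuming the usual "<prefix>_pmon_<SID>" PMON process naming; list_pmon_instances and the output file name in the usage comment are hypothetical:

```shell
# Sketch (hypothetical helper): reduce "ps -ef" output to instance names by
# stripping the "<prefix>_pmon_" part of each PMON process (ora_pmon_,
# asm_pmon_, ...). Lines whose last field lacks "_pmon_" (such as a grep)
# are ignored.
list_pmon_instances() {
  awk '$NF ~ /_pmon_/ { sub(/^.*_pmon_/, "", $NF); print $NF }' | LC_ALL=C sort
}

# Usage (file name is only an example):
# ps -ef | list_pmon_instances > /u01/exa_img_upg/instances_before_patch.txt
```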

  4. Stop the CRS (Non-Rolling) 

    Execute the following from one node:
    dcli -l root -g dbs_group /u01/app/19c/grid/bin/crsctl disable crs
    /u01/app/19c/grid/bin/crsctl stop cluster -all
    dcli -l root -g dbs_group /u01/app/19c/grid/bin/crsctl stop crs
    dcli -l root -g dbs_group '/u01/app/19c/grid/bin/crsctl check crs | grep online | wc -l | while read retval; do if [[ $retval -eq 0 ]]; then echo CRS Stopped; elif [[ $retval -eq 4 ]]; then echo CRS Running; else echo CRS Not Ready; fi; done;'
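
The dcli one-liner above classifies CRS by counting the lines of "crsctl check crs" output that contain "online". The same logic as a reusable function, as a sketch; classify_crs_state is a hypothetical name and it reads the crsctl output on stdin:

```shell
# Sketch (hypothetical helper): same classification as the dcli one-liner:
# count lines of "crsctl check crs" output containing "online"
# (4 = running, 0 = stopped, anything else = not ready).
classify_crs_state() {
  local online
  online=$(grep -c 'online' || true)   # grep -c prints 0 (and exits 1) on no match
  case "$online" in
    0) echo "CRS Stopped" ;;
    4) echo "CRS Running" ;;
    *) echo "CRS Not Ready" ;;
  esac
}

# Usage on one node: /u01/app/19c/grid/bin/crsctl check crs | classify_crs_state
```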

  5. Reboot servers and reset ILOM 

    dcli -l root -g dbs_group uptime
    If uptime is more than 7 days, reboot the servers:
    dcli -l root -g dbs_group reboot
    Reset the ILOMs:
    dcli -l root -g dbs_group 'ipmitool bmc reset cold'
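
The "more than 7 days" rule can be evaluated for every node at once. This sketch assumes dcli output of "cat /proc/uptime" in the form "<host>: <uptime-seconds> <idle-seconds>"; nodes_needing_reboot is a hypothetical helper:

```shell
# Sketch (hypothetical helper): list nodes whose uptime exceeds a threshold
# in days (default 7). Assumption: input is dcli output of "cat /proc/uptime",
# i.e. lines of "<host>: <uptime-seconds> <idle-seconds>".
nodes_needing_reboot() {
  local max_days="${1:-7}"
  awk -v max="$max_days" '{
    host = $1
    sub(/:$/, "", host)                 # strip the ":" that dcli appends
    days = $2 / 86400
    if (days > max + 0) printf "%s %.1f\n", host, days
  }'
}

# Usage: dcli -l root -g dbs_group cat /proc/uptime | nodes_needing_reboot 7
```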

  6. Unmount NFS partitions 

    dcli -l root -g dbs_group 'umount -a -t nfs -f -l'
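
To confirm that no NFS file systems are still mounted on a node, the mount table can be scanned for nfs or nfs4 entries. A sketch that reads /proc/mounts-style lines; remaining_nfs_mounts is a hypothetical name:

```shell
# Sketch (hypothetical helper): print mount points of any nfs/nfs4 file
# systems still mounted. Reads /proc/mounts-style lines
# ("<device> <mountpoint> <fstype> ...") on stdin; returns 0 when none
# remain and 1 otherwise.
remaining_nfs_mounts() {
  awk '$3 ~ /^nfs4?$/ { print $2; found = 1 } END { exit (found ? 1 : 0) }'
}

# Usage on each node: remaining_nfs_mounts < /proc/mounts && echo "No NFS mounts left"
```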

  7. Run precheck

    cd /u01/exa_img_upg/YUM
    ./dbnodeupdate.sh -u -l /u01/exa_img_upg/YUM/pXXXXXXXX_Linux-x86-64.zip -t XXXXX -g -v

  8. Perform backup and upgrade 

    Review the known issues for the target image release before executing dbnodeupdate.sh.
    ./dbnodeupdate.sh -u -l /u01/exa_img_upg/YUM/pXXXXX_Linux-x86-64.zip -t XXXX -q

  9. Monitor the reboot
    Monitor the reboot of each node by logging in to its ILOM console.

  10. After reboot completes 

    Before running the completion step, run the CheckHWnFWProfile script and make sure it passes. If it does not, shut the system down and power-cycle it from the ILOM (stop /SYS, wait 5 minutes, start /SYS).
    /opt/oracle.cellos/CheckHWnFWProfile
    cd /u01/exa_img_upg/YUM
    umount -a -t nfs -f -l
    ./dbnodeupdate.sh -t XXXXXX -c -g
    mount -a

  11. Verify FUSE RPMs are installed
    yum list installed | grep fuse
    There should be three fuse RPMs. If not, check the MOS note "Fuse packages removed as part of dbnodeupdate prereq check (Doc ID 2066488.1)".
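
The fuse check can be scripted so that it fails loudly. A sketch assuming the standard "yum list installed" layout with the package name in the first column; check_fuse_rpms is hypothetical, and it only checks the count, not which three packages are present:

```shell
# Sketch (hypothetical helper): count installed packages whose name starts
# with "fuse" in "yum list installed" output (package name in column 1) and
# compare with the expected count (3, per the step above).
check_fuse_rpms() {
  local expected="${1:-3}" count
  count=$(awk '$1 ~ /^fuse/ { n++ } END { print n + 0 }')
  if [ "$count" -eq "$expected" ]; then
    echo "OK: $count fuse packages installed"
  else
    echo "CHECK: found $count fuse packages, expected $expected (see Doc ID 2066488.1)"
    return 1
  fi
}

# Usage: yum list installed | check_fuse_rpms 3
```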

  12. Check version and status 

    dcli -l root -g dbs_group imageinfo -version
    dcli -l root -g dbs_group imageinfo -status
    dcli -l root -g dbs_group uname -r
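
After the upgrade, each node should report the target version that was passed to dbnodeupdate.sh with -t. A sketch, again assuming dcli's "<host>: <version>" output; nodes_not_on_target is a hypothetical name. It prints any node that is not on the target and returns non-zero:

```shell
# Sketch (hypothetical helper): print every node whose image version differs
# from the target passed to dbnodeupdate.sh with -t.
# Assumption: input is dcli output of "imageinfo -version" ("<host>: <version>").
# Returns 0 when all nodes are on the target, 1 otherwise.
nodes_not_on_target() {
  local target="$1"
  awk -F': ' -v t="$target" '
    NF >= 2 {
      gsub(/[ \t\r]+/, "", $2)
      if ($2 != t) { print $1 " " $2; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }'
}

# Usage: dcli -l root -g dbs_group imageinfo -version | nodes_not_on_target <target-version>
```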

  13. Enable CRS 

    dcli -l root -g dbs_group /u01/app/19c/grid/bin/crsctl enable crs
    /u01/app/19c/grid/bin/crsctl check crs
    dcli -l root -g dbs_group '/u01/app/19c/grid/bin/crsctl check crs | grep online | wc -l | while read retval; do if [[ $retval -eq 0 ]]; then echo CRS Stopped; elif [[ $retval -eq 4 ]]; then echo CRS Running; else echo CRS Not Ready; fi; done;'
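
The clusterware stack can take several minutes to come up, so instead of re-running the check by hand it can be polled. A generic retry sketch; wait_until, the crs_running helper and the attempt/interval values in the usage comment are illustrative assumptions:

```shell
# Sketch (hypothetical helper): run a check command repeatedly until it
# succeeds or the attempt budget runs out.
# Usage: wait_until <attempts> <sleep-seconds> <command...>
wait_until() {
  local attempts="$1" interval="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    if (( i < attempts )); then
      sleep "$interval"
    fi
  done
  echo "not ready after $attempts attempt(s)"
  return 1
}

# Example check (hypothetical helper): CRS counts as up when all four
# "crsctl check crs" lines report online.
# crs_running() { [ "$(/u01/app/19c/grid/bin/crsctl check crs | grep -c online)" -eq 4 ]; }
# wait_until 30 20 crs_running
```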

  14. Post checks

    /u01/app/19c/grid/bin/crsctl status resource -t -w "TYPE = ora.database.type"
    The following command checks whether APM is disabled on all nodes (a value of 0 means disabled):
    dcli -l root -g dbs_group 'cat /sys/module/ib_sdp/parameters/sdp_apm_enable'
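
With several nodes it is easier to list only the nodes where APM is not disabled. A sketch assuming dcli prints "<host>: <value>"; any value other than 0, including an error message if the parameter file is missing, is reported. The helper name nodes_with_apm_enabled is hypothetical:

```shell
# Sketch (hypothetical helper): flag nodes where sdp_apm_enable is not 0
# (0 = APM disabled). Assumption: input is dcli output of
# "cat /sys/module/ib_sdp/parameters/sdp_apm_enable" ("<host>: <value>").
# Returns 0 when every node reports 0, 1 otherwise.
nodes_with_apm_enabled() {
  awk -F': ' '
    NF >= 2 {
      gsub(/[ \t\r]+/, "", $2)
      if ($2 != "0") { print $1 " " $2; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }'
}

# Usage: dcli -l root -g dbs_group 'cat /sys/module/ib_sdp/parameters/sdp_apm_enable' | nodes_with_apm_enabled
```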

  15. Additional checks (if there were problems)

    ssh <database-node>
    cd /var/log/cellos/
    cat dbnodeupdate.log
    cat dbserver_backup.sh.log
    cat CheckHWnFWProfile.log
    cat exadata.computenode.post.log
    cat cellFirstboot.log
    cat exachkcfg.log
    cat vldrun.each_boot.log
    cat validations.log
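
Rather than reading every log in full, the logs above can first be scanned for error lines. A sketch; the helper name scan_cellos_logs and the "error|fail" pattern are assumptions, so a clean scan is a starting point rather than proof that the update succeeded:

```shell
# Sketch (hypothetical helper): for each log listed above, report whether it
# exists and show the last lines mentioning "error" or "fail" (case-insensitive).
# scan_cellos_logs [log-dir] [lines-per-log]
scan_cellos_logs() {
  local dir="${1:-/var/log/cellos}" n="${2:-5}" f
  for f in dbnodeupdate.log dbserver_backup.sh.log CheckHWnFWProfile.log \
           exadata.computenode.post.log cellFirstboot.log exachkcfg.log \
           vldrun.each_boot.log validations.log; do
    if [ -f "$dir/$f" ]; then
      echo "== $f"
      grep -i -E 'error|fail' "$dir/$f" | tail -n "$n"
    else
      echo "== $f (missing)"
    fi
  done
}

# Usage on the affected node: scan_cellos_logs /var/log/cellos 10
```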

Rollback Steps

  1. Roll back the update with the dbnodeupdate.sh utility:
    ./dbnodeupdate.sh -r

  2. Reboot the server using the reboot command.
    # reboot

  3. Run the dbnodeupdate.sh utility in completion mode to finish the post-patching steps.
    As with regular or one-time updates, switching OS binaries under the same Oracle Home requires relinking the database kernel, so the completion step must be performed.

    ./dbnodeupdate.sh -c

See the related post in this series for the Switch Firmware upgrade.

See the related post in this series for the Cell node image upgrade.

To learn about Exadata in more detail, see the book Expert Oracle Exadata.

 ==========================================================

Please check our other blogs on Exadata.
