Sunday, April 12, 2020

Oracle Non-RAC DB StatefulSet HA 1 Command Failover Test in OKE

Requirement:

OKE has a very powerful Block Volume management built-in. It can find, detach, reattach block storage volumes among different worker nodes seamlessly. Here is what we are going to test.
We create an Oracle DB statefulset on OKE. Imagine we have hardware or OS issue on the worker node and test HA failover to another worker node with only 1 command (kubectl drain).
Below things happen automatically when draining the node
  • OKE will shutdown DB pod
  • OKE will detach PV on the worker node
  • OKE will find a new worker node in the same AD
  • OKE will attach PV in the new worker node
  • OKE will start DB pod in the new worker node
DB in statefulset is not RAC, but with the power of OKE, we can failover a DB to new VM in less than a few minutes

Solution:

  • Create service for DB statefulset
    $ cat testsvc.yaml 
    apiVersion: v1
    kind: Service
    metadata:
      labels:
         name: oradbauto-db-service
      name: oradbauto-db-svc
    spec:
      ports:
      - port: 1521
        protocol: TCP
        targetPort: 1521
      selector:
         name: oradbauto-db-service
  • Create a DB statefulset, wait about 15 min to let DB fully up
    $ cat testdb.yaml 
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: oradbauto
      labels:
        app: apexords-operator
        name: oradbauto
    spec:
      selector:
         matchLabels:
            name: oradbauto-db-service
      serviceName: oradbauto-db-svc
      replicas: 1
      template:
        metadata:
            labels:
               name: oradbauto-db-service
        spec:
          securityContext:
             runAsUser: 54321
             fsGroup: 54321
          containers:
            - image: iad.ocir.io/espsnonprodint/autostg/database:19.2
              name: oradbauto
              ports:
                - containerPort: 1521
                  name: oradbauto
              volumeMounts:
                - mountPath: /opt/oracle/oradata
                  name: oradbauto-db-pv-storage
              env:
                - name: ORACLE_SID
                  value: "autocdb"
                - name: ORACLE_PDB
                  value: "autopdb"
                - name:  ORACLE_PWD
                  value: "whateverpass"
      volumeClaimTemplates:
      - metadata:
          name: oradbauto-db-pv-storage
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 50Gi

  • Image we have hardware issues on this node, we need to failover to a new node 
    • Before Failover: Check the status of PV and Pod. and the pod is running on the node  1.1.1.1
    • Check any if other  pods running on the node will be affected
    • We have a node ready in the same AD as statefulset Pod
    • kubectl get pv,pvc
      kubectl get po -owide
      NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
      oradbauto-0 1/1 Running 0 20m 10.244.3.40 1.1.1.1 <none> <none>
    • 1 command to failover DB to new worker node
      • kubectl drain  <node name> --ignore-daemonsets --delete-local-data
      • kubectl drain  1.1.1.1    --ignore-daemonsets --delete-local-data
      • No need to update MT connection string as DB servicename is untouched and transparent to new DB pod
    • After failover: Check the status of PV and Pod. and the pod is running on the new node 
      • kubectl get pv,pvc
      • kubectl get pod -owide
  • The movement of PV,PVC  work on  volumeClaimTemplates as well as  the PV,PVC when we create them via yaml files with storage class "oci"

No comments: