Wednesday, July 29, 2020

Tip: No route to host issues in Kubernetes Pods

Symptom:

    We see intermittent network issues in OKE (Oracle Kubernetes Engine): the ingress controller pods have difficulty accessing other services. When we use curl to test the network port, we get an error like the one below:
 
$ curl -v telnet://10.244.97.24:9090
* Expire in 0 ms for 6 (transfer 0x560b9cdd7dd0)
*   Trying 10.244.97.24...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x560b9cdd7dd0)
* connect to 10.244.97.24 port 9090 failed: No route to host
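Because the problem is intermittent, a single curl run may well succeed. The sketch below repeats a plain TCP connect and tallies how each attempt ended. It assumes Python 3 is available inside the pod you test from, and the ingress controller image may not ship it. `probe_tcp` and `probe_many` are hypothetical helper names, and 10.244.97.24:9090 is simply the address from the output above:

```python
import errno
import socket
import time
from collections import Counter


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect (like `curl -v telnet://host:port`) and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: packets are probably being dropped silently.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # The same condition curl reports as "No route to host".
            return "no route to host"
        if exc.errno == errno.ECONNREFUSED:
            # The host answered, but nothing is listening on the port.
            return "connection refused"
        return f"error: {exc}"


def probe_many(host: str, port: int, attempts: int = 20,
               interval: float = 0.5, timeout: float = 3.0) -> Counter:
    """Repeat probe_tcp and count each outcome, to catch intermittent failures."""
    results: Counter = Counter()
    for i in range(attempts):
        results[probe_tcp(host, port, timeout)] += 1
        if i < attempts - 1:
            time.sleep(interval)
    return results
```

For example, a result from probe_many("10.244.97.24", 9090) that mixes "open" with "no route to host" would match the intermittent symptom described here.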

Solution:

   There are quite a few possible reasons for this error; check my other blog post for more of them.
   In this case, it is related to the firewall port rules.
  By default, the network team opens all ingress and egress ports within the same worker node subnet, which means there is no firewall between the worker nodes. However, those rules were set as stateful. The Kubernetes overlay network depends heavily on UDP, which is stateless, so we need to open the ports with stateless rules.
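Below is a sketch of what the stateless rules could look like. It assumes the camelCase field names used in the OCI CLI's security list JSON (`source`, `destination`, `protocol`, `isStateless`) and a hypothetical worker node subnet of 10.0.10.0/24. Check the field names against your CLI version before applying anything:

```python
import json
import os
from typing import Dict, List


def stateless_rules(worker_subnet_cidr: str) -> Dict[str, List[dict]]:
    """Build stateless allow-all rules for traffic that stays inside the worker subnet."""
    ingress = [{
        "source": worker_subnet_cidr,
        "sourceType": "CIDR_BLOCK",
        "protocol": "all",   # TCP, UDP, ICMP, ...
        "isStateless": True,  # no connection tracking, so UDP replies are not dropped
    }]
    egress = [{
        "destination": worker_subnet_cidr,
        "destinationType": "CIDR_BLOCK",
        "protocol": "all",
        "isStateless": True,
    }]
    return {"ingress": ingress, "egress": egress}


def write_rule_files(worker_subnet_cidr: str, out_dir: str) -> Dict[str, str]:
    """Write ingress.json and egress.json into out_dir and return their paths."""
    paths = {}
    for direction, rules in stateless_rules(worker_subnet_cidr).items():
        path = os.path.join(out_dir, f"{direction}.json")
        with open(path, "w") as fh:
            json.dump(rules, fh, indent=2)
        paths[direction] = path
    return paths
```

The files could then be passed to `oci network security-list update --security-list-id <security-list-ocid> --ingress-security-rules file://ingress.json --egress-security-rules file://egress.json`. Be careful: this command replaces the whole rule list rather than appending to it, so merge these entries with the existing rules first.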

Thursday, July 02, 2020

How To RMAN Backup Oracle Database 19c running in Kubernetes

Requirement:

   We have an Oracle Database 19c running in OKE (Oracle Kubernetes Engine). We would like to use RMAN to back up the DB to cloud object storage. We use Oracle Cloud Infrastructure (OCI) as an example; the same concept applies to other clouds.

Steps:

  • Create a Docker image with Python 3 and the Oracle OCI CLI installed. Please refer to the official documentation for how to install the OCI CLI. The Dockerfile can be found in the GitHub repo.
  • Create a StatefulSet using the Docker image. The YAML files can be found in the GitHub repo.
  • Download the Oracle Database Cloud Backup Module for OCI.
  • Follow its installation instructions.
    • Attention: when we set up the OCI CLI, the config file should not be in the Docker image but on the persistent block storage volume, i.e. /opt/oracle/diag/.oci/config, together with export OCI_CLI_CONFIG_FILE=/opt/oracle/diag/.oci/config
    • Attention: when we set up the RMAN backup module and create the wallet files, none of the config files should go into the Docker image; put them on the persistent block storage volume instead, i.e. under /opt/oracle/diag/
      • java -jar oci_install.jar \
          -host https://objectstorage.us-phoenix-1.oraclecloud.com \
          -pvtKeyFile /opt/oracle/diag/.oci/testuser_ww-oci_api_key.pem \
          -pubFingerPrint 52:b6:0e:2e:***:a1 \
          -uOCID "ocid1.user.oc1..aaaaahjia***adfe" \
          -tOCID "ocid1.tenancy.oc1..aanh7gl5**dfe" \
          -walletDir /opt/oracle/diag/.oci/opc_wallet \
          -configFile /opt/oracle/diag/.oci/opc_wallet/opcAUTOCDB.ora \
          -libDir $ORACLE_HOME/lib \
          -bucket BUK-OBJECT-STORAGE-BAK-TEMP \
          -proxyHost yourproxy.com \
          -proxyPort 80
    • Run java -jar oci_install.jar -h for more details.
    • Tip: If libopc.so is already in place in $ORACLE_HOME/lib (which is inside the Docker image), we can ignore the warning from the download part of the process.
    • Tip: You can copy opc_wallet to other servers or OKE clusters without repeating the oci cli and java -jar oci_install.jar steps.
    • Tip: If you see the error "KBHS-00713: HTTP client error ''", check the http_proxy and https_proxy settings. The RMAN backup to object storage module uses the HTTP and HTTPS protocols.
    • Tip: If you see the error "KBHS-01012: ORA-28759 occurred during wallet operation; WRL file:/home/oracle/opc_wallet", it may be due to old opc<sid>.ora config files in $ORACLE_HOME/dbs. The DB always tries to read the config file in $ORACLE_HOME/dbs instead of using the parameters. Removing those files should clear the error.
    • To avoid the error "KBHS-01006: Parameter OPC_HOST was not specified", we need to put all the parameters from opcAUTOCDB.ora into the RMAN script.
  • Test an RMAN backup inside the StatefulSet DB pod:
        rman target /
        run {
          SET ENCRYPTION ON IDENTIFIED BY 'changeme' ONLY;
          ALLOCATE CHANNEL t1 DEVICE TYPE sbt PARMS "SBT_LIBRARY=/opt/oracle/product/19c/dbhome_1/lib/libopc.so ENV=(OPC_HOST=https://objectstorage.us-phoenix-1.oraclecloud.com/n/testnamespace, OPC_WALLET='LOCATION=file:/opt/oracle/diag/.oci/opc_wallet CREDENTIAL_ALIAS=alias_oci', OPC_CONTAINER=TEST-OBJECT-STORAGE-RMAN, OPC_COMPARTMENT_ID=ocid1.compartment.oc1..aa****sddfeq, OPC_AUTH_SCHEME=BMC)";
          backup current controlfile;
        }
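Keeping the parameters in opcAUTOCDB.ora and the RMAN script in sync by hand is error-prone. The sketch below assumes the config file holds one KEY=VALUE pair per line, such as the OPC_HOST and OPC_WALLET parameters shown above. The helper names are hypothetical and not part of the backup module:

```python
import glob
import os
from typing import Dict, Iterable, List


def parse_opc_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blank lines and '#' comments."""
    params: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only: the OPC_WALLET value itself contains '='.
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def read_opc_config(path: str) -> Dict[str, str]:
    """Read an opc<SID>.ora file, e.g. /opt/oracle/diag/.oci/opc_wallet/opcAUTOCDB.ora."""
    with open(path) as fh:
        return parse_opc_lines(fh)


def allocate_channel(params: Dict[str, str], sbt_library: str, channel: str = "t1") -> str:
    """Render an ALLOCATE CHANNEL command that passes every parameter via ENV=(...)."""
    env = ", ".join(f"{key}={value}" for key, value in params.items())
    return (f'ALLOCATE CHANNEL {channel} DEVICE TYPE sbt PARMS '
            f'"SBT_LIBRARY={sbt_library} ENV=({env})";')


def stale_opc_files(oracle_home: str) -> List[str]:
    """List opc*.ora files in $ORACLE_HOME/dbs that the DB may read instead of ENV=(...)."""
    return sorted(glob.glob(os.path.join(oracle_home, "dbs", "opc*.ora")))
```

For example, allocate_channel(read_opc_config("/opt/oracle/diag/.oci/opc_wallet/opcAUTOCDB.ora"), "/opt/oracle/product/19c/dbhome_1/lib/libopc.so") renders a line in the same layout as the ALLOCATE CHANNEL command above. A non-empty stale_opc_files(...) result points at the KBHS-01012 cause described in the tips.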

Monday, June 15, 2020

Dockerfile for Oracle Database 19.5 image with patches applied

Summary:

Here is the GitHub link for the Dockerfile of the Oracle Database 19.5 image with patches applied:

https://github.com/HenryXie1/Dockerfile/tree/master/OracleDatabase

The Docker image installs 19.3 and then applies the patches below to bring it to 19.5:
OCT_RU_DB_p30125133_190000_Linux-x86-64.zip
OCT_RU_OJVM_p30128191_190000_Linux-x86-64.zip
p30083488_195000DBRU_Linux-x86-64.zip

The Docker image has updates to facilitate automated block storage provisioning in OKE (Oracle Kubernetes Engine).

The Docker image creates three separate volumes for oradata, the Fast Recovery Area (FRA) and the diagnostic area (diag). This keeps the datafiles safe, gives recovery its own dedicated space, and gives diagnostics a separate place so that they cannot fill up the data and FRA areas.

The testdb YAML files use the oci-bv storage class of OKE, which is based on the Container Storage Interface (CSI).
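To illustrate the three-volume layout, the sketch below generates the volumeClaimTemplates section of a StatefulSet as JSON (kubectl accepts JSON manifests as well as YAML). The volume names and sizes are hypothetical and may not match the testdb YAML files in the repo; only the oci-bv storage class comes from this post:

```python
import json
from typing import Dict, List

# Hypothetical volume names and sizes; adjust them to match the testdb YAML files.
VOLUMES = {"oradata": "100Gi", "fra": "100Gi", "diag": "50Gi"}


def volume_claim_templates(volumes: Dict[str, str], storage_class: str = "oci-bv") -> List[dict]:
    """Build one PVC template per volume, each provisioned through the given storage class."""
    return [
        {
            "metadata": {"name": name},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": storage_class,
                "resources": {"requests": {"storage": size}},
            },
        }
        for name, size in volumes.items()
    ]


if __name__ == "__main__":
    # Print the fragment that would sit under spec.volumeClaimTemplates of the StatefulSet.
    print(json.dumps(volume_claim_templates(VOLUMES), indent=2))
```

Separate claims mean each area gets its own block volume, so a full diag volume cannot consume space reserved for oradata or the FRA.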

Sunday, June 14, 2020

Tip: Sending build context to Docker daemon during docker build

Symptom:

  When we run docker build, it shows
Sending build context to Docker daemon...
   and after a while we hit an out-of-space issue.

Solution:

When building a large image such as an Oracle database, we had better keep only one version of the downloaded DB binary file in the docker build directory.
By default, the build context sent to the Docker daemon includes all the zip files in that directory (including zip files for unused versions), which can cause unnecessary space pressure.
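Below is a sketch of a pre-build check. It assumes the binaries follow Oracle's LINUX.X64_<version>_db_home.zip naming; the file names in the test are illustrative. It estimates how much data the build context will contain and lists zip files for other versions. Note that it does not apply .dockerignore rules:

```python
import os
from typing import Iterable, List, Tuple


def walk_context(build_dir: str) -> Iterable[Tuple[str, int]]:
    """Yield (relative path, size in bytes) for every file under build_dir."""
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            path = os.path.join(root, name)
            yield os.path.relpath(path, build_dir), os.path.getsize(path)


def context_report(entries: Iterable[Tuple[str, int]], keep: str) -> Tuple[int, List[str]]:
    """Return (total bytes, zip files whose name does not contain `keep`)."""
    total = 0
    unused = []
    for rel_path, size in entries:
        total += size
        if rel_path.endswith(".zip") and keep not in os.path.basename(rel_path):
            unused.append(rel_path)
    return total, sorted(unused)
```

For example, run context_report(walk_context("."), keep="193000") before docker build. Listing the unused zip files in a .dockerignore file is an alternative to moving them out of the directory.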

Tip: Error: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/bin/sh\": stat /bin/sh: permission denied"


Symptom:

When we ran docker build for an image, we got the error below:
OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/bin/sh\": stat /bin/sh: permission denied": unknown
The error happens at the "From base" line of the Dockerfile.

Solution:

The cause is that the local copy of the base image is somehow unavailable or corrupted. Refreshing the base image (removing it and pulling it again) makes the issue go away.
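Below is a sketch of the refresh that wraps the standard `docker rmi` and `docker pull` commands. The image name is whatever your FROM line references; oraclelinux:7-slim in the test is only an example. By default it does a dry run and only returns the commands:

```python
import subprocess
from typing import List


def refresh_base_image(image: str, dry_run: bool = True) -> List[List[str]]:
    """Remove the local copy of `image`, then pull it again from the registry.

    With dry_run=True (the default) the commands are only returned, not executed.
    """
    commands = [["docker", "rmi", image], ["docker", "pull", image]]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on the first failing command.
            subprocess.run(cmd, check=True)
    return commands
```

`docker rmi` refuses to remove an image that a container still uses, including a stopped one, so remove such containers first.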