Thursday, November 29, 2018

How To Set Up Sending Monitoring Emails via OCI Email Delivery Service

Requirement:

   We often use scripts or programs to send monitoring emails from Linux to engineers. We plan to use mailx to send emails via the SMTP service provided by OCI Email Delivery Service.

Solution:

   We followed the instructions in the official doc and set up the SMTP credentials and SMTP connection:


  • Generate SMTP credentials for a user.
  • Create an approved sender.
  • Configure SPF.
  • Configure the SMTP connection.


  • Once we have the smtp-auth-user and password, we need to get the SSL/TLS CA certificates from the OCI Email Delivery SMTP hosts, as the email connections must be secured:
    • mkdir /etc/certs
    • # certutil -N -d /etc/certs
    • To get the SMTP host CA details, run:
    • if it is Ashburn:  openssl s_client -showcerts -connect smtp.us-ashburn-1.oraclecloud.com:587 -starttls smtp  > /etc/certs/mycerts-ashburn
    • if it is Phoenix:  openssl s_client -showcerts -connect smtp.us-phoenix-1.oraclecloud.com:587 -starttls smtp  > /etc/certs/mycerts-phoenix
    • vi mycerts-ashburn (or mycerts-phoenix) and copy each certificate, including the -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines, into its own file, e.g. ocismtp-ashburn1.pem and ocismtp-ashburn2.pem
    • Import them into the nss-config-dir /etc/certs via the commands below:
    • certutil -A -n "DigiCert SHA2 Secure Server CA" -t "TC,," -d /etc/certs -i /etc/certs/ocismtp-ashburn1.pem
    • certutil -A -n "DigiCert SHA2 Secure Server CA smtp" -t "TC,," -d /etc/certs -i /etc/certs/ocismtp-ashburn2.pem
    • Use certutil -L -d /etc/certs to verify they are imported correctly. The output would look like:
    #  certutil -L -d  /etc/certs
    Certificate Nickname                                         Trust Attributes
                                                                 SSL,S/MIME,JAR/XPI

    DigiCert SHA2 Secure Server CA                               CT,,
    DigiCert SHA2 Secure Server CA smtp                          CT,,


    • Add the config below at the bottom of /etc/mail.rc:

    set nss-config-dir=/etc/certs
    set smtp-use-starttls
    set smtp-auth=plain
    set smtp=smtp.us-ashburn-1.oraclecloud.com
    set from="no-reply@test.com(henryxie)"
    set smtp-auth-user="<ocid from smtp credentials doc >"
    set smtp-auth-password="<password from smtp credentials doc >" 

    • Run a test command:
    echo "test test from henry" | mailx  -v -s "test test test"    test@test.com

    Wednesday, November 28, 2018

    OCI Email Delivery smtp-server: 504 The requested authentication mechanism is not supported

    Symptom:

      We plan to use mailx in an Oracle Linux 7.6 VM to send emails via the SMTP service provided by OCI Email Delivery Service. We followed the instructions in the official doc and set up the SMTP credentials and SMTP connection.
    When we run this command:

    echo "test test from henry" | mailx  -v -s "test test test"  \
    -S nss-config-dir=/etc/certs  \
    -S smtp-use-starttls \
    -S smtp-auth=login \
    -S smtp=smtp.us-ashburn-1.oraclecloud.com \
    -S from="no-reply@test.com(henryxie)" \
    -S smtp-auth-user="<ocid from smtp credentials doc >" \
    -S smtp-auth-password="<password from smtp credentials doc >"  henry.xie@oracle.com

    We get the error:
    smtp-server: 504 The requested authentication mechanism is not supported

    Solution:

        Change smtp-auth=login -->  smtp-auth=plain
        OCI Email Delivery is expected to support smtp-auth=login later.

    Tuesday, November 27, 2018

    OCI Email Delivery gives “Error in certificate: Peer's certificate issuer is not recognized.”

    Symptom:

      We plan to use mailx in an Oracle Linux 7.6 VM to send emails via the SMTP service provided by OCI Email Delivery Service. We followed the instructions in the official doc and set up the SMTP credentials and SMTP connection.
    When we run this command:

    echo "test test from henry" | mailx  -v -s "test test test"  \
    -S nss-config-dir=/etc/certs  \
    -S smtp-use-starttls \
    -S smtp-auth=plain \
    -S smtp=smtp.us-ashburn-1.oraclecloud.com \
    -S from="no-reply@test.com(henryxie)" \
    -S smtp-auth-user="<ocid from smtp credentials doc >" \
    -S smtp-auth-password="<password from smtp credentials doc>"  henry.xie@oracle.com

    We get the error:
    “Error in certificate: Peer's certificate issuer is not recognized.”

    Solution:

    The reason is that the nss-config-dir does not include the CA certificates for the issuer of smtp.us-ashburn-1.oraclecloud.com. We need to add them to the nss-config-dir:

    • To get the CA details, run:
    •  openssl s_client -showcerts -connect smtp.us-ashburn-1.oraclecloud.com:587 -starttls smtp  > /etc/certs/mycerts
    • vi mycerts and copy each certificate, including the -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- lines, into its own file, e.g. ocismtp-ashburn1.pem and ocismtp-ashburn2.pem (see the awk sketch after this list for an alternative)
    • Import them into the nss-config-dir /etc/certs via the commands below:
    • certutil -A -n "DigiCert SHA2 Secure Server CA" -t "TC,," -d /etc/certs -i /etc/certs/ocismtp-ashburn1.pem
    • certutil -A -n "DigiCert SHA2 Secure Server CA smtp" -t "TC,," -d /etc/certs -i /etc/certs/ocismtp-ashburn2.pem
    • Use certutil -L -d /etc/certs to verify they are imported correctly
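
    Splitting the certificates out of mycerts by hand can be error-prone. As an alternative, a small awk sketch can write each certificate in the chain to its own numbered .pem file (the output file names here are just examples):

    cd /etc/certs
    # Write each certificate block found in mycerts to ocismtp-ashburn<N>.pem
    awk '/-----BEGIN CERTIFICATE-----/{n++; keep=1}
         keep{print > ("ocismtp-ashburn" n ".pem")}
         /-----END CERTIFICATE-----/{keep=0}' mycerts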

    The error should be gone

    Monday, November 26, 2018

    Kubectl: Unable to connect to the server: x509: certificate is valid for ..... , not ..... in K8S

    Symptom:

         We set up kubectl on a local workstation to access a remote Kubernetes cluster. The public IP of the K8S API server access point is 52.64.132.188 and port 6443 is open. We obtain the ca.pem file locally and run the command below to generate the kubeconfig file locally.
    kubectl config set-cluster kubernetes-the-hard-way \
      --certificate-authority=ca.pem \
      --embed-certs=true \
      --server=https://52.64.132.188:6443
    After that, we try to run kubectl get node but get an "Unable to connect to the server: x509" certificate error. Details:
    $ kubectl get node
    Unable to connect to the server: x509: certificate is valid for 10.32.0.1, 172.31.44.176, 172.31.2.170, 172.31.3.17, 127.0.0.1, not 52.64.132.188

    Diagnosis:

    The reason for "Unable to connect to the server: x509: certificate is valid for ....., not ....." is most likely that the K8S API server certificate does not have "52.64.132.188" in its list of valid hosts. We need to go back and check which hosts were added to kubernetes.pem when the K8S cluster was initiated.
    In my case, I ran 

    cfssl gencert \
      -ca=ca.pem \
      -ca-key=ca-key.pem \
      -config=ca-config.json \
      -hostname=10.32.0.1,172.31.44.176,172.31.2.170,172.31.3.17,127.0.0.1,test.testdomain.com \
      -profile=kubernetes \
      kubernetes-csr.json | cfssljson -bare kubernetes
     
    I used test.testdomain.com, not the IP address "52.64.132.188", because the public IP can change later. The API server certificate therefore has "test.testdomain.com" in its host list, not the IP address. That is why the API server does not present a certificate that is valid for "52.64.132.188".
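
    A quick way to confirm which hosts the API server certificate actually covers is to inspect its Subject Alternative Names directly; a sketch, assuming the API server is reachable on port 6443:

    # List the SANs presented by the remote API server certificate
    echo | openssl s_client -connect 52.64.132.188:6443 2>/dev/null \
      | openssl x509 -noout -text \
      | grep -A1 "Subject Alternative Name"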

    Solution:

        To solve it, we need to update our local kubeconfig file to use test.testdomain.com instead of the IP address.
        kubectl config set-cluster kubernetes-the-hard-way \
      --certificate-authority=ca.pem \
      --embed-certs=true \
      --server=https://test.testdomain.com:6443

    Wednesday, November 21, 2018

    How To Create Mysql Cluster in K8S via Oracle Mysql-Operator

    See  GitHub Doc

    The Easy Way To Let the Kubernetes Master Node Run Pods

    Symptom:

       By default, the Kubernetes master node carries a fairly heavy admin load, so it normally does not run other workload pods. However, when we don't have many nodes, we would like to let the master node run some workload too, especially in Dev and Stage environments.

    Solution:

        There are a few ways to do that. The easy way is to remove the taint on the master node.
    By default, the master has a taint like this:
    kubectl describe node <master node>  |grep -i taint
    Taints:             node-role.kubernetes.io/master:NoSchedule

    We remove it via kubectl
    kubectl taint nodes <master node>  node-role.kubernetes.io/master-
    node "<master node>" untainted
    or
    kubectl taint nodes <master node> node-role.kubernetes.io:NoSchedule-
    node "<master node>" untainted
    When we scale up pods, some of them will run on the master node.
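
    A quick way to confirm this, using a hypothetical nginx test deployment:

    # Create a small test deployment and scale it up (nginx is just an example image)
    kubectl create deployment taint-test --image=nginx
    kubectl scale deployment taint-test --replicas=6
    # Check which nodes the pods landed on; some should now show the master node
    kubectl get pods -o wide | grep taint-test
    # Clean up afterwards
    kubectl delete deployment taint-test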

    We can add it back via kubectl
     kubectl taint nodes <master node> node-role.kubernetes.io/master=:NoSchedule
    node "<master node>" tainted

    We can use taints to prevent pods from being scheduled on a normal worker node as well:
    kubectl taint nodes <node> key=value:NoSchedule

    Saturday, November 17, 2018

    Proxy Examples For SSH, SSH Tunnel, SFTP, Kubectl To Access the Internet via Git Bash

    Requirement:

       In the company intranet, workstations are behind a firewall. In Git Bash, we need ssh, sftp, and kubectl to access the internet via a proxy server.

    Solution:

    Set env variables for your local proxy servers for kubectl

    $export http_proxy=http://www-proxy.us.test.com:80/
    $export https_proxy=http://www-proxy.us.test.com:80/
    $kubectl config set-cluster kubernetes-the-hard-way \
      --certificate-authority=ca.pem \
      --embed-certs=true \
      --server=https://test.testdomain.com:6443
    $kubectl get node

    ssh with proxy and keepalive

    $ ssh -o ServerAliveInterval=5 -o ProxyCommand="connect  -H www-proxy.us.test.com:80 %h %p" user@<public ip address or domain name>

    ssh tunnel with private key

    $ ssh -oIdentityFile=/d/OCI-VM-PrivateKey.txt -L 8001:127.0.0.1:8001 opc@<ip address>

    ssh tunnel with proxy and keepalive parameter

    $ ssh -L 6443:localhost:6443 -o ServerAliveInterval=5 -o ProxyCommand="connect  -H www-proxy.us.test.com:80 %h %p" user@<public ip address or domain name>

    sftp with proxy

    $ sftp -o ProxyCommand="connect  -H www-proxy.us.test.com:80 %h %p"  user@<public ip address or domain name>
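
    To avoid typing the proxy options every time, the same settings can also live in ~/.ssh/config in Git Bash. A sketch with a hypothetical host alias:

    # ~/.ssh/config (hypothetical alias "oci-vm")
    Host oci-vm
        HostName <public ip address or domain name>
        User opc
        IdentityFile /d/OCI-VM-PrivateKey.txt
        ServerAliveInterval 5
        ProxyCommand connect -H www-proxy.us.test.com:80 %h %p

    After that, plain "ssh oci-vm" or "sftp oci-vm" picks up the proxy, key and keepalive settings automatically.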


    Wednesday, November 14, 2018

    How To Add Worker Nodes Across Regions in the Same K8S Cluster in OCI

    Requirement:

       We would like to spread our Kubernetes workload across regions so that we have a safer DR solution for our services. For example, we have worker nodes in the Phoenix region of OCI and would like to add new worker nodes in the Ashburn region within the same tenancy and the same Kubernetes cluster. This post is based on the Oracle-provided Kubernetes and container services; see the official doc.

    Solution:

       The main part is the firewall between the 2 regions. As long as the ports for Kubernetes' own communication and for pod services are open among the nodes, it works fine. The network plugin we use is flannel, which runs over VXLAN.
       Once the firewall ports are open, refer to this blog to add a new worker node.

    Firewall Part :

    Kubernetes' own communications between the 2 regions

    All the worker nodes in the cluster should open ports 10250 and 8472 to be able to receive connections
    Source: all the nodes
    Destination: worker nodes
    Ports: TCP 10250, UDP 8472


    All master nodes should open port 6443 (API server) to be able to receive connections
    Source: all worker nodes and end-user programs that access the API server
    Destination: master nodes
    Port: TCP 6443


    All etcd nodes should open port 2379 (etcd service) to be able to receive connections
    Source: all the nodes
    Destination: etcd nodes
    Port: TCP 2379

    All service ports need to be exposed outside Kubernetes
    Source: 0.0.0.0 or restricted users, depending on the service
    Destination: all the worker nodes
    Ports: the ports to be exposed
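
    If firewalld is also running on the nodes themselves (in addition to the OCI security lists), a sketch of opening these ports locally might look like this:

    # On worker nodes: kubelet and flannel VXLAN
    firewall-cmd --permanent --add-port=10250/tcp
    firewall-cmd --permanent --add-port=8472/udp
    # On master nodes: API server
    firewall-cmd --permanent --add-port=6443/tcp
    # On etcd nodes: etcd client traffic
    firewall-cmd --permanent --add-port=2379/tcp
    firewall-cmd --reload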

    Access K8S Pod Service Port via Kubectl Port-forward on Remote Workstation

    Requirement:

         We would like to access the service of a pod from a remote desktop. For example, there is a pod running nginx on port 80 in Oracle OCI K8S, and we would like to access it on a local Windows 10 desktop.
    We can use kubectl port-forward via the internet. The workstation can be in a company intranet behind a firewall. As long as "kubectl get nodes" works (i.e., kubectl can reach the API server) via a proxy or ssh tunnel, we can use our local workstation to access the remote pod. This command can be very useful in troubleshooting scenarios.

    Solution:

    $ kubectl port-forward <POD_NAME>  8081:80
    Forwarding from 127.0.0.1:8081 -> 80
    Forwarding from [::1]:8081 -> 80

    Open a new window
    curl --head http://127.0.0.1:8081

    We should get the page from the pod.
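
    In newer kubectl versions, port-forward can also target a service or deployment instead of a single pod, which avoids looking up the pod name first; a sketch with hypothetical resource names:

    # Forward local 8081 to port 80 of a service named nginx-svc (hypothetical)
    kubectl port-forward svc/nginx-svc 8081:80
    # Or to a deployment named nginx (hypothetical)
    kubectl port-forward deployment/nginx 8081:80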

    DispatcherNotFoundException 404 Error on ORDS Standalone and APEX

    Symptom:

          After we install APEX 18.1 and ORDS 18.2, we get a 404 error in the browser. The error stack is like:

    DispatcherNotFoundException [statusCode=404, reasons=[]]    at oracle.dbtools.http.entrypoint.Dispatcher.choose(Dispatcher.java:87) 

    Diagnosis:

         There are many possible reasons for this. One that we hit is that APEX_LISTENER, APEX_PUBLIC_USER and APEX_REST_PUBLIC_USER were not set up correctly when we installed ords.war.
           When we run java -jar $ORDS_HOME/ords.war install advanced, the process reads the ords/conf/*.xml files under the ORDS configuration home and tries to figure out the existing settings for any new installation. It skips the APEX listener setup if old settings are there, and thus skips generating the XML files for the connections to the database.

          So in ords/conf/ there should be 2 - 4 XML files, each defining a connection pool to the database. If you only see 1 XML file, it means the APEX listener settings are missing.

    Solution:

        Remove ords_params.properties and the *.xml files in ords/conf, and remove standalone.properties in ords/standalone.
        Rerun java -jar $ORDS_HOME/ords.war install advanced
        Or java -jar $ORDS_HOME/ords.war install simple  --- with a correct parameter file
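
    Put together, the cleanup and rerun could look like the sketch below; the paths follow the layout described above, so back up the configuration directory first and adjust to your own install.

    # Back up the existing ORDS config before removing anything
    cp -r ords ords.bak
    # Remove the stale settings so the installer regenerates the connection pool XMLs
    rm -f ords/conf/*.xml ords/standalone/standalone.properties
    # ords_params.properties may sit in a different subdirectory depending on the install
    find ords -name ords_params.properties -delete
    # Rerun the installer
    java -jar $ORDS_HOME/ords.war install advanced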
     


    Thursday, November 08, 2018

    Useful URLs To Get Prometheus Settings

    To get targets status:

    http://<ip address>:<port>/targets
    ie: http://1.1.1.1:30304/targets

    To get prometheus startup parameters

    http://<ip address>:<port>/flags
    ie: http://1.1.1.1:30304/flags
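
    In recent Prometheus versions the same information is also available as JSON through the HTTP API, which is handier for scripting; a sketch with the same example address:

    # Target status as JSON
    curl -s http://1.1.1.1:30304/api/v1/targets
    # Runtime flags as JSON
    curl -s http://1.1.1.1:30304/api/v1/status/flags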

    Wednesday, November 07, 2018

    Install Prometheus & Grafana and Store Data On NFS in Oracle OCI

    See details in Github Link

    How To Fix "server returned HTTP status 403 Forbidden" in Prometheus

    Requirement:

       We installed and started Prometheus; however, we can't get node metrics. Via /targets, we find the error "server returned HTTP status 403 Forbidden".

    Solution:

        The Prometheus /targets page will show the kubelet job with a 403 error when token authentication is not enabled. Ensure that the --authentication-token-webhook=true flag is enabled in all kubelet configurations.
        We need to enable --authentication-token-webhook=true in our kubelet conf.
    In the Host OS:
    cd /etc/systemd/system/kubelet.service.d
    vi 10-kubeadm.conf
    Add  "--authentication-token-webhook=true"  into  "KUBELET_AUTHZ_ARGS"
    After that, it would look like:
    Environment="KUBELET_AUTHZ_ARGS=--authorization-mode=Webhook --client-ca-file=/etc/kubernetes/pki/ca.crt --authentication-token-webhook=true"
    # systemctl daemon-reload
    # systemctl restart kubelet.service
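
    A quick way to check that the restarted kubelet actually picked up the flag:

    # The running kubelet command line should now include the webhook flag
    ps -ef | grep kubelet | grep authentication-token-webhook
    # Or inspect the systemd unit plus its drop-ins
    systemctl cat kubelet.service | grep authentication-token-webhook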

    The 403 error should be gone. For more details, refer to the github doc.

    How To Fix "No Route to Host" in Prometheus node-exporter

    Requirement:

       We installed and started Prometheus; however, we can't get node-exporter metrics. Via /targets, we find the error "... no route to host".

    Solution:

        The error means Prometheus can't reach the HTTP endpoint http://<ip address>:9100/metrics.
         First, test on the node itself whether localhost is working.
    Login Node:
    #wget -O- localhost:9100/metrics
    If you get output, it means the endpoint is working fine. Otherwise check the node-exporter pod and its logs.

         Then test from another node
    Login to the other node:
    #wget -O- <ip address>:9100/metrics
    If you can't get output, it means there are some network or firewall issues:
    * check your cloud provider network security settings and make sure port 9100 is open
    * check the node's Linux firewall service settings. In EL7, port 9100 is not open by default
    #firewall-cmd --add-port=9100/tcp --permanent
    # systemctl restart firewalld

    "no route to host"  error should be gone. More details Refer github doc

    Friday, November 02, 2018

    Issue with Makefile:6: *** missing separator.

    Symptom:

      When you run make, you get the error below:
    (oracle-svi5TViy)  $make
    Makefile:6: *** missing separator.  Stop.

    Makefile Details:
    .PHONY: default install test

    default: test

    install:
            pipenv install --dev --skip-lock

    test:
            PYTHONPATH=./src pytest

    Solution:

       The Makefile format uses a <tab>, not <space>, to indent recipe lines. As they are invisible, this is easy to overlook.
    To fix it, replace the <space> before pipenv and PYTHONPATH with a <tab>.
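
    To spot the offending lines quickly, cat -A makes tabs visible (they show up as ^I); a short sketch:

    # Tabs display as ^I, so recipe lines indented with spaces stand out
    cat -A Makefile
    # Or list lines that begin with spaces instead of a tab
    grep -n "^ " Makefile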