Showing posts with label k8s.

Thursday, November 10, 2022

Apex Ords Operator for Kubernetes

Requirement:

We often need to provision APEX and ORDS for Dev, Stage, and Prod environments.
This operator automates Oracle Application Express (APEX) 19.1 and Oracle REST Data Services (ORDS) via a Kubernetes CRD: it creates a brand new Oracle 19c database StatefulSet, APEX and ORDS deployments, plus a load balancer in the Kubernetes cluster.

Solution:

Full details and source code are in the GitHub repository.
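As a rough illustration of how such an operator is typically driven, the sketch below applies a hypothetical custom resource; the kind, API group, and field names are assumptions, not the actual CRD schema from the repository.

# Hypothetical custom resource for illustration only; the real CRD schema
# lives in the GitHub repository, so kind/group/fields here are assumptions.
cat <<EOF | kubectl apply -f -
apiVersion: example.com/v1
kind: ApexOrds
metadata:
  name: dev-apexords
  namespace: dev
spec:
  dbVersion: "19c"
  apexVersion: "19.1"
  ordsReplicas: 1
EOF
# The operator then reconciles this into a database StatefulSet,
# APEX/ORDS deployments, and a load balancer service.
kubectl get apexords -n dev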

Demo:



Tuesday, November 08, 2022

OKE Admission Control Webhook Sample

Requirement:

We need to implement a policy requested by the security team: every Kubernetes service should carry the annotation service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: None, so that no security list is updated by Kubernetes. This is an example of how we build our own admission controller to implement policies from the security team or other teams, e.g. we could also enforce that only internal load balancers are allowed for internal services, etc.
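For reference, a Service that satisfies this policy looks like the sketch below; the service name, selector, and ports are placeholders, only the annotation is the real requirement.

# Example Service carrying the required annotation (name/selector/ports are placeholders).
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-internal-svc
  annotations:
    service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "None"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
EOF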

Solution:

  • Please refer to the GitHub repo
  • git clone https://github.com/HenryXie1/oke-admission-webhook
  • go build -o oke-admission-webhook
  • docker build --no-cache -t repo-url/oke-admission-webhook:v1 .
  • rm -rf oke-admission-webhook
  • docker push repo-url/oke-admission-webhook:v1
  • ./deployment/webhook-create-signed-cert.sh --service oke-admission-webhook-svc --namespace kube-system --secret oke-admission-webhook-secret
  • kubectl replace --force -f deployment/validatingwebhook.yaml
  • kubectl replace --force -f deployment/deployment.yaml
  • kubectl replace --force -f deployment/service.yaml

Demo:



Wednesday, February 09, 2022

Tip: kubectl apply --dry-run=client vs server RBAC roles

 kubectl apply --dry-run is very useful for testing manifests.

The RBAC requirements differ between kubectl apply --dry-run=client and --dry-run=server.

Both need a role that can fetch the CRD so the request can go through the validating and mutating admission chains.

We need a READ ONLY role for kubectl apply --dry-run=client and a READ WRITE role for kubectl apply --dry-run=server (see the sketch below).
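A quick way to see the difference; the manifest name and the permission checks below are illustrative.

# Client-side dry-run: the object is rendered locally, read access is enough.
kubectl apply --dry-run=client -f deployment.yaml

# Server-side dry-run: the request goes through the API server and admission,
# so write verbs (create/patch) on the target resource are required as well.
kubectl apply --dry-run=server -f deployment.yaml

# Illustrative permission checks for the identity in use:
kubectl auth can-i get deployments
kubectl auth can-i create deployments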

Wednesday, December 01, 2021

How to expose kube api server via nginx proxy

Requirement:

    The Kubernetes API server (control plane) often sits behind a firewall. To provide more security and load balancing, we need to set up an nginx proxy in front of it. There are 2 solutions.

Solution1: Use L4 TCP proxy pass of nginx

The nginx stream core module provides L4 TCP/UDP proxy pass functionality. link
To proxy pass the K8S API on port 6443 via nginx listening on port 8888, we can put the below in nginx.conf:
stream {
    upstream api {
        server kubernetes.default.svc.cluster.local:6443;
    }
    server {
        listen 8888;
        proxy_timeout 20s;
        proxy_pass api;
    }
}
The kubeconfig has the below elements (see the sketch after this list):
  • server points to the nginx proxy, i.e. https://myapi.myk8s.com:8888
  • certificate-authority is the CA of the K8S API server (not the CA of myapi.myk8s.com)
  • client-certificate: path/to/my/client/cert
  • client-key: path/to/my/client/key
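A minimal sketch of building that kubeconfig with kubectl config; the cluster/user names and file paths are placeholders.

# Cluster entry: point at the nginx L4 proxy but keep the K8S API server's CA.
kubectl config set-cluster myk8s \
  --server=https://myapi.myk8s.com:8888 \
  --certificate-authority=path/to/k8s-api-ca.crt \
  --embed-certs=true

# User entry: the normal client certificate/key still work, because nginx only
# forwards the TCP stream and the TLS handshake terminates at the API server.
kubectl config set-credentials myuser \
  --client-certificate=path/to/my/client/cert \
  --client-key=path/to/my/client/key

kubectl config set-context myk8s --cluster=myk8s --user=myuser
kubectl config use-context myk8s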

Solution2: Use L7 Https proxy pass of nginx

The nginx HTTP core module provides L7 SSL proxy pass functionality. link
To proxy pass the K8S API on https://myapi.myk8s.com/api/ via nginx listening on 443 SSL, we can put the below in nginx.conf:
http {
    upstream api {
        server kubernetes.default.svc.cluster.local:6443;
    }
    server {
      listen              443 ssl;
      server_name         myapi.myk8s.com;
      ssl_certificate     /etc/nginx/ssl/tls.crt;
      ssl_certificate_key /etc/nginx/ssl/tls.key;
      location / {
        root /usr/local/nginx/html;
        index index.htm index.html;
      }
      location /api/ {
        rewrite ^/api(/.*)$ $1 break;
        proxy_pass https://api;
       
      }
    }
}
The kubeconfig has the below elements:
  • server points to the nginx proxy, i.e. https://myapi.myk8s.com/api/
  • certificate-authority is the CA of myapi.myk8s.com (not the K8S API CA)
  • we can't use client-certificate and client-key like we do with the L4 TCP proxy pass
  • because the TLS traffic from the nginx proxy to the kube API server on 6443 is a regular anonymous TLS connection, the API server won't accept it. To solve this:
    • Option 1: use a JWT token via OpenID Connect
users:
- name: testuser
  user:
    auth-provider:
      config:
        idp-issuer-url: https://openid.myk8s.com/dex
        client-id: oidc-loginapp
        id-token: eyJhbGciOiJSUzI1NiIs....****
      name: oidc

    • Option 2: use mTLS by adding the client certificate and key to the nginx proxy pass settings.

location /api/ {
        rewrite ^/api(/.*)$ $1 break;
        proxy_pass https://api;
        proxy_ssl_certificate         /etc/nginx/k8s-client-certificate.pem;
        proxy_ssl_certificate_key     /etc/nginx/k8s-client-key.key;
        proxy_ssl_session_reuse on;
      }



Wednesday, September 22, 2021

Error: invalid bearer token, oidc: verify token: oidc: expected audience

Symptom:

After we implemented dex + GitHub via the link, we are able to get an ID token from example-app via http://127.0.0.1:5555/
With the ID token, we construct the kubeconfig, but when we access the k8s cluster we hit an error:

error: You must be logged in to the server (Unauthorized)

In kube api server logs, we see error:

invalid bearer token, oidc: verify token: oidc: expected audience \"123456\" got [\"example-app\"]]"

 Triage:

Check the payload and verify the JWT ID token on https://jwt.io

Check the dex container logs

Find similar issues on GitHub: link1 link2

Solution:

It turns out the client-id does not match.

The client-id set on the K8S API server (--oidc-client-id) link needs to match the client-id in example-app.

In the above example, “123456” is the value I set on the K8S API server, while the client-id in example-app is “example-app”, which caused the problem (see the sketch below).
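A hedged sketch of the matching API server configuration; how you edit these flags depends on how your API server is deployed (static pod manifest, systemd unit, etc.).

# Relevant kube-apiserver flags (other flags omitted). --oidc-client-id must
# match the client id the ID token is issued for, i.e. "example-app" in this
# post, not an unrelated value like "123456".
kube-apiserver ... \
  --oidc-issuer-url=https://openid.myk8s.com/dex \
  --oidc-client-id=example-app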

Saturday, August 21, 2021

Kubectl Plugin for Oracle Database

 Requirement:

We often need to provision new Oracle databases for developers.
This kubectl plugin automates the creation of an Oracle database StatefulSet in the Kubernetes cluster.

Solution:

Full details and source code are in the GitHub repository.
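For context, kubectl discovers any executable named kubectl-<name> on the PATH as a plugin; the plugin binary name and flags below are hypothetical, the real ones are in the repository.

# Hypothetical usage; the actual command names/flags are in the GitHub repository.
sudo cp kubectl-oradb /usr/local/bin/
kubectl plugin list    # the new plugin should appear here
kubectl oradb --help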

Demo:



Tuesday, July 20, 2021

Tip: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox

Kubelet reports an error like this when you deploy a pod:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox : "*****"  error getting ClusterInformation: connection is unauthorized: Unauthorized

It's due to the CNI not being set up correctly or not functioning on the node.

Redeploying the CNI provider may help, e.g. Flannel or Calico (see the sketch below).
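A hedged recovery sketch; the daemonset name and manifest URL depend on your CNI provider and version.

# Restart the CNI daemonset (name/namespace are illustrative, e.g. Calico):
kubectl -n kube-system rollout restart daemonset calico-node

# Or re-apply the CNI manifest, e.g. for Flannel:
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml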

Friday, July 16, 2021

Tip: How to get Go Client in Kubernetes Operator Reconcile()

Requirement:

      When we build a K8S operator via kubebuilder, we often need to interact with the control plane. By default, we use the controller-runtime client.

 However, when we use this client to implement some functionality of kubectl, e.g. drain, we hit an error:

 cannot use r.Client (variable of type client.Client) as kubernetes.Interface value in struct literal: missing method AdmissionregistrationV1

Solution:

The error indicates the controller-runtime client does not implement kubernetes.Interface (it is missing methods such as AdmissionregistrationV1), so we can't use it there; instead, we initialize a new Go client inside Reconcile(). Sample code:

"k8s.io/client-go/kubernetes"
ctrlconfig "sigs.k8s.io/controller-runtime/pkg/client/config"

cfg, err := ctrlconfig.GetConfig()
if err != nil {
log.Log.Error(err, "unable to get kubeconfig")
return err
}
kubeclientset, err := kubernetes.NewForConfig(cfg)
if err != nil {
return err
}


Thursday, July 01, 2021

Pass CKA Exam Tips

On Thursday, I passed the CKA exam with a score of 93 and got the certificate. Here are some tips on how I achieved that.

  • 17 questions in 2 hours.
  • Don't worry about copy and paste. You can copy text by clicking it when your mouse hovers over the "important text".
  • Read each question carefully and make sure you understand it before starting. Check the weight of each question: a higher weight means more marks for that question.
  • Skip the difficult questions and make sure you get the easy marks. Only a score of 66 is needed to pass the exam.
  • Practise and create examples for each test point in the CKA curriculum.
  • I strongly recommend this Udemy course. The practice and mock exams are great preparation for the CKA exam.
  • Practise all commands in the kubectl cheat sheet.

Sunday, May 23, 2021

Tip: Can't find docker networking namespace via ip netns list

Symptom:

    In Ubuntu, we start a docker container and try to find the docker networking namespace via "ip netns list". The output is empty.

Reason:

   By default, Docker records network namespaces under /var/run/docker/netns, while "ip netns list" checks /var/run/netns.

Workaround:  

 Stop all containers, rm -rf /var/run/netns, then ln -s /var/run/docker/netns /var/run/netns (spelled out below).
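Spelled out, the workaround looks like this (run as root, and stop the containers first):

# Make "ip netns list" see Docker's network namespaces.
docker stop $(docker ps -q)                  # stop all running containers
rm -rf /var/run/netns                        # remove the default netns directory
ln -s /var/run/docker/netns /var/run/netns   # point it at Docker's netns store
ip netns list                                # namespaces should now be listed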

Tip:

To find the netns id of a container:

docker ps ---> find the container ID

docker inspect <container ID> | grep netns

Wednesday, April 07, 2021

Tip: Pods keep crashloopbackoff

 Symptom:

 Pods keep going into CrashLoopBackOff.

"kubectl describe pod ..." does not give meaningful info, and neither does "kubectl get events".

Reason:

One likely reason is the pod security policy. In my case, the existing pod security policy does not allow Nginx or Apache to run; it is missing:

  allowedCapabilities:
  - NET_BIND_SERVICE
  # apache or nginx need escalation to root to function well
  allowPrivilegeEscalation: true

So the pods keep crashing with CrashLoopBackOff. The fix is to add the above to the pod security policy (a quick check is shown below).
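To confirm which pod security policy a pod was actually admitted under, the PSP admission controller records it in an annotation; the pod name below is a placeholder.

# The PSP admission controller stores the chosen policy in the kubernetes.io/psp annotation.
kubectl get pod <pod-name> -o jsonpath='{.metadata.annotations.kubernetes\.io/psp}'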


Saturday, April 03, 2021

Tip: Istio TLS secrets, Gateway, VirtualService namespace scope

There is some confusion about where we should put Istio objects: in istio-system or in the users' namespace?

Here are some tips:

For TLS/mTLS CAs, certs, and key management in Istio, the Kubernetes secrets should be created in istio-system, not in the users' namespace.

Gateway and VirtualService need to be created in the users' namespace (see the sketch below).
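A minimal sketch; the secret, namespace, and file names are placeholders.

# The TLS secret for the ingress gateway lives in istio-system, next to the gateway pods.
kubectl -n istio-system create secret tls my-gw-cert --cert=tls.crt --key=tls.key

# The Gateway/VirtualService that reference credentialName "my-gw-cert"
# are created in the application (users') namespace.
kubectl -n my-app-namespace apply -f gateway.yaml -f virtualservice.yaml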

Tuesday, March 09, 2021

How to find which type of VMs pods are running via promQL

Requirement:

     Users need to know which type of VM their pods are running on, e.g. users want to verify their pods are running on GPU VMs.

Solution:

In Prometheus, we have 2 metrics: kube_pod_info{} and kube_node_labels{}

kube_node_labels often has a label that tells which type of VM it is.

We can join these 2 metrics on "node" to provide a report to users

sum( kube_pod_info{}) by(pod,node) *on(node) group_left(label_beta_kubernetes_io_instance_type) sum(kube_node_labels{}) by (node,label_beta_kubernetes_io_instance_type)

Please refer to the official promQL doc

Tip: create a Grafana API call for it:

curl -g -k -H "Authorization: Bearer ******" https://grafana.testtest.com/api/datasources/proxy/1/api/v1/query?query=sum\(kube_pod_info{}\)by\(pod,node\)*on\(node\)group_left\(label_beta_kubernetes_io_instance_type\)sum\(kube_node_labels{}\)by\(node,label_beta_kubernetes_io_instance_type\)

Also refer to my blog on how to convert promQL into a Grafana API call.

Monday, March 08, 2021

How to convert PromQL into Grafana API call

Requirement:

     We use promQL to fetch some metadata of a Kubernetes cluster, e.g. the existing namespaces

sum(kube_pod_info) by (namespace)

We would like to convert it to a Grafana API call so other apps can consume this metadata.

Solution:

  • First, we need to generate an API token. Refer to the Grafana doc.
  • Second, below is a curl example that consumes it:
curl -k -H "Authorization: Bearer e*****dfwefwef0=" https://grafana-test.testtest.com/api/datasources/proxy/1/api/v1/query?query=sum\(kube_pod_info\)by\(namespace\)

Thursday, February 25, 2021

Istio install against different Docker Repos

Requirement:

       istioctl has built-in manifests. However, these manifests or docker images may not be accessible from the corporate network, or users may use a docker repo other than docker.io. How do we install it then?

Solution:

  • istioctl manifest generate --set profile=demo > istio_generate_manifests_demo.yaml
  • find the docker image paths in the yaml, download the images and upload them to your internal docker repo.
  • edit the file with the right docker image paths of the internal docker repo (see the sed sketch after this list)
  • kubectl apply -f istio_generate_manifests_demo.yaml
  • istioctl verify-install -f istio_generate_manifests_iad_demo.yaml
  • to purge the deployment:
    • istioctl x uninstall --purge
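A hedged example of the image-path edit; the internal registry host and image prefix are assumptions about your environment.

# Rewrite image references from docker.io to an internal registry (illustrative).
sed -i 's#docker.io/istio/#registry.mycorp.local/istio/#g' istio_generate_manifests_demo.yaml
grep "image:" istio_generate_manifests_demo.yaml   # double-check the result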

Tuesday, February 16, 2021

Tip: Pod FQDN in Kubernetes

Pods from a deployment, statefulset, or daemonset exposed by a service:

The FQDN is pod-ip-address.svc-name.my-namespace.svc.cluster.local, with the dots in the pod IP replaced by dashes:

i.e. 172-12-32-12.test-svc.test-namespace.svc.cluster.local

not 172.12.32.12.test-svc.test-namespace.svc.cluster.local

Isolated pods:

The FQDN is pod-ip-address.my-namespace.pod.cluster.local

i.e. 172-12-32-12.test-namespace.pod.cluster.local
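A quick way to check resolution from inside the cluster; the name below is just the example from above.

# Resolve the per-pod DNS name derived from the pod IP 172.12.32.12 from a throwaway pod.
kubectl run -it --rm dns-test --image=busybox --restart=Never -- \
  nslookup 172-12-32-12.test-svc.test-namespace.svc.cluster.local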

Wednesday, February 03, 2021

Tip: Kubernetes intermittent DNS issues of pods

 Symptom:

     The pods get "unknown name" or "no such host" errors for an external domain name, e.g. test.testcorp.com

The issues are intermittent.

Actions:

  • Follow the k8s guide and check that all DNS pods are running well.
  • One possible reason is that one or more of the nameservers in /etc/resolv.conf of the hosts may not be able to resolve the DNS name test.testcorp.com
    • i.e. *.testcorp.com is a corp intranet name, so it needs to be resolved by corp name servers. However, in a normal cloud VM setup we have the nameserver 169.254.169.254 in /etc/resolv.conf, and 169.254.169.254 has no idea about *.testcorp.com, so we get intermittent issues
    • To solve this, update the DHCP server and remove 169.254.169.254 from /etc/resolv.conf
    • kubectl rollout restart deployment coredns -n kube-system
  • Another possible reason is that some of the nodes have network issues and the DNS pods on them are not functioning well. Use the below commands to test the DNS pods.

kubectl -n kube-system get po -owide|grep coredns |awk '{print $6 }' > /tmp/1.txt

cat /tmp/1.txt  | while read -r line; do echo $line | awk '{print "curl -v --connect-timeout 10 telnet://"$1":53", "\n"}'; done
  • Enable debug logging of the DNS pods per the k8s guide
  • Test the DNS and tail all DNS pod logs with kubectl to get the debug info
kubectl -n kube-system logs -f deployment/coredns --all-containers=true --since=1m |grep testcorp

  • You may get logs like

[INFO] 10.244.2.151:43653 - 48702 "AAAA IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000300408s

[INFO] 10.244.2.151:43653 - 64047 "A IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000392158s

  • The /etc/resolv.conf has "options ndots:5", which may impact external domain DNS resolution. Using the fully qualified name can mitigate the issue: test.testcorp.com --> test.testcorp.com. (there is a "." at the end)
  • Disable coredns AAAA (IPv6) queries. This reduces NXDOMAIN (not found) responses and thus the failure rate returned to the DNS client
    • Add the below into the coredns config file (see the Corefile sketch after this list). Refer to coredns rewrite
    • rewrite stop type AAAA A
  • Install NodeLocal DNSCache to speed up DNS queries. Refer to the kubernetes doc
  • Run dig test.testcorp.com +all many times; it will show the authority section
;; AUTHORITY SECTION:
test.com.     4878    IN      NS      dnsmaster1.test.com.
test.com.     4878    IN      NS      dnsmaster5.test.com.
    • to find out which DNS server times out
  • Add the below options in /etc/resolv.conf to improve DNS query performance
    • options single-request-reopen   refer to the manual
    • options single-request   refer to the manual
  • Another solution is to use an ExternalName service:

    apiVersion: v1
    kind: Service
    metadata:
      name: test-stage
      namespace: default
    spec:
      externalName: test-stage.testcorp.com
      ports:
      - port: 636
        protocol: TCP
        targetPort: 636
      type: ExternalName
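A hedged sketch of the coredns rewrite edit mentioned in the list above; everything except the rewrite line is the stock Corefile that ships with your cluster, so add only that one line inside the ".:53 { }" block.

# Edit the coredns ConfigMap and add the rewrite line, then restart coredns.
kubectl -n kube-system edit configmap coredns
#   .:53 {
#       errors
#       health
#       rewrite stop type AAAA A      # <- the only added line
#       kubernetes cluster.local in-addr.arpa ip6.arpa { ... }
#       forward . /etc/resolv.conf
#       cache 30
#   }
kubectl -n kube-system rollout restart deployment coredns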

Tuesday, February 02, 2021

Tip: A Command to get all resources and subresources in a Kubernetes Cluster

 list=($(kubectl get --raw / | jq -r '.paths[] | select(. | startswith("/api"))')); for tgt in ${list[@]}; do aruyo=$(kubectl get --raw ${tgt} | jq .resources); if [ "x${aruyo}" != "xnull" ]; then echo; echo "===${tgt}==="; kubectl get --raw ${tgt} | jq -r ".resources[] | .name,.verbs"; fi; done

Tuesday, January 05, 2021

Tip: Change default storageclass in Kubernetes

The below example is for OKE (Oracle Kubernetes Engine); the same concept applies to other Kubernetes distributions.

Change default storageclass from oci to oci-bv:

kubectl patch storageclass oci -p '{"metadata": {"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"false"}}}'


kubectl patch storageclass oci-bv -p '{"metadata": {"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"}}}'

Tuesday, December 29, 2020

Tip: Attach volume conflict Error in OKE

Symptom:

    The pods with block volumes in OKE (Oracle Kubernetes Engine) report an error like:

Warning  FailedAttachVolume  4m26s (x3156 over 4d11h)  attachdetach-controller  (combined from similar events): AttachVolume.Attach failed for volume "*******54jtgiq" : attach command failed, status: Failure, reason: Failed to attach volume: Service error:Conflict. Volume *****osr6g565tlxs54jtgiq currently attached. http status code: 409. Opc request id: *********D6D97

Solution:

    There are quite a few reasons for that. One of them is what the error states: the volume is still attached to another host instance, so it can't be attached again.

    To fix it, find the attachment status and VM instance details via the volume id, then manually detach the volume from the VM via the SDK or console. The error will be gone.