Thursday, November 10, 2022
Apex Ords Operator for Kubernetes
Tuesday, November 08, 2022
OKE Admission Control Webhook Sample
Requirement:
Solution:
- Please refer to the GitHub repo:
- git clone https://github.com/HenryXie1/oke-admission-webhook
- go build -o oke-admission-webhook
- docker build --no-cache -t repo-url/oke-admission-webhook:v1 .
- rm -rf oke-admission-webhook
- docker push repo-url/oke-admission-webhook:v1
- ./deployment/webhook-create-signed-cert.sh --service oke-admission-webhook-svc --namespace kube-system --secret oke-admission-webhook-secret
- kubectl replace --force -f deployment/validatingwebhook.yaml
- kubectl replace --force -f deployment/deployment.yaml
- kubectl replace --force -f deployment/service.yaml
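For reference, a hedged sketch of what deployment/validatingwebhook.yaml might roughly contain (the webhook path, rules, and caBundle here are illustrative; see the repo for the actual file):
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: oke-admission-webhook
webhooks:
- name: oke-admission-webhook.kube-system.svc
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  clientConfig:
    service:
      name: oke-admission-webhook-svc
      namespace: kube-system
      path: /validate
    caBundle: <base64 CA generated by webhook-create-signed-cert.sh>
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]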
Demo:
Wednesday, February 09, 2022
Tip: kubectl apply --dry-run=client vs. server RBAC roles
kubectl apply --dry-run is very useful for testing manifests.
The RBAC requirements differ between kubectl apply --dry-run=client and --dry-run=server.
Both need a role that can fetch CRDs in order to go through the validating and mutating admission chains.
kubectl apply --dry-run=client needs a READ ONLY role, while kubectl apply --dry-run=server needs a READ WRITE role.
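For illustration, a hedged sketch of the difference for a namespaced Deployment manifest (names and rules are illustrative; real manifests may need more resources):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dry-run-client        # READ ONLY is enough for kubectl apply --dry-run=client
  namespace: test
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dry-run-server        # kubectl apply --dry-run=server also needs write verbs
  namespace: test
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "create", "update", "patch"]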
Wednesday, December 01, 2021
How to expose kube api server via nginx proxy
Requirement:
The Kubernetes API server (control plane) often sits behind a firewall. To provide more security and load balancing, we can set up an nginx proxy in front of it. There are 2 solutions.
Solution 1: Use L4 TCP proxy pass of nginx
The nginx stream core module provides L4 TCP/UDP proxy pass functionality (link). To proxy pass the K8S API on port 6443 via nginx listening on port 8888, we can put the below block in nginx.conf:
stream {
    upstream api {
        server kubernetes.default.svc.cluster.local:6443;
    }
    server {
        listen 8888;
        proxy_timeout 20s;
        proxy_pass api;
    }
}
The kubeconfig has the below elements:
- the server points to the nginx proxy, i.e. https://myapi.myk8s.com:8888
- certificate-authority is the CA of the K8S API server (not the CA of myapi.myk8s.com)
- client-certificate: path/to/my/client/cert
- client-key: path/to/my/client/key
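For reference, a minimal kubeconfig sketch for this setup (cluster/user names and file paths are illustrative):
apiVersion: v1
kind: Config
clusters:
- name: myk8s
  cluster:
    # nginx only forwards TCP; TLS still terminates at the API server, so use the API server CA
    server: https://myapi.myk8s.com:8888
    certificate-authority: /path/to/k8s-api-ca.crt
users:
- name: myuser
  user:
    client-certificate: /path/to/my/client/cert
    client-key: /path/to/my/client/key
contexts:
- name: myk8s
  context:
    cluster: myk8s
    user: myuser
current-context: myk8s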
Solution2: Use L7 Https proxy pass of nginx
To proxy pass the K8S API on https://myapi.myk8s.com/api/ via nginx listening on 443 SSL, we can put the below block in nginx.conf:
http {
    upstream api {
        server kubernetes.default.svc.cluster.local:6443;
    }
    server {
        listen 443 ssl;
        server_name myapi.myk8s.com;
        ssl_certificate /etc/nginx/ssl/tls.crt;
        ssl_certificate_key /etc/nginx/ssl/tls.key;
        location / {
            root /usr/local/nginx/html;
            index index.htm index.html;
        }
        location /api/ {
            rewrite ^/api(/.*)$ $1 break;
            proxy_pass https://api;
        }
    }
}
The kubeconfig has the below elements:
- the server points to the nginx proxy, i.e. https://myapi.myk8s.com/api/
- certificate-authority is the CA of myapi.myk8s.com (not the K8S API server CA)
- we can't use client-certificate and client-key as we do with the L4 TCP proxy pass
- because the TLS traffic from the nginx proxy to the kube API server on 6443 is regular anonymous TLS, the API server won't allow it. To solve it:
- Option 1: use a JWT token via OpenID Connect
users:
- name: testuser
  user:
    auth-provider:
      config:
        idp-issuer-url: https://openid.myk8s.com/dex
        client-id: oidc-loginapp
        id-token: eyJhbGciOiJSUzI1NiIs....****
      name: oidc
- Option 2: Use mTLS and add client-certificate and client-key in the nginx proxy pass settings.
location /api/ {
    rewrite ^/api(/.*)$ $1 break;
    proxy_pass https://api;
    proxy_ssl_certificate /etc/nginx/k8s-client-certificate.pem;
    proxy_ssl_certificate_key /etc/nginx/k8s-client-key.key;
    proxy_ssl_session_reuse on;
}
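For reference, a hedged sketch of the clusters section of the kubeconfig for this L7 setup (file paths and names are illustrative):
clusters:
- name: myk8s-via-nginx
  cluster:
    # TLS terminates at nginx here, so use the CA that signed the myapi.myk8s.com certificate
    server: https://myapi.myk8s.com/api/
    certificate-authority: /path/to/myapi-myk8s-com-ca.crt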
Wednesday, September 22, 2021
Error: invalid bearer token, oidc: verify token: oidc: expected audience
Symptom:
After we implemented dex + GitHub via this link, we are able to get an ID token with example-app via http://127.0.0.1:5555/
With the ID token, we construct a kubeconfig, but when we access the K8S cluster we hit this error:
error: You must be logged in to the server (Unauthorized)
In the kube API server logs, we see this error:
invalid bearer token, oidc: verify token: oidc: expected audience \"123456\" got [\"example-app\"]]"
Triage:
Check the payload and verify the JWT ID token on https://jwt.io
Check the dex container logs
Find similar issues in github link1 link2
Solution:
It turns out the client-id did not match.
The client-id set on the K8S API server (--oidc-client-id, see link) needs to match the client-id in example-app.
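For reference, a hedged sketch of how the matching flags might look in the kube-apiserver static pod manifest (the issuer URL and values here are illustrative, not the actual cluster settings):
# /etc/kubernetes/manifests/kube-apiserver.yaml (fragment)
spec:
  containers:
  - command:
    - kube-apiserver
    - --oidc-issuer-url=https://openid.myk8s.com/dex   # must match the dex issuer
    - --oidc-client-id=example-app                     # must match the client-id used by example-app
    - --oidc-username-claim=email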
In the above example, "123456" is the one I set on the K8S API server, while the client-id in example-app is "example-app", which caused the problem.
Saturday, August 21, 2021
Kubectl Plugin for Oracle Database
Requirement:
Solution:
Demo:
Tuesday, July 20, 2021
Tip: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox
Kubelet report such error when you deploy a pod:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox : "*****" error getting ClusterInformation: connection is unauthorized: Unauthorized
It's because the CNI is not set up correctly or is not functioning on the node.
Redeploying the CNI provider (e.g. Flannel or Calico) may help.
Friday, July 16, 2021
Tip: How to get Go Client in Kubernetes Operator Reconcile()
Requirement:
When we build a K8S operator via kubebuilder, we often need to interact with the control plane. By default, we use the controller-runtime client.
However, when we use this client to implement some kubectl functions, e.g. drain, we hit an error:
cannot use r.Client (variable of type client.Client) as kubernetes.Interface value in struct literal: missing method AdmissionregistrationV1
Solution:
The error indicates the controller-runtime client does not satisfy kubernetes.Interface (it has no AdmissionregistrationV1 method), so we can't use it here. Instead, we init a new Go client in Reconcile(). Sample code:
"k8s.io/client-go/kubernetes"ctrlconfig "sigs.k8s.io/controller-runtime/pkg/client/config"cfg, err := ctrlconfig.GetConfig()if err != nil {log.Log.Error(err, "unable to get kubeconfig")return err}kubeclientset, err := kubernetes.NewForConfig(cfg)if err != nil {return err}
Thursday, July 01, 2021
Pass CKA Exam Tips
On Thursday, I passed the CKA exam with a score of 93 and got the certificate. Here are some tips on how I achieved that.
- 17 questions in 2 hours.
- Don't worry about copy and paste. You can copy the "important text" by clicking it when your mouse hovers over it.
- Read each question carefully and always understand a question before starting on it. Check the weight of each question; higher weight means more marks for that question.
- Skip the difficult questions and make sure you get the easy marks. Only a score of 66 is needed to pass the exam.
- Practise and create examples for each test point in the CKA curriculum.
- I strongly recommend this Udemy course. The practice and mock exams are great preparation for the CKA exam.
- Practise all commands in the kubectl cheatsheet
Sunday, May 23, 2021
Tip: Can't find docker networking namespace via ip netns list
Symptom:
On Ubuntu, we start a Docker container and try to find the Docker networking namespace via "ip netns list". The output is empty.
Reason:
By default, Docker records netns under /var/run/docker/netns, while "ip netns list" checks /var/run/netns.
Workaround:
Stop all containers, then: rm -rf /var/run/netns; ln -s /var/run/docker/netns /var/run/netns
Tip:
To find the netns id of a container, use:
docker ps ---> find the container ID
docker inspect <container ID> | grep netns
Wednesday, April 07, 2021
Tip: Pods keep crashloopbackoff
Symptom:
Pods keep going into CrashLoopBackOff.
"kubectl describe pod ..." does not give meaningful info, and neither does "kubectl get events".
Reason:
One likely reason is related to the pod security policy. In my case, the existing pod security policy does not allow Nginx or Apache to run. It does not have:
allowedCapabilities:
- NET_BIND_SERVICE
# apache or nginx need escalation to root to function well
allowPrivilegeEscalation: true
So the pods keep going into CrashLoopBackOff. The fix is to add the above into the pod security policy.
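For reference, a minimal PodSecurityPolicy sketch with those settings (the name and the remaining fields are illustrative; merge them into your own policy rather than copying this as-is):
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: web-workloads
spec:
  allowPrivilegeEscalation: true   # apache or nginx need escalation to root to function well
  allowedCapabilities:
  - NET_BIND_SERVICE
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'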
Saturday, April 03, 2021
Tip: Istio TLS secrets, Gateway, VirtualService namespace scope
There is some confusion about where we should put Istio objects: in istio-system or in the users' namespaces?
Here are some tips:
For TLS/mTLS CAs, certs, and key management in Istio, the Kubernetes secrets should be created in istio-system, not in the users' namespaces.
Gateway and VirtualService need to be created in the users' namespace.
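For reference, a hedged sketch of that split (names and hosts are illustrative): the TLS secret is created in istio-system, and a Gateway in the user namespace references it by credentialName:
# the secret lives in istio-system, next to the ingress gateway:
# kubectl create -n istio-system secret tls mytls-credential --cert=tls.crt --key=tls.key
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
  namespace: my-app               # user namespace
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: mytls-credential   # refers to the secret in istio-system
    hosts:
    - myapp.example.com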
Tuesday, March 09, 2021
How to find which type of VMs pods are running via promQL
Requirement:
Users need to know which type of VMs their pods are running on, e.g. to verify that pods are running on GPU VMs.
Solution:
In Prometheus, we have 2 metrics: kube_pod_info{} and kube_node_labels{}
kube_node_labels often has a label that tells which type of VM it is.
We can use "node" to join these 2 metrics and provide a report to users:
sum(kube_pod_info{}) by (pod,node) * on(node) group_left(label_beta_kubernetes_io_instance_type) sum(kube_node_labels{}) by (node,label_beta_kubernetes_io_instance_type)
Please refer to the official promQL doc.
Tip: create a grafana API call for it:
curl -g -k -H "Authorization: Bearer ******" https://grafana.testtest.com/api/datasources/proxy/1/api/v1/query?query=sum\(kube_pod_info{}\)by\(pod,node\)*on\(node\)group_left\(label_beta_kubernetes_io_instance_type\)sum\(kube_node_labels{}\)by\(node,label_beta_kubernetes_io_instance_type\)
Also refer to my blog post on how to convert promQL into a grafana API call.
Monday, March 08, 2021
How to convert PromQL into Grafana API call
Requirement:
We use promQL to fetch some metadata of a Kubernetes cluster, e.g. the existing namespaces:
sum(kube_pod_info) by (namespace)
We would like to convert it to a grafana API call, so other apps can consume this metadata
Solution:
- First, we need to generate an API token. Refer to the grafana doc.
- Second, below is a curl example to consume it:
curl -k -H "Authorization: Bearer e*****dfwefwef0=" https://grafana-test.testtest.com/api/datasources/proxy/1/api/v1/query?query=sum\(kube_pod_info\)by\(namespace\)
Thursday, February 25, 2021
Istio install against different Docker Repos
Requirement:
istioctl ships with built-in manifests. However, the docker images they reference may not be accessible from the corporate network, or users may use a docker repo other than docker.io. How do we install it?
Solution:
- istioctl manifest generate --set profile=demo > istio_generate_manifests_demo.yaml
- find the docker image paths in the yaml, then download the images and upload them to your internal docker repo.
- edit the file with the right docker image paths of the internal docker repo (see the sketch after this list)
- kubectl apply -f istio_generate_manifests_demo.yaml
- istioctl verify-install -f istio_generate_manifests_demo.yaml
- to purge the deployment:
- istioctl x uninstall --purge
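As referenced above, the image edit is just repointing each image: field at the internal repo. A hedged example of what the change might look like in the generated manifest (the repo host and tag are illustrative):
# before
image: docker.io/istio/proxyv2:1.8.0
# after
image: internal-repo.mycorp.com/istio/proxyv2:1.8.0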
Tuesday, February 16, 2021
Tip: Pod FQDN in Kubernetes
Pods from a deployment, statefulset, or daemonset, exposed by a service:
FQDN is pod-ip-address.svc-name.my-namespace.svc.cluster.local
e.g. 172-12-32-12.test-svc.test-namespace.svc.cluster.local
not 172.12.32.12.test-svc.test-namespace.svc.cluster.local
Isolated Pods:
FQDN is pod-ip-address.my-namespace.pod.cluster.local
e.g. 172-12-32-12.test-namespace.pod.cluster.local
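For reference, a minimal sketch of a Service whose endpoints get A records of the dashed-IP form above (names are illustrative; this assumes a headless Service, i.e. clusterIP: None):
apiVersion: v1
kind: Service
metadata:
  name: test-svc
  namespace: test-namespace
spec:
  clusterIP: None      # headless: DNS returns per-pod records like 172-12-32-12.test-svc.test-namespace.svc.cluster.local
  selector:
    app: test
  ports:
  - port: 80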
Wednesday, February 03, 2021
Tip: Kubernetes intermittent DNS issues of pods
Symptom:
The pods get "unknown name" or "no such host" for an external domain name, e.g. test.testcorp.com
The issues are intermittent.
Actions:
- Follow k8s guide and check all DNS pods are running well.
- One possible reason is that one or a few of the nameservers in /etc/resolv.conf of the hosts may not be able to resolve the DNS name test.testcorp.com
- e.g. *.testcorp.com is a corp intranet name, so it needs to be resolved by the corp name servers. However, in a normal cloud VM setup we have the nameserver 169.254.169.254 in /etc/resolv.conf; 169.254.169.254 has no idea about *.testcorp.com, thus we have intermittent issues
- To solve this, we need to update the DHCP server settings, remove 169.254.169.254 from /etc/resolv.conf, and then
- kubectl rollout restart deployment coredns -n kube-system
- Another possible reason is that some of the nodes have network issues, so the DNS pods on them are not functioning well. Use the below commands to test the DNS pods:
kubectl -n kube-system get po -owide | grep coredns | awk '{print $6}' > /tmp/1.txt
cat /tmp/1.txt | while read -r line; do echo $line | awk '{print "curl -v --connect-timeout 10 telnet://"$1":53", "\n"}'; done
- Enable debug log of DNS pods per k8s guide
- test the DNS and kubectl tail all DNS pods to get debug info
kubectl -n kube-system logs -f deployment/coredns --all-containers=true --since=1m |grep testcorp
- You may get logs like:
[INFO] 10.244.2.151:43653 - 48702 "AAAA IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000300408s
[INFO] 10.244.2.151:43653 - 64047 "A IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000392158s
- /etc/resolv.conf has "options ndots:5", which may impact external domain DNS resolution. Using a fully qualified name can mitigate the issue: test.testcorp.com --> test.testcorp.com. (there is a . at the end)
- Disable coredns AAAA (IPv6) queries. This reduces NXDOMAIN (not found) responses, and thus reduces the failure rate seen by the DNS client.
- Add the below into the coredns config file (see the ConfigMap sketch after this list). Refer to coredns rewrite
- rewrite stop type AAAA A
- Install NodeLocal DNSCache to speed up DNS queries. Refer to the kubernetes doc
- Test with dig test.testcorp.com +all many times; it will show the authority section:
;; AUTHORITY SECTION:
test.com. 4878 IN NS dnsmaster1.test.com.
test.com. 4878 IN NS dnsmaster5.test.com.
- This helps find out which DNS server times out.
- Add below parameter in /etc/resolv.conf to improve DNS query performance
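For reference, a hedged sketch of where the rewrite line from the AAAA item above goes in the coredns ConfigMap (the rest of the Corefile shown is a typical default and may differ in your cluster):
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        rewrite stop type AAAA A   # rewrite AAAA queries to A queries
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }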
Another solution is to use an ExternalName Service:
apiVersion: v1
kind: Service
metadata:
  name: test-stage
  namespace: default
spec:
  externalName: test-stage.testcorp.com
  ports:
  - port: 636
    protocol: TCP
    targetPort: 636
  type: ExternalName
Tuesday, February 02, 2021
Tip: A Command to get all resources and subresources in a Kubernetes Cluster
list=($(kubectl get --raw / | jq -r '.paths[] | select(. | startswith("/api"))')); for tgt in ${list[@]}; do aruyo=$(kubectl get --raw ${tgt} | jq .resources); if [ "x${aruyo}" != "xnull" ]; then echo; echo "===${tgt}==="; kubectl get --raw ${tgt} | jq -r ".resources[] | .name,.verbs"; fi; done
Tuesday, January 05, 2021
Tip: Change default storageclass in Kubernetes
The below example is for OKE (Oracle Kubernetes Engine); the same concept applies to other Kubernetes distributions.
Change default storageclass from oci to oci-bv:
kubectl patch storageclass oci -p '{"metadata": {"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass oci-bv -p '{"metadata": {"annotations":{"storageclass.beta.kubernetes.io/is-default-class":"true"}}}'
Tuesday, December 29, 2020
Tip: Attach volume conflict Error in OKE
Symptom:
Pods with block volumes in OKE (Oracle Kubernetes Engine) report an error like:
Warning FailedAttachVolume 4m26s (x3156 over 4d11h) attachdetach-controller (combined from similar events): AttachVolume.Attach failed for volume "*******54jtgiq" : attach command failed, status: Failure, reason: Failed to attach volume: Service error:Conflict. Volume *****osr6g565tlxs54jtgiq currently attached. http status code: 409. Opc request id: *********D6D97
Solution:
There are quite a few possible reasons. One of them is what the error states: the volume is still attached to another host instance, so it can't be attached again.
To fix that, we can find the attachment status and VM instance details via the volume id, then manually detach the volume from the VM via the SDK or console. The error will then be gone.