Henry Xie's blog: February 2021

Thursday, February 25, 2021

Istio install against different Docker Repos

Requirement:

istioctl ships with built-in manifests. However, the Docker images these manifests reference may not be accessible from a corporate network, or you may need to use a Docker repo other than docker.io. How do you install Istio in that case?

Solution:

istioctl manifest generate --set profile=demo > istio_generate_manifests_demo.yaml
Find the Docker image paths in the YAML, pull those images, and push them to your internal Docker repo.
Edit the file so that every image path points to the internal Docker repo.
kubectl apply -f istio_generate_manifests_demo.yaml
istioctl verify-install -f istio_generate_manifests_demo.yaml
To purge the deployment:

istioctl x uninstall --purge
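The image steps above can be scripted. Below is a minimal bash sketch, not part of istioctl: list_images, rewrite_registry and mirror_images are hypothetical helper names, and registry.example.com stands in for your internal Docker repo. It assumes the images share one prefix such as docker.io/istio; replacing that whole prefix in the file also catches references that are not on image: lines (for example a hub value), if the manifest has any.

```shell
#!/usr/bin/env bash
# Sketch: mirror the images referenced by a generated Istio manifest into an
# internal registry. Helper names and registry.example.com are hypothetical.

# Print each distinct image referenced on an "image:" line, skipping lines
# that still contain {{ ... }} template expressions.
list_images() {
  awk '/^[ \t]*(-[ \t]+)?image:/ && index($0, "{{") == 0 {
         sub(/^[^:]*image:[ \t]*/, "")
         gsub(/"/, "")
         print
       }' "$1" | sort -u
}

# Print the manifest with every literal occurrence of OLD replaced by NEW.
rewrite_registry() {
  local manifest=$1 old=$2 new=$3 line
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "${line//"$old"/$new}"
  done < "$manifest"
}

# Pull each image under OLD, retag it under NEW, and push it.
# Requires docker and a prior "docker login" to the internal registry.
mirror_images() {
  local manifest=$1 old=$2 new=$3 img target
  while IFS= read -r img; do
    [[ $img == "$old"* ]] || continue   # skip images outside the OLD prefix
    target="${new}${img#"$old"}"
    docker pull "$img" && docker tag "$img" "$target" && docker push "$target"
  done < <(list_images "$manifest")
}
```

For example, run mirror_images istio_generate_manifests_demo.yaml docker.io/istio registry.example.com/istio, then rewrite_registry with the same arguments into a new file and apply and verify that file instead.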

Tuesday, February 16, 2021

Tip: Pod FQDN in Kubernetes

Pods from a Deployment, StatefulSet, or DaemonSet that are exposed by a Service:

The FQDN is pod-ip-address.svc-name.my-namespace.svc.cluster.local, where the dots in the pod IP are replaced by dashes:

e.g. 172-12-32-12.test-svc.test-namespace.svc.cluster.local

not 172.12.32.12.test-svc.test-namespace.svc.cluster.local

Isolated Pods:

The FQDN is pod-ip-address.my-namespace.pod.cluster.local

e.g. 172-12-32-12.test-namespace.pod.cluster.local
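A small bash sketch of the naming rule above. pod_fqdn is a hypothetical helper, and it assumes the default cluster domain cluster.local unless CLUSTER_DOMAIN is set.

```shell
#!/usr/bin/env bash
# Build the DNS name of a pod from its IP, following the rule above.
# Usage: pod_fqdn IP NAMESPACE [SERVICE]
pod_fqdn() {
  local ip=$1 namespace=$2 svc=${3:-}
  local domain=${CLUSTER_DOMAIN:-cluster.local}  # assumed default cluster domain
  local dashed=${ip//./-}                        # 172.12.32.12 -> 172-12-32-12
  if [[ -n $svc ]]; then
    # Pod exposed by a Service
    printf '%s.%s.%s.svc.%s\n' "$dashed" "$svc" "$namespace" "$domain"
  else
    # Isolated pod (not behind a Service)
    printf '%s.%s.pod.%s\n' "$dashed" "$namespace" "$domain"
  fi
}
```

The pod IP can come from kubectl, e.g. pod_fqdn "$(kubectl get pod my-pod -o jsonpath='{.status.podIP}')" test-namespace test-svc, where my-pod is a placeholder name.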

Wednesday, February 03, 2021

Tip: Kubernetes intermittent DNS issues of pods

Symptom:

Pods get "unknown name" or "no such host" errors when resolving an external domain name, e.g. test.testcorp.com.

The issues are intermittent.

Actions:

Follow the Kubernetes DNS debugging guide and check that all DNS pods are running well.
One possible reason is that one or more of the nameservers in the hosts' /etc/resolv.conf cannot resolve the DNS name test.testcorp.com.

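For example, *.testcorp.com is a corporate intranet domain, so it must be resolved by the corporate name servers. However, a typical cloud VM setup also lists nameserver 169.254.169.254 in /etc/resolv.conf, and 169.254.169.254 knows nothing about *.testcorp.com. CoreDNS forwards external queries to the nameservers taken from the node's /etc/resolv.conf and, by default, picks one at random for each query, so only the queries that land on 169.254.169.254 fail; hence the intermittent errors.
To solve this, update the DHCP server settings so that 169.254.169.254 is removed from /etc/resolv.conf, then restart CoreDNS so it picks up the change: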
kubectl rollout restart deployment coredns -n kube-system

Another possible reason is that some nodes have network issues, so the DNS pods on them do not work properly. Use the commands below to test each DNS pod: the first saves the CoreDNS pod IPs, and the second prints one curl command per pod that checks whether TCP port 53 is reachable.

kubectl -n kube-system get po -owide | grep coredns | awk '{print $6}' > /tmp/1.txt

while read -r ip; do echo "curl -v --connect-timeout 10 telnet://${ip}:53"; done < /tmp/1.txt
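The check can also run directly instead of printing commands. This minimal bash sketch swaps curl for bash's built-in /dev/tcp redirection plus the coreutils timeout command, and selects the CoreDNS pods by the k8s-app=kube-dns label used in the Kubernetes DNS debugging guide (an assumption about your cluster's labels). Like the commands above, it only tests TCP port 53, not UDP. check_dns_tcp and check_all_dns_pods are hypothetical names.

```shell
#!/usr/bin/env bash
# Print OK and return 0 if a TCP connection to IP:PORT succeeds within 5 seconds.
check_dns_tcp() {
  local ip=$1 port=${2:-53}
  if timeout 5 bash -c "exec 3<>/dev/tcp/${ip}/${port}" 2>/dev/null; then
    echo "OK   ${ip}:${port}"
  else
    echo "FAIL ${ip}:${port}"
    return 1
  fi
}

# Check TCP port 53 on every CoreDNS pod (label assumed: k8s-app=kube-dns).
# Run it from a node so that the pod IPs are routable.
check_all_dns_pods() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' |
    while read -r ip; do
      check_dns_tcp "$ip" 53
    done
}
```

Running check_all_dns_pods from each node in turn helps pin down which node cannot reach which DNS pod.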

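Enable debug logging on the DNS pods per the Kubernetes DNS debugging guide (add the log plugin to the CoreDNS config).
Run the failing lookup again and tail the logs of all DNS pods to get the debug info:

kubectl -n kube-system logs -f -l k8s-app=kube-dns --all-containers=true --since=1m | grep testcorp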

You may see log lines like:

[INFO] 10.244.2.151:43653 - 48702 "AAAA IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000300408s
[INFO] 10.244.2.151:43653 - 64047 "A IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000392158s

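A pod's /etc/resolv.conf has "options ndots:5", which can affect external domain resolution: a name with fewer than five dots is first tried with each search domain appended (which is why the log above shows test.testcorp.com.default.svc.cluster.local) and only then as-is. Using a fully qualified name mitigates the issue: test.testcorp.com --> test.testcorp.com. (note the trailing dot)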
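To see where the extra lookups come from, here is a simplified bash model of the glibc search-list rule: a name ending in a dot is queried as-is only; a name with at least ndots dots is tried as-is first; otherwise every search domain is tried first. query_order is a hypothetical helper, the search list in the usage note is the typical one for a pod in the default namespace, and other resolvers (e.g. musl) can behave differently.

```shell
#!/usr/bin/env bash
# Print the absolute names a stub resolver would query, in order.
# Usage: query_order NAME NDOTS SEARCH_DOMAIN...
query_order() {
  local name=$1 ndots=$2 domain
  shift 2
  if [[ $name == *. ]]; then
    echo "$name"              # fully qualified: search list is skipped
    return
  fi
  local dots=${name//[!.]/}   # keep only the dots
  if (( ${#dots} >= ndots )); then
    echo "$name."             # enough dots: try as-is first
  fi
  for domain in "$@"; do
    echo "$name.$domain."
  done
  if (( ${#dots} < ndots )); then
    echo "$name."             # too few dots: as-is comes last
  fi
}
```

query_order test.testcorp.com 5 default.svc.cluster.local svc.cluster.local cluster.local lists three cluster-internal names before test.testcorp.com., while query_order test.testcorp.com. 5 default.svc.cluster.local lists only test.testcorp.com.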
Disable CoreDNS AAAA (IPv6) lookups by rewriting them to A queries. This reduces NXDOMAIN (not found) responses and thus the failure rate seen by DNS clients.

Add the line below to the CoreDNS config (the Corefile in the coredns ConfigMap); see the CoreDNS rewrite plugin documentation:
rewrite stop type AAAA A
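A bash sketch for adding that line without hand-editing, assuming a kubeadm-style Corefile whose main server block starts with ".:53 {" and a ConfigMap named coredns in kube-system. add_corefile_directive and update_coredns are hypothetical helpers; the same helper could also add the log directive for query logging.

```shell
#!/usr/bin/env bash
# Read a Corefile on stdin and insert DIRECTIVE right after the ".:53 {"
# line, unless a line with exactly that directive already exists.
add_corefile_directive() {
  awk -v d="$1" '
    {
      line[NR] = $0
      t = $0
      gsub(/^[ \t]+|[ \t]+$/, "", t)
      if (t == d) found = 1
    }
    END {
      for (i = 1; i <= NR; i++) {
        print line[i]
        if (!found && !done && line[i] ~ /^[.]:53[ \t]*[{]/) {
          print "    " d
          done = 1
        }
      }
    }'
}

# Apply a directive to the live CoreDNS ConfigMap and restart CoreDNS.
update_coredns() {
  local directive=$1 tmp
  tmp=$(mktemp)
  kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' |
    add_corefile_directive "$directive" > "$tmp"
  kubectl -n kube-system create configmap coredns --from-file=Corefile="$tmp" \
    --dry-run=client -o yaml | kubectl apply -f -
  kubectl rollout restart deployment coredns -n kube-system
  rm -f "$tmp"
}
```

For example, update_coredns 'rewrite stop type AAAA A'. Because the helper skips directives that are already present, running it twice leaves a single rewrite line.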

Install NodeLocal DNSCache to speed up DNS queries; see the Kubernetes documentation.
Run dig test.testcorp.com +all many times; the output shows the AUTHORITY SECTION:

;; AUTHORITY SECTION:
test.com. 4878 IN NS dnsmaster1.test.com.
test.com. 4878 IN NS dnsmaster5.test.com.

Query the servers listed there to find out which DNS server times out.
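A bash sketch of that check: authority_servers pulls the NS host names out of dig output, and probe_servers queries each of them once with a short timeout. Both names are hypothetical; the sketch relies on dig exiting with status 9 when a server does not reply.

```shell
#!/usr/bin/env bash
# Read dig output on stdin and print the NS host names from the AUTHORITY SECTION.
authority_servers() {
  awk '/^;; AUTHORITY SECTION:/ { in_auth = 1; next }
       /^;;/ || /^$/           { in_auth = 0 }
       in_auth && $4 == "NS"   { print $5 }' | sort -u
}

# Query NAME against each authoritative server once, with a 2-second timeout.
probe_servers() {
  local name=$1 server rc
  while read -r server; do
    dig @"$server" "$name" +time=2 +tries=1 > /dev/null
    rc=$?
    case $rc in
      0) echo "OK      $server" ;;
      9) echo "TIMEOUT $server" ;;      # dig: no reply from server
      *) echo "ERROR   $server (dig exit $rc)" ;;
    esac
  done < <(dig "$name" +all | authority_servers)
}
```

Running probe_servers test.testcorp.com several times shows whether one particular server keeps timing out.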

Add the options below to /etc/resolv.conf to improve DNS query performance (see the resolv.conf(5) manual page):

options single-request-reopen
options single-request
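A bash sketch that appends an option only when it is missing, so it can be re-run safely; ensure_resolv_option is a hypothetical helper. Note that on cloud VMs the DHCP client may regenerate /etc/resolv.conf, and inside pods the file is written by the kubelet, so for pods the equivalent setting belongs in the pod spec's dnsConfig.options instead.

```shell
#!/usr/bin/env bash
# Append "options OPT" to FILE unless OPT already appears on an options line.
# Matches whole option names, so single-request does not match single-request-reopen.
ensure_resolv_option() {
  local file=$1 opt=$2
  if ! grep -Eq "^options([[:space:]].*)?[[:space:]]${opt}([[:space:]]|\$)" "$file"; then
    printf 'options %s\n' "$opt" >> "$file"
  fi
}
```

As root: ensure_resolv_option /etc/resolv.conf single-request-reopen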

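Another solution is to use an ExternalName Service. Pods then look up test-stage.default.svc.cluster.local, and the cluster DNS answers with a CNAME record pointing to test-stage.testcorp.com: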

apiVersion: v1
kind: Service
metadata:
  annotations:
  name: test-stage
  namespace: default
spec:
  externalName: test-stage.testcorp.com
  ports:
  - port: 636
    protocol: TCP
    targetPort: 636
  type: ExternalName

Tuesday, February 02, 2021

Tip: A Command to get all resources and subresources in a Kubernetes Cluster

list=($(kubectl get --raw / | jq -r '.paths[] | select(startswith("/api"))'))
for tgt in "${list[@]}"; do
  resources=$(kubectl get --raw "${tgt}" | jq .resources)   # null for paths without resources
  if [ "x${resources}" != "xnull" ]; then
    echo; echo "===${tgt}==="
    kubectl get --raw "${tgt}" | jq -r '.resources[] | .name,.verbs'
  fi
done

Tip: Use oci cli to reboot a VM

oci compute instance action --action SOFTRESET --region us-ashburn-1 --instance-id <instance id you can get from kubectl describe node>

oci compute instance get --region us-ashburn-1 --instance-id <instance id you can get from kubectl describe node>

If you omit --region us-ashburn-1, you may get a 404 error when the instance is not in the default region set in your OCI CLI config.

Tip: Collect Serial Console Logs of Oracle Cloud Infrastructure

oci compute console-history capture --region us-ashburn-1 --instance-id <instance-ocid>

--> oci compute console-history get --region us-ashburn-1 --instance-console-history-id <OCID from the command before>

--> oci compute console-history get-content --region us-ashburn-1 --length 1000000000 --file /tmp/logfile.txt --instance-console-history-id <OCID from the command before>
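The three commands above can be chained. This is a minimal bash sketch, assuming the OCI CLI's global --query (JMESPath) and --raw-output options and that a capture reports lifecycle-state SUCCEEDED or FAILED once it finishes; collect_console_log is a hypothetical helper that waits for the capture before downloading it.

```shell
#!/usr/bin/env bash
# Capture the serial console history of an instance and save it to a file.
# Usage: collect_console_log INSTANCE_OCID [REGION] [OUTPUT_FILE]
collect_console_log() {
  local ocid=$1 region=${2:-us-ashburn-1} out=${3:-/tmp/logfile.txt}
  local hist state=""
  hist=$(oci compute console-history capture --region "$region" \
    --instance-id "$ocid" --query 'data.id' --raw-output) || return 1

  # Poll until the capture finishes (up to about 2.5 minutes).
  for _ in $(seq 1 30); do
    state=$(oci compute console-history get --region "$region" \
      --instance-console-history-id "$hist" \
      --query 'data."lifecycle-state"' --raw-output)
    case $state in
      SUCCEEDED) break ;;
      FAILED) echo "capture failed: $hist" >&2; return 1 ;;
    esac
    sleep 5
  done
  if [[ $state != SUCCEEDED ]]; then
    echo "timed out waiting for $hist" >&2
    return 1
  fi

  oci compute console-history get-content --region "$region" --length 1000000000 \
    --file "$out" --instance-console-history-id "$hist"
}
```

Waiting on the lifecycle state avoids calling get-content before the history is ready.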