Thursday, February 25, 2021

Istio install against different Docker Repos

Requirement:

       With istioctl, it has built-in manifests. However, these manifests or docker images may not be accessible in the corporate network, or users use other docker repo other than docker.io.  How to install it?

Solution:

  • istioctl manifest generate --set profile=demo > istio_generate_manifests_demo.yaml
  • find docker images path in the yaml ,download and upload them to your internal docker repo.
  • edit the file with right docker image path of internal docker repo
  • kubectl apply -f istio_generate_manifests_demo.yaml
  • istioctl verify-install -f istio_generate_manifests_iad_demo.yaml
  • to purge the deployment:
    • istioctl x uninstall --purge

Tuesday, February 16, 2021

Tip: Pod FQDN in Kubernetes

Pods from deployment, statefulset. daemonset exposed by service

FQDN is  pod-ip-address.svc-name.my-namespace.svc.cluster.local

i.e  172-12-32-12.test-svc.test-namespace.svc.cluster.local

not 172.12.32.12.test-svc.test-namespace.svc.cluster.local

Isolated Pods:

FQDN is  pod-ip-address.my-namespace.pod.cluster.local

i.e  172-12-32-12.test-namespace.pod.cluster.local

Wednesday, February 03, 2021

Tip: Kubernetes intermittent DNS issues of pods

 Symptom:

     The pods get "unknown name" or "no such host" for the external domain name. i.e. test.testcorp.com

The issues are intermittent.

Actions:

  • Follow k8s guide and check all  DNS pods are running well. 
  • One possible reason is one or a few of namespaces in /etc/resolv.conf of hosts may not be able to solve the DNS name  test.testcorp.com
    • i.e. *testcorp.com is  corp intranet name, it needs to be resolved by corp name servers. however, in normal cloud VM setup, we have name server option 169.254.169.254 in the /etc/resolv.conf,  in this case 169.254.169.254 has no idea for *.testcorp.com, thus we have intermittent issues
    • To solve this, we need to update DHCP server, remove 169.254.169.254 from /etc/resolv.conf
    • kubectl rollout restart deployment coredns -n kube-system
  • One possible reason is some of the nodes have network issues which DNS pods are not functioning well.  use below commands to test DNS pods. 

kubectl -n kube-system get po -owide|grep coredns |awk '{print $6 }' > /tmp/1.txt

cat /tmp/1.txt  | while read -r line; do echo $line | awk '{print "curl -v --connect-timeout 10 telnet://"$1":53", "\n"}'; done
  • Enable debug log of DNS pods per  k8s guide
  • test the DNS and kubectl tail all DNS pods to get debug info
kubectl -n kube-system logs -f deployment/coredns --all-containers=true --since=1m |grep testcorp

  • You may get log like

INFO] 10.244.2.151:43653 - 48702 "AAAA IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000300408s

[INFO] 10.244.2.151:43653 - 64047 "A IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000392158s 

  • The /etc/resolv.conf has  "options ndots:5"  which may impact the external domain DNS resolution. To use full qualified name can mitigate the issue. test.testcorp.com --> test.testcorp.com.  (there is a .  at the end)
  • Disable coredns AAAA (IPv6) queries. it will reduce NXDOMAIN (not found), thus reduce the fail rate back to the dns client
    • Add below into coredns config file. refer coredns rewrite
    • rewrite stop type AAAA A
  • Install node local DNS to speed DNS queries. Refer kubernetes doc
  • test dig test.testcorp.com +all many times, it will show authorization section
;; AUTHORITY SECTION:
test.com.     4878    IN      NS      dnsmaster1.test.com.
test.com.     4878    IN      NS      dnsmaster5.test.com.
    • to find out which DNS server  timeout
  • Add below parameter in /etc/resolv.conf to improve DNS query performance
    • options single-request-reopen   refer manual
    • options single-request   refer manual
  • Another solution is to use an external name:

    // code placeholder
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
      name: test-stage
      namespace: default
    spec:
      externalName: test-stage.testcorp.com
      ports:
      - port: 636
        protocol: TCP
        targetPort: 636
      type: ExternalName

Tuesday, February 02, 2021

Tip: A Command to get all resources and subresources in Kuberentes Cluster

 list=($(kubectl get --raw / | jq -r '.paths[] | select(. | startswith("/api"))')); for tgt in ${list[@]}; do aruyo=$(kubectl get --raw ${tgt} | jq .resources); if [ "x${aruyo}" != "xnull" ]; then echo; echo "===${tgt}==="; kubectl get --raw ${tgt} | jq -r ".resources[] | .name,.verbs"; fi; done

Tip: Use oci cli to reboot a VM

oci compute instance action --action SOFTRESET --region us-ashburn-1 --instance-id  <instance id you can get from kubectl describe node>

oci compute instance get  --region us-ashburn-1 --instance-id  <instance id you can get from kubectl describe node>

sometimes, you may get 404 error if you omit " --region us-ashburn-1"

Tip: Collect console serial Logs of Oracle Cloud Infrastructure

oci compute console-history capture   --region us-ashburn-1 --instance-id <instance-ocid>

--> oci compute console-history get  --region us-ashburn-1 --instance-console-history-id <OCID from the command before> 

--> oci compute console-history get-content --region us-ashburn-1  --length 1000000000 --file /tmp/logfile.txt --instance-console-history-id <OCID from the command before>