Wednesday, February 03, 2021

Tip: Kubernetes intermittent DNS issues of pods

 Symptom:

     The pods get "unknown name" or "no such host" for the external domain name. i.e. test.testcorp.com

The issues are intermittent.

Actions:

  • Follow k8s guide and check all  DNS pods are running well. 
  • One possible reason is some of the nodes have network issues which DNS pods are not functioning well.  use below commands to test DNS pods. 

kubectl -n kube-system get po -owide|grep coredns |awk '{print $6 }' > /tmp/1.txt

cat /tmp/1.txt  | while read -r line; do echo $line | awk '{print "curl -v --connect-timeout 10 telnet://"$1":53", "\n"}'; done
  • Enable debug log of DNS pods per  k8s guide
  • test the DNS and kubectl tail all DNS pods to get debug info
kubectl -n kube-system logs -f deployment/coredns --all-containers=true --since=1m |grep testcorp

  • You may get log like

INFO] 10.244.2.151:43653 - 48702 "AAAA IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000300408s

[INFO] 10.244.2.151:43653 - 64047 "A IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000392158s 

  • The /etc/resolv.conf has  "options ndots:5"  which may impact the external domain DNS resolution. To use full qualified name can mitigate the issue. test.testcorp.com --> test.testcorp.com.  (there is a .  at the end)
  • Disable coredns AAAA (IPv6) queries. it will reduce NXDOMAIN (not found), thus reduce the fail rate back to the dns client
    • Add below into coredns config file. refer coredns rewrite
    • rewrite stop type AAAA A
  • Install node local DNS to speed DNS queries. Refer kubernetes doc
  • Add below parameter in /etc/resolv.conf to improve DNS query performance
    • options single-request-reopen   refer manual
    • options single-request   refer manual
  • Another solution is to use an external name:

    // code placeholder
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
      name: test-stage
      namespace: default
    spec:
      externalName: test-stage.testcorp.com
      ports:
      - port: 636
        protocol: TCP
        targetPort: 636
      type: ExternalName

No comments: