Showing posts with label dns. Show all posts

Tuesday, February 16, 2021

Tip: Pod FQDN in Kubernetes

Pods created by a Deployment, StatefulSet, or DaemonSet and exposed by a Service:

The FQDN is pod-ip-address.svc-name.my-namespace.svc.cluster.local, with the dots in the pod IP replaced by dashes.

e.g. 172-12-32-12.test-svc.test-namespace.svc.cluster.local

not 172.12.32.12.test-svc.test-namespace.svc.cluster.local

Isolated Pods (not behind a Service):

The FQDN is pod-ip-address.my-namespace.pod.cluster.local

e.g. 172-12-32-12.test-namespace.pod.cluster.local
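The dash substitution above can be sketched in a couple of lines of shell (the IP, service, and namespace are the hypothetical values from the example):

```shell
# Hypothetical pod IP, service, and namespace from the example above.
POD_IP="172.12.32.12"
SVC="test-svc"
NS="test-namespace"
# Dots in the pod IP are replaced by dashes to form the DNS label.
FQDN="$(echo "$POD_IP" | tr '.' '-').${SVC}.${NS}.svc.cluster.local"
echo "$FQDN"
```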

Wednesday, February 03, 2021

Tip: Intermittent DNS issues for Pods in Kubernetes

 Symptom:

     The Pods get "unknown name" or "no such host" errors for an external domain name, e.g. test.testcorp.com.

The issue is intermittent.

Actions:

  • Follow the k8s guide and check that all DNS pods are running well.
  • One possible cause: one or more nameservers in /etc/resolv.conf on the hosts may not be able to resolve the DNS name test.testcorp.com.
    • e.g. *.testcorp.com is a corp intranet name and needs to be resolved by the corp name servers. However, a typical cloud VM setup puts nameserver 169.254.169.254 in /etc/resolv.conf, and 169.254.169.254 knows nothing about *.testcorp.com, so we get intermittent failures.
    • To solve this, update the DHCP server options to remove 169.254.169.254 from /etc/resolv.conf, then restart CoreDNS:
    • kubectl rollout restart deployment coredns -n kube-system
  • Another possible cause: some nodes have network issues and the DNS pods on them are not functioning well. Use the commands below to test the DNS pods.

kubectl -n kube-system get po -o wide | grep coredns | awk '{print $6}' > /tmp/coredns-ips.txt

while read -r ip; do curl -v --connect-timeout 10 "telnet://${ip}:53"; done < /tmp/coredns-ips.txt
  • Enable the debug log of the DNS pods per the k8s guide.
  • Test the DNS and tail all DNS pods to get the debug info:
kubectl -n kube-system logs -f deployment/coredns --all-containers=true --since=1m | grep testcorp

  • You may see log entries like:

[INFO] 10.244.2.151:43653 - 48702 "AAAA IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000300408s

[INFO] 10.244.2.151:43653 - 64047 "A IN test.testcorp.com.default.svc.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.000392158s 

  • The /etc/resolv.conf has "options ndots:5", which can impact external domain name resolution. Using the fully qualified name mitigates the issue: test.testcorp.com --> test.testcorp.com. (note the trailing dot).
  • Disable CoreDNS AAAA (IPv6) queries. This reduces NXDOMAIN (not found) responses and thus the failure rate seen by the DNS client.
    • Add the line below to the CoreDNS config file (see the coredns rewrite plugin docs):
    • rewrite stop type AAAA A
  • Install NodeLocal DNSCache to speed up DNS queries. Refer to the Kubernetes docs.
  • Run dig test.testcorp.com +all many times; the output shows the authority section:
;; AUTHORITY SECTION:
test.com.     4878    IN      NS      dnsmaster1.test.com.
test.com.     4878    IN      NS      dnsmaster5.test.com.
    • This helps find out which DNS server times out.
  • Add one of the parameters below to /etc/resolv.conf to improve DNS query performance:
    • options single-request-reopen (see resolv.conf(5))
    • options single-request (see resolv.conf(5))
  • Another solution is to use an external name:

    apiVersion: v1
    kind: Service
    metadata:
      name: test-stage
      namespace: default
    spec:
      externalName: test-stage.testcorp.com
      ports:
      - port: 636
        protocol: TCP
        targetPort: 636
      type: ExternalName
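The AAAA rewrite mentioned above goes into the CoreDNS ConfigMap. A minimal sketch, assuming the standard kubeadm-style Corefile layout (verify against your cluster's actual ConfigMap before applying):

```yaml
# Sketch: CoreDNS ConfigMap with the AAAA->A rewrite added.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        rewrite stop type AAAA A
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
    }
```

After editing, restart CoreDNS (kubectl rollout restart deployment coredns -n kube-system) so the new Corefile is picked up.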

Sunday, May 17, 2020

A few simple Tips on Kubernetes


  • CoreDNS is a Deployment and Flannel is a DaemonSet
  • A Persistent Volume (PV) is not namespaced
  • A Persistent Volume Claim (PVC) is namespaced
  • Deleting a PVC deletes the PV automatically by default (Oracle Kubernetes Engine). Change the reclaim policy if necessary
  • Drain a worker node before rebooting it
  • Both TCP and UDP must be open in the worker node subnet
  • If UDP is opened after the VMs are up and running, we may need to recreate the VMs so the docker daemon picks up the new settings
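For the reclaim-policy tip above, a minimal PV sketch whose volume survives PVC deletion (the name, size, and hostPath backing store are hypothetical placeholders for illustration):

```yaml
# Sketch: a PV that is retained when its PVC is deleted.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-pv                           # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # default on many managed clusters is Delete
  hostPath:                               # placeholder backing store for illustration
    path: /mnt/data
```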

Tip: "Name or service not known" issues in Kubernetes

Symptom:

    We got the error below when trying to psql into Postgres running in Kubernetes Pods. The error is intermittent.
psql: could not translate host name "test-dbhost" to address: Name or service not known

We used this command to test the kube-dns service: curl -v telnet://10.96.5.5:53
The result is also intermittent; DNS resolution sort of works but is very slow.

Solution:

   We checked that the kube-dns service is up and running and the CoreDNS Pods are up and well.
We also found that Pods on the same worker node work fine; across nodes, we hit the issue. It points to a node-to-node communication problem.

   Finally, we found that TCP is open but UDP is not open in the node subnet. We had to open UDP.
After UDP was opened, the intermittent issue persisted. It is quite possibly the docker daemon being stuck with the old settings. We needed to rolling-restart the worker nodes to fix it.

To rolling restart worker node:

1. Assuming you have nodes available in the same AD of OKE, kubectl drain will move the Pods (with their PV/PVC) to a new node automatically for StatefulSets and Deployments
2. kubectl drain <node> --ignore-daemonsets  --delete-local-data
3. reboot the node
4. kubectl uncordon <node>
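The steps above can be sketched as a loop. A minimal sketch, with hypothetical node names (worker-1, worker-2) and SSH-based reboot assumed; DRY_RUN=1 only prints each command so the sequence can be reviewed before running it for real:

```shell
# Sketch: rolling restart of worker nodes; node names are hypothetical.
# DRY_RUN=1 prints each command instead of executing it.
DRY_RUN=1
run() { if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi; }
for node in worker-1 worker-2; do
  run kubectl drain "$node" --ignore-daemonsets --delete-local-data
  run ssh "$node" sudo reboot
  run kubectl uncordon "$node"
done
```

Wait for each node to become Ready again before moving on to the next one, so the cluster never loses more than one worker at a time.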

Wednesday, April 15, 2020

Error: ExternalName not working in Kubernetes

Symptom:

   We have an ExternalName Service for our DB: test-db-svc:

apiVersion: v1
kind: Service
metadata:
  name: test-db-svc
  namespace: test-stage-ns
spec:
  externalName: 10.10.10.10
  ports:
  - port: 1521
    protocol: TCP
    targetPort: 1521
  sessionAffinity: None
  type: ExternalName

   After we upgraded the Kubernetes master nodes, the DNS service stopped resolving the ExternalName:

 curl -v telnet://test-db-svc:1521
* Could not resolve host: test-db-svc; Name or service not known
* Closing connection 0
curl: (6) Could not resolve host: test-db-svc; Name or service not known

Solution:

   The new version of Kubernetes doesn't support an IP address in ExternalName. We need to replace it with an FQDN:

apiVersion: v1
kind: Service
metadata:
  name: test-db-svc
  namespace: test-stage-ns
spec:
  externalName: testdb.testdomain.com
  ports:
  - port: 1521
    protocol: TCP
    targetPort: 1521
  sessionAffinity: None
  type: ExternalName
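A quick way to spot this problem is to check whether an externalName value is a raw IP. A minimal sketch; the sample value is hard-coded here, but on a real cluster you would feed it from kubectl (jsonpath query shown in the comment):

```shell
# Sketch: detect an ExternalName value that is a raw IP (unsupported after the upgrade).
# On a real cluster, fetch the value with:
#   kubectl get svc test-db-svc -n test-stage-ns -o jsonpath='{.spec.externalName}'
value="10.10.10.10"
if echo "$value" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
  echo "externalName is an IP; replace it with an FQDN"
else
  echo "externalName looks like a hostname"
fi
```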


Tuesday, April 14, 2020

Tip: use curl to test network port and DNS service in the Pod

curl is installed in most Docker images by default, so most Pods have it.
We can use curl to test whether network ports are open and the DNS service is working.

Example:  To test DB service port 1521
curl -v telnet://mydb.testdb.com:1521

*   Trying 1.1.1.1:1521...
*   TCP_NODELAY set
*   connect to 1.1.1.1:1521 port 1521 failed: Connection timed out
*   Failed to connect to port 1521: Connection timed out
*  Closing connection 0
curl: (7) Failed to connect to mydb.testdb.com port 1521: Connection timed out


This tells us DNS is working, since we see the resolved IP address 1.1.1.1,
but the port is not open.

To test a URL behind a proxy, we can use the command below:

curl -x http://proxy-mine:3128 -v telnet://abc.com:443
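When scripting these probes, curl's exit code distinguishes a DNS failure from a closed port. A small sketch mapping the common codes (taken from the curl(1) man page) to what they mean for this kind of test:

```shell
# Sketch: interpret common curl exit codes from a port/DNS probe.
# Exit codes per curl(1): 6 = could not resolve host, 7 = failed to connect,
# 28 = operation timed out.
explain() {
  case "$1" in
    0)  echo "connected: DNS and port are both OK" ;;
    6)  echo "DNS failure: could not resolve host" ;;
    7)  echo "DNS OK, but the connection to the port failed" ;;
    28) echo "timed out: host or port is likely filtered" ;;
    *)  echo "other curl error: $1" ;;
  esac
}
# Example: the failed DB probe above would yield exit code 7.
explain 7
```

Usage in a probe: run curl -sv --connect-timeout 10 telnet://host:port, then pass $? to explain.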