Sunday, May 17, 2020

Tip: "Name or service not known" issues in Kubernetes

Symptom:

We intermittently got the error below when running psql against Postgres from Kubernetes pods:
psql: could not translate host name "test-dbhost" to address: Name or service not known

We used this command to test the Kube DNS service: curl -v telnet://10.96.5.5:53
Note that this only checks TCP port 53. The result was also intermittent: DNS resolution worked some of the time, but was very slow.
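
Since the curl check only opens a TCP connection, while most DNS lookups go over UDP, it can help to query the DNS service over each transport separately. Below is a minimal sketch, assuming dig is installed in the client pod. The helper name dns_probe and the DRY_RUN switch are made up for illustration; 10.96.5.5 is the service IP from this post.

```shell
#!/bin/sh
# Hypothetical helper: query the cluster DNS service over one transport.
# DNS_SERVER defaults to the kube-dns service IP used in this post.
DNS_SERVER="${DNS_SERVER:-10.96.5.5}"

dns_probe() {
    proto="$1"   # "udp" or "tcp"
    name="$2"    # host name to resolve, e.g. test-dbhost
    case "$proto" in
        udp) flag="+notcp" ;;   # force plain UDP
        tcp) flag="+tcp" ;;     # force TCP
        *) echo "usage: dns_probe udp|tcp <name>" >&2; return 2 ;;
    esac
    # +search applies the search domains from /etc/resolv.conf;
    # +time/+tries keep a failing probe short.
    cmd="dig @${DNS_SERVER} ${name} ${flag} +search +time=2 +tries=1 +short"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$cmd"     # dry run: only print the command
    else
        $cmd
    fi
}

# Example (run inside the affected pod):
#   dns_probe tcp test-dbhost
#   dns_probe udp test-dbhost
```

If the TCP probe answers quickly while the UDP probe times out, that points at UDP being blocked somewhere on the path rather than at the DNS pods themselves.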

Solution:

We checked that the Kube-DNS service was up and running, and that the CoreDNS pods were healthy.
We also found that lookups worked fine when the client pod and the DNS pod were on the same worker node, but failed when they were on different nodes. That pointed to a problem with node-to-node communication.
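
One way to confirm the same-node versus cross-node pattern is to query each CoreDNS pod directly and note which node it runs on. This is only a sketch: the function names are hypothetical, the label k8s-app=kube-dns is the usual CoreDNS label but is an assumption for your cluster, and dig must be available where the printed commands are run.

```shell
#!/bin/sh
# List CoreDNS pods as "podIP nodeName" lines.
# Assumption: the DNS pods carry the label k8s-app=kube-dns.
list_dns_pods() {
    kubectl -n kube-system get pods -l k8s-app=kube-dns \
        -o jsonpath='{range .items[*]}{.status.podIP}{" "}{.spec.nodeName}{"\n"}{end}'
}

# Read "podIP nodeName" lines on stdin and print one UDP dig command
# per DNS pod, annotated with the node that pod runs on.
dig_commands() {
    name="$1"
    while read -r ip node; do
        [ -n "$ip" ] || continue
        echo "dig @${ip} ${name} +notcp +search +time=2 +tries=1 +short # DNS pod on ${node}"
    done
}

# Example:
#   list_dns_pods | dig_commands test-dbhost
# Then run the printed commands from a client pod and compare the results
# for DNS pods on the same node against those on other nodes.
```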

Finally, we found that TCP was open but UDP was not open in the node subnet, so we had to open UDP.
After UDP was opened, the intermittent issues persisted. This was quite possibly because the Docker daemon was stuck on the old settings. We needed a rolling restart of the worker nodes to fix it.

To do a rolling restart of the worker nodes:

1. Make sure other nodes are available in the same AD of OKE. kubectl drain evicts the pods, and for StatefulSets and Deployments the replacement pods are scheduled on another node with their PVs and PVCs following automatically.
2. kubectl drain <node> --ignore-daemonsets --delete-local-data
3. Reboot the node.
4. kubectl uncordon <node>
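
The steps above could be scripted roughly as follows, one node at a time. This is a sketch under several assumptions, not what we actually ran: the node is reachable over ssh by its node name, a fixed 60-second pause is enough for the node to go down before we wait for it to become Ready again, and the function names and DRY_RUN switch are hypothetical. On newer kubectl versions, --delete-local-data has been renamed to --delete-emptydir-data.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Drain, reboot, and uncordon a single worker node.
restart_node() {
    node="$1"
    # Evict pods; DaemonSet pods are left alone, emptyDir data is discarded.
    run kubectl drain "$node" --ignore-daemonsets --delete-local-data || return 1
    # Assumption: ssh access to the node by name. The connection drops
    # during the reboot, so a non-zero exit status is ignored here.
    run ssh "$node" sudo reboot || true
    # Crude guard so we do not see the stale Ready status from before the reboot.
    run sleep 60
    run kubectl wait --for=condition=Ready "node/$node" --timeout=600s || return 1
    # Make the node schedulable again.
    run kubectl uncordon "$node"
}

# Example: preview the commands for two nodes without running them.
#   DRY_RUN=1; for n in node-a node-b; do restart_node "$n" || break; done
```

Running it with DRY_RUN=1 first shows exactly which commands would be issued for each node before anything is drained.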
