Friday, December 25, 2020

Tip: Nginx ingress controller can't start up

Symptom:

     We tried to restart a pod of the nginx ingress controller. After the restart, the pod can't start up.

The controller logs show errors like:

status.go:274] updating Ingress ingress-nginx-internal/prometheus-ingress status from [] to [{100.114.90.8 }]

I1226 02:11:14.106423       6 event.go:255] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"ingress-nginx-internal", Name:"prometheus-ingress", UID:"e26f55f2-d87d-4efe-a4dd-5ae02768814a", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"46816813", FieldPath:""}): type: 'Normal' reason: 'UPDATE' Ingress ingress-nginx-internal/prometheus-ingress

I1226 02:11:49.153889       6 main.go:153] Received SIGTERM, shutting down

I1226 02:11:49.153931       6 nginx.go:390] Shutting down controller queues

Workaround:

   Somehow the existing ingress rule "prometheus-ingress" is the cause. Remove the rule and the pod can start up normally. We can add the rule back after that.
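
A minimal sketch of the workaround, using the namespace and ingress name from the logs above; back up the rule first so it can be added back:

 kubectl get ingress prometheus-ingress -n ingress-nginx-internal -o yaml > prometheus-ingress.yaml
 kubectl delete ingress prometheus-ingress -n ingress-nginx-internal
 # after the controller pod starts up, strip the server-generated fields
 # (resourceVersion, uid, status) from the file, then restore the rule
 kubectl apply -f prometheus-ingress.yaml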

Wednesday, November 18, 2020

Tip: Change status of PV from Released to Available

Symptoms:

    When users delete a PVC in Kubernetes, the PV status stays "Released". Users who try to recreate the PVC against the same PV fail; the new PVC always stays "Pending".
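
A quick way to see the stuck state from kubectl:

 kubectl get pv    # the old PV shows STATUS "Released"
 kubectl get pvc   # the recreated PVC stays "Pending"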

Solution:

   We need to manually clear the claim reference to make the PV "Available" with the command below:

 kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
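
A quick check afterwards, keeping the <pv-name> placeholder as-is:

 kubectl get pv <pv-name> -o jsonpath='{.status.phase}'   # should now print Available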

Tuesday, November 10, 2020

Tip: OpenSSL SSL_connect: SSL_ERROR_SYSCALL

 Symptoms:

We use curl -v https://<domain> to test whether the network traffic is allowed.
The expected output would look like:
*  Trying 12.12.12.12:443...
* TCP_NODELAY set
* Connected to ***  port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (OUT), TLS alert, unknown CA (560):
However, we see this error:
*  Trying 12.12.12.12:443...
* TCP_NODELAY set
* Connected to *** port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to *** :443

Solution:

From the output, we see port 443 is open (the TCP connection succeeds), but the "TLS handshake, Server hello" step is missing. We have mid-tiers that handle TLS certificates, so it is very likely the network is interrupted between the LB and the mid-tiers where TLS is handled. It would be a good approach to double-check the firewall ports between them. :)
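
A quick way to double-check the path, with <mid-tier-host> and <domain> as hypothetical placeholders:

 nc -vz <mid-tier-host> 443                                           # is the port reachable at all?
 openssl s_client -connect <mid-tier-host>:443 -servername <domain>   # does the handshake get a Server hello?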

Another possible cause is that the ingress controller pods are stuck; bouncing them or scaling up can work around it.
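
A minimal bounce/scale sketch; <namespace> and <deployment> are placeholders for your ingress controller deployment:

 kubectl -n <namespace> rollout restart deployment <deployment>      # bounce the stuck pods
 kubectl -n <namespace> scale deployment <deployment> --replicas=3   # or scale up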


Sunday, October 11, 2020

Tip: HTTP 504 gateway timeout errors on ingress controller

 Symptom:

    We have micro-services behind our ingress controller in our Kubernetes cluster. We are intermittently hitting HTTP 504 errors in our ingress controller logs.

100.112.95.12 - - [01/Oct/2020:20:32:13 +0000] "GET /mos/products?limit=50&offset=0&orderBy=Name%3Aasc HTTP/2.0" 504 173 "https://ep******" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:78.0) Gecko/20100101 Firefox/78.0" 1578 180.004 [ingress-nginx-external2-mag-oke-products-svc-8080] [] 10.96.63.211:8080, 10.96.63.211:8080, 10.96.63.211:8080 0, 0, 0 60.001, 60.001, 60.002 504, 504, 504 c5b8cb67927d3997b4019e9830762694

  Bouncing the ingress controller fixes the issue temporarily.

Solution:

  We found the issue is caused by nginx proxy timeout parameters, as described in

https://github.com/kubernetes/ingress-nginx/issues/4567

Add the annotations below to the ingress rules to fix it:

nginx.ingress.kubernetes.io/proxy-connect-timeout: "5"

nginx.ingress.kubernetes.io/proxy-next-upstream-timeout: "10"
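
A sketch of applying them from the command line, with hypothetical <ingress-name> and <namespace> placeholders:

 kubectl annotate ingress <ingress-name> -n <namespace> --overwrite \
   nginx.ingress.kubernetes.io/proxy-connect-timeout="5" \
   nginx.ingress.kubernetes.io/proxy-next-upstream-timeout="10"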


Friday, October 02, 2020

Tip: Node and Namespace drop-down menus missing node names in Grafana

 Symptom:

      We have a Prometheus and Grafana setup running well. Suddenly the node and namespace drop-down lists disappeared. No config changes were made.


Solution:

   It is very likely the kube-state-metrics service has problems. That's where Grafana gets the info from.

   Bounce the pod or recreate the deployment to fix it.
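
A minimal check-and-bounce sketch; the "monitoring" namespace and the pod label are assumptions that vary by install:

 kubectl get pods -n monitoring -l app.kubernetes.io/name=kube-state-metrics   # is it healthy?
 kubectl -n monitoring rollout restart deployment kube-state-metrics           # bounce it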