Istio, mTLS, debugging a 503 error

Hello folks. In this blog post I’ll share with you a problem I ran into while trying out the Circuit Breaking tutorial from the Istio documentation. I’ll retrace all the steps I took while troubleshooting it, and hopefully it will be useful for someone out there. It was for me, at least: I learned a bit more about Istio in the process.

The steps of the task are quite simple:
1) Install a couple of pods (one serving httpbin + one having curl to communicate with the httpbin service)
2) Create a DestinationRule object so that calls to httpbin service are limited (Circuit Breaking)
Pretty straightforward, no? So, let the fun begin.

Let’s install the httpbin and helper pods:

$ kubectl create ns foo
$ kubectl apply -f <(istioctl kube-inject -f samples/httpbin/httpbin.yaml) -n foo
$ kubectl apply -f <(istioctl kube-inject -f samples/sleep/sleep.yaml) -n foo

$ kubectl -n foo get pod,svc
NAME                           READY     STATUS    RESTARTS   AGE
pod/httpbin-6bbb775889-wcp45   2/2       Running   0          35s
pod/sleep-5b597748b4-77kj5     2/2       Running   0          35s

NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/httpbin   ClusterIP   10.105.25.98   <none>        8000/TCP   36s
service/sleep     ClusterIP   10.111.0.72    <none>        80/TCP     35s

From the helper pod, let’s invoke the httpbin service, using curl:

$ kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl http://httpbin:8000/get
{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "0", 
    "Host": "httpbin:8000", 
    "User-Agent": "curl/7.35.0", 
    "X-B3-Sampled": "1", 
    "X-B3-Spanid": "b5d006d3d9bf1f4d", 
    "X-B3-Traceid": "b5d006d3d9bf1f4d", 
    "X-Request-Id": "970b84b2-999b-990c-91b4-b6c8d2534e77"
  }, 
  "origin": "127.0.0.1", 
  "url": "http://httpbin:8000/get"
}

So far, so good. Next step in the Circuit Breaking task is to add a DestinationRule:

$ cat <<EOF | kubectl -n foo apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
EOF

Now let’s retry the httpbin service:

$ kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl http://httpbin:8000/get
upstream connect error or disconnect/reset before headers

Oops, something went wrong. Let’s make curl more verbose:

$ kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl -v http://httpbin:8000/get
* Hostname was NOT found in DNS cache
*   Trying 10.105.235.142...
* Connected to httpbin (10.105.235.142) port 8000 (#0)
> GET /get HTTP/1.1
> User-Agent: curl/7.35.0
> Host: httpbin:8000
> Accept: */*
> 
< HTTP/1.1 503 Service Unavailable
< content-length: 57
< content-type: text/plain
< date: Tue, 28 Aug 2018 12:26:54 GMT
* Server envoy is not blacklisted
< server: envoy
< 
* Connection #0 to host httpbin left intact
upstream connect error or disconnect/reset before headers

Hmmm, error 503… Why? According to the Circuit Breaking task, it should work fine. We only added a rule that caps the maximum number of TCP connections at 1, and indeed, the single curl command above spawned just one connection. So, what’s wrong?
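
Just to set expectations: with this rule, the breaker should only trip under concurrent load. The official task uses fortio to generate that load; a quick-and-dirty sketch with plain curl (assuming the same pod name as above) would be something like:

$ for i in 1 2 3 4 5; do
    kubectl -n foo exec -c sleep sleep-5b597748b4-77kj5 -- \
      curl -s -o /dev/null -w "%{http_code}\n" http://httpbin:8000/get &
  done; wait

With maxConnections: 1 and http1MaxPendingRequests: 1, some of those parallel calls should be rejected with a 503 by the breaker, while a single sequential call should always pass untouched.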

The first thing that came to mind was to check whether circuit breaking was actually in effect:

$ kubectl -n foo exec -it -c istio-proxy sleep-5b597748b4-77kj5 -- curl localhost:15000/stats | grep httpbin | grep pending
cluster.outbound|8000||httpbin.foo.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|8000||httpbin.foo.svc.cluster.local.upstream_rq_pending_failure_eject: 0
cluster.outbound|8000||httpbin.foo.svc.cluster.local.upstream_rq_pending_overflow: 0
cluster.outbound|8000||httpbin.foo.svc.cluster.local.upstream_rq_pending_total: 5

So, the value 0 in the upstream_rq_pending_overflow counter confirms that no call has been trapped by circuit breaking.

A quick explanation of the command above:
The Istio sidecar (the Envoy container, named istio-proxy) locally exposes port 15000, an HTTP administrative interface with some handy utilities, such as printing statistics about the proxied services.
So, in the command above we ran curl (curl localhost:15000/stats) inside the sidecar container (-c istio-proxy) of the helper pod (sleep-5b597748b4-77kj5), filtering the output for the service we want to investigate (| grep httpbin) and then for the circuit breaker pending state (| grep pending).
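
A related tip: the same administrative interface can also zero those statistics between experiments, via Envoy’s /reset_counters endpoint (a sketch; note it’s invoked with POST):

$ kubectl -n foo exec -it -c istio-proxy sleep-5b597748b4-77kj5 -- curl -XPOST localhost:15000/reset_counters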

To confirm that the DestinationRule is the culprit, let’s delete it and try again:

$ kubectl -n foo delete DestinationRule httpbin
destinationrule.networking.istio.io "httpbin" deleted
$ kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl -v http://httpbin:8000/get
...
< HTTP/1.1 200 OK
...

Adding it back:

...
< HTTP/1.1 503 Service Unavailable
...

So, it seems the DestinationRule is the bad guy here. But why? We need to investigate a bit more. Hey! What if we check the Envoy (istio-proxy sidecar) logs? Let’s do it:

$ kubectl -n foo logs -c istio-proxy sleep-5b597748b4-77kj5 -f
# In another terminal, issue the curl command (kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl -v http://httpbin:8000/get)
# Then we see in the log:
[2018-08-28T13:06:56.454Z] "GET /get HTTP/1.1" 503 UC 0 57 0 - "-" "curl/7.35.0" "19095d07-320a-9be0-8ba5-e0d08cf58f52" "httpbin:8000" "172.17.0.14:8000"

It doesn’t help much. The log shows the response flagged with UC, which in Envoy’s access log format means the upstream connection was terminated before the response headers arrived. So, let’s check the logs on the server side (httpbin):

$ kubectl -n foo logs -c istio-proxy httpbin-94fdb8c79-h9zrq -f
# In another terminal, issue the curl command (kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl -v http://httpbin:8000/get)
# Log is empty...

What? We don’t see anything in the log output. It’s as if the request is not reaching the server. So, what to do now?… Hey! What if we could increase the log verbosity? Maybe the request is arriving but just isn’t being logged? Let’s see.

Remember how I mentioned above that Envoy’s port 15000 is exposed locally within the service pod? We used it to grab statistics. Let’s take another look to find out what else it offers:

$ kubectl -n foo exec -it -c istio-proxy httpbin-94fdb8c79-h9zrq -- curl http://localhost:15000/help
admin commands are:
  /: Admin home page
  /certs: print certs on machine
...
  /logging: query/change logging levels
...

Hey! It seems we found what we were looking for: /logging. Let’s use it:

$ kubectl -n foo exec -it -c istio-proxy httpbin-94fdb8c79-h9zrq -- curl http://localhost:15000/logging?level=trace
active loggers:
  admin: trace
...

The command above set all Envoy loggers to the trace level, the finest one. For more information about this administrative interface, check out the Envoy docs. A side note: trace is extremely chatty, so once the debugging session is over, it’s worth dialing the level back down, e.g. to info (a sketch, same endpoint):
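
$ kubectl -n foo exec -it -c istio-proxy httpbin-94fdb8c79-h9zrq -- curl http://localhost:15000/logging?level=info

Now, let’s retry retrieving the Envoy server log; hopefully with the trace level we will get something (in fact, we get lots of logs!):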

$ kubectl -n foo logs -c istio-proxy httpbin-94fdb8c79-h9zrq -f
# In another terminal, issue the curl command (kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl -v http://httpbin:8000/get)
# Now we see in the logs (I filtered out some irrelevant content)
[debug][filter] external/envoy/source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[debug][main] external/envoy/source/server/connection_handler_impl.cc:217] [C31] new connection
[trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C31] socket event: 2
[trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C31] write ready
[debug][connection] external/envoy/source/common/ssl/ssl_socket.cc:111] [C31] handshake error: 2
[trace][connection] external/envoy/source/common/network/connection_impl.cc:389] [C31] socket event: 3
[trace][connection] external/envoy/source/common/network/connection_impl.cc:457] [C31] write ready
[debug][connection] external/envoy/source/common/ssl/ssl_socket.cc:111] [C31] handshake error: 1
[debug][connection] external/envoy/source/common/ssl/ssl_socket.cc:139] [C31] SSL error: 268435612:SSL routines:OPENSSL_internal:HTTP_REQUEST
[debug][connection] external/envoy/source/common/network/connection_impl.cc:133] [C31] closing socket: 0

Wow, now that seems interesting! We can see that the request is indeed reaching the server, but it’s failing with a handshake error, and Envoy is closing the connection. (The OPENSSL_internal:HTTP_REQUEST error is OpenSSL’s way of saying it received a plaintext HTTP request on a socket where it expected a TLS handshake.) The question now is: why a handshake error? Why is SSL involved at all?

When speaking of SSL in the context of Istio, we immediately think of mutual TLS. So I went to the Istio docs, trying to find something relevant to my problem, and reading the Security tutorial task opened my eyes!

I found out that I had installed Istio with mutual TLS activated!

Let’s do some checking:

$ kubectl get MeshPolicy default -o yaml
apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata: ...
spec:
  peers:
  - mtls: {}

$ kubectl -n istio-system get DestinationRule default -o yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata: ...
spec:
  host: '*.local'
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL

The outputs above show that mTLS is enabled in the cluster: these objects only exist when mTLS is on.

OK, looking back at my installation scripts, I realized that I had really messed up and installed Istio with mTLS on. However, the question remains: why is the httpbin service failing? Knowing that mTLS is active in the mesh, and reading the documentation, it’s not hard to deduce that the server expects a TLS connection while the client is issuing a plaintext one. So the question changes again: why is the client (sleep pod) connecting to the server (httpbin pod) using plain text?
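
As an aside, istioctl (at least in the 1.0-era version I was using; this subcommand has changed over time) has a helper for spotting exactly this kind of client/server TLS mismatch. A sketch, assuming our service name:

$ istioctl authn tls-check httpbin.foo.svc.cluster.local

It should report the httpbin host with a CONFLICT status: the server expects mTLS while the client speaks plain HTTP.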

Again, looking at the documentation, we find the answer. The way mTLS works in Istio is simple: there is a default DestinationRule object (called “default”, as we can see in the command above) that instructs all traffic in the mesh to go through TLS. However, when we created our own DestinationRule for the purpose of the Circuit Breaking task, we overwrote that default configuration with our own, which has no TLS at all! This is stated in the TLS documentation for Istio:

Don’t forget that destination rules are also used for non-auth reasons such as setting up canarying, but the same order of precedence applies. So if a service requires a specific destination rule for any reason – for example, to configure a load balancer – the rule must contain a similar TLS block with ISTIO_MUTUAL mode, as otherwise it will override the mesh- or namespace-wide TLS settings and disable TLS.

So, it’s now clear what we should do: modify the DestinationRule for the Circuit Breaking task to include the TLS block (the last two lines of the spec below):

$ cat <<EOF | kubectl -n foo apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
    tls:
      mode: ISTIO_MUTUAL
EOF
destinationrule.networking.istio.io/httpbin configured

Now let’s try our httpbin service:

$ kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl -v http://httpbin:8000/get
...
< HTTP/1.1 200 OK
...

\o/
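
One extra sanity check that mutual TLS is really in play now: when mTLS is enabled, the server-side sidecar injects an X-Forwarded-Client-Cert header into the request, and httpbin conveniently echoes back the headers it receives:

$ kubectl -n foo exec -it -c sleep sleep-5b597748b4-77kj5 -- curl -s http://httpbin:8000/headers

If X-Forwarded-Client-Cert shows up in the output, the call really went through the mTLS tunnel.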

Now I can continue with my Circuit Breaking tutorial!

Some lessons learned:
– Confirm whether you are using mTLS or not; enabling it opens the door to obscure errors 🙂
– DestinationRules have an order of precedence: more specific ones overwrite global ones
– We can sometimes make good use of the sidecar’s administrative interface (local port 15000)
– Always read the docs 🙂

See you next time!

Istio, Service Mesh, Updates

Hello folks! This blog has been so quiet lately… I just wanted to give you an update on what I am doing these days. So…


Istio project

Since December 2017 / January 2018 I’ve switched teams at Red Hat and started working with Istio. Istio is a tool/platform that helps us deliver microservices, in a number of different aspects. I intend to write more posts about it. Meanwhile, you can find out more about Istio on their website. Trust me, it’s a wonderful tool worth taking a look at if you’re somehow involved in, or planning to enter, this microservices world, be you dev or ops :).

So far our work has been mainly about understanding the benefits of Istio when used on top of Kubernetes and OpenShift (Red Hat’s improved and supported version of Kubernetes). This means we have been mostly involved in downstream tasks, like creating CentOS and RHEL RPM packages and container images, although we are already contributing to the upstream community effort. The plan is to get much more involved in the community in a soonish timeframe.

Events

Last December I attended Kubecon US in Austin, Texas. It was my first immersion in Istio, given there was an IstioCon, a pre-event of Kubecon with a bunch of talks and workshops. Last May I attended Kubecon EU in Copenhagen, Denmark. All talks are freely available on the CNCF channel on YouTube. Check them out! It is definitely great material for keeping up with the latest and greatest stuff that’s happening out there, not only in the Istio world, but in the whole Kubernetes ecosystem.

Last April I attended a great Brazilian conference – The Developers Conference (TDC) – in Florianópolis, where I spoke about the container ecosystem and alternatives to Docker (CRI-O, Buildah, Podman, etc.).

In July I’ll be speaking at TDC again, this time in São Paulo. I’ll give two talks about Istio and one on the containers ecosystem, mostly the same ones I gave at the previous TDC in Florianópolis. This event is already such a success that the organizers had to duplicate some tracks in order to handle all the attendees. Therefore, my talks about Istio on the microservices track (Introduction to Service Mesh with Istio) and the devops track (How Istio facilitates A/B deployments on micro-services) will be duplicated, and I will give two of each, for a total of 5 talks! I hope to have enough swag for so many talks!

In August I’ll be traveling to Boston, MA to attend the very first Devconf to be held in the US! I’ll be speaking about Istio there as well.

There is more to come, I’d guess, given the amount of hype and momentum that Istio and service meshes are gaining everywhere.

Plans

So, my plan is to blog regularly about the cool stuff we are doing around Istio, and how it can be used to improve microservice deployments, making admins’ and developers’ lives easier!

See you soon!