Getting a client IP address through... CloudFront -> AWS NLB -> traefik -> Kubernetes
Gee Batman, that's a lot of layers!
All I want is to get the client IP address to my application code, but with so many different layers in my networking stack, was it going to be harder than I hoped? Turns out you can only get so far.
So let's start at the top of the request path. If you go to https://mentallyanimated.com/, which is my website and is hosted on my playground EKS cluster, you go through a bunch of layers. I have a CloudFront distribution which has an AWS NLB as its origin, which points to traefik as my ingress controller, which then routes to an nginx container that serves the static assets of the website. At each step in that request path, there's an opportunity to lose the calling client's IP address, which I want for things like rate limiting or allow-listing access. Even though the requests start with CloudFront, let's start with the NLB.
AWS Network Load Balancer
I like using a Network Load Balancer in my stack. It's highly performant, and specifically, I like that it doesn't have any smarts. I can centralize any and all additional middleware features in my traefik configuration, or swap traefik out for any other ingress controller and keep advanced features like rate limits or path rewriting within the cluster. One catch with it is that it's a layer 4 load balancer. In other words, it doesn't really understand any concept of things like HTTP headers. It's just a packet forwarder. So how do we maintain the relevant HTTP headers such that the client IP gets preserved?
As it turns out, NLBs actually support a configuration setting named Client IP Preservation, which sounds great, right? When creating my kind: Service of type: LoadBalancer for traefik, all I'm supposed to need to do is add the right annotation to control the NLB target group attributes.
service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
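Here's roughly what that looks like on the traefik Service. This is a minimal sketch rather than my actual manifest; the name, selector, and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: traefik  # placeholder name, not my actual release
  annotations:
    # Ask the AWS load balancer controller to enable client IP preservation
    # on the NLB target group it creates for this Service.
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: traefik  # placeholder selector
  ports:
    - name: websecure
      port: 443
      targetPort: 8443  # placeholder container port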
This alone actually doesn’t work. But why not? It’s because of the way that Kubernetes networking itself works. I’m not going to try to explain it all, but I will link you to this blog post which I read to understand the problem.
To make preserve_client_ip work at the NLB level, you need to set the externalTrafficPolicy of your kind: Service to Local instead of the default Cluster (there's a snippet of what that looks like just below). There's just one problem with this. If you set that policy to Local, traefik will only ever try to route to pods on the same node as your traffic. If the targeted application pod is scheduled on a different node, traefik will just time out trying to forward the request. That's no good. One thing you could do is set an affinity rule so that your applications always get scheduled wherever there is an existing traefik pod. Similarly, if you made traefik a kind: DaemonSet and your application also a kind: DaemonSet, then you know that there will always be pods next to each other. These solutions are hacks, and there must be something better!
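For reference, the policy change itself is just one field on the same hypothetical Service sketch from above:

spec:
  type: LoadBalancer
  # Local preserves the original source IP but only routes to pods on the
  # node that received the traffic; the default, Cluster, can hop to other
  # nodes and source-NATs the connection along the way.
  externalTrafficPolicy: Local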
If you look at the documentation for enabling client IP preservation, there are also a fair number of considerations that you have to account for, one of which seems particularly gnarly.
When client IP preservation is enabled, you might encounter TCP/IP connection limitations related to observed socket reuse on the targets. These connection limitations can occur when a client, or a NAT device in front of the client, uses the same source IP address and source port when connecting to multiple load balancer nodes simultaneously. If the load balancer routes these connections to the same target, the connections appear to the target as if they come from the same source socket, which results in connection errors. If this happens, the clients can retry (if the connection fails) or reconnect (if the connection is interrupted). You can reduce this type of connection error by increasing the number of source ephemeral ports or by increasing the number of targets for the load balancer. You can prevent this type of connection error, by disabling client IP preservation or disabling cross-zone load balancing.
traefik
Traefik is a very full-featured ingress controller. It supports a lot of different kinds of workloads. You can even start using it for HTTP/3, it supports gRPC, and so on. One of the things that I learned along this journey was something called the proxy protocol. Simply put, the proxy protocol gives the NLB a way to pass along the original connection details, like the client IP, so that they can be kept and interpreted downstream. Perfect. AWS NLBs also support the proxy protocol, and it's still just a simple annotation to add to our kind: Service for traefik.
service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
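Dropping that into the same hypothetical Service sketch from earlier, the metadata ends up looking something like this:

metadata:
  annotations:
    # Tell the NLB to wrap every connection in the proxy protocol (v2),
    # carrying the original client IP and port for traefik to decode.
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"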
And then in my terraform helm_release for traefik, all I had to do to enable proxy protocol functionality was set the following template values.
set {
  name  = "ports.websecure.proxyProtocol.trustedIPs[0]"
  value = "10.9.0.0/24"
}

set {
  name  = "ports.websecure.proxyProtocol.trustedIPs[1]"
  value = "10.9.1.0/24"
}
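If you're not driving the chart through terraform, the equivalent raw Helm values should look something like this (same ranges, just expressed as chart values):

ports:
  websecure:
    proxyProtocol:
      # Only trust proxy protocol information coming from the NLB's subnets.
      trustedIPs:
        - "10.9.0.0/24"
        - "10.9.1.0/24"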
These are the private IP ranges for the subnets that my NLB can use, which means we just have to tell traefik that if a request comes from these ranges and is trying to use the proxy protocol, to go ahead and trust it.
And that’s literally it. Setting the annotation and setting the trusted IPs made it so that my applications could see the real client IP address. Unfortunately, there’s one more layer to this story.
CloudFront
At the top is CloudFront, which made me think that I was very lucky, because they only recently started supporting forwarding the client IP address. Otherwise, they actually set the X-Forwarded-For header to the IP address of the edge location that the client has connected with.
Previously, IP address and client connection port information were available only in CloudFront access logs, making it harder to resolve issues or perform real-time decision-making based on these data.
There's a problem with this though. The industry standard is to only check the X-Forwarded-For header when dealing with proxy IP addresses. CloudFront's solution here is to actually set an entirely new header. You can see these extra headers by using the Managed-AllViewerAndCloudFrontHeaders-2022-06 request policy. They put the client's actual IP address and source port under CloudFront-Viewer-Address.
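For illustration, the value joins the viewer's IP and source port with a colon, so a forwarded request carries something like this (made-up address):

CloudFront-Viewer-Address: 198.51.100.10:46532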
This is both a different header AND a different format, since it includes the port information. Traefik and other proxies don't know what to do with that. You might think to yourself, "but Aaron, what if we just modify the headers?" I had the same idea, and I thought I was being super clever. Separate from Lambda@Edge, I learned that CloudFront has even faster and lighter-weight CloudFront Functions, which run a custom JavaScript runtime that's perfect for modifying headers.
resource "aws_cloudfront_function" "fix_x_forwarded_for" {
name = "fix-x-forwarded-for"
runtime = "cloudfront-js-1.0"
comment = "Sets the X-Forwarded-For header to the client's IP address."
code = <<EOF
function handler(event) {
var request = event.request;
var clientIP = event.viewer.ip;
request.headers['x-forwarded-for'] = {value: clientIP};
return request;
}
EOF
}
I tried overriding the header before it was sent off to the NLB. Unfortunately, but actually probably fortunately, CloudFront will not forward the request at all when this header is tampered with. That means I could set yet another header with the correct client IP value, but traefik itself doesn't let you rewrite headers natively either. I'd have to write a custom plugin or use one of the existing plugins. Maybe Lambda@Edge would also work? That's for another time to try.
Conclusion
So what does this mean? It means that I can’t actually get the client IP address for requests that go through CloudFront. So that’s kind of another L for me this week.
Either way, thanks for reading and see y'all in the next one.