
January 6, 2025

What I Learned Setting Up a CloudFront Distribution

Some mistakes and learnings from configuring a CloudFront Distribution and cache behaviors for an HTTP server for the first time

Sometimes you finally get the chance to put some of your theoretical knowledge into practice, and I decided to write about how this one went. I’ve already missed lots of opportunities to write because of procrastination, so while this is still fresh, I’ll get it done.

This work is part of a larger project I’ve been planning and executing. As the project is wrapping up soon, I’ll also share a reflection on how it went.

In this case, I was setting up a CloudFront Distribution for an HTTP server. I planned all the high-level steps and wanted to dive deep to fully understand the implementation.

Context

The project involves an NGINX server with a couple of location paths. Because the endpoints follow a pattern, the server can serve many different ones, and each location path is defined by a regex-based rule. Here is a quick example to illustrate:

Suppose users can retrieve specific paragraphs from a book. The API requires a book name, a chapter number, and a paragraph number. The location paths would reflect this logic: book_name/chapter/paragraph. For example:

  • guide_to_philosophy/7/2
  • guide_to_philosophy/20/50
  • running_tales/2/85

Curiosity: I've been reading some philosophy for a couple of months, which is why the examples look like that :)
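To make the regex-based routing a bit more concrete, here is a minimal Python sketch of which paths such a rule would accept and how a path like guide_to_philosophy/7/2 splits into its parts. The real NGINX rules aren’t shown in this post, so the pattern (a lowercase book slug followed by two positive integers) and the parse_paragraph_path name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical rule mirroring book_name/chapter/paragraph:
# a lowercase book slug followed by two positive integers, optional leading slash.
PARAGRAPH_PATH = re.compile(
    r"^/?(?P<book>[a-z0-9_]+)/(?P<chapter>[1-9]\d*)/(?P<paragraph>[1-9]\d*)$"
)


class ParagraphRef(NamedTuple):
    book: str
    chapter: int
    paragraph: int


def parse_paragraph_path(path: str) -> Optional[ParagraphRef]:
    """Return the parsed reference, or None if the path doesn't match the rule."""
    match = PARAGRAPH_PATH.match(path)
    if match is None:
        return None
    return ParagraphRef(
        book=match["book"],
        chapter=int(match["chapter"]),
        paragraph=int(match["paragraph"]),
    )


for example in ["guide_to_philosophy/7/2", "guide_to_philosophy/20/50",
                "running_tales/2/85", "running_tales/abc/1"]:
    print(example, "->", parse_paragraph_path(example))
```

In NGINX the same idea would live in a regex location (location ~ ...); the Python version only shows which paths the rule accepts and rejects.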

Some paragraphs are accessed more frequently than others. Since these paragraphs are static and rarely change, there’s no need to query the database on every request; caching them can significantly improve performance and reduce database load.

In my real scenario, each user can request ~15 endpoints simultaneously and may hit over 75 different endpoints in one session, with some of the same calls repeated again and again. The goal was to reduce database access and improve performance by caching static content.

Solution

We decided to use CloudFront as a CDN because of its caching capabilities. If the data ever needs to change, a cache-clearing mechanism will run whenever it is updated. We also wanted to set up a new subdomain for this service, as it’s client-facing and we don’t want to expose internal URLs.
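The post doesn’t describe how the cache-clearing mechanism is implemented; one common option is a CloudFront invalidation. The sketch below assumes that approach and only builds the InvalidationBatch payload that boto3’s create_invalidation call expects. The build_invalidation_batch name, the timestamp-based CallerReference, and the distribution ID in the comment are illustrative assumptions.

```python
import time
from typing import Iterable, Optional


def build_invalidation_batch(paths: Iterable[str], caller_reference: Optional[str] = None) -> dict:
    """Build the InvalidationBatch payload for CloudFront's CreateInvalidation API.

    Invalidation paths must start with "/" and may end with "*" as a wildcard.
    CallerReference must be unique per request; a nanosecond timestamp is used if none is given.
    """
    items = []
    for path in paths:
        normalized = path if path.startswith("/") else "/" + path
        if normalized not in items:  # drop duplicates while keeping the original order
            items.append(normalized)
    if not items:
        raise ValueError("at least one path is required")
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        "CallerReference": caller_reference or f"invalidation-{time.time_ns()}",
    }


# Example: clear one whole book plus a single paragraph.
batch = build_invalidation_batch(["guide_to_philosophy/*", "/running_tales/2/85"])
print(batch["Paths"])
# With boto3 installed and AWS credentials configured (placeholder distribution ID):
# import boto3
# boto3.client("cloudfront").create_invalidation(DistributionId="EDFDVBD6EXAMPLE", InvalidationBatch=batch)
```

Keeping the payload builder separate from the AWS call makes it easy to test without credentials.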

To get this working, I defined some high-level steps:

  • Set up the subdomain via a Route53 hosted zone.
  • Configure CloudFront:
    • Point CloudFront at the HTTP server as its origin.
    • Define cache behaviors in CloudFront and cache headers on the origin server.
    • Create a DNS record in the hosted zone that exposes the service through CloudFront on the subdomain.
  • Set up an ACM certificate for the subdomain.

I’ll focus on the CloudFront configuration step, since that’s where I made mistakes and learned the most. I wanted to understand how long I could cache the data, and I was afraid of setting up a cache that would be hard to invalidate, especially once the browser caches the data.

After discussing it with the team, we decided to introduce a version identifier in the path (using the example above, it would look like this: book_name/book_version/chapter/paragraph). Once the frontend knows the version has changed, it requests the new path, so users always fetch the latest data instead of stale copies from the browser cache.
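A small sketch of why the version in the path works: bumping the version produces a different URL, and a different URL is a different cache entry, so a request for the new version can’t be answered from a copy cached under the old one, neither in the browser nor in CloudFront. The versioned_path helper and the v1/v2 version strings are hypothetical; the post doesn’t specify the version format.

```python
from urllib.parse import quote


def versioned_path(book: str, version: str, chapter: int, paragraph: int) -> str:
    """Build a book_name/book_version/chapter/paragraph path (hypothetical format)."""
    segments = [book, version, str(chapter), str(paragraph)]
    # Percent-encode each segment so a "/" inside a value can't change the route.
    return "/" + "/".join(quote(segment, safe="") for segment in segments)


old = versioned_path("guide_to_philosophy", "v1", 7, 2)
new = versioned_path("guide_to_philosophy", "v2", 7, 2)
print(old)
print(new)
# Different URLs mean different cache entries, so v2 never reuses the v1 copy.
print(old != new)
```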

Learnings

The whole AWS infrastructure uses CloudFormation, which automates resource creation. Configuring the subdomain and DNS records was straightforward since I’m already familiar with them; I knew what to do, but this time I needed to do it in an AWS environment using CloudFormation.

As mentioned, my main challenges were creating the CloudFront Distribution and defining cache behaviors. First, I needed to understand how to configure the cache, what’s required, what the common issues are, and so on. Here are some links I liked reading for a better understanding of caching:

In the end, I defined cache headers on my origin server, and CloudFront can take the values from those headers into account. However, they also need to be aligned with the TTL values configured in CloudFront (MinTTL, DefaultTTL, and MaxTTL).

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html#expiration-individual-objects
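To see how the origin header and the TTL settings interact, here is a simplified Python model of the expiration rules on the AWS page above: when the origin sends Cache-Control max-age, CloudFront caches for that value clamped between MinTTL and MaxTTL, and when it doesn’t, CloudFront falls back to DefaultTTL. It deliberately ignores s-maxage, no-cache, no-store, private, and the Expires header, so treat it as an approximation rather than CloudFront’s exact algorithm. The TTLs used for UseOriginCacheControlHeaders (MinTTL 0, DefaultTTL 0, MaxTTL one year) are my reading of the AWS managed policy docs.

```python
from typing import Optional


def cloudfront_ttl(max_age: Optional[int], min_ttl: int, default_ttl: int, max_ttl: int) -> int:
    """Approximate how many seconds CloudFront keeps an object (simplified model).

    max_age is the origin's Cache-Control max-age in seconds, or None if absent.
    """
    if max_age is None:
        return default_ttl  # no max-age from the origin: DefaultTTL applies
    return max(min_ttl, min(max_age, max_ttl))  # clamp max-age into [MinTTL, MaxTTL]


ONE_DAY = 86400

# Original setup: max-age=86400 from NGINX, MinTTL=0, DefaultTTL=0, MaxTTL=43200.
print(cloudfront_ttl(ONE_DAY, min_ttl=0, default_ttl=0, max_ttl=43200))
# Assumed UseOriginCacheControlHeaders TTLs, same header from the origin.
print(cloudfront_ttl(ONE_DAY, min_ttl=0, default_ttl=0, max_ttl=31536000))
# Same policy, but the origin sends no Cache-Control header: nothing is cached.
print(cloudfront_ttl(None, min_ttl=0, default_ttl=0, max_ttl=31536000))
```

With the original setup CloudFront keeps the object for 12 hours while browsers keep it for the full 24 hours from max-age, which is exactly the split that made the behavior harder to reason about.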

While setting everything up, I made some mistakes that I only noticed after debugging and observing how things behaved.

Overcomplicating Cache TTLs:

Initially, I tried setting different values for Cache-Control max-age, MaxTTL, and MinTTL. For instance, I configured the browser cache to last 24 hours and CloudFront’s to 12 hours, but this approach was unnecessary and only overcomplicated things.

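# example of my origin server (NGINX)
add_header Cache-Control "public, max-age=86400"; # 1 day

# example of my CloudFront Distribution (CloudFormation)
CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Enabled: true
      Origins:
        - Id: my-server-origin
          DomainName: my-server.eu-west-1.amazon.com
          CustomOriginConfig:
            OriginProtocolPolicy: https-only # assumes the origin serves HTTPS
      DefaultCacheBehavior:
        TargetOriginId: my-server-origin
        ViewerProtocolPolicy: redirect-to-https
        ForwardedValues:
          QueryString: false
      CacheBehaviors:
        - PathPattern: "/book_name/*/*/*"
          TargetOriginId: my-server-origin
          ViewerProtocolPolicy: redirect-to-https
          ForwardedValues:
            QueryString: false
          MinTTL: 0
          MaxTTL: 43200 # 12 hours
          DefaultTTL: 0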

What resolved the issue was simplifying the setup: letting CloudFront follow the origin’s Cache-Control headers and using the AWS-managed cache policy UseOriginCacheControlHeaders, which already has its TTL values predefined.

CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Enabled: true
      Origins:
        - Id: my-server-origin
          DomainName: my-server.eu-west-1.amazon.com
          CustomOriginConfig:
            OriginProtocolPolicy: https-only # assumes the origin serves HTTPS
      DefaultCacheBehavior:
        TargetOriginId: my-server-origin
        ViewerProtocolPolicy: redirect-to-https
        CachePolicyId: 83da9c7e-98b4-4e11-a168-04f0df8e2c65 # Name: UseOriginCacheControlHeaders (AWS Managed)

I found that the UseOriginCacheControlHeaders cache policy suits my scenario better, as its description says:

“With the new managed cache policies, CloudFront caches content based on the Cache-Control headers returned by the origin, and defaults to not caching when no Cache-Control header is returned”

https://aws.amazon.com/about-aws/whats-new/2024/07/amazon-cloudfront-managed-cache-policies-web-applications/

Caching Issues with ForwardedValues:

I mistakenly configured CloudFront to forward all headers, query strings, and cookies, which made each request effectively unique and bypassed caching.

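CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Enabled: true
      Origins:
        - Id: my-server-origin
          DomainName: my-server.eu-west-1.amazon.com
          CustomOriginConfig:
            OriginProtocolPolicy: https-only # assumes the origin serves HTTPS
      DefaultCacheBehavior:
        TargetOriginId: my-server-origin
        ViewerProtocolPolicy: redirect-to-https
        ForwardedValues: # the mistake: forwarding all headers, query strings, and cookies
          Headers:
            - '*'
          QueryString: true
          Cookies:
            Forward: all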

Switching to the AWS-managed cache policy (UseOriginCacheControlHeaders) also fixed this: a cache behavior uses either a cache policy or the legacy ForwardedValues settings, and the managed policy lets CloudFront cache responses based on the origin’s Cache-Control headers.
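Why forwarding everything hurts: CloudFront builds its cache key from the path plus whichever headers, cookies, and query strings the behavior includes. The toy model below (not CloudFront’s real key format; all names are hypothetical) shows two users requesting the same paragraph. When all headers and cookies are part of the key they get separate entries; when the key is the path alone they share one. In practice it’s even stricter, since AWS documents that forwarding all headers ('*') makes CloudFront stop caching that behavior entirely.

```python
from typing import Iterable, Mapping, Tuple

Pairs = Tuple[Tuple[str, str], ...]
Key = Tuple[str, Pairs, Pairs]


def _select(values: Mapping[str, str], names: Iterable[str]) -> Pairs:
    """Pick the named entries (or all of them for "*"), sorted so order doesn't matter."""
    wanted = list(names)
    if wanted == ["*"]:
        wanted = list(values)
    return tuple(sorted((name, values.get(name, "")) for name in wanted))


def cache_key(path: str, headers: Mapping[str, str], cookies: Mapping[str, str],
              key_headers: Iterable[str] = (), key_cookies: Iterable[str] = ()) -> Key:
    """Toy cache key: the path plus only the headers/cookies included in the key."""
    return (path, _select(headers, key_headers), _select(cookies, key_cookies))


# Two users asking for the same paragraph (header names assumed lowercase).
alice = ({"user-agent": "Firefox", "accept": "*/*"}, {"session": "abc"})
bob = ({"user-agent": "Chrome", "accept": "*/*"}, {"session": "xyz"})
path = "/guide_to_philosophy/7/2"

forward_all = [cache_key(path, h, c, key_headers=["*"], key_cookies=["*"]) for h, c in (alice, bob)]
path_only = [cache_key(path, h, c) for h, c in (alice, bob)]

print(len(set(forward_all)))  # 2: one cache entry per user
print(len(set(path_only)))    # 1: both users share the cached copy
```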

DNS Record Type Alias:

Initially, I thought I would use a CNAME record to route traffic from my subdomain to the CloudFront domain. But I learned that the recommended approach is a Route 53 alias record of type A, because on AWS alias records are designed to route traffic to specific AWS resources, such as a CloudFront Distribution.
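For reference, this is roughly what such an alias record looks like as a Route 53 change batch. The post configures it with CloudFormation; this Python dict mirrors the same AliasTarget fields in the shape boto3’s change_resource_record_sets accepts. Z2FDTNDATAQYW2 is the fixed hosted zone ID AWS uses for CloudFront alias targets; the subdomain, distribution domain, and hosted zone ID in the usage lines are placeholders.

```python
def cloudfront_alias_change(subdomain: str, distribution_domain: str) -> dict:
    """Build a Route 53 UPSERT for an A alias record pointing at a CloudFront distribution."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": subdomain,
                    "Type": "A",  # an alias record of type A, not a CNAME
                    "AliasTarget": {
                        # Fixed hosted zone ID shared by all CloudFront alias targets.
                        "HostedZoneId": "Z2FDTNDATAQYW2",
                        "DNSName": distribution_domain,
                        # Target health can't be evaluated for CloudFront alias targets.
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ]
    }


change = cloudfront_alias_change("books.example.com", "d111111abcdef8.cloudfront.net")
# With boto3 installed and AWS credentials configured (placeholder hosted zone ID):
# import boto3
# boto3.client("route53").change_resource_record_sets(HostedZoneId="Z0123456789EXAMPLE", ChangeBatch=change)
```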

Reflection

These were the main mistakes I made, having to absorb everything in one week. Debugging wasn’t easy: I was making requests with curl, testing in different browsers, changing settings directly in AWS, and then mirroring those changes in the CloudFormation template to keep the configuration as code.
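One thing that helps when debugging with curl is CloudFront’s X-Cache response header, which reports values such as "Hit from cloudfront", "Miss from cloudfront", or "RefreshHit from cloudfront". A small Python helper doing the same check as curl -I could look like this; the helper names are hypothetical and the URL is a placeholder.

```python
import urllib.request
from typing import Optional


def classify_x_cache(value: Optional[str]) -> str:
    """Map an X-Cache header value to 'hit', 'refresh-hit', 'miss', 'error', or 'unknown'."""
    if not value:
        return "unknown"  # header absent: the response probably didn't come through CloudFront
    first_word = value.split()[0].lower()
    return {"hit": "hit", "refreshhit": "refresh-hit", "miss": "miss", "error": "error"}.get(
        first_word, "unknown"
    )


def check_cache(url: str) -> str:
    """Send a HEAD request and report CloudFront's cache status for the URL."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return classify_x_cache(response.headers.get("X-Cache"))


# Example (placeholder URL); repeating the request may turn a "miss" into a "hit":
# print(check_cache("https://books.example.com/guide_to_philosophy/v1/7/2"))
```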

In the end, I got the learnings I was hoping for, and I see one main point of improvement: LESS OVERTHINKING.

In some respects it’s good, because thinking thoroughly about a topic lets me consider different possibilities, but too much of it leads to rabbit holes, and we need to find our way out with a pragmatic solution.

Still, I gained new insights into caching and CloudFront’s behavior. In general, digging into the “what ifs” is essential for me to build my own understanding of a topic.


If you're still here, thanks for reading :)
