Cheating Lambda scalability

I am one of those people who can be described as serverless first. Almost everything can be serverless, and with some bending, you can find a way even when serverless has limitations.

While some argue that it is better to have an architecture with two boxes (a cluster and a relational database), I use the features of managed services to achieve the same results.

For example:

  • Duplicating data in DynamoDB tables for better data access vs one relational DB with complex SQL

  • DynamoDB Streams vs a background job to find out what has "changed" (see the sketch after this list)

  • Optimising Lambda function duration to bring down the cost vs having everything in one big cluster

  • Swapping services when they reach a scale where the price no longer makes sense, for example, ALB in favour of API Gateway
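
For instance, a minimal sketch of the DynamoDB Streams option in SAM (the table and function names are hypothetical):

StreamConsumer:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: nodejs18.x
      Events:
        TableStream:
          Type: DynamoDB
          Properties:
            # The function is invoked with batches of records describing what has "changed"
            Stream: !GetAtt MyTable.StreamArn
            StartingPosition: LATEST
            BatchSize: 100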

Code is a liability, and with serverless I have more control because I keep it to a minimum. If I built with the monolith approach on a cluster, the code would become more "complex" because I would use patterns like:

  • Multi-tier Architecture or Hexagonal Architecture...

  • Dependency injection framework

  • Middleware framework

When I say "complex" with the above options, it is because I now need to leverage some framework, while writing a single-responsibility Lambda function is just a matter of the following (see the sketch after the list):

  • Validate the input

  • Run some checks (business logic) using the AWS SDK

  • Return a response
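
A minimal sketch of such a function, assuming a Node.js runtime and SAM's InlineCode (all names and the business rule are illustrative):

MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: nodejs18.x
      Handler: index.handler
      InlineCode: |
        exports.handler = async (event) => {
          // 1. Validate the input
          const id = event.pathParameters && event.pathParameters.id;
          if (!id) {
            return { statusCode: 400, body: JSON.stringify({ message: 'missing id' }) };
          }
          // 2. Some checks (business logic), typically using the AWS SDK
          const allowed = id !== 'blocked'; // placeholder business rule
          if (!allowed) {
            return { statusCode: 403, body: JSON.stringify({ message: 'forbidden' }) };
          }
          // 3. Return a response
          return { statusCode: 200, body: JSON.stringify({ id }) };
        };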

Now that I have explained why I prefer serverless, I can talk about the limitations of Lambda at scale.

Most people compare only the AWS bill, which is not the best way to evaluate the trade-off.

There are a lot of good articles out there on this topic.

It is inarguable that Lambda can be more expensive per request than a cluster and that we have some hard limits on scalability (for example, the default quota of 1,000 concurrent executions per region).

Read more here.

It is also inarguable that Lambda is the best choice if you have spiky traffic, or predictable higher traffic at particular times of the day whose volume still varies.

Someone would say:

  • If you know that you have predictably high traffic at a particular time, a cluster with pre-scaling is better

This is true, but what if the demand at that predictable time varies randomly?

Someone would say:

  • Overprovision to 3x the scale you expect.

  • Move to Lambda

As Vlad Ionescu pointed out in his post, Lambda scales significantly faster than the container services in AWS.

In past years, I have written a lot about serverless latency, and you can find some articles here.

I have learned a few tricks that make my architecture scalable and reliable at a relatively low cost. One trick is to use all the services available to me.

The first and most important is CACHE. More here.

CloudFront is my primary option for server-side caching. Caching at the edge reduces latency and is cost-effective because it decreases the number of calls to the service. CloudFront only caches responses to GET, HEAD, and OPTIONS requests.
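
As a sketch, a distribution that allows and caches exactly those methods in front of a regional API could look like this (the origin domain is a placeholder, and the CachePolicyId is CloudFront's managed CachingOptimized policy):

MyDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        Origins:
          - Id: api-origin
            DomainName: myapi.execute-api.eu-central-1.amazonaws.com   # placeholder origin
            CustomOriginConfig:
              OriginProtocolPolicy: https-only
        DefaultCacheBehavior:
          TargetOriginId: api-origin
          ViewerProtocolPolicy: redirect-to-https
          AllowedMethods: [GET, HEAD, OPTIONS]      # CloudFront can cache only these methods
          CachedMethods: [GET, HEAD, OPTIONS]
          CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6   # managed CachingOptimized policy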

If my architecture used only POST for some reason, I could not utilise CloudFront for caching, only for API acceleration.

Let's expand on the POST-only architecture, assuming that I will not use APIGW for POST-only traffic but a GraphQL API like AppSync.

This architecture needs to resolve a requestById that, in theory, is a GET.

In a setup of CloudFront (with Origin Shield enabled in DE and IRL) -> APIGW/ALB (in DE and IRL) with:

  • 4.5B requests per month on CloudFront

  • 700M requests per month to the origin

  • 10TB of data transfer per month

It will cost roughly:

  • ~$1K for CloudFront

  • ~$2K for APIGW, or ~$500 with ALB

NOTE: If I enable CORS on APIGW, each GET also triggers an OPTIONS preflight request, and APIGW makes me pay for those too, which explains why the bill is high. Check more here.

If we do the same with AppSync, where nothing is cached, the 700M requests to the origin alone will cost over ~$3K, and around 4B requests will cost ~$16K.

Of course, because it is a POST, we can enable caching at the AppSync endpoint level to reduce the load on our computation (Lambda), but the price will increase based on the cache instance memory size.
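
A sketch of that endpoint-level cache, assuming an AppSync API defined in the same template (the instance Type is what drives the extra cost):

MyApiCache:
    Type: AWS::AppSync::ApiCache
    Properties:
      ApiId: !GetAtt MyGraphQLApi.ApiId
      ApiCachingBehavior: PER_RESOLVER_CACHING   # or FULL_REQUEST_CACHING
      Type: SMALL                                # bigger instances cost more
      Ttl: 60                                    # seconds an entry stays cached
      TransitEncryptionEnabled: true
      AtRestEncryptionEnabled: true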

Given this, I would drop the AppSync option and move to a cluster with Apollo Server, accompanied by a cache-aside cluster. Still, I would have the problem of processing all those requests.

If I consider the classic API example (APIGW can be replaced with ALB), I need to overcome the limitation of Lambda at scale.

The simplest way is to move to Multi-Region:

Thanks to Route 53, I can now have an active-active configuration based on latency, weight, and so on, and I can spread the traffic. Still, with two regions, I have only 2,000 concurrent executions at my disposal. What if I want more?

Read more here
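
A sketch of the latency-based records, with hypothetical hosted zone and regional API domain names:

ApiRecordEU:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: eu-central-1
      Region: eu-central-1        # Route 53 answers with the lowest-latency record
      ResourceRecords:
        - d-aaaa1111.execute-api.eu-central-1.amazonaws.com
ApiRecordUS:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: us-east-1
      Region: us-east-1
      ResourceRecords:
        - d-bbbb2222.execute-api.us-east-1.amazonaws.com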

Sometimes, Multi-Region is not a solution, so the next step, which I found super efficient, is traffic isolation.

Imagine I am hitting the API, and I can distinguish the traffic, for example, by:

  • registered users and anonymous users

  • platforms

  • country

  • whatever

I can now rewrite the URL at the CloudFront level and route the request to a specific APIGW resource, which will hit a Lambda dedicated to that path instead of one function for all traffic.
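
A sketch of that rewrite as a CloudFront Function, assuming (purely as an example) that registered users can be spotted by an Authorization header:

RouteByUserType:
    Type: AWS::CloudFront::Function
    Properties:
      Name: route-by-user-type
      AutoPublish: true
      FunctionConfig:
        Comment: Rewrite the URI so each traffic class hits its own API path
        Runtime: cloudfront-js-1.0
      FunctionCode: |
        function handler(event) {
          var request = event.request;
          // Hypothetical rule: registered users send an Authorization header
          var registered = !!request.headers['authorization'];
          request.uri = (registered ? '/api/path1' : '/api/path2') + request.uri;
          return request;
        }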

The Lambda behind each path is the same function, just deployed multiple times.

With this trick, I do not have 1,000 concurrent executions; instead, I have N times that concurrency based on my duplication, at no extra complexity, because the Lambda is coded and maintained once as it is the same logic. The only difference is at the infrastructure level, where you need multiple paths on your API:

MyApi:
    Type: AWS::Serverless::Api
    Properties:
      DefinitionBody:
        openapi: "3.0.3"
        paths:
          /api/path1:
            get:
              x-amazon-apigateway-integration:
                credentials: !Ref ApiGwRole
                httpMethod: "POST"
                uri:
                  Fn::Sub: "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaARN_Path1}:live/invocations"
                passthroughBehavior: "when_no_match"
                type: "aws_proxy"
          # .....
          /api/path2:
            get:
              x-amazon-apigateway-integration:
                credentials: !Ref ApiGwRole
                httpMethod: "POST"
                uri:
                  Fn::Sub: "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaARN_Path2}:live/invocations"
                passthroughBehavior: "when_no_match"
                type: "aws_proxy"
          # .....
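
The functions behind those paths can then be the same code base deployed more than once; a sketch (paths and names are assumptions):

FunctionPath1:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/api/          # one code base, maintained once
      Handler: app.handler
      Runtime: nodejs18.x
      AutoPublishAlias: live     # matches the :live qualifier in the integration URI
FunctionPath2:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/api/          # the same code deployed a second time, so each copy scales up independently
      Handler: app.handler
      Runtime: nodejs18.x
      AutoPublishAlias: live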

This design gives me an extra benefit: the complete isolation of traffic means I do not go down entirely if one of these paths reaches its limit.

I can double it all if I go Multi-Region, another no-brainer with a serverless architecture.

At this point, I have a scalable system that can handle spiky traffic very well, but is it still more expensive than a cluster? With a cluster, I am pretty sure I would most likely be 100% over-provisioned.

Of course, it is all a trade-off.

Honestly, I have 3 options:

  1. Overprovision a cluster and pay more all the time

  2. Use the fastest runtime for my single-responsibility API Lambda functions

  3. Stay with my slower serverless runtime and pay as you go.

For example, if we compare a Lambda written in Node to one written in Rust with up to 100M invocations per day, changing the runtime can reduce the price by 45%.
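
As a back-of-the-envelope check (the prices are the public us-east-1 ones; the durations are illustrative assumptions, not the measured benchmark), the monthly cost is roughly:

$$\text{cost} \approx N \cdot p_{req} + N \cdot d \cdot m \cdot p_{GBs}$$

With N = 3B invocations per month (100M per day), m = 0.125 GB, p_req = $0.20 per million requests, and p_GBs = $0.0000166667 per GB-second: a 100 ms Node function costs about $600 + $625 = $1,225, while a 20 ms Rust function costs about $600 + $125 = $725, a saving in the ballpark of the 45% above, driven almost entirely by the shorter billed duration.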

I am not suggesting that Rust is the answer to everything, but with serverless, I can effectively leverage advantages that significantly reduce costs to levels comparable to ECS Fargate, without the need to:

  • overprovision resources

  • maintain more than the bare minimum

  • compromise on reliability during peak usage periods