Cheating Lambda scalability

I am one of those people who can be described as serverless first. Almost everything can be serverless, and with some bending, you can find a way even when serverless has limitations.

While some argue that it is better to have an architecture with two boxes (a cluster and a relational database), I use the features of managed services to achieve the same results.

For example:

  • Duplicating data in DynamoDB tables for better data access vs one relational DB with complex SQL

  • DynamoDB Streams vs a background job to find out what has "changed" (see the sketch after this list)

  • Optimising Lambda function duration to bring down the cost vs having everything in one big cluster

  • Swapping services when they reach a scale where the price no longer makes sense, for example, ALB in favour of API Gateway
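
For instance, a minimal sketch of the DynamoDB Streams option in SAM (the table and function names are hypothetical):

StreamConsumer:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: nodejs18.x
      Events:
        TableStream:
          Type: DynamoDB
          Properties:
            # The function is invoked with batches of records describing what has "changed"
            Stream: !GetAtt MyTable.StreamArn
            StartingPosition: LATEST
            BatchSize: 100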

Code is a liability, and with serverless I have more control because I keep it to a minimum. If I built with the monolith approach on a cluster, the code would become more "complex" because I would use patterns like:

  • Multi-tier Architecture or Hexagonal Architecture...

  • Dependency injection framework

  • Middleware framework

When I say "complex" with the above options, it is because I now need to leverage some framework, while writing a single-responsibility Lambda function is just a matter of the following (see the sketch after the list):

  • Validate the input

  • Run some checks (business logic) using the AWS SDK

  • Return a response
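
A minimal sketch of such a function, assuming a Node.js runtime and SAM's InlineCode (all names and the business rule are illustrative):

MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: nodejs18.x
      Handler: index.handler
      InlineCode: |
        exports.handler = async (event) => {
          // 1. Validate the input
          const id = event.pathParameters && event.pathParameters.id;
          if (!id) {
            return { statusCode: 400, body: JSON.stringify({ message: 'missing id' }) };
          }
          // 2. Some checks (business logic), typically using the AWS SDK
          const allowed = id !== 'blocked'; // placeholder business rule
          if (!allowed) {
            return { statusCode: 403, body: JSON.stringify({ message: 'forbidden' }) };
          }
          // 3. Return a response
          return { statusCode: 200, body: JSON.stringify({ id }) };
        };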

Now that I have explained why I prefer serverless, I can talk about the limitations of Lambda at scale.

Most people compare only the AWS bill, which is not the best way to evaluate the trade-off.

There are a lot of good articles out there on this topic.

It is inarguable that Lambda can be more expensive per request than a cluster and that we have some hard limits on scalability (for example, the default quota of 1,000 concurrent executions per region).

Read more here.

It is also inarguable that Lambda is the best choice if you have spiky traffic, or predictable higher traffic at particular times of the day whose volume still varies.

Someone would say:

  • If you know that you have predictably high traffic at a particular time, a cluster with pre-scaling is better

This is true, but what if the demand at that predictable time varies randomly?

Someone would say:

  • Overprovision to 3x the scale you expect.

  • Move to Lambda

As Vlad Ionescu pointed out in his post, Lambda scales significantly faster than the container services in AWS.

In past years, I have written a lot about serverless latency, and you can find some articles here.

I have learned a few tricks that make my architecture scalable and reliable at a relatively low cost. One trick is to use all the services available to me.

The first and most important is CACHE. More here.

CloudFront is my primary option for server-side caching. Caching at the edge reduces latency and is cost-effective because it decreases the number of calls to the service. CloudFront only caches responses to GET, HEAD, and OPTIONS requests.
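
As a sketch, a distribution that allows and caches exactly those methods in front of a regional API could look like this (the origin domain is a placeholder, and the CachePolicyId is CloudFront's managed CachingOptimized policy):

MyDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        Origins:
          - Id: api-origin
            DomainName: myapi.execute-api.eu-central-1.amazonaws.com   # placeholder origin
            CustomOriginConfig:
              OriginProtocolPolicy: https-only
        DefaultCacheBehavior:
          TargetOriginId: api-origin
          ViewerProtocolPolicy: redirect-to-https
          AllowedMethods: [GET, HEAD, OPTIONS]      # CloudFront can cache only these methods
          CachedMethods: [GET, HEAD, OPTIONS]
          CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6   # managed CachingOptimized policy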

If my architecture used only POST for some reason, I could not utilise CloudFront for caching, only for API acceleration.

Let's expand on the POST-only architecture, assuming that I will not use APIGW for POST-only traffic but a GraphQL API like AppSync.

This architecture needs to resolve a requestById that, in theory, is a GET.

In a setup of CloudFront (with Origin Shield enabled in DE and IRL) -> APIGW/ALB (in DE and IRL) with:

  • 4.5B requests per month on CloudFront

  • 700M requests per month to the origin

  • 10TB of data transfer per month

It will cost roughly:

  • ~$1K for CloudFront

  • ~$2K for APIGW, or ~$500 with ALB

NOTE: If I enable CORS on APIGW, each GET also triggers an OPTIONS preflight request, and APIGW makes me pay for those too, which explains why the bill is high. Check more here.

If we do the same with AppSync, where nothing is cached, the 700M requests to the origin alone will cost over ~$3K, and around 4B requests will cost ~$16K.

Of course, because it is a POST, we can enable caching at the AppSync endpoint level to reduce the load on our computation (Lambda), but the price will increase based on the cache instance memory size.
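
A sketch of that endpoint-level cache, assuming an AppSync API defined in the same template (the instance Type is what drives the extra cost):

MyApiCache:
    Type: AWS::AppSync::ApiCache
    Properties:
      ApiId: !GetAtt MyGraphQLApi.ApiId
      ApiCachingBehavior: PER_RESOLVER_CACHING   # or FULL_REQUEST_CACHING
      Type: SMALL                                # bigger instances cost more
      Ttl: 60                                    # seconds an entry stays cached
      TransitEncryptionEnabled: true
      AtRestEncryptionEnabled: true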

Given this, I would drop the AppSync option and move to a cluster with Apollo Server, accompanied by a cache-aside cluster. Still, I would have the problem of processing all those requests.

If I consider the classic API example (APIGW can be replaced with ALB), I need to overcome the limitation of Lambda at scale.

The simplest way is to move to Multi-Region:

Thanks to Route 53, I can now have an active-active configuration based on latency, weight, and so on, and I can spread the traffic. Still, with two regions, I have only 2,000 concurrent executions at my disposal. What if I want more?

Read more here
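
A sketch of the latency-based records, with hypothetical hosted zone and regional API domain names:

ApiRecordEU:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: eu-central-1
      Region: eu-central-1        # Route 53 answers with the lowest-latency record
      ResourceRecords:
        - d-aaaa1111.execute-api.eu-central-1.amazonaws.com
ApiRecordUS:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.
      Name: api.example.com.
      Type: CNAME
      TTL: "60"
      SetIdentifier: us-east-1
      Region: us-east-1
      ResourceRecords:
        - d-bbbb2222.execute-api.us-east-1.amazonaws.com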

Sometimes, Multi-Region is not a solution, so the next step, which I found super efficient, is traffic isolation.

Imagine I am hitting the API, and I can distinguish the traffic, for example, by:

  • registered users and anonymous users

  • platforms

  • country

  • whatever

I can now rewrite the URL at the CloudFront level and route the request to a specific APIGW resource, which will hit a Lambda dedicated to that path instead of one function for all traffic.
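
A sketch of that rewrite as a CloudFront Function, assuming (purely as an example) that registered users can be spotted by an Authorization header:

RouteByUserType:
    Type: AWS::CloudFront::Function
    Properties:
      Name: route-by-user-type
      AutoPublish: true
      FunctionConfig:
        Comment: Rewrite the URI so each traffic class hits its own API path
        Runtime: cloudfront-js-1.0
      FunctionCode: |
        function handler(event) {
          var request = event.request;
          // Hypothetical rule: registered users send an Authorization header
          var registered = !!request.headers['authorization'];
          request.uri = (registered ? '/api/path1' : '/api/path2') + request.uri;
          return request;
        }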

The Lambda behind each path is the same function, just deployed multiple times.

With this trick, I do not have 1,000 concurrent executions; instead, I have N times that concurrency based on my duplication, at no extra complexity, because the Lambda is coded and maintained once as it is the same logic. The only difference is at the infrastructure level, where you need multiple paths on your API:

MyApi:
    Type: AWS::Serverless::Api
    Properties:
      DefinitionBody:
        openapi: "3.0.3"
        paths:
          /api/path1:
            get:
              x-amazon-apigateway-integration:
                credentials: !Ref ApiGwRole
                httpMethod: "POST"
                uri:
                  Fn::Sub: "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaARN_Path1}:live/invocations"
                passthroughBehavior: "when_no_match"
                type: "aws_proxy"
          # .....
          /api/path2:
            get:
              x-amazon-apigateway-integration:
                credentials: !Ref ApiGwRole
                httpMethod: "POST"
                uri:
                  Fn::Sub: "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${LambdaARN_Path2}:live/invocations"
                passthroughBehavior: "when_no_match"
                type: "aws_proxy"
          # .....
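
The functions behind those paths can then be the same code base deployed more than once; a sketch (paths and names are assumptions):

FunctionPath1:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/api/          # one code base, maintained once
      Handler: app.handler
      Runtime: nodejs18.x
      AutoPublishAlias: live     # matches the :live qualifier in the integration URI
FunctionPath2:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/api/          # the same code deployed a second time, so each copy scales up independently
      Handler: app.handler
      Runtime: nodejs18.x
      AutoPublishAlias: live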

This design gives me an extra benefit: the complete isolation of traffic means I do not go down entirely if one of these paths reaches its limit.

I can double it all if I go Multi-Region, another no-brainer with a serverless architecture.

At this point, I have a scalable system that can handle spiky traffic very well, but is it still more expensive than a cluster? With a cluster, I am pretty sure I would most likely be 100% over-provisioned.

Of course, it is all a trade-off.

Honestly, I have 3 options:

  1. Overprovision a cluster and pay more all the time

  2. Use the fastest runtime for my single-responsibility API Lambda functions

  3. Stay with my slower serverless runtime and pay as you go.

For example, if we compare a Lambda written in Node to one written in Rust with up to 100M invocations per day, changing the runtime can reduce the price by 45%.
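
As a back-of-the-envelope check (the prices are the public us-east-1 ones; the durations are illustrative assumptions, not the measured benchmark), the monthly cost is roughly:

$$\text{cost} \approx N \cdot p_{req} + N \cdot d \cdot m \cdot p_{GBs}$$

With N = 3B invocations per month (100M per day), m = 0.125 GB, p_req = $0.20 per million requests, and p_GBs = $0.0000166667 per GB-second: a 100 ms Node function costs about $600 + $625 = $1,225, while a 20 ms Rust function costs about $600 + $125 = $725, a saving in the ballpark of the 45% above, driven almost entirely by the shorter billed duration.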

I am not suggesting that Rust is the answer to everything, but with serverless, I can effectively leverage advantages that significantly reduce costs to levels comparable to ECS Fargate, without the need to:

  • overprovision resources

  • maintain more than the bare minimum

  • compromise on reliability during peak usage periods