Considering design trade-offs when building serverless APIs on AWS
When I talk to customers who are building serverless APIs, the architects often ask for prescriptive guidance backed by an authoritative best practice, as this makes it easier for them to provide direction to the developers in their organizations. However, my response, perhaps to the frustration of said customers, is that it depends. It’s a matter of trade-offs that need to take into account the business and technical requirements of a particular use case, along with where that use case is in its life cycle.
In this post, I’ll cover a few frequently asked questions and how I navigate the trade-offs. When referring to serverless APIs, I assume that Amazon API Gateway is used as the front door and AWS Lambda is used as the compute backend. I also assume the serverless APIs are synchronous request/response RESTful APIs.
To clarify upfront, here are some terms and definitions for this post. I use endpoint to refer to each instantiation of Amazon API Gateway. I use resources and methods to refer to API resources and the associated methods of each API resource. I use functions to refer to AWS Lambda functions. I use layers to refer to AWS Lambda layer versions.
Picking the appropriate use cases and overcoming cold starts
Let’s start with the elephant in the room. Whenever working with synchronous workloads and serverless compute, cold starts invariably come up. Some might even immediately eliminate Lambda as an option for this use case.
[Q1] What type of p99 response latency is required for these APIs?
A friend pinged me asking if it would be too controversial to say that “Lambda isn’t a good fit for applications that require a response time of <10ms”. If you have p99 requirements of single-digit millisecond latency for your APIs, then no, that’s not controversial. That’s accurate, and I’d say that serverless compute might not be the place to start with those kinds of requirements.
That said, trust but verify. I’d double check if the single digit millisecond latency is really a requirement. I’ve had conversations where the developers say it absolutely is a requirement, but then the business folks say that a 1 second response would be just fine for the use case. I’d also check to see what the workload looks like over time. If it’s a steady state application, then it’s possible that the percentage of observed cold starts for that function could actually be quite low. I’d also check to see what language runtime they’re using, as compiled languages like C/C++, Go, or Rust might be able to meet those performance requirements. It depends.
Lambda has also done a lot to optimize cold starts over the years, e.g. provisioned concurrency, language/runtime-specific guidance for tuning, and SnapStart for Java, .NET, and Python. If your use case has a little more wiggle room with response time SLAs, then API Gateway and Lambda could be a fit. If it’s possible to invest time in building a minimal prototype, you should deploy the solution and load test it. Use the results of the load test to model the behavior and cost of your application so that you have estimates based on your actual workload.
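If provisioned concurrency ends up being part of the answer, the configuration itself is small. Here’s a minimal boto3 sketch, assuming a hypothetical function name, alias, and concurrency value:

```python
import boto3

lambda_client = boto3.client("lambda")

# Keep 5 execution environments initialized for the "live" alias so that
# requests routed to it avoid cold starts. Provisioned concurrency targets
# a published version or alias, not $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="users-api",  # hypothetical function name
    Qualifier="live",          # hypothetical alias
    ProvisionedConcurrentExecutions=5,
)
```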
Mapping API resources and methods to Lambda functions
Let’s suppose you’ve done the calculations, and it looks promising. Now you want to build on the prototype and need to figure out how to structure the code to handle API requests.
[Q2] How should I structure the code in my Lambda functions to handle the API resources and methods?
Monoliths get a bad rap these days, but a monolith isn’t necessarily a bad place to start if you write your code in a modular way. Using a single function for all resources and methods allows you to iterate quickly, especially if the responsibilities of your entities are not yet clear. For example, maybe you have an onboarding API to allow vendors to bulk upload data. The onboarding API likely uses capabilities from both user and product. This approach allows you to build quickly and refactor code fairly easily. Werner Vogels talks about evolvable architectures, namely adapting your code base as the product matures and scales.
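To make that concrete, here’s a minimal sketch of a single modular handler behind an API Gateway Lambda proxy integration; the entity logic is stubbed out and purely hypothetical:

```python
import json

def handle_user(method, event):
    # Hypothetical entity logic; real code would talk to DynamoDB here.
    return {"entity": "user", "method": method}

def handle_product(method, event):
    # Hypothetical entity logic.
    return {"entity": "product", "method": method}

def handle_onboarding(method, event):
    # Hypothetical entity logic; real code would write to S3 here.
    return {"entity": "onboarding", "method": method}

# One function fronts every resource; routing happens in code.
ROUTES = {
    "/user": handle_user,
    "/product": handle_product,
    "/onboarding": handle_onboarding,
}

def handler(event, context):
    route = ROUTES.get(event["resource"])
    if route is None:
        return {"statusCode": 404, "body": json.dumps({"message": "not found"})}
    return {"statusCode": 200, "body": json.dumps(route(event["httpMethod"], event))}
```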
While this approach gives you agility, it might not be optimal from a security standpoint. For example, suppose the /user and /product resources use a backend DynamoDB table while the /onboarding resource accesses an S3 bucket for bulk uploads. With a single function, the execution role necessarily needs access to both the DynamoDB table and the S3 bucket. This means the modules in the code have access to more AWS resources than they require, which doesn’t follow least privilege security principles.
Per-resource functions might be the next logical choice and, candidly, are my preferred* starting point (see the section below on managing shared dependencies). A resource represents an entity within a bounded context, which makes it simple to reason about how to structure the code. The user, product, and onboarding entities can be encapsulated in separate functions. This gives you a better least privilege security posture, as the execution role of each function can be configured to access only the resources that are required. From a deployment standpoint, this also gives you finer-grained deployments, as an update to the user entity doesn’t impact the product entity and so on.
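As a sketch, a per-resource user function might dispatch on the HTTP method internally, with an execution role that only needs the users table. The table name is hypothetical, and this assumes a /user/{userId} path for reads:

```python
import json
import boto3

# One function owns the user entity. Its execution role only needs access to
# the (hypothetical) users table, not the onboarding S3 bucket.
table = boto3.resource("dynamodb").Table("users")

def handler(event, context):
    method = event["httpMethod"]
    if method == "GET":
        result = table.get_item(Key={"userId": event["pathParameters"]["userId"]})
        return {"statusCode": 200,
                "body": json.dumps(result.get("Item", {}), default=str)}
    if method == "POST":
        table.put_item(Item=json.loads(event["body"]))
        return {"statusCode": 201, "body": ""}
    return {"statusCode": 405, "body": ""}
```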
I like this approach because I can write code for specific entities and configure security policies per entity. However, some organizations may want even finer-grained permissions, separating read-only code from read-write code, akin to CQRS principles.
Per-method functions might then be an option for organizations that are particularly security sensitive. This approach ensures that the GET method only has read-only access to downstream resources while the POST, PUT, and DELETE methods have the appropriate read-write access.
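To illustrate, the execution role policy statements might split along these lines; the table ARN is a hypothetical placeholder:

```python
# The role for the GET function only allows reads; the role for the
# POST/PUT/DELETE function only allows mutations.
READ_ONLY_STATEMENT = {
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:Query"],
    "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/users",
}

READ_WRITE_STATEMENT = {
    "Effect": "Allow",
    "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem", "dynamodb:DeleteItem"],
    "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/users",
}
```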
While this approach affords the best least privilege security posture, it does come with some downsides. Each entity likely has some class that defines its properties and methods. If a change is made to the entity class, then that change likely needs to be propagated to and tested with two separate functions. This introduces more deployment and operational overhead.
This is where trade-offs come in. If the use case mandates this type of security separation, then you should ensure that you have software delivery processes that can handle these tests and deployments in an automated fashion. However, this is not likely how I’d start building the application, as it might be too cumbersome when I’m trying to iterate rapidly. Perhaps the application evolves into this over time.
Managing shared dependencies
Let’s suppose you’re building either per resource or per method functions and have some shared logic that all functions might use. This could be observability tooling like a standard logging library or shared application logic. With per method functions, this could be the entity objects that you’d prefer to define once and include in both the read-only and read-write functions.
[Q3] How should I share code across Lambda functions?
Welp. I know there are differing opinions on this one, but here’s where I’ve landed: it depends. Ugh. Bear with me.
It depends on the language runtime. I said above that my preferred approach is per-resource functions. That’s primarily because my language of choice is Python, an interpreted language. So if I wanted to share entity logic across functions, I’d bundle all of it into a shared layer for the relevant functions to attach. By attaching the layer, those functions get access to that entity logic at runtime via PYTHONPATH. Now you might think that this violates entity isolation at the resource level. It does. Hold that thought until the next section, which covers choosing integrations for function dependencies.
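For the Python case, the mechanics look something like this sketch; the entities module and its factory method are hypothetical:

```python
# The layer zip follows Lambda's Python layout; everything under python/
# lands on PYTHONPATH (at /opt/python) in the execution environment:
#
#   entities-layer.zip
#   └── python/
#       └── entities/
#           ├── __init__.py
#           ├── user.py     # hypothetical module
#           └── product.py  # hypothetical module
#
# Any function with the layer attached can then import the shared logic:
from entities.user import User  # resolved from /opt/python at runtime

def handler(event, context):
    user = User.from_api_event(event)  # hypothetical factory method
    ...
```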
However, I’ve also been experimenting with Rust lately, a compiled language. With compiled languages (Java, .NET, C/C++, Go, Rust), dependencies are checked at compile time and a single artifact is typically generated. That said, Java, for example, does allow pulling in dependencies from the system by setting the Maven scope to system. In theory, you could put those dependencies in a layer and have the uber JAR reference them at runtime. This isn’t something I’d recommend.
So for compiled languages, I would just compile an artifact with all my dependencies. It’s also got me wondering whether a monolithic function is better for compiled languages, both for code organization and for performance optimization. That’s something I intend to explore in a future post, as I’m rewriting, in Rust, a Python-based serverless API that I wrote last year. Stay tuned for that.
It also depends on how you organize your code and what other layers need to be attached to your function. Layers should not be used like a dependency manager, packaging each individual versioned dependency in a layer, and then attaching each one to your function. Lambda sets a quota of 5 attached layers for a given function.
Instead, you want to think in terms of higher-order groupings and attach those to your functions, e.g. observability, security, and application-specific layers. With this approach, all of your application-specific dependencies are bundled in a single layer and attached to the appropriate functions. That layer has a version and could have an associated versioned manifest that documents the version of each dependency included in that layer. You can then attach the observability and security layers too while staying under the quota of 5 attached layers.
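Publishing such a grouped layer is a single API call. Here’s a hedged boto3 sketch, with hypothetical layer name, bucket, and key:

```python
import boto3

lambda_client = boto3.client("lambda")

# Publish one grouped layer containing all application dependencies.
response = lambda_client.publish_layer_version(
    LayerName="app-dependencies",
    Description="App dependencies v42; see manifest-v42.json for pinned versions",
    Content={"S3Bucket": "my-artifact-bucket",
             "S3Key": "layers/app-dependencies-v42.zip"},
    CompatibleRuntimes=["python3.12"],
)
print(response["LayerVersionArn"])  # attach this ARN to the relevant functions
```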
Be aware that layers also have implications for your deployment processes. Suppose you have an observability layer with a logging dependency for which a new CVE is posted. Every function (and application, really) that uses that dependency needs a deployment. Assuming the language runtime supports pulling in dependencies at runtime, a central team can update the layer, publish a new version, and notify development teams to update the layer ARN (Amazon Resource Name) that the relevant functions use. The development teams would update the appropriate templates (or configuration files), commit those to the relevant source control repositories, and let the automated CI/CD processes run. Alternatively, without layers, you can just update the dependency manifest with the patched version, build a new artifact, and follow the same automated software delivery process.
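In template terms you’d bump the layer ARN and let the pipeline run; the equivalent API call looks like this sketch. Note that Layers is a full replacement, so every attached layer must be restated; the names and ARNs are hypothetical:

```python
import boto3

lambda_client = boto3.client("lambda")

# Point the function at the patched observability layer version; the
# unchanged app-dependencies layer is restated because Layers replaces
# the entire list.
lambda_client.update_function_configuration(
    FunctionName="users-api",
    Layers=[
        "arn:aws:lambda:us-east-1:123456789012:layer:observability:8",
        "arn:aws:lambda:us-east-1:123456789012:layer:app-dependencies:42",
    ],
)
```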
Choosing integrations for function dependencies
Let’s suppose you’re building the onboarding resource and are thinking about how best to implement it. The onboarding resource allows end-users to upload JSON documents for simplified bulk processing. That’s something I’d normally handle with an asynchronous, event-driven process. However, let’s suppose I need to do it synchronously, to give the end-user immediate feedback that the bulk onboarding has completed successfully. For user onboarding, the process might read an array of items and iteratively create each user. Yes, I could use Step Functions with the Map state to do that work asynchronously (or even synchronously) and in parallel, but for illustration purposes I’m going to build it synchronously as an API.
[Q4] How should the onboarding resource make calls to the user resource?
The first option would be that all calls for logic outside my entity should be API calls. At this point, folks are likely familiar with Jeff Bezos’ API mandate that all data and functionality must be accessible via APIs. In this case, the onboarding resource iterates through the list of users that need to be created by constructing POST requests to the /user resource. For every API call? Hold that thought.
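As a sketch, the API-first option might look like this, using only the standard library; the endpoint URL and request shape are hypothetical:

```python
import json
import urllib.request

API_BASE = "https://abc123.execute-api.us-east-1.amazonaws.com/prod"  # hypothetical

def create_user(user: dict) -> None:
    req = urllib.request.Request(
        f"{API_BASE}/user",
        data=json.dumps(user).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on non-2xx
        resp.read()

def handler(event, context):
    # One synchronous API call per user in the uploaded document.
    for user in json.loads(event["body"])["users"]:
        create_user(user)
    return {"statusCode": 200, "body": json.dumps({"status": "onboarded"})}
```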
The next option could be to bypass the API resource and go directly to the Lambda function. However, it’s an anti-pattern to synchronously invoke another Lambda function from a Lambda function. You have to configure the execution role of the calling function (or resource policy of the called function) to allow for that function chaining, which could be cumbersome to manage. You also double pay for the synchronous invoke (assuming both functions have the same memory configuration), as the calling function is blocked waiting for the called function to complete. But wait a second. Doesn’t the latter hold true if I’m calling it via an API (plus the cost of the API requests)? Yes. Hold that thought.
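For completeness, the function-chaining option being described looks roughly like this sketch; the function name and payload shape are hypothetical:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

def create_user(user: dict) -> dict:
    # The onboarding function blocks (and is billed) while the user
    # function runs.
    response = lambda_client.invoke(
        FunctionName="user-function",      # hypothetical
        InvocationType="RequestResponse",  # synchronous invoke
        Payload=json.dumps({"httpMethod": "POST", "body": json.dumps(user)}),
    )
    return json.loads(response["Payload"].read())
```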
The final option could be to use the shared dependencies within the onboarding API and bypass the API resource and associated Lambda function altogether. This makes it simpler for the onboarding process to use the same logic that was already built in the user resource and avoids the need to manage the integrations with either the API endpoint or Lambda function. However, this can get messy, violating entity isolation and necessitating multiple deployments whenever entity logic changes.
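The sketch for this option is the shortest of the three, which is exactly its appeal; the shared entities module and its create method are hypothetical:

```python
import json

from entities.user import User  # hypothetical shared module from a layer

def handler(event, context):
    # Same logic the /user resource uses, executed in-process with no
    # network hop and no second function invocation.
    for record in json.loads(event["body"])["users"]:
        User.create(record)  # hypothetical method
    return {"statusCode": 200, "body": json.dumps({"status": "onboarded"})}
```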
Back to trade-offs. Similar to how Yan Cui discussed choreography versus orchestration, when coding in Python, I prefer using shared dependencies within a bounded context and an API-first approach when communicating across bounded contexts. Within a bounded context, I have knowledge of the implementation details and can even influence that implementation. Across bounded contexts, I don’t have that knowledge, and the implementation is ultimately obfuscated from me. As a result, the concern of double paying (within my line of business) for Lambda invocations largely goes away and becomes the cost burden of the API provider.
Conducting safe deployments across different environments
Let’s suppose you’re ready to build out the deployment pipelines and have three different environments: development, staging, and production.
[Q5] How should I set up my API Gateway endpoints for these different environments?
The first approach would be to use a single endpoint and deploy the different environments as stages within the API Gateway endpoint. This allows teams to reduce the number of AWS resources and centrally control access and security policies for that endpoint. This also allows teams to leverage built-in canary deployments for stages. While there are operator advantages to using stages for the different environments, this approach is not my preference.
The alternate approach would be to use an endpoint per environment. I like this because you can use the same template to deploy to each of the environments and inject environment-specific parameters via configuration files. This gives you deployment independence across environments and reduces the blast radius risk of a staging deployment impacting the production environment. It also allows you to deploy each of the environments to a different AWS account, and many customers have adopted multi-account strategies for their AWS deployments. That said, this does shift some of the management overhead and operator burden toward managing more resources across disparate AWS accounts.
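A hedged sketch of that approach with the AWS CDK in Python: the same stack definition is instantiated once per environment, with environment-specific values injected. All names here are hypothetical:

```python
from aws_cdk import App, Stack
from aws_cdk import aws_apigateway as apigw
from aws_cdk import aws_lambda as _lambda
from constructs import Construct

class UsersApiStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, *, env_name: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        fn = _lambda.Function(
            self, "UsersFn",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="app.handler",
            code=_lambda.Code.from_asset("src"),
            environment={"ENV_NAME": env_name},  # environment-specific config
        )
        # One dedicated API Gateway endpoint per environment.
        apigw.LambdaRestApi(self, "UsersApi", handler=fn)

app = App()
# Same template, one stack per environment; pass env=Environment(...) per
# stack to target different AWS accounts in a multi-account setup.
for env_name in ["development", "staging", "production"]:
    UsersApiStack(app, f"users-api-{env_name}", env_name=env_name)
app.synth()
```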
So again, it’s about trade-offs. Perhaps your organization has particular requirements about using built-in canary deployments. That might lead you to the first approach. Perhaps you have flexibility to build out your deployment pipelines across multiple AWS accounts. That might lead you to the latter approach. My preference is the latter, but it depends.
Conclusion
When designing serverless applications on AWS, an architect has a lot of requirements to consider when building out the architecture and structuring the software. In this post, I put together a few common questions that I hear and outlined some of the trade-offs that I often discuss with customers. Each approach has a set of pros and cons, and it’s important to understand the business and technical requirements when putting together solution options. And as Werner said, it’s important to periodically revisit architectural choices as the application scales and usage evolves. What made sense at launch might no longer make sense at the next order of magnitude of growth. Happy building!