Using Amazon Bedrock with AWS Lambda
With the buzz about GenAI and AWS’s recent GA release of Amazon Bedrock, I decided to spend some time understanding how to use the service and built some basic experiences using AWS Lambda.
Starting with Python and synchronous requests
A quick search yielded a number of articles that use Python with LangChain. Being pretty comfortable with Python, I started by building a quick prototype on Lambda using LangChain and Anthropic's Claude v2 model. In my initial prototype, the function makes a synchronous request to the model and typically receives results in anywhere from 10 to 20 seconds. I added Powertools to the function to emit custom metrics but found that the function was mainly just waiting for Bedrock to return a response.
Switching to response streaming
The chatbots users are familiar with stream responses back as the text is being generated. I looked for examples that combined Lambda's response streaming with Bedrock's ability to invoke a model with response streaming but couldn't find much. I mostly found resources about each one separately but not combined. I list some of those helpful resources at the bottom of this post.
Many of the examples I found for Bedrock were written in Python. However, as of this writing, Lambda’s response streaming is only supported in Node.js or a custom runtime. I set out to put together a simple streaming prototype with Lambda that uses just the AWS SDKs (and avoids using a web framework like FastAPI or Express).
Since Node.js is not my strongest language, I started by writing a basic response streaming function that streamed back text from an array. I modularized the approaches I found in documentation or on serverlessland.com so that the function could demonstrate each approach. The doLoop() method iterates through an array, writing each line to the responseStream, sleeps for 50 ms, and ends the responseStream after iterating through the whole array. The doPipeline() method uses a pipeline to stream the responses and automatically ends the stream when the source stream is exhausted.
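Here's a minimal sketch of what those two approaches can look like. The array contents and handler wiring are illustrative; awslambda.streamifyResponse is the global wrapper the Node.js managed runtime provides for streaming handlers:

```javascript
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

const lines = ["Streaming", "text", "back", "line", "by", "line"];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Write each element manually, pausing between writes, then end the stream.
async function doLoop(responseStream) {
  for (const line of lines) {
    responseStream.write(line + "\n");
    await sleep(50);
  }
  responseStream.end();
}

// Pipe a Readable into the response stream; pipeline() ends the
// destination automatically once the source stream is exhausted.
async function doPipeline(responseStream) {
  await pipeline(Readable.from(lines.map((l) => l + "\n")), responseStream);
}

export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    await doLoop(responseStream); // or: await doPipeline(responseStream);
  }
);
```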
Next I implemented the Bedrock APIs using the AWS SDK for JavaScript v3. I started by creating a doInvokeWithAwsSdk() method, which implements the InvokeModelCommand(). Normally the output would just be returned, but because this is a streaming response function, I use the doPipeline() method to stream the synchronous model response back.
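Here's a rough sketch of that synchronous path, assuming Claude v2's prompt/completion request shape; the parameter values and prompt plumbing are illustrative:

```javascript
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({ region: "us-east-1" });

async function doInvokeWithAwsSdk(prompt, responseStream) {
  const response = await bedrock.send(
    new InvokeModelCommand({
      modelId: "anthropic.claude-v2",
      contentType: "application/json",
      accept: "application/json",
      body: JSON.stringify({
        prompt: `\n\nHuman: ${prompt}\n\nAssistant:`,
        max_tokens_to_sample: 300,
      }),
    })
  );
  // The whole completion arrives at once; parse it and stream it out
  // through the same pipeline approach as before.
  const result = JSON.parse(Buffer.from(response.body).toString("utf8"));
  await pipeline(Readable.from([result.completion]), responseStream);
}
```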
I then tried to figure out how to implement the InvokeModelWithResponseStreamCommand() but struggled to connect the model's response stream to Lambda's response stream.
I spent time trying a couple different approaches. I tried using a promise like new Promise((resolve, reject) => { bedrock.send(command, (err, data) => {}) }). I tried not using the promise but using the data element in the closure to capture the stream, like const stream = data.body or const stream = data.body.options.messageStream.options.inputStream.
None of those worked for me inside a Lambda function, so I started to wonder whether streaming responses were supported here at all; I had seen forum posts noting that not all Bedrock models support streaming. I put together a quick AWS CLI command to list the foundation models that do support response streaming, and Anthropic's Claude v2 model did indeed support it.
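I won't claim this is the exact command, but filtering on the responseStreamingSupported field in the modelSummaries output gets you there:

```bash
# List the model IDs that report support for response streaming.
aws bedrock list-foundation-models \
  --query 'modelSummaries[?responseStreamingSupported == `true`].modelId' \
  --output table
```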
I went to Anthropic’s page on using their SDK and found their “running streaming inference” example very helpful. This led to the implementation of the doStreamingWithAnthropicSdk() method. It illustrated for me how I needed to use a for await…of loop to stream chunks back. This allowed me to then complete the implementation of the doStreamingWithAwsSdk() method. And for completeness’ sake, I went back and wrote the doInvokeWithAnthropicSdk() method.
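Here's a minimal sketch of what doStreamingWithAwsSdk() can look like, again assuming Claude v2's request shape, where each streamed chunk carries a JSON document with a completion fragment:

```javascript
import {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} from "@aws-sdk/client-bedrock-runtime";

const bedrock = new BedrockRuntimeClient({ region: "us-east-1" });

async function doStreamingWithAwsSdk(prompt, responseStream) {
  const response = await bedrock.send(
    new InvokeModelWithResponseStreamCommand({
      modelId: "anthropic.claude-v2",
      contentType: "application/json",
      accept: "application/json",
      body: JSON.stringify({
        prompt: `\n\nHuman: ${prompt}\n\nAssistant:`,
        max_tokens_to_sample: 300,
      }),
    })
  );
  // response.body is an async iterable of events; write each decoded
  // chunk to Lambda's response stream as it arrives.
  for await (const event of response.body) {
    if (event.chunk?.bytes) {
      const chunk = JSON.parse(Buffer.from(event.chunk.bytes).toString("utf8"));
      responseStream.write(chunk.completion);
    }
  }
  responseStream.end();
}
```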
Testing the endpoint with curl
I exposed a function URL (furl), which is a public endpoint to invoke the function. I enabled AWS_IAM auth on the function URL to ensure that only authorized requests can invoke the furl. I created an IAM user that only had lambda:InvokeFunctionUrl permission to the function that I deployed. I’d like to switch this to assuming an IAM role to get temporary STS tokens, but have put that on the backlog for now.
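A policy along these lines grants just that permission; the account ID and function name below are placeholders, and the lambda:FunctionUrlAuthType condition scopes it to IAM-authenticated function URLs:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunctionUrl",
      "Resource": "arn:aws:lambda:us-east-1:111122223333:function:bedrock-streaming-demo",
      "Condition": {
        "StringEquals": { "lambda:FunctionUrlAuthType": "AWS_IAM" }
      }
    }
  ]
}
```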
I use curl as my client for API requests; curl can also SigV4 sign your requests with your access and secret keys. You can use two parameters to sign the request: --user AWS_ACCESS_KEY:AWS_SECRET_ACCESS_KEY --aws-sigv4 aws:amz:us-east-1:lambda. Update these with your access key, secret key, and the region where your function is deployed.
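Putting that together, a request looks roughly like this; the function URL and JSON body are placeholders for whatever your own deployment expects, and -N disables curl's output buffering so chunks print as they arrive:

```bash
curl -N \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  --aws-sigv4 "aws:amz:us-east-1:lambda" \
  -H "content-type: application/json" \
  -d '{"prompt": "Tell me about Lambda response streaming"}' \
  https://abc123.lambda-url.us-east-1.on.aws/
```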
However, using the command in this way exposes your credentials in your terminal history. Alternatively, you can configure a .netrc file with the credentials, restrict permissions on the .netrc file to only your user, and replace --user AWS_ACCESS_KEY:AWS_SECRET_ACCESS_KEY with just --netrc. This allows you to invoke your publicly exposed furls with a SigV4 signed request without exposing your credentials at the command line.
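For example, with an entry like this in ~/.netrc (the host is the placeholder function URL from above):

```
machine abc123.lambda-url.us-east-1.on.aws login <access-key-id> password <secret-access-key>
```

After a chmod 600 ~/.netrc, the earlier curl command works unchanged with --user replaced by --netrc.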
Showing streaming responses in action
Watch a quick demo showing a Lambda function streaming responses from a Bedrock InvokeModelWithResponseStream command using curl.
Conclusion
As organizations investigate how generative AI can enhance user experiences, Amazon Bedrock and AWS Lambda can help developers quickly take solutions from concept to production.
Resources
- https://serverlessland.com/patterns/lambda-streaming-ttfb-pipeline-sam
- https://stackoverflow.com/questions/77342522/how-to-keep-the-conversation-going-using-aws-bedrock-using-model-claude-v2-in-ne
- https://medium.com/@james.tosswill/a-technical-walkthrough-integrating-amazon-bedrock-with-aws-amplify-8cb2e2aadee2
- https://sankara-sabapathy.hashnode.dev/amazon-bedrock-serverless-on-demand-gen-ai-models-and-apis
- https://docs.anthropic.com/claude/docs/claude-on-amazon-bedrock
- https://everything.curl.dev/usingcurl/netrc