Building an MCP server as an API developer
Anthropic released MCP at the end of November 2024. It took a few months for it to catch on, but my, oh my, it feels like the community is ablaze with MCP fever these days.
In this post, I won’t go too deep on what MCP is and why it could be useful. If you’re not yet familiar, I’d recommend reading my colleague Anton’s blog post that covers MCP server fundamentals, with Peppa Pig along to help with the illustrations.
In short, it’s a protocol specification that defines a standard way to integrate additional context into your LLM-enhanced applications.
Instead, I talk through my journey of building an MCP server as a backend API developer. I walk through the process by starting with core business logic, then building a local MCP server, and then deploying it as a remote MCP server using Amazon API Gateway and AWS Lambda. It builds on Anton’s sample repository of serverless MCP servers but does so in my language of choice, Python.
To be clear, this is an exploration of how to stitch the components together. But just because you can, doesn’t mean you should.
And so, a word of caution. As with deployment of standard APIs, ensure that you leverage appropriate deployment guardrails, implement least privilege security principles, and operate with proper observability. Otherwise, you could be deploying insecure application components, unintentionally exposing data, and operating a production application in the dark.
But before digging in, first the use case.
Starting with a use case
Over the last year and a half, I’ve gotten into running to the point of listening to podcasts and reading up on how to improve endurance and aerobic capacity. In one podcast episode, I heard about one particular runner who used data available via the Strava APIs to help him analyze his runs — both to put together a training plan and to measure if said training plan is yielding results.
On hearing this, I immediately set out to do likewise, to get that data, throw it into a spreadsheet, and better understand my training progress. Then I thought, why not merge the two interests and build my own MCP server using the Strava APIs. Furthermore, because the Strava APIs require OAuth2 for accessing personal data, I figured this would be a good use case for investigating how to implement security in this application flow.
Note that I’m aware that there are already code samples that do this. I intentionally chose not to look at them and to start from scratch, as this is my typical way of learning: building from the ground up. So I started by building the base functionality as a local script.
Once I had this local functionality, I then set out to deploy this as an API.
Building the API endpoint
As a Python developer and a serverless enthusiast, I decided to stick this behind API Gateway and Lambda. Fortunately, there’s an existing pattern for doing this in Python. Below is the full stack, inclusive of not just the AWS components but also the stack within the Python code.
Let’s work through it right to left, in numerical order, to understand the full stack.
(1) The local script, written in the prior section, is the core business logic, implemented as a class that interacts with the Strava API. This code can be executed on any compute target.
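For reference, here’s a minimal sketch of what that class might look like. This is an assumption on my part: the use of the requests library and the exact shape of make_authenticated_request are inferred from how the class is called in the snippets below, and the real class also handles the OAuth2 token exchange.

import requests

class StravaOAuth:
    """Minimal wrapper around the Strava API (a sketch, not the full implementation)."""

    def __init__(self, client_id, client_secret):
        self.client_id = client_id
        self.client_secret = client_secret

    def make_authenticated_request(self, access_token, method, endpoint, params=None):
        # Every call carries the OAuth2 access token as a bearer token
        headers = {"Authorization": f"Bearer {access_token}"}
        response = requests.request(method, endpoint, headers=headers, params=params)
        response.raise_for_status()
        return response.json()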
(2) The Strava class is then integrated into FastAPI as an endpoint: /strava. I have a BaseModel for what I expect in the POST request body. Then I just instantiate an object and make an authenticated request to the Strava API.
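As a sketch, that request model might look like the following; the field names are inferred from how the endpoint code below uses them, so the actual class may differ.

from pydantic import BaseModel

class StravaRequest(BaseModel):
    # Strava application credentials and a previously obtained OAuth2 access token
    client_id: str
    client_secret: str
    access_token: str
    # Full Strava API URL for the data to retrieve
    endpoint: str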
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/strava")
async def get_activities(request: StravaRequest):
    try:
        strava = StravaOAuth(
            client_id=request.client_id,
            client_secret=request.client_secret
        )
        payload = strava.make_authenticated_request(
            request.access_token,
            'GET',
            request.endpoint
        )
        return payload
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
(3) Lambda Web Adapter (LWA) is used to proxy events from the Lambda handler to the FastAPI endpoint. LWA is what allows a web application like FastAPI to operate as a Lambda function.
(4) When using LWA, the handler is actually configured as a bash script. However, in the diagram above, one might notice that I use Python for the handler. This is because the function is still configured with a managed Python runtime.
(5) For the function deployment, all of that code is packaged up and deployed as a zip artifact.
(6) The API Gateway endpoint then targets the Lambda function on a POST request. I defined my API Gateway endpoint using an OpenAPI specification (OAS) template.
At this point, I have an endpoint deployed in AWS that can retrieve my Strava data from the Strava APIs. As part of the POST request body, I include my client id, client secret, access token, and the request endpoint for the specific data that I’d like to retrieve.
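To illustrate, a call to the deployed endpoint might look like the following sketch. The invoke URL is a placeholder, and the body mirrors the fields described above.

import requests

# Placeholder invoke URL for the deployed API Gateway stage
API_URL = "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/strava"

body = {
    "client_id": "your-client-id",
    "client_secret": "your-client-secret",
    "access_token": "your-access-token",
    # Full Strava API endpoint for the data to retrieve
    "endpoint": "https://www.strava.com/api/v3/athlete/activities",
}

response = requests.post(API_URL, json=body, timeout=30)
print(response.json())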
Building a local MCP server with stdio
Now with that core business logic, I set out to build a basic local MCP server using stdio. The quickstart tutorial made this pretty simple with FastMCP. Integrating my core business logic led to something like this (redacted for simplicity):
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("strava", stateless_http=True)

@mcp.tool()
def get_activities(client_id, client_secret, access_token, endpoint):
    try:
        strava = StravaOAuth(
            client_id=client_id,
            client_secret=client_secret
        )
        payload = strava.make_authenticated_request(
            access_token,
            'GET',
            endpoint
        )
        return payload
    except Exception as e:
        return {"error": str(e)}

if __name__ == "__main__":
    mcp.run(transport='stdio')
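Before wiring it up to a client application, the server can also be exercised directly over stdio with the SDK’s client helpers. Below is a minimal sketch; the uv command and script path are placeholders for however you run the server locally.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the stdio MCP server as a subprocess and list its tools
    params = StdioServerParameters(command="uv", args=["run", "src/server_fastmcp.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())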
Merging the API into an MCP server using Streamable HTTP
Anthropic introduced Streamable HTTP as a new transport with a number of key properties that make it possible to deploy the MCP server with API Gateway and Lambda. Two of them are as follows: 1/ a completely stateless server that does not require long-lived connections, and 2/ the ability to implement it in a plain HTTP server without requiring SSE (server-sent events).
Anthropic updated the Python SDK (1.8.0) to support the new Streamable HTTP transport and provided an example of how to hook up an MCP server with FastAPI. So following the example, I added a few lines to my MCP server code, and I was all set!
from fastapi import FastAPI
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("strava", stateless_http=True)

@mcp.tool()
def get_activities(...): ...

app = FastAPI(title="Strava", lifespan=lambda app: mcp.session_manager.run())
app.mount("/strava", mcp.streamable_http_app())

if __name__ == "__main__":
    mcp.run(transport='streamable-http')
Except, I wasn’t all set. It wasn’t clear how to connect to the Streamable HTTP server to test it.
Setting up the host and path properties for the MCP server
While the host and port have default settings, I wanted to know how to set those properties via configuration. The host and port for the server can be configured in one of two ways.
The first is by configuring those properties explicitly on the FastMCP server. The settings property for FastMCP uses **kwargs to allow for an arbitrary set of parameters. The configuration of the Streamable HTTP server then uses settings.host and settings.port to configure the uvicorn server.
mcp = FastMCP("strava", stateless_http=True, host="127.0.0.1", port=8000)

@mcp.tool()
def get_activities(...): ...

if __name__ == "__main__":
    mcp.run(transport='streamable-http')
I use argparse to run the same MCP server with different transports, so I use --mode streamable-http to execute the code path above. I run the following to start the MCP server and get the following output:
uv run src/server_fastmcp.py --mode streamable-http
INFO: Started server process [72862]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
The alternate way is to leave the defaults on the FastMCP server and instead to mount the MCP server to a FastAPI server, where the host and port settings are then configured.
mcp = FastMCP("strava", stateless_http=True)

@mcp.tool()
def get_activities(...): ...

app = FastAPI(title="Strava", lifespan=lambda app: mcp.session_manager.run())
app.mount("/strava", mcp.streamable_http_app())

if __name__ == "__main__":
    # used argparse to get the host and port parameters in args
    uvicorn.run(app, host=args.host, port=args.port, log_level="info")
I use --mode fastapi, --host 127.0.0.1, and --port 8000 to explicitly run the uvicorn server with these options. I run the following to start the MCP server and get the following output:
uv run src/server_fastmcp.py --mode fastapi --host 127.0.0.1 --port 8000
INFO: Started server process [74474]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
Both approaches appear to yield the same result. Now I have my MCP server running with the Streamable HTTP transport at the same address. I chose this latter approach with uvicorn as my default mode. Alright, I’m now ready to test.
Figuring out the correct path for the MCP server
Except when I tried to connect to http://127.0.0.1:8000/strava, I got the following messages:
INFO: 127.0.0.1:54272 - "POST /strava HTTP/1.1" 307 Temporary Redirect
INFO: 127.0.0.1:54273 - "POST /strava/ HTTP/1.1" 404 Not Found
First, it was clear that it wanted the URL to end with a /, and second, it couldn’t find the endpoint. And so began the process of reading documentation and trying to figure out how to properly connect. In retrospect, I see now that it was right there, but I just didn’t understand it at the time.
By default, SSE servers are mounted at /sse and Streamable HTTP servers are mounted at /mcp.
The eagle-eyed reader can see the confusion. The FastAPI app mounts the MCP server as /strava, but the SDK then transparently mounts the Streamable HTTP server at /mcp. It took me some trial and error testing to figure out that /strava/mcp was the correct endpoint.
After updating the URL to http://127.0.0.1:8000/strava/mcp, I got a successful connection when using FastMCP with FastAPI:
INFO: 127.0.0.1:54557 - "POST /strava/mcp HTTP/1.1" 307 Temporary Redirect
INFO: 127.0.0.1:54558 - "POST /strava/mcp/ HTTP/1.1" 200 OK
INFO: 127.0.0.1:54557 - "POST /strava/mcp HTTP/1.1" 307 Temporary Redirect
INFO: 127.0.0.1:54558 - "POST /strava/mcp/ HTTP/1.1" 202 Accepted
INFO: 127.0.0.1:54557 - "GET /strava/mcp HTTP/1.1" 307 Temporary Redirect
INFO: 127.0.0.1:54558 - "GET /strava/mcp/ HTTP/1.1" 200 OK
Note that when I run mcp.run(transport='streamable-http'), which does not mount the MCP server to FastAPI, I need to use the following URL: http://127.0.0.1:8000/mcp. FastAPI is what injects the additional /strava mount point in the path.
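As a quick sanity check of the path, the MCP Python SDK also ships a Streamable HTTP client. Below is a minimal smoke-test sketch, assuming the mounted /strava/mcp path and a recent (1.8.0+) SDK:

import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    # Connect to the locally running server at the mounted path
    async with streamablehttp_client("http://127.0.0.1:8000/strava/mcp/") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())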
Alright, so I’ve got it working locally. Now to get it running in API Gateway.
Deploying these updates to API Gateway
When I ran just the FastAPI server behind API Gateway and Lambda, I set up the OAS path simply as /strava. However, to convert the API into an MCP server using Streamable HTTP, I now learned that I need to update the OAS path to /strava/mcp. With that change, I have an API Gateway endpoint integrated with Lambda, running a stateless MCP server using the Streamable HTTP transport.
Now I have a solution similar to the one from the beginning, but with MCP injected into the software stack and MCP terminology sprinkled across the deployment.
Testing with MCP Inspector
I mentioned testing the MCP server above but glossed over how to test and use the MCP server. Anthropic provides a visual testing tool for MCP servers with MCP Inspector. To run MCP Inspector with a uv environment, the project should be initialized and installed with the CLI tools. After that, write the code and test with Inspector as follows:
# initializing the project
uv init [project-name]
cd [project-name]
source .venv/bin/activate
uv pip install "mcp[cli]"
# starting inspector
mcp dev src/server_fastmcp.py
If all went well, the terminal displays the following output.
Starting MCP inspector...
⚙️ Proxy server listening on port 6277
🔍 MCP Inspector is up and running at http://127.0.0.1:6274 🚀
So now in one terminal, the MCP server is running locally (outlined in the previous section) and in a second terminal, MCP inspector is running. Two separate processes. At this point, I opened up my browser of choice to connect to the MCP inspector endpoint above. Below is an example of what I get after connecting.
Connecting with Claude Desktop and Q CLI
Now to configure an application to connect to the MCP servers, below is the standard configuration format:
{
  "mcpServers": {
    "name": {
      "command": "/absolute/path/to/uv",
      "args": [
        "run",
        "--with",
        "mcp[cli]",
        "mcp",
        "run",
        "/absolute/path/to/server_fastmcp.py"
      ]
    }
  }
}
Note that I include the /absolute/path/to/binary|script. If the binary or script is not on the system path and the application is unable to find the script or executable, the application displays an error akin to the following: “MCP [name]: spawn [binary] ENOENT”.
The example configuration above shows how to connect to a local MCP server, executed directly as a script using uv. Applications can also run an MCP server via a container, like with GitHub’s MCP server. And of course, applications can connect to a remote MCP server with Streamable HTTP transport, using something like mcp-remote (experimental).
{
  "mcpServers": {
    "weather": {
      "command": "/absolute/path/to/uv",
      "args": [
        "run",
        "--with",
        "mcp[cli]",
        "mcp",
        "run",
        "/absolute/path/to/server_fastmcp.py"
      ]
    },
    "strava": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://your-api-id.execute-api.us-east-1.amazonaws.com/dev/strava/mcp/"
      ]
    },
    "github": {
      "command": "/opt/podman/bin/podman",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "GITHUB_PERSONAL_ACCESS_TOKEN",
        "ghcr.io/github/github-mcp-server"
      ],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "your-github-personal-access-token"
      }
    }
  }
}
And with that configuration, I can load up my application and submit prompts with my MCP servers providing additional context, as appropriate.
Thinking about APIs with MCP servers
I don’t claim credit for the seed of this insight, but it got me thinking a lot about the line between what is developed as part of the API and what gets offloaded to the LLM as the reasoning engine of an application.
My colleague Anton was doing a demo of his MCP server on Lambda and showed how he did not implement any pagination logic in his code. Instead, he provided instructions to the tool on how to make paginated calls and deferred to the LLM to make those calls.
When I initially built my Strava class for retrieving my running activities, I built in the logic for making paginated calls until all running activities were retrieved. That logic carried over through to the implementation of the MCP server. It worked fine.
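Roughly, that looked something like the following sketch of a method on the Strava class. The get_all_activities name, the per_page default, and the params argument on make_authenticated_request are my own assumptions; the page and per_page query parameters are what the Strava API expects.

def get_all_activities(self, access_token, per_page=100):
    """Fetch every activity by walking pages until Strava returns an empty page (sketch)."""
    activities, page = [], 1
    while True:
        batch = self.make_authenticated_request(
            access_token,
            'GET',
            "https://www.strava.com/api/v3/athlete/activities",
            params={"page": page, "per_page": per_page},
        )
        if not batch:
            break
        activities.extend(batch)
        page += 1
    return activities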
However, I thought about it more. I didn’t need all of the data every time. My initial use case did, as I wanted to download everything and then do some analysis of historical trends. But maybe I just want to look at the last two months of data to see my current trajectory. I could go and update my code to handle that. But I could also just provide instructions on how to get pages of data, and let the LLM make the determination of how much data to retrieve. In other words, I could offload that code and logic. So a part of my tool description included the following:
3/ This tool takes as an argument the endpoint to call, which will target the Strava API endpoint at https://www.strava.com/api/v3/athlete/activities by default.
4/ This tool can also retrieve paginated results at the same endpoint by passing the 'page' and 'per_page' parameters. Retrieving paginated results is useful when the user wants to get a large number of activities.
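With FastMCP, that description can live in the tool’s docstring, which the SDK exposes to clients as the tool description. Below is a sketch of how the tool might carry those instructions; the optional page and per_page parameters and the params handling are my own additions for illustration.

@mcp.tool()
def get_activities(client_id, client_secret, access_token, endpoint, page=None, per_page=None):
    """Retrieve activities from the Strava API.

    3/ This tool takes as an argument the endpoint to call, which will target the Strava API
       endpoint at https://www.strava.com/api/v3/athlete/activities by default.
    4/ This tool can also retrieve paginated results at the same endpoint by passing the
       'page' and 'per_page' parameters. Retrieving paginated results is useful when the
       user wants to get a large number of activities.
    """
    strava = StravaOAuth(client_id=client_id, client_secret=client_secret)
    # Only pass pagination parameters when the model supplies them
    params = {k: v for k, v in {"page": page, "per_page": per_page}.items() if v is not None}
    return strava.make_authenticated_request(access_token, 'GET', endpoint, params=params)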
With that tool description, I was able to get some pretty cool reporting.
And because I had implemented the weather server example from the quickstart tutorial, I was also able to use both MCP servers in collaboration to do some analysis on my running data from Strava, correlated with the local weather for each of those runs. That’s integration code that I did not have to write myself. That was a wow moment.
Of course, there are trade-offs to consider here. My use case is simple: a tool to summarize my current training trends and provide training recommendations based on those trends. On the other hand, an organization could have use cases that are far more stringent and sensitive, requiring far more vetting than I put into this prototype. Furthermore, writing your own code is deterministic, while leaning on the LLM as the reasoning engine in your workflow adds non-determinism, latency, and cost. If considering this route, make sure that it makes sense given these considerations.
Closing thoughts
Again, just because you can, doesn’t mean you should. This applies to MCP more broadly and even applies to running MCP using serverless. On the former, with the specification rapidly evolving, there’s still a lot around security, governance, and observability that needs to be ironed out. On the latter, the stack could work, but it depends on the requirements.
I’m still researching security implications, specifically with auth, and hope to have a follow-up post that covers some of those considerations. While OAuth2 is being implemented in the MCP SDKs, the issue is that this approach violates the separation of roles as designed in the OAuth2 specification. Anyway, more to come.
I’ve also put operational considerations in my backlog with the aim of covering observability and performance characteristics of running MCP using serverless. Stay tuned!