Overview
A production-grade serverless ML inference API that classifies text sentiment as POSITIVE or NEGATIVE using DistilBERT. Deployed on AWS Lambda via Docker containers with API Gateway, demonstrating the full path from model to production-ready cloud endpoint.
Architecture
API Gateway receives HTTP requests and routes them to AWS Lambda, which runs a Docker container housing the FastAPI application. Mangum bridges FastAPI’s ASGI interface to Lambda’s event-based invocation model. The DistilBERT model (fine-tuned on SST-2) is pre-downloaded at Docker build time to minimize cold start latency.
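To make the Mangum bridging concrete, here is a toy, dependency-free sketch of what an ASGI-to-Lambda adapter does conceptually. This is not Mangum's actual implementation (which handles many more event shapes, headers, and body encodings), and the names `toy_asgi_app` and `toy_handler` are illustrative:

```python
# Toy sketch of the ASGI-to-Lambda bridge that Mangum performs.
import asyncio
import json

async def toy_asgi_app(scope, receive, send):
    # A minimal ASGI app standing in for the FastAPI application.
    assert scope["type"] == "http"
    body = json.dumps({"path": scope["path"], "label": "POSITIVE"}).encode()
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": body})

def toy_handler(event, context):
    # Translate an API Gateway event into an ASGI scope, run the app,
    # and collect the response back into the Lambda return format.
    scope = {"type": "http", "method": event.get("httpMethod", "GET"),
             "path": event.get("path", "/"), "headers": []}
    response = {}

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        if message["type"] == "http.response.start":
            response["statusCode"] = message["status"]
        elif message["type"] == "http.response.body":
            response["body"] = message["body"].decode()

    asyncio.run(toy_asgi_app(scope, receive, send))
    return response
```

Mangum plays the role of `toy_handler`: Lambda invokes it with an event dict, it drives the ASGI app, and it returns a dict API Gateway understands.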
Key Design Decisions
- Model baking: The ~260 MB model weights are downloaded and cached during docker build, eliminating cold-start download penalties on Lambda.
- ASGI adaptation: Mangum translates between FastAPI and Lambda events, allowing the same codebase to run locally (via Uvicorn) or serverlessly.
- Infrastructure as Code: AWS SAM template defines the entire stack — Lambda function, API Gateway, IAM roles — in a single declarative file.
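A SAM template for this kind of stack might look like the fragment below. This is a hedged sketch, not the project's actual template; the resource name `InferenceFunction`, the `/predict` path, and the memory/timeout values are assumptions:

```yaml
# template.yaml -- illustrative SAM fragment; names and sizes are assumptions
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  InferenceFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image        # container image deployment, not a zip
      MemorySize: 2048          # DistilBERT inference needs generous memory
      Timeout: 30
      Events:
        Predict:
          Type: Api             # implicit API Gateway integration
          Properties:
            Path: /predict
            Method: post
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: .
```

Declaring the API event under the function lets SAM create the API Gateway resources and IAM role implicitly, keeping the whole stack in one file.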
Tech Stack
- ML Model: DistilBERT (Hugging Face Transformers)
- API Framework: FastAPI
- Lambda Adapter: Mangum
- Infrastructure: AWS SAM, API Gateway, Lambda
- Containerization: Docker (AWS Lambda Python base)
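The model-baking step described above could be sketched in the Dockerfile roughly as follows. This is an assumed layout: the base image tag, the `HF_HOME` cache location, and the `app.handler` module path are illustrative, not taken from the project:

```dockerfile
# Illustrative sketch -- module path, cache dir, and model ID are assumptions
FROM public.ecr.aws/lambda/python:3.11

COPY requirements.txt .
RUN pip install -r requirements.txt

# Bake the model weights into the image at build time so cold starts
# read from the local cache instead of downloading ~260 MB.
ENV HF_HOME=/var/model_cache
RUN python -c "from transformers import pipeline; \
    pipeline('sentiment-analysis', \
             model='distilbert-base-uncased-finetuned-sst-2-english')"

COPY app.py ${LAMBDA_TASK_ROOT}/
CMD ["app.handler"]
```

Pinning `HF_HOME` matters because Lambda's filesystem is read-only at runtime outside /tmp; the cache must be written at build time to a path the runtime can read back.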