Dockerizing Your FastAPI ML App: From Script to Container
In Part 1, we built a FastAPI service that loads a scikit-learn model and exposes a /predict endpoint. Now we’ll package that service into a Docker container. Docker containers bundle an application’s code together with its environment, ensuring reproducible deployment. Each container carries its own application and libraries (its own app and bins/libs), all running on top of a shared Docker Engine and host OS. This layered architecture lets multiple containers coexist without interfering with one another, and avoids “it works on my machine” issues.
Writing a Production-Ready Dockerfile
To containerize our FastAPI app, create a Dockerfile in the project root. We’ll start from a minimal Python image and follow best practices to keep the image lean. For example, use the official slim variant (python:3.11-slim) as the base image. Set the working directory, copy dependency files, install with no caching, and then copy in the app code:
# Use lightweight Python base image (slim variant for smaller size)
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Copy and install dependencies (use --no-cache-dir to avoid pip cache)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt && \
    rm -rf /root/.cache/pip
# Copy the FastAPI app and model into the image
COPY . .
# (Optional) Create a non-root user for security
RUN addgroup --system appgroup && adduser --system --ingroup appgroup appuser && \
    chown -R appuser:appgroup /app
USER appuser
# Expose port and default command to run the app with Uvicorn
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
This Dockerfile does a few things:
- Base image: We use FROM python:3.11-slim, which provides a minimal Python environment. Choosing a slim, version-pinned base image reduces bloat compared to the full image.
- Working directory: WORKDIR /app isolates our app in /app. All subsequent commands run in this folder.
- Copy and install dependencies: We copy only requirements.txt first, then run pip install --no-cache-dir -r requirements.txt. Using --no-cache-dir prevents pip from storing download caches, and we immediately delete the pip cache folder (rm -rf /root/.cache/pip). This keeps the image smaller. Importantly, by copying and installing dependencies before copying the app code, we leverage Docker’s layer cache: if our code changes but requirements.txt doesn’t, Docker can reuse the cached dependency layer for faster rebuilds.
- Copy app code: After installing packages, COPY . . brings in our FastAPI script, model file, etc. (a sketch of what that code might look like follows this list). Now the container has everything it needs.
- Non-root user: For security, we create a system user (appuser) and switch to it. Running containers as a non-privileged user is a best practice in production.
- Run command: Finally, we expose port 8000 (optional but documentation-friendly) and use uvicorn to serve the app on 0.0.0.0:8000. This matches how FastAPI apps are typically run in Docker.
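For reference, here is a minimal sketch of the main.py this Dockerfile assumes: an app object served by uvicorn main:app, a scikit-learn model loaded once at startup, and a /predict endpoint taking the two features used in the test later in this guide. The model filename (model.joblib) and the schema name are placeholders; the actual Part 1 code lives in the companion repo linked at the end.
# main.py -- minimal sketch of the app the Dockerfile expects (names are placeholders)
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the trained scikit-learn model once at startup (the filename is an assumption)
model = joblib.load("model.joblib")

class PredictRequest(BaseModel):
    feature1: float
    feature2: float

@app.post("/predict")
def predict(req: PredictRequest):
    # Arrange the features in the order the model was trained on
    features = [[req.feature1, req.feature2]]
    prediction = model.predict(features)[0]
    # Cast to a plain int so the response is JSON-serializable
    return {"prediction": int(prediction)}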
Additionally, it’s wise to add a .dockerignore file to exclude unnecessary local files from the build context (e.g. caches, virtual environments, the .git folder). For example:
__pycache__
*.pyc
venv/
.env
.git
This prevents Docker from copying ephemeral or sensitive files into the container, keeping the build context clean.
Building and Tagging the Docker Image
With the Dockerfile ready, build the image using the docker build command. Specify a meaningful tag (-t) so you can easily reference the image later. For example:
docker build -t fastapi-ml-app:latest .
This tells Docker to use the current directory as the build context (.), read the Dockerfile, and name the resulting image fastapi-ml-app with the tag latest. The -t (or --tag) flag assigns this name in one step. Docker will output layer-by-layer build logs as it pulls the base image and runs each instruction. If the build succeeds, the new image will appear in the output of docker images.
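As a side note, if you ever want to drive the same build from Python (for example in a CI script), the Docker SDK for Python can do it programmatically. This is purely an optional alternative to the CLI command above, not something this guide requires:
# Optional sketch: build the image with the Docker SDK for Python instead of the CLI.
# Equivalent to `docker build -t fastapi-ml-app:latest .`; requires `pip install docker`.
import docker

client = docker.from_env()  # connect to the local Docker daemon

image, build_logs = client.images.build(
    path=".",                     # build context: current directory
    tag="fastapi-ml-app:latest",  # same name:tag as the CLI example
    rm=True,                      # clean up intermediate containers
)

# Print the layer-by-layer build output, similar to the CLI
for chunk in build_logs:
    if "stream" in chunk:
        print(chunk["stream"], end="")

print("Built image:", image.tags)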
Running and Testing the Container
You can now run the container locally, binding port 8000 of the host to port 8000 of the container. Use the docker run command with -p to publish the port and -d to run in detached mode. For example:
docker run -d -p 8000:8000 fastapi-ml-app:latest
This starts the container in the background; you can confirm it is running with docker ps. Docker maps your machine’s localhost:8000 to port 8000 inside the container (where Uvicorn is listening, as documented by EXPOSE), so you can now send requests to the FastAPI service at http://localhost:8000.
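Because the container runs detached, the app may need a moment to import its dependencies and load the model before it starts answering. If you want to script the wait, a small helper like the sketch below (using the requests library on your host, not inside the image) polls localhost:8000 until Uvicorn accepts connections:
# Sketch: wait until the containerized service accepts connections before testing it.
# Assumes the -p 8000:8000 mapping above; requires `pip install requests` on the host.
import time
import requests

URL = "http://localhost:8000"

def wait_for_service(timeout: float = 30.0) -> bool:
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # Any HTTP response (even a 404 on "/") proves the server is up
            requests.get(URL, timeout=1)
            return True
        except requests.exceptions.ConnectionError:
            time.sleep(0.5)  # still starting; retry shortly
    return False

if __name__ == "__main__":
    print("ready" if wait_for_service() else "service did not come up in time")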
To test the /predict endpoint, use curl (or any HTTP client) to send a POST request with JSON. For example, if your model expects features feature1 and feature2, run:
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{"feature1": 5.1, "feature2": 3.5}'
You should receive a JSON response with the prediction, such as:
{"prediction": 0}
(This assumes the sample model returns 0 for the given inputs.) This confirms that the containerized service is working: Uvicorn is running the FastAPI app inside Docker, loading your scikit-learn model, and returning predictions.
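If you prefer to script this smoke test instead of using curl, the same request can be sent from Python with the requests library; the payload mirrors the curl example and assumes the same two-feature model:
# Python equivalent of the curl test above (requires `pip install requests` on the host)
import requests

payload = {"feature1": 5.1, "feature2": 3.5}
resp = requests.post("http://localhost:8000/predict", json=payload, timeout=5)

resp.raise_for_status()  # fail loudly on any non-2xx status
print(resp.json())       # e.g. {"prediction": 0} for this sample input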
💡 Need the full Dockerfile and app code?
Check out the companion GitHub repo that matches this guide step-by-step:
👉 GitHub: fastapi-ml-deployment-template: https://github.com/grigorkh/fastapi-ml-deployment-template
It includes the FastAPI app, trained model, optimized Dockerfile, and everything needed to run and test locally.
Next Steps: Kubernetes Deployment
You have now successfully packaged your FastAPI ML service into a Docker container. In Part 3, we will take this container image and deploy it to a Kubernetes cluster for scalable production serving. We’ll write Kubernetes manifests (Deployment, Service, etc.), push the image to a registry, and show how to scale and manage the model service using Kubernetes. Stay tuned for a hands-on guide to turning this container into a distributed, resilient ML endpoint!