Serving ML Models with FastAPI: A Production-Ready API in Minutes
FastAPI makes it easy to wrap a trained ML model in a REST API. In our simple setup, a client sends JSON input to a FastAPI endpoint (e.g. /predict), and the app passes it to a trained model loaded at startup and returns the prediction as JSON. FastAPI uses Python type hints and Pydantic to automatically validate incoming data. In practice this embeds the model in an API service, a common production pattern. For example, one tutorial builds a FastAPI service that accepts feature data and returns model predictions.
We’ll train a small classifier on the classic Iris dataset (four numeric flower features, three classes), save it with pickle, and write a FastAPI app that loads the model and serves /predict. FastAPI validates each request body against a Pydantic schema, so the endpoint will reject invalid inputs. In this first part, we do everything locally. In later articles we’ll see how to containerize and deploy this app.
Training and Saving a Model
First we train a simple model and save it to disk. Here we use scikit-learn’s Iris data (a “classic and very easy multi-class classification dataset”) and a Random Forest:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pickle

# Load Iris data and split into train/test sets
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Train a Random Forest classifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Save the trained model to a file
with open("iris_model.pkl", "wb") as f:
    pickle.dump(clf, f)
This code loads the Iris features (data.data) and labels (data.target), splits them 80/20, trains the classifier, and writes it to iris_model.pkl. We’ll use this pickle file in our FastAPI app. (The fixed random_state keeps the split and the model reproducible; in practice, you might retrain on all the data before deploying.)
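The script above holds out 20% of the data but never uses it. As a quick sanity check before serving the model, one might score it on that held-out split and confirm that a pickled copy predicts the same labels. The sketch below assumes the same split and seed as above, and it pickles to memory with pickle.dumps rather than to a file, purely to stay self-contained:

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same data, split, and seed as the training script
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Score the model on the 20% it never saw during training
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {accuracy:.3f}")

# Round-trip through pickle (in memory here) and compare predictions
restored = pickle.loads(pickle.dumps(clf))
same_predictions = (restored.predict(X_test) == clf.predict(X_test)).all()
print(f"Pickled model matches original: {same_predictions}")
```

If the accuracy looks unexpectedly low or the restored model disagrees with the original, that is worth investigating before putting the pickle file behind an API.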
Creating the FastAPI App
Next, create a new Python file main.py for our FastAPI service. We first import FastAPI and Pydantic’s BaseModel, and load the model from the pickle file:
from fastapi import FastAPI
from pydantic import BaseModel
import pickle

# Load the model once at startup (the with block closes the file afterwards)
with open("iris_model.pkl", "rb") as f:
    model = pickle.load(f)

app = FastAPI()
FastAPI will serve requests with this app. Now we define a request schema with Pydantic. We create a class inheriting from BaseModel that declares each expected field (using Python types):
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
Each attribute here corresponds to one numeric feature. FastAPI automatically enforces these types: if a request is missing a field or has a value that cannot be converted to a float, FastAPI/Pydantic will reject it. In FastAPI’s own words: “To declare a request body, you use Pydantic models”, and those models provide the validation automatically. This means our endpoint will only get valid, typed data.
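To see this validation in isolation, the schema can be exercised directly without starting a server. This is a minimal sketch that repeats the IrisFeatures class so it runs on its own; the bad_payload dictionary is made up for illustration. FastAPI runs the same kind of check on each request body and turns a failure into a 422 response.

```python
from pydantic import BaseModel, ValidationError


class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float


# A complete, well-typed payload is accepted
ok = IrisFeatures(sepal_length=5.1, sepal_width=3.5,
                  petal_length=1.4, petal_width=0.2)
print(ok.petal_width)

# Illustrative bad input: one non-numeric value and one missing field
bad_payload = {"sepal_length": "not a number", "sepal_width": 3.5,
               "petal_length": 1.4}
try:
    IrisFeatures(**bad_payload)
    rejected_fields = []
except ValidationError as exc:
    # Each error records which field failed in its "loc" tuple
    rejected_fields = sorted(err["loc"][0] for err in exc.errors())

print(rejected_fields)
```

Both problems are reported in a single error, so a client can fix every bad field in one pass.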
Implementing the /predict Endpoint
With the model and schema in place, we add a prediction endpoint. We use a POST request that takes an IrisFeatures object. Inside the function we extract the feature values and call the model:
@app.post("/predict")
def predict_iris(data: IrisFeatures):
    # FastAPI gives us an IrisFeatures instance with validated fields
    features = [[
        data.sepal_length,
        data.sepal_width,
        data.petal_length,
        data.petal_width
    ]]
    pred = model.predict(features)[0]
    # Return the result as JSON
    return {"prediction": int(pred)}
This defines a /predict path that accepts JSON like {"sepal_length": 5.1, "sepal_width": 3.5, …}. FastAPI parses it into an IrisFeatures object (responding with an error if it is invalid), then we call model.predict() on the feature vector. We return a JSON dict with the integer prediction (0, 1, or 2). (In a real app you might map these to class names.)
FastAPI also generates interactive OpenAPI docs at /docs, with no extra code from us. Thanks to FastAPI and Pydantic, we didn’t have to write any validation code ourselves; it’s done automatically. This keeps our code concise yet robust.
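One way to do that mapping is a small lookup list. The sketch below assumes the label order that scikit-learn’s load_iris() reports in its target_names attribute (setosa, versicolor, virginica); CLASS_NAMES and label_for are hypothetical names, not part of the app above.

```python
# Class names in the order scikit-learn's load_iris() uses for labels 0, 1, 2
CLASS_NAMES = ["setosa", "versicolor", "virginica"]


def label_for(prediction: int) -> str:
    """Translate a numeric class index into a readable species name."""
    if not 0 <= prediction < len(CLASS_NAMES):
        raise ValueError(f"unexpected class index: {prediction}")
    return CLASS_NAMES[prediction]


# Build the response body the endpoint could return for a prediction of 0
response = {"prediction": 0, "species": label_for(0)}
print(response)
```

Inside predict_iris, the return value could then carry both the index and the name, for example by adding a "species" key computed with label_for(int(pred)).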
Running and Testing Locally
Save main.py and run the app with Uvicorn (an ASGI server):
uvicorn main:app --reload
This starts the API on http://127.0.0.1:8000 by default. Uvicorn’s --reload flag restarts the server on code changes (useful in development). You should see output confirming the app is running.
Now we can test the /predict endpoint. For example, using curl:
curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}'
The API will respond with something like:
{"prediction":0}
Here 0 corresponds to Iris setosa in the training data. You could try different inputs, e.g.:
curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d '{"sepal_length": 6.7, "sepal_width": 3.1, "petal_length": 4.7, "petal_width": 1.5}'
{"prediction":1}
You should see valid JSON responses. Any malformed request (e.g. missing a field) will be rejected with a 422 error by FastAPI.
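The same requests can be sent from Python using only the standard library. The sketch below assumes the server from this article is running locally on the default port; PREDICT_URL, build_request, and predict are hypothetical names chosen for illustration.

```python
import json
import urllib.error
import urllib.request

# Assumes the server from this article is running locally on the default port
PREDICT_URL = "http://127.0.0.1:8000/predict"


def build_request(features: dict, url: str = PREDICT_URL) -> urllib.request.Request:
    """Create the same kind of POST request that the curl commands send."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: dict, url: str = PREDICT_URL) -> dict:
    """Send the features and return the status code and decoded JSON body."""
    try:
        with urllib.request.urlopen(build_request(features, url)) as resp:
            return {"status": resp.status, "body": json.load(resp)}
    except urllib.error.HTTPError as err:
        # Rejected requests (such as a 422) arrive as HTTPError with a JSON body
        return {"status": err.code, "body": json.load(err)}


# Building the request does not contact the server; predict() does
request = build_request({"sepal_length": 5.1, "sepal_width": 3.5,
                         "petal_length": 1.4, "petal_width": 0.2})
print(request.get_method(), request.full_url)
```

With the server running, calling predict() with all four features should return status 200 and the prediction, while a call such as predict({"sepal_length": 5.1}) should come back with status 422 and FastAPI’s description of the missing fields.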
🚀 Want to see the full working example?
You can find the complete source code (including model training, FastAPI app, Dockerfile, and Kubernetes manifests) here:
👉 GitHub: fastapi-ml-deployment-template: https://github.com/grigorkh/fastapi-ml-deployment-template
This repo aligns with this article series and is perfect if you’re learning how to deploy real ML models using FastAPI.
Conclusion
We’ve built a minimal FastAPI service that serves a scikit-learn model. The key steps were: training and pickling a model, writing a FastAPI app that loads the model, and declaring a Pydantic schema for the input. FastAPI then handles the HTTP interface and data validation for us. The result is a clean, production-ready /predict endpoint that any client can call.
In the next article, “Dockerizing Your FastAPI ML App”, we’ll take this same app and package it in a Docker container. That will let us deploy the model service consistently across any environment, and prepare it for scaling. Stay tuned for part 2!