Serve Machine Learning Models via REST APIs in Under 10 Minutes

Building and experimenting with machine learning models is fun, but a model only becomes useful to others once you make it available to them. To do that, you need to serve it: expose it through a web API so that other programs (or humans) can send data and get predictions back. That’s where REST APIs come in.

In this article, we’ll go from a simple machine learning model to a working, more robust API using FastAPI, a fast and developer-friendly Python web framework, in under 10 minutes. We won’t stop at a “make it run” demo; we’ll also add things like:

Validating incoming data
Logging every request
Adding background tasks to avoid slowdowns
Gracefully handling errors

Before we move to the code, here’s how the project structure will look:

ml-api/
│
├── model/
│   ├── train_model.py        # Script to train and save the model
│   └── iris_model.pkl        # Trained model file
│
├── app/
│   ├── main.py               # FastAPI app
│   └── schema.py             # Input data schema using Pydantic
│
├── requirements.txt          # All dependencies
└── README.md                 # Optional documentation

Step 1: Install What You Need

We’ll need a few Python packages for this project: FastAPI for the API, Uvicorn to run it, scikit-learn for the model, and helpers like joblib and pydantic. You can install them using pip:

pip install fastapi uvicorn scikit-learn joblib pydantic

And save your environment:

pip freeze > requirements.txt

Step 2: Train and Save a Simple Model

Let’s keep the machine learning part simple so we can focus on serving the model. We’ll use the famous Iris dataset and train a random forest classifier to predict the type of iris flower based on its petal and sepal measurements.

Here’s the training script. Create a file called train_model.py in a model/ directory:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib, os

# Load the data and hold out 20% for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train on the training split only
clf = RandomForestClassifier()
clf.fit(X_train, y_train)

# Save the trained model
os.makedirs("model", exist_ok=True)
joblib.dump(clf, "model/iris_model.pkl")
print("✅ Model saved to model/iris_model.pkl")

This script loads the data, splits it, trains the model, and saves it using joblib. The output path is relative, so run it once from the project root (ml-api/) to generate the model file:

python model/train_model.py
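
The training script holds out 20% of the data but never scores the model on it. As a quick sanity check, the sketch below trains the same kind of classifier, reports accuracy on the held-out split, and confirms that a model reloaded with joblib predicts the same labels. The classifier's random_state=42 and the temporary directory are assumptions added here for repeatability; they are not part of the original script.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# random_state on the classifier is an assumption added for repeatable results
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Score the model on data it has not seen during training
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {test_accuracy:.3f}")

# Save and reload the model in a temporary directory to check the round trip
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_model.pkl")
    joblib.dump(clf, path)
    reloaded = joblib.load(path)

same_predictions = bool((reloaded.predict(X_test) == clf.predict(X_test)).all())
print(f"Reloaded model matches original: {same_predictions}")
```

If the accuracy looks reasonable and the reloaded model matches, the saved file should be safe to serve.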

Step 3: Define What Input Your API Should Expect

Now we need to define how users will interact with your API. What should they send, and in what format?

We’ll use Pydantic, the data-validation library that FastAPI builds on, to create a schema that describes and validates incoming data. Specifically, we’ll ensure that users provide four positive float values for sepal length and width and petal length and width.

In a new file app/schema.py, add:

from pydantic import BaseModel, Field

class IrisInput(BaseModel):
    sepal_length: float = Field(..., gt=0, lt=10)
    sepal_width: float = Field(..., gt=0, lt=10)
    petal_length: float = Field(..., gt=0, lt=10)
    petal_width: float = Field(..., gt=0, lt=10)

Here, we’ve added value constraints (greater than 0 and less than 10) to keep our inputs clean and realistic.
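
To see the constraints in action outside the API, here is a minimal sketch that builds one valid and one invalid input directly. The schema is repeated so the snippet runs on its own, and the out-of-range values are made up for illustration. When the same failure happens inside FastAPI, the framework answers with a 422 response instead of calling your endpoint.

```python
from pydantic import BaseModel, Field, ValidationError


class IrisInput(BaseModel):
    sepal_length: float = Field(..., gt=0, lt=10)
    sepal_width: float = Field(..., gt=0, lt=10)
    petal_length: float = Field(..., gt=0, lt=10)
    petal_width: float = Field(..., gt=0, lt=10)


# A valid measurement passes through unchanged
valid = IrisInput(sepal_length=6.1, sepal_width=2.8,
                  petal_length=4.7, petal_width=1.2)
print(valid.sepal_length)

# Illustrative bad input: a negative length and a width above the upper bound
rejected_fields = []
try:
    IrisInput(sepal_length=-1, sepal_width=2.8,
              petal_length=4.7, petal_width=12)
except ValidationError as exc:
    # Each error entry records which field failed in its "loc" tuple
    rejected_fields = [err["loc"][0] for err in exc.errors()]
    print("Rejected fields:", rejected_fields)
```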

Step 4: Create the API

Now it’s time to build the actual API. We’ll use FastAPI to:

Load the model
Accept JSON input
Predict the class and probabilities
Log the request in the background
Return a clean JSON response

Let’s write the main API code inside app/main.py:

from fastapi import FastAPI, HTTPException, BackgroundTasks
from app.schema import IrisInput
import numpy as np, joblib, logging

# Load the model
model = joblib.load("model/iris_model.pkl")

# Set up logging
logging.basicConfig(filename="api.log", level=logging.INFO,
                    format="%(asctime)s - %(message)s")

# Create the FastAPI app
app = FastAPI()

@app.post("/predict")
def predict(input_data: IrisInput, background_tasks: BackgroundTasks):
    try:
        # Format the input as a NumPy array
        data = np.array([[input_data.sepal_length,
                          input_data.sepal_width,
                          input_data.petal_length,
                          input_data.petal_width]])
        
        # Run prediction
        pred = model.predict(data)[0]
        proba = model.predict_proba(data)[0]
        species = ["setosa", "versicolor", "virginica"][pred]

        # Log in the background so it doesn’t block response
        background_tasks.add_task(log_request, input_data, species)

        # Return prediction and probabilities
        return {
            "prediction": species,
            "class_index": int(pred),
            "probabilities": {
                "setosa": float(proba[0]),
                "versicolor": float(proba[1]),
                "virginica": float(proba[2])
            }
        }

    except Exception:
        logging.exception("Prediction failed")
        raise HTTPException(status_code=500, detail="Internal error")

# Background logging task
def log_request(data: IrisInput, prediction: str):
    # model_dump() is the Pydantic v2 method; on Pydantic v1, use data.dict()
    logging.info(f"Input: {data.model_dump()} | Prediction: {prediction}")

Let’s pause and understand what’s happening here.

We load the model once when the app starts. When a user hits the /predict endpoint with valid JSON input, we convert that into a NumPy array, pass it through the model, and return the predicted class and probabilities. If something goes wrong, we log it and return a friendly error.

Notice the BackgroundTasks part: this FastAPI feature lets us run work after the response is sent (such as writing logs), so logging does not delay the reply.

Step 5: Run Your API

To launch the server, run uvicorn from the project root:

uvicorn app.main:app --reload

Visit: http://127.0.0.1:8000/docs
You’ll see an interactive Swagger UI where you can test the API.
Try this sample input:

{
  "sepal_length": 6.1,
  "sepal_width": 2.8,
  "petal_length": 4.7,
  "petal_width": 1.2
}

Or you can use curl to make the request like this:

curl -X POST "http://127.0.0.1:8000/predict" -H "Content-Type: application/json" -d '{
  "sepal_length": 6.1,
  "sepal_width": 2.8,
  "petal_length": 4.7,
  "petal_width": 1.2
}'

Both produce a response like this (the exact probabilities may vary slightly between training runs, because the random forest is not seeded):

{
  "prediction": "versicolor",
  "class_index": 1,
  "probabilities": {
    "setosa": 0.0,
    "versicolor": 1.0,
    "virginica": 0.0
  }
}
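
If you would rather call the API from Python than from curl, a minimal client sketch using only the standard library could look like this. The helper names build_predict_request and predict are hypothetical, and the base URL assumes the local uvicorn server from this step.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://127.0.0.1:8000"):
    """Build a POST request for the /predict endpoint without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://127.0.0.1:8000", timeout=5.0):
    """Send the request and decode the JSON response (server must be running)."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample = {
    "sepal_length": 6.1,
    "sepal_width": 2.8,
    "petal_length": 4.7,
    "petal_width": 1.2,
}

request = build_predict_request(sample)
print(request.get_method(), request.full_url)

# With the server running, uncomment to send the request:
# print(predict(sample))
```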

Optional Step: Deploy Your API

You can deploy the FastAPI app on:

Render.com (zero config deployment)
Railway.app (for continuous integration)
Heroku (via Docker)

You can also extend this into a production-ready service by adding authentication (such as API keys or OAuth) to protect your endpoints, monitoring requests with Prometheus and Grafana, and moving heavier background jobs to a task queue such as Celery backed by Redis. You can also refer to my article: Step-by-Step Guide to Deploying Machine Learning Models with Docker.

Wrapping Up

That’s it, and it’s already better than most demos. What we’ve built is more than just a toy example. It:

Validates input data automatically
Returns meaningful responses with prediction confidence
Logs every request to a file (api.log)
Uses background tasks so the API stays fast and responsive
Handles failures gracefully

And all of it in under 100 lines of code.

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
