Serve Machine Learning Models via REST APIs in Under 10 Minutes



 

Building machine learning models and experimenting with new ideas is fun, but a model only becomes useful to others once you make it available to them. To do that, you need to serve it: expose it through a web API so that other programs (or humans) can send data and get predictions back. That’s where REST APIs come in.

In this article, you will learn how to go from a simple machine learning model to a robust REST API using FastAPI, one of Python’s fastest and most developer-friendly web frameworks, in just under 10 minutes. We won’t stop at a “make it run” demo; we will also add things like:

  • Validating incoming data
  • Logging every request
  • Adding background tasks to avoid slowdowns
  • Gracefully handling errors

Before we move to the code, here is how the project structure is going to look:

ml-api/
│
├── model/
│   ├── train_model.py        # Script to train and save the model
│   └── iris_model.pkl        # Trained model file
│
├── app/
│   ├── main.py               # FastAPI app
│   └── schema.py             # Input data schema using Pydantic
│
├── requirements.txt          # All dependencies
└── README.md                 # Optional documentation

 

Step 1: Install What You Need

 
We’ll need a few Python packages for this project: FastAPI for the API, Uvicorn as the server that runs it, scikit-learn for the model, and a few helpers like joblib and pydantic. You can install them using pip:

pip install fastapi uvicorn scikit-learn joblib pydantic

 

And save your environment:

pip freeze > requirements.txt
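
If you want to double-check what actually got installed before moving on, a small script like the following can help. It is only a sketch: the helper name installed_versions is made up for this example, and it relies on the standard library’s importlib.metadata (Python 3.8+) rather than anything from this project.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip in the install command above
REQUIRED = ["fastapi", "uvicorn", "scikit-learn", "joblib", "pydantic"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as missing should be installed before you continue with the next step.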

 

Step 2: Train and Save a Simple Model

 
Let’s keep the machine learning part simple so we can focus on serving the model. We’ll use the famous Iris dataset and train a random forest classifier to predict the type of iris flower based on its petal and sepal measurements.

Here’s the training script. Create a file called train_model.py in a model/ directory:

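from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import joblib, os

# Load the data and hold out 20% of it for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train on the training split only, then check the held-out accuracy
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")

# Save the trained model (paths are relative to the project root)
os.makedirs("model", exist_ok=True)
joblib.dump(clf, "model/iris_model.pkl")
print("✅ Model saved to model/iris_model.pkl")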

 

This script loads the data, splits it, trains the model, and saves it using joblib. Because the paths are relative, run it once from the project root (ml-api/) to generate the model file:

python model/train_model.py
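
Before serving a model, it is worth confirming that the saved file loads and predicts sensibly. The sketch below is one way to do that and is not part of the project itself: the helper evaluate_saved_model is a hypothetical name, and to stay self-contained the example trains a throwaway model into a temporary directory instead of reading model/iris_model.pkl. It assumes the same split settings (test_size=0.2, random_state=42) as the training script.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def evaluate_saved_model(path, X_test, y_test):
    """Load a joblib-serialized classifier and return its accuracy on held-out data."""
    model = joblib.load(path)
    return model.score(X_test, y_test)


# Recreate the held-out split used during training
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train and save a throwaway model so the example runs on its own
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "iris_model.pkl")
    clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    joblib.dump(clf, model_path)
    accuracy = evaluate_saved_model(model_path, X_test, y_test)

print(f"Held-out accuracy: {accuracy:.2f}")
```

To check the real artifact, you could call evaluate_saved_model("model/iris_model.pkl", X_test, y_test) from the project root instead.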

 

Step 3: Define What Input Your API Should Expect

 
Now we need to define how users will interact with your API. What should they send, and in what format?

We’ll use Pydantic, the data validation library that FastAPI is built on, to create a schema that describes and validates incoming data. Specifically, we’ll ensure that users provide four positive float values: sepal length and width, and petal length and width.

In a new file app/schema.py, add:

from pydantic import BaseModel, Field

class IrisInput(BaseModel):
    sepal_length: float = Field(..., gt=0, lt=10)
    sepal_width: float = Field(..., gt=0, lt=10)
    petal_length: float = Field(..., gt=0, lt=10)
    petal_width: float = Field(..., gt=0, lt=10)

 

Here, we’ve added value constraints (greater than 0 and less than 10) to keep our inputs clean and realistic.
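
To see the constraints in action outside of FastAPI, you can exercise the schema directly. The following is a minimal sketch: it repeats the IrisInput class so that it runs on its own, and the helper invalid_fields and the two sample payloads are made up for illustration. When the same failures happen inside a FastAPI endpoint, the framework responds with a 422 status code and a list of the offending fields.

```python
from pydantic import BaseModel, Field, ValidationError


class IrisInput(BaseModel):
    sepal_length: float = Field(..., gt=0, lt=10)
    sepal_width: float = Field(..., gt=0, lt=10)
    petal_length: float = Field(..., gt=0, lt=10)
    petal_width: float = Field(..., gt=0, lt=10)


def invalid_fields(payload):
    """Return the names of the fields that fail validation (empty set if valid)."""
    try:
        IrisInput(**payload)
    except ValidationError as exc:
        return {error["loc"][0] for error in exc.errors()}
    return set()


good = {"sepal_length": 6.1, "sepal_width": 2.8,
        "petal_length": 4.7, "petal_width": 1.2}
# One value is out of range and petal_width is missing entirely
bad = {"sepal_length": -1, "sepal_width": 2.8, "petal_length": 4.7}

print("good payload errors:", invalid_fields(good))
print("bad payload errors:", invalid_fields(bad))
```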

 

Step 4: Create the API

 
Now it’s time to build the actual API. We’ll use FastAPI to:

  • Load the model
  • Accept JSON input
  • Predict the class and probabilities
  • Log the request in the background
  • Return a clean JSON response

Let’s write the main API code inside app/main.py:

from fastapi import FastAPI, HTTPException, BackgroundTasks
from app.schema import IrisInput
import numpy as np, joblib, logging

# Load the model
model = joblib.load("model/iris_model.pkl")

# Set up logging
logging.basicConfig(filename="api.log", level=logging.INFO,
                    format="%(asctime)s - %(message)s")

# Create the FastAPI app
app = FastAPI()

@app.post("/predict")
def predict(input_data: IrisInput, background_tasks: BackgroundTasks):
    try:
        # Format the input as a NumPy array
        data = np.array([[input_data.sepal_length,
                          input_data.sepal_width,
                          input_data.petal_length,
                          input_data.petal_width]])
        
        # Run prediction
        pred = model.predict(data)[0]
        proba = model.predict_proba(data)[0]
        species = ["setosa", "versicolor", "virginica"][pred]

        # Log in the background so it doesn’t block response
        background_tasks.add_task(log_request, input_data, species)

        # Return prediction and probabilities
        return {
            "prediction": species,
            "class_index": int(pred),
            "probabilities": {
                "setosa": float(proba[0]),
                "versicolor": float(proba[1]),
                "virginica": float(proba[2])
            }
        }

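    except Exception:
        logging.exception("Prediction failed")
        raise HTTPException(status_code=500, detail="Internal error")

# Background logging task
def log_request(data: IrisInput, prediction: str):
    # model_dump() is the Pydantic v2 name for the deprecated .dict()
    payload = data.model_dump() if hasattr(data, "model_dump") else data.dict()
    logging.info(f"Input: {payload} | Prediction: {prediction}")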

 

Let’s pause and understand what’s happening here.

We load the model once when the app starts. When a user hits the /predict endpoint with valid JSON input, we convert that into a NumPy array, pass it through the model, and return the predicted class and probabilities. If something goes wrong, we log it and return a friendly error.

Notice the BackgroundTasks part — this is a neat FastAPI feature that lets us do work after the response is sent (like saving logs). That keeps the API responsive and avoids delays.

 

Step 5: Run Your API

 
To launch the server, use uvicorn like this:

uvicorn app.main:app --reload

 

Visit: http://127.0.0.1:8000/docs
You’ll see an interactive Swagger UI where you can test the API.
Try this sample input:

{
  "sepal_length": 6.1,
  "sepal_width": 2.8,
  "petal_length": 4.7,
  "petal_width": 1.2
}

 

Or you can use curl to make the request like this:

curl -X POST "http://127.0.0.1:8000/predict" -H "Content-Type: application/json" -d '{
  "sepal_length": 6.1,
  "sepal_width": 2.8,
  "petal_length": 4.7,
  "petal_width": 1.2
}'
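
The same request can also be sent from Python, which is handy when another program needs to consume the API. The sketch below uses only the standard library; the function names build_predict_request and predict are hypothetical, and the base URL assumes the local server started in Step 5.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://127.0.0.1:8000"):
    """Build a JSON POST request for the /predict endpoint without sending it."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://127.0.0.1:8000", timeout=5.0):
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_predict_request(features, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample = {"sepal_length": 6.1, "sepal_width": 2.8,
          "petal_length": 4.7, "petal_width": 1.2}

request = build_predict_request(sample)
print(request.get_method(), request.full_url)

# With the server running, uncomment the next line to make the actual call:
# print(predict(sample))
```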

 

Both approaches return a response like this (the exact probabilities depend on the trained model):

{
  "prediction": "versicolor",
  "class_index": 1,
  "probabilities": {
    "setosa": 0.0,
    "versicolor": 1.0,
    "virginica": 0.0
  }
}

 

Optional Step: Deploy Your API

 
You can deploy the FastAPI app on:

  • Render.com (zero config deployment)
  • Railway.app (for continuous integration)
  • Heroku (via Docker)

You can also extend this into a production-ready service by adding authentication (such as API keys or OAuth) to protect your endpoints, monitoring requests with Prometheus and Grafana, and using Redis or Celery for background job queues. For more on deployment, see my article, Step-by-Step Guide to Deploying Machine Learning Models with Docker.

 

Wrapping Up

 
That’s it, and it’s already better than most demos. What we’ve built is more than just a toy example. It:

  • Validates input data automatically
  • Returns meaningful responses with prediction confidence
  • Logs every request to a file (api.log)
  • Uses background tasks so the API stays fast and responsive
  • Handles failures gracefully

And all of it in under 100 lines of code.
 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
