
Step-by-Step Guide to Model Training and Deployment with MLFlow, FastAPI, and Docker

Anatolii Kostin

Published on Wednesday, Sep 24, 2025

Shipping machine learning models into production requires more than just a good offline score. You need reproducible training, experiment tracking, a predictable serving layer, and a reliable deployment process. This guide demonstrates a pragmatic workflow that ties these pieces together using:

  • MLFlow for experiment tracking and artifact management
  • FastAPI for lightweight, high-performance model serving
  • Docker (and docker-compose) for packaging and deploying the service

All code is available in this GitHub repository. The repository contains a full example: model training that saves model artifacts and MLFlow run data, a FastAPI app exposing a prediction API with API-key authentication, and a Docker image to run the app. Use it as a template you can adapt to your own model and infrastructure.

Outline

  • Data and training: load data, train an XGBoost regressor, and log parameters/metrics + artifacts to MLFlow.
  • Serving: FastAPI app that loads model artifacts, enforces API key auth, and exposes a /price/predict endpoint and a /health endpoint.
  • Packaging & deployment: Dockerfile, docker-compose, healthcheck and how to run locally or in containers.
  • Verification & production tips: healthchecks, CI/CD, monitoring, security, and scaling.

1 — Data and training (reproducible + tracked)

Goal: Train a model, log parameters & metrics, and save the trained model and any pre-processing artifacts (e.g., encoder) in a reproducible way.

Relevant files in the repository:

  • app/data/train.csv — training dataset
  • app/train_model.py — training script
  • app/model/artifacts/ — where model.pkl and encoder.pkl are saved
  • app/steps/ — main functions for data loading, transformation, model training, and saving
  • mlruns/ — MLFlow local tracking store (created by MLFlow runs)

Key ideas:

  • Use MLFlow to log params, metrics, and artifacts so runs are reproducible and inspectable.
  • Save the model and encoder into model/artifacts/ (and optionally register in the MLFlow model registry); a sketch of the save step follows below.
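
For reference, a minimal sketch of what the save step (save_model in app/steps/save.py) could look like, assuming plain pickle serialization; the repository's actual implementation may differ:

# app/steps/save.py (sketch; assumes pickle serialization)
import pickle
from pathlib import Path

ARTIFACT_DIR = Path("model/artifacts")


def save_model(model, encoder):
    """Persist the trained model and fitted encoder for the serving layer."""
    ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
    with open(ARTIFACT_DIR / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(ARTIFACT_DIR / "encoder.pkl", "wb") as f:
        pickle.dump(encoder, f)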

Model training example

# app/train_model.py
from pathlib import Path

from steps.load import load
from steps.save import save_model
from steps.train import train_xgb
from steps.transform import transform

# This ensures the script finds the data file even when run from a different working directory.
INPUT_FILE = Path(__file__).resolve().parent / "data" / "train.csv"


def main():
    """Execute the ML training pipeline."""
    try:
        print("Loading data...")
        df = load(INPUT_FILE)

        print("Transforming data...")
        df = transform(df)

        print("Training model...")
        model, encoder, predictions = train_xgb(df)

        print("Saving model...")
        save_model(model, encoder)

        print("Pipeline completed successfully!")

    except Exception as e:
        print(f"Pipeline failed with error: {e}")
        raise


if __name__ == "__main__":
    main()

Notes:

  • This snippet shows the main pipeline steps: load, transform, train_xgb, and save_model (sketches of the load and transform steps follow below).
  • Each script execution saves an MLFlow run into the app/mlruns/ folder for experiment tracking.
  • The latest model.pkl and encoder.pkl are also saved into the app/model/artifacts/ folder, where the FastAPI app picks them up.
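
For completeness, minimal sketches of the load and transform steps, assuming pandas and a simple cleanup of the Ames-style columns; the repository's implementations may differ:

# app/steps/load.py and app/steps/transform.py (sketch; actual files may differ)
import pandas as pd


def load(path):
    """Read the training CSV into a pandas DataFrame."""
    return pd.read_csv(path)


def transform(df):
    """Example cleanup: encode CentralAir as 0/1 and drop rows with missing values."""
    df = df.copy()
    df["CentralAir"] = df["CentralAir"].map({"Y": 1, "N": 0})
    return df.dropna()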

Model training script

This script takes the transformed data, trains the model, and logs parameters, metrics, and artifacts to an MLFlow run.

# app/steps/train.py
from pathlib import Path

import category_encoders as ce
import mlflow
from mlflow.models import infer_signature
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor as xgbr

ARTIFACT_DIR = Path("model/artifacts")
ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)


def train_xgb(df):
    features = [
        "LotArea",
        "Neighborhood",
        "BldgType",
        "HouseStyle",
        "OverallQual",
        "OverallCond",
        "YearBuilt",
        "CentralAir",
        "GrLivArea",
        "FullBath",
        "HalfBath",
        "BedroomAbvGr",
        "TotRmsAbvGrd",
        "GarageType",
        "GarageCars",
        "MoSold",
        "YrSold",
    ]
    target = "SalePrice"

    # Split features and target value (house SalePrice)
    X_data = df[features]
    Y_data = df[target]

    # Split data on the train and test datasets
    print("Splitting data into train and test datasets...")
    x_train, x_test, y_train, y_test = train_test_split(X_data, Y_data, test_size=0.2, shuffle=True, random_state=42)

    print(f"No. of training examples: {x_train.shape[0]}")
    print(f"No. of testing examples: {x_test.shape[0]}")

    # Count Encoding
    print("Applying Count Encoding...")
    encoder = ce.CountEncoder(return_df=True)
    x_train_loo = encoder.fit_transform(x_train, y_train, normalize=True)
    x_test_loo = encoder.transform(x_test)

    # Create a new MLflow Experiment
    mlflow.set_experiment("House Price Prediction")

    with mlflow.start_run():
        params = {
            "objective": "reg:squarederror",
            "n_jobs": 8,
            "colsample_bytree": 0.7,
            "tree_method": "exact",
            "learning_rate": 0.05,
            "max_depth": 9,
            "n_estimators": 1000,
            "random_state": 42,
        }

        mlflow.log_params(params)

        # XGboost Regressor
        print("Training XGBoost Regressor...")
        model = xgbr(**params)
        model.fit(x_train_loo, y_train)

        print("Model training completed.")
        print("Evaluating model on test data...")
        predictions = model.predict(x_test_loo)

        r2 = r2_score(y_test, predictions)
        mse = mean_squared_error(y_test, predictions)
        mae = mean_absolute_error(y_test, predictions)

        mlflow.log_metric("r2_score", r2)
        mlflow.log_metric("mean_squared_error", mse)
        mlflow.log_metric("mean_absolute_error", mae)

        print("Model R^2 Score on test data", (r2 * 100), "%")
        print("Model Mean Square Error on test data", mse)
        print("Model Mean Absolute Error on test data", mae)

        # Infer the model signature
        signature = infer_signature(x_train_loo, predictions)

        # Log the model, which inherits the parameters and metrics
        model_info = mlflow.xgboost.log_model(
            xgb_model=model,
            name="house_price_model",
            signature=signature,
            input_example=x_train_loo[:1],
            registered_model_name="tracking-house-price-model",
        )

        # Set a tag that we can use to remind ourselves what this model was for
        mlflow.set_logged_model_tags(model_info.model_id, {"Training Info": "Basic XGBR model with Count Encoding"})

        # Note: model.pkl and encoder.pkl must already exist in ARTIFACT_DIR;
        # in this pipeline they are written by save_model, so on a fresh
        # checkout save the artifacts before logging them.
        mlflow.log_artifact(ARTIFACT_DIR / "model.pkl", artifact_path="artifacts")
        mlflow.log_artifact(ARTIFACT_DIR / "encoder.pkl", artifact_path="artifacts")

    return model, encoder, predictions

Notes:

  • This snippet shows core MLFlow actions: start_run(), log_params(), log_metric(), and log_artifact().
  • In this repo, training logic lives under app/steps/train.py.
  • Training parameters are saved in the MLFlow tracking store alongside the model metrics, supporting experiment tracking and reproducibility.

MLFlow experiments

To access the MLFlow experiments UI, run the following command in the app/ folder:

mlflow ui --port 5000

Then open http://127.0.0.1:5000 in the browser.

Navigate to Experiments -> House Price Prediction to see the runs logged under that experiment.
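
Runs can also be queried programmatically, which is handy for comparing metrics without opening the UI; the metric column names below match the names logged in train.py:

import mlflow

# Returns a pandas DataFrame with one row per run of the experiment
runs = mlflow.search_runs(experiment_names=["House Price Prediction"])
print(runs[["run_id", "metrics.r2_score", "metrics.mean_absolute_error"]])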



2 — Serving with FastAPI (load artifacts + protect endpoints)

Goal: Provide an API to accept JSON feature vectors, apply preprocessing, and return predictions. Protect endpoints with a simple API key header for quick auth.

Relevant files in the repository:

  • app/main.py — FastAPI application entrypoint
  • app/steps/predict.py — prediction helper that applies encoder and model
  • Dockerfile — how the image is built
  • docker-compose.yaml — how containers are wired

Essential serving pattern:

  • On startup: load encoder.pkl and model.pkl from model/artifacts.
  • Provide a /health endpoint used by container healthchecks.
  • Provide a /price/predict endpoint that requires header X-API-Key.
  • Validate input, transform it with the saved encoder, and return the prediction as JSON.

FastAPI app:

# app/main.py
import os
from secrets import compare_digest

from fastapi import Depends, FastAPI, HTTPException, Security, status
from fastapi.security import APIKeyHeader
from fastapi.security.api_key import APIKey
from pydantic import BaseModel
from steps.predict import predict

# read API_KEY env variable
API_KEY = os.getenv("API_KEY")
if not API_KEY:
    raise ValueError("API_KEY environment variable is not set")

# Get API key from header
api_key_header = APIKeyHeader(name="X-API-Key")


# API key authentication method
def api_key_auth(api_key_header: str = Security(api_key_header)):
    if not compare_digest(api_key_header, API_KEY):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key")


app = FastAPI()


class HouseDetails(BaseModel):
    LotArea: int
    Neighborhood: str
    BldgType: str
    HouseStyle: str
    OverallQual: int
    OverallCond: int
    YearBuilt: int
    CentralAir: int
    GrLivArea: int
    FullBath: int
    HalfBath: int
    BedroomAbvGr: int
    TotRmsAbvGrd: int
    GarageType: str
    GarageCars: int
    MoSold: int
    YrSold: int


class PredictedPrice(BaseModel):
    price: float


@app.get("/health", status_code=status.HTTP_200_OK)
async def health_check():
    return {"status": "ok"}


@app.post("/price/predict", response_model=PredictedPrice, status_code=status.HTTP_200_OK)
def get_prediction(payload: HouseDetails, api_key: APIKey = Depends(api_key_auth)):
    try:
        prediction = predict(dict(payload))
    except Exception as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f"Prediction failed: {str(e)}")
    else:
        if not prediction:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="Prediction failed: empty result."
            )

        return {"price": prediction}

Notes:

  • Reuse the preprocessing code in app/steps/predict.py to avoid a mismatch between training and serving (a sketch follows these notes).
  • Pydantic models give request validation and nicely formatted docs at /docs.
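
A minimal sketch of what app/steps/predict.py could look like: load the pickled artifacts once at import time, then apply the encoder before calling the model (the repository's implementation may differ):

# app/steps/predict.py (sketch; actual file may differ)
import pickle
from pathlib import Path

import pandas as pd

ARTIFACT_DIR = Path("model/artifacts")

# Load artifacts once at import time so each request only pays for inference
with open(ARTIFACT_DIR / "model.pkl", "rb") as f:
    model = pickle.load(f)
with open(ARTIFACT_DIR / "encoder.pkl", "rb") as f:
    encoder = pickle.load(f)


def predict(features: dict) -> float:
    """Apply the saved encoder and return a single price prediction."""
    df = pd.DataFrame([features])
    encoded = encoder.transform(df)
    return float(model.predict(encoded)[0])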

API auth:

  • The repo uses a basic API key: the X-API-Key request header is compared against the API_KEY environment variable using a constant-time check (secrets.compare_digest).

Example curl request:

curl -X 'POST' \
  'http://localhost:3001/price/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: <apikey>' \
  -d '{
        "LotArea": 8450,
        "Neighborhood": "CollgCr",
        "BldgType": "1Fam",
        "HouseStyle": "2Story",
        "OverallQual": 7,
        "OverallCond": 5,
        "YearBuilt": 2003,
        "CentralAir": 1,
        "GrLivArea": 1710,
        "FullBath": 2,
        "HalfBath": 1,
        "BedroomAbvGr": 3,
        "TotRmsAbvGrd": 8,
        "GarageType": "Attchd",
        "GarageCars": 2,
        "MoSold": 2,
        "YrSold": 2008
    }'

Expected response:

{
  "price": 207887.015625
}

3 — Packaging with Docker and docker-compose

Goal: Build a reproducible image that contains the app and the runtime dependencies and exposes the FastAPI service.

Important files:

  • Dockerfile
  • docker-compose.yaml
  • requirements.txt
  • .env (can be created from .env.sample; a minimal example follows this list)
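
A minimal .env could contain just the API key (the value here is illustrative):

API_KEY=replace-with-a-long-random-string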

Key Dockerfile excerpt (present in the project):

FROM python:3.13.7-slim

# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser

WORKDIR /app

COPY requirements.txt .
RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt

COPY ./app .

RUN chown -R appuser:appuser /app
USER appuser

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

EXPOSE 3001

CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "3001"]

docker-compose excerpt:

services:
  ml-api:
    platform: linux/amd64
    image: ml-home-price-predict:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 3001:3001
    env_file: ".env"
    healthcheck:
      test: [ "CMD", "python", "-c", "import urllib.request, sys; sys.exit(0 if urllib.request.urlopen('http://localhost:3001/health').status == 200 else 1)" ]
      interval: 60s
      timeout: 30s
      retries: 3
      start_period: 60s

How to build and run locally:

  • Build image:
docker compose build
  • Run:
docker compose up

Notes:

  • The Dockerfile copies the app folder into the image. Ensure model/artifacts/* exists in the app folder before building (or mount a volume at runtime) so the container can load the model.
  • Use an .env file referenced by docker-compose.yaml to provide API_KEY and other secrets. Do not commit real secrets to the repo.
  • For production Docker deployments, provide the API_KEY environment variable through the secret-management mechanism your hosting provider offers.

4 — Verification & production tips

Smoke tests

  • After container starts, verify:
    • GET http://localhost:3001/health returns 200
    • POST http://localhost:3001/price/predict with valid API key returns a numeric price
  • Use /docs to manually test in the browser, or script the checks as in the sketch below.
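
These checks are easy to script; a small Python smoke test (a hypothetical helper, not part of the repository) could look like:

# smoke_test.py (hypothetical helper, not part of the repository)
import json
import os
import urllib.request

BASE = "http://localhost:3001"
API_KEY = os.environ["API_KEY"]

# 1. Health endpoint should return 200
assert urllib.request.urlopen(f"{BASE}/health").status == 200

# 2. Prediction endpoint should return a numeric price
payload = json.dumps({
    "LotArea": 8450, "Neighborhood": "CollgCr", "BldgType": "1Fam",
    "HouseStyle": "2Story", "OverallQual": 7, "OverallCond": 5,
    "YearBuilt": 2003, "CentralAir": 1, "GrLivArea": 1710, "FullBath": 2,
    "HalfBath": 1, "BedroomAbvGr": 3, "TotRmsAbvGrd": 8,
    "GarageType": "Attchd", "GarageCars": 2, "MoSold": 2, "YrSold": 2008,
}).encode()
req = urllib.request.Request(
    f"{BASE}/price/predict",
    data=payload,
    headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
assert isinstance(body["price"], float)
print("Smoke tests passed:", body)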

CI/CD suggestions

  • Build image in CI, run unit tests and linters, then push the image to a registry.
  • Automate training + artifact promotion: when new data arrives or scheduled retraining runs, store artifacts in a central artifact store and trigger a deployment pipeline that pulls the new model.

Monitoring & observability

  • Log prediction latency, input sizes, request rates, and error rates (see the middleware sketch after this list).
  • Store prediction request summaries to detect data drift and trigger retraining.
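
As a starting point, a simple latency-logging middleware can be added to app/main.py (illustrative, not part of the repository):

# Illustrative addition to app/main.py: log method, path, status, and latency
import logging
import time

from fastapi import Request

logger = logging.getLogger("ml-api")


@app.middleware("http")
async def log_latency(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s %s -> %s in %.1f ms", request.method, request.url.path, response.status_code, elapsed_ms)
    return response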

Security

  • Use secrets managers for API keys; rotate keys regularly.
  • Add rate limiting and authentication (JWT/OAuth2) for public APIs.

Scalability

  • Scale horizontally with multiple containers behind a load balancer.
  • Warm up containers or use a dedicated model server for large models.

5 — Quick checklist & commands

Build image:

docker compose build

Run containers:

docker compose up

Test health:

curl http://localhost:3001/health

Test predict (replace <apikey>):

curl -X POST 'http://localhost:3001/price/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: <apikey>' \
  -d '{"LotArea":8450, "Neighborhood":"CollgCr", "BldgType":"1Fam", "HouseStyle":"2Story", "OverallQual":7, "OverallCond":5, "YearBuilt":2003, "CentralAir":1, "GrLivArea":1710, "FullBath":2, "HalfBath":1, "BedroomAbvGr":3, "TotRmsAbvGrd":8, "GarageType":"Attchd", "GarageCars":2, "MoSold":2, "YrSold":2008}'

6 — Next steps and improvements

  • Replace ad-hoc preprocessing with a single scikit-learn Pipeline that is saved and loaded alongside the model, so training and serving use the exact same transforms (a sketch follows this list).
  • Add unit tests for prediction logic and wire them into CI.
  • Use a managed MLFlow tracking server or shared backend store for team collaboration.
  • Add model versioning and an automated promotion process from staging to production.
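
For the first item, a hedged sketch of combining the encoder and regressor into one scikit-learn Pipeline, assuming x_train and y_train are the raw, un-encoded splits from the training step:

# Sketch: one Pipeline for preprocessing + model, saved as a single artifact
import category_encoders as ce
import joblib
from sklearn.pipeline import Pipeline
from xgboost import XGBRegressor

pipeline = Pipeline(
    steps=[
        ("encode", ce.CountEncoder(normalize=True)),
        ("model", XGBRegressor(objective="reg:squarederror", random_state=42)),
    ]
)
pipeline.fit(x_train, y_train)  # raw features; the Pipeline applies the encoder

# Training and serving now share exactly the same transforms
joblib.dump(pipeline, "model/artifacts/pipeline.pkl")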