
Step-by-Step Guide to Model Training and Deployment with MLFlow, FastAPI, and Docker

Anatolii Kostin

Published on Wednesday, Sep 24, 2025

Shipping machine learning models into production requires more than just a good offline score. You need reproducible training, experiment tracking, a predictable serving layer, and a reliable deployment process. This guide demonstrates a pragmatic workflow that ties these pieces together using:

  • MLFlow for experiment tracking and artifact management
  • FastAPI for lightweight, high-performance model serving
  • Docker (and docker-compose) for packaging and deploying the service

All code is available in this GitHub repository. The repository contains a full example: model training that saves model artifacts and MLFlow run data, a FastAPI app exposing a prediction API with API-key authentication, and a Docker image to run the app. Use it as a template you can adapt to your own model and infrastructure.

Outline

  • Data and training: load data, train an XGBoost regressor, and log parameters/metrics + artifacts to MLFlow.
  • Serving: FastAPI app that loads model artifacts, enforces API key auth, and exposes a /price/predict endpoint and a /health endpoint.
  • Packaging & deployment: Dockerfile, docker-compose, healthcheck and how to run locally or in containers.
  • Verification & production tips: healthchecks, CI/CD, monitoring, security, and scaling.

1 — Data and training (reproducible + tracked)

Goal: Train a model, log parameters & metrics, and save the trained model and any pre-processing artifacts (e.g., encoder) in a reproducible way.

Relevant files in the repository:

  • app/data/train.csv — training dataset
  • app/train_model.py — training script
  • app/model/artifacts/ — where model.pkl and encoder.pkl are saved
  • app/steps/ — main functions for data loading, transformation, model training, and saving
  • mlruns/ — MLFlow local tracking store (created by MLFlow runs)

Key ideas:

  • Use MLFlow to log params, metrics, and artifacts so runs are reproducible and inspectable.
  • Save the model and encoder into model/artifacts/ (and optionally register in the MLFlow model registry); a sketch of the save step follows below.
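
For reference, a minimal sketch of what the save step (save_model in app/steps/save.py) could look like, assuming plain pickle serialization; the repository's actual implementation may differ:

# app/steps/save.py (sketch; assumes pickle serialization)
import pickle
from pathlib import Path

ARTIFACT_DIR = Path("model/artifacts")


def save_model(model, encoder):
    """Persist the trained model and fitted encoder for the serving layer."""
    ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)
    with open(ARTIFACT_DIR / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(ARTIFACT_DIR / "encoder.pkl", "wb") as f:
        pickle.dump(encoder, f)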

Model training example

# app/train_model.py
from pathlib import Path

from steps.load import load
from steps.save import save_model
from steps.train import train_xgb
from steps.transform import transform

# This ensures the script finds the data file even when run from a different working directory.
INPUT_FILE = Path(__file__).resolve().parent / "data" / "train.csv"


def main():
    """Execute the ML training pipeline."""
    try:
        print("Loading data...")
        df = load(INPUT_FILE)

        print("Transforming data...")
        df = transform(df)

        print("Training model...")
        model, encoder, predictions = train_xgb(df)

        print("Saving model...")
        save_model(model, encoder)

        print("Pipeline completed successfully!")

    except Exception as e:
        print(f"Pipeline failed with error: {e}")
        raise


if __name__ == "__main__":
    main()

Notes:

  • This snippet shows the main pipeline steps: load, transform, train_xgb, and save_model (sketches of the load and transform steps follow below).
  • Each script execution saves an MLFlow run into the app/mlruns/ folder for experiment tracking.
  • The latest model.pkl and encoder.pkl are also saved into the app/model/artifacts/ folder, where the FastAPI app picks them up.
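
For completeness, minimal sketches of the load and transform steps, assuming pandas and a simple cleanup of the Ames-style columns; the repository's implementations may differ:

# app/steps/load.py and app/steps/transform.py (sketch; actual files may differ)
import pandas as pd


def load(path):
    """Read the training CSV into a pandas DataFrame."""
    return pd.read_csv(path)


def transform(df):
    """Example cleanup: encode CentralAir as 0/1 and drop rows with missing values."""
    df = df.copy()
    df["CentralAir"] = df["CentralAir"].map({"Y": 1, "N": 0})
    return df.dropna()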

Model training script

This script takes the transformed data, trains the model, and logs parameters, metrics, and artifacts to an MLFlow run.

# app/steps/train.py
from pathlib import Path

import category_encoders as ce
import mlflow
from mlflow.models import infer_signature
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor as xgbr

ARTIFACT_DIR = Path("model/artifacts")
ARTIFACT_DIR.mkdir(parents=True, exist_ok=True)


def train_xgb(df):
    features = [
        "LotArea",
        "Neighborhood",
        "BldgType",
        "HouseStyle",
        "OverallQual",
        "OverallCond",
        "YearBuilt",
        "CentralAir",
        "GrLivArea",
        "FullBath",
        "HalfBath",
        "BedroomAbvGr",
        "TotRmsAbvGrd",
        "GarageType",
        "GarageCars",
        "MoSold",
        "YrSold",
    ]
    target = "SalePrice"

    # Split features and target value (house SalePrice)
    X_data = df[features]
    Y_data = df[target]

    # Split data on the train and test datasets
    print("Splitting data into train and test datasets...")
    x_train, x_test, y_train, y_test = train_test_split(X_data, Y_data, test_size=0.2, shuffle=True, random_state=42)

    print(f"No. of training examples: {x_train.shape[0]}")
    print(f"No. of testing examples: {x_test.shape[0]}")

    # Count Encoding
    print("Applying Count Encoding...")
    encoder = ce.CountEncoder(return_df=True)
    x_train_loo = encoder.fit_transform(x_train, y_train, normalize=True)
    x_test_loo = encoder.transform(x_test)

    # Create a new MLflow Experiment
    mlflow.set_experiment("House Price Prediction")

    with mlflow.start_run():
        params = {
            "objective": "reg:squarederror",
            "n_jobs": 8,
            "colsample_bytree": 0.7,
            "tree_method": "exact",
            "learning_rate": 0.05,
            "max_depth": 9,
            "n_estimators": 1000,
            "random_state": 42,
        }

        mlflow.log_params(params)

        # XGboost Regressor
        print("Training XGBoost Regressor...")
        model = xgbr(**params)
        model.fit(x_train_loo, y_train)

        print("Model training completed.")
        print("Evaluating model on test data...")
        predictions = model.predict(x_test_loo)

        r2 = r2_score(y_test, predictions)
        mse = mean_squared_error(y_test, predictions)
        mae = mean_absolute_error(y_test, predictions)

        mlflow.log_metric("r2_score", r2)
        mlflow.log_metric("mean_squared_error", mse)
        mlflow.log_metric("mean_absolute_error", mae)

        print("Model R^2 Score on test data", (r2 * 100), "%")
        print("Model Mean Square Error on test data", mse)
        print("Model Mean Absolute Error on test data", mae)

        # Infer the model signature
        signature = infer_signature(x_train_loo, predictions)

        # Log the model, which inherits the parameters and metrics
        model_info = mlflow.xgboost.log_model(
            xgb_model=model,
            name="house_price_model",
            signature=signature,
            input_example=x_train_loo[:1],
            registered_model_name="tracking-house-price-model",
        )

        # Set a tag that we can use to remind ourselves what this model was for
        mlflow.set_logged_model_tags(model_info.model_id, {"Training Info": "Basic XGBR model with Count Encoding"})

        # Note: model.pkl and encoder.pkl must already exist in ARTIFACT_DIR;
        # in this pipeline they are written by save_model, so on a fresh
        # checkout save the artifacts before logging them.
        mlflow.log_artifact(ARTIFACT_DIR / "model.pkl", artifact_path="artifacts")
        mlflow.log_artifact(ARTIFACT_DIR / "encoder.pkl", artifact_path="artifacts")

    return model, encoder, predictions

Notes:

  • This snippet shows core MLFlow actions: start_run(), log_params(), log_metric(), and log_artifact().
  • In this repo, training logic lives under app/steps/train.py.
  • Training parameters are saved in the MLFlow tracking store alongside the model metrics, supporting experiment tracking and reproducibility.

MLFlow experiments

To access the MLFlow experiments UI, run the following command in the app/ folder:

mlflow ui --port 5000

Then open http://127.0.0.1:5000 in the browser.

Navigate to Experiments -> House Price Prediction to see the runs logged under that experiment.
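
Runs can also be queried programmatically, which is handy for comparing metrics without opening the UI; the metric column names below match the names logged in train.py:

import mlflow

# Returns a pandas DataFrame with one row per run of the experiment
runs = mlflow.search_runs(experiment_names=["House Price Prediction"])
print(runs[["run_id", "metrics.r2_score", "metrics.mean_absolute_error"]])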



2 — Serving with FastAPI (load artifacts + protect endpoints)

Goal: Provide an API to accept JSON feature vectors, apply preprocessing, and return predictions. Protect endpoints with a simple API key header for quick auth.

Relevant files in the repository:

  • app/main.py — FastAPI application entrypoint
  • app/steps/predict.py — prediction helper that applies encoder and model
  • Dockerfile — how the image is built
  • docker-compose.yaml — how containers are wired

Essential serving pattern:

  • On startup: load encoder.pkl and model.pkl from model/artifacts.
  • Provide a /health endpoint used by container healthchecks.
  • Provide a /price/predict endpoint that requires header X-API-Key.
  • Validate input, transform it with the saved encoder, and return the prediction as JSON.

FastAPI app:

# app/main.py
import os
from secrets import compare_digest

from fastapi import Depends, FastAPI, HTTPException, Security, status
from fastapi.security import APIKeyHeader
from fastapi.security.api_key import APIKey
from pydantic import BaseModel
from steps.predict import predict

# read API_KEY env variable
API_KEY = os.getenv("API_KEY")
if not API_KEY:
    raise ValueError("API_KEY environment variable is not set")

# Get API key from header
api_key_header = APIKeyHeader(name="X-API-Key")


# API key authentication method
def api_key_auth(api_key_header: str = Security(api_key_header)):
    if not compare_digest(api_key_header, API_KEY):
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid API key")


app = FastAPI()


class HouseDetails(BaseModel):
    LotArea: int
    Neighborhood: str
    BldgType: str
    HouseStyle: str
    OverallQual: int
    OverallCond: int
    YearBuilt: int
    CentralAir: int
    GrLivArea: int
    FullBath: int
    HalfBath: int
    BedroomAbvGr: int
    TotRmsAbvGrd: int
    GarageType: str
    GarageCars: int
    MoSold: int
    YrSold: int


class PredictedPrice(BaseModel):
    price: float


@app.get("/health", status_code=status.HTTP_200_OK)
async def health_check():
    return {"status": "ok"}


@app.post("/price/predict", response_model=PredictedPrice, status_code=status.HTTP_200_OK)
def get_prediction(payload: HouseDetails, api_key: APIKey = Depends(api_key_auth)):
    try:
        prediction = predict(dict(payload))
    except Exception as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f"Prediction failed: {str(e)}")
    else:
        if not prediction:
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail="Prediction failed: empty result."
            )

        return {"price": prediction}

Notes:

  • Reuse the preprocessing code in app/steps/predict.py to avoid a mismatch between training and serving (a sketch follows these notes).
  • Pydantic models give request validation and nicely formatted docs at /docs.
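
A minimal sketch of what app/steps/predict.py could look like: load the pickled artifacts once at import time, then apply the encoder before calling the model (the repository's implementation may differ):

# app/steps/predict.py (sketch; actual file may differ)
import pickle
from pathlib import Path

import pandas as pd

ARTIFACT_DIR = Path("model/artifacts")

# Load artifacts once at import time so each request only pays for inference
with open(ARTIFACT_DIR / "model.pkl", "rb") as f:
    model = pickle.load(f)
with open(ARTIFACT_DIR / "encoder.pkl", "rb") as f:
    encoder = pickle.load(f)


def predict(features: dict) -> float:
    """Apply the saved encoder and return a single price prediction."""
    df = pd.DataFrame([features])
    encoded = encoder.transform(df)
    return float(model.predict(encoded)[0])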

API auth:

  • The repo uses a basic API key: the X-API-Key request header is compared against the API_KEY environment variable using a constant-time check (secrets.compare_digest).

Example curl request:

curl -X 'POST' \
  'http://localhost:3001/price/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: <apikey>' \
  -d '{
        "LotArea": 8450,
        "Neighborhood": "CollgCr",
        "BldgType": "1Fam",
        "HouseStyle": "2Story",
        "OverallQual": 7,
        "OverallCond": 5,
        "YearBuilt": 2003,
        "CentralAir": 1,
        "GrLivArea": 1710,
        "FullBath": 2,
        "HalfBath": 1,
        "BedroomAbvGr": 3,
        "TotRmsAbvGrd": 8,
        "GarageType": "Attchd",
        "GarageCars": 2,
        "MoSold": 2,
        "YrSold": 2008
    }'

Expected response:

{
  "price": 207887.015625
}

3 — Packaging with Docker and docker-compose

Goal: Build a reproducible image that contains the app and the runtime dependencies and exposes the FastAPI service.

Important files:

  • Dockerfile
  • docker-compose.yaml
  • requirements.txt
  • .env (can be created from .env.sample; a minimal example follows this list)
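
A minimal .env could contain just the API key (the value here is illustrative):

API_KEY=replace-with-a-long-random-string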

Key Dockerfile excerpt (present in the project):

FROM python:3.13.7-slim

# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser

WORKDIR /app

COPY requirements.txt .
RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt

COPY ./app .

RUN chown -R appuser:appuser /app
USER appuser

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

EXPOSE 3001

CMD ["python", "-m", "uvicorn", "main:app", "--host", "0.0.0.0", "--port", "3001"]

docker-compose excerpt:

services:
  ml-api:
    platform: linux/amd64
    image: ml-home-price-predict:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 3001:3001
    env_file: ".env"
    healthcheck:
      test: [ "CMD", "python", "-c", "import urllib.request, sys; sys.exit(0 if urllib.request.urlopen('http://localhost:3001/health').status == 200 else 1)" ]
      interval: 60s
      timeout: 30s
      retries: 3
      start_period: 60s

How to build and run locally:

  • Build image:
docker compose build
  • Run:
docker compose up

Notes:

  • The Dockerfile copies the app folder into the image. Ensure model/artifacts/* exists in the app folder before building (or mount a volume at runtime) so the container can load the model.
  • Use an .env file referenced by docker-compose.yaml to provide API_KEY and other secrets. Do not commit real secrets to the repo.
  • For production Docker deployments, provide the API_KEY environment variable through the secret-management mechanism your hosting provider offers.

4 — Verification & production tips

Smoke tests

  • After container starts, verify:
    • GET http://localhost:3001/health returns 200
    • POST http://localhost:3001/price/predict with valid API key returns a numeric price
  • Use /docs to manually test in the browser, or script the checks as in the sketch below.
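
These checks are easy to script; a small Python smoke test (a hypothetical helper, not part of the repository) could look like:

# smoke_test.py (hypothetical helper, not part of the repository)
import json
import os
import urllib.request

BASE = "http://localhost:3001"
API_KEY = os.environ["API_KEY"]

# 1. Health endpoint should return 200
assert urllib.request.urlopen(f"{BASE}/health").status == 200

# 2. Prediction endpoint should return a numeric price
payload = json.dumps({
    "LotArea": 8450, "Neighborhood": "CollgCr", "BldgType": "1Fam",
    "HouseStyle": "2Story", "OverallQual": 7, "OverallCond": 5,
    "YearBuilt": 2003, "CentralAir": 1, "GrLivArea": 1710, "FullBath": 2,
    "HalfBath": 1, "BedroomAbvGr": 3, "TotRmsAbvGrd": 8,
    "GarageType": "Attchd", "GarageCars": 2, "MoSold": 2, "YrSold": 2008,
}).encode()
req = urllib.request.Request(
    f"{BASE}/price/predict",
    data=payload,
    headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
assert isinstance(body["price"], float)
print("Smoke tests passed:", body)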

CI/CD suggestions

  • Build image in CI, run unit tests and linters, then push the image to a registry.
  • Automate training + artifact promotion: when new data arrives or scheduled retraining runs, store artifacts in a central artifact store and trigger a deployment pipeline that pulls the new model.

Monitoring & observability

  • Log prediction latency, input sizes, request rates, and error rates (see the middleware sketch after this list).
  • Store prediction request summaries to detect data drift and trigger retraining.
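
As a starting point, a simple latency-logging middleware can be added to app/main.py (illustrative, not part of the repository):

# Illustrative addition to app/main.py: log method, path, status, and latency
import logging
import time

from fastapi import Request

logger = logging.getLogger("ml-api")


@app.middleware("http")
async def log_latency(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("%s %s -> %s in %.1f ms", request.method, request.url.path, response.status_code, elapsed_ms)
    return response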

Security

  • Use secrets managers for API keys; rotate keys regularly.
  • Add rate limiting and authentication (JWT/OAuth2) for public APIs.

Scalability

  • Scale horizontally with multiple containers behind a load balancer.
  • Warm up containers or use a dedicated model server for large models.

5 — Quick checklist & commands

Build image:

docker compose build

Run containers:

docker compose up

Test health:

curl http://localhost:3001/health

Test predict (replace <apikey>):

curl -X POST 'http://localhost:3001/price/predict' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'X-API-Key: <apikey>' \
  -d '{"LotArea":8450, "Neighborhood":"CollgCr", "BldgType":"1Fam", "HouseStyle":"2Story", "OverallQual":7, "OverallCond":5, "YearBuilt":2003, "CentralAir":1, "GrLivArea":1710, "FullBath":2, "HalfBath":1, "BedroomAbvGr":3, "TotRmsAbvGrd":8, "GarageType":"Attchd", "GarageCars":2, "MoSold":2, "YrSold":2008}'

6 — Next steps and improvements

  • Replace ad-hoc preprocessing with a single scikit-learn Pipeline that is saved and loaded alongside the model, so training and serving use the exact same transforms (a sketch follows this list).
  • Add unit tests for prediction logic and wire them into CI.
  • Use a managed MLFlow tracking server or shared backend store for team collaboration.
  • Add model versioning and an automated promotion process from staging to production.
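
For the first item, a hedged sketch of combining the encoder and regressor into one scikit-learn Pipeline, assuming x_train and y_train are the raw, un-encoded splits from the training step:

# Sketch: one Pipeline for preprocessing + model, saved as a single artifact
import category_encoders as ce
import joblib
from sklearn.pipeline import Pipeline
from xgboost import XGBRegressor

pipeline = Pipeline(
    steps=[
        ("encode", ce.CountEncoder(normalize=True)),
        ("model", XGBRegressor(objective="reg:squarederror", random_state=42)),
    ]
)
pipeline.fit(x_train, y_train)  # raw features; the Pipeline applies the encoder

# Training and serving now share exactly the same transforms
joblib.dump(pipeline, "model/artifacts/pipeline.pkl")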