By now, you’ve probably got a working multi-container pipeline running through Docker Compose. You can start your services with a single command, connect a Python ETL script to a Postgres database, and even persist your data across runs. For local development, that might feel like more than enough.
But when it’s time to hand your setup off to a DevOps team or prepare it for staging, new requirements start to appear. Your containers need to be more reliable, your configuration more portable, and your build process more maintainable. These are the kinds of improvements that don’t necessarily change what your pipeline does, but they make a big difference in how safely and consistently it runs—especially in environments you don’t control.
In this tutorial, you’ll take your existing Compose-based pipeline and learn how to harden it for production use. That includes adding health checks to prevent race conditions, using multi-stage Docker builds to reduce image size and complexity, running as a non-root user to improve security, and externalizing secrets with environment files.
Each improvement will address a common pitfall in container-based workflows. By the end, your project will be something your team can safely share, deploy, and scale.
Getting Started
Before we begin, let’s clarify one thing: if you’ve completed the earlier tutorials, you should already have a working Docker Compose setup with a Python ETL script and a Postgres database. That’s what we’ll build on in this tutorial.
But if you’re jumping in fresh (or your setup doesn’t work anymore), you can still follow along. You’ll just need a few essentials in place:
- A simple app.py Python script that connects to Postgres (we won’t be changing the logic much).
- A Dockerfile that installs Python and runs the script.
- A docker-compose.yaml with two services: one for the app, one for Postgres.
You can write these from scratch, but to save time, we’ve provided a starter repo with minimal boilerplate.
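If you just need a stand-in for the script itself, a minimal sketch like the one below will do. It assumes the psycopg2-binary driver and the default credentials used throughout this series; adjust the values to match your own setup.

# app.py -- a minimal stand-in ETL script (a sketch; values are examples)
import psycopg2

# "db" is the Postgres service name from docker-compose.yaml
conn = psycopg2.connect(
    host="db",
    dbname="products",
    user="postgres",
    password="postgres",
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print("Connected to:", cur.fetchone()[0])

conn.close()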
Once you’ve got that working, you’re ready to start hardening your containerized pipeline.
Add a Health Check to the Database
At this point, your project includes two main services defined in docker-compose.yaml: a Postgres database and a Python container that runs your ETL script. The services start together, and your script connects to the database over the shared Compose network.
That setup works, but it has a hidden risk. When you run docker compose up, Docker starts each container, but it doesn’t check whether those services are actually ready. If Postgres takes a few seconds to initialize, your app might try to connect too early and either fail or hang without a clear explanation.
To fix that, you can define a health check that monitors the readiness of the Postgres container. This gives Docker an explicit test to run, rather than relying on the assumption that “started” means “ready.”
Postgres includes a built-in command called pg_isready that makes this easy to implement. You can use it inside your Compose file like this:
services:
  db:
    image: postgres:15
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "postgres"]
      interval: 5s
      timeout: 2s
      retries: 5
This setup checks whether Postgres is accepting connections. Each check times out after two seconds, and Docker retries up to five times, once every five seconds, before marking the container as unhealthy. As soon as a check succeeds, Docker marks the container as “healthy.”
To coordinate your services more reliably, you can also add a depends_on condition to your app service. This ensures your ETL script won’t even try to start until the database is ready:
app:
  build: .
  depends_on:
    db:
      condition: service_healthy
Once you’ve added both of these settings, try restarting your stack with docker compose up. You can check the health status with docker compose ps, and you should see the Postgres container marked as healthy before the app container starts running.
This one change can prevent a whole category of race conditions that show up only intermittently—exactly the kind of problem that makes pipelines brittle in production environments. Health checks make readiness explicit, so your containers aren’t just running; they’re dependable.
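Health checks solve this at the orchestration level. If you also want the application itself to tolerate a slow database, a small retry loop in app.py is a common complement. Here’s a hedged sketch; the wait_for_db helper and its timings are purely illustrative, and the credentials assume the defaults from earlier:

import time
import psycopg2

def wait_for_db(max_attempts=10, delay=2):
    """Try to connect to Postgres, sleeping between failed attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return psycopg2.connect(
                host="db", dbname="products",
                user="postgres", password="postgres",
            )
        except psycopg2.OperationalError:
            print(f"Database not ready (attempt {attempt}/{max_attempts}), retrying...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to Postgres after several attempts")

conn = wait_for_db()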
Optimize Your Dockerfile with Multi-Stage Builds
As your project evolves, your Docker image can quietly grow bloated with unnecessary files like build tools, test dependencies, and leftover cache. It’s not always obvious, especially when the image still “works.” But over time, that bulk slows things down and adds maintenance risk.
That’s why many teams use multi-stage builds: they offer a cleaner, more controlled way to produce smaller, production-ready containers. This technique lets you separate the build environment (where you install and compile everything) from the runtime environment (the lean final image that actually runs your app). Instead of trying to remove unnecessary files or shrink things manually, you define two separate stages and let Docker handle the rest.
Let’s take a quick look at what that means in practice. Here’s a simplified example of what your current Dockerfile might resemble:
FROM python:3.10-slim
WORKDIR /app
COPY app.py .
RUN pip install psycopg2-binary
CMD ["python", "app.py"]
Now here’s a version using multi-stage builds:
# Build stage
FROM python:3.10-slim AS builder
WORKDIR /app
COPY app.py .
RUN pip install --target=/tmp/deps psycopg2-binary
# Final stage
FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /app/app.py .
COPY --from=builder /tmp/deps /usr/local/lib/python3.10/site-packages/
CMD ["python", "app.py"]
The first stage installs your dependencies into a temporary location. The second stage then starts from a fresh image and copies in only what’s needed to run the app. This ensures the final image is small, clean, and free of anything related to development or testing.
To try this out, rebuild your image using a version tag so it doesn’t overwrite your original:
docker build -t etl-app:v2 .
If you want Docker Compose to use this tagged image, replace build: with image: in your Compose file:
app:
  image: etl-app:v2
This tells Compose to use the existing etl-app:v2 image instead of building a new one.
On the other hand, if you’re still actively developing and want Compose to rebuild the image each time, keep using:
app:
  build: .
In that case, you don’t need to tag anything; just run:
docker compose up --build
That will rebuild the image from your local Dockerfile automatically.
Both approaches work. During development, using build: is often more convenient because you can tweak your Dockerfile and rebuild on the fly. When you’re preparing something reproducible for handoff, though, switching to image: makes sense because it locks in a specific version of the container.
This tradeoff is one reason many teams use multiple Compose files:
- A base docker-compose.yml for production (using image:)
- A docker-compose.dev.yml for local development (with build:)
- And sometimes even a docker-compose.test.yml to replicate CI testing environments
This setup keeps your core configuration consistent while letting each environment handle containers in the way that fits best.
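As a rough illustration (the file name and values here are only examples), a development override might look like this:

# docker-compose.dev.yml -- example override for local development
services:
  app:
    build: .          # rebuild from the local Dockerfile instead of pulling a tag
    volumes:
      - ./:/app       # mount the source tree for quick iteration

You would then launch the stack with both files, for example docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build, and Compose merges the override on top of the base configuration.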
You can check the difference in size using:
docker images
Even if your current app is tiny, getting used to multi-stage builds now sets you up for smoother production work later. It separates concerns more clearly, reduces the chance of leaking dev tools into production, and gives you tighter control over what goes into your images.
Some teams even use this structure to compile code in one language and run it in another base image entirely. Others use it to enforce security guidelines by ensuring only tested, minimal files end up in deployable containers.
Whether or not the image size changes much in this case, the structure itself is the win. It gives you portability, predictability, and a cleaner build process without needing to micromanage what’s included.
A single-stage Dockerfile can be tidy on paper, but everything you install or download, even temporarily, ends up in the final image unless you carefully clean it up. Multi-stage builds give you a cleaner separation of concerns by design, which means fewer surprises, fewer manual steps, and less risk of shipping something you didn’t mean to.
Run Your App as a Non-Root User
By default, most containers, including the ones you’ve built so far, run as the root user inside the container. That’s convenient for development, but it’s risky in production. Even if an attacker can’t break out of the container, root access still gives them elevated privileges inside it. That can be enough to install software, run background processes, or exploit your infrastructure for malicious purposes, like launching DDoS attacks or mining cryptocurrency. In shared environments like Kubernetes, this kind of access is especially dangerous.
The good news is that you can fix this with just a few lines in your Dockerfile. Instead of running as root, you’ll create a dedicated user and switch to it before the container runs. In fact, some platforms require non-root users to work properly. Making the switch early can prevent frustrating errors later on, while also improving your security posture.
In the final stage of your Dockerfile, you can add:
RUN useradd -m etluser
USER etluser
The useradd -m command creates the new user along with a home directory, and USER etluser tells Docker to run the container as that account from then on. If you’ve already refactored your Dockerfile using multi-stage builds, this change goes in the final stage, after dependencies are copied in and right before the CMD instruction.
To confirm the change, you can run a one-off container that prints the current user:
docker compose run app whoami
You should see:
etluser
This confirms that your container is no longer running as root. Since this command runs in a new container and exits right after, it works even if your main app script finishes quickly.
One thing to keep in mind is file permissions. If your app writes to mounted volumes or tries to access system paths, switching away from root can lead to permission errors. You likely won’t run into that in this project, but it’s worth knowing where to look if something suddenly breaks after this change.
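If you do run into permission errors, the usual fix is to create the directories your app writes to and hand them over to the new user before switching accounts. Here’s a sketch; the /app/output path is just an example, not something this project actually uses:

# Final stage: give the non-root user ownership of the paths it writes to
RUN useradd -m etluser \
    && mkdir -p /app/output \
    && chown -R etluser:etluser /app/output
USER etluser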
This small step has a big impact. Many modern platforms—including Kubernetes and container registries like Docker Hub—warn you if your images run as root. Some environments even block them entirely. Running as a non-root user improves your pipeline’s security posture and helps future-proof it for deployment.
Externalize Configuration with .env Files
In earlier steps, you may have hardcoded your Postgres credentials and database name directly into your docker-compose.yaml. That works for quick local tests, but in a real project, it’s a security risk.
Storing secrets like usernames and passwords directly in version-controlled files is never safe. Even in private repos, those credentials can easily leak or be accidentally reused. That’s why one of the first steps toward securing your pipeline is externalizing sensitive values into environment variables.
Docker Compose makes this easy by automatically reading from a .env file in your project directory. This is where you store sensitive environment variables like database passwords, without exposing them in your versioned YAML.
Here’s what a simple .env file might look like:
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_DB=products
DB_HOST=db
Then, in your docker-compose.yaml, you reference those variables just like before:
environment:
  POSTGRES_USER: ${POSTGRES_USER}
  POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  POSTGRES_DB: ${POSTGRES_DB}
  DB_HOST: ${DB_HOST}
This change doesn’t require any new flags or commands. As long as your .env file lives in the same directory where you run docker compose up, Compose will pick it up automatically.
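The same values are also available inside the app container (as long as your app service passes them through an environment: block like the one above), which means app.py no longer needs hardcoded credentials. A minimal sketch, assuming the variable names from the .env file shown earlier:

import os
import psycopg2

# Read connection details from the environment instead of hardcoding them
conn = psycopg2.connect(
    host=os.environ.get("DB_HOST", "db"),
    dbname=os.environ["POSTGRES_DB"],
    user=os.environ["POSTGRES_USER"],
    password=os.environ["POSTGRES_PASSWORD"],
)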
But your .env file should never be committed to version control. Instead, add it to your .gitignore file to keep it private. To make your project safe and shareable, create a .env.example file with the same variable names but placeholder values:
POSTGRES_USER=your_username
POSTGRES_PASSWORD=your_password
POSTGRES_DB=your_database
DB_HOST=db
Anyone cloning your project can copy that file, rename it to .env, and customize it for their own use, without risking real secrets or overwriting someone else’s setup.
Externalizing secrets this way is one of the simplest and most important steps toward writing secure, production-ready Docker projects. It also lays the foundation for more advanced workflows down the line, like secret injection from CI/CD pipelines or cloud platforms. The more cleanly you separate config and secrets from your code, the easier your project will be to scale, deploy, and share safely.
Optional Concepts: Going Even Further
The features you’ve added so far (health checks, multi-stage builds, non-root users, and .env files) go a long way toward making your pipeline production-ready. But there are a few more Docker and Docker Compose capabilities worth knowing about, even if you don’t need to implement them right now.
Resource Constraints
One of those is resource constraints. In shared environments, or when testing pipelines in CI, you might want to restrict how much memory or CPU a container can use. Docker Compose supports this through optional fields like mem_limit and cpu_shares, which you can add to any service:
app:
  build: .
  mem_limit: 512m
  cpu_shares: 256
These aren’t enforced strictly in every environment (on Docker Desktop, for example, they’re bounded by the resources allocated to its VM), but they become important as you scale up or move into Kubernetes.
Logging
Another area to consider is logging. By default, Docker Compose captures all stdout and stderr output from each container. For most pipelines, that’s enough: you can view logs using docker compose logs or see them live in your terminal. In production, though, logs are often forwarded to a centralized service, written to a mounted volume, or parsed automatically for errors. Keeping your logs structured and focused (especially if you use Python’s logging module) makes that transition easier later on.
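For example, a small logging setup in app.py keeps output on stdout, where Docker expects it, while adding timestamps and levels. A minimal sketch:

import logging
import sys

# Log to stdout so Docker (and docker compose logs) captures everything
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("etl")

logger.info("Starting extract step")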
Kubernetes
Many of the improvements you’ve made in this tutorial map directly to concepts in Kubernetes:
- Health checks become readiness and liveness probes
- Non-root users align with container securityContext settings
- Environment variables and .env files lay the groundwork for using Secrets and ConfigMaps
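To make the mapping concrete, here’s a rough sketch of how those same ideas might appear in a Kubernetes pod spec. The resource names are illustrative, not something this tutorial has set up:

# Illustrative fragment of a Kubernetes pod spec (names are examples)
spec:
  containers:
    - name: db
      image: postgres:15
      readinessProbe:                 # plays the role of the Compose health check
        exec:
          command: ["pg_isready", "-U", "postgres"]
        periodSeconds: 5
    - name: etl-app
      image: etl-app:v2
      securityContext:
        runAsNonRoot: true            # matches the non-root user in your Dockerfile
      envFrom:
        - secretRef:
            name: etl-secrets         # replaces the local .env file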
Even if you’re not deploying to Kubernetes yet, you’re already building the right habits. These are the same tools and patterns that production-grade pipelines depend on.
You don’t need to learn everything at once, but when you’re ready to make that leap, you’ll already understand the foundations.
Wrap-Up
You started this tutorial with a Docker Compose stack that worked fine for local development. By now, you’ve made it significantly more robust without changing what your pipeline actually does. Instead, you focused on how it runs, how it’s configured, and how ready it is for the environments where it might eventually live.
To review, we:
- Added a health check to make sure services only start when they’re truly ready.
- Rewrote your Dockerfile using a multi-stage build, slimming down your image and separating build concerns from runtime needs.
- Hardened your container by running it as a non-root user and moved configuration into a .env file to make it safer and more shareable.
These are the kinds of improvements developers make every day when preparing pipelines for staging, production, or CI. Whether you’re working in Docker, Kubernetes, or a cloud platform, these patterns are part of the job.
If you’ve made it this far, you’ve done more than just containerize a data workflow: you’ve taken your first steps toward running it with confidence, consistency, and professionalism. In the next project, you’ll put all of this into practice by building a fully productionized ETL stack from scratch.