Intro to Docker Compose

As your data projects grow, they often involve more than one piece, like a database and a script. Running everything by hand can get tedious and error-prone. One service needs to start before another. A missed environment variable can break the whole flow.

Docker Compose makes this easier. It lets you define your full setup in one file and run everything with a single command.

In this tutorial, you’ll build a simple ETL (Extract, Transform, Load) workflow using Compose. It includes two services:

  1. a PostgreSQL container that stores product data,
  2. and a Python container that loads and processes that data.

You’ll learn how to define multi-container apps, connect services, and test your full stack locally, all with a single Compose command.

If you completed the previous Docker tutorial, you’ll recognize some parts of this setup, but you don’t need that tutorial to succeed here.

What is Docker Compose?

By default, Docker runs one container at a time using docker run commands, which can get long and repetitive. That works for quick tests, but as soon as you need multiple services, or just want to avoid copy/paste errors, it becomes fragile.

Docker Compose simplifies this by letting you define your setup in a single file: docker-compose.yaml. That file describes each service in your app, how they connect, and how to configure them. Once that’s in place, Compose handles the rest: it builds images, starts containers in the correct order, and connects everything over a shared network, all in one step.

Compose is just as useful for small setups, like a script and a database: the whole configuration lives in one file, which means fewer chances for error.

To see how that works in practice, we’ll start by launching a Postgres database with Compose. From there, we’ll add a second container that runs a Python script and connects to the database.

Run Postgres with Docker Compose (Single Service)

Say your team is working with product data from a new vendor. You want to spin up a local PostgreSQL database so you can start writing and testing your ETL logic before deploying it elsewhere. In this early phase, it’s common to start with minimal data, sometimes even a single test row, just to confirm your pipeline works end to end before wiring up real data sources.

In this section, we’ll spin up a Postgres database using Compose. This sets up a local environment we can reuse as we build out the rest of the pipeline.

Before adding the Python ETL script, we’ll start with just the database service. This “single service” setup gives us a clean, isolated container that persists data using a Docker volume and can be connected to using either the terminal or a GUI.

Step 1: Create a project folder

In your terminal, make a new folder for this project and move into it:

mkdir compose-demo
cd compose-demo

You’ll keep all your Compose files and scripts here.

Step 2: Write the Compose file

Inside the folder, create a new file called docker-compose.yaml and add the following content:

services:
  db:
    image: postgres:15
    container_name: local_pg
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

This defines a service named db that runs the official postgres:15 image, sets some environment variables, exposes port 5432, and uses a named volume for persistent storage.

Tip: If you already have PostgreSQL running locally, port 5432 might be in use. You can avoid conflicts by changing the host port. For example:

ports:
  - "5433:5432"

This maps port 5433 on your machine to port 5432 inside the container.
You’ll then need to connect to localhost:5433 instead of localhost:5432.

If you did the “Intro to Docker” tutorial, this configuration should look familiar. Here’s how the two approaches compare:

docker run command                      docker-compose.yaml equivalent
--name local_pg                         container_name: local_pg
-e POSTGRES_USER=postgres               environment: section
-p 5432:5432                            ports: section
-v pgdata:/var/lib/postgresql/data      volumes: section
postgres:15                             image: postgres:15

With this Compose file in place, we’ve turned a long command into something easier to maintain, and we’re one step away from launching our database.

Step 3: Start the container

From the same folder, run:

docker compose up

Docker will read the file, pull the Postgres image if needed, create the volume, and start the container. You should see logs in your terminal showing the database initializing. If you see a port conflict error, scroll back to Step 2 for how to change the host port.

You can now connect to the database just like before, either by:

  • using docker compose exec db bash to get a shell inside the container, or
  • connecting to localhost:5432 with a GUI like DBeaver or pgAdmin.

From there, you can run psql -U postgres -d products to interact with the database.

Step 4: Shut it down

When you’re done, press Ctrl+C to stop the container. This sends a signal to gracefully shut it down while keeping everything else in place, including the container and volume.

If you want to clean things up completely, run:

docker compose down

This stops and removes the container and network, but leaves the volume intact. The next time you run docker compose up, your data will still be there.

We’ve now launched a production-grade database using a single command! Next, we’ll write a Python script to connect to this database and run a simple data operation.

Write a Python ETL Script

In the earlier Docker tutorial, we loaded a CSV file into Postgres using the command line. That works well when the file is clean and the schema is known, but sometimes we need to inspect, validate, or transform the data before loading it.

This is where Python becomes useful.

In this step, we’ll write a small ETL script that connects to the Postgres container and inserts a new row. It simulates the kind of insert logic you’d run on a schedule, and keeps the focus on how Compose helps coordinate it.

We’ll start by writing and testing the script locally, then containerize it and add it to our Compose setup.

Step 1: Install Python dependencies

To connect to a PostgreSQL database from Python, we’ll use a library called psycopg2. It’s a reliable, widely-used driver that lets our script execute SQL queries, manage transactions, and handle database errors.

We’ll be using the psycopg2-binary package, which ships precompiled, so you don’t need any build tools or Postgres headers to install it.

From your terminal, run:

pip install psycopg2-binary

This installs the package locally so you can run and test your script before containerizing it. Later, you’ll include the same package inside your Docker image.
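
If you’d like to confirm the install worked before writing any code, a quick check like this (purely optional, not part of the pipeline) prints the driver’s version string:

# Optional sanity check: confirm psycopg2 is importable
import psycopg2
print(psycopg2.__version__)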

Step 2: Start building the script

Create a new file in the same folder called app.py. You’ll build your script step by step.

Start by importing the required libraries:

import psycopg2
import os

Note: We’re importing psycopg2 even though we installed psycopg2-binary. What’s going on here?
The psycopg2-binary package installs the same core psycopg2 library, just bundled with precompiled dependencies so it’s easier to install. You still import it as psycopg2 in your code because that’s the actual library name. The -binary part just refers to how it’s packaged, not how you use it.

Next, in the same app.py file, define the database connection settings. These will be read from environment variables that Docker Compose supplies when the script runs in a container.

If you’re testing locally, you can override them by setting the variables inline when running the script (we’ll see an example shortly).

Add the following lines:

db_host = os.getenv("DB_HOST", "db")
db_port = os.getenv("DB_PORT", "5432")
db_name = os.getenv("POSTGRES_DB", "products")
db_user = os.getenv("POSTGRES_USER", "postgres")
db_pass = os.getenv("POSTGRES_PASSWORD", "postgres")

Tip: If you changed the host port in your Compose file (for example, to 5433:5432), be sure to set DB_PORT=5433 when testing locally, or the connection may fail.

To override the host when testing locally:

DB_HOST=localhost python app.py

To override both the host and port:

DB_HOST=localhost DB_PORT=5433 python app.py

We use "db" as the default hostname because that’s the name of the Postgres service in your Compose file. When the pipeline runs inside Docker, Compose connects both containers to the same private network, and the db hostname will automatically resolve to the correct container.

Step 3: Insert a new row

Rather than loading a dataset from CSV or SQL, you’ll write a simple ETL operation that inserts a single new row into the vegetables table. This simulates a small “load” job like you might run on a schedule to append new data to a growing table.

Add the following code to app.py:

new_vegetable = ("Parsnips", "Fresh", 2.42, 2.19)

This tuple matches the schema of the table you’ll create in the next step.

Step 4: Connect to Postgres and insert the row

Now add the logic to connect to the database and run the insert:

try:
    conn = psycopg2.connect(
        host=db_host,
        port=int(db_port), # Cast to int since env vars are strings
        dbname=db_name,
        user=db_user,
        password=db_pass
    )
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS vegetables (
            id SERIAL PRIMARY KEY,
            name TEXT,
            form TEXT,
            retail_price NUMERIC,
            cup_equivalent_price NUMERIC
        );
    """)

    cur.execute(
        """
        INSERT INTO vegetables (name, form, retail_price, cup_equivalent_price)
        VALUES (%s, %s, %s, %s);
        """,
        new_vegetable
    )

    conn.commit()
    cur.close()
    conn.close()
    print("ETL complete. 1 row inserted.")

except Exception as e:
    print("Error during ETL:", e)

This code connects to the database using your earlier environment variable settings.
It then creates the vegetables table (if it doesn’t exist) and inserts the sample row you defined earlier.

If the table already exists, Postgres will leave it alone thanks to CREATE TABLE IF NOT EXISTS. This makes the script safe to run more than once without breaking.

Note: This script will insert a new row every time it runs, even if the row is identical. That’s expected in this example, since we’re focusing on how Compose coordinates services, not on deduplication logic. In a real ETL pipeline, you’d typically add logic to avoid duplicates using techniques like:

  • checking for existing data before insert,
  • using ON CONFLICT clauses,
  • or cleaning the table first with TRUNCATE.

We’ll cover those patterns in a future tutorial.
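
If you want a preview of the ON CONFLICT approach, here’s a minimal sketch. It assumes you’ve added a unique constraint on (name, form), which the vegetables table in this tutorial doesn’t have by default, so treat it as a pattern to adapt rather than a drop-in replacement:

# Sketch only: requires a unique constraint first, for example:
#   ALTER TABLE vegetables ADD CONSTRAINT veg_name_form_unique UNIQUE (name, form);
cur.execute(
    """
    INSERT INTO vegetables (name, form, retail_price, cup_equivalent_price)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (name, form) DO NOTHING;
    """,
    new_vegetable
)

With that constraint in place, re-running the script leaves the table unchanged instead of appending duplicate rows.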

Step 5: Run the script

If you shut down your Postgres container in the previous step, you’ll need to start it again before running the script. From your project folder, run:

docker compose up -d

The -d flag stands for “detached.” It tells Docker to start the container and return control to your terminal so you can run other commands, like testing your Python script.

Once the database is running, test your script by running:

python app.py

If everything is working, you should see output like:

ETL complete. 1 row inserted.

If you get an error like:

could not translate host name "db" to address: No such host is known

That means the script can’t find the database. Scroll back to Step 2 for how to override the hostname when testing locally.

You can verify the results by connecting to the database service and running a quick SQL query. If your Compose setup is still running in the background, run:

docker compose exec db psql -U postgres -d products

This opens a psql session inside the running container. Then try:

SELECT * FROM vegetables ORDER BY id DESC LIMIT 5;

You should see the most recent row, Parsnips, in the results. To exit the session, type \q.

In the next step, you’ll containerize this Python script, add it to your Compose setup, and run the whole ETL pipeline with a single command.

Build a Custom Docker Image for the ETL App

So far, you’ve written a Python script that runs locally and connects to a containerized Postgres database. Now you’ll containerize the script itself, so it can run anywhere, even as part of a larger pipeline.

Before we build it, let’s quickly refresh the difference between a Docker image and a Docker container. A Docker image is a blueprint for a container. It defines everything the container needs: the base operating system, installed packages, environment variables, files, and the command to run. When you run an image, Docker creates a live, isolated environment called a container.

You’ve already used prebuilt images like postgres:15. Now you’ll build your own.

Step 1: Create a Dockerfile

Inside your compose-demo folder, create a new file called Dockerfile (no file extension). Then add the following:

FROM python:3.10-slim

WORKDIR /app

COPY app.py .

RUN pip install psycopg2-binary

CMD ["python", "app.py"]

Let’s walk through what this file does:

  • FROM python:3.10-slim starts with a minimal Debian-based image that includes Python.
  • WORKDIR /app creates a working directory where your code will live.
  • COPY app.py . copies your script into that directory inside the container.
  • RUN pip install psycopg2-binary installs the same Postgres driver you used locally.
  • CMD […] sets the default command that will run when the container starts.

Step 2: Build the image

To build the image, run this from the same folder as your Dockerfile:

docker build -t etl-app .

This command:

  • Uses the current folder (.) as the build context
  • Looks for a file called Dockerfile
  • Tags the resulting image with the name etl-app

Once the build completes, check that it worked:

docker images

You should see etl-app listed in the output.

Step 3: Try running the container

Now try running your new container:

docker run etl-app

This will start the container and run the script, but it will almost certainly fail with a connection error, even if your Postgres container is still running.

That’s expected.

Right now, the Python container doesn’t know how to find the database because there’s no shared network, no environment variables, and no Compose setup. You’ll fix that in the next step by adding both services to a single Compose file.

Update the docker-compose.yaml

Earlier in the tutorial, we used Docker Compose to define and run a single service: a Postgres database. Now that our ETL app is containerized, we’ll update our existing docker-compose.yaml file to run both services — the database and the app — in a single, connected setup.

Docker Compose will handle building the app, starting both containers, connecting them over a shared network, and passing the right environment variables, all in one command. This setup makes it easy to swap out the app or run different versions just by updating the docker-compose.yaml file.

Step 1: Add the app service to your Compose file

Open docker-compose.yaml and add the following under the existing services: section:

  app:
    build: .
    depends_on:
      - db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
      DB_HOST: db

This tells Docker to:

  • Build the app using the Dockerfile in your current folder
  • Wait for the database to start before running
  • Pass in environment variables so the app can connect to the Postgres container

You don’t need to modify the db service or the volumes: section — leave those as they are.

Step 2: Run and verify the full stack

With both services defined, we can now start the full pipeline with a single command:

docker compose up --build -d

This will rebuild our app image (if needed), launch both containers in the background, and connect them over a shared network.

Once the containers are up, check the logs from your app container to verify that it ran successfully:

docker compose logs app

Look for this line:

ETL complete. 1 row inserted.

That means the app container was able to connect to the database and run its logic successfully.

If you get a database connection error, try running the command again. Compose’s depends_on ensures the database starts first, but doesn’t wait for it to be ready. In production, you’d use retry logic or a wait-for-it script to handle this more gracefully.
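
Here’s one way that retry logic could look in app.py. This is a sketch built around the script’s existing connection settings; connect_with_retry is a hypothetical helper, not something the tutorial’s script defines:

import time
import psycopg2

def connect_with_retry(retries=5, delay=2):
    # Try to connect several times, pausing between attempts,
    # so the app tolerates a database that is still starting up.
    for attempt in range(1, retries + 1):
        try:
            return psycopg2.connect(
                host=db_host,
                port=int(db_port),
                dbname=db_name,
                user=db_user,
                password=db_pass,
            )
        except psycopg2.OperationalError as exc:
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(delay)
    raise RuntimeError("Database never became available")

You’d then call conn = connect_with_retry() in place of the direct psycopg2.connect(...) call.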

To confirm the row was actually inserted into the database, open a psql session inside the running container:

docker compose exec db psql -U postgres -d products

Then run a quick SQL query:

SELECT * FROM vegetables ORDER BY id DESC LIMIT 5;

You should see your most recent row (Parsnips) in the output. Type \q to exit.

Step 3: Shut it down

When you’re done testing, stop and remove the containers with:

docker compose down

This tears down both containers but leaves your named volume (pgdata) intact so your data will still be there next time you start things up.

Clean Up and Reuse

To run your pipeline again, just restart the services:

docker compose up

Because your Compose setup uses a named volume (pgdata), your database will retain its data between runs, even after shutting everything down.

Each time you restart the pipeline, the app container will re-run the script and insert the same row unless you update the script logic. In a real pipeline, you’d typically prevent that with checks, truncation, or ON CONFLICT clauses.

You can now test, tweak, and reuse this setup as many times as needed.

Push Your App Image to Docker Hub (optional)

So far, our ETL app runs locally. But what if we want to run it on another machine, share it with a teammate, or deploy it to the cloud?

Docker makes that easy through container registries, which are places where we can store and share Docker images. The most common registry is Docker Hub, which offers free accounts and public repositories. Note that this step is optional and mostly useful if you want to experiment with sharing your image or using it on another computer.

Step 1: Create a Docker Hub account

If you don’t have one yet, go to hub.docker.com and sign up for a free account. Once you’re in, you can create a new repository (for example, etl-app).

Step 2: Tag your image

Docker images need to be tagged with your username and repository name before you can push them. For example, if your username is myname, run:

docker tag etl-app myname/etl-app:latest

This gives your local image a new name that points to your Docker Hub account.

Step 3: Push the image

Log in from your terminal:

docker login

Then push the image:

docker push myname/etl-app:latest

Once it’s uploaded, you (or anyone else) can pull and run the image from anywhere:

docker pull myname/etl-app:latest

This is especially useful if you want to:

  • Share your ETL container with collaborators
  • Use it in cloud deployments or CI pipelines
  • Back up your work in a versioned registry

If you’re not ready to create an account, you can skip this step and your image will still work locally as part of your Compose setup.

Wrap-Up and Next Steps

You’ve built and containerized a complete data pipeline using Docker Compose.

Along the way, you learned how to:

  • Build and run custom Docker images
  • Define multi-service environments with a Compose file
  • Pass environment variables and connect services
  • Use volumes for persistent storage
  • Run, inspect, and reuse your full stack with one command

This setup mirrors how real-world data pipelines are often prototyped and tested because Compose gives you a reliable, repeatable way to build and share these workflows.

Where to go next

Here are a few ideas for expanding your project:

  • Schedule your pipeline

    Use something like Airflow to run the job on a schedule.

  • Add logging or alerts

    Log ETL status to a file or send notifications if something fails.

  • Transform data or add validations

    Add more steps to your script to clean, enrich, or validate incoming data.

  • Write tests

    Validate that your script does what you expect, especially as it grows.

  • Connect to real-world data sources

    Pull from APIs or cloud storage buckets and load the results into Postgres.

Once you’re comfortable with Compose, you’ll be able to spin up production-like environments in seconds — a huge win for testing, onboarding, and deployment.
