
End-to-End AWS RDS Setup with Bastion Host Using Terraform

Introduction

In any data pipeline, data sources — especially databases — are the backbone. To simulate a realistic pipeline, I needed a secure, reliable database environment to serve as the source of truth for downstream ETL jobs.

Rather than provisioning this manually, I chose to automate everything using Terraform, aligning with modern data engineering and DevOps best practices. This not only saved time but also ensured the environment could be easily recreated, scaled, or destroyed with a single command — just like in production. And if you’re working on the AWS Free Tier, this is even more important — automation ensures you can clean up everything without forgetting a resource that might generate costs.

Prerequisites

To follow along with this project, you’ll need the following tools and setup:

  • Terraform installed (https://developer.hashicorp.com/terraform/install)
  • AWS CLI & IAM Setup
    • Install the AWS CLI
    • Create an IAM user with programmatic access:
      • Attach the AdministratorAccess policy (or create a custom policy limited to the resources created in this project)
      • Download the Access Key ID and Secret Access Key
    • Configure the AWS CLI
  • An AWS Key Pair
    Required to SSH into the bastion host. You can create one in the AWS Console under EC2 > Key Pairs.
  • A Unix-based environment (Linux/macOS, or WSL for Windows)
    This ensures compatibility with the shell script and Terraform commands.

Getting Started: What We’re Building

Let’s walk through how to build a secure and automated AWS database setup using Terraform.

Infrastructure Overview

This project provisions a complete, production-style AWS environment using Terraform. The following resources will be created:

Networking

  • A custom VPC with a CIDR block (10.0.0.0/16)
  • Two private subnets in different Availability Zones (for the RDS instance)
  • One public subnet (for the bastion host)
  • Internet Gateway and Route Tables for public subnet routing
  • A DB Subnet Group for multi-AZ RDS deployment
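
A condensed sketch of how the network module might declare these resources (resource names such as aws_subnet.private_1 are illustrative; the variables match those passed in from the root configuration shown later):

# modules/network/network.tf (abbreviated sketch)
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr                  # e.g., 10.0.0.0/16

  tags = { Name = "${var.project_name}-vpc" }
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidr
  availability_zone       = var.availability_zone_1
  map_public_ip_on_launch = true             # the bastion needs a public IP
}

resource "aws_subnet" "private_1" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidr_1
  availability_zone = var.availability_zone_1
}

resource "aws_subnet" "private_2" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidr_2
  availability_zone = var.availability_zone_2
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"                 # send public traffic through the IGW
    gateway_id = aws_internet_gateway.igw.id
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}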

Compute

  • A bastion EC2 instance in the public subnet
    • Used as a jump host to securely reach the database in the private subnets
    • Provisioned with a custom security group allowing only port 22 (SSH) access
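
Inside the bastion module, the EC2 instance might be declared along these lines (a sketch; the security group resource name is illustrative, while the input variables match the module call shown later):

# modules/bastion/compute.tf (abbreviated sketch)
resource "aws_instance" "bastion" {
  ami                    = var.ami_id
  instance_type          = var.instance_type          # e.g., t2.micro on the Free Tier
  subnet_id              = var.public_subnet_1
  key_name               = var.key_name               # the AWS key pair created earlier
  vpc_security_group_ids = [aws_security_group.bastion.id]

  tags = { Name = "${var.project_name}-bastion" }
}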

Database

  • A MySQL RDS instance
    • Deployed in private subnets (not accessible from the public internet)
    • Configured with a dedicated security group that allows access only from the bastion host
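
The RDS instance itself might be provisioned roughly as follows (a sketch; the instance class, storage size, and resource names are illustrative, and the subnet group and security group are defined elsewhere in the module):

# modules/rds/rds.tf (abbreviated sketch)
resource "aws_db_instance" "mysql" {
  identifier             = "${var.project_name}-mysql"
  engine                 = "mysql"
  instance_class         = "db.t3.micro"              # Free Tier-eligible class
  allocated_storage      = 20
  db_name                = var.db_name
  username               = var.db_username
  password               = var.db_password
  db_subnet_group_name   = aws_db_subnet_group.rds.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  publicly_accessible    = false                      # never expose the DB publicly
  skip_final_snapshot    = true                       # convenient for disposable dev environments
}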

Security

  • Security groups:
    • Bastion SG: allows inbound SSH (port 22) from your IP
    • RDS SG: allows inbound MySQL (port 3306) from the bastion’s SG
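
Expressed in Terraform, these two rules might look like this (a sketch; the bastion SG lives in the bastion module and the RDS SG in the rds module, and the source CIDR is a placeholder you should replace with your own IP):

# Bastion SG: inbound SSH from a trusted IP only
resource "aws_security_group" "bastion" {
  name   = "${var.project_name}-bastion-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"]   # placeholder: restrict to your own IP
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"                  # allow all outbound traffic
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS SG: inbound MySQL only from the bastion's security group
resource "aws_security_group" "rds" {
  name   = "${var.project_name}-rds-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    security_groups = [var.bastion_sg_id]   # SG reference, not a CIDR block
  }
}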

Automation

  • A setup script (setup.sh) that:
    • Exports the required Terraform variables (TF_VAR_*)

Modular Design With Terraform

I broke the infrastructure into modules: network, bastion, and rds. This allows each component to be reused, scaled, and tested independently.

The following diagram illustrates how Terraform understands and structures the dependencies between different components of the infrastructure, where each node represents a resource or module.

This visualization helps verify that:

  • Resources are properly connected (e.g., the RDS instance depends on private subnets),
  • Modules are isolated yet interoperable (e.g., network, bastion, and rds),
  • There are no circular dependencies.

Terraform Dependency Graph (Image by Author)

To maintain this modular configuration, I structured the project as shown below, with a short explanation of each component's role within the setup.

.
├── data
│   └── mysqlsampledatabase.sql       # Sample dataset to be imported into the RDS database
├── scripts
│   └── setup.sh                      # Bash script to export environment variables (TF_VAR_*), fetch dynamic values, and upload Glue scripts (optional)
└── terraform
    ├── modules                       # Reusable infrastructure modules
    │   ├── bastion
    │   │   ├── compute.tf            # Defines EC2 instance configuration for the Bastion host
    │   │   ├── network.tf            # Uses data sources to reference existing public subnet and VPC (does not create them)
    │   │   ├── outputs.tf            # Outputs Bastion host public IP address
    │   │   └── variables.tf          # Input variables required by the Bastion module (AMI ID, key pair name, etc.)
    │   ├── network
    │   │   ├── network.tf            # Provisions VPC, public/private subnets, Internet gateway, and route tables
    │   │   ├── outputs.tf            # Exposes VPC ID, subnet IDs, and route table IDs for downstream modules
    │   │   └── variables.tf          # Input variables like CIDR blocks and availability zones
    │   └── rds
    │       ├── network.tf            # Defines DB subnet group using private subnet IDs
    │       ├── outputs.tf            # Outputs RDS endpoint and security group for other modules to consume
    │       ├── rds.tf                # Provisions a MySQL RDS instance inside private subnets
    │       └── variables.tf          # Input variables such as DB name, username, password, and instance size
    └── rds-bastion                   # Root Terraform configuration
        ├── backend.tf                # Configures the Terraform backend (e.g., local or remote state file location)
        ├── main.tf                   # Top-level orchestrator file that connects and wires up all modules
        ├── outputs.tf                # Consolidates and re-exports outputs from the modules (e.g., Bastion IP, DB endpoint)
        ├── provider.tf               # Defines the AWS provider and required version
        └── variables.tf              # Project-wide variables passed to modules and referenced across files
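
For reference, the backend.tf and provider.tf files in the root configuration can be as small as the following (a minimal sketch using a local state file; the provider version is an assumption, and a remote backend such as S3 works the same way):

# terraform/rds-bastion/backend.tf
terraform {
  backend "local" {
    path = "terraform.tfstate"    # swap for an S3 backend in shared setups
  }
}

# terraform/rds-bastion/provider.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"          # version constraint is an assumption
    }
  }
}

provider "aws" {
  region = var.region
}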

With the modular structure in place, the main.tf file located in the rds-bastion directory acts as the orchestrator. It ties together the core components: the network, the RDS database, and the bastion host. Each module is invoked with its required inputs, most of which are defined in variables.tf or passed via environment variables (TF_VAR_*).

module "network" {
  source                = "../modules/network"
  region                = var.region
  project_name          = var.project_name
  availability_zone_1   = var.availability_zone_1
  availability_zone_2   = var.availability_zone_2
  vpc_cidr              = var.vpc_cidr
  public_subnet_cidr    = var.public_subnet_cidr
  private_subnet_cidr_1 = var.private_subnet_cidr_1
  private_subnet_cidr_2 = var.private_subnet_cidr_2
}


module "bastion" {
  source = "../modules/bastion"
  region              = var.region
  vpc_id              = module.network.vpc_id
  public_subnet_1     = module.network.public_subnet_id
  availability_zone_1 = var.availability_zone_1
  project_name        = var.project_name

  instance_type = var.instance_type
  key_name      = var.key_name
  ami_id        = var.ami_id

}


module "rds" {
  source              = "../modules/rds"
  region              = var.region
  project_name        = var.project_name
  vpc_id              = module.network.vpc_id
  private_subnet_1    = module.network.private_subnet_id_1
  private_subnet_2    = module.network.private_subnet_id_2
  availability_zone_1 = var.availability_zone_1
  availability_zone_2 = var.availability_zone_2

  db_name       = var.db_name
  db_username   = var.db_username
  db_password   = var.db_password
  bastion_sg_id = module.bastion.bastion_sg_id
}

In this modular setup, each infrastructure component is loosely coupled but connected through well-defined inputs and outputs.

For example, after provisioning the VPC and subnets in the network module, I retrieve their IDs using its outputs, and pass them as input variables to other modules like rds and bastion. This avoids hardcoding and enables Terraform to dynamically resolve dependencies and build the dependency graph internally.

In some cases (such as within the bastion module), I also use data sources to reference existing resources created by previous modules, instead of recreating or duplicating them.

The dependency between modules relies on the correct definition and exposure of outputs from previously created modules. These outputs are then passed as input variables to dependent modules, enabling Terraform to build an internal dependency graph and orchestrate the correct creation order.

For example, the network module exposes the VPC ID and subnet IDs using outputs.tf. These values are then consumed by downstream modules like rds and bastion through the main.tf file of the root configuration.

Below is how this works in practice:

Inside modules/network/outputs.tf:

output "vpc_id" {
  description = "ID of the VPC"
  value       = aws_vpc.main.id
}

Inside modules/bastion/variables.tf:

variable "vpc_id" {
  description = "ID of the VPC"
  type        = string
}

Inside modules/bastion/network.tf:

data "aws_vpc" "main" {
  id = var.vpc_id
}

To provision the RDS instance, I created two private subnets in different Availability Zones, as AWS requires at least two subnets in separate AZs to set up a DB subnet group.

Although I met this requirement for correct configuration, I disabled Multi-AZ deployment during RDS creation to stay within the AWS Free Tier limits and avoid additional costs. This setup still simulates a production-grade network layout while remaining cost-effective for development and testing.
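
A sketch of the corresponding subnet group, with Multi-AZ explicitly disabled on the instance (names are illustrative; the subnet IDs come from the network module's outputs):

# modules/rds/network.tf (abbreviated sketch)
resource "aws_db_subnet_group" "rds" {
  name       = "${var.project_name}-db-subnet-group"
  subnet_ids = [var.private_subnet_1, var.private_subnet_2]   # two subnets in different AZs, as AWS requires
}

# In modules/rds/rds.tf, Multi-AZ stays off to remain within the Free Tier:
#   multi_az = false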

Deployment Workflow

With all the modules properly wired through inputs and outputs, and the infrastructure logic encapsulated in reusable blocks, the next step is to automate the provisioning process. Instead of manually passing variables each time, a helper script setup.sh is used to export necessary environment variables (TF_VAR_*).

Once the setup script is sourced, deploying the infrastructure becomes as simple as running a few Terraform commands.

source scripts/setup.sh
cd terraform/rds-bastion
terraform init
terraform plan
terraform apply

To streamline the Terraform deployment process, I created a helper script (setup.sh) that automatically exports required environment variables using the TF_VAR_ naming convention. Terraform automatically picks up variables prefixed with TF_VAR_, so this approach avoids hardcoding values in .tf files or requiring manual input every time.

#!/bin/bash
set -e

# Fill in your project name and AWS region before sourcing this script
export de_project=""
export AWS_DEFAULT_REGION=""

# Define the variables to manage (empty values are placeholders to fill in)
declare -A TF_VARS=(
  ["TF_VAR_project_name"]="$de_project"
  ["TF_VAR_region"]="$AWS_DEFAULT_REGION"
  ["TF_VAR_availability_zone_1"]="us-east-1a"
  ["TF_VAR_availability_zone_2"]="us-east-1b"

  ["TF_VAR_ami_id"]=""
  ["TF_VAR_key_name"]=""
  ["TF_VAR_db_username"]=""
  ["TF_VAR_db_password"]=""
  ["TF_VAR_db_name"]=""
)

# Add or update each export line in ~/.bashrc (values quoted to survive spaces)
for var in "${!TF_VARS[@]}"; do
    value="${TF_VARS[$var]}"
    if grep -q "^export $var=" "$HOME/.bashrc"; then
        sed -i "s|^export $var=.*|export $var=\"$value\"|" "$HOME/.bashrc"
    else
        echo "export $var=\"$value\"" >> "$HOME/.bashrc"
    fi
done

# Source the updated .bashrc to make the changes available in this shell
source "$HOME/.bashrc"

After running terraform apply, Terraform will provision all the defined resources—VPC, subnets, route tables, RDS instance, and Bastion host. Once the process completes successfully, you’ll see output values similar to the following:

Apply complete! Resources: 12 added, 0 changed, 0 destroyed.

Outputs:

bastion_public_ip      = ""
bastion_sg_id          = ""
db_endpoint            = ":3306"
instance_public_dns    = ""
rds_db_name            = ""
vpc_id                 = ""
vpc_name               = ""

These outputs are defined in the outputs.tf files of your modules and re-exported in the root module (rds-bastion/outputs.tf). They are crucial for:

  • SSH-ing into the Bastion Host
  • Connecting securely to the private RDS instance
  • Validating resource creation
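
Re-exporting is just a matter of referencing module outputs in rds-bastion/outputs.tf, along these lines (a sketch assuming the module-level output names match):

# terraform/rds-bastion/outputs.tf (abbreviated sketch)
output "bastion_public_ip" {
  description = "Public IP used to SSH into the bastion host"
  value       = module.bastion.bastion_public_ip
}

output "db_endpoint" {
  description = "RDS endpoint reached from inside the bastion"
  value       = module.rds.db_endpoint
}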

Connecting to the RDS via Bastion Host and Seeding the Database

Now that the infrastructure is provisioned, the next step is to seed the MySQL database hosted on the RDS instance. Since the database is inside a private subnet, we cannot access it directly from our local machine. Instead, we’ll use the Bastion EC2 instance as a jump host to:

  • Transfer the sample dataset (mysqlsampledatabase.sql) to the Bastion.
  • Connect from the Bastion to the RDS instance.
  • Import the SQL data to initialize the database.

Move two directories up from the Terraform root directory, then stream the local SQL file from the data directory to the remote EC2 (Bastion) host:

cd ../..
cat data/mysqlsampledatabase.sql | ssh -i your-key.pem ec2-user@<bastion-public-ip> 'cat > ~/mysqlsampledatabase.sql'

Once the dataset is copied to the Bastion EC2 instance, the next step is to SSH into the remote machine:

ssh -i your-key.pem ec2-user@<bastion-public-ip>

After connecting, you can use the MySQL client (already installed if you used mariadb105 in your EC2 setup) to import the SQL file into your RDS database:

mysql -h <rds-endpoint> -P 3306 -u <db-username> -p < mysqlsampledatabase.sql

Enter the password when prompted.

Once the import is complete, you can connect to the RDS MySQL database again to verify that the database and its tables have been successfully created.

Run the following command from within the Bastion host:

mysql -h <rds-endpoint> -P 3306 -u <db-username> -p <db-name>

After entering your password, you can list the available databases and tables:

List of databases (Image by Author)
List of tables in the databases (Image by Author)

To ensure the dataset was properly imported into the RDS instance, I ran a simple query:

Query results from the customers table (Image by Author)

This returned a row from the customers table, confirming that:

  • The database and tables were created successfully
  • The sample dataset was seeded into the RDS instance
  • The Bastion host and private RDS setup are working as intended

This completes the infrastructure setup and data import process.

Destroying the Infrastructure

Once you’re done testing or demonstrating your setup, it’s important to destroy the AWS resources to avoid unnecessary charges.

Since everything was provisioned using Terraform, tearing down the entire infrastructure is just as simple as running one command after navigating to your root configuration directory:

cd terraform/rds-bastion
terraform destroy

Conclusion

In this project, I demonstrated how to provision a secure and production-like database infrastructure using Terraform on AWS. Rather than exposing the database to the public internet, I implemented best practices by placing the RDS instance in private subnets, accessible only via a bastion host in a public subnet.

By structuring the project with modular Terraform configurations, I ensured each component—network, database, and bastion host—was loosely coupled, reusable, and easy to manage. I also showcased how Terraform’s internal dependency graph handles the orchestration and sequencing of resource creation seamlessly.

Thanks to infrastructure as code (IaC), the entire environment can be brought up or torn down with a single command, making it ideal for ETL prototyping, data engineering practice, or proof-of-concept pipelines. Most importantly, this automation helps avoid unexpected costs by letting you destroy all resources cleanly once you’re done.

You can find the complete source code, Terraform configuration, and setup scripts in my GitHub repository:

https://github.com/YagmurGULEC/rds-ec2-terraform.git

Feel free to explore the code, clone the repo, and adapt it to your own AWS projects. Contributions, feedback, and stars are always welcome!

What’s Next?

You can extend this setup by:

  • Connecting an AWS Glue job to the RDS instance for ETL processing.
  • Adding monitoring for your RDS database and EC2 instance (a starting sketch follows below).
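
As a starting point for the monitoring idea, a CloudWatch alarm on RDS CPU utilization could be added to the Terraform configuration (a sketch; the threshold is arbitrary and the DB identifier is a placeholder):

resource "aws_cloudwatch_metric_alarm" "rds_cpu" {
  alarm_name          = "${var.project_name}-rds-cpu-high"
  namespace           = "AWS/RDS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300                      # five-minute windows
  evaluation_periods  = 2
  threshold           = 80                       # alarm above 80% average CPU
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    DBInstanceIdentifier = "your-db-identifier"  # placeholder: your RDS instance ID
  }
}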
