DevOps Architecture on AWS

This page documents the standard DevOps architecture we follow at FP Complete for projects hosted on Amazon Web Services (AWS). While the specifics are tailored to AWS, the core principles apply to other cloud providers as well.

The goal of this document is to provide a recommended foundation for deploying containerized applications.

Health Checks and Monitoring

Before deploying an application, ensure that the entrypoint of your Docker container is the health-check executable. This tool acts as a wrapper around your application, monitoring its health and reporting any crashes.

It is crucial to configure health-check with Slack notifications. This ensures that any application crashes are immediately reported to a designated Slack channel, allowing for rapid response. Typically, we use different Slack channels for different environments (e.g., testnet, mainnet).

If your team uses a different communication platform, you will need to add support for it in the health-check executable.
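To make the wrapper's role concrete, here is a minimal, hypothetical sketch of the crash-reporting behaviour described above: run the application, and if it exits with a failure, post a message to a Slack incoming webhook. It is not the health-check executable itself, and the variable names SLACK_WEBHOOK_URL and APP_ENV are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a health-check style wrapper.
# This is NOT the health-check executable; it only mimics crash reporting.

# Escape backslashes and double quotes so the text is safe inside JSON.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Post a message to a Slack incoming webhook (JSON body with a "text" field).
# SLACK_WEBHOOK_URL is an assumed variable name; if unset, print to stderr.
notify_slack() {
  if [ -z "${SLACK_WEBHOOK_URL:-}" ]; then
    printf 'would notify Slack: %s\n' "$1" >&2
    return 0
  fi
  curl -fsS -X POST -H 'Content-Type: application/json' \
    --data "{\"text\": \"$(json_escape "$1")\"}" \
    "$SLACK_WEBHOOK_URL" >/dev/null
}

# Run the application given as arguments; on a non-zero exit, report it,
# tagged with APP_ENV (assumed name, e.g. "testnet" or "mainnet").
run_and_report() {
  "$@"
  local status=$?
  if [ "$status" -ne 0 ]; then
    notify_slack "[${APP_ENV:-unknown}] $1 exited with status $status"
  fi
  return "$status"
}
```

From a container entrypoint script, `run_and_report /usr/local/bin/my-app` (a placeholder path) would run the application and report a crash to the channel configured for that environment. The real tool also monitors health while the application runs; this sketch only covers crash reporting.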

Amazon Application Load Balancer (ALB)

We use Amazon's Application Load Balancer (ALB) to receive traffic from the internet and route it to the appropriate backend applications running on ECS.

To optimize costs and simplify management, we recommend using a single ALB for your entire project, since each ALB carries its own hourly charge. Host-based or path-based routing rules on the ALB's listener can then direct traffic to multiple distinct applications or services.
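As a sketch of how one listener can serve several services, the shell functions below add forward rules through the AWS CLI: one matching a host name, the other matching a host name plus a URL path pattern. In practice these rules would live in Terraform (the aws_lb_listener_rule resource); the CLI form simply shows the shape of the underlying API. The host names, priorities, and ARNs are placeholders, and setting AWS=echo (a convention of this sketch, not an AWS CLI feature) prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: forward rules on the single shared ALB listener.
# AWS defaults to the real CLI; AWS=echo prints the command instead.

# Conditions JSON for host-based routing, e.g. api.example.com.
host_condition() {
  printf '[{"Field":"host-header","HostHeaderConfig":{"Values":["%s"]}}]' "$1"
}

# Conditions JSON for host + path-based routing, e.g. example.com + /docs/*.
host_path_condition() {
  printf '[{"Field":"host-header","HostHeaderConfig":{"Values":["%s"]}},{"Field":"path-pattern","PathPatternConfig":{"Values":["%s"]}}]' "$1" "$2"
}

# $1 = rule priority (lower numbers are evaluated first),
# $2 = conditions JSON, $3 = target group ARN of the ECS service.
# LISTENER_ARN must hold the ARN of the ALB listener.
add_forward_rule() {
  "${AWS:-aws}" elbv2 create-rule \
    --listener-arn "$LISTENER_ARN" \
    --priority "$1" \
    --conditions "$2" \
    --actions "[{\"Type\":\"forward\",\"TargetGroupArn\":\"$3\"}]"
}
```

For example, `add_forward_rule 10 "$(host_condition api.example.com)" "$API_TG_ARN"` followed by `add_forward_rule 20 "$(host_path_condition example.com '/docs/*')" "$DOCS_TG_ARN"` would send API traffic and documentation traffic to two different services behind the same ALB; requests matching neither rule fall through to the listener's default action.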

Amazon Elastic Container Service (ECS)

Our standard compute platform is Amazon Elastic Container Service (ECS) with AWS Fargate. Using Fargate provides a serverless experience for running containers, removing the need to provision and manage the underlying EC2 instances.

  • Logging: We rely on Amazon CloudWatch Logs, which natively integrates with ECS for log collection and monitoring.
  • Secrets Management: Application secrets should be passed securely as environment variables to the containers. These secrets should be managed and propagated from Terraform using the amber tool.
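Because secrets arrive as plain environment variables, it can help for the container's entrypoint to fail fast when one is missing rather than start half-configured. The function below is a minimal sketch of that check; the variable names in the usage line are hypothetical examples, not names that amber or ECS define.

```shell
#!/usr/bin/env bash
# Sketch: verify at container start-up that required secrets are present.
# Returns 1 (and lists every missing name on stderr) if any are unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is how ECS passes them.
    if [ -z "$(printenv "$name")" ]; then
      printf 'missing required environment variable: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

An entrypoint could run `require_env DATABASE_URL API_KEY || exit 1` before starting the application. On the logging side, the AWS CLI v2 command `aws logs tail <log-group> --follow` streams the CloudWatch Logs that ECS collects from the containers.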

Amber Secret Management

With this setup, you need the amber tool installed to run terraform plan, because amber supplies the secrets to Terraform. A typical invocation looks like this:

amber exec -- terraform plan

For this to work, you must export the AMBER_SECRET environment variable in your shell. Share the AMBER_SECRET value within the team through a password manager: we typically use Bitwarden, but some customers prefer a different tool, such as 1Password.
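The amber workflow can be wrapped in a small shell helper. This is a sketch under two assumptions: the secret is stored in Bitwarden under a hypothetical item name (overridable with the equally hypothetical AMBER_ITEM variable), and the Bitwarden CLI (bw) is logged in and unlocked. With 1Password, the `bw get password` call would be replaced by `op read` with an op:// secret reference.

```shell
#!/usr/bin/env bash
# Sketch: export AMBER_SECRET from Bitwarden (if not already set), then run
# terraform plan through amber. The item name below is a placeholder.

load_amber_secret() {
  # Nothing to do if the secret is already exported in this shell.
  if [ -n "${AMBER_SECRET:-}" ]; then
    return 0
  fi
  # Requires `bw login` and an unlocked session (BW_SESSION).
  AMBER_SECRET="$(bw get password "${AMBER_ITEM:-myproject-amber-secret}")" || return 1
  [ -n "$AMBER_SECRET" ] || return 1
  export AMBER_SECRET
}

# Any extra arguments are passed through to terraform plan.
tf_plan() {
  if ! load_amber_secret; then
    echo "could not load AMBER_SECRET" >&2
    return 1
  fi
  amber exec -- terraform plan "$@"
}
```

Usage would then be `tf_plan` or, for a saved plan, `tf_plan -out=plan.tfplan`.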

Amazon RDS

For relational databases, we use Amazon Relational Database Service (RDS).

  • Instance Sizing: The database instance size should be chosen based on a balance of cost and performance requirements. You can consult a resource like https://instances.vantage.sh/rds to compare options.
  • Bastion Access: For administrative access to the RDS database, use an EC2 Instance Connect Endpoint. This is more secure than a traditional bastion host because there is no bastion instance to maintain, no SSH keys to manage, and no need to expose any port to the internet.
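For example, a local tunnel to a private PostgreSQL instance can be opened through the endpoint with the AWS CLI v2. The sketch assumes the RDS security group allows inbound traffic on port 5432 from the endpoint's security group; the endpoint ID and private IP address are placeholders, and AWS=echo (a convention of this sketch) prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: tunnel localhost to a private RDS PostgreSQL instance through an
# EC2 Instance Connect Endpoint. Requires AWS CLI v2.

# $1 = endpoint ID (eice-...), $2 = private IP of the RDS instance,
# $3 = local port (defaults to 5432). Runs until interrupted.
open_rds_tunnel() {
  "${AWS:-aws}" ec2-instance-connect open-tunnel \
    --instance-connect-endpoint-id "$1" \
    --private-ip-address "$2" \
    --remote-port 5432 \
    --local-port "${3:-5432}"
}
```

While the tunnel is open, a client such as `psql -h localhost -p 5432 -U <user> <database>` connects as if the database were local. Because an RDS instance's private IP can change (for example after a failover), it is worth looking it up again before each session rather than hard-coding it.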

Choosing Between Standard PostgreSQL vs. Aurora PostgreSQL

Note that Aurora PostgreSQL is AWS's closed-source fork of PostgreSQL, although it maintains wire compatibility with the open-source PostgreSQL database.

Our recommendation is as follows:

  • Start with Standard PostgreSQL. It is generally more cost-effective and offers smaller instance sizes, making it ideal for initial deployments, development, and testing environments. For non-production environments, choose a burstable CPU type (e.g., T-series). For production environments, you would typically want a non-burstable CPU type for consistent performance under significant traffic.
  • Consider Aurora PostgreSQL when your application has a write-heavy workload and you begin to hit the performance limitations of Standard PostgreSQL.

Cloudflare (Optional, but Recommended)

Using Cloudflare as a layer in front of the ALB is highly recommended, especially for applications expecting significant traffic or requiring enhanced security. It provides critical features like DDoS protection, a Web Application Firewall (WAF), CDN caching, and rate limiting.

When using Cloudflare, ensure the following configuration:

  • Infrastructure as Code: Manage Cloudflare resources using the official Terraform Cloudflare provider.
  • End-to-End Encryption: Import Cloudflare's origin certificate into AWS Certificate Manager (ACM) and attach it to the ALB's HTTPS listener. Set the SSL/TLS encryption mode in Cloudflare to Full (Strict) so that the connection is encrypted and the origin certificate is validated end to end.
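The certificate step can be sketched with the AWS CLI as below, assuming the origin certificate and its private key have been downloaded from Cloudflare as PEM files (the file names and ARNs are placeholders). Run it in the same region as the ALB, since an ALB can only use ACM certificates from its own region. In a Terraform-managed project, the aws_acm_certificate resource can import the same files through its certificate_body and private_key arguments, with the private key handled like any other secret.

```shell
#!/usr/bin/env bash
# Sketch: import a Cloudflare origin certificate into ACM and attach it to
# the ALB's HTTPS listener. AWS=echo prints the commands instead of running them.

# $1 = certificate PEM file, $2 = private key PEM file; prints the new ARN.
import_origin_cert() {
  "${AWS:-aws}" acm import-certificate \
    --certificate "fileb://$1" \
    --private-key "fileb://$2" \
    --query CertificateArn --output text
}

# $1 = HTTPS listener ARN, $2 = certificate ARN from import_origin_cert.
attach_cert_to_listener() {
  "${AWS:-aws}" elbv2 modify-listener \
    --listener-arn "$1" \
    --certificates "CertificateArn=$2"
}
```

Chained together, `attach_cert_to_listener "$LISTENER_ARN" "$(import_origin_cert origin.pem origin-key.pem)"` sets the listener's default certificate. Cloudflare origin certificates are trusted by Cloudflare's edge rather than by browsers, which is why they pair with Full (Strict) mode and traffic proxied through Cloudflare.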

Additional useful Cloudflare features include:

  • Cloudflare Zero Trust: For securing access to internal applications and environments.
  • Cloudflare Health Checks: For monitoring application availability from the edge. Alerts can be sent to Slack or to any other webhook destination.
  • PagerDuty Integration: For advanced incident response (Note: This is available only on Business plans).

CloudWatch Alarms

Make sure to set up CloudWatch alarms for key metrics, such as:

  • Log Group Size: To prevent excessive logging costs, create an alarm for when the volume of data ingested into a log group in a single day exceeds a threshold.
  • ALB 5xx Errors: Monitor for an increase in server-side errors (HTTP 5xx status codes).
  • High CPU Utilization: For RDS instances and ECS tasks.
  • High Memory Utilization: For ECS tasks.
  • RDS Storage: For non-Aurora instances, monitor for low free storage space to prevent outages.
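A minimal sketch of several of these alarms with the AWS CLI follows. The thresholds (10 target 5xx responses in 5 minutes, 80% average CPU, 5 GiB ingested per day, 10 GiB free storage), the alarm names, and the SNS_TOPIC_ARN notification target are illustrative assumptions to tune per project; in practice the same alarms would be declared in Terraform with aws_cloudwatch_metric_alarm. Setting AWS=echo (a convention of this sketch) prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: CloudWatch alarms for the metrics listed above.
# SNS_TOPIC_ARN must hold the ARN of the topic that receives alarm notifications.
# AWS defaults to the real CLI; AWS=echo prints the command instead.

# $1 name, $2 namespace, $3 metric, $4 statistic, $5 period in seconds,
# $6 comparison operator, $7 threshold; remaining args are dimensions.
put_alarm() {
  local name=$1 namespace=$2 metric=$3 stat=$4 period=$5 op=$6 threshold=$7
  shift 7
  "${AWS:-aws}" cloudwatch put-metric-alarm \
    --alarm-name "$name" --namespace "$namespace" --metric-name "$metric" \
    --statistic "$stat" --period "$period" --evaluation-periods 1 \
    --comparison-operator "$op" --threshold "$threshold" \
    --treat-missing-data notBreaching --alarm-actions "$SNS_TOPIC_ARN" \
    --dimensions "$@"
}

# 5xx responses returned by the targets behind the ALB.
# $1 = ALB dimension value, e.g. app/my-alb/0123456789abcdef
alb_5xx_alarm() {
  put_alarm alb-target-5xx AWS/ApplicationELB HTTPCode_Target_5XX_Count Sum 300 \
    GreaterThanThreshold 10 "Name=LoadBalancer,Value=$1"
}

# Average CPU of an ECS service. $1 = cluster name, $2 = service name.
ecs_cpu_alarm() {
  put_alarm "ecs-cpu-$2" AWS/ECS CPUUtilization Average 300 \
    GreaterThanThreshold 80 "Name=ClusterName,Value=$1" "Name=ServiceName,Value=$2"
}

# Bytes ingested into a log group per day (5 GiB). $1 = log group name.
log_ingest_alarm() {
  put_alarm "log-ingest-$1" AWS/Logs IncomingBytes Sum 86400 \
    GreaterThanThreshold 5368709120 "Name=LogGroupName,Value=$1"
}

# Free storage on a non-Aurora RDS instance (10 GiB). $1 = DB instance identifier.
rds_storage_alarm() {
  put_alarm "rds-storage-$1" AWS/RDS FreeStorageSpace Minimum 300 \
    LessThanThreshold 10737418240 "Name=DBInstanceIdentifier,Value=$1"
}
```

Memory alarms for ECS follow the ecs_cpu_alarm pattern with the MemoryUtilization metric, RDS CPU uses CPUUtilization in AWS/RDS with the DBInstanceIdentifier dimension, and HTTPCode_ELB_5XX_Count covers 5xx responses generated by the ALB itself rather than by the targets.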

Example Stack

The architecture described above is a proven stack we use for most of our clients. An example implementation of this architecture can be found in the devops directory of the kolme-rare-evo-demo repository.