ADR 018: Database Patterns
Status: Proposed | Date: 2025-07-28
Context
Applications need managed persistent storage for databases, datalakes, and objects, with automatic scaling and jurisdiction-compliant backup strategies. Workloads that need shared file-system access are covered by ADR 019: Shared File Access. This decision draws on the following references:
- AWS Aurora Serverless v2 Documentation
- Percona Everest Documentation and Pigsty Documentation for development/non-AWS environments
- DuckLake for lightweight lakehouse storage over object storage
- Amazon S3 Tables for managed Apache Iceberg tables
- s3proxy and rclone serve s3 for development/non-AWS object storage
Decision
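Databases: Use Aurora Serverless v2 outside EKS clusters, with automatic scaling, multi-AZ deployment, and a dual backup strategy (AWS automated snapshots plus object storage backups per ADR 014: Object Storage Backups).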
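The sketch below shows what the database decision could look like as configuration. It builds request parameters for an Aurora Serverless v2 cluster and one db.serverless instance per Availability Zone. It assumes the parameter shapes of boto3's RDS create_db_cluster and create_db_instance calls. The helper names, the app_admin username, and the default ACU range and retention period are hypothetical choices, not values this ADR mandates. The code only builds dictionaries and never calls AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuroraServerlessSpec:
    # Hypothetical spec; field names and defaults are illustrative only.
    cluster_id: str
    availability_zones: tuple[str, ...]
    engine: str = "aurora-postgresql"   # or "aurora-mysql"
    min_acu: float = 0.5                # Aurora capacity units
    max_acu: float = 16.0
    backup_retention_days: int = 7      # AWS automated snapshots


def _is_half_step(value: float) -> bool:
    # ACU settings are expressed in 0.5 increments.
    return float(value * 2).is_integer()


def cluster_params(spec: AuroraServerlessSpec) -> dict:
    """Keyword arguments shaped for boto3's RDS create_db_cluster."""
    if spec.engine not in ("aurora-postgresql", "aurora-mysql"):
        raise ValueError(f"unsupported engine: {spec.engine}")
    if not (_is_half_step(spec.min_acu) and _is_half_step(spec.max_acu)):
        raise ValueError("ACU values must be multiples of 0.5")
    if not 0 <= spec.min_acu <= spec.max_acu:
        raise ValueError("require 0 <= min_acu <= max_acu")
    return {
        "DBClusterIdentifier": spec.cluster_id,
        "Engine": spec.engine,
        "ServerlessV2ScalingConfiguration": {
            "MinCapacity": spec.min_acu,
            "MaxCapacity": spec.max_acu,
        },
        "MasterUsername": "app_admin",      # hypothetical
        "ManageMasterUserPassword": True,   # password stored in Secrets Manager
        "BackupRetentionPeriod": spec.backup_retention_days,
        "StorageEncrypted": True,
        "DeletionProtection": True,
        "CopyTagsToSnapshot": True,
    }


def instance_params(spec: AuroraServerlessSpec) -> list[dict]:
    """One db.serverless instance per AZ so a reader can take over on failover."""
    if len(spec.availability_zones) < 2:
        raise ValueError("multi-AZ deployment needs at least two availability zones")
    return [
        {
            "DBInstanceIdentifier": f"{spec.cluster_id}-{i}",
            "DBClusterIdentifier": spec.cluster_id,
            "DBInstanceClass": "db.serverless",
            "Engine": spec.engine,
            "AvailabilityZone": az,
        }
        for i, az in enumerate(spec.availability_zones)
    ]
```

You could pass these dictionaries to boto3.client("rds").create_db_cluster(**params) and then to create_db_instance for each instance. ManageMasterUserPassword keeps the master password in AWS Secrets Manager, which fits ADR 005. Before relying on the ACU validation here, check the supported capacity range for the chosen engine version in the AWS documentation.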
Datalakes: Separate the storage format from the access layer:
- Storage layer: store analytical data in object storage with open table formats
- Lightweight access layer: use DuckLake with a DuckDB client for local development, scheduled jobs, and simpler analytical workloads
- Serverless Iceberg access layer: use Amazon S3 Tables for managed Apache Iceberg tables when workloads need AWS-managed table maintenance or multi-engine access
- Distributed query access layer: use Trino or equivalent Iceberg-compatible engines when workloads need concurrent or larger-scale querying
DuckLake and S3 Tables are not an either/or decision. Choose the access layer per workload while keeping data in object storage and open table formats where practical. See Reference Architecture: Data Pipelines for full datalake patterns.
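The per-workload choice described above can be written as a small decision function. This is a sketch under stated assumptions: the Workload fields are hypothetical flags that mirror the criteria listed in this ADR. The function returns a set rather than a single answer because the access layers complement each other rather than exclude each other.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLayer(Enum):
    DUCKLAKE = "DuckLake with a DuckDB client"
    S3_TABLES = "Amazon S3 Tables (managed Apache Iceberg)"
    DISTRIBUTED = "Trino or equivalent Iceberg-compatible engine"


@dataclass(frozen=True)
class Workload:
    # Hypothetical workload traits mirroring this ADR's selection criteria.
    name: str
    needs_managed_table_maintenance: bool = False
    needs_multi_engine_access: bool = False
    needs_concurrent_or_large_scale_queries: bool = False


def access_layers(w: Workload) -> set[AccessLayer]:
    """Return every access layer the workload's requirements call for.

    Data stays in object storage with open table formats, so a workload may
    use more than one layer over the same storage.
    """
    layers: set[AccessLayer] = set()
    if w.needs_managed_table_maintenance or w.needs_multi_engine_access:
        layers.add(AccessLayer.S3_TABLES)
    if w.needs_concurrent_or_large_scale_queries:
        layers.add(AccessLayer.DISTRIBUTED)
    if not layers:
        # Default for local development, scheduled jobs, and simpler analytics.
        layers.add(AccessLayer.DUCKLAKE)
    return layers
```

For example, a nightly reporting job with none of the flags set maps to DuckLake alone. A shared BI dataset that needs managed maintenance and concurrent queries maps to both S3 Tables and a distributed engine.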
Implementation
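- Database: Aurora Serverless v2 (PostgreSQL- or MySQL-compatible) with automatic capacity scaling; Aurora Serverless v2 has no built-in connection pooling, so use Amazon RDS Proxy where applications need it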
- Datalake Storage: S3-compatible object storage with open table formats for analytics data
- Datalake Access: DuckDB clients for DuckLake workloads; S3 Tables, Trino, or equivalent Iceberg-compatible engines for serverless or distributed access
- Object Storage: Amazon S3 for files and objects. Use ADR 019: Shared File Access when workloads need file-system access to object-backed files
- Deployment: Outside EKS clusters as an AWS-managed service, so AWS handles patching, failover, and storage management rather than in-cluster operators
- Credentials: Follow ADR 005: Secrets Management for endpoint and credential management
- Backup: Follow ADR 014: Object Storage Backups plus AWS automated snapshots
- Security: Follow ADR 007: Centralised Security Logging and ADR 012: Privileged Remote Access
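The dual backup strategy and the jurisdiction requirement from the Context section can be checked mechanically. The sketch below is a hypothetical policy check. The BackupPlan fields, the seven-day minimum retention, and the use of an allowed-region set to stand in for a jurisdiction are assumptions for illustration, not requirements taken from ADR 014.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupPlan:
    # Hypothetical description of one database's backup configuration.
    primary_region: str
    snapshot_retention_days: int              # AWS automated snapshots
    object_backup_regions: tuple[str, ...]    # ADR 014 object storage copies


def backup_findings(plan: BackupPlan, allowed_regions: set[str],
                    min_retention_days: int = 7) -> list[str]:
    """Return human-readable problems; an empty list means the plan passes."""
    findings: list[str] = []
    if plan.primary_region not in allowed_regions:
        findings.append(f"primary region {plan.primary_region} is outside the permitted jurisdiction")
    if plan.snapshot_retention_days < min_retention_days:
        findings.append(
            f"automated snapshot retention {plan.snapshot_retention_days}d is below {min_retention_days}d"
        )
    # Both halves of the dual strategy must exist: snapshots and object copies.
    if not plan.object_backup_regions:
        findings.append("no object storage backup configured (ADR 014)")
    for region in plan.object_backup_regions:
        if region not in allowed_regions:
            findings.append(f"object storage backup in {region} is outside the permitted jurisdiction")
    return findings
```

Returning a list of findings rather than a single boolean lets a CI job or audit report show every gap at once.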
Consequences
Benefits:
- Serverless scaling reduces compute costs during low-usage periods
- Automated high availability, with managed backups per ADR 014: Object Storage Backups
- Compliance with jurisdiction requirements through the dual backup approach
Risks if not implemented:
- High operational overhead managing database infrastructure
- Inconsistent backup strategies across database systems
- Cost inefficiency from overprovisioned database resources