WA Government Architecture Decision Records

Reusable architecture patterns for WA Government digital services, maintained by the Office of Digital Government (DGOV) Digital Transformation and Technology Unit (DTT).

For WA Public Sector Agencies

These patterns help you build secure, compliant digital services faster. Instead of starting from scratch, use proven approaches that align with WA Government security and compliance requirements.

Getting Started

  1. Review the Architecture Principles - Six guiding principles for all technology decisions
  2. Choose a Reference Architecture - Project kickoff templates combining multiple decisions
  3. Check the Compliance Mapping - Find which ADRs apply to your security and compliance requirements

Compliance Alignment

These ADRs align with:

Supporting training: DGOV Technical - DevSecOps Induction

Browse online | Printable long view


Contributing

New ADRs document the context (problem), decision (solution), and consequences (trade-offs). See the Contributing Guide for workflow and templates.

For AI-assisted contributions, see the guidance in CONTRIBUTING.md.

Repository Structure

This project uses mdBook to generate documentation:

  • development/, operations/, security/ - ADRs by domain
  • reference-architectures/ - Project kickoff templates
  • SUMMARY.md - Navigation structure

just setup    # One-time tool installation
just serve    # Preview locally (port 8080)
just build    # Build website and print view

Architecture Principles

Status: Accepted | Date: 2025-03-07

These six principles guide all architecture decisions in this repository. Each ADR should align with one or more of these principles.

1. Establish secure foundations

Integrate security practices from the outset, and throughout the design, development and deployment of products and services, per the ACSC Foundations for modern defensible architecture.

2. Understand and govern data

Use authoritative data sources to ensure data consistency, integrity, and quality. Embed data management and governance practices, including information classification, records management, and Privacy and Responsible Information Sharing, throughout information lifecycles.

3. Prioritise user experience

Apply user-centred design principles to simplify tasks and establish intuitive mappings between user intentions and system responses. Involve users throughout design and development to iteratively evaluate and refine product goals and requirements.

4. Preference tried and tested approaches

Adopt sustainable open source software, and mature managed services where capabilities closely match business needs. When necessary, bespoke service development should be led by internal technical capabilities to ensure appropriate risk ownership. Bespoke software should preference open standards and code to avoid vendor lock-in.

5. Embrace change, release early, release often

Design services as loosely coupled modules with clear boundaries and responsibilities. Release often with tight feedback loops to test assumptions, learn, and iterate. Enable frequent and predictable high-impact changes (your service does not deliver or add value until it is in the hands of users) per the CNCF Cloud Native Definition.

6. Default to open

Encourage transparency, inclusivity, adaptability, collaboration, and community by defaulting to permissive licensing of code and artifacts developed with public funding.

ADR 001: Application Isolation

Status: Accepted | Date: 2025-02-17

Context

Not isolating applications and environments can lead to significant security risks. Without isolation, an attacker who exploits a vulnerability in a single application can move laterally to compromise other applications or the entire environment. This lack of isolation can enable the spread of malware, unauthorised access, and data breaches.

Decision

To mitigate the risks associated with shared environments, all applications and environments should be isolated by default.

flowchart LR
    account[Cloud Account]
    cluster[K8s Cluster]
    namespace[Namespace]

    account -->|nested isolation| cluster -->|nested isolation| namespace

This isolation can be achieved through the following approaches (strongest to weakest):

  1. Dedicated Accounts: Use separate cloud accounts / resource groups for different environments (for example, development, testing, production) to ensure complete isolation of resources and data. Strongest isolation - use for production and sensitive data.
  2. Kubernetes Clusters: Deploy separate Kubernetes clusters for different applications or environments to isolate workloads and manage resources independently. Strong isolation - use for distinct products or security domains.
  3. Kubernetes Namespaces: Within a Kubernetes cluster, use namespaces to logically separate different applications or environments, providing a level of isolation for network traffic, resource quotas, and access controls. Moderate isolation - use for related services within a product.

The preferred approach for isolation should be driven by data sensitivity and product boundaries.
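
As a minimal sketch of how the guidance above might be applied, the function below maps a workload's characteristics to one of the three isolation levels. The function name, its parameters, and the order of the checks are illustrative assumptions, not part of this ADR.

```python
from enum import Enum


class Isolation(Enum):
    """Isolation levels from this ADR, strongest first."""
    DEDICATED_ACCOUNT = "dedicated cloud account"
    DEDICATED_CLUSTER = "dedicated Kubernetes cluster"
    NAMESPACE = "Kubernetes namespace"


def choose_isolation(is_production: bool, has_sensitive_data: bool,
                     is_distinct_product: bool) -> Isolation:
    """Return the isolation level suggested by data sensitivity and product boundary."""
    # Production and sensitive data get the strongest isolation.
    if is_production or has_sensitive_data:
        return Isolation.DEDICATED_ACCOUNT
    # Distinct products or security domains get their own cluster.
    if is_distinct_product:
        return Isolation.DEDICATED_CLUSTER
    # Related services within one product can share a cluster.
    return Isolation.NAMESPACE
```

The checks run from strongest to weakest so that a workload matching several criteria always receives the stronger level.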

Consequences

Benefits:

  • Network microsegmentation preventing lateral movement
  • Simplified incident containment and forensic analysis
  • Compliance with regulatory isolation requirements

Risks if not implemented:

  • Single vulnerability compromising multiple applications
  • Difficult incident response across shared environments
  • Data breaches through unauthorised cross-system access

ADR 005: Secrets Management

Status: Accepted | Date: 2025-02-25

Context

Per the Open Web Application Security Project (OWASP) Secrets Management Cheat Sheet:

Organisations face a growing need to centralise the storage, provisioning, auditing, rotation and management of secrets to control access to secrets and prevent them from leaking and compromising the organisation.

Secrets should be accessed at runtime by workloads and never be hard-coded or stored in plain text. In Kubernetes environments, the principle of “local over global” minimises attack surface. Secrets adjacent to their workload reduce blast radius. Account-wide secret stores increase exposure.

Decision

Use AWS Secrets Manager for administrative and CI/CD operations. For Kubernetes workloads, prefer namespace-scoped secrets and minimise reliance on account-wide stores.

Security Hierarchy: Local > Global

| Approach | Blast Radius | Use Case |
| --- | --- | --- |
| Ephemeral (ad-hoc fetch) | None | One-time ops |
| Kubernetes Secret (namespace-scoped) | Single namespace | Default for app runtime |
| External Secrets Operator | Namespace (syncs from account-wide) | Bridge existing AWS-managed secrets |
| AWS Secrets Manager (account-wide) | Entire AWS account | Admin/CI only |

Principle: Secrets adjacent to their workload minimise attack surface. Account-wide stores increase exposure because any compromised credential in that account may gain access to them.

Secret Rotation

Every system design should define secret rotation periods and the automation or manual process used to meet them.

  • Database credentials: 30-90 days (automate via Secrets Manager)
  • API keys: 90 days or on suspected compromise
  • Certificates: Before expiry (automate via ACM where possible)
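
A rotation check along these lines could run in a scheduled job. This is a sketch only: the `MAX_AGE` table uses the upper bounds listed above, and the secret kind names are hypothetical. Certificates are left out because their renewal date depends on each certificate's expiry.

```python
from datetime import date, timedelta

# Upper bounds from the list above; each system design should set its own values.
MAX_AGE = {
    "database_credential": timedelta(days=90),
    "api_key": timedelta(days=90),
}


def rotation_due(kind: str, last_rotated: date, today: date,
                 suspected_compromise: bool = False) -> bool:
    """Return True when a secret has reached its maximum age or may be compromised."""
    if suspected_compromise:
        return True
    return today - last_rotated >= MAX_AGE[kind]
```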

Kubernetes Secrets

Workloads should use namespace-local Kubernetes Secrets, protected at rest with Elastic Kubernetes Service (EKS) envelope encryption using AWS Key Management Service (KMS). This is the default for EKS 1.28+.

Ad-hoc Operations

For one-time tasks (DB migrations, user creation), fetch secrets ephemerally rather than storing them. Use stdin/stdout piping and HereDoc patterns so secrets travel via standard input instead of command-line arguments, keeping them out of process lists (ps). See detailed examples and patterns.
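
The stdin approach can be sketched in Python as follows. The helper name is hypothetical, and the child process is a stand-in for a real migration or user-creation tool. The secret is written to the child's standard input, so it does not appear in the argument list shown by ps.

```python
import subprocess
import sys


def run_with_secret_on_stdin(command: list[str], secret: str) -> str:
    """Run `command`, passing `secret` on stdin so it never appears in argv."""
    result = subprocess.run(
        command,
        input=secret,          # delivered through a pipe, not the command line
        capture_output=True,
        text=True,
        check=True,            # raise if the child exits with a non-zero status
    )
    return result.stdout


# Stand-in for a real tool: a child Python process that reads the secret from
# stdin and reports only its length.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
print(run_with_secret_on_stdin(child, "example-not-a-real-secret"))
```

In practice the secret would be fetched immediately before the call and discarded afterwards, not held in a variable or written to disk.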

Encrypted Repository Fixtures

For low-sensitivity secrets that must live with source code, such as test fixtures, store only encrypted values in the repository using the age CLI. Do not use this for runtime application secrets. Never commit the age private key, identity file, or passphrase to the repository.

External Secrets Operator (Fallback)

When secrets must live in AWS for organisational reasons, ESO bridges AWS Secrets Manager to Kubernetes. It’s useful but not the default.

Why ESO over Sealed Secrets:

  • Secrets stay in Secrets Manager (single source of truth, audit logging)
  • Auto-sync to Kubernetes with configurable refresh
  • Uses Pod Identity (no AWS keys in cluster)
  • No encrypted blobs in git (unlike Sealed Secrets)

ESO flow: AWS Secrets Manager -> ExternalSecret -> Kubernetes Secret

IAM and Access Control

Use IAM policy statements to enforce least-privilege access. Pods should not call AWS Secrets Manager directly at runtime. Use namespace-scoped Kubernetes Secrets or ESO instead.
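
For illustration, a least-privilege policy for a role that reads one secret (for example the role used by ESO or a CI job) might be generated like this. The ARN, account ID, and `Sid` are placeholders, and the helper name is hypothetical.

```python
import json


def least_privilege_secret_policy(secret_arn: str) -> dict:
    """Build an IAM policy document that allows reading a single named secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSingleSecret",
                "Effect": "Allow",
                # Read-only: no create, update, delete, or list actions.
                "Action": ["secretsmanager:GetSecretValue"],
                # One explicit secret, never a wildcard.
                "Resource": [secret_arn],
            }
        ],
    }


# Placeholder ARN for illustration only.
EXAMPLE_ARN = (
    "arn:aws:secretsmanager:ap-southeast-2:111122223333:"
    "secret:example/app/db-AbCdEf"
)
print(json.dumps(least_privilege_secret_policy(EXAMPLE_ARN), indent=2))
```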

Consequences

Benefits:

  • Reduced attack surface through localised storage
  • Defined rotation periods and automation where practical reduce human error
  • Meets compliance and auditing requirements

Risks:

  • Security exposure from manual handling
  • Non-compliance without proper implementation

Trade-offs:

  • AWS vendor dependency may complicate future migrations
  • ESO adds complexity compared to native K8s Secrets

ADR 008: Email Authentication Protocols

Status: Accepted | Date: 2025-08-15

Context

Government email domains are prime targets for cybercriminals who exploit them for phishing attacks, business email compromise, and brand impersonation. Citizens and businesses expect government emails to be trustworthy, making email authentication critical for maintaining public confidence and preventing fraud.

Without proper email authentication, attackers can easily spoof government domains to conduct social engineering attacks, distribute malware, or harvest credentials from unsuspecting recipients.

Decision

Implement email authentication standards for all government domains:

Required Standards:

  • SPF: Publish records defining authorised mail servers. Use “-all” (hard fail) for domains with well-defined mail infrastructure; use “~all” (soft fail) only during initial rollout or when third-party senders are being onboarded.
  • DKIM: Sign all outbound email with minimum 2048-bit RSA keys, rotate annually.
  • DMARC: Implement with a progression timeline:
    1. Start with “p=none” to collect reports (2-4 weeks)
    2. Move to “p=quarantine” once legitimate sources are aligned (4-8 weeks)
    3. Progress to “p=reject” when reports show minimal false positives
    • Include “rua=” for aggregate reports and “ruf=” for forensic reports
    • Apply same policy to subdomains with “sp=reject”
  • MTA-STS: Publish MTA-STS policy to enforce TLS for inbound mail transport.
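
The DMARC progression above can be expressed as the TXT value published at each stage. The sketch below builds those values; the helper name and the example.gov.au reporting address are placeholders.

```python
from typing import Optional

VALID_POLICIES = ("none", "quarantine", "reject")


def dmarc_record(policy: str, rua: str, ruf: Optional[str] = None,
                 subdomain_policy: Optional[str] = None) -> str:
    """Build the TXT value published at _dmarc.<domain>."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown DMARC policy: {policy}")
    tags = ["v=DMARC1", f"p={policy}"]
    if subdomain_policy is not None:
        if subdomain_policy not in VALID_POLICIES:
            raise ValueError(f"unknown subdomain policy: {subdomain_policy}")
        tags.append(f"sp={subdomain_policy}")
    tags.append(f"rua=mailto:{rua}")        # aggregate reports
    if ruf is not None:
        tags.append(f"ruf=mailto:{ruf}")    # forensic reports
    return "; ".join(tags)


# One record per stage of the rollout: none, then quarantine, then reject.
for stage in VALID_POLICIES:
    print(dmarc_record(stage, "dmarc-reports@example.gov.au"))
```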

Recommended:

  • BIMI: Implement verified brand logos with Verified Mark Certificates (VMCs) for high-profile citizen-facing domains.

Implementation:

  • Monitor DNS records for tampering
  • Regular authentication testing and effectiveness reviews
  • Incident response procedures for authentication failures
  • Integration with email security gateways

Consequences

Benefits:

  • Automated email authentication blocking domain spoofing
  • Enhanced brand protection and citizen trust
  • Comprehensive threat visibility through DMARC reporting

Risks if not implemented:

  • Phishing attacks exploiting government domain reputation
  • Reduced email deliverability affecting citizen communications
  • Non-compliance with government security requirements

ADR 011: AI Tool and Agent Governance

Status: Accepted | Date: 2025-08-15

Context

Generative and agentic AI tools used for development and operations can process sensitive data, call tools, and produce outputs that affect security, privacy, and compliance. Without governance, they can expose data, make biased or incorrect recommendations, misuse privileges, and create compliance failures.

Agentic AI adds autonomy: models can use tools, external data, memory, planning workflows, and execution privileges. This increases the attack surface through prompt injection, unsafe tool use, privilege creep, identity spoofing, third-party component compromise, cascading failures, and opaque audit trails.

High-risk scenarios include:

  • Automated Decision-Making: policy, approval, or resource allocation decisions without human review
  • Government Data Processing: sensitive organisational data processed by offshore or unapproved AI services
  • Uncontrolled Outputs: generated content, code, or analysis used without qualified validation
  • Privacy Violations: personal information processed without consent or required controls
  • Agentic Tool Use: shell, network, API, email, data, or infrastructure actions beyond a tightly approved scope

Decision

Implement mandatory human oversight for all AI tool usage, with pre-approval for any AI tool that processes organisational data, uses agents or external tools, or generates outputs used in an official capacity.

AI security, including agentic AI security, must be managed inside normal cyber security governance: secure-by-design, defence in depth, identity and access management, monitoring, incident response, and supply chain risk management.

Human Oversight Requirements:

Adopt a values-based approach to AI governance (per Oxide RFD 576):

  • Responsibility: Humans are accountable for AI-generated artifacts
  • Rigor: AI should support rigorous thinking, not replace it
  • Validation: Qualified humans must review AI-generated content before use
  • Accountability: AI-assisted decisions must have a clear human owner

Human approval gates must be set by system designers and operators, not by the AI system. Prior human approval is required for high-impact or hard-to-reverse actions, including production changes, network egress, data or log deletion, releases, deployments, procurement, payments, approvals, and customer-facing actions.

Approval Matrix:

| Use case | Default stance | Approval |
| --- | --- | --- |
| Local or read-only coding help with no sensitive data | Allowed | Normal human review |
| External AI with organisational data | Restricted | Pre-approval and contract |
| Agent with shell, network, API, or memory tools | Restricted | Risk assessment and approval gates |
| Production or customer-facing action | Prohibited by default | Explicit human approval |
| Delete logs or audit records | Prohibited by default | Separate human approval |
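
One way to make the approval matrix above enforceable is to encode it as data and check it before an agent acts. The sketch below is illustrative: the use-case keys and the `may_proceed` helper are hypothetical names, and `human_approved` must be set by the operator's approval system, never by the AI tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stance:
    default: str
    approval: str


APPROVAL_MATRIX = {
    "local_read_only_coding_help": Stance("Allowed", "Normal human review"),
    "external_ai_with_org_data": Stance("Restricted", "Pre-approval and contract"),
    "agent_with_tools": Stance("Restricted", "Risk assessment and approval gates"),
    "production_or_customer_facing_action": Stance(
        "Prohibited by default", "Explicit human approval"),
    "delete_logs_or_audit_records": Stance(
        "Prohibited by default", "Separate human approval"),
}


def may_proceed(use_case: str, human_approved: bool) -> bool:
    """Fail safe: deny unknown use cases and anything not approved by a human."""
    stance = APPROVAL_MATRIX.get(use_case)
    if stance is None:
        return False                 # unknown use case: stop and escalate
    if stance.default == "Allowed":
        return True                  # output still goes through normal human review
    return human_approved
```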

Covered AI Tools:

This ADR applies to all AI tools including:

  • Development and coding assistants
  • Content generation and writing assistants
  • Data analysis and business intelligence platforms
  • Automated testing and code review tools
  • Agentic AI systems, autonomous agents, multi-agent workflows, and tools with API, shell, network, memory, or execution privileges

Requirements:

AI tools must not:

  • Automatically act in customer-facing or production environments
  • Process sensitive data with third parties without a formal contractual arrangement
  • Merge, release, deploy, or alter production state without human review
  • Receive broad or unrestricted access to sensitive data, critical systems, logs, credentials, networks, or production environments
  • Decide when human approval, escalation, rollback, or audit deletion is required

AI tools must:

  • Be limited to low-risk and non-sensitive tasks unless explicitly approved through risk assessment and accountable ownership
  • Run in isolated or local environments (refer to ADR 001: Application Isolation) with minimal permissions and bounded blast radius
  • Use explicit workspace, shell/process, network, model-provider, local-state, and approval boundaries
  • Default to read-only or approval-gated modes for untrusted repositories and first-look analysis
  • Apply least privilege to every agent, tool, credential, API, and sub-task, scoped to the required resource, operation, and timeframe
  • Prefer ephemeral or just-in-time credentials for privileged actions
  • Validate inputs, prompt context, tool responses, third-party components, and generated outputs before consequential use
  • Log tool calls, approvals, denied actions, policy decisions, model-provider disclosures, and official AI-generated outputs
  • Fail safe: stop and escalate when uncertain, rate-limited, degraded, or denied by policy

Agentic AI Adoption Controls:

Agentic AI adoption must follow ACSC-aligned controls:

  • Start with low-risk, non-sensitive tasks and expand access or autonomy only after monitoring, testing, and risk review
  • Threat model prompt injection, confused-deputy abuse, identity spoofing, third-party tools, data exfiltration, cascading failures, and credential compromise
  • Test agents in sandboxes before production use, including adversarial and failure-mode testing
  • Maintain trusted inventories for model providers, tools, prompts, datasets, and agent components
  • Monitor runtime behaviour, including anomalous resource use, guardrail triggers, denied actions, and attempts to bypass approval or logging
  • Separate high-risk agents into distinct security domains and avoid implicit trust between agents

Required Evidence:

Approved AI tool use must retain enough evidence for review:

  • Approved tool or register entry
  • Data disclosure and model-provider assessment where applicable
  • Risk assessment for agentic workflows
  • Human approval records for high-impact actions
  • Logs of tool calls, denied actions, generated outputs, and approvals

Exceptions:

Exceptions require documented risk acceptance by the accountable owner, time-bound approval, and compensating controls. Exceptions must not remove human accountability for consequential decisions or high-impact actions.

Implementation Examples:

  • Preferred: oy-cli for governed agent-backed development and repository audits
  • Rejected: Automated tools that merge, release, deploy, alter production state, or delete audit records without human approval

Strategic Research

Future adoption should favour simple agent workflows with inspectable boundaries: explicit workspace scope, approval-gated mutation, least privilege, deterministic audits, continuous monitoring, fail-safe defaults, and reversible deployment. Tools with a similar posture to oy are preferred over broad agent platforms that add opaque orchestration, implicit trust, or unnecessary operational complexity.

bedrock-mantle is the preferred execution environment for this research where suitable because it enables access to open models through Amazon Bedrock and aligns with Mantle’s zero operator access design, which AWS describes as eliminating technical means for AWS operators to access customer data.

Model and backend selection must consider:

  • Evidence of better quality or security on representative development, audit, and operations tasks
  • Data disclosure to the configured model provider, including snippets, command output, tool results, and audit chunks
  • Avoiding lock-in to heavily proprietary platforms such as Bedrock AgentCore or GitHub Copilot unless there is a clear risk or capability justification
  • Compatibility with oy approval modes, workspace boundaries, deterministic no-tools audits, least privilege, audit logging, safe rollback, and human approval gates

Consequences

Benefits:

  • Ensures human accountability for all AI-assisted decisions
  • Maintains compliance with Privacy Act and data sovereignty requirements
  • Prevents automated production actions without approval
  • Establishes an audit trail for responsible AI usage
  • Aligns agentic AI adoption with ACSC guidance: low-risk initial use, least privilege, monitoring, progressive deployment, and human approval for high-impact actions

Risks if not implemented:

  • Unauthorised data exposure to offshore AI services
  • AI making critical decisions without human oversight
  • Compliance violations and regulatory breaches
  • Operational errors from unchecked AI outputs
  • Agent compromise, confused-deputy abuse, identity spoofing, tool misuse, cascading failures, or audit gaps from over-privileged autonomous agents

ADR 012: Privileged Remote Access

Status: Accepted | Date: 2025-08-15

Context

Traditional privileged access methods using jump boxes, bastion hosts, and shared credentials create security risks through persistent network connections and broad administrative access. Modern cloud-native alternatives provide better security controls and audit capabilities for administrative tasks.

Decision

Replace traditional bastion hosts and jump boxes with cloud-native privileged access solutions:

flowchart LR
    admin[Administrator]
    ssm[Session Manager]
    systems[Target Systems]

    admin -->|MFA + identity| ssm
    ssm -->|temporary session| systems

Session Manager provides MFA enforcement, session recording, and audit trails without persistent network access.

Prohibited Methods:

  • Bastion hosts and jump boxes with persistent SSH access
  • Direct SSH/RDP access to production systems
  • Shared administrative credentials and keys
  • VPN-based administrative access

Required Access Methods:

  • Server Access: AWS Systems Manager Session Manager (replaces SSH to EC2)
  • Infrastructure Management: AWS CLI with temporary credentials (replaces persistent VPN)
  • Kubernetes Access: kubectl with IAM authentication (replaces cluster SSH)
  • Infrastructure Deployment: Infrastructure as Code with audit trails per ADR 010: Infrastructure as Code (replaces manual deployment)

Access Controls:

  • Multi-factor authentication for all access
  • Time-limited sessions
  • Identity-based access through cloud IAM
  • Approval workflows for privileged access
  • Session recording and audit logging per ADR 007: Centralised Security Logging

Implementation:

  • All sessions initiated through APIs only
  • Short-lived credentials
  • Real-time monitoring and alerting
  • Integration with SIEM systems

Consequences

Benefits:

  • Zero-trust network access with session recording
  • Enhanced audit capabilities through centralised logging
  • Short-lived credential security reducing persistent threats

Risks if not implemented:

  • Unauthorised lateral movement across network systems
  • Prolonged security breaches from persistent access
  • Non-compliance with government zero-trust requirements

ADR 013: Identity Federation Standards

Status: Accepted | Date: 2025-08-15

Context

Applications need to integrate with multiple identity providers including jurisdiction citizen identity services, enterprise directories, and cloud identity platforms. Current approaches use inconsistent protocols (SAML, OIDC, proprietary) creating integration complexity and security inconsistencies.

Modern identity federation requires support for emerging standards like verifiable credentials while maintaining compatibility with legacy enterprise systems.

Decision

Standardise on OpenID Connect (OIDC) as the primary federation protocol for all new identity integrations, with SAML 2.0 support only for legacy systems that cannot support OIDC.

Protocol Standards:

  • Primary: OpenID Connect for modern identity providers and new integrations
  • Legacy Support: SAML 2.0 only when upstream providers require it and OIDC is unavailable
  • Security: Implement PKCE for OIDC public clients and proper token validation
  • Compliance: Support Digital ID Act 2024 requirements for jurisdiction identity services
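
For reference, PKCE with the S256 method (RFC 7636) derives the code challenge as the unpadded base64url encoding of the SHA-256 hash of the code verifier. A minimal sketch using only the Python standard library follows. The function names are illustrative, and a maintained OIDC client library will normally do this for you.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Return a random 43-character URL-safe verifier built from 32 random bytes."""
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier: str) -> str:
    """Return BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


verifier = make_code_verifier()           # kept by the client, sent with the token request
challenge = code_challenge_s256(verifier)  # sent with the authorisation request
```

The client sends the challenge with `code_challenge_method=S256` in the authorisation request and the original verifier in the token request, so an intercepted authorisation code cannot be redeemed without it.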

Architecture Requirements:

  • Applications should integrate through managed identity platforms (AWS Cognito, Microsoft Entra ID), not directly with identity providers
  • Separate privileged and standard user domains for administrative access isolation (see Reference Architecture: OpenAPI Backend)
  • Support multiple upstream identity providers per application
  • Maintain audit trails per ADR 007: Centralised Security Logging

Identity Federation Flow:

flowchart TB
    subgraph standard[Standard User Domain]
        users[Users]
        idp[Identity Providers]
    end

    subgraph privileged[Privileged User Domain]
        admins[Administrators]
        pim[Privileged Identity Management]
    end

    platform[Managed Platform]
    apps[Applications]

    users -->|authenticate| idp
    idp -->|OIDC/SAML tokens| platform
    admins -->|authenticate| pim
    pim -->|elevated claims| platform
    platform -->|validated claims| apps

The managed platform handles protocol translation between OIDC and SAML providers, token validation, and audit logging.

Emerging Standards:

Implementation Requirements:

  • Implement fallback authentication mechanisms for critical systems
  • Choose identity platforms with high availability and data export capabilities

Consequences

Benefits:

  • Consistent modern federation standard across all applications
  • Better security through OIDC’s improved token handling and PKCE support
  • Simplified integration with jurisdiction citizen identity services
  • Clear separation of administrative and standard user access

Risks if not implemented:

  • Fragmented authentication systems across applications
  • Legacy SAML limitations hindering citizen service integration
  • Inconsistent security posture across identity touchpoints

ADR 016: Web Application Edge Protection

Status: Accepted | Date: 2025-08-15

Context

Government web applications face heightened security threats including state-sponsored attacks, DDoS campaigns by activist groups, and sophisticated application-layer exploits targeting public services. These attacks can disrupt critical citizen services and damage public trust.

Traditional perimeter security is insufficient for protecting modern web applications that serve millions of citizens. Edge protection through CDNs and WAFs provides the first line of defence, filtering malicious traffic before it reaches application infrastructure.

Decision

All public web applications and APIs must use CDN with integrated WAF protection:

flowchart LR
    users[Internet Users]
    cdn[CDN + WAF]
    apps[Applications]

    users -->|requests| cdn
    cdn -->|filtered traffic| apps

The CDN edge handles SSL termination, caching, WAF filtering, and DDoS mitigation before traffic reaches application infrastructure.

CDN Requirements:

  • Geographic distribution with SSL/TLS termination at edge
  • Cache optimisation and origin shielding
  • Object-backed origins for static and media assets, using ADR 019: Shared File Access when authoring or processing workloads need file-system access
  • IPv6 dual-stack support on edge (internal use of IPv4 allowed)

WAF Protection:

  • OWASP Top 10 protection rules enabled
  • Layer 7 DDoS protection and rate limiting
  • Geo-blocking and bot management
  • Custom rules for application-specific threats

DDoS Protection:

  • AWS Shield Advanced or equivalent
  • Real-time attack monitoring and alerting
  • DDoS Response Team access

Implementation:

  • WAF logs integrated with SIEM per ADR 007: Centralised Security Logging
  • Fail-secure configuration (no fail-open)
  • Regular penetration testing and rule tuning
  • CI/CD integration for automated deployments

Consequences

Benefits:

  • Automated threat detection and mitigation at network edge
  • Global content delivery and caching capabilities
  • Comprehensive attack surface reduction through filtering
  • Real-time traffic analysis and bot management

Risks if not implemented:

  • Critical citizen services disrupted by attacks
  • Direct server exposure to malicious traffic
  • Slow response times affecting user adoption
  • No early warning of emerging attack patterns

ADR 002: AWS EKS for Cloud Workloads

Status: Accepted | Date: 2025-02-17

Context

Organisations want to efficiently manage and scale bespoke workloads in a secure and scalable manner. Traditional server management can be cumbersome and inefficient for dynamic workloads. Provider-specific control planes can result in lock-in and artificial constraints limiting technology options.

Decision

To address these challenges, use a CNCF Certified Kubernetes platform with automatically managed infrastructure resources. Due to hyperscaler availability and size, AWS EKS (Elastic Kubernetes Service) in Auto Mode is the preferred option.

flowchart LR
    users[Users]
    lb[Load Balancer]
    eks[EKS Cluster]
    db[DBaaS]

    users --> lb --> eks --> db

This leverages Kubernetes for orchestration, AWS EKS for managed Kubernetes services, AWS Elastic Block Store (EBS) for storage and AWS load balancers for traffic management.

  • AWS EKS Auto Mode: Provides a managed Kubernetes service that automatically scales the infrastructure based on workload demands.
  • Managed Storage and NodePools: Ensure that the underlying infrastructure is maintained and updated by AWS.
  • Load Balancers: Standardise ingress and traffic management.
  • Persistent Storage: Keep durable state in managed services outside the cluster. Use ADR 018: Database Patterns for databases and datalakes, and ADR 019: Shared File Access when workloads need shared file-system access.

Consequences

Benefits:

  • Efficient resource utilisation through managed scaling
  • Clear boundaries for shared responsibilities with a small operational overhead
  • Enhanced security through automatic updates and patches
  • Improved availability with managed storage and node pools

Risks if not implemented:

  • Resource inefficiency from manual scaling
  • High operational overhead managing custom infrastructure
  • Security vulnerabilities from delayed updates
  • Service downtime during traffic spikes

Strategic Research

CNCF Kubernetes AI Conformance

The CNCF Kubernetes AI Conformance Program establishes standards for AI/ML workload portability across Kubernetes platforms. Only platforms meeting these standards should be supported, ensuring workloads can interoperate as flexible nodes within a broader state/federal ecosystem.

Current Platform Conformance:

HPC Requirements:

Physical infrastructure for HPC (High-Performance Computing) projects must meet CNCF Kubernetes AI Conformance capabilities. This ensures models developed on local compute can scale to centralised HPC facilities without environment mismatches. Target sovereign Australian platforms meeting security and privacy requirements (ASD IRAP assessed, PRIS compliant).

Digital Sovereignty

Analysis such as Mitchell, Andrew D. and Samlidis, Theodore, “Cloud services and government digital sovereignty in Australia and beyond”, International Journal of Law and Information Technology, Vol. 29, No. 4, 2021, pp. 364-394, highlights the ongoing issues with depending on hyperscalers in a single foreign jurisdiction. Given this changing landscape, it is worth exploring simplified options for secure, sovereign-owned hosting, such as Australian dedicated servers and local colocation in Tier 3+ datacentres (designed for 99.98% uptime). These options are touched on below.

Bare metal management

Use a platform like Proxmox VE to run standalone clusters at multiple facilities, with multiple 2U servers per location. Example hardware (starting at approximately $15k AUD per server): Dell PowerEdge R7725, HPE ProLiant DL385 Gen11, Lenovo ThinkSystem SR665 V3.

Year 1 estimated costs:

  • Hardware: ~$200k for 6x ~$33k servers
  • Colo (2 sites, Tier 3+): ~$50k for 2x 5 kW racks with 1 Gbit/s IP transit
  • Total: ~$250k for ~2-3 TB RAM, ~500 cores, and 100 TB disk across 2 sites (divide usable capacity by 2-3 to allow for redundancy)
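
The arithmetic behind the Year 1 figures above can be checked with a short sketch. The inputs are the estimates from the list, and the function names are illustrative.

```python
def year_one_cost(servers: int, cost_per_server_aud: int, colo_aud: int) -> int:
    """Hardware plus colocation cost for the first year, in AUD."""
    return servers * cost_per_server_aud + colo_aud


def usable_capacity(raw: float, redundancy_factor: float) -> float:
    """Capacity left after reserving for redundancy across sites."""
    return raw / redundancy_factor


# 6 servers at ~$33k plus ~$50k colo; the list rounds this to $250k.
print(year_one_cost(6, 33_000, 50_000))  # 248000

# Usable cores and disk (TB) at redundancy factors of 2 and 3.
for factor in (2, 3):
    print(factor, usable_capacity(500, factor), usable_capacity(100, factor))
```

With a redundancy factor of 2, roughly 250 cores and 50 TB of disk remain usable; with a factor of 3, roughly 167 cores and 33 TB.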

ADR 006: Automated Policy Enforcement

Status: Proposed | Date: 2025-07-29

Context

Cloud infrastructure requires automated policy enforcement to prevent misconfigurations, ensure compliance, and provide secure network access patterns. Manual checking cannot scale effectively across multiple accounts and services.

Decision

Implement comprehensive automated policy enforcement using AWS native services for governance, network security, and access control.

flowchart LR
    governance[Governance]
    network[Network Security]
    workloads[Workloads]

    governance -->|policies| network
    network -->|access control| workloads

Governance (Control Tower, Config) enforces policies on network security (Transit Gateway, Security Groups), which controls access to workloads.

Governance Foundation

  • AWS Control Tower: Account factory, guardrails, and compliance monitoring across organisation
  • Service Control Policies: Preventive controls blocking non-compliant resource creation
  • AWS Config Rules: Detective controls for compliance monitoring and drift detection

Network Security & Access

  • Transit Gateway: Central hub for intra-account resource exposure via security groups
  • Security Group References: Use security group IDs instead of hardcoded IP addresses for dynamic, maintainable access policies
  • Shield Advanced: DDoS protection for public-facing resources per ADR 016: Web Application Edge Protection
  • VPC Flow Logs: Complete egress traffic monitoring and analysis per WA SOC Cyber Network Management Guideline

Note: This approach creates dependency on AWS for traffic and network protection. Open-source equivalents include Security Onion for network security monitoring, OPNsense and pfSense for firewall and intrusion detection capabilities.

Core Policy Areas

  • Encryption: Mandatory encryption for all data stores and communications
  • Access Control: IAM least-privilege access and security group-based resource access
  • Resource Tagging: Governance and cost allocation requirements
  • Data Sovereignty: Geographic restrictions for jurisdiction compliance
  • Network Segmentation: Security group-based micro-segmentation over IP-based rules

Implementation Requirements:

  • Implement policy validation in CI/CD pipelines following ADR 010: Infrastructure as Code
  • Use security group references over hardcoded IP addresses for maintainable policies
  • Monitor VPC Flow Logs for egress traffic analysis and anomaly detection
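As a rough illustration of policy validation in a CI/CD pipeline, the sketch below flags ingress rules that use hardcoded CIDR blocks instead of security group references, and resources missing required tags. The `IngressRule` shape and the tag keys are hypothetical; a real pipeline would read these values from its Infrastructure as Code tooling.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical tag keys; substitute the agency's actual tagging standard.
REQUIRED_TAGS = {"owner", "cost-centre", "environment"}


@dataclass
class IngressRule:
    """Simplified ingress rule; field names are illustrative only."""
    description: str
    cidr_blocks: list[str] = field(default_factory=list)
    source_security_group_id: Optional[str] = None


def check_ingress(rule: IngressRule) -> list[str]:
    """Return findings for rules that do not use a security group reference."""
    findings = []
    if rule.cidr_blocks:
        findings.append(f"{rule.description}: uses hardcoded CIDR {rule.cidr_blocks}")
    elif rule.source_security_group_id is None:
        findings.append(f"{rule.description}: no source defined")
    return findings


def check_tags(tags: dict[str, str]) -> list[str]:
    """Return one finding per required tag that is missing."""
    missing = sorted(REQUIRED_TAGS - tags.keys())
    return [f"missing required tag: {key}" for key in missing]
```

A pipeline step could fail the build when either function returns a non-empty list, which keeps the check preventive and not only detective.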

Consequences

Benefits:

  • Proactive security misconfiguration prevention through automated guardrails
  • Comprehensive egress traffic visibility via ADR 007: Centralised Security Logging
  • Centralised network access management reducing operational complexity

Risks if not implemented:

  • Security misconfigurations deploying to production environments
  • Unmonitored egress traffic enabling data exfiltration
  • Fragmented access policies creating security gaps

ADR 007: Centralised Security Logging

Status: Accepted | Date: 2025-02-25

Context

Security logs should be centrally collected to support monitoring, detection, and response capabilities across workloads. Sensitive information logging must be minimised to follow data protection regulations and reduce the risk of data breaches. Audit and authentication logs are critical for security monitoring and should be collected by default.

Decision

Centralise security logging using Microsoft Sentinel and Amazon CloudWatch.

Configuration:

Operations:

  • Review and update logging configurations regularly to ensure coverage and privacy requirements are met.
  • Extract and archive log information used during investigations to an appropriate location (in alignment with record keeping requirements).
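One way to minimise sensitive information in logs is to redact it in the application before records are shipped to the central platform. The sketch below uses the Python standard library `logging.Filter`; the patterns are hypothetical examples and would need tuning for the data a workload actually handles.

```python
import logging
import re

# Hypothetical patterns; adjust to the sensitive data the workload handles.
SENSITIVE_PATTERNS = [
    (re.compile(r"(password|token|secret)=\S+", re.IGNORECASE), r"\1=[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[EMAIL]"),
]


def redact(text: str) -> str:
    """Replace sensitive values in a log message with placeholders."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Mask sensitive values before a record reaches any handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True


logger = logging.getLogger("app.audit")
logger.addFilter(RedactingFilter())
```

Filtering at the logger keeps audit and authentication events intact while removing values that should not leave the workload.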

Consequences

Benefits:

  • Faster incident detection and response
  • Simplified compliance with data protection regulations
  • Centralised security log management reduces operational overhead

Risks if not implemented:

  • Delayed security incident detection from decentralised logs
  • Sensitive information exposure leading to data breaches
  • Incomplete audit trails hindering forensic investigations

ADR 010: Infrastructure as Code

Status: Accepted | Date: 2025-03-10

Context

All environments must be reproducible from source to minimise drift and security risk. Manual changes and missing version control create deployment failures and vulnerabilities.

Compliance Requirements:

Decision

Golden Path

  1. Git Repository Structure: Single repo per application with environments/{dev,int,uat,prod} folders matching AWS account names, for example app-a-dev, app-a-int, app-a-uat, and app-a-prod
  2. State Management: Terraform remote state with locking, separate state per environment
  3. CI Pipeline:
    • Validate: Trivy scan + terraform plan/kubectl diff drift check
    • Plan: Show proposed changes on PR
    • Apply: DEV and INT may deploy approved branch refs; UAT and PROD deploy tagged releases only per ADR 009
  4. Versioning: Git tags = semantic versions (x.y.z) created on main for UAT and PROD
  5. Disaster Recovery: Check out the release tag and run just deploy --env=prod with static artifacts from ADR 004

Required Tools & Practices

| Tool | Purpose | Stage | Mandatory |
|------|---------|-------|-----------|
| Trivy | Vulnerability scanning | Validate | Yes |
| Terraform or kubectl/kustomize | Configuration management | Deploy | Yes |
| Justfiles | Task automation | All | Recommended |
| devcontainer-base | Dev environment | Local | Recommended |
| k3d | Local testing | Dev | Optional |

Infrastructure as Code Workflow:

flowchart LR
    artifacts[Static Artifacts]
    repo[Infrastructure Repo]
    envs[AWS Accounts]

    artifacts -->|versioned| repo
    repo -->|deploy| envs

Git tags are immutable release versions for UAT and PROD. DEV and INT may deploy approved branch refs per ADR 009. Environment folders (environments/{dev,int,uat,prod}) map to separate AWS accounts with isolated state storage.
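The promotion rule above can be expressed as a small guard in a deployment script. This is a minimal sketch: it assumes full Git ref names (such as `refs/tags/1.2.3`), accepts an optional leading `v` on tags, and does not check whether a branch ref has been approved.

```python
import re

# Semantic version tags (x.y.z); the optional "v" prefix is an assumption.
SEMVER_TAG = re.compile(r"^refs/tags/v?\d+\.\d+\.\d+$")
TAG_ONLY_ENVS = {"uat", "prod"}
BRANCH_ENVS = {"dev", "int"}


def may_deploy(env: str, git_ref: str) -> bool:
    """Return True when git_ref is an allowed kind of ref for the environment."""
    env = env.lower()
    is_release_tag = bool(SEMVER_TAG.match(git_ref))
    if env in TAG_ONLY_ENVS:
        return is_release_tag
    if env in BRANCH_ENVS:
        # Approval of the branch ref must be enforced elsewhere.
        return is_release_tag or git_ref.startswith("refs/heads/")
    raise ValueError(f"unknown environment: {env}")
```

Running this check before `terraform apply` or `kubectl apply` stops an untagged ref from reaching UAT or PROD by accident.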

Consequences

Benefits:

  • Reproducible infrastructure deployments with version control
  • Automated drift detection and prevention mechanisms
  • Reliable disaster recovery through infrastructure as code

Risks if not implemented:

  • Configuration drift creating security vulnerabilities
  • Failed rollbacks during critical incident recovery
  • Inconsistent environments affecting application reliability

References

ADR 014: Object Storage Backups

Status: Proposed | Date: 2025-07-22

Context

Current backup approaches lack cross-region redundancy and automated lifecycle management, creating single points of failure and compliance risks for government data retention requirements. Traditional storage systems do not provide the durability and geographic distribution needed for critical government systems.

Key challenges:

  • Single region backup storage creating vulnerability to regional outages
  • Manual backup processes prone to human error
  • Lack of automated recovery testing
  • Insufficient geographic separation for disaster recovery

References:

Decision

Implement standardised object storage backup solution with automated cross-region replication and lifecycle management for all critical systems and data.

flowchart TB
    workloads[Workloads]

    primary_s3[Primary S3 Buckets versioned]
    dbaas[DBaaS]
    backup_s3[Backup S3 Bucket]

    workloads --> primary_s3
    workloads --> dbaas
    dbaas -->|automated exports| backup_s3

    replica_s3[Replica S3 Bucket Cross-Region]
    primary_s3 -->|S3 replication| replica_s3
    backup_s3 -->|S3 replication| replica_s3

All storage (primary, backup, and replica) uses S3 buckets with versioning and immutable retention policies. Primary S3 buckets use native versioning for point-in-time recovery. DBaaS exports to backup buckets. Both primary and backup buckets replicate cross-region for geographic redundancy.

Storage Requirements:

Critical Systems Definition:

  • Production databases containing citizen or business data
  • Shared content, media, and file assets required to restore services
  • Application source code and deployment configurations
  • Security logs and audit trails
  • Infrastructure as Code templates and state files

Geographic Distribution:

Lifecycle Management:

  • Automated storage tiering based on age and access patterns
  • Compliance-based retention policies
  • Recovery testing and validation procedures

Recovery Objectives:

  • Recovery Time Objective (RTO): 4 hours for critical systems, 24 hours for standard systems
  • Recovery Point Objective (RPO): 1 hour for databases, 24 hours for static content
  • Implementation Example: AWS S3 Cross-Region Replication to Australian regions
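The recovery objectives lend themselves to automated checks. The sketch below encodes the RTO and RPO values listed above and tests a backup timestamp or a measured restore duration against them; the function and key names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Objectives as stated in this ADR.
RPO = {"database": timedelta(hours=1), "static": timedelta(hours=24)}
RTO = {"critical": timedelta(hours=4), "standard": timedelta(hours=24)}


def rpo_breached(kind: str, last_backup: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the newest backup is older than the RPO allows."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup > RPO[kind]


def rto_met(tier: str, restore_duration: timedelta) -> bool:
    """Return True when a measured restore finished within the RTO."""
    return restore_duration <= RTO[tier]
```

Feeding `rto_met` with durations from scheduled recovery tests gives evidence that the objectives are achievable, not just documented.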

Consequences

Benefits:

  • Automated disaster recovery meeting defined RTO/RPO objectives
  • Geographic redundancy protecting against regional outages
  • Compliance with government data retention requirements

Risks if not implemented:

  • Permanent data loss from infrastructure failures
  • Extended service recovery times affecting citizen services
  • Regulatory violations from inadequate data protection

ADR 015: Data Governance Standards

Status: Proposed | Date: 2025-07-28

Context

Data pipelines require governance to ensure quality and compliance. Modern approaches use code-based validation and version control rather than separate governance tools.

Decision

Use code-based data governance with git workflows. Data transformations written in Ibis are version-controlled, testable, and provide implicit lineage through code dependencies. See Reference Architecture: Data Pipelines for full implementation patterns.

Priority Focus Areas

  • Schema Contracts: Define expected schemas in code, validate in CI/CD pipeline
  • Data Lineage: Track through transformation code history in git
  • Quality Validation: Use Ibis expressions for data validation checks, run as automated tests
  • Audit Integration: Follow ADR 007: Centralised Security Logging for transformation logs
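A schema contract can be as simple as a mapping kept in the repository and compared against the observed schema in CI. The sketch below is plain Python and does not use Ibis; the type names are illustrative strings, and the column names follow the customer example used in this ADR.

```python
# Expected schema, version-controlled alongside the transformation code.
EXPECTED_CUSTOMERS = {
    "email": "string",
    "created_at": "timestamp",
    "status": "string",
}


def contract_violations(expected: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Compare an observed schema to the contract and list any differences."""
    problems = []
    for column, dtype in expected.items():
        if column not in observed:
            problems.append(f"missing column: {column}")
        elif observed[column] != dtype:
            problems.append(f"{column}: expected {dtype}, found {observed[column]}")
    return problems
```

An automated test can assert that `contract_violations` returns an empty list, so a schema change fails the pipeline before it reaches downstream systems.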

Implementation

# Example: row-level validation with Ibis
import ibis

def validate_customers(table: ibis.Table) -> ibis.Table:
    """Keep only customer rows that pass basic validity checks."""
    # Rows failing any check are filtered out, not raised as errors.
    return table.filter(
        table.email.notnull() &
        table.created_at.notnull() &
        (table.status.isin(['active', 'inactive', 'pending']))
    )

Consequences

Benefits:

  • Data quality validation as code, testable in CI/CD
  • Lineage tracked through git history and code dependencies
  • No separate governance infrastructure to maintain

Risks if not implemented:

  • Data quality issues reaching downstream systems
  • Unable to trace data issues back to source transformations
  • Compliance gaps from undocumented data handling

ADR 017: Analytics Tooling Standards

Status: Proposed | Date: 2025-07-28

Context

Organisations need simple, secure reporting with reproducible outputs. Reports should be version-controlled alongside the data transformations that produce them.

Decision

Use Quarto for analytics and reporting.

Why Quarto

  • Multi-format: Same source produces HTML, PDF, Word, presentations
  • Version-controlled: Reports live alongside data transformation code in git
  • Open source: Markdown-based, portable, no vendor lock-in
  • Accessible: Built-in support for WCAG compliance

Capabilities

| Need | Quarto Feature |
|------|----------------|
| Static reports | Markdown + code blocks |
| PDF documents | PDF output with professional formatting |
| Interactive charts | Observable JS for client-side interactivity |
| Dashboards | Quarto Dashboards for layout and filtering |
| Parameterised reports | Parameters for automated report generation |
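Parameterised reports can be generated from a scheduled job by building a `quarto render` command per parameter set. The sketch below only constructs the command; the document name and parameter names are hypothetical.

```python
import shlex


def quarto_render_command(document: str, params: dict, output_format: str = "html") -> list[str]:
    """Build a `quarto render` command with one -P flag per report parameter."""
    command = ["quarto", "render", document, "--to", output_format]
    for name, value in sorted(params.items()):
        command += ["-P", f"{name}:{value}"]
    return command


# Hypothetical report and parameters, shown for illustration only.
command = quarto_render_command("report.qmd", {"region": "perth", "year": 2025})
print(shlex.join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would execute the render, assuming the Quarto CLI is installed on the runner.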

Integration

Consequences

Benefits:

  • Version-controlled, reproducible analytics outputs
  • Static hosting with minimal operational overhead
  • Consistent tooling across reports, dashboards, and documents

Risks if not implemented:

  • Inconsistent reporting approaches across teams
  • Reports not tracked in version control
  • Difficulty reproducing historical analytics outputs

ADR 018: Database Patterns

Status: Proposed | Date: 2025-07-28

Context

Applications need managed persistent storage for databases, datalakes, and objects with automatic scaling and jurisdiction-compliant backup strategies. Workloads that need shared file-system access are covered by ADR 019: Shared File Access.

Decision

Use Aurora Serverless v2 outside EKS clusters with automated scaling, multi-AZ deployment, and dual backup strategy.

Datalakes: Separate the storage format from the access layer:

  • Storage layer: store analytical data in object storage with open table formats
  • Lightweight access layer: use DuckLake with a DuckDB client for local development, scheduled jobs, and simpler analytical workloads
  • Serverless Iceberg access layer: use Amazon S3 Tables for managed Apache Iceberg tables when workloads need AWS-managed table maintenance or multi-engine access
  • Distributed query access layer: use Trino or equivalent Iceberg-compatible engines when workloads need concurrent or larger-scale querying

DuckLake and S3 Tables are not an either/or decision. Choose the access layer per workload while keeping data in object storage and open table formats where practical. See Reference Architecture: Data Pipelines for full datalake patterns.
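The per-workload choice can be written down as a simple decision function. This is a sketch only: the `Workload` fields and the concurrency threshold are assumptions made for illustration, not values set by this ADR.

```python
from dataclasses import dataclass

# Hypothetical threshold; replace with a measured figure for the workload.
CONCURRENCY_THRESHOLD = 10


@dataclass
class Workload:
    """Traits that influence the access layer choice (illustrative fields)."""
    concurrent_users: int
    needs_managed_maintenance: bool = False
    needs_multi_engine_access: bool = False


def choose_access_layer(workload: Workload) -> str:
    """Pick the simplest access layer that satisfies the workload."""
    if workload.concurrent_users > CONCURRENCY_THRESHOLD:
        return "trino"
    if workload.needs_managed_maintenance or workload.needs_multi_engine_access:
        return "s3-tables"
    return "ducklake"
```

Because the data stays in object storage with open table formats, changing the answer later should mean changing the access layer and not migrating the data.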

Implementation

  • Database: Aurora Serverless v2 (PostgreSQL/MySQL) with built-in connection pooling and automatic scaling
  • Datalake Storage: S3-compatible object storage with open table formats for analytics data
  • Datalake Access: DuckDB clients for DuckLake workloads; S3 Tables, Trino, or equivalent Iceberg-compatible engines for serverless or distributed access
  • Object Storage: Amazon S3 for files and objects. Use ADR 019: Shared File Access when workloads need file-system access to object-backed files
  • Deployment: Outside EKS cluster (handles complexity automatically)
  • Credentials: Follow ADR 005: Secrets Management for endpoint and credential management
  • Backup: Follow ADR 014: Object Storage Backups plus AWS automated snapshots
  • Security: Follow ADR 007: Centralised Security Logging and ADR 012: Privileged Remote Access

Consequences

Benefits:

  • Serverless scaling reducing operational costs during low usage periods
  • Automated high availability with managed backup strategies per ADR 014: Object Storage Backups
  • Compliance with jurisdiction requirements through dual backup approach

Risks if not implemented:

  • High operational overhead managing database infrastructure
  • Inconsistent backup strategies across database systems
  • Cost inefficiency from overprovisioned database resources

ADR 019: Shared File Access

Status: Accepted | Date: 2026-04-27

Context

Some workloads need both object APIs and file-system access to the same files. Examples include content management systems, media processing, shared workspaces, AI/ML tooling, and EKS workloads that expect paths, folders, and ordinary file operations.

Without a standard pattern, teams may copy the same files between object storage, network file systems, and CDN origins. This creates synchronisation risk, extra cost, and unclear backup and retention ownership.

References:

Decision

Use object storage as the source of truth for shared files that also need backup, lifecycle management, or CDN delivery. On AWS, use S3 buckets for the canonical storage layer.

Use Amazon S3 Files when applications, users, or agents need shared file-system access to S3 data from AWS compute, including EKS workloads. This is the preferred pattern for CMS media libraries, static assets, shared workspaces, AI/ML data, and other file assets that benefit from both object and file access.

Avoid copying canonical files into separate file systems unless a workload has a hard requirement that object-backed file access cannot meet.

flowchart LR
    authors[Authors / EKS Workloads]
    file_access[Shared File Access]
    bucket[Object Storage]
    cdn[CDN + WAF]
    users[Users]

    authors -->|file operations| file_access
    file_access -->|object-backed storage| bucket
    bucket -->|origin| cdn
    cdn -->|deliver| users

Requirements

  • Store canonical files in object storage with versioning, encryption, lifecycle policies, and backups per ADR 014
  • Use managed shared file access for workloads that need paths, folders, and ordinary file operations
  • Scope working sets and least-privilege access with bucket prefixes, policy, access points, or equivalent storage boundaries
  • Publish public assets through a CDN and WAF per ADR 016
  • Use workload identity and scoped IAM permissions for EKS access per ADR 002 and ADR 005: Secrets Management
  • Define ownership, lifecycle, retention, and recovery expectations for each shared file store
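Scoping access with bucket prefixes can be sketched as an IAM policy document generated per workload. The bucket and prefix names below are hypothetical, and the action list is a minimal read/write example that a real policy would adjust.

```python
import json


def prefix_scoped_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy limiting S3 access to a single bucket prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadWriteObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix for a CMS media library.
print(json.dumps(prefix_scoped_policy("example-shared-files", "cms/media"), indent=2))
```

Generating the policy from a function keeps each workload's working set explicit and reviewable in version control.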

Consequences

Benefits:

  • One source of truth for object, file, backup, and CDN access
  • Less data duplication and fewer synchronisation pipelines
  • Familiar file operations for EKS workloads and other AWS compute
  • Clearer ownership for retention, recovery, and public asset delivery

Risks if not implemented:

  • Teams may duplicate files between object storage and file systems
  • CDN assets may drift from files used by authoring or processing tools
  • Backup, retention, and ownership boundaries may become unclear

ADR 003: API Documentation Standards

Status: Accepted | Date: 2025-03-26

Context

Secure, maintainable APIs require mature frameworks with low complexity and industry standard compliance. Where established standards exist, prefer them over bespoke REST APIs.

Compliance Requirements:

Decision

API Requirements

| Requirement | Standard | Mandatory |
|-------------|----------|-----------|
| Documentation | OpenAPI Specification | Yes |
| Testing | Restish CLI scripts | Yes |
| Framework | Huma (Go), Litestar (Python), or equivalent | Recommended |
| Naming | Consistent convention | Yes |
| Security | OWASP API security coverage | Yes |
| Exposure | No admin APIs on Internet | Yes |

Development Guidelines

  • Self-Documenting: Use frameworks that auto-generate OpenAPI specs
  • Data Types: Prefer standard types over custom formats
  • Segregation: Separate APIs by purpose (see Reference Architecture: OpenAPI Backend)
  • Testing: Include security vulnerability checks in test scripts

API Development Flow:

flowchart LR
    framework[Framework]
    openapi[OpenAPI Spec]
    testing[Automated Tests]

    framework -->|generates| openapi
    openapi -->|validates| testing

Use self-documenting frameworks that generate OpenAPI specifications, then validate with automated security and behaviour tests.
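A generated OpenAPI document can itself be checked in CI. The sketch below lints a parsed specification for two of the requirements above: consistent operation naming (here, the presence of an `operationId`) and no admin paths in a public specification. The admin path prefixes are a hypothetical naming convention.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}
# Hypothetical convention for administrative endpoints.
ADMIN_PREFIXES = ("/admin", "/internal")


def lint_openapi(spec: dict) -> list[str]:
    """Return findings for a parsed OpenAPI document (for example, loaded from JSON)."""
    findings = []
    if not str(spec.get("openapi", "")).startswith("3."):
        findings.append("spec does not declare an OpenAPI 3.x version")
    for path, item in spec.get("paths", {}).items():
        if path.startswith(ADMIN_PREFIXES):
            findings.append(f"{path}: admin path present in public spec")
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            if "operationId" not in operation:
                findings.append(f"{method.upper()} {path}: missing operationId")
    return findings
```

This complements, and does not replace, behaviour and security tests run against the live API.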

Consequences

Benefits:

  • Standardised API documentation automatically generated from code
  • Enhanced security through consistent validation patterns
  • Reduced maintenance overhead via automated testing integration

Risks if not implemented:

  • Documentation drift creating integration difficulties
  • Security vulnerabilities from inconsistent API patterns
  • Increased development time debugging undocumented APIs

ADR 004: CI/CD Quality Assurance

Status: Accepted | Date: 2025-03-10

Context

CI/CD pipelines must ensure the security and integrity of software artifacts that are consumed by infrastructure repositories per ADR 010. Threat actors exploit vulnerabilities in code, dependencies, container images, and exposed secrets.

Compliance Requirements:

Decision

CI/CD Pipeline Requirements

Pipeline Flow: Code Commit → Build & Test → Quality Assurance → Release

| Stage | Tools | Purpose | Mandatory |
|-------|-------|---------|-----------|
| Build | Docker Bake | Multi-platform builds with SBOM/provenance | Yes |
| Scan | scc and Trivy | Complexity and Vulnerability scanning | Yes |
| Analysis | GitHub CodeQL | Static code analysis | Yes |
| Test | Playwright | End-to-end testing | Recommended |
| Performance | Grafana K6 | Load testing | Optional |
| API | Restish | API validation per ADR 003 | Optional |

Execution Environment

  • Use devcontainer-base for standardised tooling
  • Use Docker Bake to standardise builds
  • Use Justfiles for task automation
  • Use GitHub Actions for repository-hosted CI work that does not need AWS access, including lengthy builds, tests, and scans
  • Run only AWS-privileged release or deployment automation from an operations-controlled environment, such as controlled Woodpecker CI runners

AWS-Privileged Automation

Use operations-controlled automation only where release or deployment steps need AWS credentials or direct access to AWS-hosted systems.

Required controls:

  • Assume AWS roles at runtime; do not store long-lived cloud credentials in pipeline systems
  • Run automation on dedicated, operations-managed hosts or workloads
  • Limit network access to the AWS services and internal systems required for the job
  • Apply strong access control, audit logging, and minimal administrative access
  • Keep build, release, and deployment logs for audit and incident review
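Assuming a role at runtime might look like the sketch below. It takes an STS client as an argument (for example, one created with `boto3.client("sts")`) and returns short-lived credentials; the role ARN, session name, and 15-minute duration are placeholders chosen for illustration.

```python
def assume_deploy_role(sts_client, role_arn: str, session_name: str,
                       duration_seconds: int = 900) -> dict:
    """Exchange the runner's identity for short-lived credentials via STS."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,  # appears in CloudTrail for audit
        DurationSeconds=duration_seconds,
    )
    credentials = response["Credentials"]
    # Keys match the keyword arguments accepted by boto3.Session.
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
```

The returned dictionary can be passed to `boto3.Session(**credentials)` for the duration of the job, so no long-lived keys need to be stored in the pipeline system.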

CI/CD Pipeline:

flowchart LR
    code[Code Commit]
    build[Build]
    scan[Scan + Analyse]
    release[Release]

    code --> build --> scan --> release

Build produces container images with SBOM/provenance. Scan runs vulnerability and static analysis. Release produces static artifacts consumed by ADR 010: Infrastructure as Code. Keep unprivileged build, test, and scan work on repository-hosted CI, including long-running jobs. Move only AWS-privileged release or deployment steps to an operations-controlled environment.

Consequences

Benefits:

  • Automated security scanning and vulnerability remediation
  • Standardised artifact integrity and compliance alignment
  • Consistent deployment pipelines with audit trails
  • Clear separation between general CI checks and AWS-privileged automation

Risks if not implemented:

  • Vulnerable containers deployed to production
  • Exposed secrets or excessive cloud privilege in automation systems
  • Manual security processes prone to human error
  • Compliance violations and audit failures

References

ADR 009: Release Standards

Status: Accepted | Date: 2025-03-04

Context

Release notes should be standardised so security and infrastructure operations teams can quickly understand what changed, why it changed, and what action is required.

Release standards also need a clear promotion model. Without one, integrated code can reach main before testing is complete, and release evidence can become detached from the deployed version.

Compliance Requirements:

Decision

Use Markdown release notes and a Gitflow-based release model.

Release notes must include:

  • Summary of features, fixes, security updates, and infrastructure changes
  • Security and operational impacts, including deployment, logging, monitoring, and Infrastructure as Code (IaC) changes
  • Links to changelogs, test results, security scans, and approvals

Release workflow:

  • develop is the integration branch
  • main is the tested release history
  • Create release/* branches from develop for release candidates
  • Create hotfix/* branches from main for urgent production fixes
  • Merge to main only after required testing and approval
  • Tag releases on main using annotated semantic version tags such as v1.0.0

Environment promotion:

  • DEV and INT may deploy approved branch refs
  • UAT and PROD must deploy immutable tagged releases only
  • Promote the same tested tag from UAT to PROD without rebuilding, moving, or recreating the tag
  • Record evidence on pull requests or GitHub Releases; update evidence without changing existing tags

A template is provided below that can be tailored per project. A completed release notes Markdown document should be provided with all proposed changes.

## Release Notes

### Overview

- **Name:** Name
- **Version:** [Version Number](#)
- **Previous Version:** [Version Number](#)

### Changes and Testing

High level summary

**New Features & Improvements**:

- [Feature/Improvement 1]: Brief description including testing.
- [Feature/Improvement 2]: Brief description including testing.

**Bug Fixes & Security Updates**:

- [Bug Fix/Security Update 1]: Brief description with severity level and response timeline.
- [Bug Fix/Security Update 2]: Brief description with severity level and response timeline.
- **Response Timelines**: Critical (24h), High (7d), Medium (30d), Low (90d)

### Changelogs

*Only include list items changed by this release*

- **Code**: Brief description. [View Changes](#)
- **Infrastructure**: Brief description. [View Changes](#)
- **Configuration & Secrets**: Brief description.

### Known Issues

- [Known Issue 1]: Brief description.
- [Known Issue 2]: Brief description.

### Action Required

- [Action 1]: Brief description of any action required by users or stakeholders.
- [Action 2]: Brief description of any action required by users or stakeholders.

### Contact

For any questions or issues, please contact [Contact Information].
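The response timelines in the template can be turned into due dates when tracking fixes. A minimal sketch, using the severity levels and durations listed in the template; the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Response timelines from the release notes template.
RESPONSE_TIMELINES = {
    "critical": timedelta(hours=24),
    "high": timedelta(days=7),
    "medium": timedelta(days=30),
    "low": timedelta(days=90),
}


def response_due(severity: str, reported_at: datetime) -> datetime:
    """Return the latest acceptable response time for an issue."""
    return reported_at + RESPONSE_TIMELINES[severity.lower()]


reported = datetime(2025, 3, 4, 9, 0, tzinfo=timezone.utc)
print(response_due("High", reported).isoformat())
```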

Consequences

Benefits:

  • Release communication is consistent across teams
  • main reflects tested release history, not day-to-day integration
  • UAT and PROD promotion uses immutable release tags
  • Change tracking supports ADR 007: Centralised Security Logging

Risks if not implemented:

  • Critical release information may be lost between teams
  • Integration changes may be promoted before release validation is complete
  • Security or operational issues may be introduced through undocumented system changes

Reference Architecture: Content Management

Status: Proposed | Date: 2025-07-28

When to Use This Pattern

Use when building:

  • Public websites and intranets
  • Content portals with editorial workflows
  • Headless CMS backends for mobile apps or multi-channel publishing

Do not use this pattern for simple static sites that can be generated at build time without editorial workflows.

Overview

Build content platforms with a small CMS runtime, managed database, object-backed media storage, and CDN/WAF delivery. Keep authoring, storage, and delivery concerns separate so content editors can work safely while public users only reach cached, protected endpoints.

Core Components

flowchart LR
    editors[Content Editors]
    cms[CMS Application]
    db[Content Database]
    media[Object Storage]
    cdn[CDN + WAF]
    users[End Users]

    editors -->|author + approve| cms
    cms -->|content metadata| db
    cms -->|media assets| media
    media -->|origin| cdn
    cms -->|publish / invalidate| cdn
    cdn -->|deliver| users

Project Kickoff Steps

Foundation Setup

  1. Apply Isolation - Follow ADR 001: Application Isolation for CMS service network, runtime, and environment separation
  2. Deploy CMS Runtime - Follow ADR 002: AWS EKS for Cloud Workloads for the CMS application and background workers
  3. Configure Infrastructure - Follow ADR 010: Infrastructure as Code for reproducible database, storage, CDN, and runtime deployments
  4. Setup Storage - Follow ADR 018: Database Patterns for the content database and ADR 019: Shared File Access when editorial or processing workloads need shared file access to media assets

Security & Operations

  1. Configure Secrets Management - Follow ADR 005: Secrets Management for database, CMS, identity, and API credentials
  2. Setup Logging - Follow ADR 007: Centralised Security Logging for audit trails, publishing events, and administrative actions
  3. Setup Backup Strategy - Follow ADR 014: Object Storage Backups for content database, media asset, and configuration recovery
  4. Configure Edge Protection - Follow ADR 016: Web Application Edge Protection for CDN, WAF, origin protection, and cache rules
  5. Identity Integration - Follow ADR 013: Identity Federation Standards and ADR 012: Privileged Remote Access for editorial and administrative access

Implementation Details

Content Model & Editorial:

  • Define content types, ownership, approval stages, and publishing rules before selecting CMS plugins or custom fields
  • Use role-based editorial workflows for draft, review, approval, and publishing steps
  • Keep administrative CMS endpoints separate from public delivery paths
  • Implement headless CMS APIs following ADR 003: API Documentation Standards where content is consumed by other applications
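A role-based editorial workflow can be modelled as a table of allowed transitions. The sketch below is an assumption about how such rules might be encoded; the role names and the "send back to draft" step are hypothetical, and most CMS platforms provide their own workflow configuration.

```python
# (current state, target state) -> roles allowed to make the transition.
# Role names are hypothetical placeholders.
TRANSITIONS = {
    ("draft", "review"): {"author", "editor"},
    ("review", "draft"): {"editor"},          # send back for changes
    ("review", "approved"): {"approver"},
    ("approved", "published"): {"publisher"},
}


def can_transition(role: str, current: str, target: str) -> bool:
    """Return True when the role may move content between the two states."""
    return role in TRANSITIONS.get((current, target), set())
```

Keeping the table in code makes it easy to test that, for example, an author can never publish directly from draft.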

Media & Delivery:

Compliance & Quality:

  • Test WCAG 2.1 AA accessibility before publishing templates or major content changes
  • Apply content retention, disposal, and ownership rules per ADR 015: Data Governance Standards
  • Configure privacy notices, cookie consent, and multilingual content where required
  • Monitor content performance, broken links, publishing failures, and CDN cache effectiveness

Reference Architecture: Data Pipelines

Status: Proposed | Date: 2025-01-28

When to Use This Pattern

Use when building:

  • Analytics and business intelligence reporting
  • Data integration between organisational systems
  • Batch data processing and transformation workflows
  • Small to medium data products that should start simple and scale later

Do not use this pattern as a default for low-latency transactional APIs or streaming systems with sub-second processing requirements.

Overview

Build data pipelines as version-controlled transformation code over an object-storage datalake. Keep storage and table formats separate from the access layer so teams can start with DuckDB/DuckLake and add S3 Tables, Trino, or other Iceberg-compatible engines when concurrency or scale requires it.

Core Components

flowchart LR
    sources[Data Sources]
    transform[Ibis Transformations]
    storage[Object Storage + Open Tables]
    access[DuckDB / S3 Tables / Trino]
    output[Reports & APIs]

    sources -->|extract + validate| transform
    transform -->|load curated data| storage
    storage -->|query| access
    access -->|serve| output

Key Technologies:

| Component | Tool | Purpose |
|-----------|------|---------|
| Transformation | Ibis | Portable Python dataframe API for transformations across local and cloud engines |
| Local Access | DuckDB + DuckLake | Lightweight client and lakehouse access for development, scheduled jobs, and smaller workloads |
| Serverless Tables | Amazon S3 Tables | Managed Apache Iceberg table storage and maintenance for AWS workloads |
| Distributed Query | Trino or equivalent | Concurrent and larger-scale SQL access to Iceberg tables |
| Reporting | Quarto | Static reports and dashboards from version-controlled notebooks |

Project Kickoff Steps

Foundation Setup

  1. Apply Isolation - Follow ADR 001: Application Isolation for data processing network and account boundaries
  2. Deploy Workloads - Follow ADR 002: AWS EKS for Cloud Workloads for scheduled pipeline jobs when local or CI execution is not sufficient
  3. Configure Infrastructure - Follow ADR 010: Infrastructure as Code for buckets, table resources, permissions, and deployment environments
  4. Setup Storage and Access - Follow ADR 018: Database Patterns for object storage, DuckLake, S3 Tables, and Iceberg-compatible access layers

Security & Operations

  1. Configure Secrets - Follow ADR 005: Secrets Management for source system credentials and scoped storage access
  2. Setup Logging - Follow ADR 007: Centralised Security Logging for pipeline runs, data access, and failures
  3. Setup Backups - Follow ADR 014: Object Storage Backups for datalake backup, replication, and recovery objectives
  4. Apply Data Governance - Follow ADR 015: Data Governance Standards for ownership, quality, classification, and retention

Development Process

  1. Configure CI/CD - Follow ADR 004: CI/CD Quality Assurance for automated testing and deployment
  2. Setup Releases - Follow ADR 009: Release Standards for versioned pipeline changes, release notes, promotion, and data-impact notes
  3. Publish Analytics - Follow ADR 017: Analytics Tooling Standards for Quarto reports and dashboards

Implementation Details

Access Layer Selection:

  • Use DuckDB + DuckLake for local development, scheduled jobs, notebooks, and simpler analytical workloads
  • Use S3 Tables for managed Iceberg tables when AWS-managed table maintenance, catalog integration, or multi-engine access is required
  • Use Trino or another Iceberg-compatible query engine when many users or services need concurrent SQL access
  • Keep transformation logic in Ibis where practical so the same code can move between access layers

Data Quality:

  • Validate schemas and business rules during ingestion and transformation
  • Run schema and sample-data checks in CI/CD
  • Track lineage through transformation code, table names, and release notes

Operations:

  • Partition tables by common query and retention boundaries
  • Use lifecycle policies and replication for backup and cost control
  • Start with the simplest access layer and add distributed query only when measured concurrency or scale requires it
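Partitioning on retention boundaries means expiry can drop whole partitions instead of deleting individual rows. The sketch below assumes monthly partitions, each identified by the first day of its month, and a retention period expressed in months; both are illustrative choices.

```python
from datetime import date


def expired_partitions(partition_months: list[date], today: date,
                       retention_months: int) -> list[date]:
    """List monthly partitions that fall outside the retention window.

    The current month plus the previous `retention_months` months are kept.
    """
    cutoff = today.year * 12 + today.month - retention_months
    return [p for p in partition_months if p.year * 12 + p.month < cutoff]
```

How partitions are actually dropped depends on the table format and access layer in use, so this function only identifies candidates.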

Reference Architecture: Identity Management

Status: Proposed | Date: 2025-07-29

When to Use This Pattern

Use when building:

  • Applications requiring user login via government or enterprise identity providers
  • Single sign-on across multiple services
  • Integration with Australian Government Digital ID or verifiable credentials
  • Services that need separate standard-user and privileged-user access paths

Do not use this pattern to create a new identity provider when a managed identity platform can meet the need.

Overview

Use a managed identity platform as the relying-party integration point for applications. Add a broker layer only when the service must normalise multiple upstream identity providers, verifiable credentials, or policy requirements that applications should not implement directly.

Identity Federation Pattern

The pattern separates standard user authentication from privileged administration. Both paths issue standard OIDC/SAML claims to downstream applications, with audit logging and policy enforcement at the managed platform or broker layer.

Key Benefits:

  • Single integration point for multiple upstream providers
  • Standard OIDC/SAML interface for downstream applications
  • Separate privileged and standard user domains
  • Centralised policy enforcement and audit logging

Core Components

flowchart TB
    subgraph standard[Standard User Domain]
        users[Users]
        providers[Identity Providers]
    end

    subgraph privileged[Privileged User Domain]
        admins[Administrators]
        pim[Privileged Identity Management]
    end

    platform[Managed Identity Platform / Broker]
    apps[Applications]

    users -->|authenticate| providers
    providers -->|OIDC/SAML claims| platform
    admins -->|elevate| pim
    pim -->|privileged claims| platform
    platform -->|validated tokens| apps

Project Kickoff Steps

  1. Define Trust Boundaries - Follow ADR 001: Application Isolation to separate identity runtime, application runtime, and administrative access paths
  2. Deploy Runtime - Follow ADR 002: AWS EKS for Cloud Workloads only for broker components that cannot be provided by a managed platform
  3. Configure Identity Federation - Follow ADR 013: Identity Federation Standards for OIDC-first integration, SAML fallback, claim mapping, and downstream consumer configuration
  4. Configure Persistence - Follow ADR 018: Database Patterns for broker state, session metadata, and configuration storage where required
  5. Secure Secrets and Logs - Follow ADR 005: Secrets Management for OIDC client secrets and ADR 007: Centralised Security Logging for authentication audit trails
  6. Privileged Administration - Follow ADR 012: Privileged Remote Access for break-glass and administrator access

Implementation Considerations

Provider and Claim Design:

  • Prefer OIDC for new integrations; use SAML only where the upstream provider cannot support OIDC
  • Define required claims, optional claims, and claim transformation rules before application integration
  • Avoid persistent cross-service identifiers unless there is a lawful and documented need
  • Keep application authorisation decisions close to the application while centralising authentication and identity proofing
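
Defining claims before integration can be as simple as an allow-list plus a rename table. The sketch below is illustrative only: the claim names, the `RENAMES` mapping, and the `map_claims` function are hypothetical and not taken from ADR 013 or any specific identity platform.

```python
# Hypothetical claim rules for one downstream application.
REQUIRED_CLAIMS = {"sub", "email"}
OPTIONAL_CLAIMS = {"given_name", "family_name"}
# Upstream attribute name -> claim name released to the application (assumed).
RENAMES = {"mail": "email", "givenName": "given_name", "sn": "family_name"}


def map_claims(upstream: dict) -> dict:
    """Rename upstream claims, drop anything not allow-listed, and
    fail if a required claim is missing."""
    renamed = {RENAMES.get(name, name): value for name, value in upstream.items()}
    allowed = REQUIRED_CLAIMS | OPTIONAL_CLAIMS
    released = {name: value for name, value in renamed.items() if name in allowed}
    missing = REQUIRED_CLAIMS - set(released)
    if missing:
        raise ValueError(f"missing required claims: {sorted(missing)}")
    return released
```

Dropping every claim that is not explicitly listed keeps attribute release minimal by default, so a new upstream attribute never reaches an application without a deliberate rule change.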

Privacy & PII Protection:

  • Minimise collected identity attributes to what each service needs
  • Do not use persistent identifiers to track users across services unless explicitly justified
  • Prohibit disclosure of identity information for marketing purposes
  • Ensure voluntary Digital ID participation where legislation requires it
  • Define breach notification and fraud incident response processes
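
One common way to avoid a single identifier that follows a user across services is to give each relying service its own derived identifier, similar in spirit to OIDC pairwise subject identifiers. The sketch below is a minimal illustration, not the behaviour of any particular platform; the function name, the `|` separator, and the use of HMAC-SHA-256 are assumptions, and the key would be held in a secrets manager.

```python
import hashlib
import hmac


def pairwise_subject(local_user_id: str, client_id: str, secret_key: bytes) -> str:
    """Derive a stable identifier for one user at one relying service.

    The same user receives a different value at each service, so two
    services cannot correlate records by comparing identifiers.
    """
    # Assumes client IDs never contain "|"; otherwise use a length-prefixed encoding.
    message = f"{client_id}|{local_user_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

Because the value is keyed, a service cannot recompute another service's identifier for the same user without access to the secret.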

Assurance and Administration:

  • Match identity proofing and authentication levels to transaction risk and data sensitivity
  • Separate standard user login from privileged administration and support step-up authentication for high-risk actions
  • Maintain audit trails for login, consent, claim release, privilege elevation, and administrative configuration changes
  • Implement fallback authentication for critical services
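
Matching authentication strength to risk usually reduces to comparing the level of the current session with the level an action requires. The following sketch assumes three illustrative levels and a hypothetical action table; real level definitions should come from the applicable assurance framework, not from this example.

```python
from enum import IntEnum


class AuthLevel(IntEnum):
    """Illustrative authentication levels, ordered weakest to strongest."""
    SINGLE_FACTOR = 1
    MULTI_FACTOR = 2
    PHISHING_RESISTANT_MFA = 3


# Hypothetical mapping of actions to the minimum level they require.
REQUIRED_LEVEL = {
    "view_profile": AuthLevel.SINGLE_FACTOR,
    "change_bank_details": AuthLevel.MULTI_FACTOR,
    "modify_federation_config": AuthLevel.PHISHING_RESISTANT_MFA,
}


def needs_step_up(action: str, session_level: AuthLevel) -> bool:
    """Return True when the session must re-authenticate at a higher level."""
    # Unknown actions fall back to the strictest level (fail closed).
    required = REQUIRED_LEVEL.get(action, AuthLevel.PHISHING_RESISTANT_MFA)
    return session_level < required
```

Failing closed for unlisted actions means a newly added high-risk operation is protected even before anyone remembers to classify it.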

Reference Architecture: OpenAPI Backend

Status: Proposed | Date: 2025-07-28

When to Use This Pattern

Use when building:

  • Backend services consumed by web, mobile, or system clients
  • Services requiring clear separation between public and administrative operations
  • APIs that need generated documentation, contract testing, and stable versioning

Do not use this pattern for static content delivery, event-only systems, or simple data extracts that do not need a request/response API.

Overview

Build APIs from an OpenAPI contract generated or validated in CI. Expose standard user operations through a protected public API and keep administrative operations on a separate endpoint, authentication realm, and network path.

Core Components

flowchart LR
    clients[API Clients]
    edge[CDN + WAF]
    api[Standard API]
    admin[Admin API]
    db[Managed Database]
    logs[Security Logs]
    admins[Administrators]

    clients --> edge -->|user operations| api
    admins -->|privileged operations| admin
    api -->|application data| db
    admin -->|configuration| db
    api -->|audit events| logs
    admin -->|admin audit events| logs

Standard APIs (api.example.com/v1/*): Business operations for authenticated users or system clients.

Admin APIs (admin.example.com/v1/*): System management for privileged users. Do not expose admin APIs directly to the Internet.

Project Kickoff Steps

  1. Infrastructure Foundation - Follow ADR 001: Application Isolation and ADR 002: AWS EKS for Cloud Workloads for runtime and environment separation
  2. API Standards - Follow ADR 003: API Documentation Standards for OpenAPI generation, validation, and testing
  3. Identity Federation - Follow ADR 013: Identity Federation Standards for separate standard and privileged authentication realms
  4. Edge Protection - Follow ADR 016: Web Application Edge Protection for WAF, rate limiting, TLS, and public API protection
  5. Database & Secrets - Follow ADR 018: Database Patterns for managed persistence and ADR 005: Secrets Management for runtime secrets
  6. Logging & Monitoring - Follow ADR 007: Centralised Security Logging for user, system, and admin audit trails

Implementation Details

API Contract:

  • Generate or validate OpenAPI specifications in CI for every API change
  • Version public routes with stable prefixes such as /v1
  • Use standard schema types and consistent error responses
  • Publish documentation from the same specification used for tests
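
A CI job can enforce parts of the contract directly against the parsed OpenAPI document. The sketch below checks two illustrative rules over the standard `paths` and `responses` fields: every route sits under a version prefix, and every operation declares a `default` response for errors. Both rules and the `contract_problems` name are assumptions for this example rather than requirements stated in ADR 003.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def contract_problems(spec: dict, prefix: str = "/v1/") -> list[str]:
    """Return rule violations found in a parsed OpenAPI document."""
    problems = []
    for path, item in spec.get("paths", {}).items():
        if not path.startswith(prefix):
            problems.append(f"{path}: route is not under {prefix}")
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            if "default" not in operation.get("responses", {}):
                problems.append(f"{method.upper()} {path}: no default error response")
    return problems
```

If the version lives in the `servers` URL rather than in each path, the prefix rule would need to be adapted accordingly.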

Security Boundaries:

  • Keep admin APIs on separate hostnames, routes, and authentication realms, behind separate network controls
  • Apply least-privilege database access separately for standard and admin operations
  • Validate all request bodies, path parameters, query parameters, and response schemas
  • Log security-relevant user, system, and administrative events

Operations:

  • Use health, readiness, and dependency checks for deployment automation
  • Apply rate limits by client, route, and risk level
  • Test contract compatibility before deployment and document breaking changes in release notes
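
Rate limiting by client and route is often implemented as a token bucket per key. The sketch below is an in-memory illustration with hypothetical names and default limits; in practice the edge layer described in ADR 016 or a shared store would hold this state, and limits would vary by risk level.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per (client, route) pair; limits here are illustrative defaults.
buckets: dict[tuple[str, str], TokenBucket] = {}


def allow_request(client_id: str, route: str, now: float,
                  capacity: float = 2, refill_per_second: float = 1.0) -> bool:
    key = (client_id, route)
    if key not in buckets:
        buckets[key] = TokenBucket(capacity, refill_per_second, now)
    return buckets[key].allow(now)
```

Passing `now` explicitly keeps the logic deterministic for tests; a caller would normally supply `time.monotonic()`.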

ADR ###: Specific Decision Title

Status: Proposed | Date: YYYY-MM-DD

Context

What problem are we solving? Include background and constraints.

Decision

What we decided and how to implement it:

  • Requirement 1: Specific implementation detail
  • Requirement 2: Configuration specifics
  • Requirement 3: Monitoring approach

Consequences

Positive:

  • Benefit 1 with explanation
  • Benefit 2 with explanation

Negative:

  • Risk 1 with mitigation
  • Risk 2 with mitigation

Reference Architecture: Pattern Name

Status: Proposed | Date: YYYY-MM-DD

When to Use This Pattern

Clear use case description for when to apply this architecture.

Overview

Brief template description focusing on practical implementation.

Core Components

flowchart LR
    source[Data Sources]
    process[Processing Layer]
    output[Output Systems]

    source -->|ingest data| process
    process -->|deliver results| output

    style source fill:#e3f2fd,stroke:#1976d2
    style process fill:#e8f5e8,stroke:#388e3c
    style output fill:#f3e5f5,stroke:#7b1fa2

Project Kickoff Steps

  1. Step Name - Follow relevant ADRs for implementation
  2. Next Step - ADR needed for missing standards
  3. Final Step - Reference to existing practices

Contributing Guide

When to Create ADRs

Create ADRs for foundational decisions only:

  • Decisions with a high cost to change mid or late in a project
  • Architectural patterns and technology standards
  • Security frameworks and compliance requirements
  • Infrastructure patterns that affect multiple teams

Do not create ADRs for:

  • Implementation details (use documentation)
  • Project-specific configurations
  • Operational procedures that change frequently
  • Tool-specific guidance that belongs in user manuals

Quick Workflow

  1. Open in Codespaces - Automatic tool setup
  2. Get number - just next-number
  3. Create file - ###-short-name.md in correct directory (see content types)
  4. Write content - Follow template below
  5. Add to SUMMARY.md - Include new ADR in navigation (required for mdBook)
  6. Lint - Run just lint to fix formatting, check SUMMARY.md, and validate links
  7. Submit PR - Ready for review
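
For readers curious how a globally unique ADR number can be derived, the sketch below scans the three ADR directories for `###-short-name.md` files and returns the highest number plus one. It is an illustration under stated assumptions, not the actual `just next-number` recipe; the function names and the filename pattern are hypothetical.

```python
import re
from pathlib import Path

ADR_DIRECTORIES = ("development", "operations", "security")
# Assumed filename convention: three digits, a hyphen, then a lowercase slug.
ADR_FILENAME = re.compile(r"^(\d{3})-[a-z0-9-]+\.md$")


def next_adr_number(filenames) -> str:
    """Return the next number, zero-padded; gaps in the sequence are ignored."""
    numbers = [int(m.group(1)) for name in filenames if (m := ADR_FILENAME.match(name))]
    return f"{max(numbers, default=0) + 1:03d}"


def adr_filenames(repo_root: Path) -> list[str]:
    """Collect markdown filenames from every ADR directory that exists."""
    names = []
    for directory in ADR_DIRECTORIES:
        folder = repo_root / directory
        if folder.is_dir():
            names.extend(path.name for path in folder.glob("*.md"))
    return names
```

Taking the maximum rather than counting files keeps numbers unique even when earlier drafts have been removed.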

Useful Commands

just --list      # Show all available commands
just next-number # Get next ADR number
just check-summary # Verify SUMMARY.md includes all markdown files
just lint        # Run checks and fixes
just serve       # Preview locally on port 8080
just build       # Build website and print view

AI-Assisted Contributions

AI tools may help draft or review ADRs, but a human contributor remains responsible for the final content.

flowchart LR
    setup[Environment Setup]
    create[Content Creation]
    validate[Validation]
    publish[Publication]

    setup --> create --> validate --> publish
    validate -->|fix issues| create

    style setup fill:#e3f2fd
    style create fill:#e8f5e8
    style validate fill:#f3e5f5
    style publish fill:#fff3e0

Project Notes

  • Documentation is built with mdBook
  • Navigation is defined in SUMMARY.md; new ADRs must be added there
  • just build creates the website and a single-page print view
  • Use Mermaid diagrams where a simple visual explanation is clearer than text alone

Directory Structure

| Directory | Content |
|---|---|
| development/ | API standards, CI/CD, releases |
| operations/ | Infrastructure, logging, config |
| security/ | Isolation, secrets, AI governance |
| reference-architectures/ | Project kickoff templates |

Content Types: When to Use What

ADRs (Architecture Decision Records)

Purpose: Document foundational technology decisions that are expensive to change
Format: ###-decision-name.md in development/, operations/, or security/
Examples: “AWS EKS for workloads”, “Secrets management approach”, “API standards”

Reference Architectures

Purpose: Project kickoff templates that combine multiple existing ADRs
Format: descriptive-name.md in reference-architectures/
Examples: “Content Management”, “Data Pipelines”, “Identity Management”

Rule: Reference architectures should only link to existing ADRs, not create new ones.

ADR Template

See templates/adr-template.md for the complete template.

Note: ADR numbers are globally unique across all directories (gaps from removed drafts are normal)

Reference Architecture Template

See templates/reference-architecture-template.md for the complete template.

Quality Standards

Before submitting:

  • Title is concise (under 50 characters) and actionable
  • All acronyms defined on first use
  • Active voice (not passive)
  • Passes just lint without errors

Title Examples:

  • GOOD: “ADR 002: AWS EKS for Cloud Workloads” (concise, ~30 chars)
  • GOOD: “ADR 008: Email Authentication Protocols” (specific, clear)
  • BAD: “ADR 004: Enforce release quality with CI/CD prechecks and build attestation” (too long)
  • BAD: “Container stuff” or “Security improvements” (too vague)
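
The length rule can be checked mechanically. The sketch below assumes the 50-character limit applies to the title text after the `ADR ###:` prefix, which is an interpretation rather than something the guide states; the function name is hypothetical.

```python
import re

MAX_TITLE_LENGTH = 50
ADR_PREFIX = re.compile(r"^ADR \d{3}:\s*")


def title_is_concise(heading: str) -> bool:
    """Return True when the title (without its ADR prefix) is under the limit."""
    title = ADR_PREFIX.sub("", heading).strip()
    return 0 < len(title) < MAX_TITLE_LENGTH
```

A check like this only covers length; whether a title is specific and actionable still needs a human reviewer.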

Status Guide

| Status | Meaning |
|---|---|
| Proposed | Under review |
| Accepted | Active decision |
| Superseded | Replaced by newer ADR |

ADR References

Reference format:

  • [ADR 005: Secrets Management](../security/005-secrets-management.md)
  • Quick reference: per ADR 005
  • Multiple refs: aligned with ADR 001 and ADR 005
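
A simple consistency check for full references is to confirm that the number in the link text matches the number in the target filename. The sketch below is illustrative; the regular expression encodes the link format shown above and the `mismatched_links` name is hypothetical.

```python
import re

# Matches links such as [ADR 005: Title](../security/005-secrets-management.md).
ADR_LINK = re.compile(r"\[ADR (\d{3}): ([^\]]+)\]\(([^)]+/(\d{3})-[a-z0-9-]+\.md)\)")


def mismatched_links(markdown: str) -> list[str]:
    """Return ADR links whose label number differs from the file number."""
    return [m.group(0) for m in ADR_LINK.finditer(markdown) if m.group(1) != m.group(4)]
```

This catches copy-and-paste errors where a link label is updated but the target path is not.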

Writing Tips

  • Be specific: “Use AWS EKS auto mode” not “Use containers”
  • Include implementation: How, not just what
  • Define scope: What’s included and excluded
  • Reference standards: Link to external docs
  • Australian English: Use “organisation” not “organization”, “jurisdiction” not “government”
  • Character usage: Use plain-text safe Unicode - avoid emoji, smart quotes, em-dashes for print page compatibility
  • Mermaid diagrams: Use Mermaid for diagrams with clean syntax and universal compatibility
    • Use when text alone isn’t sufficient (system relationships, data flows, workflows)
    • Keep simple: 5-7 components max, clear labels, logical flow
    • Use flowchart TB for compact layouts, flowchart LR for flows
    • Use style directives for colour styling and keep labels short
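
The character usage tip lends itself to an automated check. The sketch below reports smart quotes and em-dashes with their line and column; it does not attempt emoji detection, and the `DISCOURAGED` table and function name are assumptions rather than part of `just lint`.

```python
# Characters the writing tips ask contributors to avoid (emoji not covered here).
DISCOURAGED = {
    "\u2018": "left single smart quote",
    "\u2019": "right single smart quote",
    "\u201c": "left double smart quote",
    "\u201d": "right double smart quote",
    "\u2014": "em-dash",
}


def discouraged_characters(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, description) for each discouraged character."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for column, char in enumerate(line, start=1):
            if char in DISCOURAGED:
                findings.append((line_number, column, DISCOURAGED[char]))
    return findings
```

Reporting positions rather than silently replacing characters leaves the author in control of quoted material where the original punctuation matters.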

Compliance Mapping

This table maps ADRs to specific controls and requirements in Western Australian and Australian compliance frameworks.

ACSC Information Security Manual (ISM)

| ADR | Topic | ISM Guidelines & Control IDs | Key Controls |
|---|---|---|---|
| 001 Isolation | Application isolation | Guidelines for Networking (ISM-1182, ISM-0535, ISM-1277, ISM-1517) | Network segmentation, micro-segmentation, preventing bypass of controls |
| 002 Workloads | Cloud workloads | Cloud Computing Security (ISM-1588, ISM-1589, ISM-1452, ISM-0499) | Cloud security assessment, multi-tenant isolation, virtualisation hardening |
| 004 CI/CD | Build and release | Guidelines for Software Development (ISM-1256, ISM-0400, ISM-1419, ISM-2032) | Secure development lifecycle, environment segregation, automated testing |
| 005 Secrets | Secrets management | Guidelines for Cryptography (ISM-0507, ISM-0488, ISM-0518, ISM-1090) | Key management, secure storage of secrets, key rotation |
| 007 Logging | Security logging | Guidelines for System Monitoring (ISM-0580, ISM-1405, ISM-1985, ISM-0988) | Event logging policy, centralised logging, log protection, time synchronisation |
| 008 Email Auth | Email authentication | Guidelines for Email (ISM-0574, ISM-1151, ISM-1540, ISM-0259) | SPF, DKIM, DMARC, email encryption |
| 010 IaC | Infrastructure as code | Guidelines for System Hardening (ISM-1211, ISM-1409, ISM-1383) | Configuration management, automated deployment, drift detection |
| 011 AI Tool and Agent Governance | AI tool and agent governance | Guidelines for Software Development (ISM-2074, ISM-1755, ISM-0226) | AI usage policy, supply chain risk management, software assessment |
| 012 Privileged Access | Privileged access | Guidelines for System Management (ISM-1175, ISM-1507, ISM-1483, ISM-1173) | Restricting privileged access, JIT access, jump servers, MFA for admins |
| 013 Identity | Identity federation | Guidelines for Personnel Security (ISM-0418, ISM-1173, ISM-1420, ISM-1505) | Authentication, MFA, federated identity trust, credential management |
| 016 Edge Protection | WAF and CDN | Guidelines for Gateways (ISM-1192, ISM-1262, ISM-1460) | Web application firewalls, traffic inspection, DDoS protection |

ACSC Agentic AI Guidance

The ACSC guidance "Careful adoption of agentic AI services" recommends aligning agentic AI risks with existing security models, avoiding broad access to sensitive data or critical systems, and starting with low-risk, non-sensitive tasks.

| ADR | Guidance Alignment |
|---|---|
| 011 AI Tool and Agent Governance | Low-risk adoption, least privilege, human approval gates, sandbox testing, monitoring and audit logs, trusted component inventories, isolation of high-risk agents |

WA Government Cyber Security Policy (WA CSP)

The 2024 WA Government Cyber Security Policy defines baseline cyber security requirements for WA Government entities.

| ADR | WA CSP Requirement | Section |
|---|---|---|
| 001 Isolation | Cyber security context & risk management | 2.1, 2.2 |
| 002 Workloads | Supply chain risk, data offshoring | 2.3, 1.5 |
| 005 Secrets | Information security (Cryptography) | 3.1 |
| 006 Policy Enforcement | Cyber security governance | 1.4 |
| 007 Logging | Continuous monitoring | 4.2 |
| 011 AI Tool and Agent Governance | Supply chain risk management | 2.3 |
| 012 Privileged Access | Identity and access management | 3.6 |
| 013 Identity | Identity and access management | 3.6 |

WA Government AI Policy

The WA Government AI Policy and Assurance Framework requires AI Accountable Officers and self-assessments for AI projects.

| ADR | WA AI Policy Requirement |
|---|---|
| 011 AI Tool and Agent Governance | AI Accountable Officer, AI Assurance Framework self-assessment |
| 015 Data Governance | Data quality validation for AI systems |

Privacy and Responsible Information Sharing (PRIS)

The Privacy and Responsible Information Sharing (PRIS) framework governs personal information handling and upcoming statutory requirements.

| ADR | PRIS Alignment |
|---|---|
| 007 Logging | Minimise PII in logs (Data Minimisation) |
| 013 Identity | Data minimisation, consent protocols |
| 015 Data Governance | Information classification, retention schedules |

Digital ID Act 2024 (Commonwealth)

The Digital ID Act 2024 establishes privacy safeguards for the Australian Government Digital ID System (AGDIS).

| ADR | Digital ID Act Requirement |
|---|---|
| 013 Identity | Data minimisation (s15), no single identifiers (s16), voluntary participation (s18), biometric safeguards (Part 4) |

Key Privacy Safeguards:

  • Prohibit collection beyond identity verification requirements
  • Prevent persistent identifiers from being used to track users across services
  • Do not require users to create a Digital ID to access a service (participation is voluntary)
  • Strictly restrict the collection, use, and disclosure of biometric information

Glossary

Acronyms and Definitions

ACSC - Australian Cyber Security Centre
ADR - Architecture Decision Record
API - Application Programming Interface
ATT&CK - Adversarial Tactics, Techniques & Common Knowledge (MITRE)
AWS - Amazon Web Services
BIMI - Brand Indicators for Message Identification
CDN - Content Delivery Network
CI/CD - Continuous Integration/Continuous Deployment
CNCF - Cloud Native Computing Foundation
DBaaS - Database as a Service
DGOV - Office of Digital Government (Western Australia)
DKIM - DomainKeys Identified Mail
DMARC - Domain-based Message Authentication, Reporting and Conformance
DNS - Domain Name System
DTT - Digital Transformation and Technology Unit
EKS - Elastic Kubernetes Service (AWS)
ETL - Extract, Transform, Load
GCP - Google Cloud Platform
IAM - Identity and Access Management
IAP - Identity-Aware Proxy
ISM - Information Security Manual (ACSC)
JIT - Just-In-Time
OIDC - OpenID Connect
OWASP - Open Worldwide Application Security Project (formerly Open Web Application Security Project)
PII - Personally Identifiable Information
PITR - Point-in-Time Recovery
PKCE - Proof Key for Code Exchange
RDP - Remote Desktop Protocol
RPO - Recovery Point Objective
RTO - Recovery Time Objective
SAML - Security Assertion Markup Language
SBOM - Software Bill of Materials
SIEM - Security Information and Event Management
SPF - Sender Policy Framework
SSO - Single Sign-On
TLS - Transport Layer Security
VMC - Verified Mark Certificate
VPN - Virtual Private Network
WAF - Web Application Firewall
WCAG - Web Content Accessibility Guidelines