Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

ADR 015: Data Governance Standards

Status: Proposed | Date: 2025-07-28

Context

Data pipelines require governance to ensure quality and compliance. Modern approaches use code-based validation and version control rather than separate governance tools.

Decision

Use code-based data governance with git workflows. Data transformations written in Ibis are version-controlled, testable, and provide implicit lineage through code dependencies. See Reference Architecture: Data Pipelines for full implementation patterns.

Priority Focus Areas

  • Schema Contracts: Define expected schemas in code, validate in CI/CD pipeline
  • Data Lineage: Track through transformation code history in git
  • Quality Validation: Use Ibis expressions for data validation checks, run as automated tests
  • Audit Integration: Follow ADR 007: Centralised Security Logging for transformation logs

Implementation

# Example: Schema validation with Ibis
import ibis

def validate_customers(table: ibis.Table) -> ibis.Table:
    """Validate customer data before processing."""
    return table.filter(
        table.email.notnull() &
        table.created_at.notnull() &
        (table.status.isin(['active', 'inactive', 'pending']))
    )

Consequences

Benefits:

  • Data quality validation as code, testable in CI/CD
  • Lineage tracked through git history and code dependencies
  • No separate governance infrastructure to maintain

Risks if not implemented:

  • Data quality issues reaching downstream systems
  • Unable to trace data issues back to source transformations
  • Compliance gaps from undocumented data handling