Evolutionary Database Design and Database automigrations
Reading material
- https://docs.liquibase.com/home.html
- https://medium.com/@AlexanderObregon/best-practices-for-database-migrations-in-java-for-beginners-cbe42e50cc76
- https://dzone.com/articles/flyway-vs-liquibase
- https://www.baeldung.com/liquibase-refactor-schema-of-java-app
- https://www.baeldung.com/database-migrations-with-flyway
- https://www.red-gate.com/products/flyway/get-started
- https://github.com/lecaillon/Evolve
- https://github.com/golang-migrate/migrate
Evolutionary Database Design
https://martinfowler.com/articles/evodb.html
When is Automigration ever good?
https://www.reddit.com/r/golang/comments/x75glh/when_is_automigration_ever_good/
While working on Bob and taking a look at other ORM tools, I noticed that some of them include migration systems that make changes to your database based on your defined model structs.
I struggle to think of a scenario where this is a good thing, and I’d like someone to help me out.
To be clear, while not my preferred style, I can understand the schema-as-code tools that generate a migration file based on the difference between your declared schema and the current state of the DB.
I am specifically referring to tools that do away with migration files entirely with no way to replay/recover the state of the DB at a certain point in time.
Why it is a bad idea
It is never a good idea to run migrations based on your Go structs. The database is the source of truth for database concerns, and SQL databases have a clearly defined language (DDL) for modifying the schema. Encoding DDL in Go struct tags is reinventing the wheel, with no clear benefit and plenty of potential issues down the road.
It is also a very bad idea not to have migration files that clearly lay out the steps.
We need to know exactly what is changing in our database and how it is changing when we merge in a PR. Sure, we can interpret it from the code with automigrations, but that’s a level of effort that we don’t really want to have to put into something as essential as database structure.
Migration files in general are not such a bad thing. If they’re explicit code, you can always handle scenarios where things like table locks might become an issue. But implicit modifications to the DB are not good at all.
Running migrations automatically against a production application is a very bad idea. There are just too many things that can go wrong, and too many hidden gotchas, to let it run unattended in production.
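The same trade-off exists on the Java side: Hibernate’s hbm2ddl "update" mode alters the schema directly from entity mappings. A minimal sketch (assuming a Jakarta Persistence/Hibernate setup with a hypothetical persistence unit named app-unit) of preferring validation over silent schema updates:
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.Persistence;
import java.util.Map;

public class JpaBootstrap {

    public static EntityManagerFactory create() {
        // "update" silently alters the schema from entity mappings (automigration);
        // "validate" only checks that the schema matches and fails fast,
        // leaving real changes to explicit, versioned migration files.
        return Persistence.createEntityManagerFactory(
                "app-unit", // hypothetical persistence unit name
                Map.of("hibernate.hbm2ddl.auto", "validate"));
    }
}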
Valid use cases for it
It’s good for prototyping, and in rare cases where your models are simple and won’t change much, or where you can afford to drop the database with a new release. I’ve used automigration in production for programs that extract data from legacy systems and make it available via REST/gRPC APIs.
Use tools that generate the migration files for you from your defined schema
That approach works much like writing the migrations by hand.
e.g.
- Java
- Flyway
- Liquibase
- Go
- Bob
Liquibase
Let’s break down how to use Liquibase to manage the database in your new backend application. It’s an excellent choice for managing database schema changes in a structured and version-controlled way.
What is Liquibase?
Liquibase is an open-source, database-independent library for tracking, managing, and applying database schema changes. Think of it like version control (e.g., Git) but specifically for your database structure (tables, columns, indexes, constraints, etc.) and reference data.
Why Use Liquibase in a New Project?
- Version Control for Your Schema:
- Just like your code, your database schema evolves. Liquibase lets you track every change, know who made it, and why.
- Consistency Across Environments:
- Ensures that your development, testing, staging, and production databases all have the exact same schema version applied reliably. No more “it works on my machine” database issues.
- Repeatable Deployments:
- Automate database updates as part of your deployment process. Run liquibase update, and it intelligently applies only the necessary changes.
- Collaboration:
- Makes it easy for multiple developers to work on database changes without conflicts. Changesets are merged like code.
- Database Independence (Mostly):
- Write your changes in an abstract format (XML, YAML, JSON, or SQL), and Liquibase translates them into the appropriate SQL for your target database (PostgreSQL, MySQL, Oracle, SQL Server, etc.). Caveat: Complex or database-specific features might still require specific SQL.
- Rollback Capabilities:
- Define rollback steps for your changesets, allowing you to revert database changes if needed (though careful planning is often better than relying heavily on rollbacks).
- Audit Trail:
- The DATABASECHANGELOG table created by Liquibase provides a clear history of exactly which changes have been applied to a specific database instance.
- CI/CD Integration:
- Easily integrates into continuous integration and continuous deployment pipelines.
Core Concepts
- Changelog File:
- The central file (usually XML, YAML, JSON, or even formatted SQL) that lists the sequence of changes to be applied. This is the file you check into version control. You typically have a master changelog that includes other specific changelog files.
- Changeset:
- An atomic unit of change within a changelog file. Each changeset is uniquely identified by an id and an author attribute. Liquibase tracks which changesets have already been run.
- Idempotency:
- Liquibase ensures that each changeset is run only once against a given database by recording its execution in a tracking table.
- Change Type:
- Specific database operations defined within a changeset (e.g., createTable, addColumn, addForeignKeyConstraint, insert, sql, sqlFile). Liquibase provides many abstract change types.
- DATABASECHANGELOG Table:
- A table automatically created and managed by Liquibase in your target database. It records which changesets (id, author, filename) have been successfully executed and when (see the JDBC sketch after this list).
- DATABASECHANGELOGLOCK Table:
- A table used by Liquibase to prevent multiple Liquibase instances from running concurrently against the same database, ensuring changes are applied sequentially.
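To make that audit trail concrete, here is a minimal JDBC sketch (the connection details are placeholders) that lists the applied changesets recorded in DATABASECHANGELOG:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ChangelogAudit {

    public static void main(String[] args) throws SQLException {
        // Placeholder connection details - use your application's DataSource in practice.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydatabase", "myuser", "mypassword");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT id, author, filename, dateexecuted, exectype "
                     + "FROM databasechangelog ORDER BY orderexecuted")) {
            while (rs.next()) {
                System.out.printf("%s by %s (%s) executed %s [%s]%n",
                        rs.getString("id"), rs.getString("author"),
                        rs.getString("filename"), rs.getTimestamp("dateexecuted"),
                        rs.getString("exectype"));
            }
        }
    }
}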
Setting Up Liquibase in Your New Backend App (Common Steps):
Let’s assume a typical Java backend using Maven or Gradle.
Step 1: Add Dependencies
You’ll need the Liquibase core library and the JDBC driver for your specific database.
Maven (pom.xml):
<dependencies>
<!-- Liquibase Core -->
<dependency>
<groupId>org.liquibase</groupId>
<artifactId>liquibase-core</artifactId>
<version>LATEST_LIQUIBASE_VERSION</version> <!-- e.g., 4.27.0 -->
</dependency>
<!-- Your Database JDBC Driver -->
<!-- Example for PostgreSQL -->
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>LATEST_POSTGRES_VERSION</version> <!-- e.g., 42.7.3 -->
<scope>runtime</scope>
</dependency>
<!-- Example for MySQL -->
<!--
<dependency>
<groupId>com.mysql</groupId>
<artifactId>mysql-connector-j</artifactId>
<version>LATEST_MYSQL_VERSION</version> e.g., 8.3.0
<scope>runtime</scope>
</dependency>
-->
</dependencies>
(Replace LATEST_*_VERSION with the actual desired versions)
Step 2: Create the Master Changelog File
Create a file, typically in your project’s resource directory (e.g., src/main/resources/db/changelog).
src/main/resources/db/changelog/db.changelog-master.xml
(Example using XML):
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">
<!-- Include other changelog files here for organization -->
<!-- Example: <include file="db/changelog/001-initial-schema.xml"/> -->
<!-- Example: <include file="db/changelog/002-add-user-roles.yaml"/> -->
<!-- Or include all files in a directory: -->
<!-- <includeAll path="db/changelog/changes/"/> -->
<!-- You can also put changesets directly here for simple projects -->
<changeSet id="1" author="yourname">
<tagDatabase tag="baseline_v1.0"/>
<comment>Initial baseline tag</comment>
</changeSet>
</databaseChangeLog>
Tip: It’s good practice to start with a <tagDatabase> changeset to mark the initial state or version.
Organization: For larger projects, use <include> or <includeAll> to reference separate files for different features or versions (e.g., 001-create-users-table.xml, 002-add-products-table.xml).
Step 3: Create Your First Changeset File
Let’s create the first actual schema change.
src/main/resources/db/changelog/changes/001-create-initial-tables.xml
(Example):
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">
<changeSet id="001-create-users-table" author="yourname">
<preConditions onFail="MARK_RAN">
<not>
<tableExists tableName="users"/>
</not>
</preConditions>
<comment>Create the main users table</comment>
<createTable tableName="users">
<column name="id" type="BIGINT" autoIncrement="true">
<constraints primaryKey="true" nullable="false"/>
</column>
<column name="username" type="VARCHAR(50)">
<constraints nullable="false" unique="true"/>
</column>
<column name="email" type="VARCHAR(100)">
<constraints nullable="false" unique="true"/>
</column>
<column name="created_at" type="TIMESTAMP WITH TIME ZONE" defaultValueComputed="CURRENT_TIMESTAMP">
<constraints nullable="false"/>
</column>
</createTable>
<rollback>
<dropTable tableName="users"/>
</rollback>
</changeSet>
<changeSet id="001-create-products-table" author="yourname">
<preConditions onFail="MARK_RAN">
<not>
<tableExists tableName="products"/>
</not>
</preConditions>
<comment>Create the products table</comment>
<createTable tableName="products">
<column name="id" type="BIGINT" autoIncrement="true">
<constraints primaryKey="true" nullable="false"/>
</column>
<column name="name" type="VARCHAR(100)">
<constraints nullable="false"/>
</column>
<column name="price" type="DECIMAL(10, 2)">
<constraints nullable="false"/>
</column>
<column name="created_at" type="TIMESTAMP WITH TIME ZONE" defaultValueComputed="CURRENT_TIMESTAMP">
<constraints nullable="false"/>
</column>
</createTable>
<rollback>
<dropTable tableName="products"/>
</rollback>
</changeSet>
</databaseChangeLog>
- id and author: Crucial for Liquibase to track execution. A changeset is identified by the combination of id, author, and changelog filename, so keep ids unique and never reuse them. A good pattern is TICKET-description or SEQ-description.
- preConditions: Optional but highly recommended. They check the database state before running the changeset (e.g., ensure a table doesn’t already exist). onFail="MARK_RAN" tells Liquibase to skip the changeset but mark it as executed if the precondition fails (useful if the change was applied manually).
- comment: Describe the purpose of the changeset.
- rollback: Defines how to undo this specific changeset. Essential if you plan to use rollback commands.
Step 4: Configure Liquibase Execution
How you run Liquibase depends on your workflow and framework.
Option A: Using Build Tool Plugins (Maven/Gradle) - Recommended for CI/CD
Add the Liquibase plugin to your build file.
Configure database connection details (URL, username, password, driver) and the path to your master changelog file. This can be done in the plugin configuration, a separate liquibase.properties file, environment variables, or system properties.
liquibase.properties (Example - place in src/main/resources or project root):
# Path to your master changelog
changeLogFile: db/changelog/db.changelog-master.xml

# Database Connection Details
url: jdbc:postgresql://localhost:5432/mydatabase
username: myuser
password: mypassword
driver: org.postgresql.Driver

# For MySQL:
# url: jdbc:mysql://localhost:3306/mydatabase
# username: myuser
# password: mypassword
# driver: com.mysql.cj.jdbc.Driver
(Remember to secure sensitive data like passwords - don’t commit them directly!) Use environment variables or a secrets manager in real applications.
Run via command line:
Maven:
mvn liquibase:update
Gradle: (requires the Liquibase Gradle plugin setup)
gradle update
Option B: Spring Boot Integration (Very Common)
- If you’re using Spring Boot, it has built-in auto-configuration for Liquibase.
- Just add the liquibase-core dependency.
- Configure your database connection in application.properties or application.yml as you normally would for Spring Data JPA/JDBC.
- By default, Spring Boot looks for classpath:/db/changelog/db.changelog-master.yaml; point spring.liquibase.change-log at your master changelog (e.g., the XML file above) and Spring Boot runs liquibase update automatically on application startup.
- You can customize this behavior using properties like:
spring.liquibase.change-log=classpath:/db/changelog/custom-master.xml
spring.liquibase.enabled=false (to disable auto-run)
spring.liquibase.user, spring.liquibase.password, spring.liquibase.url (if different from the main datasource)
Option C: Standalone CLI
- Download the Liquibase CLI tool.
- Configure connection details via liquibase.properties, command-line arguments, or environment variables.
- Run commands like liquibase update, liquibase status, liquibase rollback, etc. Useful for manual operations or DBA tasks.
Option D: Programmatic API
- Use the Liquibase Java API directly within your application code. Less common for standard migrations but offers fine-grained control.
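A minimal sketch of this, using the classic Liquibase Java API (connection details are placeholders; newer 4.x releases also offer a command-style API, but the calls below are the long-standing ones):
import java.sql.Connection;
import java.sql.DriverManager;

import liquibase.Contexts;
import liquibase.LabelExpression;
import liquibase.Liquibase;
import liquibase.database.Database;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;

public class LiquibaseRunner {

    public static void runMigrations() throws Exception {
        // Placeholder JDBC details - wire in your application's DataSource instead.
        Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydatabase", "myuser", "mypassword");
        try {
            Database database = DatabaseFactory.getInstance()
                    .findCorrectDatabaseImplementation(new JdbcConnection(connection));
            Liquibase liquibase = new Liquibase(
                    "db/changelog/db.changelog-master.xml", // master changelog on the classpath
                    new ClassLoaderResourceAccessor(),
                    database);
            // Applies any pending changesets - the programmatic equivalent of "liquibase update".
            liquibase.update(new Contexts(), new LabelExpression());
        } finally {
            connection.close();
        }
    }
}
Spring Boot’s auto-configuration does essentially this for you at startup, so the API route is mainly useful when you need fine-grained control over exactly when migrations run.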
Step 5: Run Liquibase
Execute the appropriate command or let your framework (like Spring Boot) run it. Liquibase will:
- Connect to the database.
- Acquire a lock using the DATABASECHANGELOGLOCK table.
- Check the DATABASECHANGELOG table to see which changesets have already run.
- Execute any pending changesets from your db.changelog-master.xml (and included files) in order.
- Record the newly executed changesets in the DATABASECHANGELOG table.
- Release the lock.
Best Practices for New Projects:
- Start Early:
- Integrate Liquibase from the very beginning.
- Atomic Changesets:
- Keep changesets small and focused on a single logical change (e.g., create one table, add one column, create one index).
- Meaningful IDs/Authors:
- Use descriptive ids and consistent author names.
- Use <preConditions>:
- Prevent errors and make changesets runnable even if partially applied manually.
- Organize Changelogs:
- Use <include> or <includeAll> with a clear naming convention (e.g., sequence numbers, feature names).
- Version Control Everything:
- Check in your changelog files and liquibase.properties (excluding sensitive data).
- Test Migrations:
- Run Liquibase against a test database as part of your build or test suite (see the sketch after this list).
- Contexts and Labels:
- Use contexts (<changeSet context="test">) or labels (<changeSet labels="performance">) to selectively run changesets (e.g., only run test data insertions in the “test” context).
- Avoid Destructive Changes:
- Be cautious with changes like dropTable or dropColumn. Consider soft deletes or renaming first. If dropping, ensure data migration or backups are handled.
- Understand Rollbacks:
- While useful, complex rollbacks can be tricky. Test them thoroughly if you rely on them. Often, writing a new changeset to revert a change is safer (“roll forward”).
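For the “Test Migrations” practice above, a minimal JUnit 5 sketch that applies the full changelog to a throwaway in-memory H2 database (this assumes h2 and junit-jupiter are on the test classpath; Testcontainers with your real database engine is a more faithful option):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import liquibase.Contexts;
import liquibase.LabelExpression;
import liquibase.Liquibase;
import liquibase.database.DatabaseFactory;
import liquibase.database.jvm.JdbcConnection;
import liquibase.resource.ClassLoaderResourceAccessor;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

class MigrationsApplyCleanlyTest {

    @Test
    void changelogAppliesToEmptyDatabase() throws Exception {
        // Fresh in-memory database per run - nothing to clean up afterwards.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:migrations;DB_CLOSE_DELAY=-1")) {
            Liquibase liquibase = new Liquibase(
                    "db/changelog/db.changelog-master.xml",
                    new ClassLoaderResourceAccessor(),
                    DatabaseFactory.getInstance()
                            .findCorrectDatabaseImplementation(new JdbcConnection(conn)));
            liquibase.update(new Contexts(), new LabelExpression());

            // Sanity check: the users table from the first changeset now exists.
            try (ResultSet rs = conn.getMetaData().getTables(null, null, "USERS", null)) {
                Assertions.assertTrue(rs.next(), "users table should have been created");
            }
        }
    }
}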
By following these steps and practices, we can establish a robust and reliable process for managing our database schema throughout the lifecycle of our new backend application.
Flyway
Flyway is a robust and straightforward tool that brings essential version control discipline to your backend application’s database schema, significantly improving reliability, consistency, and developer workflow.
Let’s break down how to use Flyway to manage the database in our backend application. It’s an excellent choice and widely adopted for this purpose.
What is Flyway?
Flyway is an open-source database migration tool. It allows you to:
- Version Control Your Database Schema:
- Treat your database schema changes (like creating tables, adding columns, altering data) like application code. You write changes in migration files, and Flyway manages applying them in the correct order.
- Automate Database Updates:
- Integrate Flyway into your application’s startup process or build/deployment pipeline to automatically bring the database schema to the required version.
- Ensure Consistency:
- Guarantee that every environment (development, testing, staging, production) has the exact same database schema version corresponding to the application code version.
- Simplify Collaboration:
- When multiple developers work on features requiring database changes, Flyway provides a clear, ordered way to manage these changes without conflicts.
Why Use Flyway for Backend Applications?
- Solves Manual Scripting Pain:
- Avoids error-prone manual execution of SQL scripts, forgetting scripts, or running them in the wrong order.
- Reliability & Repeatability:
- Migrations are applied transactionally (where supported by the DB). If a migration fails, the changes are rolled back, leaving the database in a known state. The same set of migrations always results in the same schema structure.
- Visibility:
- You always know the exact state (version) of your database schema by looking at the flyway_schema_history table Flyway creates.
- Simplicity:
- Flyway focuses on a straightforward migration-forward approach. While complex rollbacks aren’t its primary focus (often requiring “undo” migrations), its core functionality is easy to grasp and implement.
- Environment Consistency:
- Eliminates “it works on my machine” problems related to database schema differences.
- CI/CD Integration:
- Easily plugs into continuous integration and deployment pipelines for automated database updates during deployments.
How Flyway Works: Key Concepts
Migrations:
- These are files containing your database changes.
- Most commonly written in plain SQL (.sql files).
- Can also be written in Java (for complex data transformations, procedural logic, or dynamic schema changes); see the sketch after this list.
- Naming Convention is Crucial: V<VERSION>__<DESCRIPTION>.sql (e.g., V1__Create_user_table.sql, V1.1__Add_email_to_user.sql, V202305181030__Add_auditing_columns.sql).
- V: Indicates a Versioned migration (applied once).
- <VERSION>: Unique version identifier (e.g., 1, 1.1, 2, 2023.05.18.1). Migrations are applied in version order.
- __: Double underscore separator.
- <DESCRIPTION>: Human-readable description of the change.
- .sql or .java: File extension.
- Repeatable Migrations: R__<DESCRIPTION>.sql (e.g., R__Create_or_update_views.sql). These run every time their checksum changes. Useful for managing views, stored procedures, functions, etc.
- Immutability: Once a versioned migration has been applied to a database (e.g., in production), it should never be changed. If you need to modify something introduced in V1, you create a V2 migration to alter it. Flyway uses checksums to detect changes to applied migrations.
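As a sketch of the Java-based migrations mentioned above (the class and SQL are illustrative), Flyway discovers such classes in its configured locations just like versioned SQL files, with the class name carrying the version and description:
package db.migration; // the default location Flyway scans for Java migrations

import java.sql.Statement;

import org.flywaydb.core.api.migration.BaseJavaMigration;
import org.flywaydb.core.api.migration.Context;

// The class name follows the same V<VERSION>__<DESCRIPTION> convention as SQL files.
public class V2__Normalize_user_emails extends BaseJavaMigration {

    @Override
    public void migrate(Context context) throws Exception {
        // Runs on the connection (and transaction, where supported) that Flyway provides.
        try (Statement stmt = context.getConnection().createStatement()) {
            stmt.executeUpdate(
                    "UPDATE users SET email = LOWER(TRIM(email)) WHERE email IS NOT NULL");
        }
    }
}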
Schema History Table (flyway_schema_history):
- Flyway automatically creates this table in your target database schema upon first run (if it doesn’t exist).
- It tracks which migrations have been applied, when they were applied, whether they were successful, and their checksums.
- This is how Flyway knows the current state of the database and which new migrations need to be run.
Flyway Commands:
- migrate: Scans for available migrations, compares them to the schema history table, and applies any pending migrations in version order. This is the most common command.
- info: Provides information about the status of migrations (applied, pending, failed); see the Java sketch after this list.
- validate: Checks if applied migrations have changed (compares checksums in the history table against migration files). Fails if inconsistencies are found.
- clean: DANGEROUS! Drops all objects (tables, views, etc.) in the configured schema(s). Useful for development/testing resets, but never run this carelessly on production.
- baseline: For existing databases. Tells Flyway to “ignore” all migrations up to and including a specified version, marking them as applied in the history table without actually running them. This lets you start using Flyway on a database that wasn’t previously managed by it.
- repair: Repairs the schema history table, usually by removing failed migration entries or realigning checksums if needed (use with caution).
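The same commands are available on the Java API, which is handy for logging migration status at startup or in health checks. A minimal sketch (connection details are placeholders):
import org.flywaydb.core.Flyway;
import org.flywaydb.core.api.MigrationInfo;

public class FlywayStatus {

    public static void main(String[] args) {
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:mysql://localhost:3306/mydatabase", "user", "secret")
                .locations("classpath:db/migration")
                .load();

        flyway.migrate();   // "migrate": apply any pending migrations
        flyway.validate();  // "validate": fail if applied migrations no longer match their files

        // Equivalent of "flyway info": print every known migration and its state.
        for (MigrationInfo info : flyway.info().all()) {
            System.out.printf("%-12s %-35s %s%n",
                    info.getVersion(), info.getDescription(), info.getState());
        }
    }
}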
Getting Started (Typical Backend Flow)
Add Flyway Dependency: Include the Flyway library in your project (e.g., via Maven, Gradle).
Maven:
<dependency>
<groupId>org.flywaydb</groupId>
<artifactId>flyway-core</artifactId>
<!-- Use the latest compatible version -->
<version>...</version>
</dependency>
<!-- Add dependency for your specific DB driver (e.g., flyway-mysql, flyway-postgresql) -->
<dependency>
<groupId>org.flywaydb</groupId>
<artifactId>flyway-mysql</artifactId>
<version>...</version> <!-- Make sure this matches flyway-core version -->
</dependency>
Configure Flyway: Provide database connection details and the location of your migration scripts.
- Spring Boot: Auto-configured when flyway-core is on the classpath. Configuration goes in application.properties or application.yml:
# Database connection (usually already configured for Spring Data JPA/JDBC)
spring.datasource.url=jdbc:mysql://localhost:3306/mydatabase
spring.datasource.username=user
spring.datasource.password=secret

# Flyway configuration
spring.flyway.url=${spring.datasource.url} # Can often reuse DS properties
spring.flyway.user=${spring.datasource.username}
spring.flyway.password=${spring.datasource.password}
spring.flyway.locations=classpath:db/migration # Default location
# spring.flyway.baseline-on-migrate=true # Useful for first run on existing DB
# spring.flyway.enabled=true # Default is true if dependency present
- Java API: Configure programmatically:
import org.flywaydb.core.Flyway;

// ... inside your application startup logic ...
Flyway flyway = Flyway.configure()
    .dataSource("jdbc:mysql://localhost:3306/mydatabase", "user", "secret")
    .locations("classpath:db/migration") // Location of your .sql files
    .load();

// Start the migration
flyway.migrate();
- Command Line / Build Tool Plugins: Configure via command line arguments, environment variables, config files, or plugin configurations (e.g., pom.xml for Maven, build.gradle for Gradle).
Create Migration Files: Place your .sql files in the configured location (e.g., src/main/resources/db/migration).
src/main/resources/db/migration/V1__Create_initial_tables.sql:
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(100) NOT NULL UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10, 2) NOT NULL
);
src/main/resources/db/migration/V1.1__Add_email_to_users.sql:
ALTER TABLE users ADD COLUMN email VARCHAR(255);
Run Migrations:
- Spring Boot:
- By default, migrations run automatically on application startup before Hibernate/JPA initializes (ensuring the schema matches the entity definitions).
- Java API:
- Call flyway.migrate() at an appropriate point in your application’s lifecycle.
- Build Tools:
- Run mvn flyway:migrate or gradle flywayMigrate. Often done as part of a build or deployment script.
- CLI:
- flyway migrate
Best Practices
- Keep Migrations Small & Focused:
- Each migration should represent a single, logical change.
- Never Edit Applied Migrations:
- Create new migrations to make further changes or corrections. Use flyway repair carefully if checksums legitimately need updating (e.g., after fixing line endings across OSes).
- Use Descriptive Names:
- Make it easy to understand what each migration does.
- Test Migrations:
- Run migrations against a test database as part of your automated tests (see the sketch after this list).
- Integrate into CI/CD:
- Automate migration execution during deployment.
- Backup Production:
- Always back up your production database before applying migrations.
- Understand Transactions:
- Some databases (e.g., MySQL, Oracle) issue an implicit commit on most DDL statements, so a single migration script may not run entirely within one transaction; others (e.g., PostgreSQL) support transactional DDL. Flyway wraps each versioned migration in a transaction where the database supports it, so structure your scripts accordingly.
- baselineOnMigrate:
- Useful when first introducing Flyway to an existing database.
- Be Careful with clean:
- Restrict its use to development/testing environments.
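A matching minimal JUnit 5 sketch for the “Test Migrations” practice above (same assumptions as the Liquibase test earlier: h2 and junit-jupiter on the test classpath; Testcontainers with your real database engine is a more faithful option):
import org.flywaydb.core.Flyway;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

class FlywayMigrationsTest {

    @Test
    void migrationsApplyToEmptyDatabase() {
        Flyway flyway = Flyway.configure()
                // Throwaway in-memory database; MySQL compatibility mode covers the AUTO_INCREMENT syntax used above.
                .dataSource("jdbc:h2:mem:flyway_test;MODE=MySQL;DB_CLOSE_DELAY=-1", "sa", "")
                .locations("classpath:db/migration")
                .load();

        flyway.migrate();
        // validate() throws if applied migrations no longer match the files on the classpath.
        flyway.validate();

        Assertions.assertNotNull(flyway.info().current(),
                "at least one migration should have been applied");
    }
}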
Flyway Editions
- Community: Free and open-source, covers most common use cases (SQL/Java migrations, core commands).
- Teams / Enterprise: Paid versions offering advanced features like Undo migrations (generate SQL to reverse a versioned migration), dry runs, batch processing, enhanced Oracle support, etc.