somewhat serious

(Nugget) Migrations Seem To Be an Invented Problem

My career thus far has been primarily working with (against?) SQL databases. One concept brought to light very early on is migrations, and to be precise, "migrations" typically refer to schema migrations. For those who are not familiar with a schema in a SQL database, it is a set of relations (usually called tables) which have concrete types. And a schema migration is moving the schema from a base state to a new state.

Typically, a "schema migration" is a means to impose an order on how a database should be brought forward to an expected state from a well known current state. The current state can be a certain order of dropping (i.e., deleting) relations or columns in relations, renaming columns in existing relations, even dropping entire databases. However, the common path which is good and expected, is adding columns or relations as model of the domain evolves.

If you've noticed the invented problem already, hats off to you! If not, I'll speak plainly.

Deletion of a previously held and valid state is an invented problem.

It was invented out of how scarce storage and memory were when databases were being developed, and it made sense then. It does not make sense now in the face of how cheap compute, memory, and storage is.

Databases nowadays enable idempotent operations for relations and accreting domain knowledge. Which means one can craft a schema which carries its various valid states for all time into the future.

If you're crying out, "but at a certain point, you're wasting storage!", you're correct. But from a wholistic standpoint, its cheaper to store invalidated information -- remember invalidated means that at one point said information was valid -- and be able to roll back your entire system to a certain point in time, without needing to coordinate snapshots, infrastructure, teams, etc... there are other nice properties which I won't get into here. Suffice to say, making your entire schema idempotent means that any application of it to the database will only accrete, it will not delete or rename. It will always be stable from a schema perspective. The schema, in essence, is quasi-immutable. Pretty neat, huh?

To give a concrete, below is a database schema, expressed as SQL, for a small chat application (using the Postgres dialect):

create table message (
  id          bigserial,
  sent        timestamptz default current_timestamp,
  sender_id   bigint,
  receiver_id bigint,
  content     text,

  primary key(id)
);

create table account (
  id     bigserial,
  handle text,

  primary key(id),
  unique(handle)
);

Folks who chat on the application want read receipts. Great!

Except, the above schema isn't idempotent... Meaning I can't apply it without the database screaming at me, "THIS TABLE ALREADY EXISTS!".

To make it idempotent, a couple things are needed: 1. if not exists needs to be added to tables. 2. Indexes need to be broken out and have if not exists added as well.

Comments are added for visual separation:

--;;
--;; Message
--;;

create table if not exists message (
  id          bigserial,
  sent        timestamptz default current_timestamp,
  sender_id   bigint,
  receiver_id bigint,
  content     text,

  primary key(id)
);

-- need to have indexes on sender/receiver
-- given they'll be queried _a lot_.

create index if not exists idx_message_sender_id on message (sender_id);
create index if not exists idx_message_receiver_id on message (receiver_id);

--;;
--;; Account
--;;

create table if not exists account (
  id     bigserial,
  handle text,

  primary key(id)
);

create index if not exists idx_account_hanle on account (handle);

Now, we can add acknowledgements (ack) to the message relation:

--;;
--;; Message
--;;

create table if not exists message (
  id          bigserial,
  sent        timestamptz default current_timestamp,
  sender_id   bigint,
  receiver_id bigint,
  content     text,

  primary key(id)
);

-- need to have indexes on sender/receiver
-- given they'll be queried _a lot_.

create index if not exists idx_message_sender_id on message (sender_id);
create index if not exists idx_message_receiver_id on message (receiver_id);

-- Adding acknowledgements (ack) for a message, only applicable to the receiver

alter table if exists message add column if not exists receiver_ack timestamptz default null;


--;;
--;; Account
--;;

create table if not exists account (
  id     bigserial,
  handle text,

  primary key(id)
);

create index if not exists idx_account_hanle on account (handle);

We can run the above against a database all day, and it won't (i.e., shouldn't) break any of the database or applications of past, present, or future.

But those that cry out: "HOW WILL I KNOW WHAT MY TABLE LOOKS LIKE?!?! IT'S HARD TO READ THE SCHEMA. OH THE HUMANITY...!?!?!".

That's what comments exist for. I'd rather maintain comments than broken infrastructure any day.

--;;
--;; Message
--;;
--;; The message relation exists to hold the set of messages
--;; between two people and only two people.
--;;
--;; message (
--;;   id           bigserial,
--;;   sent         timestamptz default current_timestamp,
--;;   sender_id    bigint,
--;;   receiver_id  bigint,
--;;   content      text,
--;;   receiver_ack timestamptz default null,
--;;
--;;   primary key(id)
--;; );
--;;

If the above is ever "out of sync" or one worries about such: get your stuff together on your team.

Go forth, and prosper. May your data always be correct and have a valid representation.

🖖


For those who thing my caricatures (yes they are such) are a bit off putting: I've encountered such discourse and lines of questioning before. It's silly the lengths some folks will go in order to not have to have some craft. Arguably, the database is where the most care/craft goes a long way.