A Remedy To Poor Relations

Published on August 31, 2024

I posit one-to-many relationships are a cancer in SQL databases.

The primary reasons for the above is that one-to-many relationships as they are supported in SQL databses, mix values[1] and references. Once a reference is introduced into a table, it ceases to be simple. Reasonableness decreases as the number of references in a table, and across tables, increases.

Evolvability also becomes a problem. How do you know your addition or retraction of a column which as a reference won't break any other references? Evolving your application's worldview becomes a difficult proposition, requiring either insane amounts of trust or (more likely) insane amounts of infrastructure just to validate your SQL doesn't break your previous, current, and next versions of your application.

Any role in engineering should have discipline around validating systems as they adapt to changes in a business. I prefer discipline to be focused on solving real problems, not managing a (well intentioned) big ball of mud.

Therefore, the tradeoff to make is to:

disallow mixing references and values, only encoding relations via associative tables (née entities).
only accrete domain knowledge

The approach I favor has the properties:

Simplicity: tables are sets of named values, nothing more. They also tend to be smaller (i.e. less than 10 columns).
Composition: applications know how to compose tables into something meaningful.
Immutability: columns in non-associative tables only accrete and are not renamed.
Safety: given the first three bullets, each table's schema can be evolved without breaking or invalidating past applications' and their worldviews.

Frankly, queries aren't that much worse. There is an extra join for the association, but if everything is related via associations, a convention can be developed pretty quickly that set's a strong precedent for how data is navigated.

Some SQL

-- what leads to issues
create table products (
  id int primary key,
  name text,
);

create table inventory (
  id int primary key,
  product_id int references products(id) on delete cascade,
  count int
);

create table prices (
  id int primary key,
  product_id int references products(id) on delete cascade,
  price int,
  denomination text default 'usd/cents'
);


-- what leads to less issues
create table products (
  id int primary key,
  name text
);

create table inventory (
  id int primary key,
  count int
);

create table prices (
  id int primary key,
  price int,
  denomination text default 'usd/cents'
);

create table assoc_inventory_products (
  inventory_id int references inventory(id) on delete do nothing,
  product_id int references products(id) on delete do nothing
);

create table assoc_prices_products (
  price_id int references prices(id) on delete do nothing,
  product_id int references products(id) on delete do nothing
);

The latter is more verbose, yes. The tradeoff here is data in non-associative tables is safer, more adaptable, and simpler.

Scattered Thoughts

As a side effect, arguments about "how to model" become naming arguments, not "should this be a one-to-many? will we need a many-to-many?". The way to relate is already decided, and the problem can be addressed, not the means about how to address the problem.

Good John Blow quote, poorly paraphrased "in the 80's people solved problems because there was so much limitation from the hardware and by implication the software." Essentially if all you have is X, then you'll figure out how to solve the problem with X. However, so much time has been devoted to "how should we solve problems" the industry has lost its way and is optimizing for "operational excellence" not bringing value to humanity. The incentives of the market are aligned with who looks like they're the most valuable, even if they are a net detriment to humanity. We need more emphasis on the what and why, and less on the how.