unique constraints with `NULL`able columns
some versions of the sql standard dictate that `NULL` values do not compare equal. this can make creating unique indexes with `NULL`able column members difficult. but it can be done with this semi-clean workaround!
i wrote the original version of this document around 6 months ago as a sort of
"note to self" at work. at the time, i was working with postgresql 14. funny
enough, postgresql 15 (to be released imminently) obviates the need for the
approach described in this post; pg15 contains a new feature to allow `UNIQUE` and `NULL` to work better together via `UNIQUE NULLS NOT DISTINCT`.
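for the curious, the pg15 flavor looks roughly like this (a sketch based on the pg15 release notes; the table and column names are made up, and none of this runs on the pg14 setup this post was written against):

```sql
-- pg15+: ask the unique index to treat NULLs as equal to each other
CREATE UNIQUE INDEX uix_example
ON some_table (col_a, col_b)
NULLS NOT DISTINCT;
```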
sometimes, when dealing with a sql database, we want to add a unique constraint to a table to ensure that certain kinds of rows can only appear once in that table. we might do this to offload some work from the application to the database, and so that we get a loud error from the database when something would happen to violate the unique constraint (which tips us off that our application business logic likely has a bug).
however, while unique constraints are a great tool, their usefulness is sometimes hindered by a critical piece of the sql standard pertaining to `NULL`: two `NULL` values do not compare equal. since a unique constraint uses a check of equality between two rows to see if there has been a uniqueness violation, rows which contain `NULL`able values need to be treated specially. if you don't, two "identical" rows won't be caught as non-unique if they both have a `NULL` for a value in the same slot. (as a simple example, suppose we have a table with 3 columns of nullable ints. trying to insert unique rows `(5,5,5), (7,7,7)` is fine. trying to insert non-unique rows `(5,5,5), (5,5,5)` is not fine, and the database will correctly reject such an operation with an error. however, inserting non-unique rows when a `NULL` is involved changes things: inserting `(5,5,NULL), (5,5,NULL)` will be allowed by the database. why? the final `NULL`s do not compare as equal despite both having the same non-value of `NULL`, resulting in the rows themselves not comparing as equal. depending on our schema and application, there are some cases where we'd prefer the `NULL`s be considered equal, and as such we'd rather the database also reject this insert operation!)
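you can see this behavior without spinning up a postgres instance: python's built-in `sqlite3` module is enough, since sqlite follows the same NULLs-are-distinct rule for unique constraints. (the table and column names here are made up for the demo.)

```python
# demo: two NULL values do not compare equal for a UNIQUE constraint.
# sqlite is used as a stand-in; it shares this rule with postgres 14.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a int, b int, c int, UNIQUE (a, b, c))")

# exact duplicates with no NULLs are rejected...
conn.execute("INSERT INTO t VALUES (5, 5, 5)")
try:
    conn.execute("INSERT INTO t VALUES (5, 5, 5)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# ...but "duplicate" rows that contain a NULL both slip in
conn.execute("INSERT INTO t VALUES (5, 5, NULL)")
conn.execute("INSERT INTO t VALUES (5, 5, NULL)")  # no error!
print(conn.execute("SELECT count(*) FROM t WHERE c IS NULL").fetchone()[0])  # 2
```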
okay-- so sometimes we want a column's `NULL` values to compare as equal. well, (in postgres) you can't; this is specified by the sql standard and postgres 14 adheres to it. an ugly way to progress would be to give up on nullability, add `NOT NULL` to every column that you want to be a part of the uniqueness constraint, and try to salvage things (adjust your queries, propagate the change to systems that work with the database, etc.). personally, i think that sounds quite messy and isn't a satisfying "solution".
to actually progress, we need to replace `NULL`s with a substitute value for which equality actually works the way we want, but only in the eye of our uniqueness constraint. we want the actual rows stored to genuinely store `NULL` if the inserted value is `NULL`, but the index powering the uniqueness constraint to see a different value. can we do this? spoiler: the answer is yes.
postgres offers functional (unique) indices to provide the "treat values used by the index differently from the actual row value" behavior described above. we just need a function that can convert `NULL`s to a substitute value, but leaves non-`NULL` values alone. postgres has a built-in function for this: `coalesce(value1, [value2, [...]])`, which returns the first argument passed in (`valueN`) which is non-`NULL`. so, we can easily use something like `coalesce(column_name, substitute)` and know we'll either get the row's real value or our constant `substitute` as a result (always some value, never `NULL`, if `substitute` is hard-coded and non-`NULL`).
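a quick way to convince yourself of `coalesce`'s behavior (again using sqlite as a stand-in; its `coalesce` works the same way as postgres's for our purposes):

```python
# coalesce() returns its first non-NULL argument, leaving non-NULL
# values untouched and swapping NULLs for the fallback.
import sqlite3

conn = sqlite3.connect(":memory:")
print(conn.execute("SELECT coalesce(NULL, -1)").fetchone()[0])       # -1
print(conn.execute("SELECT coalesce(443, -1)").fetchone()[0])        # 443
print(conn.execute("SELECT coalesce(NULL, NULL, 7)").fetchone()[0])  # 7
```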
we need to pick a "sentinel" value `substitute` for each (nullable) column we transform in our functional unique index this way. this is actually worth putting some thought into. ideally we'd want our sentinel value to have two properties: 1) it is a value of the type of the column (this is actually required for the schema to even be valid), and 2) it is a value that rows will never take for this column, preventing any confusion / collision scenarios.

to explain, let's imagine a table that stores two columns which represent network source and destination ports (both nullable) and a textual note; additionally, we want port pairings to be unique across the table (even if one or both of the values are `NULL`). this means we want our unique index to reject inserting `(80, 443, 'a'), (80, 443, 'b')` (non-unique port pairing, textual note doesn't affect uniqueness) as well as `(80, NULL, 'a'), (80, NULL, 'b')` (non-unique port pairing, with `NULL`s that are rejected as being the "same").
so far we have a rough idea of what we want:
```sql
CREATE TABLE ports (
  src_port smallint CHECK (src_port IS NULL OR src_port BETWEEN 0 AND 65535),
  dest_port smallint CHECK (dest_port IS NULL OR dest_port BETWEEN 0 AND 65535),
  note text NOT NULL
);

CREATE UNIQUE INDEX uix_port_pairs
ON ports (
  coalesce(src_port, sentinel_goes_here),
  coalesce(dest_port, sentinel_goes_here)
  /* no `note` in unique index! doesn't factor in */
);
```
we need to put a `smallint` in each of the `sentinel_goes_here` slots (property #1, and required for a valid schema). we don't want whatever we put there to show up in actual day-to-day use of the table; this is important so our functional index can tell the difference between a real row value vs. a value that is standing in for the row's `NULL`. a trick that works excellently here is to use a sentinel value that would be rejected by the column's check constraint. it's actually fine to index such a value for a column, plus you are assured that no row will ever be able to exist with a bad value because such operations will cause `CHECK` errors. since the valid domain for ports is `BETWEEN 0 AND 65535` (inclusive), how about a sentinel value of... `-1`?
```sql
CREATE TABLE ports (
  src_port smallint CHECK (src_port IS NULL OR src_port BETWEEN 0 AND 65535),
  dest_port smallint CHECK (dest_port IS NULL OR dest_port BETWEEN 0 AND 65535),
  note text NOT NULL
);

CREATE UNIQUE INDEX uix_port_pairs
ON ports (
  coalesce(src_port, -1),
  coalesce(dest_port, -1)
);
```
let's give it a spin:
```sql
/* this insert works ok! */
INSERT INTO ports (src_port, dest_port, note) VALUES
  (80, 443, 'http to https'),     -- indexed as (80, 443)
  (NULL, 80, 'unsure to http'),   -- indexed as (-1, 80)
  (NULL, NULL, 'kinda vacuous!'); -- indexed as (-1, -1)
```
let's try to sneak a row into the table that would be a collision, meaning a (src_port, dest_port) value pair of `(80, 443)`, `(NULL, 80)`, or `(NULL, NULL)`:
```sql
INSERT INTO ports (src_port, dest_port, note) VALUES
  (80, 443, 'different note, still rejected'); -- index sees (80, 443)
/* => error! */

INSERT INTO ports (src_port, dest_port, note) VALUES
  (NULL, 80, ''); -- index sees (-1, 80)
/* => error! */

INSERT INTO ports (src_port, dest_port, note) VALUES
  (NULL, NULL, 'these NULLs compare equal in unique index'); -- index sees (-1, -1)
/* => error! */
```
now, let's try intentionally adding rows to our table that would interfere with our unique index by using the sentinel value in our row:
```sql
INSERT INTO ports (src_port, dest_port, note) VALUES
  (-1, 80, 'trying to collide with "unsure to http"');
/* => error! check constraint violation */

INSERT INTO ports (src_port, dest_port, note) VALUES
  (-1, -1, 'trying to collide with "vacuous"');
/* => error! check constraint violation */
```
to summarize:
sometimes we want to ensure rows are unique across a table, considering `NULL`s as the same, but the sql standard can make that difficult if any of the involved columns of the uniqueness constraint are nullable, because `NULL`s do not compare equal (including for the purposes of comparisons in a unique constraint).

we can fix this problem by using a functional unique index and postgres's `coalesce()` function to modify the row's values in the eyes of the unique index so that none of them contain `NULL`. this requires a valid "sentinel" value of the column's core type, ideally one that a row will never take on in that column anyway. if the relevant column has a `CHECK` constraint, a perfect choice for the sentinel value is one that is outside the valid domain of the `CHECK`.