A practical Postgres unique constraint playbook for live tables: when to use it, how to clean up duplicates, build a unique index concurrently, validate.
The table already exists. It has rows. Some of those rows are duplicates. And the first thing most engineers reach for — `ALTER TABLE ... ADD UNIQUE` — is exactly the wrong move. A postgres unique constraint on a live table with dirty data will fail immediately, or worse, will block concurrent writes long enough to cause an incident before it fails.
This is not a corner case. It's the standard situation for any backend engineer working on a schema that predates the uniqueness rule they're now being asked to enforce. The good news is that PostgreSQL gives you the tools to do this safely. The bad news is that the safe path requires more steps than the obvious one, and skipping any of them tends to surface as an outage rather than a warning.
What follows is the full migration playbook: find the bad data, clean it with a deterministic rule, build the unique index without blocking writes, attach it as a constraint, and validate what already exists. Each step is load-bearing.
Use a UNIQUE Constraint When the Database Should Stop Bad Data, Not Just Detect It
What UNIQUE Actually Promises, and What PRIMARY KEY Already Covers
A UNIQUE constraint in PostgreSQL makes one promise: no two rows in the table will share the same value in the constrained column or column combination. That's it. It says nothing about row identity, nothing about how foreign keys reference this table, and nothing about whether the value is meaningful as a business identifier.
PRIMARY KEY is stricter on two counts: it implies NOT NULL, and it declares the canonical identifier for the row. The confusion happens when engineers conflate these two roles. A user's email address is not their identity — it's a property that happens to need to be unique. If you make `email` the primary key, you've made it impossible to change without cascading updates across every foreign key that references it. Keep `id` as the primary key. Add a separate UNIQUE constraint on `email`. These are different jobs.
What This Looks Like in Practice
The single-column rule on `email` is enough when every user, regardless of tenant, must have a globally unique email. But in a multi-tenant SaaS product, that's rarely the actual business rule.
Composite Rules Are Where Business Logic Gets Real
If two tenants can legitimately have users with the same email address — a common model in white-label or enterprise SaaS — the uniqueness rule is `(email, tenant_id)`, not `email` alone. That's a table-level constraint, not a column property, and the syntax reflects it:
The same logic applies to order line items. A customer can order the same product multiple times across separate orders, but within a single order, `(order_id, product_id)` should be unique. Expressing that as a composite constraint makes the business rule explicit at the schema level, not buried in application code that future engineers may not know to look for.
The PostgreSQL documentation draws the distinction between column-level and table-level constraint declarations clearly: column-level syntax is shorthand for single-column rules; table-level syntax is required the moment you need more than one column involved.
Why `ALTER TABLE ... ADD UNIQUE` Breaks on Live Tables
The Problem Is Not the Syntax — It's the Existing Data
`ALTER TABLE ... ADD UNIQUE` is the syntactically correct command. It is also the operationally wrong first move on a production table that hasn't been audited for duplicates. The moment PostgreSQL tries to enforce the constraint, it scans every existing row. If it finds two rows that violate the rule, the statement fails immediately with a duplicate key error. You've learned nothing you couldn't have learned with a `GROUP BY` query, and you've done it in a way that may have held locks while figuring it out.
What This Looks Like in Practice
Say you have a `users` table with 2 million rows and a handful of duplicate emails that crept in before the application enforced uniqueness. Running this:
produces an error along these lines:
The statement fails, but it didn't fail instantly on a large table. It scanned, it built, it found the violation, and it rolled back. During that time, it held an `ACCESS EXCLUSIVE` lock on the table, blocking every read and write.
Why Downtime Usually Appears Before the Error Does
On a busy production table, the lock is the problem more than the failure. An `ACCESS EXCLUSIVE` lock from a DDL operation queues behind any existing long-running transactions and blocks everything that comes after it. A slow migration attempt can create a lock queue that cascades into connection exhaustion and application-level timeouts before the constraint even has a chance to fail on duplicate data. The PostgreSQL documentation on explicit locking describes this behavior in detail. The operational blast radius is almost always larger than engineers expect the first time they try this on a live table.
Find the Duplicates Before Postgres Does
Don't Guess Which Row Survives — Rank Them First
Blind deletion is how migrations go wrong. If you delete duplicates without a deterministic rule for which row to keep, you risk losing the row that foreign keys reference, the row with the most complete data, or the row that the application considers canonical. Before you touch a single row, you need a ranking rule. The most common production-safe options are: keep the row with the lowest `id` (assuming auto-increment), or keep the row with the most recent `created_at`. The choice depends on your data model, but the rule must be explicit before you write the DELETE.
What This Looks Like in Practice
First, find the duplicates:
Then rank them to identify which rows to remove:
This pattern — partition by the duplicate key, order by the tiebreaker, delete anything ranked above 1 — is the standard window-function approach for duplicate cleanup. Before running the DELETE, run it as a SELECT to confirm the row count matches your audit. A table with 50 duplicate email groups and 2 rows each should produce exactly 50 deletions. If the count is different, the ranking rule is wrong.
The Real Danger Is the Race Between Cleanup and New Inserts
Cleaning old duplicates is only half the problem. If the application is still running during the migration and has no constraint preventing new duplicate inserts, you can clean the table and have it re-polluted before the constraint is in place. The safest approach is to wrap the cleanup in a transaction and add the constraint in the same transaction — but only if the table is small enough that the lock is acceptable. For large tables, the concurrent index path in the next section is the right answer. Either way, treat the window between cleanup and enforcement as the highest-risk phase of the migration.
The PostgreSQL documentation on window functions covers the `ROW_NUMBER()` pattern in detail and is worth reviewing before writing the cleanup query on a production dataset.
Build the Unique Index Concurrently, Then Attach the Constraint
The Trick Is to Let Postgres Do the Hard Work Without Blocking Writes
The straightforward argument for `ALTER TABLE ... ADD UNIQUE` is that it's one command and it's done. That argument holds on small tables with clean data. It fails on large production tables because the index build that backs the constraint requires an `ACCESS EXCLUSIVE` lock for its entire duration. On a table with tens of millions of rows, that duration is measured in minutes, not milliseconds.
The production-safe alternative is to build a unique index in Postgres using `CREATE UNIQUE INDEX CONCURRENTLY`. A concurrent index build takes longer — it makes multiple passes over the table — but it does not hold an `ACCESS EXCLUSIVE` lock during the build. Reads and writes continue normally. The lock it acquires is brief and only happens at the very end of the build when the index is ready to be marked valid.
What This Looks Like in Practice
The full migration sequence for a live table:
The `USING INDEX` clause is the key step. It converts an existing index into a named constraint without rebuilding anything. The lock required for this final `ALTER TABLE` is brief because the index is already built and valid — PostgreSQL is only updating the catalog, not scanning the table.
Why This Is Safer Than Pretending the Table Is Small
A concurrent build on a 50-million-row table might take 10–20 minutes. That's uncomfortable, but it's not an outage. The non-concurrent alternative on the same table could lock out application traffic for the same duration. The tradeoff is time versus availability, and on a production system, availability wins.
One caveat: `CREATE UNIQUE INDEX CONCURRENTLY` cannot run inside a transaction block. If your migration framework wraps statements in transactions automatically, you'll need to handle this step outside that wrapper. Also, if the concurrent build fails partway through — due to a duplicate row appearing during the build — it leaves behind an invalid index. Check `pg_indexes.indisvalid` before proceeding to the attach step. An invalid index must be dropped and rebuilt after the duplicate is resolved.
The PostgreSQL documentation on `CREATE INDEX CONCURRENTLY` covers the multi-pass behavior and the failure modes in full.
Validate the Rule Without Gambling on Existing Data
Validation Is the Step People Skip and Regret Later
Attaching the constraint is not the same as proving that every existing row obeys it. In PostgreSQL, a constraint can be in a `NOT VALID` state, which means it will enforce the rule for new writes but has not yet scanned historical rows to confirm they comply. This state exists specifically for the zero-downtime use case: you can protect new data immediately, then validate old data in a separate, lower-risk operation.
For the concurrent index path described above, the index build itself serves as the validation — if the index builds successfully, every existing row passed the uniqueness check. But for foreign key constraints and check constraints added with `NOT VALID`, the `VALIDATE CONSTRAINT` step is where you close the loop.
What This Looks Like in Practice
The `VALIDATE CONSTRAINT` command acquires a `SHARE UPDATE EXCLUSIVE` lock, which does not block reads or writes. It scans the table and confirms every row passes the constraint. If any row fails, the validation errors out and the constraint remains in `NOT VALID` state — which is exactly what you want, because it means new writes are still protected while you fix the historical data.
The Useful Detail: Validated Is Not the Same as Enforced Going Forward
A `NOT VALID` constraint still enforces the rule on INSERT and UPDATE. The only gap is historical rows. This is a deliberate design choice: it lets you add constraints to live tables without requiring a clean-history guarantee upfront. The practical implication is that you can ship the constraint in a NOT VALID state, monitor for any application errors that surface pre-existing problems, clean those rows, and then validate — all without downtime.
Handle NULLs, Partial Rules, and the Weird Business Cases Up Front
NULL Is Not a Duplicate in the Way Most People Think
By default, PostgreSQL treats NULL as unknown, and unknown values cannot be equal to each other. This means a UNIQUE constraint on a nullable column allows multiple rows with NULL in that column — each NULL is considered distinct. Most engineers expect this to mean "at most one NULL allowed." It doesn't. It means unlimited NULLs are permitted.
This behavior surprises teams who add a unique rule to an optional email column and then discover that thousands of rows with `email = NULL` all coexist without any violation.
What This Looks Like in Practice
For a nullable column where you want to allow at most one NULL, a partial unique index is the right tool:
This enforces uniqueness only among rows where `email` is not null, and ignores null rows entirely — which is usually the correct business rule for optional fields.
For a composite unique constraint on a multi-tenant model, the approach is the same but scoped to the business key:
When Postgres 15 Changes the Conversation
PostgreSQL 15 introduced `NULLS NOT DISTINCT` for unique constraints and unique indexes. With this option, NULL values are treated as equal to each other, meaning only one NULL is permitted in the column:
This changes the behavior that every prior version of PostgreSQL established. If you're running Postgres 15 or later and your business rule is "at most one user without an email," `NULLS NOT DISTINCT` is the clean solution. If you're on an earlier version, the partial index approach is the only option. The PostgreSQL 15 release notes document this change, and it's worth checking your version before assuming which behavior your constraint will exhibit.
Know What Your App Will See When Uniqueness Is Enforced
INSERT and UPDATE Fail Differently, and Your Code Should Know the Difference
A unique constraint violation is not just an INSERT problem. An UPDATE that changes a business key — say, updating a user's email to one that another user already holds — will produce the same class of error. The violation fires whenever the new value of the constrained column or columns would create a duplicate, regardless of whether it's an insert or an update operation.
What This Looks Like in Practice
Consider an API endpoint that lets a user update their email address. The handler checks whether the new email is already in use with an application-level SELECT, then issues the UPDATE. Under low traffic, this works. Under real traffic, two concurrent requests can both pass the SELECT check and both attempt the UPDATE — and one of them will hit the database constraint.
The error PostgreSQL returns looks like this:
The SQLSTATE code is `23505` — unique violation. Every mature PostgreSQL driver exposes this code, and your error handling should catch it specifically rather than treating it as a generic database error. In Python's `psycopg2`, it surfaces as `psycopg2.errors.UniqueViolation`. In Go's `pgx`, it's an `*pgconn.PgError` with `Code == "23505"`. The PostgreSQL error codes reference documents the full list.
The Race Condition Most Teams Forget to Test
Application-side prechecks — SELECT before INSERT — are not a substitute for a database constraint. They're a UX optimization that reduces the frequency of constraint errors under normal load. Under concurrent load, they're unreliable. The database constraint is the final authority, and the application should be written to handle a `23505` error gracefully rather than assuming the precheck made it impossible.
FAQ
Q: When should I use a UNIQUE constraint instead of a primary key in PostgreSQL?
Use a UNIQUE constraint when the column carries a uniqueness requirement but is not the row's canonical identifier. Email is the classic example: it should be unique, but using it as a primary key creates cascading update problems if it ever changes. Keep `id` as the primary key and add a separate UNIQUE constraint on `email`. The two roles — identity and uniqueness — are distinct, and conflating them creates schema debt.
Q: How do I add a UNIQUE constraint to a table that already has duplicate rows?
You don't start with `ALTER TABLE`. You start with a `GROUP BY ... HAVING COUNT(*) > 1` audit to find duplicates, then use a window function with a deterministic tiebreaker to identify which rows to delete, then clean the data, then build the unique index concurrently, then attach it as a constraint. Running `ALTER TABLE` before cleanup produces an immediate error and may hold locks long enough to cause an incident.
Q: What happens when a UNIQUE column contains NULL values, and how does Postgres 15 change that?
Before Postgres 15, NULL values are treated as distinct from each other, so multiple NULLs in a unique column are permitted — which surprises most engineers. Postgres 15 introduced `NULLS NOT DISTINCT`, which treats NULLs as equal, allowing at most one. On earlier versions, use a partial unique index with `WHERE column IS NOT NULL` to enforce at-most-one-null semantics.
Q: How do composite UNIQUE constraints work for business rules like email plus tenant_id or customer_id plus product_id?
A composite unique constraint enforces uniqueness across the combination of columns, not each column independently. `UNIQUE (email, tenant_id)` allows the same email to appear with different tenant IDs, but not twice with the same tenant ID. This is the right model for multi-tenant SaaS where email uniqueness is scoped to the tenant. The syntax requires a table-level constraint declaration, not a column-level one.
Q: What is the safest way to enforce uniqueness on a large live table with minimal locking?
Build the index with `CREATE UNIQUE INDEX CONCURRENTLY` first. This takes longer than a standard index build but holds no `ACCESS EXCLUSIVE` lock during the scan — reads and writes continue normally. Once the index is valid, attach it with `ALTER TABLE ... ADD CONSTRAINT ... UNIQUE USING INDEX`. The final attach step requires only a brief lock. This two-step sequence is the production-safe path for any table where downtime is not acceptable.
Q: How do UNIQUE constraints behave on UPDATE versus INSERT, and what error should my application expect?
Both INSERT and UPDATE can trigger a unique constraint violation. Any operation that would result in a duplicate value in the constrained column fires the same error: SQLSTATE `23505`, with a message identifying the constraint name and the duplicate value. Application code should catch this specific error code and handle it as a conflict, not as a generic database failure. Application-side prechecks reduce the frequency of conflicts but cannot eliminate them under concurrent load.
Q: When should I use a unique index instead of a UNIQUE constraint?
A unique index in Postgres and a UNIQUE constraint backed by an index are functionally identical for enforcement purposes — PostgreSQL implements constraints via indexes internally. The practical difference is visibility and intent: a named constraint appears in `information_schema.table_constraints` and communicates business intent clearly to other engineers. A standalone unique index is more flexible — it can be partial, it can be built concurrently before being attached, and it can be dropped without a separate constraint drop step. Use a named constraint for permanent business rules. Use a standalone unique index during migration workflows or when you need partial uniqueness.
Conclusion
The live-table problem is the real problem. Any tutorial that starts with `CREATE TABLE` and adds a UNIQUE constraint in the same statement is describing a scenario that doesn't exist in production. The actual scenario is a table with millions of rows, some of them dirty, running under live traffic, with a business requirement that just got formalized into a schema rule.
The playbook is not complicated, but it has to be done in order: audit the duplicates, rank and delete with a deterministic rule, build the unique index concurrently, attach it as a constraint, and validate. Skip the cleanup step and the index build fails. Skip the concurrent build and the lock becomes an outage. Skip the validation and you have a constraint that protects new writes but makes no guarantee about historical data.
Treat the database as the last line of defense against bad data — not the first place you hope application logic will hold. Application-side checks are optimizations. The constraint is the guarantee. Getting that guarantee in place safely, without downtime or data loss, is exactly what this workflow is designed to do.
How Verve AI Can Help You Ace Your Backend Coding Interview
The structural problem in a backend coding interview is the same one this article addresses: knowing the right answer is not enough if you can't explain the reasoning under live pressure. An interviewer asking about schema migrations doesn't want to hear "add a UNIQUE constraint" — they want to hear you walk through duplicate detection, lock behavior, concurrent index builds, and validation without being prompted. That's a performance skill, not just a knowledge skill.
Verve AI Interview Copilot is built for exactly that gap. It listens in real-time to what the interviewer is actually asking and surfaces relevant context — not a canned script, but a response to the live conversation. If the interviewer follows up on why you'd use `USING INDEX` instead of a direct constraint add, Verve AI Interview Copilot has already tracked what you said and can help you extend the answer rather than restart it. The tool stays invisible during the session, so it operates as a live thinking aid without changing how the interview looks to the other side. For backend engineers preparing to discuss PostgreSQL schema design, migration safety, or lock behavior under production load, Verve AI Interview Copilot turns preparation from memorization into rehearsed reasoning — which is the version that actually holds up when the follow-up question arrives.
James Miller
Career Coach

