Interview questions

SQLQuery Interview Questions: 25 Answers for Talking Through the Work

August 6, 2025Updated May 15, 202623 min read
Can Sqlquery Be The Secret Weapon For Acing Your Next Interview

Master SQL interview questions with 25 answers that explain JOINs, GROUP BY, NULLs, subqueries, window functions, and performance trade-offs.

Getting the query right is not the same as passing the interview. SQL query interview questions trip up mid-level candidates not because the SQL is hard, but because the explanation is — and most prep resources stop at the syntax.

The gap shows up in a specific moment: you write a clean JOIN, the result set looks right, and then the interviewer asks "why did you choose a LEFT JOIN here?" and you pause for three seconds too long. Not because you don't know. Because you've been practicing queries, not conversations. That's the actual problem this article addresses.

What follows is a narration playbook for SQL interview questions — how to talk through the reasoning, name the trade-offs, and handle the follow-up before it becomes a follow-up. The query is the entry ticket. The explanation is what gets you the offer.

What Interviewers Are Really Testing When They Ask SQL Questions

What are they actually listening for beyond the final query?

SQL interview questions are rarely a test of whether you can recall syntax. The interviewer can read the query. What they can't see — and what they're specifically listening for — is whether you understand the shape of the data before you write a single line.

The real signal is three things: Can you name the grain of the result? Can you avoid hidden double-counting? And can you explain the logic without sounding like you're reciting a cheat sheet you memorized the night before?

Naming the grain means saying out loud what one row in the result represents. "One row per customer per month" is a grain statement. It tells the interviewer you thought about the output before you thought about the join. Candidates who skip this step often produce technically correct queries that answer a slightly different question than the one asked — and they don't notice until the interviewer points it out.

Why a correct answer can still feel weak

Consider a simple orders-and-customers dataset. The candidate is asked to return all customers and their most recent order. They write an INNER JOIN on customer_id, add an ORDER BY on order_date, and return the result. The query runs. The numbers look reasonable.

Then the interviewer asks: "What happens to customers who have never placed an order?" Silence. The INNER JOIN silently excluded them, and the candidate never named that choice. A strong answer would have said, before writing anything: "I'll use a LEFT JOIN here because I want to preserve customers even if they have no matching orders — otherwise I'd be filtering them out without meaning to."

That sentence — said before the code — is the difference between a correct answer and a strong one.

Which follow-up questions usually show up next?

Take the prompt "count active users by month." A candidate writes a GROUP BY on month, filters on an is_active flag, and returns a count. Correct. The interviewer's next three questions are almost always: What does "active" mean here — is it based on login date, a flag, or transaction history? What happens if a user was active in multiple months — are they counted once or multiple times? And how does this scale if the events table has 500 million rows?

Those aren't gotcha questions. They're the second half of the interview. The candidates who handle them well are the ones who already mentioned one of those concerns before being asked. "I'm assuming 'active' means at least one login in the calendar month — let me know if the definition is different" buys you credibility before the interviewer even opens their mouth.

How to Narrate Your Approach Before You Write a Single Line of SQL

What should you say before you touch the keyboard?

Good SQL interview prep teaches you to slow down before the first keystroke. The verbal equivalent of pseudocode is not optional — it's the thing that separates candidates who sound like engineers from candidates who sound like they're taking a quiz.

The opening move is to restate the goal in your own words. "So the ask is to find the top five products by revenue for each sales region, for Q3 only." That sentence does two things: it confirms you understood the prompt, and it gives the interviewer a chance to correct you before you spend five minutes solving the wrong problem.

Then name the unit of analysis. Using a sales funnel dataset as an example: "One row in my result will represent one product within one region — so I need to make sure I'm not mixing region-level aggregates with product-level aggregates in the same GROUP BY." That's the kind of thing a senior engineer says naturally. It signals you've thought about the output before the query.

How do you sound structured instead of rehearsed?

The shape of a clean verbal flow is: goal → grain → joins → filters → aggregation → edge cases. That's not a script. It's a sequence of decisions, and naming them in order is what sounds structured rather than memorized.

The canned version sounds like: "I'll use a subquery to get the max order date per customer and then join back." That might be the right answer. But it sounds like a template applied without thought. The structured version sounds like: "I want one row per customer with their most recent order. I could do that with a subquery on max order date, or I could use ROW_NUMBER and filter to rank 1 — the window function approach is easier to extend if we later want the top three orders instead of just one. I'll go with ROW_NUMBER." Now the interviewer knows you considered alternatives and chose deliberately.

What if you need to think out loud without rambling?

The "latest order per customer" problem is a good one to practice this on, because there are at least three valid approaches and the choice between them is worth narrating.

The useful kind of pause sounds like: "I know I want exactly one row per customer. I'm deciding between a self-join on max date, a correlated subquery, and a window function. The self-join gets messy if there are ties. The correlated subquery is clean but runs once per customer row, which gets expensive on large tables. I'll use ROW_NUMBER with PARTITION BY customer_id ORDER BY order_date DESC — that's readable and the optimizer handles it well." That's not rambling. That's thinking. Interviewers reward it.

SQL JOIN Questions: Say Why the Rows Line Up the Way They Do

How do you explain INNER, LEFT, RIGHT, and FULL JOIN without sounding memorized?

SQL query interview questions about joins are almost always checking whether you can predict the output, not recite the definition. The plain-language version: an INNER JOIN returns only rows that match on both sides. A LEFT JOIN returns all rows from the left table, with NULLs on the right where there's no match. RIGHT JOIN is the mirror. FULL JOIN returns everything from both sides, matched where possible, NULLs where not.

The follow-up is always about unmatched rows. In an employees-and-departments dataset: "What happens to employees who haven't been assigned to a department yet?" If you used INNER JOIN, they disappear. If that's intentional — say so. If it's not — LEFT JOIN is the fix, and you should have named that before writing the query.

When does a join answer turn into a duplicate problem?

The structural mistake is joining two many-side tables without naming the grain first. On an orders-and-order_items dataset: if you join orders to order_items on order_id, and each order has five line items, every order-level column now appears five times in the result. Sum the revenue at that point and you've multiplied it by five without realizing it.

The explanation that impresses interviewers is: "Before I join these two tables, I need to decide whether I'm working at the order level or the line-item level. If I need both, I'll aggregate order_items first — get total value per order — and then join that summary to orders. That way I'm joining one row to one row and there's no explosion." That's a grain statement. It's the most important thing you can say about a join.

What do you say when the interviewer asks why not a subquery?

On a one-to-many customer activity example: if the goal is to return all customers alongside their total activity count, a JOIN with a GROUP BY is often cleaner than a correlated subquery because it's easier to read, easier to extend, and the query planner handles it predictably. The subquery runs once per outer row; the join aggregates once and is done.

Say that. "I'm using a JOIN here rather than a correlated subquery because the subquery would execute once per customer row — that's fine for a small table, but if this table has a million customers, I'd rather let the optimizer handle one aggregation pass." That answer is defensible. "I just prefer joins" is not.

GROUP BY and HAVING: Explain the Filter You Want, Not the One You Wish You Had

Why does WHERE fail when HAVING is the right answer?

The timing mismatch is the whole story. WHERE filters individual rows before the database groups them. HAVING filters groups after aggregation. If you need "customers with more than three orders," WHERE can't see the count yet — it runs before the count exists. HAVING can, because it runs after GROUP BY collapses the rows into groups.

SQL questions and answers on this topic almost always include a trap query where WHERE is used on an aggregate alias and the database throws an error. The clean explanation: "I'm filtering on a COUNT, which means the filter has to happen after grouping — so this is a HAVING clause, not a WHERE clause. WHERE doesn't know what COUNT(order_id) is yet at the point it runs."

How do you talk through GROUP BY without losing the room?

Monthly revenue by product category is a good prompt for this. The verbal version: "I'm grouping by month and category, so one row in my result represents one category in one month. The thing I'm measuring is total revenue — so I'll SUM the order value. I need to make sure every non-aggregated column in my SELECT is also in my GROUP BY, otherwise the database doesn't know which value to pick." That last sentence catches a common mistake and names it before the interviewer has to.

What should you say when the interviewer pushes on COUNT, SUM, and DISTINCT?

The DISTINCT trap: a candidate uses COUNT(DISTINCT email) to fix a result that's returning inflated numbers. That might work. But the cleaner explanation is that DISTINCT is patching a grain problem — the underlying query is producing duplicates because the join or the grouping is wrong, and DISTINCT is hiding that rather than fixing it.

A duplicate sign-up records example: if the users table has multiple rows per email because of a data quality issue, the right fix is to deduplicate at the source — either in a CTE before the main query, or by understanding why the duplicates exist. COUNT(DISTINCT email) gives you the right number but leaves the structural problem invisible.

Subqueries, Correlated Subqueries, and UNION ALL: Choose the Shape You Can Defend

When is a subquery the cleaner move?

A subquery earns its place when you need to isolate a condition before the main query can use it. The "above-average salary" prompt is the classic: you can't filter on AVG(salary) in a WHERE clause in the same query that computes the average, because the average doesn't exist yet. A subquery computes it first, and the outer query filters against that result. That's not a workaround — that's the right tool for the job.

The "latest purchase per user" version works the same way: compute the max purchase date per user in a subquery, then join back to the purchases table to pull the full row. Say that out loud. "I need the max date before I can filter to it, so I'll compute it in a subquery and join on both user_id and purchase_date."

When does a correlated subquery sound elegant — and when does it smell expensive?

A correlated subquery references the outer query in its WHERE clause, which means it re-executes for every row in the outer query. For a per-customer last-order lookup on a table with 10,000 customers, that's 10,000 subquery executions. It might be fine. It might be catastrophic. The honest answer is: "A correlated subquery here is readable and correct, but I'd check the execution plan before shipping it to production — if the table is large, a window function or a join to a pre-aggregated subquery is going to be faster."

That answer shows you understand the trade-off. Interviewers are not looking for "correlated subqueries are always bad." They're looking for "I know when to worry about them."

Why would you reach for UNION ALL instead of UNION?

UNION deduplicates. UNION ALL does not. Deduplication costs a sort pass. If you know the two result sets don't overlap — two event-stream tables from different source systems, for example — UNION ALL is the honest choice: it's faster and it doesn't silently discard rows that happen to look identical.

The explanation: "I'm using UNION ALL here because these two tables come from different systems and I know there's no overlap. UNION would run a deduplication step that's both expensive and unnecessary. If I'm wrong about the overlap, I'd rather find out from a row count check than have UNION hide the problem." That's a defensible answer. PostgreSQL documentation confirms that UNION ALL is the explicit non-deduplicating form and is generally faster for this reason.

NULLs, NOT IN, Duplicates, and Keys: The Traps Interviewers Love

Why does NOT IN break the moment NULL shows up?

SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. When a NULL appears in the list passed to NOT IN, every comparison involving that NULL evaluates to UNKNOWN, not FALSE. And a WHERE clause only passes rows where the condition is TRUE. So NOT IN with a NULL in the subquery silently returns zero rows — not because nothing matches, but because nothing can be proven to not match.

The blocked-users exclusion example: `WHERE user_id NOT IN (SELECT user_id FROM blocked_users)` returns nothing if any row in blocked_users has a NULL user_id. The fix is NOT EXISTS, which handles NULLs correctly because it checks for the existence of a matching row rather than comparing values. Say that in the interview. "I'm using NOT EXISTS here instead of NOT IN because if the subquery ever returns a NULL, NOT IN breaks in a way that's very hard to debug — it returns no rows at all without an error."

How do you talk about duplicate handling without sounding defensive?

Duplicates are a grain problem, not a formatting problem. If your query returns duplicate rows, something upstream is wrong — either the join is producing a many-to-many match you didn't intend, or the source table itself has duplicate records.

A customer table with repeated emails: the right question is "why are there duplicates?" before reaching for DISTINCT. Is it a data quality issue? A missing deduplication step in the pipeline? A legitimate one-to-many relationship you didn't account for? Name that out loud. "I'm seeing duplicate emails here — before I add DISTINCT, I want to understand whether this is a data quality issue or whether the join is creating the duplicates. Let me check the row count before and after the join." That's how a senior engineer talks about duplicates.

Where do primary and foreign keys fit into the answer?

Keys are about join safety. If orders.customer_id is a foreign key to customers.customer_id, you know the join is safe — every order maps to exactly one customer, and you won't get a row explosion on that side. If the key constraint doesn't exist, you're trusting the data, and you should say so.

"I'm joining on customer_id assuming it's a one-to-many relationship — one customer, many orders. If there's a foreign key constraint enforcing that, I can trust the join. If not, I'd want to run a quick check that customer_id in orders only references valid customers before relying on this query in production."

How Window Functions Show Up When the Interviewer Wants a Level Up

How do you explain ROW_NUMBER, RANK, and DENSE_RANK in one breath?

All three assign a number to each row within a partition. ROW_NUMBER is strict: every row gets a unique number, ties broken arbitrarily. RANK skips numbers after ties — two rows tied at rank 1 mean the next row is rank 3. DENSE_RANK doesn't skip — two rows tied at rank 1, the next row is rank 2.

For a leaderboard or top-selling-products dataset, the practical question is: "Do I want exactly one row per group, or do I want to include all tied rows?" If the business rule is "show the top product per category, and if there's a tie, pick either one," ROW_NUMBER is the right tool. If the rule is "show all products tied for the top spot," RANK or DENSE_RANK is the honest answer. Say which business rule you're assuming before you write the function.

When does LAG or LEAD make the answer better than a self-join?

Month-over-month revenue change is the canonical example. A self-join on the same table with offset months works, but it's verbose and harder to read. LAG(revenue, 1) OVER (ORDER BY month) pulls the previous month's value directly, and the query reads like the business question it's answering.

The narration: "I'm using LAG here to pull the prior month's revenue into the same row as the current month, so I can compute the difference in one pass. A self-join would work too, but I'd need to join on month minus one, which gets messy with date arithmetic. LAG is cleaner and the intent is obvious." PostgreSQL window function documentation is a reliable reference for partition and ordering behavior across common dialects.

How do running totals sound natural instead of fancy?

A cumulative spend or inventory balance prompt is where SUM() OVER (PARTITION BY ... ORDER BY ...) earns its place. The explanation doesn't need to sound technical: "I want each row to show the running total up to that point — so I'm using a window SUM ordered by date. The PARTITION BY resets the running total for each customer, so I'm not accidentally accumulating across customers."

That sentence — "resets the running total for each customer" — is the thing the interviewer wants to hear. It proves you understand what PARTITION BY does, not just that you know the syntax.

What to Say When the Interviewer Asks About Performance or Indexing

What performance answer sounds credible in an interview?

The credible answer is practical, not theoretical. On a large events table: "The first thing I'd look at is whether the filter columns are indexed — if I'm filtering on user_id and event_date, I want a composite index on those two columns in that order. Then I'd check whether I'm filtering early enough in the query to reduce the rows before any joins or aggregations."

Cardinality matters too. A filter on a column with two distinct values (like a boolean is_deleted flag) won't benefit much from an index. A filter on user_id in a table with 50 million users will. Saying that in an interview — "I'd want to know the cardinality of the filter column before assuming an index helps" — is the kind of thing that signals real experience.

How do you narrate an EXPLAIN plan without pretending to be a database engine?

You don't need to memorize every node type. You need to know three things: a sequential scan on a large table is usually a problem; an index scan is usually what you want; and a bad row estimate (where the planner expects 100 rows and gets 100,000) is a sign that table statistics are stale or the query structure is confusing the planner.

The verbal version: "I'd run EXPLAIN ANALYZE and look for sequential scans on large tables, then check whether the row estimates match reality. If the planner is way off on row counts, that usually means the statistics need refreshing or the query is structured in a way that's hard to optimize." PostgreSQL EXPLAIN documentation covers the output format in detail for candidates who want to go deeper.

What should you say when the interviewer asks how you would index it?

The vague answer is "I'd add an index on the filter column." The defensible answer names the column, the order, and the access pattern. For an orders lookup query filtered by customer_id and ordered by order_date: "I'd add a composite index on (customer_id, order_date DESC). The customer_id filter narrows the scan, and the order_date ordering means the database can read the most recent rows first without a sort step."

That answer shows you understand how the query will actually use the index — not just that indexes exist.

The 30-Second SQL Answer Framework You Can Reuse in Any Round

What does a strong 30-second answer sound like?

The structure is five moves: restate the goal, name the grain, choose the shape, call out edge cases, finish with the result. On a "top customers by month" prompt, it sounds like this:

"The ask is the top five customers by revenue for each calendar month. One row in my result represents one customer in one month — so I need to make sure I'm grouping at that level. I'll aggregate orders by customer and month, then use RANK() partitioned by month to assign positions, and filter to rank ≤ 5. Edge case: if two customers are tied for fifth place, RANK will include both — I'll mention that and confirm whether the business wants exactly five rows or all tied rows."

That's thirty seconds. It covers the goal, the grain, the tool choice, and an edge case. The interviewer has almost nothing to push back on because you've already named the interesting decisions.

What are the 8 follow-up questions you should expect next?

These are the natural second half of the conversation, not surprises:

  • What happens if a customer has no orders in a given month? (Tests whether you considered NULLs and LEFT JOIN behavior.)
  • How does this scale if the orders table has a billion rows? (Tests whether you thought about indexing and filter order.)
  • Why did you use RANK instead of ROW_NUMBER? (Tests whether the choice was deliberate or default.)
  • What if the definition of "revenue" changes to net of refunds? (Tests whether the query is easy to modify.)
  • How would you handle ties at the cutoff? (Tests grain awareness and business logic clarity.)
  • Could you write this with a subquery instead of a window function? (Tests whether you understand alternatives.)
  • What does the execution plan look like? (Tests performance intuition.)
  • How would you test this query before shipping it? (Tests engineering judgment beyond syntax.)

Framing these as the natural second half of the conversation — rather than a gauntlet — is the right mental model. A candidate who has already mentioned two of these in their initial answer is demonstrating exactly the kind of thinking that separates strong from average.

How do you recover if you started off messy?

Not because the candidate didn't know SQL, but because they answered the query and forgot to narrate the reasoning — and now the interviewer is pushing back and it feels like the whole thing is unraveling.

The reset is simple. Stop. Say: "Let me step back for a second — I want to make sure I'm solving the right problem." Then restate the goal, name the grain, and continue. That's not a sign of weakness. It's a sign of someone who can debug under pressure and recalibrate without spiraling.

The compact checklist from mock interview feedback, distilled: Did you name the grain before joining? Did you say why you chose that join type? Did you mention at least one edge case before being asked? Did you explain the filter timing when using HAVING? Did you name an alternative and say why you didn't use it? Five questions. If you can answer yes to three of them, you're in the top half of candidates. If you can answer yes to all five, you're the one they call back.

How Verve AI Can Help You Prepare for Your Interview With SQL

The structural problem this article has been building toward is this: you can read every SQL reference in existence and still freeze when the follow-up comes in live. The gap isn't knowledge — it's the gap between knowing an answer and being able to deliver it coherently under real-time pressure, when someone is watching and the stakes feel high.

That's a performance skill, and performance skills require live repetition — not more reading. Verve AI Interview Copilot is built specifically for this: it listens in real-time to the actual conversation happening in your interview, reads what's on your screen, and surfaces the kind of structured guidance that helps you narrate your reasoning rather than just recall a fact. It's not a flashcard tool. It's a live thinking partner that responds to what you actually said, not a canned prompt.

For SQL prep specifically, Verve AI Interview Copilot lets you run through prompts like "top customers by month" or "explain your join choice" and get immediate feedback on whether your narration covered the grain, the edge cases, and the trade-offs — or whether you answered the query and forgot the explanation. The copilot stays invisible during the session, so the practice environment feels like the real thing. That's the repetition that closes the gap between knowing SQL and sounding like someone who lives in it.

Conclusion

The query was never the whole interview. It was the entry point — the thing that gets you to the second question, which is the one that actually matters.

The real win is sounding like someone who can think through a data problem out loud, name the decisions they're making in real time, and handle the follow-up without losing the thread. That's not a SQL skill. It's a communication skill built on top of a SQL skill, and it only comes from practicing the narration, not just the syntax.

Before your next interview, take one prompt — "top customers by month," "latest order per customer," "users with more than three purchases" — and run through it out loud using the 30-second framework. Restate the goal. Name the grain. Choose the shape. Call out one edge case. Then do it again until it sounds like thinking, not reciting.

JM

James Miller

Career Coach

Ace your live interviews with AI support!

Get Started For Free

Available on Mac, Windows and iPhone