SQL Fundamentals Prep Guide for Technical Interviews and AIEH AI-Augmented SQL

Note on framing: This is the first prep-guide article in the AIEH content set. Prep guides differ from topic clusters (practitioner-thought-pieces) and question explainers (single-item explainers): they are technical-foundation guides that help candidates prepare for technical assessments and interviews, with explicit cross-links to AIEH sample tests and role bundles. The framing is documented here so future prep guides (algorithms-and-data-structures, system-design) can inherit the structural pattern.

SQL fluency is one of the most-tested skills in technical interviews and one of the most under-taught at the practical production level. Tutorials cover SELECT-FROM-WHERE syntax; production work involves window functions, query plan analysis, schema-aware indexing, and the discipline of debugging slow queries against unfamiliar schemas under time pressure.

This guide covers SQL fundamentals at the depth expected for AIEH’s AI-Augmented SQL assessment and for technical interviews in Backend, Data Engineer, and Data Analyst contexts. It’s organized to support both first-time learners building toward production fluency and experienced practitioners refreshing specific dimensions before a high-stakes assessment.

Data Notice: SQL syntax and semantics described here reflect the SQL standard plus the most common implementations (PostgreSQL, MySQL/MariaDB, SQL Server, SQLite, BigQuery, Snowflake). Vendor-specific deviations exist; consult the documentation for your target database before deploying anything in production.

Who this guide is for

Three reader profiles benefit from this guide:

Candidates preparing for technical interviews in Backend, Data Engineer, Data Analyst, or Full-Stack roles. Most technical-screen processes include SQL questions, often unfamiliar schemas where the candidate has 15-30 minutes to produce non-trivial queries.
Candidates preparing for the AIEH AI-Augmented SQL assessment. The full 40-scenario assessment probes SQL fluency augmented by AI assistance — knowing when to author queries directly, when AI assistance helps, and recognizing when AI-generated SQL is subtly wrong. This guide grounds the underlying SQL skills the assessment probes. The free 5-question AI-Augmented SQL sample is takeable today.
Working engineers refreshing SQL fluency. Many engineers use ORMs daily and lose direct SQL fluency over time; refreshing before a job change or a new project where SQL is load-bearing is a common need.

Core query mechanics

SQL queries follow a logical evaluation order that differs from the written order. The SELECT-FROM-WHERE-GROUP BY-HAVING-ORDER BY-LIMIT structure reads top-to-bottom but evaluates roughly in this sequence:

FROM and JOINs identify the row source.
WHERE filters individual rows before grouping.
GROUP BY aggregates rows into groups.
HAVING filters the groups (after aggregation).
SELECT projects the final columns.
ORDER BY sorts the result.
LIMIT/OFFSET slices the sorted result.

Knowing the evaluation order matters because column aliases defined in SELECT aren’t available in WHERE (alias evaluates later than filter), and HAVING operates on aggregated groups rather than individual rows. Practitioners who internalize the evaluation order debug SQL faster than those who reason from written order.

Joins at production depth

Six join types cover essentially all join requirements:

INNER JOIN keeps only rows where the join condition matches in both tables. The default join type if you write JOIN without a qualifier in most dialects.
LEFT JOIN (LEFT OUTER JOIN) keeps all rows from the left table, with NULLs for unmatched right-table columns. The most-common join type in production work because reports often need “all customers, with their orders if any.”
RIGHT JOIN (RIGHT OUTER JOIN) is the mirror of LEFT JOIN; rarely used because flipping the table order produces a more readable LEFT JOIN.
FULL OUTER JOIN keeps unmatched rows from both sides with NULLs filling missing values. Useful for set-comparison queries; less common in routine reporting.
CROSS JOIN produces the Cartesian product (every row in the left table paired with every row in the right table). Almost always a mistake unless you specifically want the Cartesian product (rare); writing CROSS JOIN explicitly signals intent rather than typo.
SELF JOIN is a join of a table to itself, typically using aliases. Useful for hierarchical queries (employee-manager relationships) and pairwise comparisons.

A common interview-question pattern probes whether the candidate recognizes when a LEFT JOIN with WHERE right_table.column IS NULL is the idiomatic anti-join (rows in the left table without matches in the right). Practitioners with strong SQL fluency recognize this pattern reflexively; weaker candidates often try to express it with NOT EXISTS or NOT IN, both of which work but have different NULL-handling semantics.

Subqueries and CTEs

Modern SQL favors Common Table Expressions (CTEs) over subqueries for readability:

WITH recent_orders AS (
  SELECT customer_id, MAX(order_date) AS last_order
  FROM orders
  WHERE order_date > CURRENT_DATE - INTERVAL '90 days'
  GROUP BY customer_id
)
SELECT c.name, r.last_order
FROM customers c
LEFT JOIN recent_orders r ON c.id = r.customer_id;

The same logic with a subquery is often less readable. CTEs are particularly valuable for queries with multiple derived intermediate results, where naming each step clarifies intent.

Recursive CTEs handle hierarchical queries (organizational charts, file-system trees, graph traversal) using a CTE that references itself. Recursive CTEs are powerful but can have performance issues at scale; understand the cardinality of the recursion before deploying.

Window functions

Window functions are the most-underused feature of modern SQL and the most-frequently-tested in mid-to-senior technical interviews. Window functions compute values across a “window” of related rows without collapsing the result set the way GROUP BY does.

The general syntax:

SELECT
  customer_id,
  order_date,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date)
    AS running_total,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)
    AS order_sequence
FROM orders;

Common window-function patterns:

Ranking functions (ROW_NUMBER(), RANK(), DENSE_RANK()) number rows within partitions. Useful for top-N-per-group queries.
Aggregate windows (SUM, AVG, COUNT with OVER) compute running totals, moving averages, and partition-level aggregates without collapsing rows.
Lead and lag (LEAD, LAG) access values from subsequent or prior rows within the partition. Useful for period-over-period comparisons and gap detection.
First and last value (FIRST_VALUE, LAST_VALUE) access boundary values within the window frame.

The “top N per group” interview pattern uses ranking functions in a CTE:

WITH ranked AS (
  SELECT
    department_id,
    employee_id,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department_id ORDER BY salary DESC
    ) AS rank
  FROM employees
)
SELECT department_id, employee_id, salary
FROM ranked
WHERE rank <= 3;

Practitioners who write this query reflexively distinguish themselves from candidates who try to express it with GROUP BY (which doesn’t work) or correlated subqueries (which work but are harder to read).

Indexing fundamentals

Database performance is mostly about indexing. SQL practitioners need working knowledge of:

B-tree indexes, the default in most databases. Support equality and range queries on indexed columns and prefix matches on string columns.
Composite indexes on multiple columns. Order matters: an index on (a, b) supports queries filtering on a alone or (a, b) together, but NOT queries filtering on b alone.
Index-only scans when all queried columns are in the index, allowing the query to skip the table read.
Index selectivity — indexes are useful only if they filter to a small fraction of rows. An index on a low- cardinality column (like a boolean flag) often hurts more than it helps.
Covering indexes that include columns not part of the index key but stored in the index for index-only-scan optimization.

Reading query plans (EXPLAIN ANALYZE in PostgreSQL, EXPLAIN in MySQL) is the discipline that distinguishes production SQL practitioners from tutorial-level ones. Production work routinely involves diagnosing why a query is slow, and the answer is almost always in the query plan.

Common interview problem patterns

Six recurring interview problems worth recognizing reflexively:

Top N per group. Window functions with ranking; covered above. Variants ask for top-2 or top-3 instead of top-1.
Running totals. Window functions with cumulative sum (SUM(...) OVER (ORDER BY ...)).
Gap detection. Find missing dates in a sequence, or customers without recent orders. LEFT JOIN + IS NULL pattern, or EXCEPT/MINUS set operations.
Hierarchical queries. Recursive CTE for tree traversal; self-join for direct parent-child relationships.
Pivoting. Convert long-format data to wide-format (rows-to-columns) using conditional aggregates (SUM(CASE WHEN ...)) or vendor-specific PIVOT syntax.
Anti-join (rows in A not in B). LEFT JOIN with IS NULL, or NOT EXISTS (preferred over NOT IN because NOT IN has surprising NULL semantics).

SQL anti-patterns

Five anti-patterns produce slow or wrong queries:

SELECT * in production code. Returns every column, including unused ones; brittle to schema changes; missing index-only-scan opportunities.
Functions on indexed columns in WHERE. WHERE LOWER(email) = '[email protected]' defeats the index on email; use a functional index or restructure the query.
Implicit type conversion. WHERE customer_id = '123' (string) when customer_id is integer can defeat indexes and produce silent type-coercion in some dialects.
OR across columns without composite index support. WHERE a = 1 OR b = 2 typically can’t use a single index; consider UNION or composite indexes.
N+1 query patterns through ORMs. ORMs often produce queries that look fine in isolation but generate one query per related record at scale. Use JOIN or eager-loading features to fetch related data in a single query.

When to use AI assistance well in SQL work

The AIEH AI-Augmented SQL assessment specifically probes the discipline of using AI well rather than as an unconditional substitute for SQL fluency. Three patterns where AI assistance is most valuable:

Boilerplate generation. AI is excellent at producing the initial structure of standard queries (joins across known tables, aggregates with appropriate grouping, basic filtering). The practitioner reviews and refines.
Translating across dialects. Converting a PostgreSQL query to BigQuery or vice versa is a known AI-assistance-strong task; the practitioner verifies semantics-preserved.
Debugging unfamiliar errors. AI can quickly explain cryptic database errors and propose fixes; the practitioner verifies the fix is appropriate to the specific schema.

Three patterns where AI assistance is least valuable (and most prone to producing subtly-wrong output):

Schema-specific queries against unfamiliar databases. AI doesn’t know your schema; queries assuming standard-shaped tables can be wildly wrong against non-standard schemas. The practitioner authors the query with full schema knowledge.
Performance optimization. AI can suggest plausible optimizations but can’t verify them against your actual query plan and data distribution. The practitioner reads the query plan and decides.
Edge-case-heavy aggregations. Window-function logic with multiple partitions, NULL-handling-sensitive aggregates, and date-arithmetic edge cases are areas where AI-generated SQL is frequently subtly wrong. The practitioner authors carefully and tests against edge cases.

The pattern that distinguishes strong AI-augmented SQL practitioners: treat AI as a fast first-draft tool, not a verified-output tool. Read every AI-generated query carefully against the schema before running it.

How this maps to AIEH assessments and roles

This guide grounds the underlying SQL skills probed by AIEH’s AI-Augmented SQL assessment family (see the AI-Augmented SQL sample and the scoring methodology for how scores map onto the 300–850 Skills Passport scale).

For role-specific applications:

Backend Engineer — SQL is the highest-leverage signal after Python; production-depth SQL fluency distinguishes senior backend engineers.
Data Engineer — SQL is essentially the load-bearing skill; this guide covers the concept floor expected for the role.
Data Analyst — SQL fluency is the primary technical skill; analytical query patterns (window functions, gap detection, period-over-period) are particularly central.
DevOps / Platform Engineer — SQL appears in observability-data analysis and capacity-planning work; the analytical patterns here are most relevant.

Resources for deeper study

Three resources that reward sustained study:

PostgreSQL documentation. PostgreSQL’s manual is the most-comprehensive freely-available SQL reference; reading the relevant chapters end-to-end is among the highest-leverage learning paths for production SQL fluency. https://www.postgresql.org/docs/
Use The Index, Luke! by Markus Winand. A free online book on database indexing and query optimization that covers the practical-production-relevant material that most database courses skip. https://use-the-index-luke.com/
SQL Performance Explained, also by Markus Winand. The print companion to Use The Index, with extended treatment of query optimization and explain-plan reading.

For interview-specific practice, LeetCode’s SQL section and HackerRank’s SQL practice cover most common interview problem patterns; both are valuable for pattern recognition but should be supplemented with production-system work for full fluency.

Common pitfalls candidates fall into

Three patterns that recurring candidates fall into during SQL technical interviews:

Solving in the ORM language they know rather than SQL. Translating mental queries from Django ORM, ActiveRecord, or SQLAlchemy patterns into SQL produces awkward queries that signal weak SQL fluency. Practice writing SQL directly without the ORM as a translation layer.
Skipping the query-plan reading step. When a query is slow, the answer is in the query plan, not in guessed optimizations. Practice reading EXPLAIN ANALYZE output before encountering a real performance problem under time pressure.
Not testing against edge cases. Aggregations with NULL handling, GROUP BY interactions with HAVING, and window function frame specifications all have edge cases that surface in production. Practice authoring queries that handle the edge cases explicitly.

Takeaway

SQL fluency at production depth involves more than SELECT-FROM- WHERE syntax: window functions, indexing strategy, query plan reading, and the discipline of recognizing common interview problem patterns reflexively. AI assistance helps with boilerplate and translation but doesn’t substitute for direct SQL fluency on schema-specific work, performance optimization, or edge-case-heavy aggregation.

The AIEH AI-Augmented SQL assessment probes both the underlying SQL fluency and the AI-collaboration discipline that distinguishes practitioners who use AI well from ones who substitute it for fluency. This guide grounds the SQL side; the assessment surfaces the AI-augmented practice.

For broader treatment of AIEH’s assessment approach, see the AI-Augmented SQL sample, the scoring methodology, and the skills-based hiring evidence overview.

Sources

Codd, E. F. (1970). A relational model of data for large shared data banks. Communications of the ACM, 13(6), 377–387.
Date, C. J. (2003). An Introduction to Database Systems (8th ed.). Addison-Wesley.
Date, C. J. (2005). Database in Depth: Relational Theory for Practitioners. O’Reilly Media.
Garcia-Molina, H., Ullman, J. D., & Widom, J. (2008). Database Systems: The Complete Book (2nd ed.). Pearson.
ISO/IEC. (2016). ISO/IEC 9075:2016 — Information technology — Database languages — SQL. International Organization for Standardization.
PostgreSQL Global Development Group. (2024). PostgreSQL documentation. https://www.postgresql.org/docs/
Winand, M. (2024). Use The Index, Luke! A guide to database performance for developers. https://use-the-index-luke.com/