From the AI-Augmented SQL sample test
How do you detect an implicit Cartesian join in AI-generated SQL?
The item presents a multi-table SQL query produced by an AI assistant and asks the candidate to identify whether it contains an implicit Cartesian join — a join with no predicate, or with a predicate that fails to constrain the relationship between the joined tables. The competence under test is the ability to read AI-generated SQL critically, recognize the row-explosion pattern before it ships to production, and articulate why the AI got it wrong. Implicit Cartesian joins are among the most expensive bugs in AI-suggested SQL because they often pass syntactic review, sometimes pass row-count smoke tests on small fixtures, and only manifest as a problem when a previously empty table acquires data.
What this question tests
The concept being tested is the distinction between a join
that is syntactically well-formed and a join that semantically
constrains the relationship between two tables. A query that
lists multiple tables in the FROM clause without a connecting
ON predicate, or whose ON predicate is always true (e.g.,
ON 1=1, or ON a.x = a.x), produces the Cartesian product —
every row from one table paired with every row from the other.
The result is correct under the SQL standard but almost never
what the developer intended.
AI assistants generate implicit Cartesian joins in three
recurring patterns: omitting the ON clause when joining a
small dimension table that happens to have a single row in the
training-data fixture; writing ON a.id = b.id when the
foreign key actually lives on b.a_id; and using CROSS JOIN
explicitly but forgetting to add a WHERE predicate that
constrains the relationship downstream. The probe tests
whether the candidate recognizes all three.
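The second pattern (joining id-to-id when the foreign key lives elsewhere) is the easiest to miss because the result often looks plausible. A minimal sketch using Python's stdlib sqlite3; the table contents are illustrative, not from the test item:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    -- Both orders belong to customer 2 (Ben).
    INSERT INTO orders VALUES (1, 2, 5.0), (2, 2, 7.5);
""")

# AI-style mistake: joins primary key to primary key instead of the
# foreign key, silently pairing customers with other people's orders.
wrong = con.execute("""
    SELECT c.name, o.amount FROM customers c
    JOIN orders o ON c.id = o.id ORDER BY c.id
""").fetchall()

# Correct join uses the actual foreign key, orders.customer_id.
right = con.execute("""
    SELECT c.name, o.amount FROM customers c
    JOIN orders o ON c.id = o.customer_id ORDER BY o.id
""").fetchall()

print(wrong)  # [('Ana', 5.0), ('Ben', 7.5)] -- plausible-looking but wrong
print(right)  # [('Ben', 5.0), ('Ben', 7.5)] -- both orders are Ben's
```

Note that the wrong join returns the same number of rows as the right one, which is exactly why row-count checks do not catch it.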
Why this is the right answer
Consider an AI-generated query that looks reasonable on first read:
-- AI-generated: missing the ON predicate entirely
SELECT
c.customer_id,
c.name,
o.order_id,
o.amount
FROM customers c, orders o
WHERE c.region = 'NA';
The comma-separated FROM clause is the legacy SQL-89 join
syntax, which is still valid but produces the Cartesian product
when no join predicate appears in the WHERE clause. With 10,000
customers in NA and 1,000,000 orders, this query returns
10 billion rows. The fix is to add the missing predicate:
-- Corrected: predicate connects the tables
SELECT
c.customer_id,
c.name,
o.order_id,
o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE c.region = 'NA';
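The explosion is cheap to verify at small scale. A sketch using Python's stdlib sqlite3 with a tiny illustrative fixture (the arithmetic scales linearly to the 10,000 × 1,000,000 case above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'NA'), (2, 'Ben', 'NA'), (3, 'Cho', 'EU');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 3, 3.0), (13, 1, 9.9);
""")

# Legacy comma-join with no join predicate: every NA customer is
# paired with every order, regardless of ownership.
cartesian = con.execute("""
    SELECT COUNT(*) FROM customers c, orders o WHERE c.region = 'NA'
""").fetchone()[0]

# Corrected: the ON predicate constrains the pairing.
correct = con.execute("""
    SELECT COUNT(*) FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE c.region = 'NA'
""").fetchone()[0]

print(cartesian, correct)  # 2 NA customers x 4 orders = 8 rows, vs 3 real rows
```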
The right answer recognizes the legacy comma-join syntax as a
red flag and verifies that some predicate in the query
constrains the join. A subtler variant uses explicit JOIN but
with an always-true predicate:
-- AI-generated: ON clause is syntactically valid but
-- semantically vacuous
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON c.region = 'NA';
Here the ON clause filters customers by region but does not
join the two tables. Each NA customer is paired with every
order, regardless of whether the order belongs to that customer.
The query runs, produces output, and may even pass a smoke test
that only checks for non-zero rows — but the result is
nonsense. The right answer recognizes that the ON predicate
must reference columns from both joined tables to constrain
the relationship.
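The mispairing is visible on a two-row fixture. A sketch with stdlib sqlite3 (data is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'NA'), (2, 'Ben', 'NA');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 2, 7.5);
""")

# The ON clause filters customers but never references orders, so
# every NA customer pairs with every order -- including other people's.
rows = con.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM customers c
    JOIN orders o ON c.region = 'NA'
    ORDER BY c.name, o.order_id
""").fetchall()

print(rows)
# [('Ana', 10, 5.0), ('Ana', 11, 7.5), ('Ben', 10, 5.0), ('Ben', 11, 7.5)]
# Ana is credited with order 11, which belongs to Ben.
```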
A third diagnostic pattern: the query references a join column
that exists on neither table, but the AI has confidently
hallucinated it. Reading the schema (or running EXPLAIN)
catches this; trusting the AI’s output does not.
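How early the hallucination surfaces depends on the engine; in SQLite, for example, merely preparing the statement is enough. A sketch where `customer_ref` is a deliberately invented column name:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
""")

# 'customer_ref' exists on neither table; SQLite rejects the query
# at prepare time, before any rows are produced.
err = None
try:
    con.execute("""
        SELECT c.name FROM customers c
        JOIN orders o ON o.customer_ref = c.customer_id
    """)
except sqlite3.OperationalError as exc:
    err = str(exc)

print(err)  # mentions the missing column, e.g. "no such column: ..."
```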
What the wrong answers reveal
The three incorrect options each map to a common but mistaken mental model:
- “The query is fine because it returns rows when run on a test fixture.” Empty tables, single-row fixtures, and permissive smoke tests all hide Cartesian joins. The Cartesian product of two empty tables is empty; the Cartesian product of a 1-row table and a 100-row table looks identical to a correct join in row count. Respondents picking this option have a smoke-test mental model and haven’t internalized that row-count checks don’t catch this class of bug.
- “Cartesian joins are obvious — they always show up as CROSS JOIN in the AI output.” This is true for explicit Cartesian joins. The hard cases are implicit ones where the AI used comma-join syntax or wrote a vacuous ON clause. Respondents picking this option recognize the explicit form but miss the implicit forms that AI tools more commonly produce.
- “Modern query optimizers reject Cartesian joins; if the query runs, it must be correctly joined.” Optimizers do warn about Cartesian products in some plan-display tools, but they do not reject the query — the result of a Cartesian product is well-defined and sometimes intentional (e.g., generating a date series cross-joined to a category list). Respondents picking this option conflate “the optimizer warns about it” with “the optimizer prevents it.”
The wrong-answer patterns cluster into three distinct misreadings of how Cartesian joins manifest in production code.
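The “sometimes intentional” case is worth seeing concretely, since it explains why engines cannot simply reject the construct. A sketch generating a date × category grid with stdlib sqlite3 (names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dates (d TEXT);
    CREATE TABLE categories (cat TEXT);
    INSERT INTO dates VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
    INSERT INTO categories VALUES ('books'), ('games');
""")

# An intentional Cartesian product: one row per (date, category) cell,
# e.g. as the spine of a report that must also show zero-sales cells.
grid = con.execute(
    "SELECT d, cat FROM dates CROSS JOIN categories ORDER BY d, cat"
).fetchall()

print(len(grid))  # 3 dates x 2 categories = 6 rows
```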
How the sample test scores you
In the AIEH 5-question AI-Augmented SQL sample, this item contributes one of five datapoints aggregated into a single ai_augmented_sql score via the W3.2 normalize-by-count threshold. Binary scoring per item: 5 for the correct option, 1 for any of the three wrong options. With 5 binary items, the average ranges 1–5 and the level threshold maps avg ≤ 2 to low, ≤ 4 to mid, > 4 to high.
Data Notice: Sample-test results are directional indicators only. A 5-question sample can’t reliably distinguish a candidate who reads AI SQL critically from one who got lucky on these specific items; for a verified Skills Passport credential, take the full 50-question assessment.
The full assessment probes Cartesian-join detection across
multiple syntax variants (comma-join, vacuous ON, missing
multi-column key, hallucinated column), tests the candidate’s
ability to read EXPLAIN output, and includes adversarial
items where the AI suggestion is correct and the candidate
must resist over-rejecting. See the
scoring methodology for how AI-Augmented SQL scores
map onto the AIEH 300–850 Skills Passport scale.
Related concepts
- Composite-key joins. When the foreign-key relationship spans multiple columns (e.g., (tenant_id, user_id)), AI tools often join on only one of the columns, producing a partial Cartesian explosion within each tenant. The fix is to verify that the ON clause covers every column of the composite key, not just the most-named one.
- USING clause as a Cartesian-prevention idiom. Writing JOIN orders USING (customer_id) requires both tables to have a column named customer_id; if the AI has hallucinated the column name on one side, the query fails at parse time rather than producing nonsense at runtime. Some teams standardize on USING specifically to make this class of bug fail loudly.
- Outer-join asymmetry. A LEFT JOIN with a missing or vacuous predicate doesn’t produce a Cartesian product in the same way — it produces every row from the left table paired with every matching row from the right (or NULL when no match). The bug manifests as duplicated left-table rows rather than as an N×M explosion, which makes it harder to catch by row-count alone.
- EXPLAIN plan inspection as the canonical fix. Every major engine surfaces “Nested Loop without join condition” or equivalent text in the plan output for a Cartesian join. Reading the plan is the cheapest way to verify that an AI-suggested query joins the way the candidate expects.
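The composite-key failure mode from the first bullet above can be reproduced in a few lines. A sketch with stdlib sqlite3; the (tenant_id, user_id) schema and table names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (tenant_id INTEGER, user_id INTEGER, name TEXT);
    CREATE TABLE logins (tenant_id INTEGER, user_id INTEGER, ts TEXT);
    -- user_id 1 exists in BOTH tenants; only the composite key disambiguates.
    INSERT INTO users  VALUES (1, 1, 'Ana'), (2, 1, 'Ben');
    INSERT INTO logins VALUES (1, 1, 't1'), (2, 1, 't2');
""")

# AI-style join on only half the composite key: each login matches the
# same-numbered user in EVERY tenant -- a partial Cartesian explosion.
partial = con.execute("""
    SELECT COUNT(*) FROM users u JOIN logins l ON l.user_id = u.user_id
""").fetchone()[0]

# Correct join covers every column of the composite key.
full = con.execute("""
    SELECT COUNT(*) FROM users u
    JOIN logins l ON l.tenant_id = u.tenant_id AND l.user_id = u.user_id
""").fetchone()[0]

print(partial, full)  # 4 cross-tenant pairings vs 2 correct rows
```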
For the broader AI-Augmented SQL lineup, see the tests catalog. For how this skill fits into a hiring loop, see hiring loop design and interview question design.
Sources
- ISO/IEC 9075-2:2023. Information technology — Database languages — SQL — Part 2: Foundation (SQL/Foundation). International Organization for Standardization. — Cross product semantics in clause 7.7 (joined table).
- PostgreSQL Global Development Group. (2024). PostgreSQL Documentation: SELECT — FROM Clause. https://www.postgresql.org/docs/current/sql-select.html
- Oracle Corporation. (2023). Oracle Database SQL Language Reference: Joins. https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Joins.html
- Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419–450. — Reviews evidence that work-sample tests (of which AI-augmented code review is a contemporary form) show among the highest validity coefficients for predicting job performance in cognitively demanding roles.