Correlated Subqueries
So far, we have learned subqueries that work independently. In those cases, the subquery executes first, returns a result, and then the outer query uses that result.
However, real world database problems are often more complex.
Sometimes:
- Each row needs a different condition
- Data must be checked row by row
- The subquery needs values from the outer query
In such situations, a normal subquery is not enough. This is where correlated subqueries are used.
What Is a Correlated Subquery?
A correlated subquery is a subquery that:
- Refers to columns from the outer query
- Cannot be executed on its own
- Runs once for each row processed by the outer query
In simple terms:
The outer query selects one row
The subquery uses values from that row
SQL decides whether the row should be included
Why Correlated Subqueries Are Needed
Correlated subqueries are useful when:
- Conditions depend on the current row
- Comparisons must be done per record
- You need to check whether related data exists or not
Common real-world scenarios include:
- Employees earning more than their department average
- Customers who have placed orders
- Products that were never sold
EXISTS Operator
The EXISTS operator checks whether a subquery returns at least one row.
Important points:
- It does not care about actual column values
- It only checks the existence of rows
- As soon as a match is found, SQL stops searching
Example: Employees Who Have Placed at Least One Order
SELECT e.emp_nameFROM employees eWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.emp_id = e.emp_id);Explanation
- The outer query reads one employee at a time
- The subquery checks if an order exists for that employee
- If even one order is found, the employee is selected
This makes EXISTS very efficient for large tables.
Why SELECT 1 Is Used Inside EXISTS
Inside an EXISTS subquery, SQL does not use the selected columns.
You can write:
SELECT *SELECT emp_idSELECT 1All behave the same.
However, SELECT 1 is preferred because:
- It is clean and readable
- It clearly indicates existence checking
- It is a professional best practice
NOT EXISTS Operators
NOT EXISTS works in the opposite way.
It returns TRUE when:
- The subquery returns no rows
This is useful for finding missing or unmatched data.
Example: Customers Who Have Never Placed an Order
SELECT c.customer_nameFROM customers cWHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);Explanation
- The outer query checks one customer
- The subquery searches for related orders
- If no order is found, the customer is selected
This pattern is commonly used to identify:
- Inactive users
- Unsold products
- Orphan records
Difference Between EXISTS and IN
Many beginners use IN instead of EXISTS, but there are key differences.
| Feature | IN | EXISTS |
|---|---|---|
| Execution | Subquery runs once | Runs per row |
| Performance | Slower on large data | Faster |
| NULL handling | Can cause issues | Safe |
| Correlation | Not correlated | Correlated |
For large databases, EXISTS is usually the better choice.
Correlated Subquery with Aggregation
Correlated subqueries are often combined with aggregate functions.
Example: Employees Earning More Than Their Department Average
SELECT emp_name, salaryFROM employees eWHERE salary > ( SELECT AVG(salary) FROM employees WHERE department_id = e.department_id);How This Works
- The outer query selects an employee
- The subquery calculates the average salary of that employee’s department
- Comparison is done for each row
- This is a classic real-world and interview example.
Performance Considerations
Because correlated subqueries run once per row:
- They can be expensive on very large datasets
However:
- EXISTS stops execution as soon as a match is found
- Modern databases optimize these queries efficiently
That is why EXISTS is widely used in production systems.
When Should You Use Correlated Subqueries?
Use them when:
- Row-by-row comparison is required
- You need to check existence or absence of related data
- Logical correctness matters more than simple joins
Two Minute Drill
- Correlated subqueries depend on the outer query
- They execute once per row
- EXISTS checks whether rows exist
- NOT EXISTS checks whether rows do not exist
- SELECT 1 is best practice inside EXISTS
- EXISTS is safer and faster than IN
- Very common in real world SQL and interviews