Data is very valuable to organizations. Actionable insights that give an organization competitive advantage and help it run more efficiently can be extracted from the organization’s data. Therefore, data must be collected and stored.
Databases are an organized way to store and query data. There are two main types of databases: relational and non-relational.
Relational databases are very popular because they’re structured and data is organized in tables. The data in the various tables may have one-to-one, one-to-many, or many-to-many relationships. The way the data is organized in a relational database is defined in its schema.
Non-relational databases store data using simple key-value pairs in non-tabular form. Although relational databases support Binary Large Objects (BLOB) for storing semi-structured and unstructed data, storing and retrieving these types of data is easier with non-relational databases.
The Structured Query Language (SQL) is the language for accessing and interacting with relational databases. If you want to add, delete, edit, or query information on a relational database, the easiest way to do it is through SQL.
Relational databases dominate the database market and they are projected to grow by more than 30 percent between 2021 and 2026. SQL is the most in-demand skill for data related jobs. As more and more organizations embrace the use of relational databases, the demand for data professionals with SQL skills will continue to grow.
More often than not, you may need to combine two or more tables in a relational database to get the data needed for performing your analysis. You can combine tables in SQL with JOIN clauses. There are several SQL JOIN clauses; understanding how each of them works can be challenging.
SQL JOIN questions are, therefore, interviewers’ favorites. They tend to show up in most SQL interviews. In this tutorial, we’ll take you step-by-step through the answers to the top 20 SQL JOIN interview questions — and equip you with the knowledge to ace your upcoming SQL interviews. We’ll be writing a lot of SQL queries. This SQL Cheat Sheet will help you get up to speed if your SQL skills are a little rusty.
SQL JOINs Interview Questions
In a SQL interview, you’ll most likely be asked questions that require you to combine tables. SQL JOIN clauses are used to combine data from two or more tables in a relational database.
SQL JOIN clauses are often used when the tables share some sort of relationship. The most common conditional expression to join tables is the equality conditional (equi-join). However, you can join tables that don’t share any relationships and use other conditional expressions besides equality.
Here are some commoly asked SQL JOIN interview questions and their answers. We recommend that you draw your own diagrams and repeatedly practice these questions to master how SQL JOINs work.
1. What are SQL JOINs?
Answer
SQL JOIN clauses are used to combine rows from two or more tables. Usually, tables are joined based on a related column between them, but you can also join tables that have no relationships. The term “SQL JOIN” is used to refer to a SQL query with the JOIN
keyword. This is known as an explicit join. However, you can also join tables in SQL without using the JOIN
keyword. This is known as an implicit join.
2. What are the general explicit and implicit join notation syntaxes?
Answer
EXPLICIT JOIN:
SELECT *
FROM [left table]
[JOIN CLAUSE] [right table]
ON conditional expression;
IMPLICIT JOIN:
SELECT *
FROM [left table], [right table]
WHERE conditional expression;
The conditional expression is not required for some types of SQL JOINs.
3. What are the different types of JOINs in SQL?
Answer
The main types of SQL JOINs are:
- CROSS JOIN: matches every row of the left table with every row of the right table and returns their Cartesian product. It’s also known as Cartesian JOIN.
- [INNER] JOIN: returns rows that have matching values on both the right and left tables, based on the specified conditional expression.
- NATURAL JOIN: a type of equi-join that combines the left and right tables using common column names.
- LEFT [OUTER] JOIN: returns all the rows from the left table with the matching rows from the right table, based on the specified conditional expression.
- RIGHT [OUTER] JOIN: returns all the rows from the right table with the matching rows from the left table, based on the specified conditional expression.
- FULL [OUTER] JOIN: returns all rows from the left and the right tables, regardless of whether the conditional expression is matched.
- SELF JOIN: allows you to join a table to itself as if the tables were two different tables.
It’s worth noting that the SQL syntax placed inside square brackets is optional and can be excluded from the query. For example, the SQL query below executes an INNER JOIN
:
SELECT *
FROM left_table
JOIN right_table
ON conditional expression;
4. What type of JOIN is executed without conditional expression
in the explict and implict join notations shown below?
Explicit join:
SELECT *
FROM [left table]
[JOIN CLAUSE] [right table]
Implict join:
SELECT *
FROM [left table], [right table]
Answer
Let’s set up a SQLite database with tables. When you try to connect to a database that doesn’t exist, it’s created for you. First:
pip install ipython-sql
Then create the database and tables:
%%capture
%load_ext sql
sql sqlite:///joins.db
%%sql
-- create a students table:
CREATE TABLE students (
student_id INTEGER PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL
);
INSERT INTO students VALUES (1, "Mary", "Wilson");
INSERT INTO students VALUES (2, "Tim", "Ben");
INSERT INTO students VALUES (3, "Alice", "Robinson");
INSERT INTO students VALUES (4, "Reece", "Bells");
-- create a student_contact table:
CREATE TABLE student_contact (
student_id,
email_address,
FOREIGN KEY (student_id) REFERENCES students(student_id)
);
INSERT INTO student_contact VALUES (1, "[email protected]");
INSERT INTO student_contact VALUES (2, "[email protected]");
INSERT INTO student_contact VALUES (3, "[email protected]");
-- create a staff table:
CREATE TABLE staff (
staff_id INTEGER PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL
);
INSERT INTO staff VALUES (1, "Ada", "Lovelace");
INSERT INTO staff VALUES (2, "Adam ", "Smith");
INSERT INTO staff VALUES (3, "Nikolo", "Tesla");
-- create a staff_contact table:
CREATE TABLE staff_contact (
staff_id,
email_address,
FOREIGN KEY (staff_id) REFERENCES staff(staff_id)
);
INSERT INTO staff_contact VALUES (1, "[email protected]");
INSERT INTO staff_contact VALUES (2, "[email protected]");
INSERT INTO staff_contact VALUES (3, "[email protected]");
* sqlite:///joins.db
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
We’ll use the student_contact
as the left table and staff_contact
as the right table to answer this question. Let’s see them:
%sql SELECT * FROM student_contact;
* sqlite:///joins.db
Done.
%sql SELECT * FROM staff_contact;
* sqlite:///joins.db
Done.
If we don’t specify the join condition, SQL does it for us. It assumes a CROSS JOIN
. If the left table has n
rows and y
columns, and the right table has m
rows and z
columns, a CROSS JOIN
will return y + z
columns and m x n
rows. Our example will return:
This is true for JOIN clauses that don’t have conditional expressions built into them. The NATURAL JOIN
has an equality
expression built into it and will not result in a CROSS JOIN
because we don’t even need to specify the conditional expression
when using it.
Let’s evaluate whether this is true for the the following explict joins: INNER JOIN
, LEFT JOIN
, RIGHT JOIN
, and FULL JOIN
:
%%sql
-- INNER JOIN without conditional expression
SELECT *
FROM student_contact
INNER JOIN staff_contact;
* sqlite:///joins.db
Done.
%%sql
-- LEFT OUTER JOIN without conditional expression
SELECT *
FROM student_contact
LEFT OUTER JOIN staff_contact;
* sqlite:///joins.db
Done.
%%sql
-- RIGHT OUTER JOIN without conditional expression
SELECT *
FROM student_contact
RIGHT OUTER JOIN staff_contact;
* sqlite:///joins.db
Done.
%%sql
-- FULL OUTER JOIN without conditional expression
SELECT *
FROM student_contact
FULL OUTER JOIN staff_contact;
* sqlite:///joins.db
Done.
Let’s evaluate whether this is also true for the implicit join:
%%sql
-- IMPLICIT JOIN without conditional expression
SELECT *
FROM student_contact, staff_contact;
* sqlite:///joins.db
Done.
Lastly, let’s evaluate whether this is true for a NATURAL JOIN
:
%%sql
-- NATURAL JOIN has in-built equality conditional
SELECT *
FROM student_contact
NATURAL JOIN staff_contact;
* sqlite:///joins.db
Done.
student_id | email_address | staff_id |
---|
NATURAL JOIN
does not return a CROSS JOIN
because of its in-built equality conditional. It searches for the same column name(s) in the left and right tables and applies the equality (=
) conditional. For this example, email_address
is the similar column in both tables. However, there are no matching email addresses in the tables, so an empty table is returned.
5. How is the INNER JOIN executed?
Answer
The INNER JOIN
returns the matching values in both the right and left tables. A CROSS JOIN
is first executed, then the conditional expression specified is used to limit the rows returned. Let’s illustrate with the students
and student_contact
tables.
The cross join of these tables returns:
%%sql
-- CROSS JOIN of the students and student_contact
SELECT *
FROM students
CROSS JOIN student_contact;
* sqlite:///joins.db
Done.
Let’s assume we’re using the equality
conditional. Then rows matching in the student_id
columns from both tables are returned:
There is no matching value for Reece Bells on both tables. So, values from this row are not returned. This SQL query can be evaluated using an INNER JOIN
as:
%%sql
-- INNER JOIN of the students and student_contact
SELECT *
FROM students
INNER JOIN student_contact
ON students.student_id = student_contact.student_id;
* sqlite:///joins.db
Done.
6. How is the LEFT OUTER JOIN executed?
Answer
First LEFT OUTER JOIN
returns all the rows in the left table:
student_id | first_name | last_name | student_id | email_address |
---|---|---|---|---|
1 | Mary | Wilson | – | – |
2 | Tim | Ben | – | – |
3 | Alice | Robinson | – | – |
4 | Reece | Bells | – | – |
Next, matching values from CROSS JOIN with
the right tables are also returned:
Finally, unmatched rows from the left table are added to the result. Their values are padded with either NULL
or None
:
The SQL query for a LEFT OUTER JOIN
with equality
conditional is shown below:
%%sql
-- LEFT OUTER JOIN of the students and student_contact
SELECT *
FROM students
LEFT JOIN student_contact
ON students.student_id = student_contact.student_id;
* sqlite:///joins.db
Done.
7. How is the FULL OUTER JOIN executed?
Answer
The FULL OUTER JOIN
returns all rows from the left and the right tables, regardless of whether the conditional expression is matched. To show how this works, we’ll create two additional tables. The courses
table lists the courses presently being offered in school, and the last_enrolment
table lists the courses students last enrolled in.
%%sql
-- CREATE the courses table
CREATE TABLE courses (
course_id INTEGER PRIMARY KEY,
course_name TEXT NOT NULL
);
INSERT INTO courses VALUES (100, "Advanced Physics");
INSERT INTO courses VALUES (200, "Computer Science");
INSERT INTO courses VALUES (300, "Economics");
INSERT INTO courses VALUES (400, "Quantum Computing");
INSERT INTO courses VALUES (500, "Cryptography");
-- CREATE the last_enrolment table
CREATE TABLE last_enrolment (
student_id,
course_id,
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
INSERT INTO last_enrolment VALUES (2, 500);
INSERT INTO last_enrolment VALUES (1, 500);
INSERT INTO last_enrolment VALUES (3, 400);
INSERT INTO last_enrolment VALUES (2, 400);
INSERT INTO last_enrolment VALUES (4, 111);
* sqlite:///joins.db
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Done.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
1 rows affected.
Let’s view these tables:
%sql SELECT * FROM courses;
* sqlite:///joins.db
Done.
course_id | course_name |
---|---|
100 | Advanced Physics |
200 | Computer Science |
300 | Economics |
400 | Quantum Computing |
500 | Cryptography |
%sql SELECT * FROM last_enrolment;
* sqlite:///joins.db
Done.
student_id | course_id |
---|---|
2 | 500 |
1 | 500 |
3 | 400 |
2 | 400 |
4 | 111 |
Let’s get back to how the FULL OUTER JOIN
works. First, the CROSS JOIN
is evaluated:
%%sql
SELECT *
FROM last_enrolment
CROSS JOIN courses;
* sqlite:///joins.db
Done.
student_id | course_id | course_id_1 | course_name |
---|---|---|---|
2 | 500 | 100 | Advanced Physics |
2 | 500 | 200 | Computer Science |
2 | 500 | 300 | Economics |
2 | 500 | 400 | Quantum Computing |
2 | 500 | 500 | Cryptography |
1 | 500 | 100 | Advanced Physics |
1 | 500 | 200 | Computer Science |
1 | 500 | 300 | Economics |
1 | 500 | 400 | Quantum Computing |
1 | 500 | 500 | Cryptography |
3 | 400 | 100 | Advanced Physics |
3 | 400 | 200 | Computer Science |
3 | 400 | 300 | Economics |
3 | 400 | 400 | Quantum Computing |
3 | 400 | 500 | Cryptography |
2 | 400 | 100 | Advanced Physics |
2 | 400 | 200 | Computer Science |
2 | 400 | 300 | Economics |
2 | 400 | 400 | Quantum Computing |
2 | 400 | 500 | Cryptography |
4 | 111 | 100 | Advanced Physics |
4 | 111 | 200 | Computer Science |
4 | 111 | 300 | Economics |
4 | 111 | 400 | Quantum Computing |
4 | 111 | 500 | Cryptography |
Next, the entire row of the left table, last_enrolment
, is returned:
student_id | course_id |
---|---|
2 | 500 |
1 | 500 |
3 | 400 |
2 | 400 |
4 | 111 |
Then the matching values in the right table, courses
, are returned from the CROSS JOIN
:
student_id | course_id | course_id | course_name |
---|---|---|---|
2 | 500 | 500 | Cryptography |
1 | 500 | 500 | Crypography |
3 | 400 | 400 | Quantum Computing |
2 | 400 | 400 | Quantum Computing |
4 | 111 | – | – |
Next, the non-matching rows in the right table are returned:
student_id | course_id | course_id | course_name |
---|---|---|---|
2 | 500 | 500 | Cryptography |
1 | 500 | 500 | Crypography |
3 | 400 | 400 | Quantum Computing |
2 | 400 | 400 | Quantum Computing |
4 | 111 | – | – |
– | – | 100 | Advanced Physics |
– | – | 200 | Computer Science |
– | – | 300 | Economics |
Finally, the empty cells are padded with NULL
or None
:
student_id | course_id | course_id | course_name |
---|---|---|---|
2 | 500 | 500 | Cryptography |
1 | 500 | 500 | Crypography |
3 | 400 | 400 | Quantum Computing |
2 | 400 | 400 | Quantum Computing |
4 | 111 | NULL | NULL |
NULL | NULL | 100 | Advanced Physics |
NULL | NULL | 200 | Computer Science |
NULL | NULL | 300 | Economics |
Let’s confirm that our result is correct by running a FULL OUTER JOIN
query:
%%sql
SELECT *
FROM last_enrolment
FULL OUTER JOIN courses
ON last_enrolment.course_id = courses.course_id;
* sqlite:///joins.db
Done.
student_id | course_id | course_id_1 | course_name |
---|---|---|---|
2 | 500 | 500 | Cryptography |
1 | 500 | 500 | Cryptography |
3 | 400 | 400 | Quantum Computing |
2 | 400 | 400 | Quantum Computing |
4 | 111 | None | None |
None | None | 100 | Advanced Physics |
None | None | 200 | Computer Science |
None | None | 300 | Economics |
8. How do you eliminate redundant columns from SQL JOINs?
Answer
The first way to eliminate redundant columns when running JOIN queries is to use the USING
clause as our conditional expression:
SELECT *
FROM left_table
[JOIN CLAUSE] right_table
USING [col_name, ..];
Let’s first run an INNER JOIN
query:
%%sql
-- INNER JOIN of the students and student_contact
SELECT *
FROM students
INNER JOIN student_contact
ON students.student_id = student_contact.student_id;
* sqlite:///joins.db
Done.
There are duplicate student_id
columns. Let’s eliminate this redundancy with the USING
clause:
%%sql
-- INNER JOIN of the students and student_contact with USING
SELECT *
FROM students
INNER JOIN student_contact
USING (student_id);
* sqlite:///joins.db
Done.
You can see that the duplicate column has been removed.
The second way to eliminate the redundant column is to use the NATURAL JOIN
. The NATURAL JOIN
clause is semantically similar to the INNER JOIN .. USING
clause, in which the conditional expression is equality.
Let try NATURAL JOIN
with the previous example:
%%sql
-- NATURAL JOIN instead of INNER JOIN ... USING
SELECT *
FROM students
NATURAL JOIN student_contact;
* sqlite:///joins.db
Done.
A third way to eliminate duplicate columns is to list the column names you want displayed in SELECT
:
%%sql
-- INNER JOIN list column names in SELECT and using ALIASES
SELECT
s.student_id,
s.first_name,
s.last_name,
sc.email_address
FROM students AS s
INNER JOIN student_contact AS sc
ON s.student_id = sc.student_id;
* sqlite:///joins.db
Done.
So far, we’ve seen how the SQL JOIN clauses are used to combine two tables. Some of the SQL JOIN interview questions that we’ll see next will require combining and filtering data from more than two tables. Writing SQL JOIN queries for these types of questions may be challenging. Take the Combining Tables in SQL Course to reinforce your knowledge of how SQL JOINS work and prepare yourself for the next questions.
9. How is a UNION clause different from a JOIN clause?
Answer
The JOIN and UNION clauses are used to combine data from two or more tables. With the JOIN clause, the columns and matching rows from both tables are returned. The number of columns in the combined table is the sum of the number of columns on both tables.
UNION combines data from tables by stacking them vertically. The number of columns in the SELECT
statement and the order of their data types must be the same. The staff
and students
tables can be combined with UNION
as follows:
%%sql
/*
* Both tables have columns with the same data types in the same order
* We can use the * wild card instead of specifying columns in SELECT
*/
SELECT * FROM staff
UNION
SELECT * FROM students;
* sqlite:///joins.db
Done.
staff_id | first_name | last_name |
---|---|---|
1 | Ada | Lovelace |
1 | Mary | Wilson |
2 | Adam | Smith |
2 | Tim | Ben |
3 | Alice | Robinson |
3 | Nikolo | Tesla |
4 | Reece | Bells |
%%sql
/*
* We specify the columns we want returned in SELECT
*/
SELECT
first_name,
last_name
FROM staff
UNION
SELECT
first_name,
last_name
FROM students;
* sqlite:///joins.db
Done.
first_name | last_name |
---|---|
Ada | Lovelace |
Adam | Smith |
Alice | Robinson |
Mary | Wilson |
Nikolo | Tesla |
Reece | Bells |
Tim | Ben |
10. What do you understand by Non-EQUI JOIN?
Answer
Non-EQUI JOINs combine tables using other conditional operators besides equality. The other operators may include: less than (<
), less than or equals (<=
), greater than (>
), greater than or equals (>=
), not equals (<>
), and BETWEEN
. For example:
%%sql
SELECT *
FROM last_enrolment
FULL OUTER JOIN courses
ON last_enrolment.course_id > 300
AND courses.course_id < 300
WHERE courses.course_id IS NOT NULL
AND last_enrolment.course_id IS NOT NULL;
* sqlite:///joins.db
Done.
student_id | course_id | course_id_1 | course_name |
---|---|---|---|
2 | 500 | 100 | Advanced Physics |
2 | 500 | 200 | Computer Science |
1 | 500 | 100 | Advanced Physics |
1 | 500 | 200 | Computer Science |
3 | 400 | 100 | Advanced Physics |
3 | 400 | 200 | Computer Science |
2 | 400 | 100 | Advanced Physics |
2 | 400 | 200 | Computer Science |
11. How will you join a table to itself?
Answer
You can join a table to itself if it contains related columns. Let’s assume that Ada Lovelace is the supervisor to the other staff. We’ll add a supervisor column with Ada Lovelace’s staff_id
. The supervisor
and staff_id
columns are related.
%sql SELECT * FROM staff;
* sqlite:///joins.db
Done.
staff_id | first_name | last_name |
---|---|---|
1 | Ada | Lovelace |
2 | Adam | Smith |
3 | Nikolo | Tesla |
%%sql
-- Add supervisor column
ALTER TABLE staff
ADD supervisor INTEGER;
SELECT * FROM staff;
* sqlite:///joins.db
Done.
Done.
staff_id | first_name | last_name | supervisor |
---|---|---|---|
1 | Ada | Lovelace | None |
2 | Adam | Smith | None |
3 | Nikolo | Tesla | None |
%%sql
-- Update supervisor column with Ada Lovelace staff_id
UPDATE staff
SET supervisor = 1
WHERE staff_id <> 1;
SELECT * FROM staff;
* sqlite:///joins.db
Done.
Done.
staff_id | first_name | last_name | supervisor |
---|---|---|---|
1 | Ada | Lovelace | None |
2 | Adam | Smith | 1 |
3 | Nikolo | Tesla | 1 |
Let’s perform self join on this table to list the staff and their supervisor:
%%sql
SELECT *
FROM staff s1
LEFT JOIN staff s2
ON s1.supervisor = s2.staff_id
* sqlite:///joins.db
Done.
staff_id | first_name | last_name | supervisor | staff_id_1 | first_name_1 | last_name_1 | supervisor_1 |
---|---|---|---|---|---|---|---|
1 | Ada | Lovelace | None | None | None | None | None |
2 | Adam | Smith | 1 | 1 | Ada | Lovelace | None |
3 | Nikolo | Tesla | 1 | 1 | Ada | Lovelace | None |
You can observe that the left table, s1
, contains the information for the staff and the right table; s2
contains the information about their supervisors. Let’s make things clearer by specifying column names in SELECT
:
%%sql
SELECT
s1.staff_id AS staff_id,
s1.first_name AS first_name,
s1.last_name AS last_name,
s2.first_name AS supervisor_first_name,
s2.last_name AS supervisor_last_name
FROM staff s1
LEFT JOIN staff s2
ON s1.supervisor = s2.staff_id
* sqlite:///joins.db
Done.
staff_id | first_name | last_name | supervisor_first_name | supervisor_last_name |
---|---|---|---|---|
1 | Ada | Lovelace | None | None |
2 | Adam | Smith | Ada | Lovelace |
3 | Nikolo | Tesla | Ada | Lovelace |
Let’s drop the supervisor column we added to the staff
table:
%%sql
ALTER TABLE staff
DROP COLUMN supervisor;
SELECT * FROM staff;
* sqlite:///joins.db
Done.
Done.
staff_id | first_name | last_name |
---|---|---|
1 | Ada | Lovelace |
2 | Adam | Smith |
3 | Nikolo | Tesla |
12. What do you understand about Nested JOINs?
Answer
Nested JOIN simply means combining more than two tables. Let’s use nested join to create a table containing students’ full information: id, names, contact, and courses last enrolled.
%%sql
SELECT *
FROM students
LEFT JOIN student_contact
USING (student_id)
LEFT JOIN last_enrolment
USING (student_id)
LEFT JOIN courses
USING (course_id);
* sqlite:///joins.db
Done.
At the start, the students
and student_contact
tables are the left and right tables respectively. When these two tables are joined, the resulting table is the left table for the next join operation with the last_enrolment
table, which is the right table. The resulting table from this join operation is the left table when joining with the courses
table.
13. How will you use a subquery to execute INNER JOINs for more than two tables?
Answer
Both subquery and join can be used to combine data from two or more tables. Subqueries can be used to perform inner join operations if the equality conditional operator is used with the WHERE
clause. This is an example of implict join:
%%sql
SELECT *
FROM (
-- Third join operation
SELECT *
FROM
-- Second join operation
(SELECT *
FROM
(
-- First join operation
SELECT *
FROM students, student_contact
WHERE students.student_id = student_contact.student_id
-- End of first join operation
) AS sub_query_one, last_enrolment
WHERE sub_query_one.student_id = last_enrolment.student_id
-- End of second join operation
) AS sub_query_two, courses
WHERE sub_query_two.course_id = courses.course_id
-- End of third join operation
)
ORDER BY 1;
* sqlite:///joins.db
Done.
There are duplicate columns in the result. These can be removed by listing the names of the columns in the outermost SELECT
statement.
If you’re having a hard time understanding how subqueries work, our SQL Subqueries course will help you get started. If you’re new to SQL and you’re having difficulties understanding the SQL queries, we recommend that you learn the basics from our SQL Fundamentals Skill Path.
14. Write the SQL queries to execute the operations shown in the diagrams
Answer
Diagram 1 set notation is $A – B$, read as A difference B. We want all the elements that belong to A but not B.
Let’s assume A is students
table and B is student_contact
. We want to return only records that are in students
but not in student_contact
.
Let’s view the tables:
%sql SELECT * FROM students;
* sqlite:///joins.db
Done.
student_id | first_name | last_name |
---|---|---|
1 | Mary | Wilson |
2 | Tim | Ben |
3 | Alice | Robinson |
4 | Reece | Bells |
%sql SELECT * FROM student_contact;
* sqlite:///joins.db
Done.
Only Reece Bells
satisfies the condition in diagram 1. The query is given below:
%%sql
SELECT *
FROM students
LEFT JOIN student_contact
USING (student_id)
WHERE student_contact.student_id IS NULL;
* sqlite:///joins.db
Done.
student_id | first_name | last_name | email_address |
---|---|---|---|
4 | Reece | Bells | None |
LEFT JOIN
returns all the rows in students
table, and student_contact.student_id IS NULL
sets the intersecting rows to NULL.
Diagram 2 set notation is $(A \cap B)^c$. This is the elements of both A and B that are not common to them. We’ll use the last_enrolment
and courses
tables.
Let’s view the tables:
%sql SELECT * from last_enrolment;
* sqlite:///joins.db
Done.
student_id | course_id |
---|---|
2 | 500 |
1 | 500 |
3 | 400 |
2 | 400 |
4 | 111 |
%sql SELECT * FROM courses;
* sqlite:///joins.db
Done.
course_id | course_name |
---|---|
100 | Advanced Physics |
200 | Computer Science |
300 | Economics |
400 | Quantum Computing |
500 | Cryptography |
The common keys are 400 and 500. These will be excluded. The returned table contains 111, 100, 200, and 300. The query is shown below:
%%sql
SELECT *
FROM last_enrolment
FULL OUTER JOIN courses
USING(course_id)
WHERE last_enrolment.course_id IS NULL
OR courses.course_id is NULL;
* sqlite:///joins.db
Done.
student_id | course_id | course_name |
---|---|---|
4 | 111 | None |
None | 100 | Advanced Physics |
None | 200 | Computer Science |
None | 300 | Economics |
15. Discuss a common pitfall with using JOIN
Answer
There are different types of JOINs. We may need to implement a particular type of JOIN to answer our question correctly. As you’re already aware, if you don’t specify the conditional expression, your SQL query will not fail. It will return a CROSS JOIN
. If you go on to perform your analysis on the CROSS JOIN
, you’ll get an incorrect result. You’ll also get incorrect results when you don’t join your tables with the appropriate conditional expressions.
You’ll also get incorrect results if you incorrectly filter your JOIN with the WHERE
clause. Incorrectly filtering can result in an unintended type of JOIN. For example, a LEFT JOIN
can be transformed to an INNER JOIN
with an incorrect WHERE
clause.
Let’s see the LEFT JOIN
result:
%%sql
-- LEFT OUTER JOIN of the students and student_contact
SELECT *
FROM students
LEFT JOIN student_contact
ON students.student_id = student_contact.student_id;
* sqlite:///joins.db
Done.
Now let’s incorrectly filter the LEFT JOIN
to transform it into an INNER JOIN
:
%%sql
-- LEFT OUTER JOIN transformed to INNER JOIN by incorrect filtering
SELECT *
FROM students
LEFT JOIN student_contact
ON students.student_id = student_contact.student_id
WHERE student_contact.student_id > 0;
* sqlite:///joins.db
Done.
Logical errors are introduced into our program this way. Logical errors do not return error messages, making them difficult to detect — especially when we’re working with large tables.
Close the connection to the joins.db
database:
%sql -x / --close sqlite:///joins.db
16. How will you structure data for facilitating JOIN in a one-to-many relationship condition?
Answer
In a one-to-many relationship, one record in table A can be associated to more than one record in table B. So the primary key of A is a foreign key in B. The students
and last_enrolment
tables share a one-to-many relationship, as a student can be enrolled in more than one course.
-- create a students table:
CREATE TABLE students (
student_id INTEGER PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL
)
--create last_enrolment table
CREATE TABLE last_enrolment (
student_id,
course_id,
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
In the above code snippet, the student_id
PRIMARY KEY in the students
table is set as FOREIGN KEY in the last_enrolment
table.
17. How will you structure data for facilitating JOIN in a many-to-many relationship condition?
Answer
In a many-to-many relationship, multiple records in table A are associated with multiple records in table B. There is a many-to-many relationship between the students
and courses
tables. A student can take more than one course, and there can be more than one student in a course.
The many-to-many relationship in our example is broken into two one-to-many relationships. The first one-to-many relationship is the relationship between students
and last_enrolment
, and the second one-to-many relationship is the relationship between courses
and last_enrolment
. The last_enrolment
table connects the students
and courses
tables and is referred to as a join table.
-- create a students table:
CREATE TABLE students (
student_id INTEGER PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL
)
-- CREATE the courses table
CREATE TABLE courses (
course_id INTEGER PRIMARY KEY,
course_name TEXT NOT NULL
)
--create last_enrolment table
CREATE TABLE last_enrolment (
student_id,
course_id,
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
18. Write a query to return information on people that love Jazz music
Write a query to return the email
and full_names
of people that have bought Jazz music. You’ll be working with the chinook.db
. You can find the schema and download the file from here.
Answer
First, we connect to the database and list the tables in it:
%sql sqlite:///chinook.db --connect to the database
%%sql
-- List the tables in the database
SELECT name
FROM sqlite_master
WHERE type = 'table';
* sqlite:///chinook.db
Done.
name |
---|
album |
artist |
customer |
employee |
genre |
invoice |
invoice_line |
media_type |
playlist |
playlist_track |
track |
We’ll return the email
, first_name
, and last_name
columns on the customer
table for the subset where the genre
is Jazz. We want to connect the customer
table to the genre
table. The schema will be helpful here. In the schema, we’ll need to join the customer
, invoice
, invoice_line
, track
, and genre
tables to get the information we want.
%%sql
SELECT
DISTINCT customer.email,
customer.first_name || " " || customer.last_name AS full_name
FROM customer
INNER JOIN invoice
ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line
ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
ON track.track_id = invoice_line.track_id
INNER JOIN genre
ON genre.genre_id = track.genre_id
WHERE genre.name = 'Jazz'
ORDER BY 1;
* sqlite:///chinook.db
Done.
It’s possible for a customer to buy several Jazz music items. So the customer details will be returned as many times as they bought Jazz music. We’ve used the DISTINCT
keyword to return only one detail for this type of customer.
19. Write a query to return the top 5 spenders on Jazz and Rock
Answer
We’ll continue to work with the chinook.db
. We need to join the customer
and genre
tables and filter by the type of genre. We’ll be returning the customer full_name
, and the amount_spent
. To get the amount spent, we’ll look at the invoice_line
table for the unit_price
and quantity
purchased.
A single customer can buy music from a genre more than once. Their information will apply multiple times. You need to GROUP BY
the customer full_name
and SUM
the values from the product of unit_price
and quantity
to get the total amount spent on that genre:
%%sql
-- Top 5 Spenders on Jazz
SELECT
customer.first_name || " " || customer.last_name AS full_name,
ROUND(SUM(invoice_line.unit_price * invoice_line.quantity), 2) AS amount_spent
FROM customer
INNER JOIN invoice
ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line
ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
ON track.track_id = invoice_line.track_id
INNER JOIN genre
ON genre.genre_id = track.genre_id
WHERE genre.name = 'Jazz'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
* sqlite:///chinook.db
Done.
full_name | amount_spent |
---|---|
Fernanda Ramos | 15.84 |
Enrique Muñoz | 15.84 |
Hannah Schneider | 14.85 |
Astrid Gruber | 14.85 |
Kara Nielsen | 3.96 |
%%sql
-- Top 5 Spenders on Rock
SELECT
customer.first_name || " " || customer.last_name AS full_name,
ROUND(SUM(invoice_line.unit_price * invoice_line.quantity), 2) AS amount_spent
FROM customer
INNER JOIN invoice
ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line
ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
ON track.track_id = invoice_line.track_id
INNER JOIN genre
ON genre.genre_id = track.genre_id
WHERE genre.name = 'Rock'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
* sqlite:///chinook.db
Done.
full_name | amount_spent |
---|---|
Helena Holý | 75.24 |
François Tremblay | 74.25 |
Luís Gonçalves | 71.28 |
Hugh O’Reilly | 71.28 |
João Fernandes | 67.32 |
You can easily get the amount_spent
values wrong if you use the total
column in the invoice
table. This is because a customer can buy songs from more than one genre on the same invoice, and the total
column will contain the full amount spent, not the amount spent on a particular genre.
This is demonstrated in the query below. Ellie Sullivan bought 10 songs from different genres on 2017-04-16. The total amount spent on all the songs is 9.9, which is different from the amount spent on a particular genre.
%%sql
-- Common mistake made using invoice total to calculate amount spend on a genre
SELECT
customer.first_name || " " || customer.last_name AS full_name,
genre.name,
invoice_line.unit_price,
invoice_line.quantity,
invoice.total
FROM customer
INNER JOIN invoice
ON customer.customer_id = invoice.customer_id
INNER JOIN invoice_line
ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
ON track.track_id = invoice_line.track_id
INNER JOIN genre
ON genre.genre_id = track.genre_id
WHERE customer.first_name = 'Ellie' AND customer.last_name = 'Sullivan' AND invoice.invoice_date = '2017-04-16 00:00:00';
* sqlite:///chinook.db
Done.
full_name | name | unit_price | quantity | total |
---|---|---|---|---|
Ellie Sullivan | Metal | 0.99 | 1 | 9.9 |
Ellie Sullivan | Rock | 0.99 | 1 | 9.9 |
Ellie Sullivan | Rock | 0.99 | 1 | 9.9 |
Ellie Sullivan | Rock | 0.99 | 1 | 9.9 |
Ellie Sullivan | Rock | 0.99 | 1 | 9.9 |
Ellie Sullivan | Rock | 0.99 | 1 | 9.9 |
Ellie Sullivan | Rock | 0.99 | 1 | 9.9 |
Ellie Sullivan | Rock | 0.99 | 1 | 9.9 |
Ellie Sullivan | Metal | 0.99 | 1 | 9.9 |
Ellie Sullivan | Alternative & Punk | 0.99 | 1 | 9.9 |
20. Simplify the query to return the top 5 spenders on Jazz with CTE
CTE stands for Common Table Expression. It enables us to simplify complex queries by deconstructing them into simple blocks. We initiate a CTE with the WITH
clause. The SQL query that answers the previous question is a bit complex. We can make the query more readable by performing most of the join operations with CTE.
%%sql
-- Start of JOIN operations with CTE
WITH cte_table AS (
SELECT
invoice.customer_id,
invoice_line.unit_price,
invoice_line.quantity
FROM invoice
INNER JOIN invoice_line
ON invoice_line.invoice_id = invoice.invoice_id
INNER JOIN track
ON track.track_id = invoice_line.track_id
INNER JOIN genre
ON genre.genre_id = track.genre_id
WHERE genre.name = 'Jazz'
)
-- End of JOIN operations with CTE
-- Simplified query to return the top 5 spenders on Jazz
SELECT
customer.first_name || " " || customer.last_name AS full_name,
ROUND(SUM(cte_table.unit_price * cte_table.quantity), 2) AS amount_spent
FROM customer
INNER JOIN cte_table
ON customer.customer_id = cte_table.customer_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;
* sqlite:///chinook.db
Done.
full_name | amount_spent |
---|---|
Fernanda Ramos | 15.84 |
Enrique Muñoz | 15.84 |
Hannah Schneider | 14.85 |
Astrid Gruber | 14.85 |
Kara Nielsen | 3.96 |
Close connection to chinook.db
:
%sql -x / --close sqlite:///chinook.db
Takeaway
In this tutorial, we discussed the top 20 SQL JOINs interview questions and provided a step-by-step guide to their answers. If you aspire to become a business analyst, data analyst, or data scientist, knowledge of SQL is one of the leading requirements for your role. No two ways about it: you must learn SQL programming.
As you work with relational databases, one of the most common SQL operations you’ll perform is joining tables. This is why SQL JOIN questions are so popular with interviewers. At your SQL interviews, you’ll get several questions on SQL JOINs to test your knowledge of how they work. We’ve provided answers to some of the most common questions you may be asked.
If you haven’t learned the basics of SQL yet, this tutorial may not be helpful. Learn how to explore, query, join, and filter data from relational databases in our SQL Fundamentals Skill Path. You’ll master the SQL syntaxes and basic commands that you’ll need to start working with relational databases. On our interactive learning platform, you get to practice writing your own SQL queries.
We understand that porfolio projects are very important, and they help showcase your SQL programming and data analysis skills to potential recruiters. You get to complete real-life SQL projects on the SQL Fundamentals Skill Path. We’ve also curated 10 exiciting SQL project ideas to help you get more practice and projects in your SQL portfolio.