SQL – Self Join Explained

This tutorial will explain why a self join is used in SQL and how to create a self join query. Knowledge of how to write inner, left, and right joins is needed before continuing with this tutorial. This tutorial uses MySQL as the database management system in the examples. However, many other DBMS’s such as SQL Server and Oracle use the same or similar syntax for self joining a table.

Hierarchical Data

In SQL, a self join is used to join a table to itself to query hierarchical data. Hierarchical data is used when the data refers to itself by having rows with a foreign key that references a key in the same table.

A common example of when hierarchical data is used is a company’s employee organizational structure. It is common to have a table for employees. Some of the employees are managers. An employee’s data will include the manager he or she reports to. The data for this manager will also be in the same table in the database. Most managers will also have a manager.

We can use the following employees table to illustrate a company organizational structure. This table has a foreign key of manager_id that references the primary key id of the same employees table.

From this table we can find an employee’s manager by checking the manager_id column and finding the row with a matching value in the id column. We can see that John Smith is the manager for Jane Smith, Brandon Russell, and Doug Collins. Jane Smith is the manager for Bob Malcom and Helen Jensen. Brandon Russell is the manager for Jordan Parker.

Creating Self Join Queries

A self join query can be used to get all employees and the manager that each employee reports to. The self join query needs to have an alias of the employees table when specifying the table to use in the FROM clause. In this part we can use e as the alias. Then a second alias of the employees table is used when the table is joined to itself by using the manager_id column to reference the id column. The letter m can be used for this alias to show this is the manager. A left join is used because we want to get all employees including any employee that does not have a manager.  This is done with the below query.

SELECT * 
FROM employees  AS e
LEFT JOIN employees AS m ON e.manager_id = m.id;

This query gives the below results. In each row, the first four columns are details about the employee from the employees table. The next four columns are details about the employee’s manager from the same employees table.

If we had tried to join the employees table to itself without using different aliases we would get an error when executing the query. The following incorrect query causes a “Not unique table/alias:” error.

Below is an incorrect query that does not use any aliases.

SELECT * 
FROM employees 
LEFT JOIN employees ON employees.manager_id = employees.id;

Result: ERROR 1066 (42000): Not unique table/alias: ’employees’

If we wanted to get all employees except any employees that do not report to a manager then an INNER JOIN would be used like the below query.

SELECT * 
FROM employees AS e
INNER JOIN employees AS m ON e.manager_id = m.id;

This query gives the following results:


References:
https://en.wikipedia.org/wiki/Hierarchical_database_model
https://www.mysqltutorial.org/mysql-self-join/
https://www.w3schools.com/sql/sql_join_self.asp

Leave a Comment

Your email address will not be published. Required fields are marked *