JOIN, UNION, INTERSECT and EXCEPT in SQL

Joining tables in a SELECT query is done using the JOIN statement.

It is also possible to join without a JOIN statement using a WHERE clause using join columns, but this syntax is considered implicit and deprecated.

There are the following types of joins, each of which has its own form of the JOIN operator:

  • CROSS JOIN – cross or cartesian join
  • [INNER] JOIN – natural or inner join
  • LEFT [OUTER] JOIN – left outer join
  • RIGHT [OUTER] JOIN – right outer join
  • FULL [OUTER] JOIN – full outer join

There is also a theta join, a self join, and a semi join.

Natural join

A natural join is an inner join or an equijoin.

Here the FROM clause defines the tables to be joined and explicitly specifies the join type – INNER JOIN. The ON clause is part of the FROM clause and specifies the columns to join. The expression employee.dept_no = department.dept_no defines a join condition.

Equivalent query using implicit syntax:

Joined columns must have identical semantics, i.e. both columns must have the same boolean value. Joined columns do not have to have the same name (or even the same data type), although they often do.

Only rows that have the same value in the joined columns are joined. Rows that do not have such identical values ​​will not be included in the result set at all.

You can join up to 64 tables in a SELECT statement (MS SQL limitation), while one JOIN statement joins only two tables:

Cartesian product (cross join)

The Cartesian product (cross join) connects each row of the first table with each row of the second. The Cartesian product of the first table with n rows and the second table with m rows will result in a table with n × m rows.

Outer join

An outer join, unlike an inner join, allows you to extract not only rows with the same values ​​of the joined columns, but also rows without matches from one or both tables.

There are three types of outer joins:

  • left outer join – all rows from the table on the left side of the comparison operator (regardless of whether there are matching rows on the right side) get into the result set, and only rows with matching column values ​​from the table on the right side. In this case, if for a row from the left table there are no matches in the right table, the values ​​of the row in the right table will be assigned NULL

  • right outer join – similar to left outer join, but the tables are swapped
  • full outer join – a composition of a left and a right outer join: the result set consists of all rows from both tables. If a row in one of the tables does not have a corresponding row in another table, all cells in the row in the second table are set to NULL.

Theta join

The join column comparison condition does not have to be an equality, but can be any other comparison. A join that uses a common join column comparison condition is called a theta join:

Self join

A self-join is a natural join of a table to itself. In this case, one column of the table is compared with itself. Comparing a column to itself means that the table name appears twice in the FROM clause of the SELECT statement. Therefore, it is necessary to be able to refer to the same table name twice. This can be done using at least one alias. The same applies to column names in a join condition in a SELECT statement. Qualified names must be used to distinguish between columns with the same name.

Semi join

A semi join is similar to a natural join, but returns only the set of all rows from one table for which there is one or more matches in another table.

UNION operator

The UNION operator combines the results of two or more queries into a single result set that includes all rows belonging to all queries in the union:

The select_1, select_2, … options are SELECT statements that create a union. If the ALL option is used, all rows are displayed, including duplicates. By default, duplicates are removed.

Only compatible tables can be joined using the UNION statement. By compatible tables, we mean that both lists of columns in the selection must contain the same number of columns, and that the corresponding columns must have compatible data types. The result of a join can only be ordered by using the ORDER BY clause in the last SELECT statement. GROUP BY and HAVING clauses can be used with individual SELECT statements, but not within the join itself.

INTERSECT and EXCEPT operators

Two other set operators:

  • INTERSECT – intersection – a set of rows that belong to both tables
  • EXCEPT – difference between two tables – all values ​​that belong to the first table and are not present in the second