Grouping in SQL: GROUP BY clause, HAVING clause and aggregate functions

The GROUP BY clause (SELECT statement) allows you to group data (rows) by the value of a column or multiple columns or expressions. The result will be a set of summary rows.

Each column in the select list must be present in the GROUP BY clause, except for constants and columns that are operands of aggregate functions.

A table can be grouped by any combination of its columns.

Aggregate functions are used to get a single total value from a group of rows. All aggregate functions perform calculations on a single argument, which can be either a column or an expression. The result of any aggregate function is a constant value displayed in a separate result column.

Aggregate functions are specified in the column list of the SELECT statement, which can also contain a GROUP BY clause. If there is no GROUP BY clause in the SELECT statement, and the select column list contains at least one aggregate function, then it must not contain simple columns. On the other hand, a column select list may contain column names that are not arguments to an aggregate function if those columns are arguments to the GROUP BY clause.

If the query contains a WHERE clause, then the aggregate functions calculate a value for the results of the selection.

The MIN and MAX aggregate functions calculate the smallest and largest value of a column, respectively. Arguments can be numbers, strings, and dates. All NULL values are removed before calculation (i.e. they are not taken into account).

The SUM aggregate function calculates the total sum of the values in a column. Arguments can only be numbers. Using the DISTINCT option eliminates all duplicate values in the column before applying the SUM function. Similarly, all NULL values are removed before applying this aggregate function.

The AVG aggregate function returns the average of all values in a column. Arguments can also only be numbers, and all NULL values are removed before evaluation.

The COUNT aggregate function has two different forms:

COUNT([DISTINCT] col_name) – counts the number of values in the col_name column, NULL values are ignored
COUNT(*) – counts the number of rows in the table, NULL values are also taken into account

If you use the DISTINCT keyword in a query, any duplicate column values are removed before the COUNT function is applied.

The COUNT_BIG function is similar to the COUNT function. The only difference between them is the type of result they return: the COUNT_BIG function always returns BIGINT values, while the COUNT function returns INTEGER data values.

The HAVING clause defines a condition that applies to a group of rows. It has the same meaning for groups of rows as the WHERE clause for the contents of the corresponding table (WHERE is applied before grouping, HAVING after):

HAVING condition

1	HAVING condition

The condition parameter contains aggregate functions or constants.