SQL GROUP BY and Aggregate Functions: Summarize Your Data

Published: 2026-04-08

GROUP BY collapses multiple rows that share the same value in one or more columns into a single summary row. Combined with aggregate functions like COUNT, SUM, AVG, MIN, and MAX, it is the foundation of almost every reporting and analytics query. GROUP BY is supported in all SQL databases including SQLite, PostgreSQL, MySQL, SQL Server, and Oracle.

Sample Data

We use two tables in all examples:

employees table:

id name department hire_date salary
1 Alice Engineering 2023-01-15 95000
2 Bob Engineering 2023-06-01 85000
3 Carol Engineering 2024-03-10 85000
4 Dave Sales 2023-02-20 72000
5 Eve Sales 2023-09-05 68000
6 Frank Sales 2024-01-15 65000
7 Grace Marketing 2023-04-01 82000
8 Henry Marketing 2024-06-01 70000

orders table:

id customer product category quantity unit_price order_date status
1 Acme Corp Laptop Pro Electronics 3 1299.99 2026-01-05 delivered
2 Acme Corp Wireless Mouse Electronics 10 29.99 2026-01-12 delivered
3 Globex Office Chair Furniture 5 249.99 2026-01-18 delivered
4 Globex Standing Desk Furniture 2 599.99 2026-01-20 shipped
5 Initech Notebook Pack Stationery 20 12.99 2026-02-03 delivered
6 Initech Pen Set Stationery 15 8.99 2026-02-07 delivered
7 Acme Corp Monitor 27" Electronics 4 399.99 2026-02-14 delivered
8 Umbrella Ltd Laptop Pro Electronics 2 1299.99 2026-02-20 cancelled
9 Umbrella Ltd Desk Lamp Furniture 8 39.99 2026-03-01 delivered
10 Globex Wireless Mouse Electronics 12 29.99 2026-03-05 shipped
11 Initech Monitor 27" Electronics 1 399.99 2026-03-10 delivered
12 Acme Corp Standing Desk Furniture 1 599.99 2026-03-15 processing
Explore dataset in DbGate Lite

GROUP BY Syntax

SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1
HAVING aggregate_condition
ORDER BY column1;
  • GROUP BY — one or more columns whose distinct value combinations define each group
  • Aggregate functions — compute a single value from all rows in the group
  • HAVING — filters groups (like WHERE, but for grouped results)
  • Every column in SELECT that is not inside an aggregate must appear in GROUP BY

SQL Execution Order

SQL clauses are written in one order but executed in a different order. Understanding this helps explain why HAVING can reference aggregate functions but WHERE cannot, and why you cannot use a SELECT alias in a HAVING clause.

FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
Step Clause What happens
1 FROM Load the table(s) and evaluate any JOINs
2 WHERE Filter individual rows — aggregate functions are not allowed here
3 GROUP BY Collapse remaining rows into groups
4 HAVING Filter groups — aggregate functions are allowed here
5 SELECT Compute output columns and aliases
6 ORDER BY Sort the final result set

This is why WHERE status = 'delivered' runs before grouping (fast, row-level filter), while HAVING COUNT(*) > 2 runs after grouping (filters whole groups).

COUNT()

COUNT(*) counts all rows in a group. COUNT(column) counts non-NULL values in that column.

Example: Number of Employees per Department

Example: Number of Orders per Customer

Example: Count Delivered vs. Other Statuses

Count only orders matching a condition using COUNT with a CASE expression:

SUM()

SUM(column) adds up all values in the group for a numeric column.

Example: Total Salary per Department

Example: Total Revenue per Category

Compute revenue as quantity * unit_price:

Example: Revenue by Customer and Category

Group by multiple columns to break revenue down further:

AVG()

AVG(column) returns the arithmetic mean of all non-NULL values in the group.

Example: Average Salary per Department

Example: Average Order Value per Customer

MIN() and MAX()

MIN(column) and MAX(column) return the smallest and largest values in the group.

Example: Salary Range per Department

Example: First and Last Order Date per Customer

Combining Multiple Aggregates

You can combine any number of aggregate functions in a single query.

Example: Full Department Summary

Example: Order Summary per Customer

HAVING

HAVING filters groups after aggregation — it is the equivalent of WHERE for grouped results.

Example: Departments with More Than Two Employees

Example: Customers with Total Revenue Above 2000

Example: Categories with Average Order Value Below 500

WHERE vs. HAVING

WHERE filters individual rows before grouping. HAVING filters groups after aggregation.

Example: Revenue from Delivered Orders Only per Customer

Use WHERE to exclude rows before grouping, then HAVING to filter results:

GROUP BY with JOINs

You can group data that spans multiple tables by first joining them.

Example: Revenue per Department via Employee Join

Example: Order Count and Revenue per Category and Month

GROUP BY with Expressions

You can group by computed expressions, not just plain columns.

Example: Employees Hired per Year

Example: Orders by Month

COUNT(DISTINCT)

COUNT(DISTINCT column) counts the number of unique values in the group.

Example: Unique Products Ordered per Customer

Filtering Groups with Multiple Conditions

HAVING supports any boolean expression, including AND/OR combinations.

Example: High-Value Active Customers

Find customers with more than one order and total revenue above 1000:

Practical Example: Sales Report

Combine everything into a full sales summary report.

Example: Category Performance Report

Example: Top Product by Revenue

GROUP BY vs. Window Functions

GROUP BY collapses rows into one row per group. Window functions keep all rows and add a computed column — see the SQL Window Functions tutorial for details.

Example: GROUP BY Returns One Row per Department

Example: Window Function Keeps All Rows

Common Use Cases for GROUP BY

  • Reporting dashboards — Summarize totals, averages, and counts by category or time period
  • Headcount analysis — Count employees, users, or records by group
  • Revenue breakdown — Total and average revenue by product, region, or customer
  • Trend analysis — Group by month or year to spot patterns over time
  • Top-N per group — Combine with subqueries or CTEs to find the best performer in each group
  • Data validation — Count duplicates or NULL values across groups
  • Cohort analysis — Group users by signup date or behavior segment

Performance Tips

  1. Index GROUP BY columns — Indexes on grouped columns speed up the sort and grouping step
  2. Filter early with WHERE — Reducing rows before grouping with WHERE is faster than filtering after with HAVING
  3. Avoid SELECT * — Only select columns you actually need; aggregate operations benefit from narrower data
  4. Use covering indexes — An index that includes both the GROUP BY column and the aggregated column can avoid a full table scan
  5. Consider materialized views — For frequently computed aggregates on large tables, some databases support pre-computed summaries

Summary

  • GROUP BY collapses rows that share the same column values into one summary row per unique combination
  • COUNT(*) counts all rows; COUNT(column) counts non-NULL values; COUNT(DISTINCT column) counts unique values
  • SUM totals numeric values; AVG computes the mean; MIN / MAX find the boundaries
  • Every SELECT column must either appear in GROUP BY or be wrapped in an aggregate function
  • WHERE filters rows before grouping; HAVING filters groups after aggregation
  • You can GROUP BY expressions (e.g., SUBSTR(date, 1, 7) for month-level grouping)
  • Joining tables before grouping lets you combine data from multiple sources into one report
  • Use window functions instead of GROUP BY when you need group-level metrics without losing individual rows

Try combining GROUP BY with JOINs, CASE expressions, and HAVING clauses to build the reports your data analysis needs!