INTERMEDIATE ⏱ 38 minutes

Grouping Data with GROUP BY

Remember aggregate functions from the last lesson? They gave us one summary for the entire dataset: "The average GPA is 3.6." But what if you want to break that down by category? For example, "What's the average GPA per major?" or "What are total sales per region?" That's where GROUP BY comes in—it's the bridge that connects aggregate functions to categorical analysis. This is one of the most powerful features in SQL!

📚 What You'll Learn

What GROUP BY does and why it's essential for analysis
How GROUP BY creates one row per group
The partnership between GROUP BY and aggregate functions
Grouping by a single column (e.g., by major, by region)
Grouping by multiple columns (e.g., by major AND year)
How GROUP BY changes the meaning of SELECT
The golden rule: every non-aggregated column MUST appear in GROUP BY
Using WHERE to filter before grouping
Using HAVING to filter after grouping
The complete execution order: WHERE → GROUP BY → HAVING
Common beginner mistakes and how to avoid them
Real-world business analytics scenarios
Hands-on practice exercises

1. What Does GROUP BY Do?

GROUP BY divides your data into groups based on column values, then calculates aggregates for each group separately.

💡 Real-World Example: Imagine you're a teacher analyzing test scores. Without GROUP BY, you'd say: "The class average is 85." With GROUP BY, you can break it down: "Period 1 averaged 88, Period 2 averaged 82, Period 3 averaged 87." Same data, but now organized by category (period). That's exactly what GROUP BY does!

Visual Transformation: From Individual Rows to Groups

WITHOUT GROUP BY (Aggregate for All)

Input: All Students

name	major	gpa
Alice	CS	3.8
Bob	CS	3.6
Carol	Bio	3.9
Diana	Bio	3.7

→

Output: One Row

SELECT AVG(gpa)
FROM students;

AVG(gpa)
3.75

WITH GROUP BY (Aggregate per Group)

Input: Same Students

name	major	gpa
Alice	CS	3.8
Bob	CS	3.6
Carol	Bio	3.9
Diana	Bio	3.7

→

Output: One Row per Group

SELECT major, AVG(gpa)
FROM students
GROUP BY major;

major	AVG(gpa)
CS	3.7
Bio	3.8

See the magic? GROUP BY split students into CS and Biology groups, then calculated the average for each group separately!

2. Basic GROUP BY Syntax

SELECT column_to_group_by, AGGREGATE_FUNCTION(column)
FROM table_name
GROUP BY column_to_group_by;

Example: Average GPA per Major

SELECT 
    major,
    AVG(gpa) AS average_gpa
FROM students
GROUP BY major;

How It Works (Step by Step):

Step 1: SQL looks at the major column
Step 2: Creates separate groups for each unique major (CS, Biology, Math, etc.)
Step 3: Calculates AVG(gpa) within each group
Step 4: Returns one row per group (one row for CS, one for Biology, etc.)

3. Critical Concept: One Row Per Group

This is the most important thing to understand about GROUP BY:

⚡ The Golden Rule: GROUP BY produces exactly one row for each unique group. If you have 5 majors, you get 5 rows. If you have 50 regions, you get 50 rows.

Example: Counting Students per Major

SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major;

Sample Result:

major	student_count
Computer Science	45
Biology	32
Mathematics	28
Engineering	51

Four unique majors = four rows. Each row shows the major and how many students are in it.

4. The Most Important Rule: SELECT and GROUP BY Must Match

Here's the rule that trips up everyone at first:

🚨 Critical Rule: Every column in SELECT that is NOT inside an aggregate function MUST appear in GROUP BY.

Valid Queries (Follow the Rule):

-- ✅ CORRECT: major is in SELECT and GROUP BY
SELECT major, AVG(gpa)
FROM students
GROUP BY major;

-- ✅ CORRECT: Only aggregate in SELECT (no GROUP BY needed)
SELECT AVG(gpa)
FROM students;

-- ✅ CORRECT: Both columns in SELECT appear in GROUP BY
SELECT major, year, AVG(gpa)
FROM students
GROUP BY major, year;

Invalid Query (Breaks the Rule):

-- ❌ WRONG: name is in SELECT but NOT in GROUP BY!
SELECT name, major, AVG(gpa)
FROM students
GROUP BY major;

Why This Fails (Logical Explanation):

The Problem: We're grouping by major, so we get one row per major (one for CS, one for Biology). But which student's name should appear in that row? CS has 45 students—should it show Alice? Bob? All 45 names?

SQL's Response: "I don't know which name to pick!" → ERROR

The Solution: Either GROUP BY name (one row per student), or don't include name in SELECT.

5. Grouping by Multiple Columns

You can group by multiple columns to create more specific groups:

-- Average GPA per major AND year
SELECT 
    major,
    year,
    AVG(gpa) AS avg_gpa
FROM students
GROUP BY major, year;

How Multi-Column Grouping Works:

SQL creates groups based on unique combinations of the columns:

major	year	avg_gpa
CS	2023	3.65
CS	2024	3.72
Biology	2023	3.81
Biology	2024	3.76

Each combination gets its own row: CS 2023, CS 2024, Biology 2023, Biology 2024.

💡 Pro Tip: The order of columns in GROUP BY doesn't change the result—GROUP BY major, year is the same as GROUP BY year, major. But column order in SELECT does matter for display!

6. Filtering with WHERE (Before Grouping)

WHERE filters rows before they're grouped:

-- Average GPA per major, but only for students with GPA > 3.0
SELECT 
    major,
    AVG(gpa) AS avg_gpa
FROM students
WHERE gpa > 3.0
GROUP BY major;

Execution Order:

FROM: Get all students
WHERE: Filter to students with GPA > 3.0
GROUP BY: Group filtered students by major
SELECT AVG(): Calculate average for each group

Students with GPA ≤ 3.0 are excluded before grouping happens!

7. Filtering with HAVING (After Grouping)

What if you want to filter based on the aggregate result? That's where HAVING comes in:

-- Show only majors with average GPA above 3.5
SELECT 
    major,
    AVG(gpa) AS avg_gpa
FROM students
GROUP BY major
HAVING AVG(gpa) > 3.5;

WHERE vs. HAVING

WHERE

Filters individual rows
Happens before grouping
Can't use aggregates
Example: WHERE gpa > 3.0

HAVING

Filters groups
Happens after grouping
Can use aggregates
Example: HAVING AVG(gpa) > 3.5

Example: Majors with More Than 30 Students

SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major
HAVING COUNT(*) > 30;

This filters out majors that have 30 or fewer students. The filtering happens after counting!

8. The Complete SQL Execution Order

Understanding the order SQL processes your query is crucial:

FROM - Get the table
WHERE - Filter individual rows
GROUP BY - Create groups
HAVING - Filter groups
SELECT - Choose columns and calculate aggregates
ORDER BY - Sort results
LIMIT - Limit number of rows returned

Example Using All Clauses:

SELECT 
    major,
    AVG(gpa) AS avg_gpa,
    COUNT(*) AS student_count
FROM students
WHERE year >= 2023                    -- 1. Filter: only recent students
GROUP BY major                        -- 2. Group by major
HAVING AVG(gpa) > 3.4                 -- 3. Filter: only high-performing majors
ORDER BY avg_gpa DESC                 -- 4. Sort: highest GPA first
LIMIT 5;                              -- 5. Show: top 5 majors only

9. Common Beginner Mistakes

Mistake #1: Missing Column in GROUP BY

-- ❌ WRONG: name is in SELECT but not in GROUP BY
SELECT name, major, AVG(gpa)
FROM students
GROUP BY major;

-- ✅ CORRECT: Remove name, or add it to GROUP BY
SELECT major, AVG(gpa)
FROM students
GROUP BY major;

Mistake #2: Using WHERE Instead of HAVING

-- ❌ WRONG: Can't use aggregates in WHERE
SELECT major, AVG(gpa)
FROM students
WHERE AVG(gpa) > 3.5
GROUP BY major;

-- ✅ CORRECT: Use HAVING for aggregates
SELECT major, AVG(gpa)
FROM students
GROUP BY major
HAVING AVG(gpa) > 3.5;

Mistake #3: Forgetting GROUP BY Entirely

-- ❌ WRONG: major + aggregate without GROUP BY
SELECT major, AVG(gpa)
FROM students;

-- ✅ CORRECT: Add GROUP BY
SELECT major, AVG(gpa)
FROM students
GROUP BY major;

Mistake #4: Wrong Column Order Expectation

-- Both of these work and give the same result:
SELECT major, AVG(gpa) FROM students GROUP BY major;
SELECT AVG(gpa), major FROM students GROUP BY major;

-- But the column display order is different!
-- First query: major column, then avg_gpa column
-- Second query: avg_gpa column, then major column

Mistake #5: Using HAVING Without GROUP BY

-- ❌ WRONG: HAVING without GROUP BY
SELECT AVG(gpa)
FROM students
HAVING AVG(gpa) > 3.5;

-- ✅ CORRECT: Use WHERE for non-grouped queries
SELECT AVG(gpa)
FROM students
WHERE gpa > 3.5;

10. 🎯 Hands-On Practice Exercises

Exercise 1: Basic GROUP BY

Task: Count how many students are in each major.

Show Answer

SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major;

Exercise 2: Average per Group

Task: Find the average GPA for each major.

Show Answer

SELECT 
    major,
    AVG(gpa) AS average_gpa
FROM students
GROUP BY major;

Exercise 3: Multiple Aggregates

Task: For each major, show the count, average GPA, minimum GPA, and maximum GPA.

Show Answer

SELECT 
    major,
    COUNT(*) AS student_count,
    AVG(gpa) AS avg_gpa,
    MIN(gpa) AS lowest_gpa,
    MAX(gpa) AS highest_gpa
FROM students
GROUP BY major;

Exercise 4: GROUP BY with WHERE

Task: Find the average GPA per major, but only include students with GPA ≥ 3.0.

Show Answer

SELECT 
    major,
    AVG(gpa) AS avg_gpa
FROM students
WHERE gpa >= 3.0
GROUP BY major;

Exercise 5: Using HAVING

Task: Show majors that have more than 40 students.

Show Answer

SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major
HAVING COUNT(*) > 40;

Exercise 6: Multi-Column Grouping

Task: Find the average GPA for each combination of major and year.

Show Answer

SELECT 
    major,
    year,
    AVG(gpa) AS avg_gpa
FROM students
GROUP BY major, year;

Challenge Exercise: Complete Query

Task: Find majors with average GPA above 3.5, but only include students from 2023 or later. Show the top 3 majors by average GPA.

Show Answer

SELECT 
    major,
    AVG(gpa) AS avg_gpa,
    COUNT(*) AS student_count
FROM students
WHERE year >= 2023
GROUP BY major
HAVING AVG(gpa) > 3.5
ORDER BY avg_gpa DESC
LIMIT 3;

Execution order: WHERE filters rows → GROUP BY creates groups → HAVING filters groups → ORDER BY sorts → LIMIT restricts output

📝 Key Takeaways

GROUP BY divides data into groups and calculates aggregates per group
Produces one row per unique group
Golden Rule: Non-aggregated columns in SELECT must be in GROUP BY
Can group by single or multiple columns
WHERE filters rows before grouping
HAVING filters groups after grouping
Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
GROUP BY column order doesn't matter (but SELECT order does)
Can combine multiple aggregates in one query
Essential for business analytics, dashboards, and reports

🚀 What's Next?

Congratulations! You've mastered one of SQL's most powerful features. GROUP BY unlocks advanced data analysis—you can now answer complex business questions like "Which products sell best per region?" or "What's our revenue trend by month?" Next, you'll learn about table joins, which let you combine data from multiple tables. This is where SQL really becomes powerful!

Up next: Introduction to JOINs - Combining Data from Multiple Tables

← Back to Lessons

Practice Now →