← Back to Lessons
INTERMEDIATE ⏱ 38 minutes

Grouping Data with GROUP BY

Remember aggregate functions from the last lesson? They gave us one summary for the entire dataset: "The average GPA is 3.6." But what if you want to break that down by category? For example, "What's the average GPA per major?" or "What are total sales per region?" That's where GROUP BY comes inβ€”it's the bridge that connects aggregate functions to categorical analysis. This is one of the most powerful features in SQL!


πŸ“š What You'll Learn

1. What Does GROUP BY Do?

GROUP BY divides your data into groups based on column values, then calculates aggregates for each group separately.

πŸ’‘ Real-World Example: Imagine you're a teacher analyzing test scores. Without GROUP BY, you'd say: "The class average is 85." With GROUP BY, you can break it down: "Period 1 averaged 88, Period 2 averaged 82, Period 3 averaged 87." Same data, but now organized by category (period). That's exactly what GROUP BY does!

Visual Transformation: From Individual Rows to Groups

WITHOUT GROUP BY (Aggregate for All)

Input: All Students

name major gpa
AliceCS3.8
BobCS3.6
CarolBio3.9
DianaBio3.7
β†’

Output: One Row

SELECT AVG(gpa)
FROM students;
AVG(gpa)
3.75

WITH GROUP BY (Aggregate per Group)

Input: Same Students

name major gpa
AliceCS3.8
BobCS3.6
CarolBio3.9
DianaBio3.7
β†’

Output: One Row per Group

SELECT major, AVG(gpa)
FROM students
GROUP BY major;
major AVG(gpa)
CS3.7
Bio3.8

See the magic? GROUP BY split students into CS and Biology groups, then calculated the average for each group separately!

2. Basic GROUP BY Syntax

SELECT column_to_group_by, AGGREGATE_FUNCTION(column)
FROM table_name
GROUP BY column_to_group_by;

Example: Average GPA per Major

SELECT 
    major,
    AVG(gpa) AS average_gpa
FROM students
GROUP BY major;

How It Works (Step by Step):

  1. Step 1: SQL looks at the major column
  2. Step 2: Creates separate groups for each unique major (CS, Biology, Math, etc.)
  3. Step 3: Calculates AVG(gpa) within each group
  4. Step 4: Returns one row per group (one row for CS, one for Biology, etc.)

3. Critical Concept: One Row Per Group

This is the most important thing to understand about GROUP BY:

⚑ The Golden Rule: GROUP BY produces exactly one row for each unique group. If you have 5 majors, you get 5 rows. If you have 50 regions, you get 50 rows.

Example: Counting Students per Major

SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major;

Sample Result:

major student_count
Computer Science 45
Biology 32
Mathematics 28
Engineering 51

Four unique majors = four rows. Each row shows the major and how many students are in it.

4. The Most Important Rule: SELECT and GROUP BY Must Match

Here's the rule that trips up everyone at first:

🚨 Critical Rule: Every column in SELECT that is NOT inside an aggregate function MUST appear in GROUP BY.

Valid Queries (Follow the Rule):

-- βœ… CORRECT: major is in SELECT and GROUP BY
SELECT major, AVG(gpa)
FROM students
GROUP BY major;

-- βœ… CORRECT: Only aggregate in SELECT (no GROUP BY needed)
SELECT AVG(gpa)
FROM students;

-- βœ… CORRECT: Both columns in SELECT appear in GROUP BY
SELECT major, year, AVG(gpa)
FROM students
GROUP BY major, year;

Invalid Query (Breaks the Rule):

-- ❌ WRONG: name is in SELECT but NOT in GROUP BY!
SELECT name, major, AVG(gpa)
FROM students
GROUP BY major;

Why This Fails (Logical Explanation):

The Problem: We're grouping by major, so we get one row per major (one for CS, one for Biology). But which student's name should appear in that row? CS has 45 studentsβ€”should it show Alice? Bob? All 45 names?

SQL's Response: "I don't know which name to pick!" β†’ ERROR

The Solution: Either GROUP BY name (one row per student), or don't include name in SELECT.

5. Grouping by Multiple Columns

You can group by multiple columns to create more specific groups:

-- Average GPA per major AND year
SELECT 
    major,
    year,
    AVG(gpa) AS avg_gpa
FROM students
GROUP BY major, year;

How Multi-Column Grouping Works:

SQL creates groups based on unique combinations of the columns:

major year avg_gpa
CS 2023 3.65
CS 2024 3.72
Biology 2023 3.81
Biology 2024 3.76

Each combination gets its own row: CS 2023, CS 2024, Biology 2023, Biology 2024.

πŸ’‘ Pro Tip: The order of columns in GROUP BY doesn't change the resultβ€”GROUP BY major, year is the same as GROUP BY year, major. But column order in SELECT does matter for display!

6. Filtering with WHERE (Before Grouping)

WHERE filters rows before they're grouped:

-- Average GPA per major, but only for students with GPA > 3.0
SELECT 
    major,
    AVG(gpa) AS avg_gpa
FROM students
WHERE gpa > 3.0
GROUP BY major;

Execution Order:

  1. FROM: Get all students
  2. WHERE: Filter to students with GPA > 3.0
  3. GROUP BY: Group filtered students by major
  4. SELECT AVG(): Calculate average for each group

Students with GPA ≀ 3.0 are excluded before grouping happens!

7. Filtering with HAVING (After Grouping)

What if you want to filter based on the aggregate result? That's where HAVING comes in:

-- Show only majors with average GPA above 3.5
SELECT 
    major,
    AVG(gpa) AS avg_gpa
FROM students
GROUP BY major
HAVING AVG(gpa) > 3.5;

WHERE vs. HAVING

WHERE

  • Filters individual rows
  • Happens before grouping
  • Can't use aggregates
  • Example: WHERE gpa > 3.0

HAVING

  • Filters groups
  • Happens after grouping
  • Can use aggregates
  • Example: HAVING AVG(gpa) > 3.5

Example: Majors with More Than 30 Students

SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major
HAVING COUNT(*) > 30;

This filters out majors that have 30 or fewer students. The filtering happens after counting!

8. The Complete SQL Execution Order

Understanding the order SQL processes your query is crucial:

  1. FROM - Get the table
  2. WHERE - Filter individual rows
  3. GROUP BY - Create groups
  4. HAVING - Filter groups
  5. SELECT - Choose columns and calculate aggregates
  6. ORDER BY - Sort results
  7. LIMIT - Limit number of rows returned

Example Using All Clauses:

SELECT 
    major,
    AVG(gpa) AS avg_gpa,
    COUNT(*) AS student_count
FROM students
WHERE year >= 2023                    -- 1. Filter: only recent students
GROUP BY major                        -- 2. Group by major
HAVING AVG(gpa) > 3.4                 -- 3. Filter: only high-performing majors
ORDER BY avg_gpa DESC                 -- 4. Sort: highest GPA first
LIMIT 5;                              -- 5. Show: top 5 majors only

9. Common Beginner Mistakes

Mistake #1: Missing Column in GROUP BY

-- ❌ WRONG: name is in SELECT but not in GROUP BY
SELECT name, major, AVG(gpa)
FROM students
GROUP BY major;

-- βœ… CORRECT: Remove name, or add it to GROUP BY
SELECT major, AVG(gpa)
FROM students
GROUP BY major;

Mistake #2: Using WHERE Instead of HAVING

-- ❌ WRONG: Can't use aggregates in WHERE
SELECT major, AVG(gpa)
FROM students
WHERE AVG(gpa) > 3.5
GROUP BY major;

-- βœ… CORRECT: Use HAVING for aggregates
SELECT major, AVG(gpa)
FROM students
GROUP BY major
HAVING AVG(gpa) > 3.5;

Mistake #3: Forgetting GROUP BY Entirely

-- ❌ WRONG: major + aggregate without GROUP BY
SELECT major, AVG(gpa)
FROM students;

-- βœ… CORRECT: Add GROUP BY
SELECT major, AVG(gpa)
FROM students
GROUP BY major;

Mistake #4: Wrong Column Order Expectation

-- Both of these work and give the same result:
SELECT major, AVG(gpa) FROM students GROUP BY major;
SELECT AVG(gpa), major FROM students GROUP BY major;

-- But the column display order is different!
-- First query: major column, then avg_gpa column
-- Second query: avg_gpa column, then major column

Mistake #5: Using HAVING Without GROUP BY

-- ❌ WRONG: HAVING without GROUP BY
SELECT AVG(gpa)
FROM students
HAVING AVG(gpa) > 3.5;

-- βœ… CORRECT: Use WHERE for non-grouped queries
SELECT AVG(gpa)
FROM students
WHERE gpa > 3.5;

10. 🎯 Hands-On Practice Exercises

Exercise 1: Basic GROUP BY

Task: Count how many students are in each major.

Show Answer
SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major;

Exercise 2: Average per Group

Task: Find the average GPA for each major.

Show Answer
SELECT 
    major,
    AVG(gpa) AS average_gpa
FROM students
GROUP BY major;

Exercise 3: Multiple Aggregates

Task: For each major, show the count, average GPA, minimum GPA, and maximum GPA.

Show Answer
SELECT 
    major,
    COUNT(*) AS student_count,
    AVG(gpa) AS avg_gpa,
    MIN(gpa) AS lowest_gpa,
    MAX(gpa) AS highest_gpa
FROM students
GROUP BY major;

Exercise 4: GROUP BY with WHERE

Task: Find the average GPA per major, but only include students with GPA β‰₯ 3.0.

Show Answer
SELECT 
    major,
    AVG(gpa) AS avg_gpa
FROM students
WHERE gpa >= 3.0
GROUP BY major;

Exercise 5: Using HAVING

Task: Show majors that have more than 40 students.

Show Answer
SELECT 
    major,
    COUNT(*) AS student_count
FROM students
GROUP BY major
HAVING COUNT(*) > 40;

Exercise 6: Multi-Column Grouping

Task: Find the average GPA for each combination of major and year.

Show Answer
SELECT 
    major,
    year,
    AVG(gpa) AS avg_gpa
FROM students
GROUP BY major, year;

Challenge Exercise: Complete Query

Task: Find majors with average GPA above 3.5, but only include students from 2023 or later. Show the top 3 majors by average GPA.

Show Answer
SELECT 
    major,
    AVG(gpa) AS avg_gpa,
    COUNT(*) AS student_count
FROM students
WHERE year >= 2023
GROUP BY major
HAVING AVG(gpa) > 3.5
ORDER BY avg_gpa DESC
LIMIT 3;

Execution order: WHERE filters rows β†’ GROUP BY creates groups β†’ HAVING filters groups β†’ ORDER BY sorts β†’ LIMIT restricts output

πŸ“ Key Takeaways

πŸš€ What's Next?

Congratulations! You've mastered one of SQL's most powerful features. GROUP BY unlocks advanced data analysisβ€”you can now answer complex business questions like "Which products sell best per region?" or "What's our revenue trend by month?" Next, you'll learn about table joins, which let you combine data from multiple tables. This is where SQL really becomes powerful!

Up next: Introduction to JOINs - Combining Data from Multiple Tables

← Back to Lessons
Practice Now β†’