Grouping and Aggregation

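You often need to split data by a category and compute a statistic for each group, for example the average score per class or the total sales per region. The pandas groupby method handles this in three steps: it splits the rows into groups, applies a function to each group, and combines the results into one object.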

Basic GroupBy

import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 70000, 80000, 55000, 90000]
})

# Group by Department and compute mean salary
grouped = df.groupby('Department')['Salary'].mean()
print(grouped)

Output:

Department
Finance    90000.0
HR         52500.0
IT         75000.0
Name: Salary, dtype: float64

Multiple Aggregations

# Pass a list of function names to get one result column per statistic
result = df.groupby('Department')['Salary'].agg(['mean', 'median', 'count'])
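
To make the shape of the result concrete, here is a minimal, self-contained sketch that rebuilds the Department/Salary data from the Basic GroupBy example. It shows that a list passed to .agg() gives a DataFrame with one column per function, and that keyword arguments can be used to name the output columns. The names avg_salary and headcount are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 70000, 80000, 55000, 90000]
})

# A list of function names returns a DataFrame: one row per group,
# one column per aggregation.
result = df.groupby('Department')['Salary'].agg(['mean', 'median', 'count'])
print(result)

# Named aggregation: each keyword becomes a column name
# (avg_salary and headcount are made-up names for this sketch).
named = df.groupby('Department')['Salary'].agg(avg_salary='mean', headcount='count')
print(named)
```

Named aggregation is handy when the default column names (mean, count) would be ambiguous after merging the summary with other tables.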

Grouping by Multiple Columns

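# Use a separate name so the Department/Salary df above stays available
sales_df = pd.DataFrame({
    'City': ['NY', 'NY', 'LA', 'LA'],
    'Year': [2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250]
})
# A list of columns produces one group per (City, Year) combination
grouped = sales_df.groupby(['City', 'Year'])['Sales'].sum()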
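
Grouping by a list of columns returns a Series indexed by a MultiIndex, which can be awkward to work with at first. The sketch below reuses the City/Year/Sales data and shows three common ways to handle the result: looking up one group by a tuple key, flattening the index back into columns, and pivoting one level into columns. The variable names (sales, totals, flat, wide) are illustrative only.

```python
import pandas as pd

sales = pd.DataFrame({
    'City': ['NY', 'NY', 'LA', 'LA'],
    'Year': [2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250]
})

# One total per (City, Year) pair; the index has two levels.
totals = sales.groupby(['City', 'Year'])['Sales'].sum()

# Look up a single group with a tuple key.
ny_2021 = totals.loc[('NY', 2021)]

# Turn the index levels back into ordinary columns.
flat = totals.reset_index()

# Move the Year level into columns: rows are cities, columns are years.
wide = totals.unstack('Year')

print(totals)
print(flat)
print(wide)
```

The flat form is usually easier to merge with other tables, while the wide form suits side-by-side comparison across years.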

Custom Aggregation Functions

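# x is the Series of values belonging to one group
def range_func(x):
    return x.max() - x.min()

# df here is the Department/Salary DataFrame from Basic GroupBy
df.groupby('Department')['Salary'].agg(range_func)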
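
As a self-contained check of the custom function, the sketch below rebuilds the Department/Salary data and applies range_func to each group. It also shows an equivalent lambda, which is a stylistic alternative rather than anything the original example requires.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 70000, 80000, 55000, 90000]
})

def range_func(x):
    # x is the Series of salaries for a single department
    return x.max() - x.min()

salary_range = df.groupby('Department')['Salary'].agg(range_func)
print(salary_range)

# The same aggregation written inline as a lambda.
same = df.groupby('Department')['Salary'].agg(lambda s: s.max() - s.min())
```

Finance has a single row, so its range is 0; HR spans 50000 to 55000 and IT spans 70000 to 80000.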

Why GroupBy Matters for AI

You might need to:

Compute class-wise statistics in a dataset (e.g., average pixel intensity per digit class in MNIST).
Aggregate user behavior for recommendation systems.
Prepare summary tables for visualization.
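
The first use case above can be sketched on a tiny, made-up dataset. The column names label and brightness and all values below are invented for illustration; a real dataset such as MNIST would need to be loaded and reduced to per-sample features first.

```python
import pandas as pd

# Toy data (invented): 'label' is the class, 'brightness' is one
# numeric feature per sample.
samples = pd.DataFrame({
    'label': [0, 0, 1, 1, 1],
    'brightness': [0.2, 0.4, 0.5, 0.7, 0.9]
})

# Per-class mean of the feature and number of samples in each class.
class_stats = samples.groupby('label')['brightness'].agg(['mean', 'count'])
print(class_stats)
```

Checking the count column alongside the mean is a quick way to spot class imbalance before training a model.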

Two Minute Drill

df.groupby('column')['value'].mean(): groups rows by 'column' and averages 'value' within each group.
Use .agg() for multiple functions.
Group by multiple columns with a list.
Custom functions can be passed to .agg().


Need more clarification?

Drop us an email at career@quipoinfotech.com
