Grouping and Aggregation
Often you need to group data by a category and then compute statistics for each group, for example the average score per class or the total sales per region. The pandas groupby method makes this easy.
Basic GroupBy
import pandas as pd
df = pd.DataFrame({
    'Department': ['HR', 'IT', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 70000, 80000, 55000, 90000]
})
# Group by Department and compute mean salary
grouped = df.groupby('Department')['Salary'].mean()
print(grouped)
Output:
Department
Finance    90000.0
HR         52500.0
IT         75000.0
Name: Salary, dtype: float64
Multiple Aggregations
result = df.groupby('Department')['Salary'].agg(['mean', 'median', 'count'])
Grouping by Multiple Columns
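# A separate name keeps the Department/Salary DataFrame (df) available for later examples
sales_df = pd.DataFrame({
    'City': ['NY', 'NY', 'LA', 'LA'],
    'Year': [2020, 2021, 2020, 2021],
    'Sales': [100, 150, 200, 250]
})
# Pass a list of columns to group by each (City, Year) combination
grouped = sales_df.groupby(['City', 'Year'])['Sales'].sum()
Custom Aggregation Functions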
# Difference between the highest and lowest value in a group
def range_func(x):
    return x.max() - x.min()
df.groupby('Department')['Salary'].agg(range_func)
Why GroupBy Matters for AI
You might need to:
- Compute class-wise statistics in a dataset (e.g., average pixel intensity per digit class in MNIST).
- Aggregate user behavior for recommendation systems.
- Prepare summary tables for visualization.
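As a minimal sketch of the first use case, class-wise statistics can be computed with the same groupby and .agg() pattern. The data here is a made-up stand-in, not real MNIST: the names samples, label, and mean_pixel are assumptions chosen for illustration, with one row per image and its average pixel intensity already computed.

```python
import pandas as pd

# Hypothetical toy data: one row per image, with its class label and
# a precomputed average pixel intensity (0-255). Not real MNIST values.
samples = pd.DataFrame({
    'label': [0, 0, 1, 1, 1],
    'mean_pixel': [10, 30, 100, 120, 140],
})

# Per-class summary table: sample count and average intensity per label.
class_stats = samples.groupby('label')['mean_pixel'].agg(['count', 'mean'])
print(class_stats)
```

The result has one row per label, so it also works as a small summary table to pass to a plotting function.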
Two Minute Drill
- df.groupby('column')['value'].mean(): group and aggregate.
- Use .agg() for multiple functions.
- Group by multiple columns with a list.
- Custom functions can be passed to .agg().
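The four points above can be practiced together in one short sketch. The DataFrame and its names (revenue_df, team, quarter, revenue, value_range) are made up for this drill and do not come from the earlier examples.

```python
import pandas as pd

# Made-up data for illustration only.
revenue_df = pd.DataFrame({
    'team': ['A', 'A', 'B', 'B'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'revenue': [10, 30, 20, 60],
})

def value_range(x):
    """Spread between the largest and smallest value in a group."""
    return x.max() - x.min()

# 1. Group and aggregate with a single function.
mean_by_team = revenue_df.groupby('team')['revenue'].mean()

# 2. Several functions at once: one output column per function.
stats_by_team = revenue_df.groupby('team')['revenue'].agg(['mean', 'count'])

# 3. Group by a list of columns: the result is indexed by (team, quarter).
total_by_team_quarter = revenue_df.groupby(['team', 'quarter'])['revenue'].sum()

# 4. Custom function passed to .agg().
range_by_team = revenue_df.groupby('team')['revenue'].agg(value_range)

print(mean_by_team, stats_by_team, total_by_team_quarter, range_by_team, sep='\n\n')
```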
