Grouping and Aggregation Interview Questions

Q1. You have sales data with columns 'region', 'product', 'sales'. Group by region and compute total sales per region.

grouped = df.groupby('region')['sales'].sum()
# .mean(), .count(), .agg() also work

For multiple aggregations: df.groupby('region')['sales'].agg(['sum', 'mean', 'count']).
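
As a minimal sketch of the difference in return types, the example below runs both calls on a tiny DataFrame. The sample rows are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for this example only.
df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B', 'B'],
    'sales': [100, 200, 50, 150, 100],
})

# Single aggregation: returns a Series indexed by region.
totals = df.groupby('region')['sales'].sum()

# List of aggregations: returns a DataFrame with one column per function.
summary = df.groupby('region')['sales'].agg(['sum', 'mean', 'count'])

print(totals)
print(summary)
```

The single aggregation yields a Series, while passing a list to agg yields a DataFrame whose columns are named after the functions.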

Q2. Group by both 'region' and 'product', and calculate average sales. Then unstack the result into a table format.

avg_sales = df.groupby(['region', 'product'])['sales'].mean()
table = avg_sales.unstack()   # pivot-like table: regions as rows, products as columns

Alternatively, use pivot_table to get the same layout in one call: pd.pivot_table(df, values='sales', index='region', columns='product', aggfunc='mean').
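
The following sketch builds the table both ways on made-up data so the two results can be compared side by side. The sample rows are hypothetical, and East deliberately has no product 'B' to show how a missing combination appears.

```python
import pandas as pd

# Hypothetical sample data; East has no sales of product 'B'.
df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West', 'West'],
    'product': ['A', 'A', 'A', 'B', 'B'],
    'sales': [100, 200, 50, 150, 100],
})

# Route 1: groupby on two keys, then move 'product' from the index to columns.
table = df.groupby(['region', 'product'])['sales'].mean().unstack()

# Route 2: pivot_table does the grouping and reshaping in one call.
pivot = pd.pivot_table(df, values='sales', index='region',
                       columns='product', aggfunc='mean')

print(table)
print(pivot)
```

In both tables the missing (East, B) cell is NaN. Both unstack and pivot_table accept a fill_value argument if a default such as 0 is preferred.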

Q3. Using groupby, get the maximum sales for each region and the product that achieved it. Use .idxmax() to find the product index.

max_idx = df.groupby('region')['sales'].idxmax()   # index label of the max row per region
top_products = df.loc[max_idx, ['region', 'product', 'sales']]

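This returns one row per region. idxmax gives the index label of the first maximum in each group, so if several products tie for the top value, only the first is returned. The df.loc lookup also assumes the index labels are unique.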
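
If an interviewer follows up about ties, one possible approach is to compare each row against its group maximum with transform. The sketch below uses made-up data in which East has a two-way tie. It is one option among several, not the only way to handle ties.

```python
import pandas as pd

# Hypothetical sample data; products A and B tie for the top in East.
df = pd.DataFrame({
    'region': ['East', 'East', 'West', 'West'],
    'product': ['A', 'B', 'A', 'B'],
    'sales': [200, 200, 50, 150],
})

# idxmax keeps only the first maximum per region.
max_idx = df.groupby('region')['sales'].idxmax()
top_products = df.loc[max_idx, ['region', 'product', 'sales']]

# Keep every tied row by comparing each row to its region's maximum.
is_max = df['sales'] == df.groupby('region')['sales'].transform('max')
top_with_ties = df[is_max]

print(top_products)
print(top_with_ties)
```

Here top_products has one row per region, while top_with_ties also keeps the second East row.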

Q4. Apply multiple aggregation functions to different columns: sum of sales, mean of quantity, and count of orders per region.

df.groupby('region').agg({
    'sales': 'sum',
    'quantity': 'mean',
    'order_id': 'count'
})

Or use named aggregations: .agg(sales_sum=('sales', 'sum')).
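
For completeness, here is a sketch of the same three aggregations written entirely with named aggregation, which lets you choose the output column names. The data and the output names (sales_sum, quantity_mean, order_count) are illustrative assumptions.

```python
import pandas as pd

# Hypothetical sample data, invented for this example only.
df = pd.DataFrame({
    'region': ['East', 'East', 'West'],
    'sales': [100, 200, 50],
    'quantity': [1, 3, 2],
    'order_id': [101, 102, 103],
})

# Each keyword is an output column; each value is (input column, function).
summary = df.groupby('region').agg(
    sales_sum=('sales', 'sum'),
    quantity_mean=('quantity', 'mean'),
    order_count=('order_id', 'count'),
)

print(summary)
```

Note that 'count' skips missing values. If order_id can be null, or if orders span several rows, 'size' or 'nunique' may be closer to what the question intends.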

Q5. Compute the percentage of total sales contributed by each region. Use groupby transform to create a column with total sales per region.

df['region_total'] = df.groupby('region')['sales'].transform('sum')
# each region's share of overall sales, repeated on every row of that region
df['region_pct'] = df['region_total'] / df['sales'].sum() * 100

transform returns a result with the same index and length as the original, so it can be assigned straight back as a column. This makes it useful for normalization.
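
Two different percentages are easy to confuse here: a region's share of overall sales, and a single row's share of its own region. The sketch below computes both on made-up data. The column names region_share_pct and row_share_pct are hypothetical and were chosen only for this example.

```python
import pandas as pd

# Hypothetical sample data, invented for this example only.
df = pd.DataFrame({
    'region': ['East', 'East', 'West'],
    'sales': [100, 300, 100],
})

# transform broadcasts each region's total back onto its rows.
df['region_total'] = df.groupby('region')['sales'].transform('sum')

# Region's share of overall sales (same value on every row of a region).
df['region_share_pct'] = df['region_total'] / df['sales'].sum() * 100

# Each row's share of its own region's sales.
df['row_share_pct'] = df['sales'] / df['region_total'] * 100

print(df)
```

If only one row per region is needed, aggregating first with df.groupby('region')['sales'].sum() and dividing by the grand total avoids the repeated values.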
