Merging and Joining
In real AI projects, data often comes from multiple sources. You need to combine them based on a common key – similar to SQL joins. Pandas provides
merge(), join(), and concat().Merging Two DataFrames (SQL‑style)
df1 = pd.DataFrame({'ID': [1,2,3], 'Name': ['Alice','Bob','Charlie']})
df2 = pd.DataFrame({'ID': [2,3,4], 'Score': [90,85,88]})
# Inner join (only keys present in both)
inner = pd.merge(df1, df2, on='ID', how='inner')
# Left join (keep all rows from left)
left = pd.merge(df1, df2, on='ID', how='left')
# Outer join (keep all keys from both)
outer = pd.merge(df1, df2, on='ID', how='outer')Concatenating (Stacking) DataFrames
When two datasets have the same columns, you can stack them vertically.
df1 = pd.DataFrame({'A': [1,2], 'B': [3,4]})
df2 = pd.DataFrame({'A': [5,6], 'B': [7,8]})
combined = pd.concat([df1, df2], ignore_index=True)Joining on Index
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
joined = df1.join(df2, how='inner')Why Merging Matters for AI
- Combine features from different tables (e.g., user demographics and purchase history).
- Merge train labels with feature data.
- Stack test data with training data for preprocessing consistency.
Choosing the Right Join
- Inner join: only rows with keys in both tables.
- Left join: keep all rows from left, match from right.
- Outer join: keep all rows from both, fill missing with NaN.
Two Minute Drill
pd.merge(df1, df2, on='key')– SQL‑style join.how='inner'/'left'/'outer'/'right'controls join type.pd.concat()stacks rows vertically.- Essential for combining real‑world AI datasets.
Need more clarification?
Drop us an email at career@quipoinfotech.com
