Loading

Quipoin Menu

Learn • Practice • Grow

python-for-ai / Merging and Joining
tutorial

Merging and Joining

In real AI projects, data often comes from multiple sources. You need to combine them based on a common key – similar to SQL joins. Pandas provides merge(), join(), and concat().

Merging Two DataFrames (SQL‑style)

df1 = pd.DataFrame({'ID': [1,2,3], 'Name': ['Alice','Bob','Charlie']})
df2 = pd.DataFrame({'ID': [2,3,4], 'Score': [90,85,88]})

# Inner join (only keys present in both)
inner = pd.merge(df1, df2, on='ID', how='inner')

# Left join (keep all rows from left)
left = pd.merge(df1, df2, on='ID', how='left')

# Outer join (keep all keys from both)
outer = pd.merge(df1, df2, on='ID', how='outer')

Concatenating (Stacking) DataFrames

When two datasets have the same columns, you can stack them vertically.
df1 = pd.DataFrame({'A': [1,2], 'B': [3,4]})
df2 = pd.DataFrame({'A': [5,6], 'B': [7,8]})
combined = pd.concat([df1, df2], ignore_index=True)

Joining on Index

df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
joined = df1.join(df2, how='inner')

Why Merging Matters for AI

  • Combine features from different tables (e.g., user demographics and purchase history).
  • Merge train labels with feature data.
  • Stack test data with training data for preprocessing consistency.

Choosing the Right Join

  • Inner join: only rows with keys in both tables.
  • Left join: keep all rows from left, match from right.
  • Outer join: keep all rows from both, fill missing with NaN.


Two Minute Drill
  • pd.merge(df1, df2, on='key') – SQL‑style join.
  • how='inner'/'left'/'outer'/'right' controls join type.
  • pd.concat() stacks rows vertically.
  • Essential for combining real‑world AI datasets.

Need more clarification?

Drop us an email at career@quipoinfotech.com