Feature Engineering Basics
Feature engineering is the process of creating new features from existing data to improve model performance: the art of turning raw data into informative inputs for ML models. A good feature can make a simple model work well; poor features can undermine even a complex one.
Simple Feature Engineering Examples
- Combining features: BMI = weight / height² (instead of separate weight and height).
- Extracting parts: From a date column, extract day of week, month, is_weekend.
- Aggregations: For customer data, compute total purchase amount per customer.
- Polynomial features: Add x², x³ to capture non‑linear relationships.
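The ratio and aggregation examples above can be sketched in pandas. The column names and values here are made up for illustration:

```python
import pandas as pd

# Hypothetical patient data (weight in kg, height in m)
patients = pd.DataFrame({
    "weight": [70.0, 85.0],
    "height": [1.75, 1.80],
})
# Ratio feature: BMI = weight / height^2
patients["bmi"] = patients["weight"] / patients["height"] ** 2

# Hypothetical purchase log for the aggregation example
purchases = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [10.0, 15.0, 40.0],
})
# Aggregation: total purchase amount per customer
totals = purchases.groupby("customer_id")["amount"].sum()
```

The new `bmi` column and the per-customer `totals` series can then be joined back into the modeling table as features.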
Why It Matters
Raw data often lacks the right representation, and domain knowledge can create powerful features. For house prices: age of the house, distance to the nearest station, number of rooms per floor. These are derived rather than present directly in the raw data.
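A quick sketch of deriving two of those house-price features. The raw column names (`year_built`, `num_rooms`, `num_floors`) are assumptions for illustration:

```python
import pandas as pd

# Hypothetical raw listing data; column names are assumptions
houses = pd.DataFrame({
    "year_built": [1995, 2010],
    "num_rooms": [6, 8],
    "num_floors": [2, 2],
})

current_year = 2024
# Derived feature: age of the house
houses["age"] = current_year - houses["year_built"]
# Derived feature: rooms per floor
houses["rooms_per_floor"] = houses["num_rooms"] / houses["num_floors"]
```

Neither `age` nor `rooms_per_floor` exists in the raw data, yet both are often more predictive than the columns they are computed from.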
Example with Code
import pandas as pd

# df is an existing DataFrame with a 'date' column
df['date'] = pd.to_datetime(df['date'])
df['day_of_week'] = df['date'].dt.dayofweek  # Monday=0 ... Sunday=6
df['is_weekend'] = df['day_of_week'] >= 5
df['month'] = df['date'].dt.month

# Polynomial features with scikit-learn (X is an existing feature DataFrame)
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X[['age', 'income']])
Caution: Don't Over-Engineer
Too many features can cause overfitting. Use domain knowledge and test which features improve validation performance. Feature selection (later module) helps.
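One way to test whether a feature earns its keep is to compare cross-validated scores with and without it. A minimal sketch on synthetic data, where the target is quadratic so the engineered x² feature should help:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: the target depends on x squared, plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = x[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

# Baseline: linear model on the raw feature only
linear_only = cross_val_score(LinearRegression(), x, y, cv=5, scoring="r2").mean()

# Same model with the engineered x^2 feature added
with_square = cross_val_score(
    LinearRegression(), np.hstack([x, x ** 2]), y, cv=5, scoring="r2"
).mean()
```

Keeping a feature only when it improves the validation score, as here, guards against the overfitting that piles of speculative features invite.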
Two Minute Drill
- Feature engineering creates new features from raw data.
- Examples: ratios, date parts, aggregations, polynomials.
- Good features improve model performance significantly.
- Avoid over‑engineering – validate with cross‑validation.
Need more clarification?
Drop us an email at career@quipoinfotech.com
