Ensemble Methods
Ensemble methods combine multiple models to produce a stronger predictor. Two popular techniques are Random Forest (bagging) and Gradient Boosting (boosting).
By combining many weak learners, ensembles typically improve accuracy and reduce overfitting: bagging averages independent models to lower variance, while boosting fits models sequentially to lower bias.
Random Forest (Bagging)
Builds many decision trees on random subsets of data and features, then averages their predictions (or majority vote). This reduces variance and overfitting.
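To make the bagging idea concrete, here is a minimal hand-rolled sketch: bootstrap-sample the training rows, fit one tree per sample with a random feature subset, and take a majority vote. This assumes scikit-learn and uses a synthetic dataset purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rng = np.random.default_rng(42)
trees = []
for _ in range(25):
    # Bootstrap sample: draw n rows *with replacement*
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # max_features="sqrt" adds the per-split feature randomness of a Random Forest
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Majority vote across the 25 trees
votes = np.stack([t.predict(X_test) for t in trees])
ensemble_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("ensemble accuracy:", (ensemble_pred == y_test).mean())
```

In practice you would use RandomForestClassifier (shown next), which does all of this internally and adds out-of-bag scoring and parallelism.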
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
Gradient Boosting (Boosting)
Sequentially adds trees that correct the errors of previous trees. Each new tree focuses on the mistakes of the ensemble so far. Often yields higher accuracy.
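The "each tree corrects the previous errors" loop can be sketched by hand on a toy 1-D regression task: keep fitting small trees to the current residuals and add their scaled predictions to the running estimate. Synthetic data and parameter values here are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression target: noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
pred = np.zeros_like(y)  # start from a zero prediction
for _ in range(100):
    residual = y - pred          # errors of the ensemble so far
    stump = DecisionTreeRegressor(max_depth=2)
    stump.fit(X, residual)       # each new tree fits the current residuals
    pred += learning_rate * stump.predict(X)

print("training MSE after boosting:", np.mean((y - pred) ** 2))
```

GradientBoostingClassifier (below) follows the same scheme, but boosts on the gradient of a classification loss instead of raw residuals.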
from sklearn.ensemble import GradientBoostingClassifier
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X_train, y_train)
XGBoost – Popular Variant
XGBoost (eXtreme Gradient Boosting) is an optimized, fast implementation of gradient boosting, widely used in competitions.
import xgboost as xgb
model = xgb.XGBClassifier(n_estimators=100, learning_rate=0.1)
model.fit(X_train, y_train)
When to Use Which?
Random Forest: robust, less prone to overfitting, and works well out of the box. Gradient Boosting: often achieves higher accuracy, but requires careful tuning of the learning rate, tree depth, and number of trees.
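One quick way to compare the two in practice is cross-validation on the same dataset. The sketch below uses a synthetic dataset (an assumption for illustration); on real data, results depend heavily on tuning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic dataset purely for a side-by-side check
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

rf_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42), X, y, cv=5)
gb_scores = cross_val_score(
    GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                               random_state=42), X, y, cv=5)

print(f"Random Forest:     {rf_scores.mean():.3f}")
print(f"Gradient Boosting: {gb_scores.mean():.3f}")
```

Running both under the same cross-validation protocol keeps the comparison fair; with defaults, neither is guaranteed to win.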
Two Minute Drill
- Ensembles combine multiple models for better performance.
- Random Forest: bagging, reduces variance.
- Gradient Boosting: boosting, corrects errors sequentially.
- XGBoost is a fast, popular boosting library.
Need more clarification?
Drop us an email at career@quipoinfotech.com
