Decision Trees
Decision trees model decisions by splitting data based on feature values, forming a tree structure. They are easy to interpret and can handle both numerical and categorical data.
A decision tree asks a series of yes/no questions to reach a prediction.
How a Decision Tree Works
- Start at the root node with the entire dataset.
- Choose the best feature and threshold to split on, based on an impurity measure such as Gini or entropy.
- Repeat recursively on each child node until a stopping criterion is met (e.g. maximum depth or minimum samples per leaf).
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(max_depth=5, random_state=42)
dt.fit(X_train, y_train)

Splitting Criteria
- Gini impurity: measures how often a randomly chosen element would be misclassified.
- Entropy: measures disorder/information gain.
- Both usually produce similar trees; Gini is slightly cheaper to compute (no logarithm), while entropy can favor more balanced splits.
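As a sketch, both measures can be computed directly from the class proportions at a node (the helper functions below are illustrative, not part of scikit-learn's API):

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p_k * log2(p_k)) over class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = ["a", "a", "b", "b"]
print(gini(labels))     # 0.5 for a 50/50 split
print(entropy(labels))  # 1.0 bit for a 50/50 split
```

A pure node (all labels identical) scores 0 under both measures; the tree picks the split that reduces impurity the most.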
Controlling Overfitting
Decision trees easily overfit. Use parameters:
- max_depth: limit tree depth.
- min_samples_split: minimum samples required to split a node.
- min_samples_leaf: minimum samples required at a leaf.
- max_features: limit the number of features considered for each split.
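A minimal sketch of the effect, using a synthetic dataset (the data and variable names here are illustrative): an unconstrained tree typically fits the training set perfectly, while a regularized one trades a little training accuracy for better generalization.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset, purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Unconstrained tree: grows until leaves are pure, memorizing the training set
deep = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Regularized tree: limited depth and leaf size reduce overfitting
pruned = DecisionTreeClassifier(
    max_depth=4,
    min_samples_leaf=5,
    random_state=42,
).fit(X_train, y_train)

print("deep   train/test:", deep.score(X_train, y_train), deep.score(X_test, y_test))
print("pruned train/test:", pruned.score(X_train, y_train), pruned.score(X_test, y_test))
```

Comparing train and test accuracy for the two models is a quick way to see whether the constraints are helping on your own data.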
Visualizing a Tree
You can draw a fitted tree with `sklearn.tree.plot_tree` – helpful when explaining the model's decisions.
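For example, on the Iris dataset (chosen purely for illustration; the output filename is arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
dt = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

# Each box shows the split condition, impurity, sample counts and majority class
plt.figure(figsize=(10, 6))
plot_tree(dt, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.savefig("tree.png")
```

Keeping `max_depth` small makes the plotted tree readable; deep trees render as an unreadable thicket of boxes.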
Two Minute Drill
- Decision trees split data based on feature values.
- Use Gini or entropy to choose best split.
- Prone to overfitting – limit depth or leaf size.
- Easy to visualize and interpret.
Need more clarification?
Drop us an email at career@quipoinfotech.com
