Q1. Scenario: After training a classifier on an imbalanced dataset, why is accuracy not a good metric? Which metric would you use instead?
Accuracy can be misleadingly high simply by always predicting the majority class. Better metrics: precision, recall, F1-score, or ROC-AUC. Use classification_report: from sklearn.metrics import classification_report; print(classification_report(y_test, y_pred)).
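A minimal sketch of the accuracy trap, assuming a synthetic dataset (the 95/5 class weights are illustrative, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Synthetic imbalanced dataset: ~95% class 0, ~5% class 1
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy can look high even if the minority class is poorly detected;
# the per-class precision/recall/F1 in the report reveal this.
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

The per-class rows of the report show how well the minority class is actually recovered, which overall accuracy hides.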
Q2. Scenario: For a regression problem, compute mean absolute error (MAE) and mean squared error (MSE) between true and predicted values.
from sklearn.metrics import mean_absolute_error, mean_squared_error; mae = mean_absolute_error(y_test, y_pred); mse = mean_squared_error(y_test, y_pred). MSE penalizes large errors more.
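A worked sketch with small hand-picked arrays (the values are illustrative) to show how MSE weights the larger error more heavily:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_test = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mae = mean_absolute_error(y_test, y_pred)  # mean of |error|
mse = mean_squared_error(y_test, y_pred)   # mean of error^2, penalizes outliers

print(mae, mse)  # 0.5 0.375
```

Note the last point has an error of 1.0: it contributes 1.0/4 to the MAE but 1.0/4 to the MSE as well, while the 0.5 errors shrink to 0.25 when squared, so large errors dominate MSE.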
Q3. Scenario: Plot the confusion matrix for a binary classifier and compute sensitivity and specificity.
from sklearn.metrics import confusion_matrix; cm = confusion_matrix(y_test, y_pred); tn, fp, fn, tp = cm.ravel(); sensitivity = tp/(tp+fn); specificity = tn/(tn+fp). Plot using seaborn heatmap.
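A small sketch with hand-picked labels (illustrative values) showing the unravel order of confusion_matrix for the binary case:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_test = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 0, 1, 1])

cm = confusion_matrix(y_test, y_pred)  # rows = true labels, cols = predictions
tn, fp, fn, tp = cm.ravel()            # binary case unravels in this order

sensitivity = tp / (tp + fn)  # true positive rate (recall)
specificity = tn / (tn + fp)  # true negative rate

print(sensitivity, specificity)  # 0.75 0.75
```

For the plot, sns.heatmap(cm, annot=True, fmt='d') gives the annotated matrix described in the answer.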
Q4. Scenario: Compute the ROC curve and AUC for a probabilistic classifier (e.g., LogisticRegression). Use predict_proba.
from sklearn.metrics import roc_curve, auc; y_prob = model.predict_proba(X_test)[:,1]; fpr, tpr, thresholds = roc_curve(y_test, y_prob); roc_auc = auc(fpr, tpr). Plot fpr vs tpr.
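A runnable sketch, assuming a synthetic dataset so the example is self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
print("AUC:", roc_auc)
```

To plot, pass fpr on the x-axis and tpr on the y-axis (e.g. plt.plot(fpr, tpr)); the diagonal from (0, 0) to (1, 1) marks chance-level performance, and AUC = 0.5 means no discrimination.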
Q5. Scenario: Compare multiple models using cross-validation scores. Use cross_val_score with scoring='neg_mean_squared_error' for regression.
from sklearn.model_selection import cross_val_score; scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error'); print(scores.mean(), scores.std()). Scores are negated MSE, so higher (less negative) is better.
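A comparison sketch on a synthetic regression problem (the two models and the noise level are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

results = {}
for model in (LinearRegression(), Ridge(alpha=1.0)):
    # scoring='neg_mean_squared_error' returns -MSE per fold,
    # so a less negative mean indicates a better model
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
    results[type(model).__name__] = (scores.mean(), scores.std())
    print(type(model).__name__, scores.mean(), scores.std())
```

scikit-learn negates error metrics so that every scorer follows the same "higher is better" convention; comparing both the mean and the standard deviation across folds guards against picking a model that only wins on one lucky split.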
