Training a Simple Model Interview Questions

Q1. Scenario: Train a logistic regression classifier on the iris dataset to predict species. Use only two features (sepal length, petal length) for simplicity.

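from sklearn.datasets import load_iris; from sklearn.linear_model import LogisticRegression; from sklearn.model_selection import train_test_split; iris = load_iris(); X = iris.data[:, [0, 2]]; y = iris.target; X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42); model = LogisticRegression(); model.fit(X_train, y_train); predictions = model.predict(X_test). Columns 0 and 2 of iris.data are sepal length and petal length.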
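
The one-line answer can be expanded into a short script. This is a minimal sketch, not a definitive solution: the 25% test split, the stratify=y option, random_state=42, and max_iter=200 are assumptions added here for reproducibility and are not part of the original question.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, [0, 2]]  # columns 0 and 2: sepal length, petal length
y = iris.target

# Assumed split: hold out 25% of the rows, keeping class proportions equal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# max_iter=200 is an assumed setting that gives the solver room to converge
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print("Test accuracy:", accuracy)
```

Predicting on held-out rows rather than on the training rows gives an honest estimate of how the classifier generalizes.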

Q2. Scenario: Train a K-Nearest Neighbors regressor on the diabetes dataset. Set n_neighbors=5. Then predict on test set.

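from sklearn.neighbors import KNeighborsRegressor; knn = KNeighborsRegressor(n_neighbors=5); knn.fit(X_train, y_train); y_pred = knn.predict(X_test). Here X_train, X_test, and y_train are assumed to come from a train_test_split on the diabetes data.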
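
The answer takes the train/test arrays as given. A fuller sketch, assuming the data comes from load_diabetes and an 80/20 split (both choices are assumptions, as is the use of r2_score for a quick check), might look like this:

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)

# Assumed split: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train, y_train)

# Each prediction is the mean target of the 5 nearest training points
y_pred = knn.predict(X_test)
print("R^2 on test set:", r2_score(y_test, y_pred))
```

Because each prediction is an average of training targets, KNN regression never predicts outside the range of y_train.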

Q3. Scenario: Train a Decision Tree Classifier on the breast cancer dataset. Evaluate training and testing accuracy. Check for overfitting.

from sklearn.tree import DecisionTreeClassifier; dt = DecisionTreeClassifier(random_state=42); dt.fit(X_train, y_train); train_acc = dt.score(X_train, y_train); test_acc = dt.score(X_test, y_test). If train_acc is much higher than test_acc, the tree is overfitting. Limit its depth with max_depth to reduce the gap.
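
The overfitting check can be sketched end to end by comparing an unrestricted tree with a depth-limited one. The 80/20 split and max_depth=3 are illustrative assumptions, not values from the original answer.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Unrestricted tree: grows until every training leaf is pure
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
train_acc = dt.score(X_train, y_train)
test_acc = dt.score(X_test, y_test)
print(f"full tree: train={train_acc:.3f} test={test_acc:.3f}")

# Depth-limited tree (max_depth=3 is an assumed value for illustration)
shallow = DecisionTreeClassifier(max_depth=3, random_state=42)
shallow.fit(X_train, y_train)
shallow_train_acc = shallow.score(X_train, y_train)
shallow_test_acc = shallow.score(X_test, y_test)
print(f"max_depth=3: train={shallow_train_acc:.3f} test={shallow_test_acc:.3f}")
```

A smaller gap between training and test accuracy for the shallow tree suggests that limiting depth reduced overfitting. Whether test accuracy itself improves depends on the data and the split.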

Q4. Scenario: Train a Random Forest classifier on the digits dataset and print feature importance.

from sklearn.ensemble import RandomForestClassifier; rf = RandomForestClassifier(n_estimators=100); rf.fit(X_train, y_train); importances = rf.feature_importances_; print(importances). Sorting the feature indices by importance (for example with numpy's argsort) shows the most important pixels.
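
A runnable sketch of the ranking step follows. The use of np.argsort, the top-10 cutoff, the split, and random_state=42 are assumptions added for illustration. Each of the 64 features in the digits dataset is one pixel of an 8x8 image, so an index can be mapped back to a (row, column) position.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

importances = rf.feature_importances_   # one value per pixel, sums to 1
indices = np.argsort(importances)[::-1]  # most important pixel first

# Print the ten most important pixels with their 8x8 grid position
for rank, idx in enumerate(indices[:10], start=1):
    row, col = divmod(int(idx), 8)
    print(f"{rank:2d}. pixel ({row}, {col}) importance={importances[idx]:.4f}")
```

Pixels along the image border tend to rank low because they are almost always blank in this dataset.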

Q5. Scenario: Use a pipeline to scale data then fit a Support Vector Classifier. Avoid data leakage.

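from sklearn.pipeline import Pipeline; from sklearn.preprocessing import StandardScaler; from sklearn.svm import SVC; pipeline = Pipeline([('scaler', StandardScaler()), ('svc', SVC())]); pipeline.fit(X_train, y_train); y_pred = pipeline.predict(X_test). Because the scaler is a step inside the pipeline, it is fit on the training data only, so no test-set statistics leak into training.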
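
The leakage point can be checked directly. The sketch below assumes the breast cancer dataset (the question does not name one), an 80/20 split, and 5-fold cross-validation. It confirms that the scaler's learned mean matches the training data alone.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed dataset: the original question does not specify one
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

pipeline = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])
pipeline.fit(X_train, y_train)   # scaler statistics come from X_train only
y_pred = pipeline.predict(X_test)  # X_test is scaled with those same statistics
test_acc = pipeline.score(X_test, y_test)

# The fitted scaler's mean equals the training mean, not the full-data mean
scaler_mean = pipeline.named_steps["scaler"].mean_
print("Scaler fit on train only:", np.allclose(scaler_mean, X_train.mean(axis=0)))

# In cross-validation the pipeline is cloned and refit per fold,
# so each fold's scaler sees only that fold's training portion
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
print("CV accuracy:", cv_scores.mean())
```

Scaling the full dataset before splitting would let test-set means and variances influence training, which is the leakage the pipeline avoids.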
