
Training a Simple Model: Interview Questions

Q1. Train a logistic regression classifier on the iris dataset to predict species. Use only two features (sepal length, petal length) for simplicity.
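from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
iris = load_iris()
X = iris.data[:, [0, 2]]  # column 0 = sepal length, column 2 = petal length
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)  # fit on training data only
predictions = model.predict(X_test)  # predict on unseen test data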
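Predictions alone do not say how good the model is. A minimal sketch, assuming a stratified 80/20 split with random_state=42 (choices not in the original), that scores the two-feature model on held-out data and shows the per-class probabilities logistic regression produces:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, [0, 2]]  # sepal length (cm), petal length (cm)
y = iris.target

# Stratify so each species keeps its share in both splits (assumed choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)
acc = accuracy_score(y_test, predictions)

# One probability per species for each test flower; each row sums to 1
proba = model.predict_proba(X_test)
print(f"Test accuracy: {acc:.3f}")
print("First test flower:", dict(zip(iris.target_names, proba[0].round(3))))
```

The predicted class is simply the column of predict_proba with the highest probability.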

Q2. Train a K-Nearest Neighbors regressor on the diabetes dataset with n_neighbors=5. Then predict on the test set.
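from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNeighborsRegressor(n_neighbors=5)  # predict from the 5 closest training points
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)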
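To see what the regressor actually computes, the sketch below recovers each prediction by hand and then scores the model. It assumes an 80/20 split with random_state=42 and uses RMSE and R^2 as illustrative metrics; with the default weights="uniform", a KNN prediction is the plain mean of the neighbors' targets.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

knn = KNeighborsRegressor(n_neighbors=5)  # default weights="uniform"
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Indices of the 5 nearest training points for every test row
# (Euclidean distance by default), shape (n_test, 5)
_, neighbor_idx = knn.kneighbors(X_test)
# Uniform weights: prediction = mean target of those neighbors
manual_pred = y_train[neighbor_idx].mean(axis=1)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print("Manual average matches predict():", np.allclose(manual_pred, y_pred))
print(f"RMSE: {rmse:.1f}  R^2: {r2:.3f}")
```

Because KNN relies on distances, features on very different scales would normally need scaling first; the diabetes features in scikit-learn are already mean-centered and scaled.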

Q3. Train a Decision Tree Classifier on the breast cancer dataset. Evaluate training and testing accuracy. Check for overfitting.
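from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
dt = DecisionTreeClassifier(random_state=42)  # no depth limit: grows until leaves are pure
dt.fit(X_train, y_train)
train_acc = dt.score(X_train, y_train)  # mean accuracy on the training data
test_acc = dt.score(X_test, y_test)  # mean accuracy on held-out data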
If train_acc is much higher than test_acc, the model is overfitting; an unrestricted tree often scores close to 1.0 on its training data. Limit its growth with max_depth.
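One way to act on that check is to compare the train/test gap for a few depth limits. A sketch, assuming a stratified 80/20 split and an arbitrary set of candidate depths; in practice max_depth would be chosen with cross-validation rather than by inspecting the test set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

results = {}
for depth in [None, 2, 3, 5]:  # None = grow until leaves are pure
    dt = DecisionTreeClassifier(max_depth=depth, random_state=42)
    dt.fit(X_train, y_train)
    train_acc = dt.score(X_train, y_train)
    test_acc = dt.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    # A large train-minus-test gap signals overfitting
    print(f"max_depth={depth}: train={train_acc:.3f} "
          f"test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

A shallower tree usually trades a little training accuracy for a smaller gap.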

Q4. Train a Random Forest classifier on the digits dataset and print the feature importances.
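from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X, y = load_digits(return_X_y=True)  # 64 features = 8x8 pixel intensities
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
importances = rf.feature_importances_  # one impurity-based score per pixel, summing to 1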
Sorting the feature indices in descending order of importance (importances.argsort()[::-1]) shows which pixels matter most.
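A sketch of the printing step, assuming an 80/20 split, random_state=42, and that the top 10 pixels are enough to report (all choices not in the original). Since each digits feature is one cell of an 8x8 image, divmod(index, 8) converts a feature index to its (row, column) position.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # each image is 8x8 = 64 pixel features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
importances = rf.feature_importances_  # impurity-based, one value per pixel

# Indices of the 10 most important pixels, highest first
top10 = np.argsort(importances)[::-1][:10]
for idx in top10:
    row, col = divmod(int(idx), 8)  # feature index -> (row, column) in the image
    print(f"pixel {int(idx):2d} (row {row}, col {col}): {importances[idx]:.4f}")

# Reshape to the image grid to see where the informative pixels sit
print(np.round(importances.reshape(8, 8), 3))
```

Border pixels that are blank in every image get an importance of zero, so the informative region shows up in the center of the grid.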

Q5. Use a pipeline to scale the data and then fit a Support Vector Classifier, so that no information from the test set leaks into preprocessing.
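from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
X, y = load_breast_cancer(return_X_y=True)  # example dataset; the question names none
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pipeline = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])
pipeline.fit(X_train, y_train)  # scaler mean/std are computed from X_train only
y_pred = pipeline.predict(X_test)  # X_test is scaled with the training statistics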
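The pipeline matters most during model selection. A sketch, assuming the breast cancer dataset as example data (the question names none), an 80/20 split, and 5-fold cross-validation: cross_val_score refits the whole pipeline, scaler included, on each training fold, and after the final fit the scaler's mean_ equals the training-set column means, which confirms the test data played no part in scaling.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # example dataset (assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipeline = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Each fold clones and refits the pipeline, so the scaler never sees
# the fold it is scored on.
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Final fit on the training split only
pipeline.fit(X_train, y_train)
scaler = pipeline.named_steps["scaler"]
print("Scaler mean equals X_train mean:", np.allclose(scaler.mean_, X_train.mean(axis=0)))
print(f"Test accuracy: {pipeline.score(X_test, y_test):.3f}")
```

Scaling X before splitting, or before cross-validation, would let held-out rows influence the mean and standard deviation; wrapping the scaler in the pipeline rules that out.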