
Model Evaluation & Tuning

Evaluating and tuning models is crucial for building reliable ML systems. MLlib provides evaluators and cross‑validation tools for hyperparameter tuning.

Evaluators for Different Tasks

  • Regression: `RegressionEvaluator` (metrics: rmse, mae, r2).
  • Binary classification: `BinaryClassificationEvaluator` (areaUnderROC, areaUnderPR).
  • Multiclass classification: `MulticlassClassificationEvaluator` (accuracy, f1, weightedPrecision, weightedRecall).
  • Clustering: `ClusteringEvaluator` (silhouette score).
```python
from pyspark.ml.evaluation import RegressionEvaluator

# Assumes `predictions` is the DataFrame returned by a fitted
# regression model's transform(), with "label" and "prediction" columns.
evaluator = RegressionEvaluator(labelCol="label",
                                predictionCol="prediction",
                                metricName="rmse")
rmse = evaluator.evaluate(predictions)
```

Train‑Validation Split and Cross‑Validation

`TrainValidationSplit` performs a single train‑validation split for hyperparameter tuning. For k‑fold cross‑validation, use `CrossValidator`, which trains k models per parameter combination and is correspondingly more expensive.
```python
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit, CrossValidator

paramGrid = (ParamGridBuilder()
             .addGrid(lr.regParam, [0.01, 0.1, 1.0])
             .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])
             .build())

tvs = TrainValidationSplit(estimator=lr,
                           estimatorParamMaps=paramGrid,
                           evaluator=evaluator,
                           trainRatio=0.8)
cv_model = tvs.fit(train)
```

Best Model and Parameters

```python
best_model = cv_model.bestModel

# TrainValidationSplitModel records one validation metric per grid point;
# with rmse, lower is better, so the best index minimizes the metric.
metrics = cv_model.validationMetrics
best_index = metrics.index(min(metrics))
best_params = cv_model.getEstimatorParamMaps()[best_index]
```


Two Minute Drill
  • Evaluators measure performance for regression, classification, clustering.
  • `ParamGridBuilder` defines hyperparameter search space.
  • `TrainValidationSplit` and `CrossValidator` automate tuning.
  • Always tune on training data; evaluate final model on test set.
