Classification Project

In this end‑to‑end project, you will build a spam detection system using logistic regression. You will handle text data, vectorize messages, train a classifier, and evaluate performance.

Project: Classify SMS messages as spam or ham (not spam) using logistic regression.

Step 1: Load Data

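We will use a public SMS spam collection dataset. The file is tab-separated with no header row, so we pass sep='\t' and name the two columns ourselves.
import pandas as pd

url = 'https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv'
# '\t' (with the backslash) splits on tabs; sep='t' would split on the letter t
df = pd.read_csv(url, sep='\t', header=None, names=['label', 'message'])
print(df.head())
print(df['label'].value_counts())  # class balance: ham vs spam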
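To check the parsing options offline, here is a minimal sketch that reads a hypothetical three-message sample from a string (via io.StringIO) instead of the real URL. The sample text and the name df_sample are illustrative assumptions; only the read_csv arguments mirror the step above.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as sms.tsv: label <TAB> message, no header row
sample = "ham\tSee you at lunch\nspam\tWIN a free prize now\nham\tCall me later\n"

df_sample = pd.read_csv(io.StringIO(sample), sep='\t', header=None, names=['label', 'message'])
print(df_sample)
print(df_sample['label'].value_counts())
```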

Step 2: Preprocessing – Convert Text to Numbers

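Use TfidfVectorizer to turn each message into a sparse vector of TF‑IDF weights. stop_words='english' drops common English words, and max_features=3000 keeps only the 3,000 most frequent remaining terms. The labels are mapped to 0 (ham) and 1 (spam).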
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(stop_words='english', max_features=3000)
X = vectorizer.fit_transform(df['message'])
y = df['label'].map({'ham':0, 'spam':1})

Step 3: Train/Test Split

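Split the raw messages first and fit the vectorizer on the training split only, so the test messages cannot influence the vocabulary or the IDF weights.
from sklearn.model_selection import train_test_split

msg_train, msg_test, y_train, y_test = train_test_split(df['message'], y, test_size=0.2, random_state=42)
X_train = vectorizer.fit_transform(msg_train)  # learn vocabulary and IDF from training messages only
X_test = vectorizer.transform(msg_test)        # reuse them on the held-out messages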
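Spam is the minority class in this dataset, so a purely random split can leave the test set with a noticeably different spam ratio. One option, not used in the original step, is to pass stratify=y to train_test_split. A sketch with hypothetical labels (15 ham, 5 spam) and an index array standing in for the messages:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 15 ham (0) and 5 spam (1), i.e. 25% spam
labels = np.array([0] * 15 + [1] * 5)
idx = np.arange(len(labels))  # stand-in for the messages

_, _, y_tr, y_te = train_test_split(idx, labels, test_size=0.2, random_state=42, stratify=labels)

# Both splits keep the original 25% spam share
print("spam share, train:", y_tr.mean())
print("spam share, test:", y_te.mean())
```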

Step 4: Train Logistic Regression

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

Step 5: Evaluate

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred):.3f}")
print(f"Recall: {recall_score(y_test, y_pred):.3f}")
print(f"F1: {f1_score(y_test, y_pred):.3f}")
print(confusion_matrix(y_test, y_pred))
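For binary labels [0, 1], scikit-learn lays out the confusion matrix as [[TN, FP], [FN, TP]], and precision, recall and F1 follow directly from those four counts. A worked sketch on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical true and predicted labels (1 = spam, 0 = ham)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of the messages flagged as spam, the share that really were spam
recall = tp / (tp + fn)     # of the real spam, the share that was caught
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)         # 5 1 1 3
print(precision, recall, f1)  # 0.75 0.75 0.75
```

For a spam filter, FP (ham sent to the spam folder) is usually the costlier mistake, which is why precision is worth watching alongside accuracy.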

Step 6: Save Model and Vectorizer

import joblib

joblib.dump(model, 'spam_model.joblib')
joblib.dump(vectorizer, 'vectorizer.joblib')


Two Minute Drill
  • Load SMS data, convert text to TF‑IDF features.
  • Train logistic regression classifier.
  • Evaluate using accuracy, precision, recall, F1.
  • Save model and vectorizer for inference.
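As an alternative packaging (not the method used in the steps above), the vectorizer and classifier can be chained in a scikit-learn Pipeline. Calling fit on the pipeline fits the vectorizer on the training messages only, and the whole pipeline can be saved as a single joblib file. A sketch with a hypothetical eight-message corpus standing in for df['message'] and y:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (1 = spam, 0 = ham)
texts = ["win free prize now", "free cash offer today", "claim your free reward",
         "see you at lunch", "call me later tonight", "running late, start without me",
         "free entry win cash", "thanks for the notes"]
labels = np.array([1, 1, 1, 0, 0, 0, 1, 0])

msg_train, msg_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels)

spam_clf = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english', max_features=3000)),
    ('logreg', LogisticRegression(max_iter=1000)),
])
spam_clf.fit(msg_train, y_train)  # the TF-IDF step sees only the training messages

print(spam_clf.predict(msg_test))
print(spam_clf.predict(["free cash prize"]))
```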
