Regression Project

In this end-to-end project, you will build a house price prediction system using linear regression. You will load and explore the data, preprocess it, train a model, evaluate it, and save it for later use.

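Project: Predict median house values for California districts from features such as median income, house age, average rooms and bedrooms, population, occupancy, and location (latitude and longitude).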

Step 1: Load and Explore Data

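We will use the California housing dataset, which scikit-learn provides through fetch_california_housing. The data is downloaded on first use and cached locally. The target, MedHouseVal, is the median house value in units of $100,000.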

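from sklearn.datasets import fetch_california_housing
import pandas as pd

# Load the dataset and put the features into a DataFrame
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['MedHouseVal'] = data.target  # add the target as a column

print(df.head())  # first five rows
df.info()         # column types and non-null counts; info() prints directly and returns None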
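Beyond head() and info(), a few more pandas calls are often useful during exploration. The sketch below is only an illustration: it uses a small made-up DataFrame (named toy here, an assumption) that borrows three column names from the dataset, so it runs without downloading anything. The same calls can be applied to df.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df: made-up values, NOT real California housing rows.
toy = pd.DataFrame({
    "MedInc": [8.3, 7.2, 5.6, 3.8, np.nan],
    "HouseAge": [41.0, 21.0, 52.0, 52.0, 30.0],
    "MedHouseVal": [4.5, 3.6, 3.4, 3.4, 2.0],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Count, mean, std, min, quartiles and max for each numeric column
summary = toy.describe()

# Linear correlation of every column with the target, strongest first
target_corr = toy.corr()["MedHouseVal"].sort_values(ascending=False)

print(missing_per_column)
print(summary)
print(target_corr)
```

On the real data, the missing-value counts should all be zero, and the correlation ranking gives a first hint of which features a linear model is likely to rely on.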

Step 2: Preprocessing

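Check for missing values (this dataset has none), then separate the features from the target.

print(df.isna().sum())  # every count should be 0

X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']

Step 3: Train/Test Split and Scaling

Split first, then fit the scaler on the training set only, so that no information from the test set leaks into preprocessing.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean and std from the training rows
X_test = scaler.transform(X_test)        # reuse the training statistics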
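To make the scaling step less of a black box, here is a rough NumPy sketch of what standardization computes: each feature has its mean subtracted and is divided by its standard deviation. The arrays are made-up toy values, and the names X_train_toy and X_test_toy are assumptions for illustration. The statistics are taken from the training rows only, which is a common way to keep test information out of preprocessing.

```python
import numpy as np

# Toy feature matrices (made-up numbers): 4 training rows, 2 test rows.
X_train_toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0], [4.0, 400.0]])
X_test_toy = np.array([[2.5, 250.0], [5.0, 500.0]])

# Per-feature statistics, computed from the training rows only
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_std = (X_train_toy - mean) / std
X_test_std = (X_test_toy - mean) / std  # test rows reuse the training statistics

print(X_train_std.mean(axis=0))  # approximately 0 for each feature
print(X_train_std.std(axis=0))   # approximately 1 for each feature
print(X_test_std)
```

After scaling, both toy features live on the same scale even though the raw values differ by a factor of 100.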

Step 4: Train Linear Regression

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)
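As an alternative way to organise the same idea, scaling and regression can be bundled into a single scikit-learn Pipeline. This is a sketch, not the method used in this project: it runs on synthetic data from make_regression rather than the housing data, and the variable names (X_demo, pipe, and so on) are assumptions chosen for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the housing data), so the sketch runs offline.
X_demo, y_demo = make_regression(n_samples=200, n_features=4, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# fit() fits the scaler on X_tr, then fits the regression on the scaled X_tr
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

# score() scales X_te with the training statistics and returns R²
test_r2 = pipe.score(X_te, y_te)

# One coefficient per feature; with standardized inputs they are comparable in size
coefs = pipe.named_steps["linearregression"].coef_

print(f"Test R²: {test_r2:.3f}")
print(coefs)
```

A pipeline like this keeps the scaler and the model together, so there is only one object to fit, evaluate, and save.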

Step 5: Evaluate

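import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test)
# The squared=False argument is deprecated and has been removed in recent
# scikit-learn releases, so take the square root of the MSE explicitly.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}")  # same units as the target ($100,000)
print(f"R²: {r2:.2f}")      # share of target variance explained by the model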
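To see what these two metrics measure, they can be computed by hand. The sketch below uses four made-up targets and predictions (y_true and y_hat are illustrative names, not part of the project code). RMSE is the square root of the mean squared residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean.

```python
import numpy as np

y_true = np.array([3.0, 2.0, 4.0, 1.0])  # made-up targets
y_hat = np.array([2.5, 2.0, 5.0, 1.5])   # made-up predictions

residuals = y_true - y_hat

# RMSE: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean(residuals ** 2))

# R²: 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(f"RMSE: {rmse_manual:.3f}")  # RMSE: 0.612
print(f"R²: {r2_manual:.3f}")      # R²: 0.700
```

An R² of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets.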

Step 6: Save Model and Scaler

import joblib

joblib.dump(model, 'house_price_model.joblib')
joblib.dump(scaler, 'scaler.joblib')
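Saved files are only useful if they can be loaded again. The following sketch shows one possible round trip with joblib.load: it trains a tiny model on made-up numbers, saves the model and scaler to a temporary directory, loads them back, and checks that the predictions match. The data and the names ending in _demo are assumptions for illustration. New inputs must pass through the same scaler before reaching the model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Tiny made-up training set so the sketch runs on its own.
X_small = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 15.0], [4.0, 30.0]])
y_small = np.array([1.0, 2.0, 2.5, 4.0])

scaler_demo = StandardScaler().fit(X_small)
model_demo = LinearRegression().fit(scaler_demo.transform(X_small), y_small)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "house_price_model.joblib")
    scaler_path = os.path.join(tmp, "scaler.joblib")

    # Save both objects, then load them back as a later program would
    joblib.dump(model_demo, model_path)
    joblib.dump(scaler_demo, scaler_path)
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Scale new rows with the loaded scaler before predicting
new_rows = np.array([[2.5, 18.0]])
pred_before = model_demo.predict(scaler_demo.transform(new_rows))
pred_after = loaded_model.predict(loaded_scaler.transform(new_rows))

print(pred_before, pred_after)
```

Because joblib files are based on pickle, load them only from sources you trust, and ideally with the same scikit-learn version that was used to save them.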

Two Minute Drill

Load data, explore, scale features.
Train linear regression model.
Evaluate with RMSE and R².
Save model and scaler for future use.
