Regression Project
In this end-to-end project, you will build a house price prediction system using linear regression. You will load and explore the data, preprocess and scale the features, train and evaluate a model, and save it for later use.
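Project: Predict median house values for California districts from features such as median income, house age, average rooms and bedrooms, population, and location (latitude and longitude).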
Step 1: Load and Explore Data
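We will use the California housing dataset, which scikit-learn provides through fetch_california_housing (the data is downloaded and cached the first time you call it). Each row is a census block group, and the target, MedHouseVal, is the median house value in units of $100,000.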
from sklearn.datasets import fetch_california_housing
import pandas as pd
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['MedHouseVal'] = data.target
print(df.head())
df.info()  # prints column types and non-null counts itself; it returns None, so no print() is needed
Step 2: Preprocessing
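Scaling in this project uses StandardScaler. It rescales each column to z = (x - mean) / std, using the per-column mean and population standard deviation it learns in fit. The minimal sketch below uses a small made-up array rather than the housing data. It checks that transform matches this formula, and it shows that rows passed to transform are scaled with the statistics learned during fitting, not with their own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for training and test feature matrices
X_train_toy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test_toy = np.array([[5.0, 50.0]])

# fit() learns per-column mean_ and scale_ (std with ddof=0) from the training rows only
scaler_toy = StandardScaler().fit(X_train_toy)

# Manual standardization of the test row using the TRAINING statistics
mu = X_train_toy.mean(axis=0)
sigma = X_train_toy.std(axis=0)  # population std, matching StandardScaler
manual = (X_test_toy - mu) / sigma

print(scaler_toy.mean_, scaler_toy.scale_)
print(np.allclose(scaler_toy.transform(X_test_toy), manual))  # True
```

Fitting the scaler on the test rows as well would let test-set statistics influence the transformation the model is trained on. That is why the statistics should come from the training split alone.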
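Check for missing values (this dataset has none) and separate the features from the target. The features are scaled in the next step, after the train/test split. This way the scaler learns its statistics from the training data only, and no information from the test set leaks into training.
print(df.isnull().sum())  # every count should be 0
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']
Step 3: Train/Test Split and Scaling
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training set only
X_test = scaler.transform(X_test)        # reuse the training mean and std
Step 4: Train Linear Regression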
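LinearRegression fits ordinary least squares (OLS). It picks the intercept and coefficients that minimize the sum of squared residuals between predictions and targets. The sketch below uses synthetic data with known coefficients (an assumption for illustration, not the housing data). It checks that NumPy's least-squares solver and LinearRegression agree. It also checks that standardizing the features changes the coefficients but not the OLS predictions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data: y = 3 + 2*x1 - 1*x2 + small Gaussian noise
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 2))
y_syn = 3.0 + 2.0 * X_syn[:, 0] - 1.0 * X_syn[:, 1] + rng.normal(scale=0.1, size=200)

# OLS with NumPy: prepend a column of ones so the first coefficient is the intercept
X_design = np.column_stack([np.ones(len(X_syn)), X_syn])
beta, *_ = np.linalg.lstsq(X_design, y_syn, rcond=None)

# scikit-learn solves the same least-squares problem
lr = LinearRegression().fit(X_syn, y_syn)
print(beta)                     # [intercept, coef for x1, coef for x2]
print(lr.intercept_, lr.coef_)
print(np.allclose(beta, np.r_[lr.intercept_, lr.coef_]))  # True

# Standardizing the features changes the coefficients but not the OLS predictions
X_std = (X_syn - X_syn.mean(axis=0)) / X_syn.std(axis=0)
lr_std = LinearRegression().fit(X_std, y_syn)
print(lr_std.coef_)
print(np.allclose(lr.predict(X_syn), lr_std.predict(X_std)))  # True
```

For plain OLS, then, scaling mainly makes the coefficients comparable across features: each standardized coefficient measures the effect of a one-standard-deviation change in that feature.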
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)  # learns model.intercept_ and model.coef_
Step 5: Evaluate
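RMSE is the square root of the mean squared residual, so it is expressed in the target's units. R² = 1 - SS_res / SS_tot compares the model's squared error with that of always predicting the mean of the true values. The sketch below computes both by hand on a few made-up values (not output from the housing model) and compares them with scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true targets and predictions
y_true = np.array([3.0, 2.5, 4.0, 5.0])
y_hat = np.array([2.8, 2.7, 3.6, 5.4])

residuals = y_true - y_hat
rmse_manual = np.sqrt(np.mean(residuals ** 2))   # root mean squared error
ss_res = np.sum(residuals ** 2)                  # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2_manual = 1.0 - ss_res / ss_tot

print(rmse_manual, np.sqrt(mean_squared_error(y_true, y_hat)))
print(r2_manual, r2_score(y_true, y_hat))
```

An R² of 1 means perfect predictions, and 0 means no better than predicting the mean. On test data, R² can be negative when the model does worse than that baseline.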
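import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
y_pred = model.predict(X_test)
# Take the square root of MSE; the old squared=False argument was deprecated in scikit-learn 1.4 and later removed
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # in the target's units ($100,000s)
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.2f}")
Step 6: Save Model and Scaler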
import joblib
joblib.dump(model, 'house_price_model.joblib')
joblib.dump(scaler, 'scaler.joblib')
Two Minute Drill
- Load data, explore, scale features.
- Train linear regression model.
- Evaluate with RMSE and R².
- Save model and scaler for future use.
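To use the saved files later, reload them with joblib.load and pass new inputs through the same fitted scaler before predicting. The sketch below is self-contained, so it trains a tiny stand-in model on made-up data and writes it to a temporary directory. The file names demo_model.joblib and demo_scaler.joblib are hypothetical. In the project itself you would load house_price_model.joblib and scaler.joblib instead, and supply rows with the same eight feature columns, in the same order, used during training.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data and artifacts so the sketch runs on its own
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
scaler_demo = StandardScaler().fit(X_demo)
model_demo = LinearRegression().fit(scaler_demo.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, 'demo_model.joblib')
    scaler_path = os.path.join(tmp, 'demo_scaler.joblib')
    joblib.dump(model_demo, model_path)
    joblib.dump(scaler_demo, scaler_path)

    # Later, e.g. in a separate script: reload both artifacts
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

    # New inputs must be transformed by the SAME fitted scaler before predicting
    X_new = np.array([[2.5, 2.5]])
    pred = loaded_model.predict(loaded_scaler.transform(X_new))

print(pred)
```

The project saves both files because the model on its own cannot reproduce the training-time scaling. Without the fitted scaler, there would be no reliable way to prepare new inputs for it.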
