Histograms and Scatter Plots
Histograms show the distribution of a single variable (e.g., age of customers). Scatter plots reveal relationships between two variables (e.g., height vs. weight). Both are critical for understanding data before modeling.
Histogram
import matplotlib.pyplot as plt
import numpy as np
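# Draw 1,000 samples from a standard normal distribution
data = np.random.randn(1000)
# Group the values into 30 equal-width bins; black edges separate the bars
plt.hist(data, bins=30, edgecolor='black')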
plt.title('Histogram of Normally Distributed Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
Scatter Plot
x = np.random.rand(50) * 10  # 50 values drawn uniformly from [0, 10)
y = 2 * x + 1 + np.random.randn(50) * 2  # y = 2x + 1 plus Gaussian noise (std 2)
plt.scatter(x, y, alpha=0.7)
plt.title('Scatter Plot with Linear Trend')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Why These Plots Matter for AI
- Histogram: Check whether a variable is roughly normally distributed (some models and statistical tests assume this), and spot skewness or outliers.
- Scatter plot: Visualize how a feature relates to the target, and spot non-linear relationships that a correlation coefficient alone can miss.
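Visual checks like these can be backed up with a few summary numbers. The sketch below is one possible way to do that with NumPy only: `skewness` and `iqr_outliers` are hypothetical helper names, the IQR rule (Tukey's fences at 1.5 times the interquartile range) is one common outlier heuristic among several, and `np.corrcoef` computes the Pearson correlation. The last part shows why the scatter plot still matters: a perfectly determined but U-shaped relationship has a correlation of about zero.

```python
import numpy as np

def skewness(values):
    """Hypothetical helper: biased sample skewness, the mean of cubed z-scores."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3))

def iqr_outliers(values, k=1.5):
    """Hypothetical helper: values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return values[(values < q1 - k * iqr) | (values > q3 + k * iqr)]

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
symmetric = rng.normal(size=10_000)
right_skewed = rng.exponential(size=10_000)
print(f'skewness, normal sample:      {skewness(symmetric):+.2f}')     # near 0
print(f'skewness, exponential sample: {skewness(right_skewed):+.2f}')  # clearly positive

# Two extreme values appended to 200 normal samples; the IQR rule should flag
# them (and may also flag a few ordinary points in the tails)
with_outliers = np.concatenate([rng.normal(size=200), [8.0, -9.0]])
print('flagged as outliers:', iqr_outliers(with_outliers))

# Pearson correlation captures linear trends but can miss curved ones
x = np.linspace(0, 10, 51)
y_linear = 2 * x + 1 + rng.normal(scale=2, size=x.size)
y_curved = (x - 5) ** 2  # exact relationship, symmetric around x = 5
r_linear = np.corrcoef(x, y_linear)[0, 1]
r_curved = np.corrcoef(x, y_curved)[0, 1]
print(f'r(linear) = {r_linear:.2f}')  # strong positive
print(f'r(curved) = {r_curved:.2f}')  # about 0 despite a perfect relationship
```

A scatter plot of `x` against `y_curved` would show the U shape immediately, while the correlation coefficient alone suggests there is no relationship at all.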
Customizing Histogram Bins
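The effect of `bins` and `density=True` is easiest to see in the numbers behind the bars. The sketch below uses NumPy's `np.histogram`, which computes the same equal-width bins that `plt.hist` uses by default; the seeded generator and the sample size are illustrative choices, not part of the original example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed so the numbers are reproducible
data = rng.normal(size=1000)

# More bins means narrower bins; the counts always add up to the sample size
for bins in (10, 30, 50):
    counts, edges = np.histogram(data, bins=bins)
    width = edges[1] - edges[0]
    print(f'bins={bins}: bin width={width:.3f}, total count={counts.sum()}')

# density=True divides each count by (sample size * bin width),
# so bar height * bar width, summed over all bars, equals 1
density, edges = np.histogram(data, bins=50, density=True)
area = np.sum(density * np.diff(edges))
print(f'total area with density=True: {area:.3f}')  # 1.000
```

Too few bins can hide features such as a second peak, while too many make the shape noisy, so it is usually worth trying a few values.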
plt.hist(data, bins=50, density=True, alpha=0.6, color='g')  # density=True: bar areas sum to 1
plt.show()
Scatter Plot with Color Mapping
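colors = np.random.rand(50)  # one value per point, mapped to a color; in practice, use a meaningful third variable
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()  # shows which color corresponds to which value
plt.show()
Two Minute Drill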
- Histogram: plt.hist(data, bins=N) shows the distribution.
- Scatter plot: plt.scatter(x, y) shows the relationship.
- Use both for EDA (Exploratory Data Analysis) before modeling.
- Customize bins, colors, and transparency for clarity.
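As one way to apply this checklist, the sketch below draws a histogram of each feature and a scatter plot of each feature against the target in a single figure. The helper name `quick_eda`, its arguments, and the synthetic features `x1` and `x2` are hypothetical, chosen only for illustration; the plotting itself uses standard matplotlib calls (`plt.subplots`, `Axes.hist`, `Axes.scatter`).

```python
import matplotlib.pyplot as plt
import numpy as np

def quick_eda(features, target, bins=30):
    """Hypothetical helper: one histogram per feature (top row) and
    one scatter plot of that feature against the target (bottom row)."""
    n = len(features)
    # squeeze=False keeps `axes` 2-D even when there is only one feature
    fig, axes = plt.subplots(2, n, figsize=(4 * n, 6), squeeze=False)
    for i, (name, values) in enumerate(features.items()):
        axes[0, i].hist(values, bins=bins, edgecolor='black')
        axes[0, i].set_title(f'Distribution of {name}')
        axes[1, i].scatter(values, target, alpha=0.7)
        axes[1, i].set_xlabel(name)
        axes[1, i].set_ylabel('target')
    fig.tight_layout()
    return fig

# Synthetic example: the target depends on both features plus noise
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.uniform(0, 10, size=200)
y = 3 * x1 + 0.5 * x2 + rng.normal(size=200)

fig = quick_eda({'x1': x1, 'x2': x2}, y)
plt.show()
```

Returning the figure instead of showing it inside the helper leaves the caller free to save it with `fig.savefig(...)` or adjust it further.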
