Q1. Scenario: In a neural network, the loss function L depends on many weights, L(w1,w2,...,wn). How do you compute the effect of changing a single weight w1 on the loss while holding others constant?
Use the partial derivative ∂L/∂w1. It measures the instantaneous rate of change of L with respect to w1 alone. In backpropagation, you compute partial derivatives of the loss with respect to each weight (via the chain rule) and then update weights in the opposite direction of the gradient (vector of all partials).
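A minimal sketch of this idea (the quadratic loss below is a made-up stand-in, not an actual network loss): approximate ∂L/∂w1 with a central finite difference while holding the other weights fixed, which is exactly what the partial derivative formalizes.

    import numpy as np

    # Hypothetical quadratic loss standing in for a network loss L(w1, w2, w3).
    def loss(w):
        return (w[0] - 3.0) ** 2 + 2.0 * (w[1] + 1.0) ** 2 + w[2] ** 2

    # Central finite difference for dL/dw_i with every other weight held fixed.
    def partial(f, w, i, h=1e-5):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += h
        w_minus[i] -= h
        return (f(w_plus) - f(w_minus)) / (2 * h)

    w = np.array([1.0, 0.5, -2.0])
    print(partial(loss, w, 0))  # ~ 2*(1 - 3) = -4.0, matching the analytic dL/dw1

In practice backpropagation computes these partials analytically via the chain rule rather than by finite differences, but the finite-difference check is a common way to verify gradient code.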
Q2. Scenario: The temperature on a metal plate is given by T(x,y)= 2x^2 + 3y^2. At point (2,1), in which direction does the temperature increase most rapidly? What is that rate?
The gradient vector ∇T = (∂T/∂x, ∂T/∂y) = (4x, 6y). At (2,1), ∇T = (8,6). The direction of steepest ascent is the gradient direction (8,6), and the rate is the magnitude √(8²+6²)=10. This is used in gradient ascent optimization (or descent for minimization).
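A short worked check of the numbers above (plain NumPy; the function names are just for illustration):

    import numpy as np

    # Gradient of T(x, y) = 2x^2 + 3y^2.
    def grad_T(x, y):
        return np.array([4.0 * x, 6.0 * y])

    g = grad_T(2.0, 1.0)        # array([8., 6.]) -- direction of steepest ascent
    rate = np.linalg.norm(g)    # 10.0 -- the maximum rate of increase
    unit = g / rate             # (0.8, 0.6) -- same direction as a unit vector
    print(g, rate, unit)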
Q3. Scenario: For the function f(x,y) = x²y - 4xy², compute the partial derivatives ∂f/∂x and ∂f/∂y. Then interpret what they mean at the point x=1, y=1.
∂f/∂x = 2xy - 4y², ∂f/∂y = x² - 8xy. At (1,1): ∂f/∂x = 2 - 4 = -2 and ∂f/∂y = 1 - 8 = -7. This means that near (1,1), f decreases at a rate of about 2 units per unit increase in x and about 7 units per unit increase in y (for small changes, holding the other variable fixed). In ML, these partials guide weight updates.
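A small sketch verifying the analytic partials against finite differences at (1,1) (helper names are illustrative):

    # f(x, y) = x^2*y - 4*x*y^2 and its analytic partials.
    def f(x, y):
        return x**2 * y - 4 * x * y**2

    def df_dx(x, y):
        return 2 * x * y - 4 * y**2

    def df_dy(x, y):
        return x**2 - 8 * x * y

    x, y, h = 1.0, 1.0, 1e-6
    print(df_dx(x, y), (f(x + h, y) - f(x - h, y)) / (2 * h))  # -2.0 vs ~-2.0
    print(df_dy(x, y), (f(x, y + h) - f(x, y - h)) / (2 * h))  # -7.0 vs ~-7.0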
Q4. Scenario: A machine learning engineer is tuning two hyperparameters: learning rate α and regularization λ. The cost function J(α,λ) is not convex. How can partial derivatives help decide whether to increase or decrease each hyperparameter?
At the current point (α,λ), estimate ∂J/∂α and ∂J/∂λ. If ∂J/∂α > 0, increasing α would increase cost, so decrease α; if ∂J/∂α < 0, increase α. The same logic applies to λ. Because J is not convex, this only guarantees a local improvement, not a global optimum. This is the idea behind gradient-based hyperparameter tuning (hypergradient methods); derivative-free approaches such as Bayesian optimization address the same problem without computing these partials. A sketch of the sign-based update appears below.
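An illustrative sketch, assuming a made-up cost surface J(α,λ) (the real cost would come from training runs): estimate each partial numerically and nudge each hyperparameter opposite the sign of its partial.

    import numpy as np

    # Hypothetical stand-in cost surface; the real J(alpha, lambda) is unknown.
    def J(alpha, lam):
        return (np.log(alpha) + 2.0) ** 2 + (lam - 0.1) ** 2

    # Central finite difference for one partial of a multi-argument function.
    def partial(f, args, i, h=1e-6):
        up, down = list(args), list(args)
        up[i] += h
        down[i] -= h
        return (f(*up) - f(*down)) / (2 * h)

    alpha, lam, step = 0.5, 0.5, 0.05
    dJ_dalpha = partial(J, (alpha, lam), 0)
    dJ_dlam = partial(J, (alpha, lam), 1)
    alpha -= step * np.sign(dJ_dalpha)  # positive partial -> decrease alpha
    lam -= step * np.sign(dJ_dlam)      # positive partial -> decrease lambda
    print(alpha, lam)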
Q5. Scenario: In logistic regression, the log-likelihood for a single data point is L(w0,w1) = y·log(s) + (1-y)·log(1-s) where s=1/(1+e^-(w0+w1x)). Compute the partial derivative of L with respect to w0 and interpret.
By the chain rule, ∂L/∂w0 = (y/s - (1-y)/(1-s)) · s(1-s) · 1 = y - s. This is the error (actual label minus predicted probability). The same calculation gives ∂L/∂w1 = (y - s)·x, which is why logistic regression updates its weights using the error times the input. Partial derivatives in machine learning often simplify to prediction errors, making gradient descent computationally efficient.
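A brief sketch of the per-example gradient (function names are illustrative), showing that ∂L/∂w0 reduces to y - s and ∂L/∂w1 to (y - s)·x:

    import numpy as np

    # Per-example gradient of L = y*log(s) + (1-y)*log(1-s), s = sigmoid(w0 + w1*x).
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def grad_log_likelihood(w0, w1, x, y):
        s = sigmoid(w0 + w1 * x)
        return y - s, (y - s) * x   # dL/dw0, dL/dw1: the error, and error * input

    dL_dw0, dL_dw1 = grad_log_likelihood(w0=0.0, w1=0.0, x=2.0, y=1.0)
    print(dL_dw0, dL_dw1)           # 0.5 and 1.0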
