Q1. Scenario: You have a cost function J(w) = (w-2)^2 + 3. Starting from w=0, what is the gradient at this point? In which direction should you move to reduce cost?
The gradient (derivative) is J'(w) = 2(w-2). At w=0, J'(0) = -4. The negative gradient, +4, points toward decreasing cost, so move w to the right (increase w). One gradient descent step is w_new = w_old - η·J'(w_old). With η=0.1, w_new = 0 - 0.1·(-4) = 0.4, moving closer to the optimum at w = 2.
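A minimal sketch of this update in Python; the learning rate η = 0.1 matches the answer, and the number of steps is an illustrative choice:

```python
# One-dimensional gradient descent on J(w) = (w - 2)**2 + 3, starting at w = 0.

def J(w):
    return (w - 2) ** 2 + 3

def dJ(w):
    return 2 * (w - 2)  # derivative J'(w)

w = 0.0
eta = 0.1
for step in range(5):
    w = w - eta * dJ(w)  # w_new = w_old - eta * gradient
    print(f"step {step + 1}: w = {w:.4f}, J(w) = {J(w):.4f}")
# First step: w = 0 - 0.1 * (-4) = 0.4, moving toward the optimum at w = 2.
```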
Q2. Scenario: In a 3D landscape, a hiker wants to descend a mountain as quickly as possible. The elevation function is E(x,y)= x² + y². At position (1,2), what direction should the hiker go?
The gradient is ∇E = (2x, 2y), which at (1,2) is (2,4). The negative gradient, (-2,-4), is the steepest-descent direction: the gradient points uphill, so moving opposite to it decreases elevation fastest. In machine learning, the analogue is computing the gradient of the loss and stepping in the negative gradient direction to minimize it.
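A short sketch of the same computation, normalizing the descent direction to a unit vector (the normalization is an assumption for readability, not part of the question):

```python
import numpy as np

# Steepest-descent direction for E(x, y) = x**2 + y**2 at the point (1, 2).
def grad_E(p):
    x, y = p
    return np.array([2 * x, 2 * y])

p = np.array([1.0, 2.0])
g = grad_E(p)                          # [2. 4.]: points uphill
descent_dir = -g / np.linalg.norm(g)   # unit steepest-descent direction
print(g, descent_dir)                  # [2. 4.] and roughly [-0.447 -0.894]
```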
Q3. Scenario: A robot's cost function depends on joint angles. The gradient vector is zero at the current configuration. What does that imply?
A zero gradient means the configuration is a critical point: a local minimum, a local maximum, or a saddle point. If the cost function is convex, a zero gradient guarantees the global minimum; otherwise the robot may be on a plateau, at a saddle, or in a local minimum. Examining the Hessian (matrix of second derivatives) distinguishes the cases: positive definite indicates a local minimum, negative definite a local maximum, and mixed eigenvalue signs a saddle point.
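A minimal sketch of that Hessian test, using a hypothetical 2D cost with a known saddle at the origin and a finite-difference Hessian (the cost function and step size are illustrative assumptions):

```python
import numpy as np

# Classify a critical point by the eigenvalues of the Hessian.
# The cost f(x, y) = x**2 - y**2 has a saddle point at (0, 0).
def hessian(f, p, h=1e-4):
    """Numerical Hessian via central finite differences."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(p + e_i + e_j) - f(p + e_i - e_j)
                       - f(p - e_i + e_j) + f(p - e_i - e_j)) / (4 * h * h)
    return H

f = lambda p: p[0] ** 2 - p[1] ** 2
eigvals = np.linalg.eigvalsh(hessian(f, np.zeros(2)))
# All eigenvalues > 0 -> local minimum; all < 0 -> local maximum; mixed -> saddle.
print(eigvals)  # roughly [-2., 2.]: a saddle point
```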
Q4. Scenario: In deep learning, why does backpropagation compute gradients efficiently compared to computing each partial derivative separately?
Backpropagation applies the chain rule layer by layer, reusing intermediate results (activations and upstream gradients). Instead of one forward pass per weight to estimate each ∂L/∂w separately, backprop computes all gradients with a single forward pass and a single backward pass. For n weights this costs roughly a constant number of forward passes, i.e. O(n), whereas the naive per-weight approach costs O(n²).
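A small sketch contrasting the two approaches on a hypothetical two-layer network (the architecture, data, and tolerance are illustrative assumptions, not a reference implementation):

```python
import numpy as np

# One backward pass yields all weight gradients by reusing intermediates (chain rule),
# versus one extra forward pass per weight for naive finite differences.
rng = np.random.default_rng(0)
x = rng.normal(size=(3,))          # input
y = 1.0                            # target
W1 = rng.normal(size=(4, 3))       # layer-1 weights
W2 = rng.normal(size=(1, 4))       # layer-2 weights

def forward(W1, W2):
    h = np.tanh(W1 @ x)            # hidden activations
    out = (W2 @ h)[0]
    return 0.5 * (out - y) ** 2, h, out

# Backprop: one forward pass, then reuse h and out for every gradient.
loss, h, out = forward(W1, W2)
d_out = out - y                                     # dL/d_out
dW2 = d_out * h[None, :]                            # dL/dW2
d_h = d_out * W2[0]                                 # dL/dh (chain rule through W2)
dW1 = (d_h * (1 - h ** 2))[:, None] * x[None, :]    # chain rule through tanh

# Naive check: one perturbed forward pass per weight of W2.
eps = 1e-6
num_dW2 = np.zeros_like(W2)
for i in range(W2.size):
    W2p = W2.copy(); W2p.flat[i] += eps
    num_dW2.flat[i] = (forward(W1, W2p)[0] - loss) / eps
print(np.allclose(dW2, num_dW2, atol=1e-4))   # True
```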
Q5. Scenario: You have a neural network with a ReLU activation. The gradient of the loss with respect to a weight is zero when the neuron's input is negative. What problem does this cause in training?
"This is the dying ReLU" problem. If a neuron's input is always negative its gradient is zero so its weights never update. Solutions include Leaky ReLU (small slope for negatives) ELU or careful initialization and batch normalization. Understanding gradients helps diagnose training issues.
