9/30/2024

How gradient calculation works with batch size

Let's use a simplified example with just 2 data points and walk through the process with actual numbers. This will help illustrate how gradients are calculated and accumulated for a batch.

Let's assume we have a very simple model with one parameter w, currently set to 1.0. Our loss function is the squared error, and we're using basic gradient descent with a learning rate of 0.1.

Data points:

  1. x1 = 2, y1 = 4
  2. x2 = 3, y2 = 5

Batch size = 2 (both data points in one batch)
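
To make the walkthrough concrete, here is the setup as a small plain-Python sketch (the variable names are just for illustration):

    # Toy setup: one parameter, squared-error loss, plain gradient descent
    w = 1.0              # the single model parameter
    learning_rate = 0.1

    # One batch of 2 data points, as (x, y) pairs
    batch = [(2.0, 4.0), (3.0, 5.0)]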

Step 1: Forward pass

  • For x1: prediction = w * x1 = 1.0 * 2 = 2
  • For x2: prediction = w * x2 = 1.0 * 3 = 3
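
In code, the forward pass is just the model applied to each input in the batch (a self-contained sketch of the toy setup above):

    w = 1.0
    batch = [(2.0, 4.0), (3.0, 5.0)]

    # Forward pass: prediction = w * x for each example
    predictions = [w * x for x, _ in batch]
    print(predictions)  # [2.0, 3.0]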

Step 2: Calculate losses

  • Loss1 = (prediction1 - y1)^2 = (2 - 4)^2 = 4
  • Loss2 = (prediction2 - y2)^2 = (3 - 5)^2 = 4
  • Total batch loss = (Loss1 + Loss2) / 2 = (4 + 4) / 2 = 4
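
The per-example losses and the averaged batch loss look like this in code (restating the setup so the snippet runs on its own):

    w = 1.0
    batch = [(2.0, 4.0), (3.0, 5.0)]

    # Squared error for each example, then the mean over the batch
    losses = [(w * x - y) ** 2 for x, y in batch]
    batch_loss = sum(losses) / len(losses)
    print(losses)      # [4.0, 4.0]
    print(batch_loss)  # 4.0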

Step 3: Backward pass (calculate gradients)

  • Gradient1 = 2 * (prediction1 - y1) * x1 = 2 * (2 - 4) * 2 = -8
  • Gradient2 = 2 * (prediction2 - y2) * x2 = 2 * (3 - 5) * 3 = -12
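
Each per-example gradient comes from differentiating that example's squared error with respect to w: d/dw (w*x - y)^2 = 2 * (w*x - y) * x. As a small sketch:

    w = 1.0
    batch = [(2.0, 4.0), (3.0, 5.0)]

    # d/dw (w*x - y)^2 = 2 * (w*x - y) * x  (chain rule)
    gradients = [2 * (w * x - y) * x for x, y in batch]
    print(gradients)  # [-8.0, -12.0]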

Step 4: Accumulate gradients

  • Total gradient = (Gradient1 + Gradient2) / 2 = (-8 + -12) / 2 = -10
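
Averaging the per-example gradients gives the single gradient used for the update (continuing with the values computed above):

    gradients = [-8.0, -12.0]

    # Average the per-example gradients into one gradient for the batch
    total_gradient = sum(gradients) / len(gradients)
    print(total_gradient)  # -10.0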

Step 5: Update weight (once for the batch)

  • New w = old w - learning_rate * total gradient
  • New w = 1.0 - 0.1 * (-10) = 2.0
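
The weight update itself is a single gradient-descent step (using the values from the steps above):

    w = 1.0
    learning_rate = 0.1
    total_gradient = -10.0

    # One update for the whole batch
    w = w - learning_rate * total_gradient
    print(w)  # 2.0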

So, after processing this batch of 2 data points:

  • We calculated 2 individual gradients (-8 and -12)
  • We accumulated these into one total gradient (-10)
  • We performed one weight update, changing w from 1.0 to 2.0

This process would then repeat for the next batch. In this case, we've processed all our data, so this completes one epoch.
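
If you run the same example through an autograd framework, the same numbers fall out. Here is a minimal sketch assuming PyTorch is available; the .mean() mirrors the averaging over the batch done above:

    import torch

    w = torch.tensor(1.0, requires_grad=True)
    x = torch.tensor([2.0, 3.0])
    y = torch.tensor([4.0, 5.0])

    loss = ((w * x - y) ** 2).mean()  # batch loss = 4.0
    loss.backward()                   # autograd computes the averaged gradient
    print(w.grad.item())              # -10.0

    with torch.no_grad():
        w -= 0.1 * w.grad             # one update: w goes from 1.0 to 2.0
    print(w.item())                   # 2.0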
