Let me explain how a specific normalized feature value is calculated using one concrete example.
Let's take the feature "GroupSize" which has:
- Min value: -0.045121
- Max value: 103.032967
These values are post-normalization, but we can work backwards to understand how they were calculated.
The Normalization Formula
The normalization function you're using is:
normalized_features = (features - mean) / std
Where:
- features are the original, raw values
- mean is the average of all values for that feature in the training set
- std is the standard deviation of all values for that feature in the training set
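As a minimal sketch of that formula, assuming your features live in a NumPy array with one column per feature (the function names fit_normalizer and normalize and the sample array are hypothetical, not taken from your code):

```python
import numpy as np


def fit_normalizer(train_features):
    """Compute per-feature statistics from the training set only."""
    mean = train_features.mean(axis=0)
    # NumPy's default is the population standard deviation (ddof=0).
    std = train_features.std(axis=0)
    return mean, std


def normalize(features, mean, std):
    """Apply (features - mean) / std column by column."""
    return (features - mean) / std


# Hypothetical training data: 4 rows, 2 features.
train = np.array([[0.0, 10.0], [1.0, 20.0], [32.0, 30.0], [64.0, 40.0]])
mean, std = fit_normalizer(train)
normalized_features = normalize(train, mean, std)
print(normalized_features)
```

The same mean and std would then be reused for validation and test data, so every split is scaled against the training-set statistics.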
Working Through An Example
Let's say we have these raw values for GroupSize in the training set:
- Raw values: [0, 0, 0, 0, 0, 1, 1, 1, 32, 64]
First, we calculate the mean:
- Mean = (0+0+0+0+0+1+1+1+32+64)/10 = 9.9
Then we calculate the standard deviation:
- Each deviation: [-9.9, -9.9, -9.9, -9.9, -9.9, -8.9, -8.9, -8.9, 22.1, 54.1]
- Squared deviations: [98.01, 98.01, 98.01, 98.01, 98.01, 79.21, 79.21, 79.21, 488.41, 2926.81]
- Average squared deviation: 4142.9/10 = 414.29
- Standard deviation = √414.29 ≈ 20.35
Now, we can normalize each value:
- For 0: (0 - 9.9) / 20.35 ≈ -0.486
- For 1: (1 - 9.9) / 20.35 ≈ -0.437
- For 32: (32 - 9.9) / 20.35 ≈ 1.086
- For 64: (64 - 9.9) / 20.35 ≈ 2.658
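If you want to check the arithmetic above yourself, here is a short sketch that repeats the same steps with NumPy. It assumes the population standard deviation (dividing by the number of values), which matches the "average squared deviation" step in the example:

```python
import numpy as np

# The example raw GroupSize values from the walkthrough.
raw = np.array([0, 0, 0, 0, 0, 1, 1, 1, 32, 64], dtype=float)

mean = raw.mean()
deviations = raw - mean
squared_deviations = deviations ** 2
variance = squared_deviations.mean()  # average squared deviation
std = np.sqrt(variance)
normalized = deviations / std

print("mean:", mean)
print("variance:", variance)
print("std:", std)
for value, z in zip(raw, normalized):
    print(f"{value:g} -> {z:.3f}")
```

Note that some libraries default to the sample standard deviation (dividing by N - 1) instead, which would give slightly different numbers on a dataset this small.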
Explaining the Min/Max Values
Going back to your data:
- The min value for GroupSize (-0.045121) is what the smallest raw value in your dataset becomes after normalization
- The max value (103.032967) is what the largest raw value becomes after normalization
For GroupSize, this extreme range suggests:
- Your raw data spans a very wide range when measured in standard deviations
- The high maximum suggests outliers that are far from the mean, creating a highly skewed distribution
- The standard deviation is small relative to the gap between the maximum raw value and the mean
Concrete Calculation
If we assume the mean of raw GroupSize is μ and standard deviation is σ, then:
- Minimum normalized value: (min_raw - μ) / σ = -0.045121
- Maximum normalized value: (max_raw - μ) / σ = 103.032967
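These two equations have two unknowns (the mean and the standard deviation), so if you know the raw minimum and maximum you can solve for both. Here is a sketch of that, where the raw extremes 0 and 64 are purely hypothetical placeholders (I don't know your actual raw values) and recover_mean_std is an illustrative helper name:

```python
def recover_mean_std(min_raw, max_raw, min_norm, max_norm):
    """Solve (x - mean) / std = z for mean and std, given two (x, z) pairs."""
    # Subtracting the two equations eliminates the mean.
    std = (max_raw - min_raw) / (max_norm - min_norm)
    # Substitute std back into the minimum equation.
    mean = min_raw - min_norm * std
    return mean, std


# Hypothetical raw extremes; replace with the real ones from your dataset.
assumed_min_raw = 0.0
assumed_max_raw = 64.0

mean, std = recover_mean_std(
    assumed_min_raw, assumed_max_raw, -0.045121, 103.032967
)
print("recovered mean:", mean)
print("recovered std:", std)
```

Plugging in the true raw minimum and maximum from your training set should reproduce the mean and std that your normalization step computed.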
This tells us that your maximum raw value is about 103 standard deviations above the mean, which is extremely far, while the minimum sits only about 0.045 standard deviations below it. That asymmetry indicates that your raw data has a heavily right-skewed distribution with significant outliers.
The fact that most normalized values for GroupSize are close to the minimum (-0.045121) suggests that the most common value is slightly below the mean, while a few extreme outliers are pulling the mean upward.
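To see how this pattern arises, here is a small sketch with synthetic data. These numbers are made up for illustration and are not your dataset: almost every value is 0, with a handful of large outliers.

```python
import numpy as np

# Synthetic, made-up data: 9,990 zeros, nine 8s, and a single 64.
raw = np.concatenate([np.zeros(9990), np.full(9, 8.0), np.array([64.0])])

normalized = (raw - raw.mean()) / raw.std()

# Fraction of values sitting exactly at the normalized minimum.
share_at_min = np.mean(normalized == normalized.min())

print("normalized min:", normalized.min())
print("normalized max:", normalized.max())
print("share of values at the minimum:", share_at_min)
```

With data shaped like this, the normalized minimum is a small negative number, the normalized maximum is very large, and nearly all values sit at the minimum, which is the same qualitative picture as your GroupSize statistics.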
This type of skewed distribution is exactly why techniques like masking and autoencoder approaches are beneficial - they can help the model learn robust representations even with such extreme distributions.