From Live Coding: Normalizing target values

I got a great question from two YouTube viewers “Dakota merrival” and “VCC1316” (hope you see this as I can’t seem to find your emails :sweat_smile:).
They asked about normalizing target values: whether it's a good idea, and how PerceptiLabs handles normalization - specifically, whether we make sure the normalization is computed only from the training data but still applied to the validation and test data.

For the first question I found two good links here:
https://stats.stackexchange.com/questions/111467/is-it-necessary-to-scale-the-target-value-in-addition-to-scaling-features-for-re
https://datascience.stackexchange.com/questions/35603/it-is-helpful-to-normalize-target-variables-for-a-regression-neural-network
To summarize, scaling (not normalizing) the target values can be a good idea if the values are so large that you are at risk of overflowing the gradients, but it won't have much effect beyond that.
Normalizing the target values, on the other hand, can be damaging, as you are changing your target data's distribution.
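If it helps to see what "scaling" means in practice, here is a minimal NumPy sketch - the numbers and the scale factor are made up purely for illustration:

```python
import numpy as np

# Hypothetical regression targets with large magnitudes (e.g. house prices in dollars)
y_train = np.array([450_000.0, 620_000.0, 310_000.0, 980_000.0])

# Simple scaling: divide by a constant so the model sees values of order ~1.
# The shape of the distribution is untouched; only the units change.
scale = 1e6
y_train_scaled = y_train / scale

# ...train on y_train_scaled, then undo the scaling when reading predictions:
# y_pred = model.predict(X_new) * scale
```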

For the second question, I checked with our devs and we do indeed calculate it based on the training data only. We also automatically include it in the exported model's pipeline so that it behaves the same way when placed in production.
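For anyone who wants to reproduce that train-only fitting outside of PerceptiLabs, this is the general pattern, sketched here with scikit-learn's StandardScaler (not our internal code):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Dummy feature splits just for illustration
X_train = np.random.rand(100, 3)
X_val   = np.random.rand(20, 3)
X_test  = np.random.rand(20, 3)

scaler = StandardScaler()
X_train_norm = scaler.fit_transform(X_train)  # mean/std computed from training data only
X_val_norm   = scaler.transform(X_val)        # reuse the training statistics
X_test_norm  = scaler.transform(X_test)       # same here - no peeking at the test set
```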

Hope that answers your questions - feel free to follow up! :slight_smile:

That’s useful :slight_smile: And very nice to remind people that statistical normalisation means shifting the mean to zero and scaling the standard deviation to 1, which is very much not what simple scaling is about (although sometimes we do speak of normalising data to the range [0,1] - for example, RGB(A) 0-255 values are sometimes scaled like that).
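A quick sketch of the difference, with made-up numbers, in case it's useful:

```python
import numpy as np

x = np.array([0.0, 64.0, 128.0, 192.0, 255.0])  # e.g. one RGB(A) channel

# Statistical normalisation (z-score): mean -> 0, standard deviation -> 1
z = (x - x.mean()) / x.std()

# Simple scaling to [0, 1], as commonly done for 0-255 pixel values
s = x / 255.0

print(z.mean(), z.std())  # ~0.0, ~1.0
print(s.min(), s.max())   # 0.0, 1.0
```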

Mr Ontologue’s ill-thought-out question (I can say that because it’s me) was: doesn’t NOT scaling the target favour the higher-value end of the regression through the mean squared error (MSE) loss? The answer is, of course, that scaling on its own has no biasing effect via MSE…
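A tiny numerical check of that claim (made-up numbers): scaling targets and predictions by the same constant just multiplies the MSE by that constant squared, so nothing gets re-weighted.

```python
import numpy as np

y_true = np.array([10.0, 100.0, 1000.0])
y_pred = np.array([12.0, 110.0, 1050.0])

mse = np.mean((y_true - y_pred) ** 2)

# Scale targets and predictions by the same constant c:
c = 1e-3
mse_scaled = np.mean((c * y_true - c * y_pred) ** 2)

# The loss is simply multiplied by c**2 - same minimiser, no sample is favoured or penalised.
print(np.isclose(mse_scaled, c ** 2 * mse))  # True
```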

That said, one does have to watch out for errors that are a function of the target magnitude - so-called heteroscedastic effects (see e.g. the 1st Google hit here, where the top-end errors are bigger).
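As a rough illustration (made-up numbers; log-transforming the target is just one common workaround, and assumes positive targets):

```python
import numpy as np

y_true = np.array([1.0, 10.0, 100.0, 1000.0])
y_pred = y_true * 1.10  # a constant 10% relative error on every sample

# Plain squared error is dominated by the largest targets,
# even though the *relative* error is identical everywhere:
sq_err = (y_true - y_pred) ** 2
print(sq_err / sq_err.sum())  # nearly all of the loss sits on the 1000-valued sample

# One common mitigation for positive targets is to regress on log targets,
# which turns a constant relative error into a constant additive one:
log_sq_err = (np.log(y_true) - np.log(y_pred)) ** 2
print(log_sq_err)  # every sample now contributes about equally
```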

Then things get trickier…
