Info request: how does PerceptiLabs handle gradient issues?

I just had a model explode outside PL, which was unexpected because exactly the same model had been run before on exactly the same data. It shows that model behaviour can be initialisation-dependent.

So, since I’m now looking at the use of clipnorm/clipvalue etc. (for those unfamiliar with these, they are kwargs on the abstract Optimizer class, and therefore common to all the Keras optimisers), I wondered how PL handles gradient explosion.
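
For anyone who hasn’t used them, here’s a minimal sketch of what I mean (tf.keras; the clip values are just placeholders, not recommendations):

```python
import tensorflow as tf

# clipnorm rescales a gradient so its L2 norm is at most the given value;
# clipvalue clips each gradient element to the range [-value, value].
# Both are kwargs on the base Optimizer class, so any Keras optimiser accepts them.
opt_norm = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
opt_value = tf.keras.optimizers.SGD(learning_rate=1e-2, clipvalue=0.5)
```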

Does it apply gradient clipping in any way, or is random gradient explosion just something the user should be aware of? Or are there other mitigations applied?

Great question @JulianSMoore!
We don’t do any gradient clipping right now, so for the moment gradient explosion is something you need to be aware of. You may want to keep an eye on the gradient charts just to make sure training starts without issues.
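
If you want a quick sanity check on gradients yourself in the meantime, something roughly like this (an illustrative tf.keras sketch with a stand-in model and data, not anything PerceptiLabs does internally) will print the global gradient norm on a single batch before you commit to a long run:

```python
import tensorflow as tf

# Hypothetical stand-in model and data, just for illustration.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
loss_fn = tf.keras.losses.MeanSquaredError()
x = tf.random.normal((32, 10))
y = tf.random.normal((32, 1))

# Compute gradients on one batch and check their global norm;
# a very large value here is an early sign of an exploding start.
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)
print("global gradient norm:", tf.linalg.global_norm(grads).numpy())
```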
