NaNs when running RNNs/GRUs

I have run this twice and it has given me a NaN. The first time I thought it was an overclock issue, but I turned that off and it happened again. Is this a bug?

Here is the model: the first recurrent layer is a SimpleRNN and the second is a GRU.

Both layers use ReLU, and the first has been altered to set return_sequences=True.
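Roughly, the stack looks like this (layer sizes, input shape and compile settings below are placeholders, not my actual code):

```python
import tensorflow as tf

timesteps, features = 30, 8   # placeholder shape, not my real data

# SimpleRNN (ReLU, return_sequences=True) feeding into a GRU (ReLU)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.SimpleRNN(64, activation="relu", return_sequences=True),
    tf.keras.layers.GRU(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```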

Is there anything else that you need?

Hi James

I started a general comment before noticing that the issue was specifically “NaN, infinity or too large” for float32 on the input, but I’ll leave the general info here for future reference in case anyone else searches for NaN/infinity.

If a model is unstable, variations in the random initialisation of the weights etc. can make it run away via gradient explosion, which is what the “squashing” activations help avoid. ReLU is linear on the positive side, though, so models using ReLU can need extra attention.
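For future readers, two standard ways of keeping a ReLU RNN in check would look something like this (purely illustrative, not a fix for this particular error):

```python
import tensorflow as tf

# Option 1: fall back to the default "squashing" tanh activation
tanh_rnn = tf.keras.layers.SimpleRNN(64, return_sequences=True)  # tanh by default

# Option 2: keep ReLU but clip gradient norms so one bad step can't explode
clipped_opt = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
```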

But here we don’t even get to backprop, so that doesn’t apply :frowning: Can you confirm this dataset has worked OK in other models? It looks as though the objectionable value(s) are in the target… what are the max/min in the dataset?
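A quick check along these lines would show the range and flag any NaN/inf (the array names in the comment are just placeholders for your inputs/targets):

```python
import numpy as np

def sanity_check(name, arr):
    """Print the range of an array and flag any NaN/inf values."""
    arr = np.asarray(arr, dtype=np.float32)
    print(f"{name}: min={arr.min():.4g} max={arr.max():.4g} "
          f"NaN={np.isnan(arr).any()} inf={np.isinf(arr).any()}")

# e.g. sanity_check("inputs", X_train); sanity_check("targets", y_train)
```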

Thanks Julian,

It is the same dataset that I have been using for all of my other tests. What is specific to this model is that it is a SimpleRNN feeding into a GRU. If I build a model with GRU -> GRU I don’t have this problem. Right now I am running three levels of GRU with no problems.
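For reference, the stack that trains without problems is roughly this (sizes and shapes are placeholders):

```python
import tensorflow as tf

# Three stacked GRU layers -- this configuration runs fine on the same data
stacked_gru = tf.keras.Sequential([
    tf.keras.Input(shape=(30, 8)),
    tf.keras.layers.GRU(64, return_sequences=True),
    tf.keras.layers.GRU(64, return_sequences=True),
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(1),
])
```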
