Unusual bug: model now WORKS?

Keep reading… it makes sense eventually (by which I mean the true nature of the issue is revealed; this is issue reporting as storytelling).

Yes, my photometric-z estimation model that seemed irrevocably borked (BritSpeak for broken) is now a happy model! The last thing I did, after various failed edits to the Merge_2_1 code, was to put in a print statement with bad syntax… take it out… reset the component (per the workaround described recently), and then this happened.

The bug is… no idea :wink: Why is it suddenly OK (hint: it’s not!) when no amount of messing around with it before could resolve anything, most specifically the mismatched tensor rank?

and the answer is…

The component reset on Merge_2_1 turned concatenation back to addition. I don’t think a component reset should make that sort of change… I thought it should only remove custom code.

However, this is where it finally makes (non)sense in a confusing way: why do I get a tensor rank error on Concat, where I don’t think there should be one, but suddenly everything is OK on Addition, where the input tensors are most definitely of different sizes and AFAICT not addable at all? (NB multiplication, subtraction and division also pass when I don’t think they should…)

The preview shows ~18 category-like bars on the merge for an add/subtract/multiply op, but one input has 14 floats and the other provides 18 categories… is TF padding the 14 somehow so that pointwise ops work??
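(A quick way to test the padding theory outside PL, assuming plain eager TF and the shapes above:)

```python
import tensorflow as tf

floats = tf.zeros([32, 14])   # 14 numeric features, batch of 32
cats = tf.zeros([32, 18])     # 18 categories, batch of 32

try:
    _ = floats + cats         # 14 vs 18: no padding happens, just an error
except tf.errors.InvalidArgumentError as e:
    print("add failed:", e.message)

one = tf.zeros([32, 1])
print((one + cats).shape)     # (32, 18): only size-1 dims get broadcast
```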

I’m really looking forward to the answer to this one!

UPDATE Attempting to run the model with merge as ADD (with the -v=3 option to PL) resulted in this message:

```
ValueError: Dimensions must be equal, but are 448 and 18 for '{{node training_model/math_merge__merge_2_1_keras/add}} = AddV2[T=DT_FLOAT]
```

so in fact TF doesn’t like this, but it isn’t picked up before running. I have no idea where the “448” could be coming from.

Ohhhh! Now I do :slight_smile: 448 = 14 (the number of float inputs) × 32 (the current batch size).
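A hypothetical reproduction in plain eager TF (the message differs slightly from the graph-build ValueError above, but it’s the same shape check failing):

```python
import tensorflow as tf

batch, n_floats, n_cats = 32, 14, 18

flat = tf.zeros([batch * n_floats])   # (448,) - the batch folded into the features
cats = tf.zeros([batch, n_cats])      # (32, 18)

try:
    _ = flat + cats                   # AddV2 with shapes [448] and [32,18]
except tf.errors.InvalidArgumentError as e:
    print(e.message)                  # something like: Incompatible shapes: [448] vs. [32,18]
```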

So the merge is failing to concatenate because one or the other of the inputs doesn’t have a batch dimension?

UPDATE 2, 25 Aug, 08:26 UK Lazy me didn’t read the rest of the TF error message. It is:

```
ValueError: Dimensions must be equal, but are 448 and 18 for '{{node training_model/math_merge__merge_2_1_keras/add}} = AddV2[T=DT_FLOAT](training_model/math_merge__scalar__concat_keras/concat, training_model/math_merge__categorical__concat_keras/concat)' with input shapes: [448], [32,18]
```

so the 18 categorical variables keep their batch dimension ([32,18]), but somehow the batch is not a separate dimension for the 14 numeric variables ([448]). That is sure to help clarify things.


Hmm, this is very interesting: so the Merge layer you call “Scalar_Concat” is merging on the batch dimension instead, which is why we only see a single float as output instead of 14 floats.
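A minimal sketch of the difference (hypothetical shapes matching the error message, not the actual generated code):

```python
import tensorflow as tf

batch = 32
# 14 scalar features, each arriving as a (32,) tensor - one value per sample
scalars = [tf.zeros([batch]) for _ in range(14)]

# Concatenating on the batch dimension glues the samples end to end:
wrong = tf.concat(scalars, axis=0)    # shape (448,)

# Stacking on a new feature dimension keeps the batch intact:
right = tf.stack(scalars, axis=1)     # shape (32, 14)

print(wrong.shape, right.shape)       # (448,) (32, 14)
```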

I added the information to the ticket, thanks for the deep dive!

(Side note: the “reset component” button works as intended; the thought was that it should reset to the recommended settings, which is especially valuable in combination with autosettings, although we need to create more of those for it to be noticeable.)


Glad to be of service with the details :slight_smile:

Re the reset: I did think that was probably intended, but in the specific case of Merge a Concat might be error-free where a true numeric operation would not be (if the tensors are of different rank or size), so resetting as-is can introduce and propagate an error needlessly.
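To make that concrete (hypothetical shapes, not PL’s generated code): concat along the feature axis tolerates different feature counts, while a pointwise op does not:

```python
import tensorflow as tf

a = tf.zeros([32, 14])
b = tf.zeros([32, 18])

# Concat only needs the non-concat dimensions to agree:
print(tf.concat([a, b], axis=-1).shape)   # (32, 32) - no error

# a + b, however, raises an InvalidArgumentError:
# pointwise ops need every dimension to match or broadcast.
```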

Unfortunately I have no brilliant (or otherwise) suggestion as to how resets should work, since the component settings determine the code produced… but I will let you know if that changes :smiley:
