Now that I am beginning to pay attention to more technicalities - weights, biases, activation functions, etc. - more questions arise.

Consider this toy encoder-decoder model for MNIST digits:

Gradients are viewable during training by selecting a component and its Gradients tab, and even though gradients have to be interpreted as flowing "into" each component, in a simple chain like this there is no real ambiguity. But consider this more complex model:

If I now inspect the gradients on Convolution_3, what would I see, and how would I interpret or use it? Gradient info must be flowing into Convolution_3 from both Convolution_4 and Merge_2. Would I see those contributions separately? If not, of what value is gradient info "on" this component?
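For what it's worth, my understanding is that standard reverse-mode autodiff answers this by summing the gradients from all consumers at a fan-out point, so a single gradient "on" the component is at least mathematically well defined. A minimal sketch in plain Python (the names `branch_a`/`branch_b` are just stand-ins for Convolution_4 and Merge_2, not anything from the tool):

```python
# A scalar x feeds two branches, the way Convolution_3 feeds both
# Convolution_4 and Merge_2. During backprop each consumer sends its
# own gradient back, and the framework sums them at the fan-out.

x = 2.0

# forward pass: two consumers of x, combined into one loss
branch_a = 3.0 * x          # stand-in for Convolution_4's path
branch_b = x ** 2           # stand-in for Merge_2's path
loss = branch_a + branch_b

# backward pass: dloss/dbranch_a = 1 and dloss/dbranch_b = 1
grad_from_a = 1.0 * 3.0         # chain rule through branch_a
grad_from_b = 1.0 * (2.0 * x)   # chain rule through branch_b

# the single gradient "on" x is the sum of both incoming gradients
grad_x = grad_from_a + grad_from_b
print(grad_x)  # 3.0 + 4.0 = 7.0
```

So if the tool shows only the summed `grad_x`, the per-consumer contributions (`grad_from_a`, `grad_from_b`) are lost, which is exactly what makes me wonder whether per-link display would be more informative.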

Alternatively, since gradients flow along the links/edges of the graph, and there is only one link between each pair of nodes (components), would it make more sense to show gradients when a link is selected, leaving only weights and biases (and perhaps other things?) to be displayed per component?