The PerceptiLabs blog has a nice post on ResNets here, with a ready-made Textile model example.
Here are the first few layers:
The Merge (Addition) takes two inputs: the output of the sequential convolutions Convolution_1 and Convolution_2, and Convolution_1 directly (the latter being the skip connection), forming a Residual Block.
Question: The skip connection provides the "gradient highway" that mitigates the vanishing-gradient problem. I have read that it makes sense for this to be an identity connection initially, so that gradients propagate to the deepest layers without major obstruction from the start; as training progresses, however, the balance between the architected flow through the additional layers and the gradient highway should shift.
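To make the idea concrete, here is a minimal numpy sketch (not PerceptiLabs-specific; shapes and the `residual_branch` stand-in are assumptions) of a learnable weight matrix `W` on the skip path, initialized to the identity so the block starts as a pure gradient highway:

```python
import numpy as np

c = 64                                  # channel count (assumed)
x = np.random.randn(32, 32, c)          # input feature map, 32 x 32 x 64
W = np.eye(c)                           # skip-path weights, identity at init

def residual_branch(x):
    # stand-in for the Convolution_1 -> Convolution_2 path
    return 0.1 * x

# Merge (addition): branch output plus the weighted skip connection.
y = residual_branch(x) + x @ W

# At initialization W is the identity, so the skip path is a true
# identity connection; training can then move W away from identity.
assert np.allclose(x @ W, x)
```

As training updates `W`, the contribution of the skip path relative to the convolutional path can shift, which is exactly the balance described above.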
This can be done by applying a simple weight matrix to the skip connection.
How would this best be done in PerceptiLabs? Via a single Dense layer? If Convolution_1 is 32 x 32 x 64, how many neurons does that Dense layer require: 1,024 or 64k (65,536)? Or is it much more efficient to use a Convolution layer, which seems to defeat the point of the skip connection (though the most efficient choice, a genuine 1x1 convolution, could be useful; how would that be specified)?
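The parameter counts make the Dense-vs-1x1-convolution trade-off stark. A quick back-of-the-envelope calculation (shapes taken from the question above):

```python
# Parameter-count comparison for a learnable skip connection
# on a 32 x 32 x 64 feature map.
h, w, c = 32, 32, 64

# A Dense layer mapping the flattened tensor to itself needs an
# n x n weight matrix, where n = h * w * c.
n = h * w * c                 # 65,536 units ("64k"), not 1,024
dense_weights = n * n         # ~4.3 billion parameters

# A 1x1 convolution mixes only the channel dimension, so it needs
# just c_in * c_out weights (plus biases).
conv1x1_weights = c * c       # 4,096 parameters

print(n, dense_weights, conv1x1_weights)
```

So a Dense skip would need 65,536 neurons and roughly 4.3 billion weights, while a 1x1 convolution achieves a per-position channel mixing with only 4,096 weights, which is why 1x1 convolutions are the standard choice for projection shortcuts in ResNets.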
And finally, the residual block illustrated in the blog has a ReLU after the pointwise addition of the Merge; I don't see that feature in the Textile model: activation is available on Dense, Convolution, etc., but not on Merge.
Is the ReLU after the Merge unnecessary because it has already been applied to both of the preceding Convolutions that feed the Merge? Are there circumstances in which a subsequent ReLU is necessary, and if so, how would it be added in PerceptiLabs (without the unnecessary computational overhead or trainable weights of a layer type that includes activation functions)?
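One relevant observation: if both Merge inputs already passed through a ReLU, they are non-negative and the sum is non-negative, so a post-merge ReLU is indeed a no-op. But in the standard ResNet block the last convolution of the residual branch has a linear activation, so the sum can be negative and the post-merge ReLU does real work. A tiny numpy sketch (values are illustrative assumptions):

```python
import numpy as np

def relu(t):
    # parameter-free elementwise op; no trainable weights involved
    return np.maximum(t, 0.0)

skip = np.array([0.2, 1.0])     # non-negative: came through a ReLU
branch = np.array([-0.5, 0.3])  # linear last conv: can be negative

merged = skip + branch          # first entry is negative
out = relu(merged)              # post-merge ReLU clips it to zero
```

So the answer depends on whether the last convolution before the Merge uses ReLU or linear activation; with linear activation, the post-merge ReLU is not redundant.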
Lastly, are there any options (existing or planned) to specify the initial weights (random, with a choice of distribution and its first and second moments; uniform; constant; ...)?
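For reference, here is a numpy sketch of the kinds of initializers such an option would typically expose (this is not the PerceptiLabs API; names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 64, 64        # assumed layer shape

# Normal with chosen first/second moments (mean, std)
w_normal = rng.normal(loc=0.0, scale=0.05, size=(fan_in, fan_out))

# Uniform over a chosen interval
w_uniform = rng.uniform(-0.05, 0.05, size=(fan_in, fan_out))

# Constant
w_constant = np.full((fan_in, fan_out), 0.01)

# He initialization: std = sqrt(2 / fan_in), a common default for ReLU layers
w_he = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```

Frameworks like TensorFlow/Keras expose these as named initializers (normal, uniform, constant, He, Glorot), so the question is whether PerceptiLabs surfaces that choice per layer.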