What I love about PL is the clarity of thought one gets: model structure and behaviour (train/val/test) are front and centre, and the details (optimiser, learning rate, kernels, etc.) are accessible but not distracting.
This in turn forces one to confront the difficulty of interpreting the visual feedback one gets, i.e. the evolving loss curves.
It is usually said that one can tell a model is overfitting when the validation loss is significantly (let’s not forget our confidence intervals) worse than the training loss, or when the validation loss starts to rise after having reached a minimum.
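(As an aside, the “rising after a minimum” heuristic is essentially what Lightning’s EarlyStopping callback codifies. A minimal sketch, assuming the LightningModule logs a metric named "val_loss"; the import path may differ across Lightning versions:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import EarlyStopping

# Stop training when the monitored validation loss has not improved
# for `patience` consecutive validation epochs.
early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=5)
trainer = Trainer(callbacks=[early_stop])
```
)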
However, there are odd situations in which the validation (or test) loss is LOWER than the training loss. This can happen when dropout is used for regularisation, i.e. to prevent overfitting: dropout is active during training but disabled during validation, so the training loss is computed on a handicapped network while the validation loss sees the full one. (Other situations: non-randomly split validation/test datasets, for which there can be good justification.)
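A minimal sketch of that asymmetry in plain PyTorch (the module and tensor here are purely illustrative): `model.train()` enables dropout, `model.eval()` disables it, so the same inputs can yield different losses in the two modes. One fair-comparison trick is to re-run the training set through the model in eval mode before plotting.

```python
import torch
import torch.nn as nn

# Toy network with aggressive dropout, just to make the effect visible.
model = nn.Sequential(nn.Linear(10, 10), nn.Dropout(p=0.5), nn.Linear(10, 1))
x = torch.randn(4, 10)

model.train()            # dropout active: this is what the training loss sees
y_train_mode = model(x)

model.eval()             # dropout disabled: this is what validation sees
with torch.no_grad():
    y_eval_mode = model(x)
```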
So, my question is: what rules of thumb can we come up with for assessing model performance that take dropout and similar effects into account? Could any of these be coded? For example, should the loss curves include the derivative, so it’s easier to spot loss minima? (A sketch of that idea follows below.)
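To make the derivative idea concrete, here is a hypothetical helper (the function name and the example loss values are mine, not from any library): smooth the per-epoch losses with a moving average, then take a finite-difference derivative; a sign change from negative to positive suggests the curve has passed a minimum.

```python
import numpy as np

def smoothed_loss_derivative(losses, window=5):
    """Finite-difference derivative of a loss curve after a moving-average smooth.

    A sign change from negative to positive in the output suggests the
    curve has passed through a minimum.
    """
    losses = np.asarray(losses, dtype=float)
    kernel = np.ones(window) / window
    smoothed = np.convolve(losses, kernel, mode="valid")
    return np.gradient(smoothed)

# e.g. validation losses collected once per epoch (made-up values)
val_losses = [1.0, 0.8, 0.65, 0.6, 0.58, 0.59, 0.61, 0.66]
print(smoothed_loss_derivative(val_losses))
```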
How do you think the tool could/should support the interpretation of the lovely graphs etc.?