Q to the community: Interpreting Fit - Losses, Validation, Dropout

What I love about PL is the clarity of thought one gets: model structure and behaviour (train/val/test) are front and centre, and the details (optimiser, learning rate, kernels, etc.) are accessible but not distracting.

This then forces one to confront the difficulty of interpreting the visual feedback one gets in the form of evolving loss curves.

It is usually said that one can tell a model is overfitting when the validation/test loss is significantly (let’s not forget our confidence intervals) worse than the training loss, or when the validation loss starts to rise after having reached a minimum.

However, there are odd situations in which the validation (or test) loss is LOWER than the training loss. This can happen when dropout is used to regularise, i.e. to prevent overfitting, because from what I have read dropout is typically not applied during validation. (Other situations: non-random validation/test splits, for which there can be good justification.)
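To make that concrete, here is a minimal Keras sketch (my own illustration, not the PL internals) of why the two losses aren’t measured under the same conditions: dropout is active on the training batches but switched off when the model is evaluated on the validation split.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),   # only active when training=True
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

x, y = np.random.rand(256, 20), np.random.rand(256, 1)

# fit() applies dropout on the training batches, but computes val_loss with
# dropout disabled, so train and validation losses are not like-for-like.
model.fit(x, y, validation_split=0.25, epochs=3, verbose=0)

# The asymmetry can also be seen directly on a forward pass:
preds_train_mode = model(x[:32], training=True)    # dropout on
preds_eval_mode  = model(x[:32], training=False)   # dropout off, as in validation
```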

So, my question is: what rules of thumb can we come up with for assessing model performance that take into account e.g. dropout? Could any of these be coded? For example, should the loss curves include the derivative, so it’s easier to see loss minima?
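As a rough sketch of the derivative idea (placeholder numbers and function names, nothing PL-specific): smooth the validation-loss curve and report the first point at which its slope turns positive, i.e. where it leaves its minimum.

```python
import numpy as np

def first_upturn(val_loss, window=3):
    """Return the index (in the smoothed curve) where val loss starts rising, or None."""
    smoothed = np.convolve(val_loss, np.ones(window) / window, mode="valid")  # moving average
    slope = np.gradient(smoothed)                                             # derivative
    rising = np.where(slope > 0)[0]
    return int(rising[0]) if rising.size else None

val_loss = [1.00, 0.70, 0.55, 0.48, 0.47, 0.49, 0.53]   # made-up curve
print(first_upturn(val_loss))                            # index where it turns upward
```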

How do you think the tool could/should support the interpretation of the lovely graphs etc.?

Hi, dropout is built into the CNN and Dense components. You can enable it there.

There is no exact rule for improving a model; normally you need to test different parameters and architectures to find the best one.

I suggest reading a very good book by Andrew Ng, Machine Learning Yearning. You can download it from his website https://www.deeplearning.ai/programs/; the book shows different techniques and strategies for improving model performance.

@damilies Much appreciated & thx for the reminder about the drop-out controls.

I have always liked Ng’s approach (his type notation less so), but a quick search of his book didn’t really help; in fact, it raised a new question!

In Ng’s book (footnote, p47) he says that if the “avoidable bias” (the difference between the training error and the optimal error rate) is negative, “you are doing better on the training set than the optimal error rate”.

But if optimal means zero error, how could one possibly “do better”? I think the answer is on p46, where he uses noisy speech as an example of which even humans can’t understand 14%, i.e. the optimal error rate is 14%;* so negative avoidable bias is only possible if the optimal error rate is > 0.
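In numbers (my reading of his definition, not a quote): avoidable bias = training error minus optimal (“Bayes”) error rate, so it can only go negative when the optimal rate is above zero.

```python
optimal_error  = 0.14   # e.g. speech so noisy that even humans get 14% wrong
training_error = 0.12   # the model has fitted some of that noise

avoidable_bias = training_error - optimal_error
print(avoidable_bias)   # -0.02: "doing better than optimal", i.e. memorising noise
```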

The other question is: are there no statistical tests one can use?
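For example, something along these lines (illustrative numbers only): treat per-example losses on the training and validation sets as two samples and ask whether the gap between their means is larger than sampling noise, e.g. with Welch’s t-test or a bootstrap interval.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
train_losses = rng.normal(0.45, 0.10, size=800)   # placeholder per-example losses
val_losses   = rng.normal(0.50, 0.12, size=200)

# Welch's t-test: is the train/val gap bigger than we'd expect by chance?
t_stat, p_value = stats.ttest_ind(val_losses, train_losses, equal_var=False)
print(f"mean gap = {val_losses.mean() - train_losses.mean():.3f}, p = {p_value:.3f}")
```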

* But! we can expect ML to do better than humans sometimes… how does one know objectively what the “optimal error rate” is, he asked rhetorically.

I’m going to re-read the sections you mention. Thanks for your reply.
