Continue training

Suppose I have a viable model, is it/will it be possible to

  • Resume training from data saved to disk (i.e. potentially in another session, on another machine…)
  • Do more training, i.e. if I had specified 10 epochs and training has completed, continue from epoch 11 as though 10+n epochs had initially been specified and training had merely been paused

Use case: suppose (following the hyperparameters question in Issues/Feedback) I have trained a model in several ways with different hyperparameters and have finally identified the best way to train…

I’ve already invested time & energy (literally, running the processor) and don’t want to re-run the best candidate just for the sake of reproducing a state I already have.

Serious question ends…

Bonus Question for the Theoretically Inclined: can trained parameters from different runs be combined in any way? Thinking: I’ve run, say, 10 epochs of the same model with different randomisation… to what extent (thinks, hand-waving appeal to Central Limit Theorem?) could the trained parameters be considered population samples that could be combined by e.g. taking means?

Superbonus Question: how could one apply genetic algorithms to combine parameters from different runs? Would one have to be able to “slice” the parameters by processing dependency (“top” of image through the convolution etc. chain), or could one simply insert values from another model in place of individual parameters, e.g. literally copy the dropout value from the same location in another run?

Hi @JulianSMoore,
It is possible to resume training. Whenever you Stop the training or the training finishes, TensorFlow checkpoints are created.
The next time you press Run on that model, you will see a question pop up saying something along the lines of “Run with weights?”.
If you press Yes, training resumes from where you last left off; if you press No, it starts from scratch.
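The pattern behind this (independent of PerceptiLabs' UI) is just checkpointing the full training state and reloading it in a later session. Here is a minimal pure-Python sketch, with the "model" and file name purely illustrative: the key point is that the checkpoint stores the epoch counter, the weights, and the RNG state, so resuming to 15 epochs gives the same result as having asked for 15 epochs up front.

```python
import pickle
import random

CKPT = "ckpt.pkl"  # illustrative checkpoint path

def train(total_epochs, state=None):
    # state holds everything needed to resume: epoch counter, weights, RNG state
    if state is None:
        state = {"epoch": 0,
                 "weights": [0.0] * 4,
                 "rng": random.Random(42).getstate()}
    rng = random.Random()
    rng.setstate(state["rng"])
    while state["epoch"] < total_epochs:
        # stand-in for one epoch of training: nudge each weight
        state["weights"] = [w + rng.uniform(-0.1, 0.1) for w in state["weights"]]
        state["epoch"] += 1
        state["rng"] = rng.getstate()
        with open(CKPT, "wb") as f:  # checkpoint after every epoch
            pickle.dump(state, f)
    return state

state = train(10)                    # first session: 10 epochs
with open(CKPT, "rb") as f:          # later session: "Run with weights?" -> Yes
    resumed = pickle.load(f)
state = train(15, state=resumed)     # continue as though 15 had been specified
print(state["epoch"])                # 15
```

Because the RNG state is checkpointed too, "10 epochs then resume for 5 more" is bit-for-bit identical to a straight 15-epoch run, which is exactly the "don't re-run just to reproduce a state I already have" use case.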

Bonus answer:
This is difficult to do in PerceptiLabs right now, but there are two ways of doing it.
You can either use ensemble learning or (as you mentioned) average the weights of the models.
Ensemble learning means passing your input through multiple trained models and averaging their outputs.
If the architectures are different, ensemble learning will work better for you. If the architectures are the same, averaging the weights may prove better, as the full model pipeline is smaller (one model instead of several).
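A toy sketch of the two options, using tiny linear models with made-up weights from three hypothetical runs:

```python
# Toy linear "models": same architecture, weights from different runs
# (numbers are made up for illustration).
runs = [
    [0.9, 2.1],   # run A: [w, b]
    [1.1, 1.9],   # run B
    [1.0, 2.0],   # run C
]

def predict(weights, x):
    w, b = weights
    return w * x + b

x = 3.0

# Ensemble: run the input through every model, then average the outputs.
ensemble_out = sum(predict(r, x) for r in runs) / len(runs)

# Weight averaging: average the parameters first, keep a single model.
avg_weights = [sum(col) / len(runs) for col in zip(*runs)]
averaged_out = predict(avg_weights, x)

print(ensemble_out, averaged_out)  # identical here because the model is linear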

Superbonus answer:
This is a really fun topic and there are a lot of different ways to do GA or NAS on Deep Learning.
Generally, you start with a population of some size with random values (within certain limits). You then train this population and keep the top x% best performers. After that, you re-populate the now-empty spots by going through the models you kept from the old population and combining their properties at random.
The random combination can be an average, or you can, for example, take the dropout value from model A and the number of neurons from model B. Which property you take from which model is then what is random in that approach.
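The select-and-recombine loop above can be sketched in a few lines. Everything here is illustrative: the two properties (dropout, number of neurons), the population size, and the fitness function (which stands in for "train the model and measure performance") are all assumptions, not anything PerceptiLabs provides.

```python
import random

rng = random.Random(0)

def fitness(ind):
    # Stand-in for "train and evaluate": pretend the best model
    # has dropout 0.5 and 64 neurons.
    return -abs(ind["dropout"] - 0.5) - abs(ind["neurons"] - 64) / 64

def random_individual():
    # Random values within certain limits
    return {"dropout": rng.uniform(0.0, 0.9),
            "neurons": rng.randrange(8, 257)}

POP, KEEP = 20, 4  # population size; the top x% kept each generation

population = [random_individual() for _ in range(POP)]
for generation in range(10):
    # "Train" and rank the population, keep the best KEEP
    survivors = sorted(population, key=fitness, reverse=True)[:KEEP]
    children = []
    while len(survivors) + len(children) < POP:
        a, b = rng.sample(survivors, 2)
        # Combine properties at random: each property comes from
        # one parent or the other (e.g. dropout from A, neurons from B).
        children.append({"dropout": rng.choice([a, b])["dropout"],
                         "neurons": rng.choice([a, b])["neurons"]})
    population = survivors + children

best = max(population, key=fitness)
print(best)
```

Real GA/NAS setups usually add a mutation step (small random perturbations of the copied values) so the population doesn't collapse onto the initial gene pool, but the selection-plus-crossover skeleton is the part described above.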