It would be very helpful to be able to train one model while designing or editing another at the same time.
Most of the PL examples to date use relatively small data sets or limited numbers of epochs; when the data set is large, more epochs are needed, greater accuracy is required, or hyperparameter tuning is going on, an instance could be completely tied up for hours or days (*).
It would be very nice - and for serious use it is becoming necessary - to be able to run multiple instances, so that at the very least one could model and train at the same time.
Who knows, maybe TF2 and CUDA are clever enough that I could even train several models at once on a single GPU (it might not be the most efficient approach though!). At the very least I ought to be able to train two models (one on CPU, one on GPU) and edit at the same time…
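The "train two at once" idea can be sketched in plain Python with multiprocessing, where each worker pins itself to a device via the standard CUDA_VISIBLE_DEVICES environment variable before its framework import (the train_model function here is just a hypothetical stand-in for a real training loop, not anything PL actually exposes):

```python
import multiprocessing as mp
import os

def train_model(name, gpu_ids):
    # Stand-in for a real training loop. In TF2 you would set
    # CUDA_VISIBLE_DEVICES *before* importing tensorflow; an empty
    # string hides all GPUs, so that worker falls back to the CPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids
    # ... build the model and run the epochs here ...
    return f"{name} trained on {'GPU ' + gpu_ids if gpu_ids else 'CPU'}"

if __name__ == "__main__":
    # One job on GPU 0, one on CPU - run concurrently in two processes.
    jobs = [("model_a", "0"), ("model_b", "")]
    with mp.Pool(2) as pool:
        results = pool.starmap(train_model, jobs)
    print(results)
```

Whether two trainings on one GPU actually help depends on memory headroom and kernel occupancy; two processes on separate devices (as above) is the safer bet.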
(*) speaking of which - does the checkpoint data allow interrupted training to be resumed? So that, if I started a training run that would take 7 days and Windows crashed after 6.5 days, could I restart and only lose the work done since the last checkpoint?
Checkpoint interval… something else for user prefs
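For what it's worth, the resume pattern being asked about looks roughly like this (a framework-agnostic sketch with a toy JSON state file; in TF2 the equivalent role is played by tf.train.Checkpoint / tf.train.CheckpointManager, which do support resuming interrupted training - whether PL wires that up is the open question):

```python
import json
import os

CKPT = "train_state.json"
CHECKPOINT_INTERVAL = 5  # epochs between saves - the user-prefs knob

def load_state():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"epoch": 0, "loss": None}

def save_state(state):
    with open(CKPT, "w") as f:
        json.dump(state, f)

def train(total_epochs):
    state = load_state()
    for epoch in range(state["epoch"], total_epochs):
        state["loss"] = 1.0 / (epoch + 1)  # stand-in for a real training step
        state["epoch"] = epoch + 1
        if state["epoch"] % CHECKPOINT_INTERVAL == 0:
            # A crash after this point costs at most CHECKPOINT_INTERVAL epochs.
            save_state(state)
    return state

if __name__ == "__main__":
    print(train(12))
```

A smaller interval means less lost work after a crash but more time spent writing checkpoints, which is exactly why it belongs in user prefs.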