CPU only again (and still // for /) (0.11.11)

Once upon a time re-running a model would cause memory to saturate.

Now it seems that after 1 run PL wishes instead to spare the GPU from further exertions and so only uses CPU…

Whereas on run 1 CPU was (on this graph) ~20%, on run 2 it shot up immediately to almost 100%: here it is at 95%, and elsewhere I was seeing 98% and 99%. I had to stop this run (obviously).

Which was sad because I had just edited the code on the training node to use “/checkpoint” rather than “//checkpoint” and was going to walk away for 40 mins. (I hope the model saved was viable…)

UPDATE: The model was NOT viable after restarting the server and reopening. This is a problem - I can’t tell what’s wrong… I almost have to rebuild the entire thing every time I want to run something.

UPDATE 2: Even stopping the server with Ctrl-C and restarting it would not persuade PL to use the GPU again. After 2 or 3 more failed attempts in PL, I checked with Jupyter notebook code that I could still train on the GPU, and it ran just fine. (The only things I haven’t tried yet are exiting the anaconda prompt window and/or rebooting the machine…)
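For anyone wanting to check from outside PL whether the GPU is actually doing anything during a run, a small script along these lines might help. It is only a sketch: it assumes an NVIDIA GPU with the `nvidia-smi` tool on the PATH (nothing above confirms the hardware), and the function names `parse_gpu_line` and `gpu_snapshot` are made up for illustration.

```python
import shutil
import subprocess

# Assumption: an NVIDIA GPU with nvidia-smi installed and on the PATH.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(text):
    """Convert a field to int, or None if nvidia-smi reports it as unavailable."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_line(line):
    """Parse one CSV line such as '95, 10240' into (utilization %, memory MiB)."""
    util, mem = (field.strip() for field in line.split(","))
    return _to_int(util), _to_int(mem)


def gpu_snapshot():
    """Return one (utilization, memory) tuple per GPU, or None if the query fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    # Run this while training: near-zero utilization on run 2 would support
    # the suspicion that only the CPU is being used.
    print(gpu_snapshot())
```

Running it once during run 1 and once during run 2 should show whether GPU utilization really drops to nothing on the second run.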

It is also effectively impossible to save the textile model and restore it: there is always a problem with the local data components… I need to disconnect them from other components, disconnect them from their data sources, reconnect them to the data sources, relink the components. Every. Time.

Ctrl+F5 does not help. Running in Incognito window does not help. Clearing the cache does not help.

I am in danger of becoming ungenerous in my reporting… I have reached my frustration limit and am walking away for the rest of the day. Hopefully, tomorrow I shall be full of the joys of spring again.

I would like to think that I am stupidly overlooking something I should be doing… I wonder what it might be.

Any ideas? (on that or the reported issues :slight_smile: )

Hi @JulianSMoore,
For the saving and loading issues as well as the data components, we’ll look into those asap. Sorry for the frustration it caused!

For the CPU and GPU utilization, this will improve a lot in the next update. It’s a side effect of the current architecture, which is being changed.

Will update as soon as I have some better answers.

Great :+1: Happy to help with retesting later.