A CUDA Environment Runbook for PerceptiLabs 0.11.8

I have a runbook for anyone who wants a precise sequence of installation steps that results in a working, GPU-enabled TensorFlow, PerceptiLabs & JupyterLab Python environment. It starts with Anaconda, because only Anaconda provides an easy way to set up the CUDA toolkit and cuDNN inside an environment.

  • Note that the channel for cuDNN is conda-forge, but for the cudatoolkit it is anaconda
  • Python 3.7.10 doesn’t exist at Anaconda, so the latest available 3.7 release (3.7.9) is used
  • Mixing conda and pip isn’t ideal so we do only the absolute minimum with conda, and then use pip exclusively
  • conda and pip use “=” and “==” respectively for version specifications
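
The two pinning spellings are easy to swap by accident, so here is a minimal sketch of how they appear in the commands in this post: a single `=` for conda and a double `==` for pip. The helper names `conda_spec` and `pip_spec` are made up for illustration; they only build strings and do not call conda or pip.

```python
def conda_spec(name: str, version: str) -> str:
    """Build a conda-style pin, which uses a single '='."""
    return f"{name}={version}"


def pip_spec(name: str, version: str) -> str:
    """Build a pip-style pin, which uses a double '=='."""
    return f"{name}=={version}"


if __name__ == "__main__":
    # Package names and versions are the ones used in this runbook.
    print("conda install -c anaconda", conda_spec("cudatoolkit", "10.0"))
    print("pip install", pip_spec("tensorflow-gpu", "1.15.0"))
```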

Code…

conda create --name envname python=3.7.9
conda activate envname
pip install --upgrade pip setuptools
conda install -c anaconda cudatoolkit=10.0
conda install -c conda-forge cudnn=7.6.5.32 
pip install tensorflow-gpu==1.15.0
pip install perceptilabs-gpu
pip install jupyterlab
pip install -f https://github.com/Kojoley/atari-py/releases atari_py
pip install gym[atari]

If you mess it up, remove the environment and start again:

conda remove --name envname --all

I have attached the live notes of the build, complete with the pip & conda lists so you can compare your results with mine.

Just for the record I note that the result of all this is…

  • No cuDNN initialisation failures, no missing dtypes errors, no CUDA Out Of Memory errors
  • Therefore no need for TensorFlow option tweaks to work around such errors
  • PerceptiLabs 0.11.8 humming along with the Textile model, and GPU is maxed out
  • It took me a whole day, but now you should be able to do this in ~10 minutes
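
One way to confirm that TensorFlow can see the GPU in the finished environment is to ask it for its physical devices. This is a sketch, not part of the original build notes: the function name `visible_gpus` is hypothetical, and the fallback to the `experimental` namespace assumes an older release such as 1.15 that lacks the top-level call.

```python
import importlib.util


def visible_gpus():
    """Return the GPU device names TensorFlow reports, or None if TensorFlow is absent."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    # Newer TF 2.x releases expose tf.config.list_physical_devices; older
    # releases (assumed here to include 1.15) only have the experimental one.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print("TensorFlow can see:", ", ".join(gpus))
    else:
        print("TensorFlow is installed but sees no GPU")
```

An empty list usually points at a cudatoolkit/cuDNN version that does not match the installed TensorFlow build, which is what the version pins in this runbook are meant to avoid.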

My build is recorded in various posts in the forum; I’m not going to repeat it here.

Building a fresh perceptilabs & TF1.15 environment.txt (17.0 KB)

Thank you very much for this @JulianSMoore, this is amazingly helpful! :pray:

The perceptilabs-gpu package has been discontinued. Replace that line with
pip install perceptilabs

@JulianSMoore

Hi Julian,

I found this thread by search. Are we still using these versions for PL 0.13?

conda install -c anaconda cudatoolkit=10.0
conda install -c conda-forge cudnn=7.6.5.32

Hi @JWalker

I can’t remember exactly why - maybe a TF version change - but I needed to upgrade the cuda stuff…

Based on what I am now using successfully (Windows, Python 3.8.10, PL 0.13.1, NVIDIA 1080 Ti), those lines should probably become:

conda install -c anaconda cudatoolkit=11.2.2
conda install -c conda-forge cudnn=8.1.0.77

Hope that helps.

(NB I am using GeForce driver 466.27, which is a bit out of date; the latest seems to be 496.76. If these CUDA 11 packages work with that old April 2021 driver, they should also work with the latest - I think.)
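
The installed driver version can be read from `nvidia-smi` and compared with a version known to work. In this sketch 466.27 is simply the driver reported as working in this post, not an official minimum, and the function names are hypothetical.

```python
import shutil
import subprocess


def parse_driver_version(text: str) -> tuple:
    """Turn a string such as '466.27' into (466, 27) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be determined."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        # One line is printed per GPU; take the first.
        return parse_driver_version(result.stdout.splitlines()[0])
    except (subprocess.CalledProcessError, IndexError, ValueError):
        return None


if __name__ == "__main__":
    known_good = parse_driver_version("466.27")  # from this thread, not from NVIDIA
    current = installed_driver_version()
    if current is None:
        print("Could not read a driver version from nvidia-smi")
    else:
        print("Driver", current, "is at least the known-good one:", current >= known_good)
```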

When I type

conda install -c anaconda cudatoolkit=11.2.2

I get

PackagesNotFoundError: The following packages are not available from current channels:

  • cudatoolkit=11.2.2

I am trying

conda install -c esri cudatoolkit

Which ought to give me v11.2.0 instead.

Hmm… seems I also had to change the channel (from anaconda to conda-forge) - my setup batch file includes these lines

call conda install -y -q -c conda-forge cudatoolkit=11.2.2
call conda install -y -q -c conda-forge cudnn=8.1.0.77

I have no idea why channel content varies like this :expressionless:
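
Since the channels carry different builds, it can save a failed install to list what a channel offers before pinning a version. A minimal sketch, assuming `conda` is on the PATH; the function names are hypothetical, and `--override-channels` restricts the search to the one channel named.

```python
import shutil
import subprocess


def conda_search_command(package: str, channel: str) -> list:
    """Build the argument list for searching a single channel for a package."""
    return ["conda", "search", "--override-channels", "-c", channel, package]


def search_channel(package: str, channel: str):
    """Run the search and return conda's text output, or None if conda is not found."""
    exe = shutil.which("conda")
    if exe is None:
        return None
    command = conda_search_command(package, channel)
    command[0] = exe  # use the resolved path (conda.exe or conda.bat on Windows)
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    for channel in ("anaconda", "conda-forge"):
        print(f"--- cudatoolkit on {channel} ---")
        print(search_channel("cudatoolkit", channel) or "conda not found")
```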

Thank you. Those have worked perfectly. Do I also need to install tensorflow-gpu? I tried v1.15 and it gave me another error telling me that it couldn’t find it.

@JulianSMoore

Hi Julian,

So whilst those installed perfectly, my model now crashes.

I created a new Python environment and installed PerceptiLabs and CUDA using the following:

conda create -n myenv python=3.8
conda activate myenv
pip install perceptilabs

As administrator!
call conda install -y -q -c conda-forge cudatoolkit=11.2.2
call conda install -y -q -c conda-forge cudnn=8.1.0.77

Then I received the following error in PL when I tried to run a model.

Error during training!

Traceback (most recent call last):
  File "perceptilabs\coreInterface.py", line 32, in perceptilabs.coreInterface.TrainingSessionInterface.run_stepwise
  File "perceptilabs\coreInterface.py", line 33, in perceptilabs.coreInterface.TrainingSessionInterface.run_stepwise
  File "perceptilabs\coreInterface.py", line 52, in _main_loop
  File "perceptilabs\trainer\base.py", line 174, in run_stepwise
  File "perceptilabs\trainer\base.py", line 282, in _loop_over_dataset
  File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
    outputs = execute.execute(
  File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[64,3,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3 (defined at <rendered-code: 1 [DeepLearningConv]>:29) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[Identity_13/_6]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[64,3,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3 (defined at <rendered-code: 1 [DeepLearningConv]>:29) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference__work_on_batch_3679]

Errors may have originated from an input operation.
Input Source operations connected to node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3:
 training_model/deep_learning_conv__convolution_1_keras/depthwise_conv2d/BiasAdd (defined at <rendered-code: 1 [DeepLearningConv]>:25)

Input Source operations connected to node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3:
 training_model/deep_learning_conv__convolution_1_keras/depthwise_conv2d/BiasAdd (defined at <rendered-code: 1 [DeepLearningConv]>:25)

Function call stack:
_work_on_batch -> _work_on_batch

Hi @JWalker,
It looks like you are running into an Out Of Memory (OOM) issue on your GPU. It could be that your CPU is better suited to running a large model than your current GPU :slight_smile:

If you try with another smaller model, does it work then?
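
For a sense of scale, the size of the one tensor named in the traceback can be worked out from its shape. This is a minimal sketch: `tensor_bytes` is a hypothetical helper, and it assumes that "type float" in the message means 32-bit floats (4 bytes each).

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor of the given shape (default: 4-byte float32)."""
    elements = 1
    for dimension in shape:
        elements *= dimension
    return elements * bytes_per_element


if __name__ == "__main__":
    size = tensor_bytes((64, 3, 256, 512))  # shape taken from the OOM message
    print(size, "bytes =", size / 2**20, "MiB")  # 100663296 bytes = 96.0 MiB
```

A single 96 MiB tensor is not large by itself; the allocation fails because many such buffers (activations, gradients, batch-norm workspaces) are alive at once. Every dimension multiplies the total, so halving the batch size or the image resolution reduces each of these buffers proportionally.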

Hi @robertl

Yes, if I reduce the resolution of the input pictures, then the model runs, but I am not sure if it is using the gpu properly (see hardware use in the picture below).

I also installed tensorflow-gpu following my earlier conversation with Julian on Slack using this command:

pip install -q --no-input tensorflow-gpu==2.5.0 tensorflow-probability tensorflow-addons tensorflow-gan

image

That is very little GPU usage for sure :thinking:
The GPU usage typically increases when the model is larger or if you are running with more batches, but if you are hitting a memory limit at that point it seems suspect that it doesn’t max out just before that.

Glad we got the cuda stuff sorted! That’s progress.

I wish I knew why you had to run the conda install with admin privileges - something about user/base environment?? Anyway - I shall add that info to the other post - thanks!

I was going to say that I thought there was only one tensorflow package these days, but I would have been wrong: here’s TF GPU 2.7.0. So yes, it seems my earlier advice was right and the install should indeed be (if you have a GPU!)

pip install -q --no-input tensorflow-gpu==2.5.0

NB If you install PerceptiLabs first it will bring in tensorflow; I must confess I have never checked whether it brings in both CPU & GPU support, but I assume it does :wink:

Thanks @robertl and @JulianSMoore

It is working properly. I increased the resolution of the imported images in the data wizard to 512 x 256 and I have evidence that the GPU is invoked (see below). Unfortunately, 1024 x 512 with a batch size of 64 is just a little too much for my 4 GB of VRAM.

This leads to the question of whether the upcoming APUs with DDR5 will be a better choice than CUDA GPUs, because they can use larger amounts of memory.

Also, if I want to run PL without the GPU, is there a command-line option to force it off? Or do I create a new environment in Anaconda without CUDA installed?
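
Independently of any PerceptiLabs option, CUDA itself can be told to expose no devices through the `CUDA_VISIBLE_DEVICES` environment variable, and TensorFlow then falls back to the CPU. A minimal sketch for a plain Python/TensorFlow script follows; the function name is hypothetical, and whether PerceptiLabs honours the variable when launched from the same prompt is an untested assumption.

```python
import os


def hide_gpus_from_cuda():
    """Hide all CUDA devices from libraries that initialise CUDA later in this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"


if __name__ == "__main__":
    # This must run before TensorFlow is imported, because the variable is
    # read when CUDA is first initialised.
    hide_gpus_from_cuda()
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

In a Windows command prompt the equivalent would be `set CUDA_VISIBLE_DEVICES=-1` before starting the program, which avoids maintaining a second environment without CUDA.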

image

TL;DR

Here is what you need to do to get cuda working.

conda create -n myenv python=3.8
conda activate myenv
pip install perceptilabs

As administrator!

call conda install -y -q -c conda-forge cudatoolkit=11.2.2
call conda install -y -q -c conda-forge cudnn=8.1.0.77
pip install -q --no-input tensorflow-gpu==2.5.0

I have already created a feature request on Canny for choosing between CPU and GPU in the system.

Feel free to upvote it :slight_smile:

By the way, do you have any interesting “APU” links for info?

Not really, only that new APUs are coming and the new DDR5 looks very fast (and expensive).

AMD are making developments in the ML sphere.

https://medium.com/swlh/how-to-use-amd-gpus-for-machine-learning-on-windows-96ace916e97

Oh yes, the AMD APU (as opposed to Arithmetic Processing Unit or various other wrong things I thought of!).

Thanks for the Medium link - worth noting that it is about using plaidML to drive AMD on Windows (just so that the word plaidML gets indexed here :wink: )

image

(Is it just me or is that a penguin with a duck-bill and Burberry, somehow signifying a transplant from Linux of APU support??)

But will DDR5 be more expensive than an nVidia card? :expressionless:
