What you need for TF2? CUDA details etc., please

Continuing the weekend theme of entertaining feedback with movie/video references… (a blast from the past)

You can’t always get what you want
But if you try sometime you find
You get what you need

So, in anticipation of an upcoming release using TF2, could you please share the specific CUDA build you recommend?

And could you check that it can be built in an Anaconda environment using conda install (as I did previously for TF 1.15)?

And if you can’t get exactly what you want, e.g. conda doesn’t have the specific version(s) you would propose by default, is there a conda-supported build that you can test and then recommend?

In other words, given the runbook for CUDA and TF 1.15, what are the precise CUDA-related replacements (CUDA Toolkit, cuDNN, TF versions and channels) needed to deliver a guaranteed working environment?

Many thanks in advance!

And if I may ask @robertl, what version of Python will be supported/required for the TF2 release?

I’m going to start building based on:

  • the assumption of TF 2.4 (probably 2.4.1)
  • limiting CUDA to only the versions available through conda

Which means:

  • CUDA Toolkit 11.0.221 (pretty sure TF 2.4 requires CUDA 11.0)
  • cuDNN 8.1.0.77 from the conda-forge channel (though 8.0.5.39 from conda-forge is also an option)
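A quick way to sanity-check a planned combination is to compare major.minor versions against what each TensorFlow release was built and tested with. The sketch below hard-codes two rows as I understand TensorFlow’s "tested build configurations" tables (TF 1.15: CUDA 10.0 / cuDNN 7.4; TF 2.4: CUDA 11.0 / cuDNN 8.0); treat those values as an assumption to verify against the current docs. `check_combo` and `major_minor` are hypothetical helpers, not part of any library.

```python
from typing import Dict, List

# Assumed values from TensorFlow's "tested build configurations" tables
# (major.minor only); verify against the current docs before relying on them.
TESTED: Dict[str, Dict[str, str]] = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
}


def major_minor(version: str) -> str:
    """Reduce e.g. '11.0.221' or '8.1.0.77' to '11.0' / '8.1'."""
    return ".".join(version.split(".")[:2])


def check_combo(tf_version: str, cuda: str, cudnn: str) -> List[str]:
    """Return mismatch messages; an empty list means major.minor versions agree."""
    expected = TESTED[major_minor(tf_version)]
    problems = []
    for name, planned in (("cuda", cuda), ("cudnn", cudnn)):
        if major_minor(planned) != expected[name]:
            problems.append(
                f"{name} {planned} != tested {expected[name]} for TF {tf_version}"
            )
    return problems


if __name__ == "__main__":
    # The two cuDNN candidates from the list above, with CUDA Toolkit 11.0.221.
    for cudnn in ("8.1.0.77", "8.0.5.39"):
        print(cudnn, check_combo("2.4.1", "11.0.221", cudnn) or "OK")
```

A major.minor match is only a heuristic: it should not be read as a guarantee either way, but it is cheap to run before building an environment.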

But TF 2.4 supports Python 3.6-3.8, so will PL be using 3.8?
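Whatever version is chosen, a small guard at the top of a notebook or setup script fails fast if the environment was created with the wrong interpreter. This is a minimal sketch; the 3.6-3.8 bounds assume TF 2.4, and `python_ok` is a hypothetical name.

```python
import sys

# Assumption: TF 2.4 wheels cover Python 3.6-3.8; adjust for other releases.
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX = (3, 8)


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter's (major, minor) lies in the supported range."""
    return SUPPORTED_MIN <= tuple(version_info[:2]) <= SUPPORTED_MAX


if __name__ == "__main__":
    print(sys.version.split()[0], "supported:", python_ok())
```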

Tracking and sharing info for the benefit of all… One build done. JupyterLab wouldn’t start because of an issue with pywin32; reinstalling it with pip install pywin32==225 made the problem go away.

Haven’t tested TF2 yet…

Update 2021-04-07 08:35

I tried running a TF2 test notebook from the TensorFlow site and hit a cuDNN error during training:

c:\users\julian\anaconda3\envs\tft2_4_env_cuda_11_py3_8\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57   try:
     58     ctx.ensure_initialized()
---> 59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:

UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node my_model/conv2d/Conv2D (defined at <ipython-input-4-1e051998210b>:10) ]] [Op:__inference_train_step_527]

Errors may have originated from an input operation.

Reminder of the installation details:

# packages in environment at C:\Users\Julian\anaconda3\envs\TFT2_4_ENV_CUDA_11_PY3_8:
#
# Name                    Version                   Build  Channel
tensorflow-estimator      2.4.0                    pypi_0    pypi
tensorflow-gpu            2.4.1                    pypi_0    pypi
cudatoolkit               11.0.221             h74a9793_0    anaconda
cudnn                     8.1.0.77             h3e0f4f4_0    conda-forge
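The `conda list` rows are whitespace-separated, so the installed versions can be parsed and checked programmatically rather than by eye. A minimal sketch, assuming the output has been captured as text (the rows below are copied from the listing above); `parse_conda_list` is a hypothetical helper.

```python
# Rows copied from the conda list output above ('#' header lines are skipped).
CONDA_LIST = """\
# Name                    Version                   Build  Channel
tensorflow-estimator      2.4.0                    pypi_0    pypi
tensorflow-gpu            2.4.1                    pypi_0    pypi
cudatoolkit               11.0.221             h74a9793_0    anaconda
cudnn                     8.1.0.77             h3e0f4f4_0    conda-forge
"""


def parse_conda_list(text: str) -> dict:
    """Map package name -> (version, build, channel), skipping comment lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Rows are: name, version, build and an optional channel column.
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages


if __name__ == "__main__":
    for name, (version, _build, channel) in parse_conda_list(CONDA_LIST).items():
        print(f"{name:22} {version:10} {channel}")
```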

Will try a different cuDNN version

The cuDNN issue was resolved by:

  • removing cuDNN 8.1.0.77 (conda remove cudnn)
  • installing cuDNN 8.0.5.39 (conda install -c conda-forge cudnn=8.0.5.39)
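Since the fix came down to pinning exact versions, the working combination can be written down as a small, repeatable runbook. The sketch below only turns a pinned spec into the install commands, in order; the environment name, channels and versions are the ones used in this thread, `python=3.8` is an assumption inferred from the environment name, and `runbook` is a hypothetical helper, not part of conda.

```python
# Pinned combination from this thread (Windows, TF 2.4.1).
ENV_NAME = "TFT2_4_ENV_CUDA_11_PY3_8"
PYTHON = "3.8"  # assumption: inferred from the environment name
CONDA_PINS = [
    # (channel, package, exact version)
    ("anaconda", "cudatoolkit", "11.0.221"),
    ("conda-forge", "cudnn", "8.0.5.39"),  # 8.1.0.77 gave "cuDNN failed to initialize"
]
PIP_PINS = [("tensorflow-gpu", "2.4.1")]


def runbook(env: str = ENV_NAME) -> list:
    """Return the shell commands, in order, that rebuild the environment."""
    cmds = [f"conda create -n {env} python={PYTHON}"]
    for channel, pkg, version in CONDA_PINS:
        cmds.append(f"conda install -n {env} -c {channel} {pkg}={version}")
    # pip installs into the active interpreter, so activate the env first.
    cmds.append(f"conda activate {env}")
    for pkg, version in PIP_PINS:
        cmds.append(f"pip install {pkg}=={version}")
    return cmds


if __name__ == "__main__":
    print("\n".join(runbook()))
```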

The test notebook now trains! GPU utilisation is high.

Results:

Epoch 1, Loss: 0.001776, Accuracy: 99.9500, Test Loss: 0.130018, Test Accuracy: 98.4100, Timing: 3.4877 seconds/epoch
Epoch 2, Loss: 0.000761, Accuracy: 99.9783, Test Loss: 0.135070, Test Accuracy: 98.4000, Timing: 3.4305 seconds/epoch
Epoch 3, Loss: 0.001070, Accuracy: 99.9667, Test Loss: 0.159565, Test Accuracy: 98.4500, Timing: 3.4037 seconds/epoch
Epoch 4, Loss: 0.001924, Accuracy: 99.9583, Test Loss: 0.148481, Test Accuracy: 98.4200, Timing: 3.4481 seconds/epoch
Epoch 5, Loss: 0.000880, Accuracy: 99.9750, Test Loss: 0.171329, Test Accuracy: 98.4700, Timing: 3.5546 seconds/epoch
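To compare runs (for example across cuDNN versions or against the TF 1.15 build), it helps to pull the per-epoch numbers out of the log rather than eyeballing them. A minimal sketch, assuming the log lines keep exactly the format printed above; `parse_epochs` is a hypothetical helper.

```python
import re
from statistics import mean

# The five result lines above, verbatim.
LOG = """\
Epoch 1, Loss: 0.001776, Accuracy: 99.9500, Test Loss: 0.130018, Test Accuracy: 98.4100, Timing: 3.4877 seconds/epoch
Epoch 2, Loss: 0.000761, Accuracy: 99.9783, Test Loss: 0.135070, Test Accuracy: 98.4000, Timing: 3.4305 seconds/epoch
Epoch 3, Loss: 0.001070, Accuracy: 99.9667, Test Loss: 0.159565, Test Accuracy: 98.4500, Timing: 3.4037 seconds/epoch
Epoch 4, Loss: 0.001924, Accuracy: 99.9583, Test Loss: 0.148481, Test Accuracy: 98.4200, Timing: 3.4481 seconds/epoch
Epoch 5, Loss: 0.000880, Accuracy: 99.9750, Test Loss: 0.171329, Test Accuracy: 98.4700, Timing: 3.5546 seconds/epoch
"""

PATTERN = re.compile(
    r"Epoch (\d+), Loss: ([\d.]+), Accuracy: ([\d.]+), "
    r"Test Loss: ([\d.]+), Test Accuracy: ([\d.]+), Timing: ([\d.]+) seconds/epoch"
)


def parse_epochs(text: str) -> list:
    """Return one dict per epoch line matched by PATTERN."""
    rows = []
    for m in PATTERN.finditer(text):
        epoch, loss, acc, test_loss, test_acc, secs = m.groups()
        rows.append({
            "epoch": int(epoch),
            "loss": float(loss),
            "accuracy": float(acc),
            "test_loss": float(test_loss),
            "test_accuracy": float(test_acc),
            "seconds": float(secs),
        })
    return rows


if __name__ == "__main__":
    rows = parse_epochs(LOG)
    avg = mean(r["seconds"] for r in rows)
    print(f"{len(rows)} epochs, mean {avg:.4f} seconds/epoch")
```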

The individual epochs are clearly visible in the CUDA usage

[Screenshot: TF2_training_perf]

UPDATE - NVIDIA DLL summary

NVIDIA files in C:\Users\Julian\anaconda3\envs\TFT2_4_ENV_CUDA_11_PY3_8\Library\bin
['Name', 'Company', 'Version']
cublas64_11.dll, NVIDIA Corporation, 6.14.11.1120
cublasLt64_11.dll, NVIDIA Corporation, 6.14.11.1120
cudart64_110.dll, NVIDIA Corporation, 6.14.11.11000
cudnn64_8.dll, NVIDIA Corporation, 6.14.11.6050
cudnn_adv_infer64_8.dll, NVIDIA Corporation, 6.14.11.11000
cudnn_adv_train64_8.dll, NVIDIA Corporation, 6.14.11.11000
cudnn_cnn_infer64_8.dll, NVIDIA Corporation, 6.14.11.11000
cudnn_cnn_train64_8.dll, NVIDIA Corporation, 6.14.11.11000
cudnn_ops_infer64_8.dll, NVIDIA Corporation, 6.14.11.11000
cudnn_ops_train64_8.dll, NVIDIA Corporation, 6.14.11.11000
cufft64_10.dll, NVIDIA Corporation, 6.14.11.1021
cufftw64_10.dll, NVIDIA Corporation, 6.14.11.1021
curand64_10.dll, NVIDIA Corporation, 6.14.11.1021
cusolver64_10.dll, NVIDIA Corporation, 6.14.11.1060
cusolverMg64_10.dll, NVIDIA Corporation, 6.14.11.1060
cusparse64_11.dll, NVIDIA Corporation, 6.14.11.1111
nppc64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppial64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppicc64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppidei64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppif64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppig64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppim64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppist64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppisu64_11.dll, NVIDIA Corporation, 6.14.11.1110
nppitc64_11.dll, NVIDIA Corporation, 6.14.11.1110
npps64_11.dll, NVIDIA Corporation, 6.14.11.1110
nvblas64_11.dll, NVIDIA Corporation, 6.14.11.1120
nvjpeg64_11.dll, NVIDIA Corporation, 6.14.11.1111
nvrtc64_110_0.dll, NVIDIA Corporation, 6.14.11.9000
nvvm64_33_0.dll, NVIDIA Corporation, 6.14.11.9000
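The DLL file names encode the library versions, and those names are what TensorFlow looks for at load time (TF 2.4 on Windows loads e.g. `cudart64_110.dll` and `cudnn64_8.dll`). The sketch below parses lines in the `name, company, version` format above and decodes the name suffixes, assuming NVIDIA’s Windows naming: for `cudart`/`nvrtc` the suffix is the CUDA version (`110` = 11.0), while for the other libraries it is that library’s own major version (`cufft64_10` = cuFFT 10). The 6.14.11.x file-version column is not decoded. `parse_summary`, `cuda_runtime` and `cudnn_majors` are hypothetical helpers.

```python
import re

# A subset of the summary lines above.
SUMMARY = """\
['Name', 'Company', 'Version']
cublas64_11.dll, NVIDIA Corporation, 6.14.11.1120
cudart64_110.dll, NVIDIA Corporation, 6.14.11.11000
cudnn64_8.dll, NVIDIA Corporation, 6.14.11.6050
cudnn_cnn_infer64_8.dll, NVIDIA Corporation, 6.14.11.11000
cufft64_10.dll, NVIDIA Corporation, 6.14.11.1021
nvrtc64_110_0.dll, NVIDIA Corporation, 6.14.11.9000
"""

# "<library>64_<suffix>[_<n>].dll", e.g. cudart64_110.dll or nvrtc64_110_0.dll
NAME_RE = re.compile(r"^(?P<lib>[A-Za-z_]+)64_(?P<suffix>\d+)(?:_\d+)?\.dll$")


def parse_summary(text: str) -> dict:
    """Map library stem (e.g. 'cudart') -> (name suffix, file version)."""
    libs = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # not a "name, company, version" line
        name, _company, file_version = parts
        m = NAME_RE.match(name)
        if m:  # header lines such as "['Name'" do not match and are skipped
            libs[m.group("lib")] = (m.group("suffix"), file_version)
    return libs


def cuda_runtime(libs: dict) -> str:
    """cudart64_110.dll -> '11.0' (last digit taken as the minor version)."""
    suffix = libs["cudart"][0]
    return f"{suffix[:-1]}.{suffix[-1]}"


def cudnn_majors(libs: dict) -> set:
    """Major versions across all cudnn*.dll files (a consistent install has one)."""
    return {s for lib, (s, _v) in libs.items() if lib.startswith("cudnn")}


if __name__ == "__main__":
    libs = parse_summary(SUMMARY)
    print("CUDA runtime:", cuda_runtime(libs), "cuDNN major:", cudnn_majors(libs))
```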