I’m afraid of ending up in dependency hell if I try downgrading things (I’ve already been there), but I was thinking of maybe just writing a small program that trains the model without the debug stuff. I just need to learn some pandas (to process the .csv) and how to feed my model properly with training data, I guess
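For what it’s worth, the pandas part can be pretty small. A minimal sketch, with made-up column names (`x1`, `x2`, `label`) and an inline CSV standing in for the real file:

```python
# Hypothetical sketch: load a CSV with pandas and turn it into arrays
# that can be fed straight to model.fit(). Column names are made up.
import io
import pandas as pd

# Stand-in for your real file: pd.read_csv("training_data.csv")
csv_text = """x1,x2,label
0.1,0.2,0
0.3,0.4,1
0.5,0.6,0
0.7,0.8,1
"""
df = pd.read_csv(io.StringIO(csv_text))

X = df[["x1", "x2"]].to_numpy(dtype="float32")  # feature columns
y = df["label"].to_numpy(dtype="float32")       # target column

print(X.shape, y.shape)  # (4, 2) (4,)

# With a compiled Keras model you would then just do:
# model.fit(X, y, epochs=10, batch_size=32)
```

For bigger datasets you’d probably batch/shuffle via `tf.data`, but plain NumPy arrays into `model.fit()` are fine to start with.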
Dependency hell indeed! You need to be able to back out, but it sounds as though that might not be easy for you.
Would I be right in thinking you’re not using a python environment system like Anaconda?
I used to have OS-installed CUDA - three versions, in fact - and it all got very messy. So now I use Anaconda (it’s overweight for my needs; I should probably switch to Miniconda), and within an Anaconda environment I CONDA INSTALL the CUDA toolkit and cuDNN ONLY (after Python, of course) and then PIP INSTALL everything else.
I created a runbook for this on Windows (search for “runbook” in the forums) if that would be of any use to you (i.e. you could adapt it to Linux, which I don’t know) for building another expendable/non-interfering environment with GPU support.
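Roughly, the Linux adaptation might look like this (a sketch only; the environment name and version numbers are illustrative, not recommendations):

```shell
# Sketch of an expendable conda env with GPU support (illustrative versions)
conda create -n tf-gpu python=3.8          # fresh, disposable environment
conda activate tf-gpu
conda install cudatoolkit=11.2 cudnn=8.2   # ONLY the CUDA bits via conda
pip install tensorflow==2.5                # everything else via pip
```

If it all goes wrong, `conda env remove -n tf-gpu` and start over - that’s the whole point of keeping it expendable.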
I actually use Anaconda…
It seems like what nvidia-smi reports is the CUDA runtime version, not the toolkit version. The CUDA runtime is bundled with the graphics driver, right? I’m on 470.57.02
So now I’ve run:
conda install cudatoolkit=11.2.2 (it was 11.3.something before)
and cuDNN was not installed in the env at all, so:
conda install cudnn=8.2.1
```
(perc) joakim@joakim-All-Series:~/anaconda3/envs/perc/lib$ conda list cudnn
# packages in environment at /home/joakim/anaconda3/envs/perc:
#
# Name         Version   Build        Channel
cudnn          8.2.1     h86fa8c9_0   conda-forge
(perc) joakim@joakim-All-Series:~/anaconda3/envs/perc/lib$ conda list cudatoolkit
# packages in environment at /home/joakim/anaconda3/envs/perc:
#
# Name         Version   Build        Channel
cudatoolkit    11.2.2    he111cf0_8   conda-forge
```
That should be alright now?
@birdstream I think we’re making progress here. Lack of cuDNN in the env is unlikely to have been helpful!
Your card is a GTX 980; that’s Maxwell architecture, and according to this cuDNN page your toolkit/cuDNN/architecture combination looks good to my (non-expert) eyes.
Your setup seems more likely to work now, but I can’t say much more than that you’ll just have to try it out. I’m hoping there will be good news soon!
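A quick first test (a minimal sketch; it just asks TensorFlow, if it’s importable in the env, whether it can see the GPU):

```python
# Hedged sanity check: does TensorFlow (if installed) enumerate a GPU?
def gpu_available():
    """Return True if TensorFlow can see at least one GPU in this env."""
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TF isn't installed in this environment
    return len(tf.config.list_physical_devices("GPU")) > 0

print("GPU visible to TensorFlow:", gpu_available())
# An empty GPU list usually means TF can't locate the CUDA toolkit/cuDNN
# libraries, even if the driver itself is fine.
```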
FWIW my setup (based on a 1080 Ti; driver 466.27… I should update to 471.68!) is
```
# Name         Version        Build        Channel
cudatoolkit    11.2.2         h933977f_8   conda-forge
cudnn          188.8.131.52   h3e0f4f4_0   conda-forge
```
and that works both for TF 2.5 in Python directly (via Jupyter) and for TF 2.5 under PL (0.12.18 - just about to update)
Well, I have tried it now, and although it still seems to slow down, the app is more responsive now. However, sometimes I can’t stop training for some reason: clicking Stop does say “stopped”, but if I go back to the statistics page it’s still running
That stop issue you are seeing is, I think, something on our end (we recently rebuilt some of the internal communication system). We have a ticket for it, but if you happen to come across any situation where the bug happens more or less often, that would be very helpful info
Ah okay, yes I did the update, so I’ll keep an eye out. I’m mostly doing the same convolutional nets, but there are other types of model I wanna try, so…
@Birdstream - it would be cool to hear a bit more about what nets you’re trying and why: are you just re-doing standard architectures or are you experimenting? With some ideas/principles in mind or just to see what happens? There’s still so much “art” in all this I think there’s still great potential for unexpected discoveries to come out of left field - stuff that PL makes easier and more fun than just coding it up!
Right now I’m just dealing with the usual architectures, but yeah, I’m the “let’s try this and see what happens!” guy. I find neural networks really interesting and want to learn about them. PL is a fantastic tool that abstracts away the boring coding, which is perfect for me because I’m really not that good at it. I did, however, manage to get my super-resolution net to “work” with bigger images than it was trained on by simply slicing the input image into pieces, feeding them to the model, and stitching them back together afterwards. It looks like crap, though… but hey, it’s a start anyway
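The slice-and-stitch trick can be sketched in a few lines. This is a hypothetical stand-alone version: the “model” here is just a 2x nearest-neighbour upscale standing in for a real super-resolution net, and the function name is made up:

```python
# Sketch of tile-and-stitch super-resolution: split the image into tiles,
# run each tile through the model, and paste the upscaled tiles back.
import numpy as np

def tile_and_stitch(img, tile, scale, sr_fn):
    """Run sr_fn over tile x tile patches of img (H, W, C) and reassemble."""
    h, w, c = img.shape
    out = np.zeros((h * scale, w * scale, c), dtype=img.dtype)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            up = sr_fn(img[y:y + tile, x:x + tile])  # model upscales a patch
            out[y * scale:y * scale + up.shape[0],
                x * scale:x * scale + up.shape[1]] = up
    return out

# Stand-in "model": 2x nearest-neighbour upscaling of a patch
fake_sr = lambda p: p.repeat(2, axis=0).repeat(2, axis=1)

img = np.arange(4 * 6 * 3, dtype="float32").reshape(4, 6, 3)
result = tile_and_stitch(img, tile=2, scale=2, sr_fn=fake_sr)
print(result.shape)  # (8, 12, 3)
```

One reason tiled output often looks bad is visible seams at the tile boundaries; a common fix is to use overlapping tiles and blend the overlaps when stitching.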
My job is not ML- or computer-related at all; I work with visual quality inspection of engine parts. And I know there is a lot of room for ML in this field
So what do you do? I recall you mentioned something about astronomy?
Hey @birdstream - that’s cool! I can code and I don’t mind it, but I hate debugging it. I’m not an astronomer; I’m a business consultant - though I do have a physics background, and one of my friends is a cosmologist. It was something he said (about not really believing someone else’s results) that made me think the redshift estimation problem could be interesting - it certainly taught me a lot about understanding overfitting (at least, for regressions). I did most of it directly in Python but am porting it back to PL for the deeper training insights I can get without diving even deeper into TF!
I think @robertl has already done a live coding example of piston inspection that shows how easy it is to get started, but I guess that in a production environment you have very specific quality control criteria (false positive/false negative rates)… I think (haven’t looked myself) there’s increasing support for metrics (F1, etc.) and I expect the guys would be interested to hear what you would need to take ML into production. (Hmmm… are you thinking about resolution enhancement before e.g. crack detection, maybe with a bit of forward skip for the best of both worlds?)
All that having been said, my deep (and very long-standing) interest is in general AI and I have an idea that I want to try out using multi-head attention and transformers - until then it will be simpler convnets, dense, etc.
Hope you’re planning to stick around here: the more diversity of people and applications the better from my POV