ResNet18 from scratch

Prolegomena
The idea for a Projects section was inspired by the great discussion around @JWalker’s thread Increasing the number of images increases the loss, and the thought that maybe it would be nice to have somewhere dedicated to open-ended discussion, progress reports, etc.

I am always conscious of the fact that I use Q&A sites (e.g. Stack Exchange) and forums to get quick answers, and prefer short threads on simple topics for that purpose - but I’m really liking the to-and-fro of ideas we have here, so I wanted to have my cake and eat it, so to speak.

Anyway, that’s why I’m doing this here…

Intro

Thanks to the input from @birdstream in this thread, and the GitHub link to some code, I’m ready to start building ResNet18.

Why?
The original motivation is to do what I said in the thread above - some investigation of double descent using a model and dataset from a real paper, so that I can compare results.

But I’ve now decided that doing this is going to help me think about translating the layer diagrams I see everywhere into PL or other models, get a better feel for structure, and practice component/block reuse in PL for bigger models - [added later] i.e. develop a workflow for larger PL projects.

Bigger models? I have some very grand ambitions - and ease of use is what drew me to PL in the first place. I can code, in a non-coder way, but code isn’t the priority for me - it’s what it does that matters, and I won’t want to be debugging stupid typos, wrong loop limits, etc. if I can avoid it!

Setup

Windows 10 Home, 64-bit, build 21H2; PerceptiLabs 0.13.5; Python 3.8.10; TF 2.5.0; GPU: 1080 Ti; CUDA 11.2.2 & cuDNN 8.1.0.77 via the conda-forge channel

The End

No, I’m just about to end this intro post and actually do some stuff, now that I have added a new model to the CIFAR-10 dataset I had already loaded up.

1st task - build ONE block and then think about how to efficiently copy/paste and update the #filters (or whether it’s just as easy to do it by hand)

i.e. turn this code from GitHub into a set of PL components…

import tensorflow as tf

BN_AXIS = 3  # channel axis, assuming channels-last data (defined elsewhere in the repo)

def make_basic_block_base(inputs, filter_num, stride=1):
    # Main path: two 3x3 convolutions, each followed by batch norm
    x = tf.keras.layers.Conv2D(filters=filter_num,
                               kernel_size=(3, 3),
                               strides=stride,
                               kernel_initializer='he_normal',
                               padding="same")(inputs)
    x = tf.keras.layers.BatchNormalization(axis=BN_AXIS)(x)
    x = tf.keras.layers.Conv2D(filters=filter_num,
                               kernel_size=(3, 3),
                               strides=1,
                               kernel_initializer='he_normal',
                               padding="same")(x)
    x = tf.keras.layers.BatchNormalization(axis=BN_AXIS)(x)

    # Shortcut path: identity, unless stride != 1, in which case a 1x1
    # convolution reshapes the input so it can be added to x
    shortcut = inputs
    if stride != 1:
        shortcut = tf.keras.layers.Conv2D(filters=filter_num,
                                          kernel_size=(1, 1),
                                          strides=stride,
                                          kernel_initializer='he_normal')(inputs)
        shortcut = tf.keras.layers.BatchNormalization(axis=BN_AXIS)(shortcut)

    # The skip connection itself: elementwise add, then ReLU
    x = tf.keras.layers.add([x, shortcut])
    x = tf.keras.layers.Activation('relu')(x)

    return x

1st CONV component

  • PL only accepts uniform stride; entering 3, 3 causes an error - not a problem here though
  • kernel_initializer is not controllable in the settings panel, and the code shows it is ‘glorot_uniform’ - I will need to edit the build function to use 'he_normal'
  • Don’t know why the code author has the stride != 1 section, as I don’t see any use of it (yet)
  • padding = same
  • no dropout
  • batch norm on
  • no pooling

#features TBD when make_basic_block_base is called…
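
For later reference, a sketch of how I expect #filters to vary when the block is called - assuming the standard ResNet18 layout from the original paper (four stages of two blocks each, at 64/128/256/512 filters); the helper name here is mine, not the repo’s:

def make_resnet18_body(x):
    # Hypothetical stacking helper (not from the GitHub code): two blocks per
    # stage; the first block of each stage after the first downsamples with
    # stride 2, while stage 1 relies on the stem's maxpool for its downsampling
    for filters, first_stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
        x = make_basic_block_base(x, filters, stride=first_stride)
        x = make_basic_block_base(x, filters, stride=1)
    return x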

2nd CONV component

Had trouble with copy/paste, but after successfully copying and pasting a non-customised component I was able to do it to Convolution_2 to get Convolution_2_1 - but then got this on linking the output of _2 to the input of _2_1:

Failed getting previews

Traceback (most recent call last):
  File "perceptilabs\models_interface.py", line 232, in perceptilabs.models_interface.ModelsInterface.get_previews
  File "perceptilabs\lwcore\utils.py", line 197, in perceptilabs.lwcore.utils.format_content
  File "perceptilabs\lwcore\utils.py", line 176, in perceptilabs.lwcore.utils._subsample_data
  File "perceptilabs\createDataObject.py", line 384, in perceptilabs.createDataObject.create_data_object
ValueError: max() arg is an empty sequence

But after that there are “0 problems”, so is it just a preview issue?

UPDATE

@robertl Suspected bug: copying/pasting a customised component fails on linking…? I reset the component and re-customised the code to use the he_normal kernel initializer, and the preview is fine now. Now I have to remember to reset the other settings…

Damn… should have done the settings panel stuff first… can’t get at the controls once the code has been edited.

Will this be an issue internally (between components) on copying/pasting blocks, or only at the output of the block?

OK, it looks like no activation on 1st CONV, and ReLU on 2nd…

But, how does this skip connection come in? Going away to think a bit!

ISSUE

I said earlier that I would worry about the #features (the filters argument in the code) after building a block, because #features varies from block to block - but I have now realised that if I start by setting kernel_initializer='he_normal' in the code, everything else has to be done in code customisation too.

Revised strategy: do ALL building, linking and settings via the UI, and only then make the final adjustments in code (even if they are repetitive… this could be a pain - and error-prone - with lots of layers).

This keeps happening to me, too… and when it does, it pops up every time I change something until everything is connected right :thinking: No need to explain how annoying this can be :sweat_smile: It does seem to be connected to one or some of the auto-generated components, because if I remove them (most notably Merge, I think) it goes away :thinking: Need to dig into this further to be sure, however.

Hi Joakim

Do you still get the issue with merge? It’s recently had some dev love and seems to work for me now - at least I haven’t caught it out yet!

I’m normally quite irritable (and it shows, unfortunately!) but I haven’t been too badly affected and have a firm belief (good grief - I’m beginning to sound like the Chinese government) that the rapid evolution we’ve seen recently will take care of it soon. Apparently I can also be optimistic - despite the gloom of winter :smiley:

I am coming to the conclusion that this is a really good work-flow exercise :slight_smile:


It does look like the problem is taken care of in the latest update! I haven’t caught it out yet, so fingers crossed :slight_smile: I do however have the problem that models keep disappearing from the deploy view after restarting PL now… Are you seeing the same issue?

Hey :wave:
Cool building ResNet from scratch! :slight_smile:

For the ValueError: max() arg is an empty sequence issue, we are hunting it down but it does seem to have something to do with the previews.

@robertl Suspected bug: copying/pasting a customised component fails on linking…? I reset the component and re-customised the code to use the he_normal kernel initializer, and the preview is fine now. Now I have to remember to reset the other settings…
Damn… should have done the settings panel stuff first… can’t get at the controls once the code has been edited.
Will this be an issue internally (between components) on copying/pasting blocks, or only at the output of the block?

Hmm, there is an issue where customized components can’t have their names changed, since the class name will then differ from their actual name.


When I tested just now, it caused the copy to have the same preview as the original rather than being computed anew. You are likely encountering a variant of this issue.
Will add it as something we’ll take a look at.

Learn, do, teach :wink: - and maybe get some understanding en route!

there is an issue where customized components can’t have their names changed

Which is of course exactly what copy/paste does :slight_smile: Hopefully you can just do a find & replace on the component name before/after to make the fix fairly straightforward!

Thanks for looking into it.

I am still unsure about how the skip connections are to be managed, so if you have any (brief - don’t want to tie you up) suggestions they would be most welcome.

When we built a small ResNet in the past we used the Merge component to create the skip connections - would that work here? :slight_smile:

I was planning to use the Merge for the skip… thanks for the link - just the reminder I was hoping for :slight_smile:
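
For my notes, a minimal sketch of what I’m assuming the Merge (Add) component amounts to in Keras terms; the two Inputs are just stand-ins for the tensors being linked in PL:

import tensorflow as tf

block_output = tf.keras.Input(shape=(32, 32, 64))  # stand-in for the block's conv/BN output
shortcut = tf.keras.Input(shape=(32, 32, 64))      # stand-in for the skip path
merged = tf.keras.layers.Add()([block_output, shortcut])  # elementwise add
out = tf.keras.layers.Activation('relu')(merged)          # the ReLU comes after the add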


I’m following the GitHub code

and realised that before the various blocks there is some setup, as follows:

# Stem: explicit pad of 3, 7x7/2 conv, batch norm, ReLU, explicit pad of 1, 3x3/2 maxpool
x = tf.keras.layers.ZeroPadding2D(padding=(3, 3), name='conv1_pad')(img_input)
x = tf.keras.layers.Conv2D(64, (7, 7),
                           strides=(2, 2),
                           padding='valid',
                           kernel_initializer='he_normal',
                           name='conv1')(x)
x = tf.keras.layers.BatchNormalization(axis=BN_AXIS, name='bn_conv1')(x)
x = tf.keras.layers.Activation('relu')(x)
x = tf.keras.layers.ZeroPadding2D(padding=(1, 1), name='pool1_pad')(x)
x = tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)
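
For orientation, a runnable shape trace of the stem for a 224x224 input (BN and ReLU omitted, since they don’t change shapes) - the classic ResNet 224 → 112 → 56 reduction:

import tensorflow as tf

img_input = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.ZeroPadding2D((3, 3))(img_input)         # -> (230, 230, 3)
x = tf.keras.layers.Conv2D(64, (7, 7), strides=(2, 2),
                           padding='valid')(x)               # -> (112, 112, 64): (230-7)//2 + 1
x = tf.keras.layers.ZeroPadding2D((1, 1))(x)                 # -> (114, 114, 64)
x = tf.keras.layers.MaxPooling2D((3, 3), strides=(2, 2))(x)  # -> (56, 56, 64): (114-3)//2 + 1
print(x.shape)  # (None, 56, 56, 64)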

Questions (especially for @robertl)

The convolution, batchnorm, activation & maxpooling are all supported by the conv component, but…

  • What’s the best way to handle the zero padding - initially, and for the maxpooling? (The conv padding = 'valid' is fine :slight_smile: )
    • Zero padding on a convolution component can be Same or Valid - neither pads with zeros
      • NB these are “industry standard” but really, really poor and confusing labels: “same” means duplicate values as necessary to support the convolution (i.e. the added values are the same as their neighbours), whereas “valid” means no padding :roll_eyes: - the convolution is only performed within the array, so that it remains valid. There are lots of Q&A on this, which just goes to show how unfortunate the labelling is - see e.g. here
    • Note - once maxpooling is turned on, padding for the maxpool is also configurable: what is the maxpooling “area”, and how does it relate to the extent of the zero padding?
      [screenshot of the maxpool padding settings]

NB it would be nice to be able to “pop-out” the settings, as there are now more settings than fit comfortably in the accessible area; scrolling does work, but it’s a bit frustrating here.

Hey @JulianSMoore,
Cool that the ResNet project is progressing! :smiley:

I believe that “SAME” is with zero padding, at least according to the sources I can find (here’s an example: https://wandb.ai/krishamehta/seo/reports/Difference-Between-SAME-and-VALID-Padding-in-TensorFlow--VmlldzoxODkwMzE)
Area and stride work the same for pooling as they do for a convolution: the area is how many cells/pixels in all dimensions it will pool over to create a single new value, and the stride is how many cells/pixels it will jump between each value.
If the area is larger than 1, then “same” padding will start taking effect, and one extra row/column of zeroes will be added per increase in area to cover the outermost coordinates in the image.
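
A quick shape check makes this concrete (the sizes here are just illustrative):

import tensorflow as tf

x = tf.random.normal((1, 8, 8, 1))
# 'valid': only complete 3x3 windows fit -> (8 - 3)//2 + 1 = 3
print(tf.keras.layers.MaxPooling2D((3, 3), strides=2, padding='valid')(x).shape)  # (1, 3, 3, 1)
# 'same': zero-pads the border so the output is ceil(8/2) = 4
print(tf.keras.layers.MaxPooling2D((3, 3), strides=2, padding='same')(x).shape)   # (1, 4, 4, 1)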

Hope that didn’t confuse more than it helped :sweat_smile:

As for the pop-out suggestion on settings, thanks! I’ll add it as something to look into :slight_smile:

Thanks Robert :slight_smile:

Yeah - “same” keeps the tensor size unchanged, adding additional edge values as necessary to allow the kernel to work on every element of the original tensor (it can be more than one row/column, depending on kernel size & stride, I think), while “valid” shrinks the tensor because the kernel can’t work on the original tensor’s edges, so those elements don’t get values from the convolution :slight_smile:

After further thought I think all of this can be done with the PL convolution component as follows…

I think what I was really trying to say is that the first two layers I quoted add padding explicitly before the conv, which does not add padding of its own - so maybe they can be combined?

The kernel is 7x7 with stride 2 and the Conv2D has padding = 'valid', but I think this would be exactly the same as using no separate zero padding and setting padding = 'same' - and by the same reasoning the maxpooling can be handled directly, with the PL pooling padding set to “same”.

Question: do you agree? :slight_smile: (i.e. the person who implemented the code I’m following could have been more efficient)

(There are formulae for padding etc. e.g. stats.stackexchange but I started to include Dilation - which isn’t in scope! - and confused myself :rofl:)
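
(For what it’s worth, a quick check of the output shapes supports this - with one caveat I believe applies: with stride 2, TF’s “same” pads asymmetrically (2 on one edge, 3 on the other), so the windows can sit one pixel off relative to the explicit (3, 3) padding even though the shapes agree:)

import tensorflow as tf

x = tf.random.normal((1, 224, 224, 3))

# Explicit zero padding + 'valid' conv, as in the GitHub code
a = tf.keras.layers.ZeroPadding2D(padding=(3, 3))(x)
a = tf.keras.layers.Conv2D(64, (7, 7), strides=(2, 2), padding='valid')(a)

# Single conv relying on the built-in 'same' padding
b = tf.keras.layers.Conv2D(64, (7, 7), strides=(2, 2), padding='same')(x)

print(a.shape, b.shape)  # both (1, 112, 112, 64)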

Hmm, if I follow it correctly, it sounds right that you can just use the standard TensorFlow Conv2D and maxpooling built-in padding rather than explicitly adding it on your own. Since it’s native in TF as well, there’s a good chance it’s more efficient than something custom-built, although I haven’t dug into that one.


A filesystem crawler and CSV builder for PL Datawizard use

I’ve noticed that because of the stride used on convolutions, the output tensors are shrinking and, because I started with tiny images for convenience, it seems conceivable they could actually reach zero size. So I want to use bigger images.

In fact I want to use “real” images with ResNet18 - those actually used for training ResNets by their original designers. However, the original ImageNet covers 1000 classes and is about 185GB in size. Not only is that a lot of data to store on a personal PC for just one project, it might also take rather a long time to train a single epoch on a PC GPU - and let’s not forget the pre-processing time.

So I found ImageNet100 (on Kaggle), which is only 100 classes and ~16GB in size. It comprises four training folders (each containing 25 classes of ~1300 images), a “validation” folder, and a JSON file that maps category IDs to text labels (lists of synonyms).

It looks like this in structure: [folder-tree screenshot]

Unfortunately, there is no CSV file to drive PerceptiLabs with, so I wrote some code to build one and thought I would share it (see the attached Windows Jupyter notebook - watch out for path-spec differences on Linux etc.)

Apart from anything else, not being a dev, every time I have to code something I have to dig up specifics and remind myself of forgotten details - and I aim to learn a bit more each time (obscure limitations of r-strings!). So I thought sharing it would save others some of the effort, while perhaps also providing some handy hints and tips (such as how to add a DataFrame column with a dictionary lookup - see the sketch below). The code is extensively commented, to explain many things to my future self and to anyone else who’s interested :wink:

I created a “tvt” (training/validation/test) indicator so that one can also manipulate selections more easily in Excel, without having to parse the “val.X” folder out of the image file path.

Note that “validation” in this dataset is, I think, really “test”. There are 5k such images, which is 3.7% of the 135k total. When setting up the datawizard splits, test should therefore be 3.7%, and it’s up to me how to allocate the other 96.3% between training and validation. (I might add some code to create even smaller sets for test purposes - I could even adjust the proportion of test data that way too.)

The core code is in section 4.1. Eventually, it produces this pandas DataFrame

[screenshot of the resulting DataFrame]

and uses DataFrame.to_csv to write the file out, which looks like this (note that the index column is included and labelled)

CSV Builder.ipynb (20.3 KB)
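
For anyone who’d rather not open the notebook, a minimal sketch of the approach - the paths, folder names, JSON filename and column names below are illustrative stand-ins, and the notebook has the real details:

import json
from pathlib import Path
import pandas as pd

root = Path(r"D:\datasets\imagenet100")  # hypothetical dataset location
# The JSON file mapping category IDs to text labels (lists of synonyms)
labels = json.loads((root / "Labels.json").read_text())

rows = []
for folder in root.iterdir():  # the training folders plus the "val.X" folder
    if not folder.is_dir():
        continue
    tvt = "test" if folder.name.startswith("val") else "train"  # the tvt indicator
    for class_dir in folder.iterdir():
        for img in class_dir.glob("*.JPEG"):
            rows.append({"images": str(img), "category": class_dir.name, "tvt": tvt})

df = pd.DataFrame(rows)
# Add a human-readable label column via a dictionary lookup
df["categoryText"] = df["category"].map(labels)
# Write out with the index column included and labelled
df.to_csv(root / "imagenet100.csv", index=True, index_label="row")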


TL;DR Setup & performance of pre-processing

For maximum speed of pre-processing, close any Explorer window onto the drive containing the dataset - and ideally have the dataset on an SSD. If really desperate - and confident! - you could also turn off Windows Security for the duration.

Dear Diary…

With the CSV built (replacing “, ” with “ - ”, as noted elsewhere, to work around the temporary PL CSV parsing error), we can now create a new project.

There are 135,000 rows and we know that 5,000 are the test values, so we set the split to 80:16:4. (NB PL does not accept a non-integer %; I can’t see any reason for that, since an integer % of an arbitrary row count will give a non-integer value that needs rounding to an int anyway, so why not allow float %? e.g. 5% of 2001 is 100.05.)

I am ignoring the row-number and tvtIndicator, as these are not part of the “data” per se, and also the “category”: by training on the categoryText, which will be one-hot encoded, I ensure that my class labels mean something to the naked eye in the evaluation etc., though I expect that the length of the labels might create some issues for the UI (the longest is currently 87 chars: “great white shark - white shark - man-eater - man-eating shark - Carcharodon carcharias”).

There are a lot of images…

  • Initializing pre-processing started at 09:16:30 and ended at 09:51:30 (35 minutes) - unfortunately with an error :frowning:

    Couldn’t get model recommendations because the Kernel responded with an error

    Traceback (most recent call last):
      File "perceptilabs\models_interface.py", line 264, in perceptilabs.models_interface.ModelsInterface.get_model_recommendation
      File "perceptilabs\automation\utils.py", line 16, in perceptilabs.automation.utils.get_model_recommendation
      File "perceptilabs\automation\modelrecommender\base.py", line 43, in perceptilabs.automation.modelrecommender.base.ModelRecommender.get_graph
      File "perceptilabs\automation\modelrecommender\base.py", line 70, in perceptilabs.automation.modelrecommender.base.ModelRecommender.get_encoder_decoder_network
      File "perceptilabs\automation\modelrecommender\base.py", line 124, in perceptilabs.automation.modelrecommender.base.ModelRecommender._add_decoder
    NotImplementedError: No decoder found for datatype 'text'

PS Re the error: afterwards, the UI returns to the datawizard with all selections cleared; nothing can be done, so I close the dialog. However, the dataset has been added to the model hub, even though nothing can be done with it because preprocessing failed. Failed datasets shouldn’t be added, ideally, but it was not hard to unregister this one, so not a huge deal.

Oddity: during initialisation of pre-processing, looking at resource usage (disk), I can see the process MsMpEng.exe accessing the dataset images - but I can also see the same process accessing other images. Now, MsMpEng.exe is the Windows Defender engine, and I guess it is scanning every file as it is opened; however, I can’t think of any reason why other images should be opened by anything at this time.

It’s as though the pre-processing were ignoring the specific paths provided and doing an os.walk over the whole of the drive - which would be a) inefficient and b) not something the tool should do (privacy/courtesy - only access what is necessary/instructed). To be clear, it doesn’t actually do this - that was just a hypothetical.

Am I misreading the resource usage, or is it really scanning the whole drive? (I have lots of pictures there - it’s the backup drive for 15 years of personal photos.)

ANSWER: I am misreading. There were two obvious candidates for other things accessing images: Windows Media Player and Windows Explorer (with QTTabBar). Quitting WMP didn’t seem to affect things, but closing the Explorer window removed all the non-dataset image scanning - so maybe Windows was just trying to build its thumbnail cache in advance of me clicking down to a photo folder on that drive?

Now that the unnecessary Windows Defender activity has been stopped, I can see that the necessary activity is running at about 15MB/sec (on a real disk) and that the rate per image is anything between 4KB and 1MB per second. Call it ~250KB/s on average: that means the system is scanning ~60 files per second, so initialising pre-processing for all 135,000 images should take ~2,250 seconds, i.e. ~40 minutes. Time for more coffee (we’re only 26 minutes in so far). (I didn’t get the coffee then - pre-processing errored at the end, after 35 minutes… not bad for such a crude time estimate!)