Curious: why use numpy.random.RandomState for Data...Random

I’m beginning to dig through code working out what you do, why you do it etc. so that for custom code I can do compatible things…

Looking at the Data…Random element…

The NumPy docs say RandomState is a legacy generator, kept so that its random output is guaranteed to match NumPy 1.16.

Why do you need that?

(I can see you use a different RandomState argument value (the seed) for each of training, validation and test, which makes sense, but is RandomState necessary for this?)

Just asking, no big deal…

That’s an excellent question :slight_smile:

The purpose of random state is to synchronize two different random number generators. In this case, we use them for randomly selecting a sample from the dataset.

In our case, we load inputs and labels via two different data components. Since there is a one-to-one correspondence between each input example and each label example, choosing randomly without synchronization would lead to a mismatch between inputs and labels. We therefore give both components the same random state, so that the same sample is selected in each.
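To make the synchronisation concrete, here is a minimal sketch in plain NumPy. It is not the tool's actual code: the toy arrays, the helper `sample_indices`, and the seed values are all made up for illustration. It only shows that two separate `RandomState` objects created with the same seed draw the same indices, while different seeds do not.

```python
import numpy as np

# Hypothetical toy dataset: the label for inputs[i] is labels[i].
labels = np.arange(100)
inputs = labels * 10

def sample_indices(seed, n_items, n_draws):
    """Draw n_draws random indices from a fresh RandomState (illustrative helper)."""
    rng = np.random.RandomState(seed)
    return rng.randint(0, n_items, size=n_draws)

# Two independent "components", each with its own generator but the same seed.
idx_inputs = sample_indices(seed=1234, n_items=len(inputs), n_draws=20)
idx_labels = sample_indices(seed=1234, n_items=len(labels), n_draws=20)

# Same seed: every drawn input is paired with its own label.
in_sync = np.array_equal(inputs[idx_inputs], labels[idx_labels] * 10)

# Different seed on the label side: the two index sequences diverge.
idx_labels_bad = sample_indices(seed=4321, n_items=len(labels), n_draws=20)
out_of_sync = not np.array_equal(idx_inputs, idx_labels_bad)

print("same seed keeps pairs aligned:", in_sync)
print("different seeds break alignment:", out_of_sync)
```

Using a different seed per split (training, validation, test) fits the same picture: within a split both components share one seed, and across splits the seeds differ.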

We’ll definitely take this into account so that it is more obvious how the different parts work. Perhaps better documentation, perhaps a simpler user experience, or something else. May I ask, what would be your preference?

Ah, data + label synchronisation… got it. Thanks.

How to communicate?

Compare and contrast how it works when done right and what goes wrong if you don’t do it right?

Formal documentation is wonderful if you already know 90% of the answer… if you only know 10% it’s not so good. (I forget where I got this but “In order to ask the right question you must already have a pretty good idea of what would be a good answer”; so the greater the ignorance the harder it is to know what to ask!)

I think the user experience should follow the conceptualisation, not be considered in isolation:

  • poor conceptualisation + simplified UX = confusion and inability to attend to relevant parts because the UX doesn’t align with a conceptualisation that may itself not be in “orthogonal terms”
  • good conceptualisation + possibly more complex UX that reflects it is fine because all the parts are comprehensible and attention goes naturally to parts of interest or difficulty