V0.12 - Data centric training, validation, test sets

The MNIST data sets (derived from NIST data) provide training and test data separately.

How can I use the data wizard to specify distinct training/validation/test sets, rather than have PerceptiLabs arbitrarily divide the entire dataset into parts for the different purposes?

This is important because the MNIST test set contains digits from writers whose handwriting does not appear in the training data.

When a train/validation/test split is specified as x, y, z %, I believe that PL allocates each item to a split probabilistically, i.e. one cannot say the 1st x% is training, the 2nd y% is validation and the final z% is for testing. Is that correct (i.e. one could not force the MNIST test data to be used for testing by making it the last z% of a single data set)?

(Have I asked this before? Or was it just in conversation with someone…?)

Hi @JulianSMoore,
Good question. We will provide options down the line to load separate data for testing, and probably for validation as well.

PerceptiLabs only splits randomly if this checkbox is ticked; otherwise it splits in the order of the CSV file (so the 1st x% is training, etc.):

Ah! x:y:z sequentially if not randomised… good! I did actually turn that off on my 1st attempt, but turned it back on in case Off was the cause of the issue I was having (it wasn’t :wink: )
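
For anyone wanting to try the sequential route, here is a minimal sketch of the arithmetic, assuming the rows are concatenated in the order training, validation, test and the random option is off. The function names `percentages_for` and `sequential_split` are made up for illustration, and the rounding is an assumption: how PerceptiLabs itself rounds the split boundaries is not stated in this thread, so check the resulting counts.

```python
def percentages_for(n_train, n_val, n_test):
    """Return the x, y, z percentages matching the given row counts."""
    total = n_train + n_val + n_test
    return tuple(100 * k / total for k in (n_train, n_val, n_test))


def sequential_split(rows, train_pct, val_pct):
    """Split rows in file order: first train_pct %, next val_pct %, rest is test.

    Rounding to the nearest row is an assumption of this sketch,
    not PerceptiLabs' documented behaviour.
    """
    n = len(rows)
    n_train = round(n * train_pct / 100)
    n_val = round(n * val_pct / 100)
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Stand-in for a combined CSV: 50,000 training rows, 10,000 validation
    # rows taken from the MNIST training set, then the 10,000 MNIST test rows.
    rows = list(range(70_000))
    x, y, z = percentages_for(50_000, 10_000, 10_000)
    train, val, test = sequential_split(rows, x, y)
    print(f"x={x:.2f}% y={y:.2f}% z={z:.2f}%")
    print(len(train), len(val), len(test))
```

Note that with these counts the percentages are not whole numbers (the test share is 10,000 / 70,000), so if the split fields only accept integers, the boundary between validation and test may not line up exactly with the start of the MNIST test rows.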