Train/Val/Test Splits - is stratification included?

Does PL automatically ensure/maintain class proportions (i.e. balance) when doing its own data splits?

I believe scikit-learn (and no doubt other libraries too) has an option to stratify. I hand-coded it in my own external development, since the basics are relatively straightforward with pandas.
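For reference, here is a minimal sketch of the scikit-learn route, done outside PL. It relies on the `stratify` argument of `train_test_split`, called twice to get three subsets. The helper name `stratified_three_way_split`, the 70/15/15 proportions, and the toy data are my own assumptions for illustration, not anything PL provides.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_three_way_split(X, y, val_size=0.15, test_size=0.15, seed=0):
    """Hypothetical helper: train/val/test split that preserves class proportions."""
    holdout = val_size + test_size
    # First cut: train vs. (val + test), stratified on the labels.
    X_train, X_hold, y_train, y_hold = train_test_split(
        X, y, test_size=holdout, stratify=y, random_state=seed
    )
    # Second cut: split the holdout into val and test, stratified again.
    X_val, X_test, y_val, y_test = train_test_split(
        X_hold, y_hold, test_size=test_size / holdout,
        stratify=y_hold, random_state=seed
    )
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Toy data (assumed): 1000 rows, 20% positive class.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.array([0] * 800 + [1] * 200)

train, val, test = stratified_three_way_split(X, y)
for name, (_, labels) in zip(("train", "val", "test"), (train, val, test)):
    # Print subset size and share of the positive class.
    print(name, len(labels), round(float(labels.mean()), 3))
```

Each subset should end up with roughly the same share of positives as the full dataset.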

I think stratification is pretty easy with a single category (y/n classifications etc.) but it can get awkward with multiple categories since proportionality can be hard to maintain as the subsets become smaller.
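To illustrate the multi-category problem, here is a rough hand-coded sketch in plain Python that stratifies on the combination of several columns. The function name `stratified_split`, the column names `label` and `site`, and the counts are all made up for the example. Note what happens to the tiny stratum: its validation and test shares round to zero, so every row lands in train.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, keys, fractions=(0.7, 0.15, 0.15), seed=0):
    """Hypothetical helper: split a list of dicts into train/val/test,
    stratifying on the combination of values in `keys`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[k] for k in keys)].append(row)

    train, val, test = [], [], []
    for members in strata.values():
        rng.shuffle(members)
        n_val = round(len(members) * fractions[1])
        n_test = round(len(members) * fractions[2])
        # Tiny strata round to zero val/test rows and end up entirely in train.
        val.extend(members[:n_val])
        test.extend(members[n_val:n_val + n_test])
        train.extend(members[n_val + n_test:])
    return train, val, test


# Toy data (assumed): two categorical columns, one very rare combination.
counts = {("y", "a"): 40, ("y", "b"): 20, ("n", "a"): 100, ("n", "b"): 40, ("y", "c"): 2}
rows = [
    {"label": label, "site": site}
    for (label, site), n in counts.items()
    for _ in range(n)
]

train, val, test = stratified_split(rows, keys=("label", "site"))
for name, part in (("train", train), ("val", val), ("test", test)):
    # Show how many rows of each (label, site) combination each subset received.
    print(name, len(part), dict(Counter((r["label"], r["site"]) for r in part)))
```

One common workaround is to merge rare combinations into an "other" stratum before splitting, at the cost of looser proportions for those rows.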

Apart from PL’s own capabilities (now, planned) in this area, does anyone have any tips-n-tricks or other best practice guidance?

Hi @JulianSMoore,
We don’t have any smart division in PL as of yet, but it’s something we have on our radar 🙂
Besides just keeping classes balanced, it would be interesting to find an approach that keeps other kinds of data balanced as well, such as images or plain numerical data.