Changing model *Data*

I am generating some test data to study the Double Descent phenomenon in a regression setting, because with regression I can control the numbers very precisely in order to examine “generalisation” etc.

I need to push a model until it shows the right behaviour, and one way to do this is to make it

  • harder to approximate, as the function becomes more complex
  • harder to “memorise”, because more training data points are supplied
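To make the two knobs concrete, here is a minimal sketch of how such datasets could be generated. Everything in it (column names, the polynomial target, file names) is my illustrative assumption, not anything PL prescribes: `degree` controls how hard the function is to approximate, `n_rows` controls how much there is to memorise, and every CSV has the same structure.

```python
import numpy as np
import pandas as pd

def make_dataset(n_rows, degree, noise=0.1, seed=0):
    """Noisy polynomial regression data.

    Higher `degree` -> target is harder to approximate;
    larger `n_rows` -> training set is harder to memorise.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n_rows)
    coeffs = rng.normal(size=degree + 1)  # random polynomial coefficients
    y = np.polyval(coeffs, x) + rng.normal(scale=noise, size=n_rows)
    return pd.DataFrame({"x": x, "y": y})

# Same columns in every file, only the row count (and complexity) varies:
for n in (100, 1_000, 10_000):
    make_dataset(n, degree=5).to_csv(f"dd_{n}.csv", index=False)
```

Because every file shares identical columns, swapping one for another only changes the amount (and difficulty) of data, which is exactly the experimental control wanted here.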

To do this, I thought I would set up a single model and use different CSV files - with exactly the same structure, just different numbers of rows - copying each one in turn to the same path.

Now, PL has a pre-processing pipeline, and I suspect I will need to do something each time I update the CSV…

What is the best way to proceed?

Hi @JulianSMoore,
Cool project!

I think this will be a lot easier when this feature is done:
It’s getting specced out right now, so I can soon give you a time estimate of when to expect it.

If you want to start a bit earlier, the best way would likely be to create all the different CSV files, load them into PL as separate datasets, and create one model for each (with the right preprocessing). Then you can build out the model you want in one of them and copy/paste the components over to the other models.

Hope that helps!

Thanks @robertl

That sounds doable for a couple of datasets… I’ll give it a go!

And yes, that feature-to-be sounds good

However - and I hope I remember to add these to the feature requests - the following variations would also be nice and could help:

  • Allow user to specify start/end rows of data to be processed for training/validation/test
  • Even more flexible: allow user to specify training/validation/test blocks by individual row ranges
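In the meantime, row-range splits like these can be approximated outside PL before loading. The helper below is purely hypothetical (its name, signature, and half-open-range convention are my assumptions, not a PL API); it shows the second, more flexible variant, where each of training/validation/test is an arbitrary list of row blocks:

```python
import pandas as pd

def split_by_rows(csv_path, train, val, test):
    """Split a CSV into train/val/test by explicit row ranges.

    Each of `train`, `val`, `test` is a list of half-open (start, end)
    row-index pairs, so the blocks need not be contiguous or ordered.
    """
    df = pd.read_csv(csv_path)
    ranges = {"train": train, "val": val, "test": test}
    return {name: pd.concat(df.iloc[s:e] for s, e in pairs)
            for name, pairs in ranges.items()}

# e.g. simple head/middle/tail blocks on a 1000-row file:
# parts = split_by_rows("data.csv",
#                       train=[(0, 800)], val=[(800, 900)], test=[(900, 1000)])
```

Each part could then be saved as its own CSV and fed to PL as a separate dataset.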

(I like suggesting features that don’t require big functional changes but do provide more user control/flexibility)


Thanks! 🙂
I created some features for them here: