Some interesting discussion in here
To start off with, @JacoMoolman, really sorry about the issue you encountered!
We have it tracked and are looking to fix it, although we have some other features and bug fixes in the pipeline, so the fix may land a bit later.
By the time you are back it will be sorted (based on your “in a couple of months” timeline).
One solution we are looking at for the large CSV files (besides fixing the UI) is providing the ability to convert/concatenate columns into the Array datatype, either in the Data Wizard UI or in the CSV file itself.
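Until something like that exists in the tool, the "in the CSV file itself" route can be approximated by hand. Here is a minimal sketch, assuming the feature columns are numeric and that a JSON-encoded list in a single cell is an acceptable stand-in for an array. The function name `pack_columns`, the column names, and the `features` column are all made up for illustration; this is not PerceptiLabs code, and whether the Data Wizard would read such a cell as an Array is not something this example shows.

```python
import csv
import io
import json


def pack_columns(csv_text, columns, array_name="features"):
    """Collapse the given numeric columns into one JSON-array column.

    Hypothetical helper: the names and the JSON encoding are assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns that are not packed are passed through unchanged.
    kept = [name for name in reader.fieldnames if name not in columns]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept + [array_name], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        packed = {name: row[name] for name in kept}
        # One cell now holds what used to be len(columns) separate cells.
        packed[array_name] = json.dumps([float(row[c]) for c in columns])
        writer.writerow(packed)
    return out.getvalue()


sample = "id,f1,f2,f3,label\n1,0.1,0.2,0.3,cat\n2,0.4,0.5,0.6,dog\n"
packed_csv = pack_columns(sample, ["f1", "f2", "f3"])
print(packed_csv)
```

With hundreds of feature columns the resulting file has only a handful of columns to show in a UI, while the values themselves are untouched.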
We also have some thoughts on allowing you to create a new CSV based on existing CSVs, @birdstream. Keeping everything in sync is going to be important there, as @JulianSMoore mentioned, and we have some ideas on that as well, but that’s a bit further down the line. It could look something like:
- Existing loaded datasets can’t be modified from within PL, but can instead be combined to create new ones
- The new ones can be downloaded
- Existing datasets are automatically synced with whatever happens to the source, and the “data version” is updated when that happens
Very early thoughts though as you can see.
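To make the list above a bit more concrete, here is a rough sketch of how those rules could fit together, assuming a dataset is just a name, some rows, and a version counter. Everything in it (the `Dataset` class, `combine`, `resync`) is hypothetical and only illustrates the idea; nothing here reflects an actual or planned PL API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: a loaded dataset cannot be modified in place
class Dataset:
    name: str
    rows: tuple
    version: int = 1


def combine(name, *sources):
    """Build a new dataset from existing ones; the sources stay untouched."""
    rows = tuple(row for src in sources for row in src.rows)
    return Dataset(name=name, rows=rows)


def resync(dataset, source_rows):
    """Mirror a change in the source and bump the data version if it changed."""
    source_rows = tuple(source_rows)
    if source_rows == dataset.rows:
        return dataset  # nothing changed, so the version stays the same
    return replace(dataset, rows=source_rows, version=dataset.version + 1)


a = Dataset("a", ((1, 2),))
b = Dataset("b", ((3, 4),))
merged = combine("a_plus_b", a, b)
a_updated = resync(a, [(1, 2), (5, 6)])
print(merged.rows, a_updated.version)
```

The point of the version counter is that any model trained on a dataset can record which data version it saw, so a change in the source never silently alters an existing experiment.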
Then there’s the interesting point about whether this is the right problem or not. Large tabular datasets are actually rarely tackled with Deep Learning; classic ML like clustering, SVMs, regression, etc. is more common, as it’s easier, performs pretty well, and gives smaller, more responsive models. Hence our initial focus on the Computer Vision domain.
@robertl if there were some way for people to upvote bugs so their names were on it, everyone affected/benefiting from a new release could be notified - > feature request??
Haha, I feel like you are setting me up for presenting Canny
I’ve started playing around with using Canny for publicly tracking and voting on bugs, with the added benefit that anyone who votes on a bug gets notified whenever there is an update to it.
I’ll make an official announcement about it as soon as all the bugs are in there and some more internal stuff is sorted (like enabling logging in with your forum account, if that proves possible), but here is what this bug would look like in Canny: https://perceptilabs.canny.io/bug-reports/p/data-wizard-does-not-work-well-for-large-amount-of-columns
Feel free to browse the features and roadmaps we have in there as well