Must we manually specify the number of classes for one-hot encoders? The preview shows the correct count, but #classes seems to default to 10.
At the moment you need to manually specify the number of classes for one hot encoders.
However, we are improving how the pre-processing meta data is saved so we might be able to automatically recommend it soon.
The issue is that recommending it would require going through the entire dataset to count the unique values. That takes quite a while on a large dataset, so we don't want to do it every time.
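A minimal sketch of what that full pass involves, assuming the data is a CSV read in chunks with pandas (the function name count_classes, the column name and the chunk size are illustrative, not part of the platform). Distinct values are accumulated across chunks, so every row still has to be read once:

```python
import io
import pandas as pd

def count_classes(csv_source, column, chunksize=10_000):
    """Count distinct non-null values of `column` by streaming the CSV in chunks."""
    seen = set()
    # With chunksize, read_csv returns an iterator of DataFrames,
    # so only one chunk is held in memory at a time
    for chunk in pd.read_csv(csv_source, usecols=[column], chunksize=chunksize):
        seen.update(chunk[column].dropna().unique())
    return len(seen)

# Tiny in-memory example; a real dataset would be a file path
data = io.StringIO("type\na\nb\na\nc\nb\n")
print(count_classes(data, "type", chunksize=2))  # → 3
```

Memory stays bounded by the chunk size plus the set of distinct values, but the time still grows with the number of rows.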
Really? I just tried it on my full dataset of 2.4 million rows as a pandas DataFrame with df['type'].nunique(), and the result was instantaneous (the dataset I shared was only 40k rows).
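For reference, a sketch of that pandas check, followed by a one-hot encoding whose width comes from the count. The column name 'type' is from the post. The toy data is made up, and np.eye indexing stands in for whatever encoder the platform actually uses:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"type": ["cat", "dog", "bird", "dog", "cat"]})

# Distinct non-null values = the number of classes the encoder needs
n_classes = df["type"].nunique()
print(n_classes)  # → 3

# Integer code per row (categories are sorted: bird=0, cat=1, dog=2)
codes = pd.Categorical(df["type"]).codes
# Selecting one identity-matrix row per code gives the one-hot vectors
one_hot = np.eye(n_classes, dtype=int)[codes]
print(one_hot.shape)  # → (5, 3)
```

If the class count were fixed at 10 instead, np.eye(10)[codes] would add 7 always-zero columns. A dataset with more than 10 categories would raise an IndexError.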
Could take longer if not using pandas, I guess…
Hmm, it might have to do with us reading it in lazily, but even then it should be fast if those 2.4m rows fit into memory.
I’ll have to check with the devs and see what they say, thanks for the info on how it ran for you!
The PL environment contains dask, doesn’t that take care of it?