Open
Description
@gpfreitas This is related to issues #5 and #6 and tries to condense them into a TODO list.
Items to do related to the argument specs of `make_*` functions from `xarray_filters.datasets`:
- Make `MLDataset` be the default return value rather than `Dataset`
- Remove the requirement for the `n_samples` argument in this case: `MLDataset(make_blobs(n_samples=2000, shape=(200,10)))` where `n_samples` can be taken from `shape`
- For functions that exist in `dask_glm`, e.g. `make_classification`, we should default to making a `MLDataset` as in the `xarray_filters.datasets` so far, but use `dask_glm`'s funcs for a `dask.array` in each `DataArray` rather than `sklearn.datasets` `numpy` based approach.
  - Provide a `use_dask_glm=True` keyword to control whether the functions in `dask_glm.datasets` are used.
- Change the sequence of acceptable strings for `astype` to the following (or equivalent way of specifying the data structures below as the output type): `('pandas.dataframe', 'dask.array', 'dask.dataframe', 'numpy.ndarray', 'dataset', 'mldataset')`
- `xnames` should be `layers`
- docstring edits - See below: This is current docstring for `make_blobs` from `xarray_filters` - I think it needs more of the docs from the transformation part explained, e.g. that it typically outputs N-D `DataArray`s in an `MLDataset` or any differences between `sklearn` and `xarray_filters` like `n_samples` versus `shape`:
In [3]: ?make_blobs
Signature: make_blobs(n_samples=100, n_features=2, centers=3, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, *, astype='dataset', **kwargs)
Docstring:
Like sklearn.datasets.samples_generator.make_blobs, but with added functionality.
Parameters
---------------------
Same parameters/arguments as sklearn.datasets.samples_generator.make_blobs, in addition to the following
keyword-only arguments:
astype: str
One of ('array', 'dataframe', 'dataset', 'mldataset') or None to return an NpXyTransformer. See documentation
of NpXyTransformer.astype.
**kwargs: dict
Optional arguments that depend on astype. See documentation of
NpXyTransformer.astype.
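For the `n_samples` versus `shape` item, here is a minimal sketch of how `n_samples` could be inferred when only `shape` is given. `resolve_n_samples` is a hypothetical helper name, not an existing `xarray_filters` function, and it assumes `n_samples` should equal the product of the entries in `shape` (as in the `n_samples=2000, shape=(200,10)` example above). The fallback of 100 mirrors the default in the signature shown above.

```python
import numpy as np


def resolve_n_samples(n_samples=None, shape=None, default=100):
    """Hypothetical helper: infer n_samples from shape when it is omitted.

    Assumption: n_samples must equal the product of the entries of shape.
    """
    if shape is None:
        # No shape given: fall back to n_samples, or to the default.
        return default if n_samples is None else int(n_samples)
    inferred = int(np.prod(shape))
    if n_samples is not None and int(n_samples) != inferred:
        # Both were given but they disagree: fail loudly rather than guess.
        raise ValueError(
            "n_samples=%r does not match shape=%r (product %d)"
            % (n_samples, shape, inferred)
        )
    return inferred
```

With something like this, `make_blobs(shape=(200, 10))` could work without the caller repeating `n_samples=2000`, while an inconsistent pair would still raise.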
Note - where I said dask_glm above - also look at dask-ml
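For the `astype` item, a rough sketch of validating the proposed strings. `ASTYPE_CHOICES`, `LEGACY_ALIASES` and `normalize_astype` are hypothetical names used only for illustration. Mapping the current short names (`'array'`, `'dataframe'`) onto the new ones is an assumption about how backward compatibility might be handled, not something decided in this issue.

```python
# Proposed output-type strings, taken from the list in this issue.
ASTYPE_CHOICES = (
    "pandas.dataframe",
    "dask.array",
    "dask.dataframe",
    "numpy.ndarray",
    "dataset",
    "mldataset",
)

# Assumption: the existing short names could be kept as aliases.
LEGACY_ALIASES = {
    "array": "numpy.ndarray",
    "dataframe": "pandas.dataframe",
}


def normalize_astype(astype="mldataset"):
    """Hypothetical helper: validate and canonicalize an astype string."""
    if astype is None:
        # Per the current docstring, None means "return the NpXyTransformer".
        return None
    key = str(astype).strip().lower()
    key = LEGACY_ALIASES.get(key, key)
    if key not in ASTYPE_CHOICES:
        raise ValueError(
            "astype must be None or one of %r, got %r" % (ASTYPE_CHOICES, astype)
        )
    return key
```

The default of `'mldataset'` here reflects the first TODO item (making `MLDataset` the default return value).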