
Default arguments for xarray_filters.datasets.make_* functions #17

@PeterDSteinberg

Description


@gpfreitas This is related to issues #5 and #6 and tries to condense them into a TODO list.

TODO items related to the argument specifications of the make_* functions in xarray_filters.datasets:

  • Make MLDataset, rather than Dataset, the default return type.
  • Drop the requirement for n_samples when it can be taken from shape, as in MLDataset(make_blobs(n_samples=2000, shape=(200,10))), where n_samples = 200 * 10 = 2000.
  • For functions that also exist in dask_glm (e.g. make_classification), default to returning an MLDataset as xarray_filters.datasets does now, but use dask_glm's functions to put a dask.array in each DataArray instead of the NumPy-based sklearn.datasets approach.
    • Provide a use_dask_glm keyword argument (default True) that controls whether the functions in dask_glm.datasets are used.
  • Change the accepted astype strings to the following (or an equivalent way of specifying these data structures as the output type):
    ('pandas.dataframe', 'dask.array', 'dask.dataframe', 'numpy.ndarray', 'dataset', 'mldataset')
  • The xnames argument should be called layers.
  • Docstring edits: below is the current docstring for make_blobs from xarray_filters. I think it needs to explain more of the transformation step, e.g. that it typically outputs N-D DataArrays in an MLDataset, and any differences between sklearn and xarray_filters, such as n_samples versus shape:
```
In [3]: ?make_blobs
Signature: make_blobs(n_samples=100, n_features=2, centers=3, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None, *, astype='dataset', **kwargs)
Docstring:
Like sklearn.datasets.samples_generator.make_blobs, but with added functionality.

Parameters
---------------------
Same parameters/arguments as sklearn.datasets.samples_generator.make_blobs, in addition to the following
keyword-only arguments:

astype: str
    One of ('array', 'dataframe', 'dataset', 'mldataset') or None to return an NpXyTransformer. See documentation
    of NpXyTransformer.astype.

**kwargs: dict
    Optional arguments that depend on astype. See documentation of
    NpXyTransformer.astype.
```
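
The n_samples item could be handled by a small consistency check before any data is generated. A minimal sketch, assuming a hypothetical helper `resolve_n_samples` and treating `shape` as the N-D grid each feature is reshaped to, as in the MLDataset(make_blobs(n_samples=2000, shape=(200,10))) example; it infers n_samples as the product of shape and rejects a conflicting explicit value:

```python
import math


def resolve_n_samples(n_samples=None, shape=None, default=100):
    """Return the number of samples implied by n_samples and/or shape.

    Hypothetical helper: shape is assumed to be the N-D grid each feature
    is reshaped to, so the sample count is the product of its dimensions.
    """
    if shape is None:
        # No grid shape given: use n_samples, else sklearn's default of 100.
        return default if n_samples is None else n_samples
    implied = math.prod(shape)
    if n_samples is not None and n_samples != implied:
        raise ValueError(
            f"n_samples={n_samples} is inconsistent with shape={tuple(shape)} "
            f"(which implies {implied} samples)"
        )
    return implied


# The call from the issue would no longer need n_samples:
print(resolve_n_samples(shape=(200, 10)))  # 2000
```

Falling back to 100 when neither argument is given mirrors the make_blobs signature shown above; raising on a mismatch, rather than silently preferring shape, is a design choice.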

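The proposed astype values and the MLDataset default could be validated up front. A minimal sketch; the helper `check_astype`, the case-insensitive matching, and keeping None as "return an NpXyTransformer" (the current behavior per the docstring above) are assumptions for illustration:

```python
# Proposed astype values from this issue.
ASTYPE_CHOICES = (
    "pandas.dataframe",
    "dask.array",
    "dask.dataframe",
    "numpy.ndarray",
    "dataset",
    "mldataset",
)

# Proposed new default: MLDataset instead of Dataset.
DEFAULT_ASTYPE = "mldataset"


def check_astype(astype=DEFAULT_ASTYPE):
    """Normalize and validate astype (hypothetical helper).

    None is kept to mean "return the NpXyTransformer", as in the current docstring.
    """
    if astype is None:
        return None
    # Assumed convenience: ignore surrounding whitespace and letter case.
    key = astype.strip().lower()
    if key not in ASTYPE_CHOICES:
        raise ValueError(
            f"astype must be one of {ASTYPE_CHOICES} or None, got {astype!r}"
        )
    return key


print(check_astype())                    # mldataset
print(check_astype("Pandas.DataFrame"))  # pandas.dataframe
```

As written, the current values 'array' and 'dataframe' would be rejected; mapping them to 'numpy.ndarray' and 'pandas.dataframe' would preserve backward compatibility.
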
Note: wherever I mention dask_glm above, also look at dask-ml.
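
The use_dask_glm switch, together with the note about dask-ml, amounts to choosing a backend module for each make_* function. A minimal sketch; `resolve_generator` is hypothetical, and it assumes that dask_glm.datasets and dask_ml.datasets expose functions with the same names as their sklearn.datasets counterparts, which should be checked function by function:

```python
import importlib
from types import ModuleType
from typing import Callable, Optional

# Dask-backed generator modules, tried in order when use_dask_glm=True.
# dask_glm.datasets is named in the issue; dask_ml.datasets is the alternative
# from the note. Both are assumed to expose same-named make_* functions.
DASK_MODULES = ("dask_glm.datasets", "dask_ml.datasets")
NUMPY_MODULE = "sklearn.datasets"


def _try_import(name: str) -> Optional[ModuleType]:
    """Import a module by dotted name, returning None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def resolve_generator(
    func_name: str,
    use_dask_glm: bool = True,
    importer: Callable[[str], Optional[ModuleType]] = _try_import,
) -> Callable:
    """Pick the make_* implementation to call (hypothetical helper).

    Dask-backed modules are tried first when use_dask_glm is True; otherwise,
    or when none of them provides func_name, fall back to sklearn.datasets.
    """
    candidates = (DASK_MODULES if use_dask_glm else ()) + (NUMPY_MODULE,)
    for module_name in candidates:
        module = importer(module_name)
        if module is not None and hasattr(module, func_name):
            return getattr(module, func_name)
    raise ValueError(f"no installed backend provides {func_name!r}")
```

The `importer` parameter exists only so the lookup can be tested with stand-in modules; the returned function's output would still need to be wrapped in an MLDataset as described in the TODO list.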

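For the xnames item, one option is to accept both names during a deprecation period. A minimal sketch; `resolve_layers` is hypothetical, and it assumes xnames is a sequence of variable names:

```python
import warnings


def resolve_layers(layers=None, xnames=None):
    """Treat the old xnames keyword as a deprecated alias for layers (hypothetical shim)."""
    if xnames is not None:
        if layers is not None:
            raise TypeError("pass layers or xnames, not both")
        # Warn at the caller's line so users can find the old keyword.
        warnings.warn(
            "xnames is deprecated; use layers instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return list(xnames)
    return None if layers is None else list(layers)
```
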