Hey @rclement! Sorry for the delay, but here's a continuation of the discussion from rclement/datasette-ml#3
Supporting queries in sqlite-loadable-rs
I created a new issue in sqlite-loadable-rs to track adding querying support to that library. That way we can run statements like CREATE TABLE / INSERT INTO from the extension itself, which I know is a blocker for this work. Not sure if I'll have an ETA soon, but once that's in, it should unblock us here to make a proper loadable extension.
In Pure Rust, no Python?
The native-ext branch uses PyO3 for ML algorithms, which is great for getting started, but bundling Python in an extension can be tricky. I have the sqlite-python project that lets you define loadable SQLite extensions with Python, which could be useful here, but it comes with problems:
- It'll assume the user already has a pre-configured Python environment with the right packages installed
- It can be unstable when switching between different Python versions
- Python <-> Rust isn't very fun in general
There's the linfa project that could help us move to a pure-Rust extension. It's the most complete scikit-learn-like Rust crate I can find, so we could use its algorithms in sqlite-ml to remove the Python dependency.
I played around with it a bit and it seems pretty advanced: most of the models seem to support serializing to a byte array (so we can persist a trained model across connections). It may not have 100% of the algorithms that scikit-learn has, but probably enough for this?
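To make that last point concrete, here's a rough sketch of train → serialize → reload with linfa's logistic regression. It assumes the serde feature is enabled on linfa-logistic and pulls in bincode for the byte encoding; treat the exact API details as approximate:

use linfa::prelude::*;
use linfa_logistic::{FittedLogisticRegression, LogisticRegression};
use ndarray::array;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Tiny stand-in for iris: two features, binary labels
    let records = array![[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2]];
    let targets = array![0usize, 0, 1, 1];
    let dataset = Dataset::new(records, targets);

    // Train a model
    let model = LogisticRegression::default()
        .max_iterations(100)
        .fit(&dataset)?;

    // Serialize to bytes: this is what we'd persist (e.g. in an
    // ml_models table) so a later connection can reload the model
    let bytes: Vec<u8> = bincode::serialize(&model)?;
    let restored: FittedLogisticRegression<f64, usize> =
        bincode::deserialize(&bytes)?;

    // Predict with the reloaded model
    let predictions = restored.predict(&dataset);
    println!("{:?}", predictions);
    Ok(())
}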
Defining the SQL API
I've been thinking about a few different ways to express sqlite-ml operations in pure SQL, using eponymous virtual tables, table functions, and regular scalar functions. Here are some of my thoughts, but definitely not complete:
ml_experiments and friends
I think all these tables can be shadow tables that are read-only to users, since I don't think users will ever need to insert/update rows in these directly (there's a rough schema sketch after this list):
- ml_experiments
- ml_runs
- ml_models
- ml_metrics
- ml_deployments
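Once querying support lands in sqlite-loadable-rs, the extension could create these at load time. Here's a rough sketch of what that bootstrap DDL might look like as a constant the extension would execute; every table and column name below is made up just to make the idea concrete, not a committed schema:

// Hypothetical bootstrap DDL the extension could run once querying
// support lands in sqlite-loadable-rs; the schema is illustrative only.
const BOOTSTRAP_SQL: &str = r#"
CREATE TABLE IF NOT EXISTS ml_experiments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    prediction_type TEXT NOT NULL, -- 'classification' or 'regression'
    algorithm TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS ml_runs (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES ml_experiments(id),
    started_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- the serialized linfa model bytes would live here
CREATE TABLE IF NOT EXISTS ml_models (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES ml_runs(id),
    model BLOB NOT NULL
);
"#;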
ml_train
ml_train can be an eponymous virtual table that users can INSERT into, to create new experiments/models.
insert into ml_train
values (
  'Iris prediction',     -- name of experiment
  'classification',      -- prediction type
  'logistic_regression', -- algorithm
  'ml_datasets.iris',    -- source data: a table/view name, with optional schema
  'target'               -- target column
);
ml_predict
A table function that takes in a JSON array of values and predicts the target column:
select
  iris.*,
  prediction.prediction
from ml_datasets.iris as iris
join ml_predict(
  'Iris prediction',
  json_array(
    iris.sepal_length,
    iris.sepal_width,
    iris.petal_length,
    iris.petal_width
  )
) as prediction;
ml_load_dataset
If we wanted to just inline those default datasets into the extension, we could have eponymous virtual tables for each one like so:
select * from ml_datasets_iris;
select * from ml_datasets_breast_cancer;
select * from ml_datasets_diabetes;
Or if we don't want to bloat the size of the extension, we could offer a separate pre-built database file that people can attach themselves:
-- here ml_datasets_path() can return the path of the pre-built SQLite database, or create one if it doesn't exist
attach database ml_datasets_path() as ml_datasets;
select * from ml_datasets.iris;
select * from ml_datasets.breast_cancer;
select * from ml_datasets.diabetes;
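On the extension side, ml_datasets_path() could be a plain scalar function. Here's a rough sketch following the scalar-function pattern from the sqlite-loadable-rs README; the entrypoint name, the path choice, and the create-if-missing behavior are all placeholders:

use sqlite_loadable::prelude::*;
use sqlite_loadable::{api, define_scalar_function, Result};

// Return the path to the pre-built datasets database. Deciding where
// that file should live (and creating it if it's missing) is left as
// a placeholder here.
fn ml_datasets_path(context: *mut sqlite3_context, _values: &[*mut sqlite3_value]) -> Result<()> {
    let path = "/tmp/ml_datasets.db"; // placeholder location
    api::result_text(context, path)?;
    Ok(())
}

#[sqlite_entrypoint]
pub fn sqlite3_mldatasets_init(db: *mut sqlite3) -> Result<()> {
    define_scalar_function(db, "ml_datasets_path", 0, ml_datasets_path, FunctionFlags::UTF8)?;
    Ok(())
}

Again, some very loose thoughts and notes, so feel free to ask about anything!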