This package trains, evaluates, and visualizes supervised models that predict fly odor responses from proboscis traces and engineered features.
```
pip install -e .
```

The repository now ships with `scripts/tune/optuna_mlp_tuning.py`, a production-grade search script that co-optimises PCA dimensionality and the `SampleWeightedMLPClassifier`. The workflow enforces fly-level leakage guards via `GroupKFold`, up-weights high-intensity class-5 responders, and evaluates macro-F1 as the primary score.
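The leakage guard and scoring metric are worth seeing in miniature. Below is a sketch of fly-grouped cross-validation with macro-F1, using a stand-in `LogisticRegression` in place of the package's `SampleWeightedMLPClassifier`; array names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GroupKFold

def grouped_macro_f1(X, y, groups, sample_weight, n_splits=5):
    """Fly-level CV: all trials from one fly land in the same fold, so no leakage."""
    scores = []
    for tr, te in GroupKFold(n_splits=n_splits).split(X, y, groups):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X[tr], y[tr], sample_weight=sample_weight[tr])
        scores.append(f1_score(y[te], clf.predict(X[te]), average="macro"))
    return float(np.mean(scores))
```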
```
python scripts/tune/optuna_mlp_tuning.py \
  --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --n-trials 100 \
  --timeout 7200 \
  --output-dir outputs/optuna_results
```
- Provide `--labels-csv` when using the canonical wide export split across data and label tables. If labels are embedded in the features CSV, ensure a `reaction_strength` column is present and omit `--labels-csv`.
- Add `--features "AUC-During,global_max,..."` to constrain the search to a curated subset of engineered scalars. The tuner validates every requested column, removes duplicates, and records the final selection alongside the saved hyperparameters.
- Omit `--study-name` to let the script derive a study label from the selected model variant and engineered feature subset (for example, `mlp_tuning_fp_optimized_mlp_7f_d3e41a2c`). Provide an explicit name only when you intend to resume the exact same search space. The tuner persists the architecture and PCA component candidates as Optuna user attributes and halts early with a clear error message if you try to resume a study whose search space no longer matches the requested configuration, without triggering the deprecated `system_attr` warnings emitted by older revisions.
- Set `--model` to either `mlp` (default) or `fp_optimized_mlp`. The latter multiplies responder samples by an additional `{0: 1.0, 1: 2.0}` class weight on top of the intensity-derived sample weights so the tuned network mirrors the false-positive-minimising production variant. The deterministic baseline run honours the same variant, ensuring the headline comparison reflects the precise architecture and class weighting you intend to deploy.
- When `--model fp_optimized_mlp` is selected, Optuna samples only two-layer architectures. Baseline reporting and saved JSON payloads therefore always include exactly two hidden widths, and the CLI rejects single-layer configurations for this variant to keep deployment aligned with the production topology.
- Whenever a feature subset leaves fewer usable columns than the requested PCA dimensionality, the tuner, deterministic baseline, best-parameter replay, and final retraining all clamp `n_components` to the available feature count while enforcing the minimum viable dimension. This guard stops PCA from erroring out on engineered-only configurations with very few predictors.
- Each trial prunes early when its interim macro-F1 under-performs the running Optuna median after two folds, keeping runtime within the two-hour budget.
- Sample weights default to 1.0 for non-responders and lower-intensity responses, with class-5 trials receiving a 5× multiplier during optimisation (see the sketch after this list).
- The search space samples PCA dimensionalities from the admissible range of 3 to 64 (automatically truncated when a feature subset exposes fewer columns). Mini-batch sizes and hidden-layer widths are restricted to powers of two (8, 16, 32, 64, 128, 256, 512, 1024) to keep the tuned architecture aligned with production training runs. Saved JSON payloads therefore contain only those discrete values, and the CLI validates any external payload against the same lists before training.
- Provide `--best-params-json /path/to/best_params.json` to skip optimisation and retrain/evaluate using a previously exported Optuna configuration. The JSON may contain either the raw Optuna trial parameters (`architecture`, `h1`, `layer_config`, etc.) or the normalised output written by this script.
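The intensity weighting described above reduces to a simple mapping. A sketch, assuming `scores` holds the 0-5 intensity labels (the script's exact weighting scheme may differ in detail):

```python
import numpy as np

def intensity_sample_weights(scores: np.ndarray) -> np.ndarray:
    """Default weight 1.0; class-5 responders get a 5x multiplier."""
    weights = np.ones(len(scores), dtype=float)
    weights[scores == 5] = 5.0
    return weights
```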
The command writes all deliverables into `--output-dir` (defaults to `outputs/optuna_results/`):

| File | Description |
|---|---|
| `optuna_study.db` | SQLite storage for resuming or auditing the study. |
| `optuna_trials.csv` | Tabular export of every trial with metrics and timings. |
| `optuna_history.html` | Interactive optimisation trace (Plotly). |
| `optuna_importances.html` | Hyperparameter importance plot emphasising PCA components. |
| `best_params.json` | Best configuration including architecture, optimiser settings, and the selected engineered features. |
| `best_<model>_model.joblib` | Retrained preprocessing + MLP pipeline for deployment. |
| `TUNING_REPORT.md` | Auto-generated summary comparing the tuned model with the baseline. |
The retrained pipeline includes the median imputer, scaler, PCA transform, and the optimised neural network, allowing drop-in inference via `joblib.load(output_dir / "best_<model>_model.joblib")`. Because `.joblib` files are ignored by Git, they remain local run artefacts. When `fp_optimized_mlp` is selected, the saved pipeline expects the responder-focused class weights to be applied at fit time and therefore reflects the precision-oriented behaviour of the production model.
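A minimal inference sketch; the file name depends on the `--model` you tuned, and the DataFrame must match the training schema:

```python
from pathlib import Path

import joblib
import pandas as pd

pipeline = joblib.load(Path("outputs/optuna_results") / "best_mlp_model.joblib")
df = pd.read_csv("data/all_envelope_rows_wide.csv")  # same columns as at training time
print(pipeline.predict(df))
```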
After a successful Optuna run, you can rebuild and retrain the best pipeline without repeating the search:
```
python scripts/tune/optuna_mlp_tuning.py \
  --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --best-params-json outputs/optuna_results/best_params.json \
  --output-dir outputs/optuna_results
```

The script normalises the JSON payload, resolves the hidden-layer topology, and
trains the `SampleWeightedMLPClassifier` end to end with the saved hyperparameters. All downstream artefacts (model, report, and parameter snapshot) are refreshed to reflect the supplied configuration. When a feature subset was enforced, the `selected_features` array persisted in `best_params.json` is honoured so the re-evaluation mirrors the original search space. Any PCA dimensionality in the payload that exceeds the reduced feature set is automatically clamped to keep the reconstruction numerically valid.
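The clamp itself is a one-liner; a sketch with an assumed minimum of 2 components (the tuner's actual floor may differ):

```python
def clamp_n_components(requested: int, n_features: int, minimum: int = 2) -> int:
    """Keep the PCA dimensionality valid for the available feature count."""
    return max(minimum, min(requested, n_features))
```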
Once `best_params.json` is available, the primary CLI can consume it directly so you can train every supported model (MLP included) without rerunning Optuna:
```
flybehavior-response train \
  --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --model all \
  --best-params-json outputs/optuna_results/best_params.json \
  --artifacts-dir artifacts
```

Providing `--best-params-json` automatically enables PCA on the raw traces,
overrides `--n-pcs` with the tuned `n_components`, and instantiates the `SampleWeightedMLPClassifier` with the Optuna-selected architecture, learning rate, regularisation, and batch size. The generated `config.json` embedded in each run directory records the consolidated Optuna payload so downstream evaluation jobs can trace exactly which hyperparameters were used. When the payload enumerates a `selected_features` subset, the training command enforces that exact list, even if a different `--features` string is supplied, so retraining stays faithful to the search space. Missing columns trigger a hard failure with the full list of available engineered features to help you fix typos before any models are saved.
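Before handing a payload to the CLI you can inspect the fields it will honour; a small sketch (only `n_components` and `selected_features` are named above, other keys vary by payload):

```python
import json
from pathlib import Path

payload = json.loads(Path("outputs/optuna_results/best_params.json").read_text())
print("n_components:", payload.get("n_components"))
print("selected_features:", payload.get("selected_features"))
```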
- **Pin it as a dependency.** In the consuming project (e.g. Ramanlab-Auto-Data-Analysis), add the git URL to your dependency file so the environment always installs the latest revision of this project:

  ```
  # requirements.txt inside Ramanlab-Auto-Data-Analysis
  flybehavior-response @ git+https://github.com/colehanan1/FlyBehaviorScoring.git
  ```

  Pip normalizes hyphens and underscores, so `flybehavior-response` is the canonical project name exported by `pyproject.toml`. Older guidance that used `flypca` or `flybehavior_response` will fail with a metadata mismatch error because the installer pulls a distribution named differently from the requested requirement. Update the dependency string as shown above. With `pip>=22`, this syntax works for `requirements.txt`, `pyproject.toml` (PEP 621 `dependencies`), and `setup.cfg`. To confirm the dependency resolves correctly, install from git in a clean environment and inspect the resulting metadata:

  ```
  python -m pip install "git+https://github.com/colehanan1/FlyBehaviorScoring.git#egg=flybehavior-response"
  python -m pip show flybehavior-response
  ```

  The `#egg=` fragment is optional for modern pip but keeps older tooling happy when parsing the distribution name from the URL.

- **Install together with the automation repo.** Once the dependency is listed, a regular `pip install -r requirements.txt` (or `pip install -e .` if the other repo itself is editable) pulls in this package exactly once; no manual reinstall inside each checkout is required.

- **Call the CLI from jobs or notebooks.** After installation, the `flybehavior-response` entry point is on `PATH`. Automation workflows can invoke it via shell scripts or Python:

  ```python
  import subprocess

  subprocess.run(
      [
          "flybehavior-response",
          "predict",
          "--data-csv", "/path/to/wide.csv",
          "--model-path", "/path/to/model_mlp.joblib",
          "--output-csv", "outputs/artifacts/predictions.csv",
      ],
      check=True,
  )
  ```
- **Stream raw geometry safely.** Large frame-level exports no longer require loading the entire CSV into memory. Use the updated `prepare` subcommand to stream in chunks, validate block continuity, and optionally persist a compressed parquet cache for subsequent runs:

  ```
  flybehavior-response \
    prepare \
    --data-csv data/geometry_frames.csv \
    --labels-csv data/labels.csv \
    --geom-columns "dataset,fly,fly_number,trial_type,trial_label,frame_idx,x,y" \
    --geom-chunk-size 20000 \
    --cache-parquet outputs/artifacts/geom_cache.parquet \
    --aggregate-geometry \
    --aggregate-stats mean,max \
    --aggregate-format parquet \
    --artifacts-dir artifacts
  ```

  The stream honours the original column order, emits per-chunk diagnostics, and enforces uniqueness of `dataset`/`fly`/`fly_number`/`trial_type`/`trial_label` keys across the optional labels CSV. Aggregation is optional; when enabled it produces a per-trial summary file alongside the cache. Choose between a compressed parquet (default, requires `pyarrow` or `fastparquet`) and a portable CSV by passing `--aggregate-format parquet` or `--aggregate-format csv` respectively. The same pipeline is available programmatically via `flybehavior_response.io.load_geom_frames` and `flybehavior_response.io.aggregate_trials` for notebook workflows.

  Geometry exports that expose a different frame counter (for example a column named `frame` instead of the default `frame_idx`) are resolved automatically. The loader detects the alternate header, validates contiguity against that column, and keeps the block integrity checks active without any additional flags.

  Only trials present in the labels CSV are streamed. Rows without labels are dropped up front so aggregation and caching operate on fully annotated data. To debug unexpected omissions, rerun `prepare` with `--keep-missing-labels` to surface a validation error listing the offending keys.

- **Train directly from geometry frames.** Provide `--geometry-frames` to the `train`, `eval`, and `predict` subcommands to stream per-frame CSVs or parquet exports on the fly. Combine `--geom-granularity` with the default `trial` mode to materialise aggregated per-trial features, or switch to `frame` when frame-level rows are preferred. Aggregation honours the same `--geom-stats` options exposed by `prepare`, while `--geom-normalize` applies either `zscore` or `minmax` scaling before safely downcasting the values to `float32` so they align with the existing feature engineering pipeline. The `train` command writes a `split_manifest.csv` alongside the trained models describing the fly-level `GroupShuffleSplit` assignment; pass `--group-override none` to disable leakage guards when cross-fly isolation is not required. When the geometry stream includes raw coordinate columns named `eye_x`, `eye_y`, `prob_x`, and `prob_y`, the loader assembles these into `eye_x_f*`/`eye_y_f*`/`prob_x_f*`/`prob_y_f*` trace series for every trial. These traces mirror the format produced by the legacy `prepare_raw` workflow, unlock PCA on raw motion signals without additional preprocessing, and remain aligned with the per-trial aggregation and leakage guards described above.
If you prefer to train on a curated feature panel instead of the entire aggregate table, pass `--geom-feature-columns` to `train`, `eval`, or `predict`. Supply a comma-separated list directly or reference a newline-delimited file by prefixing the path with `@`:
```
flybehavior-response train \
  --geometry-frames /path/to/geom_frames.csv \
  --geometry-trials /path/to/geom_trial_summary.csv \
  --labels-csv /path/to/labels.csv \
  --geom-feature-columns @experiments/feature_subset.txt \
  --model mlp
```

The loader validates the selection (for example `r_before_mean` or `metric_mean`) and raises a schema error when any requested column is absent so mistakes surface immediately. The resolved subset is also written to `config.json` under `geometry_feature_columns` to keep the training provenance auditable.
When a laboratory already maintains per-fly or per-trial statistics in a CSV, you can hand those features to the streaming loader with the new `--geometry-trials` flag. The file must contain one row per trial with the canonical identifier columns (`dataset`, `fly`, `fly_number`, `trial_type`, `trial_label`) plus the following engineered metrics so downstream models receive a consistent schema:

`W_est_fly`, `H_est_fly`, `diag_est_fly`, `r_min_fly`, `r_max_fly`, `r_p01_fly`, `r_p99_fly`, `r_mean_fly`, `r_std_fly`, `n_frames`, `r_mean_trial`, `r_std_trial`, `r_max_trial`, `r95_trial`, `dx_mean_abs`, `dy_mean_abs`, `r_pct_robust_fly_max`, `r_pct_robust_fly_mean`, `r_before_mean`, `r_before_std`, `r_during_mean`, `r_during_std`, `r_during_minus_before_mean`, `cos_theta_during_mean`, `sin_theta_during_mean`, `direction_consistency`, `frac_high_ext_during`, `rise_speed`.

During `load_geometry_dataset` the summaries are merged with the streamed aggregates before normalisation and downcasting, so every new numeric column participates in the same scaling pipeline. If a column is present in both the streamed aggregates and the external summary, the loader keeps the streamed value and warns about mismatches so accidental drift is visible. `--geometry-trials` is only valid when `--geometry-frames` is provided at trial granularity.
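Conceptually the merge is a key-aligned join that prefers streamed values and flags disagreements; a pandas sketch under that assumption (the function name is illustrative, not the package API):

```python
import pandas as pd

KEYS = ["dataset", "fly", "fly_number", "trial_type", "trial_label"]

def merge_trial_summaries(streamed: pd.DataFrame, external: pd.DataFrame) -> pd.DataFrame:
    merged = streamed.merge(external, on=KEYS, how="left", suffixes=("", "_ext"))
    for col in [c for c in merged.columns if c.endswith("_ext")]:
        base = col[:-4]
        # Keep the streamed value; warn when the external summary disagrees.
        mismatch = merged[base].notna() & merged[col].notna() & (merged[base] != merged[col])
        if mismatch.any():
            print(f"warning: {int(mismatch.sum())} mismatching rows for {base}")
        merged[base] = merged[base].fillna(merged[col])
        merged = merged.drop(columns=[col])
    return merged
```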
Example training command:
```
flybehavior-response train \
  --geometry-frames /path/to/geom_frames.csv \
  --geometry-trials /path/to/geom_trial_summary.csv \
  --labels-csv /path/to/labels.csv \
  --model logreg
```

The run configuration now records the trial-summary path alongside the geometry frames so provenance remains auditable.
Provide a labels CSV containing the canonical `user_score_odor` column; values greater than zero are automatically coerced to a binary responder target during train/eval/predict, so no manual preprocessing step is required. Rows missing labels are dropped from the geometry stream by default to keep the aggregation consistent; rerun with `--keep-missing-labels` if you want to audit which trials were skipped.
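The coercion is equivalent to a one-line pandas expression, assuming `labels` is the labels DataFrame:

```python
labels["responder"] = (labels["user_score_odor"] > 0).astype(int)
```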
The geometry enrichment step now emits additional, behaviourally grounded columns so downstream analyses no longer have to reconstruct stimulus epochs or basic response summaries manually. Each frame row in the enriched CSV includes:
| Column | Definition | Why it matters |
|---|---|---|
| `is_before` | Binary mask marking frames in the baseline (pre-odor) window. | Lets downstream code isolate baseline behaviour without re-deriving stimulus timing, which keeps reproducibility intact across labs. |
| `is_during` | Binary mask marking frames during odor stimulation. | Ensures frame-level filters target the causal window that determines the responder label. |
| `is_after` | Binary mask marking frames in the post-odor window. | Allows post-hoc inspection without contaminating training features that should focus on the odor epoch. |
| `r_before_mean` | Mean proboscis extension (percentage of the fly's robust range) computed across baseline frames. | Captures resting proboscis position; elevated values indicate partial extension even before odor onset. |
| `r_before_std` | Standard deviation of proboscis extension during baseline. | Measures baseline "fidgeting." High variance reveals spontaneous motion that can masquerade as responses. |
| `r_during_mean` | Mean extension percentage while odor is on. | Quantifies the sustained response amplitude during stimulation. |
| `r_during_std` | Standard deviation of extension during odor. | Summarises modulation depth; large swings reflect oscillatory probing, while small values indicate a rigid hold. |
| `r_during_minus_before_mean` | `r_during_mean - r_before_mean`. | Expresses the odor-triggered change in the fly's own units. Positive values are odor-locked proboscis extensions; zero or negative values show absence or suppression. |
| `cos_theta_during_mean` | Mean cosine of the proboscis direction vector (normalised `dx`/`dy`) during odor. | Encodes whether the proboscis points forward, downward, or laterally; key for separating feeding-like probes from grooming. |
| `sin_theta_during_mean` | Mean sine of the proboscis direction vector during odor. | Complements `cos_theta_during_mean` so the full direction is available in head-centred coordinates. |
| `direction_consistency` | Length of the mean direction vector, computed as `sqrt(cos_theta_during_mean**2 + sin_theta_during_mean**2)`. | Scores directional stability: values near 1.0 mean deliberate probes, while lower values flag chaotic motion unrelated to odor. |
| `frac_high_ext_during` | Fraction of odor-period frames where `r_pct_robust_fly` exceeds 75% of that fly's robust range. Range: [0, 1]. | Captures how long the proboscis stayed highly extended; separates quick flicks from sustained acceptance-like behaviour. |
| `rise_speed` | Initial slope of extension at odor onset: (mean extension in the first second of odor − `r_before_mean`) / 1 s, expressed as percentage per second. | Measures how quickly the response ramps. Fast rises are characteristic of true stimulus-driven reactions. |
The geometry loader populates these columns automatically whenever the labels table supplies `odor_on_idx` and `odor_off_idx` values alongside the raw proboscis coordinates. The enrichment runs during `load_geom_frames` and all CLI entry points that consume geometry inputs, so downstream scripts and notebooks receive consistent epoch flags and responder summaries without additional preprocessing.

If the odor timing columns are absent, the loader still succeeds and emits the responder summary columns, but their values fall back to `NaN` and the `rise_speed` metric remains undefined for those trials. Supplying the odor indices is therefore strongly recommended whenever the experiment design makes them available.
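To make the definitions concrete, here is a per-trial sketch of a few of the summaries, assuming per-frame columns `r_pct_robust_fly`, `dx`, and `dy` plus odor on/off frame indices; the helper name and the 40 fps default are illustrative, not the loader's internals:

```python
import numpy as np
import pandas as pd

def summarise_trial(frames: pd.DataFrame, on: int, off: int, fps: int = 40) -> dict:
    before = frames.iloc[:on]
    during = frames.iloc[on:off]
    r_before = before["r_pct_robust_fly"]
    r_during = during["r_pct_robust_fly"]
    theta = np.arctan2(during["dy"], during["dx"])  # direction in head-centred coordinates
    cos_m, sin_m = np.cos(theta).mean(), np.sin(theta).mean()
    first_second = r_during.iloc[:fps]  # frames in the first second of odor
    return {
        "r_before_mean": r_before.mean(),
        "r_during_minus_before_mean": r_during.mean() - r_before.mean(),
        "direction_consistency": float(np.hypot(cos_m, sin_m)),
        "frac_high_ext_during": float((r_during > 75.0).mean()),
        "rise_speed": float(first_second.mean() - r_before.mean()),  # % per second over a 1 s window
    }
```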
These columns make the per-frame CSV directly usable for training binary classifiers that decide whether a fly responded to the odor in a given trial. Follow this procedure when preparing data for a multilayer perceptron (MLP) or another lightweight model (a sketch of the final step follows the list):

- Obtain the human-annotated (or rule-derived) trial labels where `Responder = 1` denotes a clear odor response and `Responder = 0` denotes no response.
- For each trial, collapse the per-frame enrichment into a single feature vector by extracting exactly these ten scalar summaries: `[r_before_mean, r_before_std, r_during_mean, r_during_std, r_during_minus_before_mean, cos_theta_during_mean, sin_theta_during_mean, direction_consistency, frac_high_ext_during, rise_speed]`.
- Assemble a training table with one row per trial and join the responder labels as the target column.
- Train the MLP (or another binary classifier) on this 10-dimensional input to predict the responder label.
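A compact scikit-learn sketch of the last two steps, assuming `trials` already holds one row per trial with the ten summary columns and a binary `responder` label:

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = [
    "r_before_mean", "r_before_std", "r_during_mean", "r_during_std",
    "r_during_minus_before_mean", "cos_theta_during_mean", "sin_theta_during_mean",
    "direction_consistency", "frac_high_ext_during", "rise_speed",
]

X, y = trials[FEATURES], trials["responder"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
clf = make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```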
This feature set is intentionally compact, biologically interpretable, and normalised per fly (all extension metrics operate on `r_pct_robust_fly`, which uses the fly's own `r_p01_fly`/`r_p99_fly` range). It avoids dependence on camera geometry or trial identifiers, and it limits the inputs to pre-odor and during-odor information so the model answers the causal question: did the odor move the proboscis away from baseline?

Avoid feeding raw per-frame series, file or fly identifiers, camera scaling fields (`W_est_fly`, `H_est_fly`), or any post-odor aggregates into the first-round classifier. Those inputs inject nuisance variation, leak non-causal structure, and encourage overfitting on small datasets.

Remember that the enriched CSV is an intermediate artefact designed for reuse across pipelines. Build the actual training matrix by selecting one row per trial and projecting down to the summary columns listed above before invoking `flybehavior-response train` or a custom scikit-learn script.
- **Regenerate the geometry cache without touching disk** by using `--dry-run` together with `--cache-parquet`; the CLI will validate inputs and report chunk-level statistics without writing artifacts. If the optional parquet engines are unavailable, switch to `--aggregate-format csv` for downstream smoke tests.

- **Validate the new pipeline locally.** Run the focused pytest targets to confirm schema handling, cache behaviour, and aggregation parity:

  ```
  PYTHONPATH=src pytest src/flybehavior_response/tests/test_response_io.py -k geom
  ```
- **Import the building blocks directly.** When you need finer control than the CLI offers, import the core helpers:

  ```python
  from flybehavior_response.evaluate import load_pipeline

  pipeline = load_pipeline("/path/to/model_mlp.joblib")
  # df is a pandas DataFrame shaped like the merged training data
  predictions = pipeline.predict(df)
  ```

  The `flybehavior_response.io.load_and_merge` helper mirrors the CLI's CSV merging logic so scheduled jobs can stay fully programmatic.
- **Match the NumPy major version with saved artifacts.** Models trained with NumPy 1.x store their random state differently from NumPy 2.x. Loading those joblib files inside an environment that already upgraded to NumPy 2.x raises:

  ```
  ValueError: state is not a legacy MT19937 state
  ```

  Install `numpy<2.0` (already enforced by this package's dependency pins) or rebuild the model artifact under the newer stack before invoking `flybehavior-response predict` inside automation repos. If you previously added a `sitecustomize.py` shim to coerce the MT19937 payload, remove it; the shim still runs even after NumPy is downgraded and corrupts the state with the following error:

  ```
  TypeError: unhashable type: 'dict'
  ```

  Delete or update the shim so it gracefully handles dictionary payloads. If the shim keeps calling into NumPy but returns a class object instead of the literal string `"MT19937"`, the loader fails with:

  ```
  ValueError: <class 'numpy.random._mt19937.MT19937'> is not a known BitGenerator module.
  ```

  Update the shim so it returns `"MT19937"` when NumPy requests a bit generator by name, or guard the entire file behind a `numpy>=2` check. With NumPy 1.x the extra hook is unnecessary, and the loader will succeed without further tweaks. If other tools in the same environment still require the compatibility layer, replace the file with a guarded variant that short-circuits on NumPy < 2.0 and normalises dictionary payloads safely:

  ```python
  """Runtime compatibility shims for external tools invoked by the pipeline."""

  from __future__ import annotations

  import importlib
  from typing import Any

  import numpy as np


  def _normalise_mt19937_state(state: Any, target_name: str) -> Any:
      try:
          np_major = int(np.__version__.split(".")[0])
      except Exception:
          np_major = 0
      if np_major < 2:
          return state
      if isinstance(state, dict):
          payload = state.get("state") or state
          if isinstance(payload, dict) and {"key", "pos"}.issubset(payload):
              return {
                  "bit_generator": target_name,
                  "state": {
                      "key": np.asarray(payload["key"], dtype=np.uint32),
                      "pos": int(payload["pos"]),
                  },
              }
      return state


  def _install_numpy_joblib_shims() -> None:
      try:
          np_pickle = importlib.import_module("numpy.random._pickle")
      except ModuleNotFoundError:
          return
      original_ctor = getattr(np_pickle, "__bit_generator_ctor", None)
      if original_ctor is None:
          return

      class _CompatMT19937(np.random.MT19937):
          def __setstate__(self, state: Any) -> None:  # type: ignore[override]
              super().__setstate__(_normalise_mt19937_state(state, type(self).__name__))

      mapping = getattr(np_pickle, "BitGenerators", None)
      if isinstance(mapping, dict):
          mapping["MT19937"] = _CompatMT19937

      def _compat_ctor(bit_generator: Any = "MT19937") -> Any:
          return original_ctor("MT19937")

      np_pickle.__bit_generator_ctor = _compat_ctor


  _install_numpy_joblib_shims()
  ```
This template preserves the original behaviour when NumPy 2.x is present, yet becomes a no-op under NumPy 1.x so your pipeline no longer crashes when loading FlyBehaviorScoring artifacts.
Follow these steps when you need a distributable artifact instead of an editable install or git reference:
- Create a clean environment and install the build backend once:

  ```
  python -m pip install --upgrade pip build twine
  ```

- Produce both wheel and source distributions:

  ```
  python -m build
  ```

  The artifacts land under `dist/` (for example, `dist/flybehavior-response-0.1.0-py3-none-any.whl`).

- Upload to an index (test or production) with Twine:

  ```
  twine upload dist/*
  ```

  Replace the repository URL or credentials as needed (`--repository testpypi`).
Once published, downstream projects can depend on the released version instead of a git SHA:

```
flybehavior-response==0.1.0
```

If you only need automation machines to consume the latest commit, prefer the git dependency shown earlier; publishing is optional.
You do not have to cut a wheel to exercise the package from a private repo. Git-based installs work as long as the repository exposes a valid `pyproject.toml` (which this project does). Pick the option that matches your workflow:

- Pin the main branch head for fast iteration:

  ```
  flybehavior-response @ git+https://github.com/colehanan1/FlyBehaviorScoring.git
  ```

- Lock to a tag or commit for reproducible automation:

  ```
  flybehavior-response @ git+https://github.com/colehanan1/FlyBehaviorScoring.git@v0.1.0
  # or
  flybehavior-response @ git+https://github.com/colehanan1/FlyBehaviorScoring.git@<commit-sha>
  ```

- Reference a subdirectory if you reorganize the repo later (pip needs the leading `src/` layout path):

  ```
  flybehavior-response @ git+https://github.com/colehanan1/FlyBehaviorScoring.git#subdirectory=.
  ```

  The `src/` layout is already wired into `pyproject.toml`, so no extra flags are necessary today. Keep the `#subdirectory` fragment in mind if you move the project under a monorepo path.
Regardless of which selector you use, `pip show flybehavior-response` should list the install location under the environment's site-packages directory. If it does not, check that your requirements file matches the casing and punctuation above and that you do not have an older `flypca` editable install overshadowing it on `sys.path`.
After installation, the `flybehavior-response` command becomes available. Common arguments:

- `--data-csv`: Wide proboscis trace CSV.
- `--labels-csv`: Labels CSV with `user_score_odor` scores (0 = no response, 1-5 = increasing response strength).
- `--features`: Comma-separated engineered feature list (default: `AUC-During,TimeToPeak-During,Peak-Value`). Every entry must match a column in the merged dataset. The trainer now aborts when a requested feature is missing instead of silently reverting to the full feature set, keeping curated subsets intact.
- `--raw-series`: Prioritize the default raw coordinate prefixes (eye/proboscis channels).
- `--no-raw`: Drop all trace columns so only engineered features feed the models.
- `--include-auc-before`: Adds `AUC-Before` to the feature set.
- `--use-raw-pca` / `--no-use-raw-pca`: Toggle raw trace PCA (default enabled).
- `--n-pcs`: Number of PCA components (default 5).
- `--model`: `lda`, `logreg`, `mlp`, `fp_optimized_mlp`, `both`, or `all` (default `all`).
- `--logreg-solver`: Logistic regression solver (`lbfgs`, `liblinear`, `saga`; default `lbfgs`).
- `--logreg-max-iter`: Iteration cap for logistic regression (default `1000`; increase if convergence warnings appear).
- `--cv`: Stratified folds for cross-validation (default 0 for none).
- `--artifacts-dir`: Root directory for outputs (default `./artifacts`).
- `--plots-dir`: Plot directory (default `./outputs/artifacts/plots`).
- `--seed`: Random seed (default 42).
- `--dry-run`: Validate pipeline without saving artifacts.
- `--verbose`: Enable DEBUG logging.
- `--fly`, `--fly-number`, `--trial-label`/`--testing-trial` (predict only): Filter predictions to a single trial.
| Command | Purpose |
|---|---|
| `prepare` | Validate inputs, report class balance and intensity distribution, write merged parquet. |
| `train` | Fit preprocessing + models, compute metrics, save joblib/config/metrics. |
| `eval` | Reload saved models and recompute metrics on merged data. |
| `viz` | Generate PC scatter, LDA score histogram, and ROC curve (if available). |
| `predict` | Score a merged CSV with a saved model and write predictions. |
```
flybehavior-response prepare --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv

flybehavior-response train --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv --model all --n-pcs 2

flybehavior-response eval --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv

# explicitly evaluate a past run directory
flybehavior-response eval --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv --run-dir outputs/artifacts/2025-10-14T22-56-37Z

flybehavior-response viz --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv --plots-dir outputs/artifacts/plots

flybehavior-response predict --data-csv merged.csv --model-path outputs/artifacts/<run>/model_logreg.joblib \
  --output-csv outputs/artifacts/predictions.csv

# score a specific fly/trial tuple in the original envelope export
flybehavior-response predict --data-csv data/all_envelope_rows_wide.csv \
  --model-path outputs/artifacts/<run>/model_logreg.joblib --fly september_09_fly_3 --fly-number 3 --trial-label t2 \
  --output-csv outputs/artifacts/predictions_envelope_t2.csv
```

These arguments are available for `prepare`, `train`, `eval`, `viz`, and `predict`:
| Argument | Type | Default | Description |
|---|---|---|---|
| `--data-csv` | Path | - | Path to wide data CSV with engineered features and/or trace columns. |
| `--labels-csv` | Path | - | Path to labels CSV (required for train/eval, optional for prepare). |
| `--features` | String | `AUC-During,TimeToPeak-During,Peak-Value` | Comma-separated list of engineered features to use in training. See supported features list below. |
| `--series-prefixes` | String | `dir_val_` | Comma-separated list of column prefixes for time-series traces (e.g., `dir_val_,my_custom_`). Overrides defaults. |
| `--raw-series` | Flag | False | Use default raw coordinate prefixes (`eye_x_f`, `eye_y_f`, `prob_x_f`, `prob_y_f`). Useful when starting from prepare-raw outputs. |
| `--no-raw` | Flag | False | Drop all trace columns; use only engineered features for training. |
| `--include-auc-before` | Flag | False | Automatically add `AUC-Before` to the feature list in addition to `--features`. |
| `--use-raw-pca` | Flag | True | Enable PCA dimensionality reduction on trace columns (default: enabled). |
| `--no-use-raw-pca` | Flag | False | Explicitly disable PCA on trace columns. |
| `--n-pcs` | Int | 5 | Number of principal components to extract from trace data. Automatically clamped to available feature count. |
| `--model` | String | `all` | Model selection: `lda`, `logreg`, `mlp`, `fp_optimized_mlp`, `both`, or `all`. |
| `--cv` | Int | 0 | Number of stratified cross-validation folds (0 = no CV, only train/test split). |
| `--artifacts-dir` | Path | `./outputs/artifacts` | Directory where models, metrics, and config are saved. |
| `--plots-dir` | Path | `./outputs/artifacts/plots` | Directory where visualizations are saved. |
| `--run-dir` | Path | - | Specific run directory to use for eval, viz, and predict. If not provided, the newest directory is auto-selected. |
| `--seed` | Int | 42 | Random seed for reproducibility. |
| `--verbose` | Flag | False | Enable DEBUG-level logging. |
| `--dry-run` | Flag | False | Validate pipeline configuration without writing artifacts. |
Used with: `flybehavior-response prepare`

| Argument | Type | Default | Description |
|---|---|---|---|
| `--cache-parquet` | Path | - | Optional destination for parquet cache when streaming geometry data. |
| `--use-cache` | Flag | False | Load geometry frames from existing parquet cache (must be used with `--cache-parquet`). |
| `--geom-columns` | String | - | Comma-separated geometry columns to retain while streaming (e.g., `dataset,fly,fly_number,trial_type,trial_label,frame_idx,x,y`). Reduces memory usage. |
| `--geom-chunk-size` | Int | 50000 | Number of rows to load per chunk when streaming large geometry CSVs. Larger chunks are faster but use more RAM. |
| `--frame-column` | String | `frame_idx` | Name of the frame index column used to validate chunk contiguity when streaming. |
| `--aggregate-geometry` | Flag | False | Aggregate streamed geometry into per-trial summaries using specified statistics. |
| `--aggregate-stats` | String | `mean,min,max` | Comma-separated list of aggregation functions for numeric geometry columns. Options: `mean`, `min`, `max`, `std`, `median`, etc. |
| `--aggregate-format` | String | `parquet` | Output format for aggregated geometry: `parquet` or `csv`. |
| `--drop-missing-labels` | Flag | True | Drop rows without matching labels (default). |
| `--keep-missing-labels` | Flag | False | Retain rows without labels (for inspection; may cause merge errors). |
Used with: `flybehavior-response train`

Classification & Class Weighting:

| Argument | Type | Default | Description |
|---|---|---|---|
| `--classification-mode` | String | `binary` | Classification strategy: `binary` (0 vs 1-5), `multiclass` (preserve all classes 0-5), `threshold-1` (0-1 vs 2-5), or `threshold-2` (0-2 vs 3-5). See classification modes section below. |
| `--class-weights` | String | - | Custom class weights for MLP (e.g., `0:2.0,1:1.0`). Higher weight on class 0 reduces false positives. Default for `fp_optimized_mlp` is `0:1.0,1:2.0`. |
| `--logreg-class-weights` | String | - | Custom class weights for logistic regression. Format: `0:1.0,1:2.0` or `balanced` for auto-balancing. Higher weight on class 1 increases sensitivity to responders. Example: `0:1.0,1:3.0` gives responders 3× weight. |
| `--rf-class-weights` | String | - | Custom class weights for Random Forest. Format: `0:1.0,1:2.0` or `balanced` for auto-balancing. |

Logistic Regression Options:

| Argument | Type | Default | Description |
|---|---|---|---|
| `--logreg-solver` | String | `lbfgs` | Solver for logistic regression optimization: `lbfgs`, `liblinear`, or `saga`. Use `saga` for very large datasets. |
| `--logreg-max-iter` | Int | 1000 | Maximum iterations for convergence. Increase if warnings about non-convergence appear. |

Random Forest Options:

| Argument | Type | Default | Description |
|---|---|---|---|
| `--rf-n-estimators` | Int | 100 | Number of trees in the Random Forest. Larger values (e.g., 500) improve stability but increase training time. |
| `--rf-max-depth` | Int | None | Maximum depth of individual trees. None = no limit. Lower values reduce overfitting. |

Hyperparameter Tuning:

| Argument | Type | Default | Description |
|---|---|---|---|
| `--best-params-json` | Path | - | Path to Optuna-generated `best_params.json` from `scripts/tune/optuna_mlp_tuning.py`. Auto-enables PCA and applies tuned hyperparameters to MLP. |

Group-Aware Splitting (prevents data leakage):

| Argument | Type | Default | Description |
|---|---|---|---|
| `--group-column` | String | `fly` | Column name for group-aware splits (e.g., fly ID). Ensures samples from the same group stay together on one side of the train/test split, preventing leakage. |
| `--group-override` | String | - | Override group splitting: supply `none` to disable group-aware splits (treats each trial as independent). |
| `--test-size` | Float | 0.2 | Fraction of samples reserved for the test set (0.0–1.0). |

Geometry Input (for frame-level analysis):

| Argument | Type | Default | Description |
|---|---|---|---|
| `--geometry-frames` | Path | - | Path to per-frame geometry CSV/parquet for on-the-fly trial aggregation. |
| `--geometry-trials` | Path | - | Path to pre-computed per-trial geometry summary CSV to merge with data. |
| `--geom-cache-parquet` | Path | - | Optional parquet cache for streamed geometry frames. |
| `--geom-use-cache` | Flag | False | Load geometry from existing parquet cache. |
| `--geom-chunk-size` | Int | 50000 | Rows per chunk when streaming geometry. |
| `--geom-columns` | String | - | Comma-separated geometry columns to retain while streaming. |
| `--geom-feature-columns` | String | - | Comma-separated geometry columns to use as features. Prefix with `@` to load from file (e.g., `@geometry_features.txt`). |
| `--geom-granularity` | String | `trial` | `trial` (aggregated per-trial stats) or `frame` (frame-level rows). |
| `--geom-stats` | String | `mean,min,max` | Aggregation functions for trial granularity: `mean`, `min`, `max`, `std`, etc. |
| `--geom-normalize` | String | `none` | Normalization for geometry columns: `none`, `zscore`, or `minmax`. |
| `--no-geom-downcast` | Flag | False | Disable float32 downcasting for geometry columns (keep float64). |
| `--geom-drop-missing-labels` | Flag | True | Drop rows without labels (default). |
| `--geom-keep-missing-labels` | Flag | False | Retain rows without labels. |
Used with: `flybehavior-response eval`

Inherits all common arguments plus all geometry options from `train`. Additionally:

| Argument | Type | Default | Description |
|---|---|---|---|
| `--run-dir` | Path | - | Path to run directory containing saved models. If omitted, auto-selects the newest directory. |
Used with: `flybehavior-response viz`

Uses common arguments to regenerate visualizations (PCA scatter, LDA histograms, ROC curves) for an existing run.
Used with: `flybehavior-response predict`

| Argument | Type | Default | Description |
|---|---|---|---|
| `--model-path` | Path | - | Required. Path to a trained model (e.g., `model_logreg.joblib`, `model_mlp.joblib`). |
| `--output-csv` | Path | `./outputs/artifacts/predictions.csv` | Destination CSV for scored predictions. |
| `--threshold` | Float | 0.5 | Decision threshold for binary classification (0.0–1.0). Higher values reduce false positives (e.g., 0.65). |
| `--fly` | String | - | Filter predictions to a specific fly identifier. |
| `--fly-number` | Int | - | Filter predictions to a specific numeric fly identifier. |
| `--trial-label` | String | - | Filter predictions to a specific trial label. |
| `--testing-trial` | String | - | Legacy alias for `--trial-label`. |
Used with: `flybehavior-response prepare-raw`

Converts per-trial raw coordinate exports into modeling-ready CSVs.

| Argument | Type | Default | Description |
|---|---|---|---|
| `--data-csv` / positional | Path | - | Path to per-trial raw coordinate CSV. Can be supplied as a positional argument or via `--data-csv`. |
| `--data-npy` | Path | - | Path to per-trial coordinate matrix as `.npy` (trials × frames × channels). Requires `--matrix-meta`. |
| `--matrix-meta` | Path | - | JSON file describing matrix layout and per-trial metadata (required with `--data-npy`). |
| `--labels-csv` | Path | - | Required. Path to labels CSV. |
| `--out` | Path | `data/all_eye_prob_coords_prepared.csv` | Destination CSV for prepared coordinates. |
| `--fps` | Int | 40 | Frame rate (frames per second) for temporal calculations. |
| `--odor-on-idx` | Int | 1230 | Frame index where odor stimulus begins. |
| `--odor-off-idx` | Int | 2430 | Frame index where odor stimulus ends. |
| `--truncate-before` | Int | 0 | Frames to keep before odor onset (0 = keep all). |
| `--truncate-after` | Int | 0 | Frames to keep after odor offset (0 = keep all). |
| `--series-prefixes` | String | `eye_x_f,eye_y_f,prob_x_f,prob_y_f` | Comma-separated time-series column prefixes to extract. |
| `--compute-dir-val` | Flag | False | Also compute `dir_val` distances between proboscis and eye coordinates. |
All features must exist as columns in your input CSV. The following are recognized by the CLI:

Temporal & Amplitude Features:

- `AUC-Before`: Area under curve before odor stimulus
- `AUC-During`: Area under curve during odor stimulus
- `AUC-After`: Area under curve after odor stimulus
- `AUC-During-Before-Ratio`: Ratio of AUC-During to AUC-Before ⚠️ unstable, produces warnings
- `AUC-After-Before-Ratio`: Ratio of AUC-After to AUC-Before ⚠️ unstable, produces warnings
- `TimeToPeak-During`: Frames to reach maximum response during stimulus
- `Peak-Value`: Maximum response value observed

Extrema Features:

- `global_min`: Minimum value across entire trial
- `global_max`: Maximum value across entire trial
- `trimmed_global_min`: Minimum after trimming outliers
- `trimmed_global_max`: Maximum after trimming outliers
- `local_min`: Local minimum value
- `local_max`: Local maximum value
- `local_min_before`: Local minimum in pre-stimulus epoch
- `local_max_before`: Local maximum in pre-stimulus epoch
- `local_min_during`: Local minimum during stimulus
- `local_max_during`: Local maximum during stimulus
- `local_max_over_global_min`: Ratio of local max to global min
- `local_max_during_over_global_min`: Ratio of during-local-max to global min
- `local_max_during_odor`: Local max during odor period
- `local_max_during_odor_over_global_min`: Ratio of odor-local-max to global min

Special Flags:

- `non_reactive_flag`: 1 for non-responders, 0 otherwise. ⚠️ Leaks target signal; do not use for training.
The `--classification-mode` argument controls how multi-class labels (0–5) are interpreted during training:

| Mode | Mapping | Use Case |
|---|---|---|
| `binary` (default) | Class 0 → Non-responder vs. Classes 1–5 → Responder | Standard binary response classification |
| `multiclass` | Classes 0–5 → Preserve all 6 classes | Distinguish between response intensities (fine-grained analysis) |
| `threshold-1` | Classes 0–1 → Non-responder vs. Classes 2–5 → Strong responder | Separate strong from weak/non-responders |
| `threshold-2` | Classes 0–2 → Non-responder vs. Classes 3–5 → Strong responder | Intermediate intensity threshold |
Example: Emphasizing strong responders
```
flybehavior-response train \
  --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --classification-mode threshold-1 \
  --model all \
  --features "AUC-During,Peak-Value,global_max" \
  --n-pcs 5
```

This trains classifiers to distinguish strong responders (score ≥ 2) from weak/non-responders (score ≤ 1), which can improve early-stage behavioral phenotyping.
The package supports any time-series traces as features through the `--series-prefixes` mechanism. RMS (root mean square) values at each time point can be used as follows:

If you have pre-computed RMS columns (e.g., `rms_0`, `rms_1`, ..., `rms_3600`):

```
flybehavior-response train \
  --data-csv data/wide_with_rms.csv \
  --labels-csv data/labels.csv \
  --features "AUC-During,Peak-Value" \
  --series-prefixes "rms_" \
  --n-pcs 5 \
  --model all
```

If using both engineered features AND RMS traces together:
```
flybehavior-response train \
  --data-csv data/wide_with_rms_and_features.csv \
  --labels-csv data/labels.csv \
  --features "AUC-During,Peak-Value,global_max,local_min,local_max" \
  --series-prefixes "rms_" \
  --n-pcs 5 \
  --use-raw-pca \
  --model all
```

The pipeline will (see the sketch after this list):

- Extract all `rms_*` columns (time-series)
- Apply median imputation and standardization
- Reduce dimensionality to 5 principal components via PCA
- Scale the engineered features independently
- Train models on the combined feature set
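A scikit-learn sketch of that preprocessing split, assuming `df` carries both the `rms_*` traces and the engineered scalars plus a binary `responder` label; the package builds a richer pipeline, so this only illustrates the shape:

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

trace_cols = [c for c in df.columns if c.startswith("rms_")]
engineered = ["AUC-During", "Peak-Value"]

preprocess = ColumnTransformer([
    # traces: impute -> scale -> compress to 5 principal components
    ("traces", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("pca", PCA(n_components=5)),
    ]), trace_cols),
    # engineered scalars are scaled independently of the traces
    ("scalars", StandardScaler(), engineered),
])

model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
model.fit(df[trace_cols + engineered], df["responder"])
```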
Computing RMS from raw coordinates (if not pre-computed):

Use `prepare-raw` with `--compute-dir-val` to generate distance metrics (which approximate RMS):

```
flybehavior-response prepare-raw \
  --data-csv data/raw_coordinates.csv \
  --labels-csv data/labels.csv \
  --out data/prepared_with_distances.csv \
  --compute-dir-val
```

This creates `dir_val_0`, `dir_val_1`, ..., `dir_val_3600` columns representing frame-by-frame distances, which can then be used with `--series-prefixes "dir_val_"`.
Here's a comprehensive example using engineered features, RMS/distance traces, and multiple models:
```
flybehavior-response train \
  --data-csv data/all_envelope_rows_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --features "AUC-Before,AUC-During,AUC-After,TimeToPeak-During,Peak-Value,global_min,global_max,trimmed_global_min,trimmed_global_max,local_min,local_max,local_min_before,local_max_before,local_min_during,local_max_during,local_max_over_global_min,local_max_during_over_global_min" \
  --series-prefixes "dir_val_" \
  --n-pcs 10 \
  --use-raw-pca \
  --classification-mode binary \
  --model all \
  --logreg-class-weights "balanced" \
  --rf-n-estimators 200 \
  --cv 5 \
  --artifacts-dir outputs/comprehensive_run
```

This trains all supported models (LDA, Logistic Regression, Random Forest, MLP, fp_optimized_mlp) on:

- All 17 engineered features
- 10 principal components from the time-series `dir_val_` traces (RMS-like distances)
- 5-fold cross-validation with stratified splits at the fly level
- Balanced class weighting for logistic regression
- 200 trees for Random Forest
- `--model all` trains LDA, logistic regression, and both neural network configurations using a shared stratified split and writes per-model confusion matrices into the run directory.
- Each training run exports `predictions_<model>_{train,test}.csv` (and `validation` when applicable) so you can audit which trials were classified correctly, along with their reaction probabilities and sample weights.
- `--model mlp` isolates the legacy neural baseline: a scikit-learn `MLPClassifier` with a single hidden layer of 100 neurons between the feature input and the binary output unit.
- `--model fp_optimized_mlp` activates the new false-positive-minimising architecture (see the sketch after this list). It stacks two ReLU-activated hidden layers sized 256 and 128, uses Adam with a 0.001 learning rate, honours proportional intensity weights, and multiplies responder samples (label == 1) by an additional class weight of 2.0. Training automatically performs a stratified 70/15/15 train/validation/test split, monitors validation performance with early stopping (`n_iter_no_change=10`), and logs precision plus false-positive rates across all splits.
- Optuna-generated payloads and manual configurations targeting `fp_optimized_mlp` must now specify exactly two hidden widths. The CLI enforces this constraint and raises if a single-layer payload is supplied, preventing accidental regressions when reusing tuned configurations.
- Inspect `metrics.json` for `test` (and `validation`) entries to verify held-out accuracy, precision, recall, F1, and false-positive rates. Review `confusion_matrix_<model>.png` in the run directory for quick diagnostics.
- Existing scripts that still pass `--model both` continue to run LDA + logistic regression only; update them to `--model all` to include the neural networks when desired.
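For orientation, a rough scikit-learn approximation of that topology; the package's `SampleWeightedMLPClassifier` additionally applies the sample and class weights at fit time, so treat this purely as an illustration:

```python
from sklearn.neural_network import MLPClassifier

fp_optimized_like = MLPClassifier(
    hidden_layer_sizes=(256, 128),  # two ReLU hidden layers, per the bullet above
    activation="relu",
    solver="adam",
    learning_rate_init=0.001,
    early_stopping=True,            # monitors an internal validation split
    n_iter_no_change=10,
    validation_fraction=0.15,
    random_state=42,
)
```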
Example run focused on minimising false positives:
```
flybehavior-response train \
  --data-csv data/wide_features.csv \
  --labels-csv data/labels.csv \
  --features "AUC-During,Peak-Value,global_max,local_min,local_max" \
  --series-prefixes "dir_val" \
  --model fp_optimized_mlp \
  --n-pcs 5 \
  --cv 5 \
  --artifacts-dir outputs/artifacts/fp_optimized
```

The run directory records the combined sample/class weights, validation metrics, and a confusion matrix that highlights the reduced false-positive rate.
- Use the Typer subcommand to convert per-trial eye/proboscis traces into a modeling-ready CSV with metadata and optional `dir_val` distances:

  ```
  flybehavior-response prepare-raw \
    --data-csv data/all_eye_prob_coords_per_trial.csv \
    --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
    --out data/all_eye_prob_coords_prepared.csv \
    --fps 40 --odor-on-idx 1230 --odor-off-idx 2430 \
    --truncate-before 0 --truncate-after 0 \
    --series-prefixes "eye_x_f,eye_y_f,prob_x_f,prob_y_f" \
    --no-compute-dir-val
  ```

- If your acquisition exports trials as a 3-D NumPy array (trials × frames × 4 channels), save the matrix to `.npy` and provide a JSON metadata file describing each trial and the layout:

  ```
  flybehavior-response prepare-raw \
    --data-npy data/all_eye_prob_coords_matrix.npy \
    --matrix-meta data/all_eye_prob_coords_matrix.json \
    --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
    --out data/all_eye_prob_coords_prepared.csv
  ```

  The metadata JSON must contain a `metadata` (or `trials`) array with per-row descriptors (`dataset`, `fly`, `fly_number`, `trial_type`, `trial_label`; legacy exports may name this `testing_trial` and will be auto-renamed), an optional `layout` field (`trial_time_channel` or `trial_channel_time`), and optional `channel_prefixes` that match the prefixes passed via `--series-prefixes`.

- The output keeps raw values with consistent 0-based frame indices per prefix, adds timing metadata, and can be fed directly to `flybehavior-response train --raw-series` (or an explicit `--series-prefixes eye_x_f,eye_y_f,prob_x_f,prob_y_f` if you customise the channel order).

- All subcommands (`prepare`, `train`, `eval`, `viz`, `predict`) accept `--raw-series` to prioritise the four eye/proboscis channels. When left unset, the loader still auto-detects the raw prefixes whenever `dir_val_` traces are absent, so legacy scripts continue to run unchanged.
Once you have a wide table of raw coordinates, enable the raw channel handling on every CLI entry point with `--raw-series` (or supply an explicit `--series-prefixes` string if you re-ordered the channels):

```
# train all models on raw coordinates (engineered feature list is ignored automatically)
flybehavior-response train --raw-series \
  --data-csv data/all_eye_prob_coords_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --model all --n-pcs 5

# evaluate an existing run against the same raw inputs
flybehavior-response eval --raw-series \
  --data-csv data/all_eye_prob_coords_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --run-dir outputs/artifacts/<timestamp>

# regenerate confusion matrices and PCA/ROC plots for the raw-trained models
flybehavior-response viz --raw-series \
  --data-csv data/all_eye_prob_coords_wide.csv \
  --labels-csv data/scoring_results_opto_new_MINIMAL.csv \
  --run-dir outputs/artifacts/<timestamp>

# score new raw trials with a saved pipeline
flybehavior-response predict --raw-series \
  --data-csv data/all_eye_prob_coords_wide.csv \
  --model-path outputs/artifacts/<timestamp>/model_logreg.joblib \
  --output-csv outputs/artifacts/<timestamp>/raw_predictions.csv
```

The raw workflow is always two-step: generate a per-trial table with `prepare-raw`, then invoke `train`, `eval`, `viz`, and `predict` with `--raw-series` (or explicit `--series-prefixes`) so every command consumes the four eye/proboscis streams exactly as prepared.

Need to benchmark engineered features without the high-dimensional traces? Add `--no-raw` to the same subcommands. The loader drops every `dir_val_###`, `eye_x_f*`, `eye_y_f*`, `prob_x_f*`, and `prob_y_f*` column before training, stores that decision in `config.json`, and automatically disables PCA on the now-missing traces. Downstream `eval`, `viz`, and `predict` runs inherit the configuration, so omitting `--no-raw` later still reproduces the engineered-only workflow unless you explicitly override the series selection. The same flag works when you stream geometry frames with `--geometry-frames`: the trial aggregator skips raw trace assembly so you can train purely on engineered responder features while keeping the persisted configuration in sync with `eval`, `viz`, and `predict` commands.
During training the loader automatically recognises that engineered features are absent and logs that it is proceeding in a trace-only configuration. Keep PCA enabled (`--use-raw-pca`, the default) to derive compact principal components from the four coordinate streams.
Older exports that only include `dir_val_###` columns (no engineered metrics) are now supported out of the box. Simply point the trainer at the data/label CSVs; no extra flags are required:

```
flybehavior-response train \
  --data-csv /path/to/dir_val_only_data.csv \
  --labels-csv /path/to/labels.csv \
  --model all
```

The loader detects that engineered features are missing, logs a trace-only message, and continues with PCA on the `dir_val_` traces. The same behaviour applies to `eval`, `viz`, and `predict`, so the entire pipeline operates normally on these legacy tables.
- Use the new `predict` filters when you want to score a single envelope or raw trial without extracting it manually:

  ```
  flybehavior-response predict \
    --data-csv data/all_envelope_rows_wide.csv \
    --model-path outputs/artifacts/<run>/model_logreg.joblib \
    --fly september_09_fly_3 --fly-number 3 --testing-trial t2 \
    --output-csv outputs/artifacts/<run>/prediction_september_09_fly_3_t2.csv
  ```

- The loader automatically treats a `testing_trial` column as the canonical `trial_label`, so legacy exports continue to work. Supply any subset of the filters (`--fly`, `--fly-number`, `--trial-label`/`--testing-trial`) to narrow the prediction set; when all three are present, exactly one trial is returned and written with its reaction probability.
- Ensure trace columns follow contiguous 0-based numbering for each prefix (default `dir_val_`). Columns beyond `dir_val_3600` are trimmed automatically for legacy datasets.
- `user_score_odor` must contain non-negative integers where `0` denotes no response and higher integers (e.g., `1`-`5`) encode increasing reaction strength. Rows with missing labels are dropped automatically, while negative or fractional scores raise schema errors.
- Training uses proportional sample weights derived from label intensity so stronger reactions (e.g., `5`) contribute more than weaker ones (e.g., `1`). Review the logged weight summaries if model behaviour seems unexpected.
- Duplicate keys across CSVs (`dataset`, `fly`, `fly_number`, `trial_type`, `trial_label`) raise errors to prevent ambiguous merges.
- Ratio features (`AUC-During-Before-Ratio`, `AUC-After-Before-Ratio`) are supported but produce warnings because they are unstable.
- The CLI recognises the following engineered scalar columns out of the box: `AUC-Before`, `AUC-During`, `AUC-After`, `AUC-During-Before-Ratio`, `AUC-After-Before-Ratio`, `TimeToPeak-During`, `Peak-Value`, `global_min`, `global_max`, `trimmed_global_min`, `trimmed_global_max`, `local_min`, `local_max`, `local_min_before`, `local_max_before`, `local_min_during`, `local_max_during`, `local_max_over_global_min`, `local_max_during_over_global_min`, `local_max_during_odor`, `local_max_during_odor_over_global_min`, and the newly added `non_reactive_flag` (1 for non-responders, 0 otherwise). Any subset passed via `--features` (or baked into `best_params.json`) is validated against this list so feature-only runs fail fast when a requested column is absent. Because `non_reactive_flag` is derived directly from the binary label, only use it for auditing or rule-based workflows; feeding it into model training will trivially leak the target signal.
- Use `--dry-run` to confirm configuration before writing artifacts.
- The CLI automatically selects the newest run directory containing model artifacts. Override with `--run-dir` if you maintain multiple artifact trees (e.g., `outputs/artifacts/projections`).