Merged
5 changes: 4 additions & 1 deletion .gitignore
@@ -169,4 +169,7 @@ helper-code/
 flywire_orn_database/
 
-diagnostics/
+diagnostics/*
+!diagnostics/
+!diagnostics/*.py
+!diagnostics/*.md
45 changes: 45 additions & 0 deletions diagnostics/DELTA_PER_COLLAPSE_EXPLANATION.md
@@ -0,0 +1,45 @@
# ΔPER LASSO Collapse Explanation

Latest audit run: `diagnostics/postA_postB_audit_20260108_154458`

## Evidence of intercept-only collapse (LASSO)

- opto_hex / delta_base: INTERCEPT-ONLY; n_selected=0, pred_std=0, cv_mse=0.100622, intercept_only_mse=0.100622, y_std=0.271894
- opto_hex / delta_extended: INTERCEPT-ONLY; n_selected=0, pred_std=0, cv_mse=0.100622, intercept_only_mse=0.100622, y_std=0.271894
- opto_EB / delta_base: INTERCEPT-ONLY; n_selected=0, pred_std=0, cv_mse=0.0165606, intercept_only_mse=0.0165606, y_std=0.110304
- opto_EB / delta_extended: INTERCEPT-ONLY; n_selected=0, pred_std=0, cv_mse=0.0165606, intercept_only_mse=0.0165606, y_std=0.110304
- opto_benz_1 / delta_base: non-intercept; n_selected=1, pred_std=0.0544301, cv_mse=0.0382769, intercept_only_mse=0.0388923, y_std=0.169038
- opto_benz_1 / delta_extended: non-intercept; n_selected=1, pred_std=0.0544301, cv_mse=0.0382769, intercept_only_mse=0.0388923, y_std=0.169038

## If LASSO collapsed, how much better are Ridge/ElasticNet?

- opto_hex / delta_base: best alt = elasticnet_0.5 | Δcv_mse=0, Δnmse=0
- opto_hex / delta_extended: best alt = elasticnet_0.5 | Δcv_mse=0, Δnmse=0
- opto_EB / delta_base: best alt = ridge | Δcv_mse=0.00694191, Δnmse=0.570554
- opto_EB / delta_extended: best alt = ridge | Δcv_mse=0.00694192, Δnmse=0.570555

## Why LASSO collapses in ΔPER for opto_hex/opto_EB

The audit shows ΔPER LASSO selecting zero features with pred_std=0 and cv_mse equal to intercept-only MSE. This indicates the LASSO penalty dominates the signal at small n, so the best cross-validated model is the intercept-only baseline. Expanding the ΔPER lambda grid (delta_extended) does not change this for opto_hex/opto_EB, so it is not a grid-resolution artifact.
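A minimal sketch of the collapse mechanism (synthetic data, not the project's pipeline): once the L1 penalty exceeds the largest feature-target correlation, every coefficient is zeroed and the fit degenerates to the intercept, i.e. the mean of y:

```python
# Synthetic illustration: small n, many features, strong L1 penalty.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 5, 50                        # few odorants, many receptors
X = StandardScaler().fit_transform(rng.normal(size=(n, p)))
y = rng.normal(size=n)

model = Lasso(alpha=10.0).fit(X, y)             # penalty dominates the signal
print(np.count_nonzero(model.coef_))            # 0 -> n_selected=0
print(np.allclose(model.predict(X), y.mean()))  # True -> intercept-only
```

With every coefficient zeroed, predictions are constant (pred_std=0) and the CV MSE equals the intercept-only MSE, exactly the pattern in the audit rows above.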

## Why low-range datasets look “perfect” in raw MSE

Raw MSE is scale-dependent: smaller y_std yields smaller MSE even when relative error is similar. Normalized metrics (nmse and rmse_over_y_std) in `diagnostics/delta_model_comparison.csv` should be used for cross-condition comparisons. This avoids misreading low-variance conditions as “perfect fits.”
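A sketch of the normalization (plain arithmetic, not the generator of `delta_model_comparison.csv`): nmse rescales MSE by the variance of y, so nmse >= 1 flags a fit no better than the mean predictor. Plugging in the opto_hex numbers above:

```python
# Scale-free metrics: divide out the variance / std of the target.
def normalized_metrics(cv_mse: float, y_std: float) -> dict:
    return {
        "nmse": cv_mse / y_std ** 2,
        "rmse_over_y_std": cv_mse ** 0.5 / y_std,
    }

# opto_hex values from the audit above: a "small" raw MSE, yet
# nmse > 1 means the model is worse than predicting the mean.
m = normalized_metrics(cv_mse=0.100622, y_std=0.271894)
print(round(m["nmse"], 3))   # 1.361
```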

## Recommended default model for ΔPER reporting

When LASSO is intercept-only (n_selected=0, pred_std=0, cv_mse==intercept_only_mse), report the best ElasticNet/Ridge by CV MSE. This is already surfaced in `audit_primary_models.csv` in the latest audit run.
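A hypothetical sketch of this fallback rule (the function name and row schema are illustrative, not the actual audit code):

```python
# Hypothetical helper: names and row schema are illustrative only.
def pick_reporting_model(rows):
    """Pick LASSO unless it collapsed; else best Ridge/ElasticNet by CV MSE."""
    lasso = next(r for r in rows if r["model"] == "lasso")
    collapsed = (lasso["n_selected"] == 0
                 and lasso["pred_std"] == 0
                 and lasso["cv_mse"] == lasso["intercept_only_mse"])
    if not collapsed:
        return lasso
    alts = [r for r in rows if r["model"] in ("ridge", "elasticnet_0.5")]
    return min(alts, key=lambda r: r["cv_mse"])

# opto_EB-like numbers (ridge beats the collapsed LASSO by ~0.0069 MSE):
rows = [
    {"model": "lasso", "n_selected": 0, "pred_std": 0,
     "cv_mse": 0.0165606, "intercept_only_mse": 0.0165606},
    {"model": "ridge", "n_selected": 12, "pred_std": 0.05,
     "cv_mse": 0.0096187, "intercept_only_mse": 0.0165606},
]
print(pick_reporting_model(rows)["model"])   # ridge
```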

## Reproducible commands

```bash
conda run -n DoOR python diagnostics/run_postA_postB_audit.py \
--door_cache door_cache \
--behavior_csv "/home/ramanlab/Documents/cole/Results/Opto/Reaction_Predictions(Strictest)/reaction_rates_summary_unordered.csv" \
--conditions opto_hex,opto_EB,opto_benz_1,opto_ACV,opto_3-oct \
--prediction_mode test_odorant \
--cv_folds 5 \
--lambda_range 0.0001,0.001,0.01,0.1,1.0 \
--lambda_range_delta 1e-8,1e-7,1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1.0 \
--missing_control_policy skip
```
32 changes: 32 additions & 0 deletions diagnostics/PLAN_STABILITY.md
@@ -0,0 +1,32 @@
# Stability + Metrics Layer Plan

## Discovery summary
- Feature matrices X are built in `src/door_toolkit/pathways/behavioral_prediction.py` via:
- `_extract_test_odorant_features`, `_extract_trained_odorant_features`, `_extract_interaction_features`.
- Receptor ordering comes from `DoOREncoder.response_matrix` column order in `src/door_toolkit/encoder.py` and is exposed as `predictor.masked_receptor_names` (or `encoder.receptor_names`).
- LASSO selection is in `LassoBehavioralPredictor.fit_behavior()` and `fit_lasso_with_fixed_scaler()` (same file).
- Ridge/ElasticNet CV logic exists in `diagnostics/run_postA_postB_audit.py` (LOOCV grid search).
- Audit outputs are under `diagnostics/postA_postB_audit_*/` with `audit_metrics.csv` + `audit_artifacts.json`.

## Files to add/change
- Add: `diagnostics/run_stability_and_metrics.py` (new stability + metrics runner).
- Add: `tests/test_stability_metrics.py` (determinism + schema + intercept-only flag tests).
- Update: `.gitignore` to allow tracked `diagnostics/*.py` and `diagnostics/*.md`.
- Update: `docs/BEHAVIORAL_PREDICTION_ANALYSIS.md` with a short “how to run the stability layer” section (~5 lines).

## Algorithms to implement
- Standardized metrics for each (condition, mode, model class):
- y_std, y_var, y_min, y_max; pred_std, pred_min, pred_max; cv_mse; nmse; rmse_over_y_std;
intercept_only_flag; intercept_only_mse (LOOCV mean predictor).
- ORN stability (LOOO):
- For each fold: fit model on n-1 odorants (same scaling rules as baseline).
- Record selected ORNs + coefficients; compute selection_frequency, sign_consistency,
mean/std abs(weight), mean rank by abs(weight).
- LASSO only if not intercept-only; ElasticNet for ΔPER when LASSO is intercept-only; Ridge uses rank stability.
- Experiment shortlist: top 5 ORNs by stability_score = selection_frequency * sign_consistency,
plus confidence flags (nmse>=1, intercept-only, missing controls).
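The stability aggregation above can be sketched as follows (the function name and array layout are assumptions for this plan, not existing code):

```python
import numpy as np

def stability_scores(coef_per_fold: np.ndarray) -> dict:
    """coef_per_fold: (n_folds, n_orns) coefficients; zeros = unselected."""
    selected = coef_per_fold != 0
    freq = selected.mean(axis=0)                     # selection_frequency
    sign_cons = np.ones(coef_per_fold.shape[1])
    for j in range(coef_per_fold.shape[1]):
        signs = np.sign(coef_per_fold[selected[:, j], j])
        if signs.size:                               # majority-sign agreement
            sign_cons[j] = max((signs > 0).mean(), (signs < 0).mean())
    return {"selection_frequency": freq,
            "sign_consistency": sign_cons,
            "stability_score": freq * sign_cons}

# 4 LOOO folds, 3 ORNs: ORN0 always selected with a stable sign,
# ORN1 selected once, ORN2 always selected but its sign flips once.
coefs = np.array([[0.3, 0.0, -0.1],
                  [0.2, 0.0,  0.1],
                  [0.4, 0.5, -0.2],
                  [0.3, 0.0, -0.3]])
s = stability_scores(coefs)
print(s["stability_score"])   # -> 1.0, 0.25, 0.75
```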

## Verification steps
- `pytest -q` (determinism + schema tests for stability outputs).
- Run stability script on real CSV + conditions with seed=1337; check outputs:
- `stability_per_condition.csv`, `model_metrics.csv`, `EXPERIMENT_SHORTLIST.md`, `SUMMARY.md`, `RUN_COMMANDS.txt`.
34 changes: 34 additions & 0 deletions diagnostics/baseline_drift_hypotheses.md
@@ -0,0 +1,34 @@
# Baseline Drift Hypotheses

## Summary
No obvious in-place mutation or stochastic sources were found in the LASSO predictor or the ablation/focus scripts. The most plausible explanations are (1) accidental in-process mutation of a view derived from `X`, or (2) changes in target alignment when using ΔPER (control subtraction). Each hypothesis below includes file and function references.

## Hypotheses (with code locations)

### 1) View-based mutation risk (low likelihood)
- `src/door_toolkit/pathways/behavioral_prediction.py:1560` `restrict_to_receptors()` returns `X[:, kept_indices_sorted]` without `.copy()`.
- This is a view only if the index is a slice; NumPy integer-array (“fancy”) indexing already returns a copy. If a view were returned and downstream code mutated `X_restricted` in place, it could mutate the original `X` (and appear as baseline drift).
- In `scripts/lasso_with_focus_mode.py:421` the view is only passed to `StandardScaler.fit` and LASSO fitting, which do not mutate input arrays, so this risk is theoretical but low.
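A quick NumPy check of the view-vs-copy semantics this hypothesis turns on (integer-array indexing copies; only slice indexing yields a mutable view):

```python
import numpy as np

X = np.arange(12.0).reshape(3, 4)

fancy = X[:, [0, 2]]        # integer-array indexing returns a COPY
fancy[0, 0] = 99.0
after_fancy = X[0, 0]       # 0.0 -- original X untouched

view = X[:, 0:2]            # slice indexing returns a VIEW
view[0, 0] = 99.0
after_view = X[0, 0]        # 99.0 -- original X mutated
print(after_fancy, after_view)
```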

### 2) In-place ablation (low likelihood)
- `src/door_toolkit/pathways/behavioral_prediction.py:1410` `apply_receptor_ablation()` explicitly copies `X` before ablation.
- This is safe; baseline drift would require a different ablation path that modifies `X` in-place.
- `scripts/lasso_with_ablations.py:456` uses `apply_receptor_ablation()` (safe).

### 3) Non-determinism in CV or lambda selection (unlikely)
- `src/door_toolkit/pathways/behavioral_prediction.py:915` `LassoCV(... random_state=42)`.
- `src/door_toolkit/pathways/behavioral_prediction.py:961` `cross_val_score` uses deterministic KFold (no shuffle).
- Without shuffle, folds are deterministic and reproducible; no randomness expected.
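A small check of the determinism claim (illustrative, not project code): `KFold` without `shuffle` yields identical, contiguous splits on every call:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)
splits_a = [test.tolist() for _, test in KFold(n_splits=5).split(X)]
splits_b = [test.tolist() for _, test in KFold(n_splits=5).split(X)]
print(splits_a == splits_b)   # True: identical folds, every run
print(splits_a[0])            # [0, 1] -- contiguous, no shuffling
```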

### 4) Data alignment differences (likely for ΔPER vs raw)
- `src/door_toolkit/pathways/behavioral_prediction.py:873-926` control subtraction uses different masks depending on `missing_control_policy`.
- ΔPER runs drop rows with NaNs in either opto or control (`skip`), or fill missing controls (`zero`).
- This can change sample counts and target variance vs raw fits, potentially leading to different selected features.
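A simplified, hypothetical sketch of the two policies (the real logic lives in `fit_behavior`; names here are illustrative): `skip` drops rows with a missing value on either side, `zero` fills missing controls before subtracting, so the sample count and target variance differ between them:

```python
import numpy as np

def delta_per(opto, control, policy="skip"):
    """Illustrative control subtraction, not the project's implementation."""
    opto, control = np.asarray(opto, float), np.asarray(control, float)
    if policy == "skip":                 # drop rows missing either value
        keep = ~(np.isnan(opto) | np.isnan(control))
        return (opto - control)[keep]
    if policy == "zero":                 # treat missing controls as 0
        return opto - np.nan_to_num(control, nan=0.0)
    raise ValueError(policy)

opto = [0.8, 0.6, 0.7]
ctrl = [0.2, np.nan, 0.3]
print(delta_per(opto, ctrl, "skip"))   # 2 samples: [0.6, 0.4]
print(delta_per(opto, ctrl, "zero"))   # 3 samples: [0.6, 0.6, 0.4]
```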

### 5) Dataset label normalization changes (low impact)
- `src/door_toolkit/pathways/behavioral_prediction.py:726` `_resolve_dataset_name()` normalizes dataset labels.
- If the CSV index has multiple labels that normalize to the same token, this can cause ambiguity errors; otherwise it should not affect results.

## Notes
- No global caches or shared mutable matrices were found in the predictor; `get_receptor_profile()` returns fresh arrays.
- The diagnostic script added in this task will validate reproducibility and detect constant-prediction collapses.
65 changes: 65 additions & 0 deletions diagnostics/repo_state.md
@@ -0,0 +1,65 @@
# Repo State Snapshot

## Commands

### git status -sb
```
## feature/lasso-subtract-control...origin/feature/lasso-subtract-control
```

### git log -n 20 --oneline
```
64c79ce feat: Implement control subtraction for LASSO behavioral prediction and add corresponding tests
3225982 Merge pull request #2 from colehanan1/feature/lasso-ablation-analysis
1ed3db4 refactor: Save ablation_summary and comparison plot to ablations/ subfolder
8183dbc docs: Add ablation and focus mode CLI usage to documentation
4278a80 feat: Add LASSO focus mode analysis for receptor circuit sufficiency
026dad9 feat: Add LASSO ablation analysis for receptor circuit robustness
bc9234f feat: Add support for strict mode in connectome analysis and deprecate Shapley importance method in favor of Shapley-proxy
e5574e8 Merge pull request #1 from colehanan1/audit/codex_repo_analysis
1b83066 feat: Add synthetic importance audit scripts for connectome, GLM, LASSO, and Shapley methods
8093664 Add threshold calibration utilities and corresponding tests
bd12a4b feat: Update .gitignore to include 'outputs/', 'helper-code/', and 'flywire_orn_database/' directories
6f84d36 feat: Update .gitignore to include 'outputs/' and '.claude/' directories
3ac9598 Release v1.0.1
8ca9bf7 Add comprehensive test suites for mapping accounting and identifier resolution
862c0f2 Add receptor sensitivity diagnostics script
4219faa Release v1.0.0: Production-ready toolkit with mushroom body circuit validation
49ba71a feat: Add Mushroom Body Circuit Validation module and update README with new features
19b029b feat: Add FlyWire Mushroom Body Pathway Analysis script and Mushroom Body Tracer module
458cf39 Add comprehensive documentation for behavioral prediction analysis, connectomics module, custom pathway guide, and FlyWire integration notes
3b5f197 Add LASSO regression-based behavioral prediction and enhance existing predictor
```

### git diff
```
<no working tree diff>
```

### git diff --stat
```
<no working tree diff>
```

## Changed Files Relevant to Drift Investigation

### Behavioral prediction core
- `src/door_toolkit/pathways/behavioral_prediction.py`
- Commit `64c79ce` adds control-subtraction in `fit_behavior`, plus dataset name normalization helpers and new metadata fields.
- Helper utilities for ablation/focus mode are in this file (see `apply_receptor_ablation`, `fit_lasso_with_fixed_scaler`, `restrict_to_receptors`).

### Ablation/focus scripts
- `scripts/lasso_with_ablations.py` and `scripts/lasso_with_focus_mode.py` show no diffs in this branch vs main (branch equals main at `64c79ce`).

### Helpers for X construction / scaling / lambda selection
- `LassoBehavioralPredictor.fit_behavior()` in `behavioral_prediction.py` constructs X via `_extract_*` helpers.
- Scaling uses `StandardScaler.fit_transform` (new arrays, no in-place mutation).
- Lambda selection uses `LassoCV(random_state=42)` and `cross_val_score` with deterministic folds.

## Commit Stats (64c79ce)
```
docs/BEHAVIORAL_PREDICTION_ANALYSIS.md | 20 ++
scripts/run_lasso_behavioral_prediction.py | 239 +++++++++++++++++++++
src/door_toolkit/pathways/behavioral_prediction.py | 156 +++++++++++++-
tests/test_lasso_behavioral_prediction.py | 86 ++++++++
```