| 1 | +# IntegratedML Custom Models Constitution |
| 2 | + |
| 3 | +<!-- |
| 4 | + ============================================================================ |
| 5 | + SYNC IMPACT REPORT - Constitution v1.0.0 |
| 6 | + ============================================================================ |
| 7 | + |
| 8 | + Version Change: INITIAL → 1.0.0 |
| 9 | + |
| 10 | + Rationale: Initial constitution creation for IntegratedML Custom Models |
| 11 | + project. This establishes the foundational governance and principles for |
| 12 | + developing custom ML models that execute within InterSystems IRIS SQL. |
| 13 | + |
| 14 | + Principles Defined: |
| 15 | + - I. In-Database ML (NEW) |
| 16 | + - II. Scikit-learn Compatibility (NEW) |
| 17 | + - III. Test-Driven Development (NEW) |
| 18 | + - IV. Low-Latency Performance (NEW) |
| 19 | + - V. Model State Management (NEW) |
| 20 | + |
| 21 | + Templates Requiring Updates: |
| 22 | + ✅ plan-template.md - Constitution Check section references this file |
| 23 | + ✅ spec-template.md - Requirements align with model development principles |
| 24 | + ✅ tasks-template.md - Task categorization supports testing and performance gates |
| 25 | + |
| 26 | + Follow-up TODOs: None - all critical fields populated |
| 27 | + ============================================================================ |
| 28 | +--> |
| 29 | + |
| 30 | +## Core Principles |
| 31 | + |
| 32 | +### I. In-Database ML |
| 33 | + |
| 34 | +All machine learning models MUST execute within InterSystems IRIS SQL without |
| 35 | +requiring data export or external processing pipelines. |
| 36 | + |
| 37 | +**Rationale**: The core value proposition of IntegratedML Custom Models is |
| 38 | +eliminating data movement. Models process data where it lives, enabling |
| 39 | +real-time predictions via `PREDICT()` directly in SQL queries. This principle |
| 40 | +is non-negotiable and defines the project's purpose. |
| 41 | + |
| 42 | +**Requirements**: |
| 43 | +- Models MUST be deployable via SQL `CREATE MODEL` statements |
| 44 | +- Training MUST execute in-database using `FROM table` syntax |
| 45 | +- Predictions MUST return directly to SQL result sets |
| 46 | +- NO data export to external systems for model execution |
| 47 | +- Parameters MUST be passed via JSON `USING` clause (IRIS 2025.2 syntax) |
| 48 | + |
| 49 | +### II. Scikit-learn Compatibility |
| 50 | + |
| 51 | +All custom models MUST implement the scikit-learn estimator interface |
| 52 | +(`fit`, `predict`, `get_params`, `set_params`). |
| 53 | + |
| 54 | +**Rationale**: IntegratedML requires scikit-learn compatible models to ensure |
| 55 | +proper integration with the IRIS ML engine. This interface provides consistent |
| 56 | +model lifecycle management, parameter handling, and serialization. |
| 57 | + |
| 58 | +**Requirements**: |
| 59 | +- Models MUST inherit from `IntegratedMLBaseModel` (shared/models/base.py) |
| 60 | +- `fit(X, y, **params)` method MUST accept training data and IntegratedML params |
| 61 | +- `predict(X)` method MUST return predictions matching IntegratedML expectations |
| 62 | +- Models MUST support parameter serialization via `get_params()`/`set_params()` |
| 63 | +- Custom preprocessing MUST occur within model methods, not externally |
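
The estimator interface required above can be sketched as follows. This is a minimal, standalone illustration using only the standard library: `MeanThresholdClassifier`, its `scale` parameter, and the `_preprocess` helper are hypothetical names, and a real model would inherit from `IntegratedMLBaseModel`, whose actual interface is not shown in this document.

```python
from statistics import mean


class MeanThresholdClassifier:
    """Hypothetical toy estimator; a real model would extend IntegratedMLBaseModel."""

    def __init__(self, scale=1.0):
        self.scale = scale        # hyperparameter exposed via get_params/set_params
        self.threshold_ = None    # learned during fit()

    def get_params(self, deep=True):
        # Return constructor parameters only, following the scikit-learn convention.
        return {"scale": self.scale}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def _preprocess(self, X):
        # Preprocessing lives inside the model so fit() and predict() share it.
        return [sum(row) * self.scale for row in X]

    def fit(self, X, y, **params):
        # Extra keyword arguments are treated as parameter overrides (an assumption).
        self.set_params(**params)
        scores = self._preprocess(X)
        positives = [s for s, label in zip(scores, y) if label == 1]
        negatives = [s for s, label in zip(scores, y) if label == 0]
        # Decision threshold: midpoint between the two class means.
        self.threshold_ = (mean(positives) + mean(negatives)) / 2
        return self

    def predict(self, X):
        if self.threshold_ is None:
            raise RuntimeError("Model must be fitted before predict()")
        return [1 if s > self.threshold_ else 0 for s in self._preprocess(X)]
```

Because `_preprocess` is called from both `fit` and `predict`, the same transformation is applied at training and prediction time without any external pipeline step.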
| 64 | + |
| 65 | +### III. Test-Driven Development (NON-NEGOTIABLE) |
| 66 | + |
| 67 | +All features and models MUST have tests covering functional correctness, |
| 68 | +integration with IRIS, and performance benchmarks. |
| 69 | + |
| 70 | +**Rationale**: Given the critical nature of ML predictions in production |
| 71 | +workflows (credit risk, fraud detection, etc.), comprehensive testing is |
| 72 | +mandatory. Performance benchmarks ensure latency requirements are met. |
| 73 | + |
| 74 | +**Requirements**: |
| 75 | +- Unit tests MUST verify model logic (fit/predict behavior) |
| 76 | +- Integration tests MUST verify SQL integration (CREATE MODEL, PREDICT queries) |
| 77 | +- Performance benchmarks MUST measure training time and prediction latency |
| 78 | +- Test data generators MUST produce realistic volumes (>1000 records minimum) |
| 79 | +- Tests MUST be runnable via `make test` or `pytest demos/*/tests/` |
| 80 | +- E2E test (`tests/test_all_demos_e2e.py`) MUST pass before releases |
| 81 | + |
| 82 | +### IV. Low-Latency Performance |
| 83 | + |
| 84 | +Model predictions MUST complete within 50ms (p95) to support real-time |
| 85 | +applications. |
| 86 | + |
| 87 | +**Rationale**: IntegratedML Custom Models target production use cases like |
| 88 | +fraud detection and credit risk assessment that require immediate responses. |
| 89 | +Sub-50ms latency ensures models can be used in interactive applications and |
| 90 | +high-throughput batch processing. |
| 91 | + |
| 92 | +**Requirements**: |
| 93 | +- Prediction latency MUST be <50ms p95 for single-record predictions |
| 94 | +- Training time SHOULD be documented in demo test results |
| 95 | +- Performance benchmarks MUST be included in integration tests |
| 96 | +- Models MUST avoid unnecessary computation in predict path |
| 97 | +- Feature engineering MUST be optimized for repeated prediction calls |
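
One way to check the p95 requirement above is to time repeated single-record predictions and take the nearest-rank 95th percentile. The sketch below assumes a hypothetical `predict_one` callable (for example, a wrapper around a single-row `PREDICT()` query); the helper names and the warm-up count are illustrative, not part of the project.

```python
import time


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def p95_latency_ms(predict_one, records, warmup=10):
    """Time one call per record and return the p95 latency in milliseconds."""
    for record in records[:warmup]:
        predict_one(record)  # warm-up calls are not timed
    samples = []
    for record in records:
        start = time.perf_counter()
        predict_one(record)
        samples.append((time.perf_counter() - start) * 1000.0)
    return percentile(samples, 95)
```

A benchmark test could then assert `p95_latency_ms(predict_one, records) < 50.0` over a sufficiently large set of records.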
| 98 | + |
| 99 | +### V. Model State Management |
| 100 | + |
| 101 | +Models MUST implement proper serialization to persist across database sessions |
| 102 | +and deployments. |
| 103 | + |
| 104 | +**Rationale**: IRIS stores trained models for reuse across queries and server |
| 105 | +restarts. Proper state management ensures models remain available and |
| 106 | +consistent without retraining. |
| 107 | + |
| 108 | +**Requirements**: |
| 109 | +- Models MUST implement `_get_model_state()` for serialization |
| 110 | +- Models MUST implement `_set_model_state()` for deserialization |
| 111 | +- Model state MUST include all trained parameters and preprocessing artifacts |
| 112 | +- Serialization MUST be compatible with IRIS persistence mechanisms |
| 113 | +- Models MUST handle version compatibility for state loaded from older versions |
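
The state-management requirements above can be illustrated with a small sketch. The method names `_get_model_state()` and `_set_model_state()` come from this document, but their exact signatures in the base class are assumed here; `ScaledMeanModel`, `STATE_VERSION`, and the use of JSON as the storage format are hypothetical choices made only for illustration.

```python
import json

STATE_VERSION = 2  # hypothetical schema version for this sketch


class ScaledMeanModel:
    """Hypothetical model with one trained parameter and one preprocessing artifact."""

    def __init__(self):
        self.mean_ = None
        self.scale_ = 1.0

    def _get_model_state(self):
        # Include every trained value plus a version tag for future migrations.
        return {
            "state_version": STATE_VERSION,
            "mean": self.mean_,
            "scale": self.scale_,
        }

    def _set_model_state(self, state):
        version = state.get("state_version", 1)  # untagged states are treated as v1
        if version > STATE_VERSION:
            raise ValueError(f"State version {version} is newer than {STATE_VERSION}")
        self.mean_ = state["mean"]
        # Version 1 states predate the scale artifact; use the neutral default.
        self.scale_ = state["scale"] if version >= 2 else 1.0


# Round trip through JSON, standing in for whatever persistence layer is used.
original = ScaledMeanModel()
original.mean_, original.scale_ = 3.5, 2.0
restored = ScaledMeanModel()
restored._set_model_state(json.loads(json.dumps(original._get_model_state())))
```

Tagging the state with a version number lets `_set_model_state()` fill in defaults for fields that older states lack, rather than failing on load.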
| 114 | + |
| 115 | +## Technical Standards |
| 116 | + |
| 117 | +### Python Environment |
| 118 | + |
| 119 | +- **Python Version**: 3.8+ (compatible with IRIS 2025.2 Python runtime) |
| 120 | +- **Dependencies**: Managed via pyproject.toml with uv or pip |
| 121 | +- **Code Style**: Black formatting (line length 88) |
| 122 | +- **Type Hints**: Recommended but not required (mypy for shared/ modules) |
| 123 | +- **Linting**: flake8 with E203/W503 exceptions |
| 124 | + |
| 125 | +### IRIS Integration |
| 126 | + |
| 127 | +- **IRIS Version**: 2025.2+ (required for JSON USING clause syntax) |
| 128 | +- **IntegratedML Installation**: Via `intersystems-iris-automl` from InterSystems registry |
| 129 | +- **Connection**: Environment variables (.env) for host/port/namespace/credentials |
| 130 | +- **Deployment**: Docker Compose for reproducible IRIS environment |
| 131 | +- **Symlink**: iris_automl symlink MUST be created in Python path |
| 132 | + |
| 133 | +### Model Architecture |
| 134 | + |
| 135 | +- **Base Classes**: Extend IntegratedMLBaseModel, ClassificationModel, RegressionModel, or EnsembleModel |
| 136 | +- **Project Structure**: demos/{domain}/models/ for domain-specific implementations |
| 137 | +- **Shared Utilities**: shared/database/ for IRIS connections, shared/utils/ for helpers |
| 138 | +- **Documentation**: Each demo MUST have README explaining model architecture |
| 139 | + |
| 140 | +## Development Workflow |
| 141 | + |
| 142 | +### Feature Development Process |
| 143 | + |
| 144 | +1. **Specification**: Create feature spec following spec-template.md |
| 145 | +2. **Planning**: Generate implementation plan using plan-template.md |
| 146 | +3. **Testing**: Write tests FIRST (TDD - fail, then implement) |
| 147 | +4. **Implementation**: Build model following architecture patterns |
| 148 | +5. **Integration**: Validate SQL integration with IRIS |
| 149 | +6. **Benchmarking**: Measure and document performance |
| 150 | +7. **Documentation**: Update demo README with results |
| 151 | + |
| 152 | +### Quality Gates |
| 153 | + |
| 154 | +- All tests MUST pass (`make test`) |
| 155 | +- Code MUST be formatted (`make format`) |
| 156 | +- Linting MUST pass (`make lint`) |
| 157 | +- E2E test MUST complete successfully |
| 158 | +- Performance benchmarks MUST meet latency requirements (<50ms p95) |
| 159 | +- Demo results MUST be documented in README tables |
| 160 | + |
| 161 | +### Code Review Requirements |
| 162 | + |
| 163 | +- Changes MUST include tests (unit + integration) |
| 164 | +- Performance impact MUST be documented for model changes |
| 165 | +- SQL syntax MUST use IRIS 2025.2 JSON USING clause format |
| 166 | +- Breaking changes MUST be called out explicitly |
| 167 | +- Commit messages MUST follow conventional commits format |
| 168 | + |
| 169 | +## Governance |
| 170 | + |
| 171 | +This constitution supersedes all other development practices and guides all |
| 172 | +technical decisions for IntegratedML Custom Models. |
| 173 | + |
| 174 | +### Amendment Process |
| 175 | + |
| 176 | +1. Propose amendment with clear rationale and scope of impact |
| 177 | +2. Document breaking changes to existing principles |
| 178 | +3. Update affected templates (plan, spec, tasks) for consistency |
| 179 | +4. Increment constitution version per semantic versioning: |
| 180 | + - MAJOR: Backward-incompatible principle removal/redefinition |
| 181 | + - MINOR: New principle added or materially expanded guidance |
| 182 | + - PATCH: Clarifications, wording fixes, non-semantic refinements |
| 183 | +5. Obtain approval from project maintainers |
| 184 | +6. Execute migration plan if existing code affected |
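
The versioning rule in step 4 can be sketched as a small helper. `bump_version` is a hypothetical function written for illustration; it is not part of the project tooling.

```python
def bump_version(version, change):
    """Return the next constitution version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":   # backward-incompatible principle removal/redefinition
        return f"{major + 1}.0.0"
    if change == "minor":   # new principle or materially expanded guidance
        return f"{major}.{minor + 1}.0"
    if change == "patch":   # clarifications and wording fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change}")
```

For example, adding a sixth principle to version 1.0.0 would be a MINOR change and yield 1.1.0.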
| 185 | + |
| 186 | +### Compliance Review |
| 187 | + |
| 188 | +- All PRs MUST verify alignment with principles |
| 189 | +- Constitution violations MUST be justified in plan.md Complexity Tracking table |
| 190 | +- Unjustified complexity additions WILL be rejected |
| 191 | +- Performance regressions that breach constitutional thresholds (e.g., prediction latency above 50ms p95) WILL be rejected |
| 192 | +- Changes that bypass the TDD process (tests not written before implementation) WILL be rejected |
| 193 | + |
| 194 | +### Runtime Guidance |
| 195 | + |
| 196 | +For AI development assistants: Consult CLAUDE.md for project-specific command |
| 197 | +references, common workflows, and architectural patterns. The constitution |
| 198 | +establishes WHAT to build; CLAUDE.md explains HOW to build it efficiently. |
| 199 | + |
| 200 | +**Version**: 1.0.0 | **Ratified**: 2025-10-10 | **Last Amended**: 2025-10-10 |