Commit 9bf71fe

Merge pull request #4 from intersystems-community/001-use-the-current
fix: resolve DNA classifier config initialization bug
2 parents 1d368bc + 4050709 commit 9bf71fe

20 files changed: +5105 -0 lines changed

.specify/memory/constitution.md

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
# IntegratedML Custom Models Constitution

<!--
============================================================================
SYNC IMPACT REPORT - Constitution v1.0.0
============================================================================

Version Change: INITIAL → 1.0.0

Rationale: Initial constitution creation for IntegratedML Custom Models
project. This establishes the foundational governance and principles for
developing custom ML models that execute within InterSystems IRIS SQL.

Principles Defined:
- I. In-Database ML (NEW)
- II. Scikit-learn Compatibility (NEW)
- III. Test-Driven Development (NEW)
- IV. Low-Latency Performance (NEW)
- V. Model State Management (NEW)

Templates Requiring Updates:
✅ plan-template.md - Constitution Check section references this file
✅ spec-template.md - Requirements align with model development principles
✅ tasks-template.md - Task categorization supports testing and performance gates

Follow-up TODOs: None - all critical fields populated
============================================================================
-->

## Core Principles

### I. In-Database ML

All machine learning models MUST execute within InterSystems IRIS SQL without
requiring data export or external processing pipelines.

**Rationale**: The core value proposition of IntegratedML Custom Models is
eliminating data movement. Models process data where it lives, enabling
real-time predictions via `PREDICT()` directly in SQL queries. This principle
is non-negotiable and defines the project's purpose.

**Requirements**:
- Models MUST be deployable via SQL `CREATE MODEL` statements
- Training MUST execute in-database using `FROM table` syntax
- Predictions MUST return directly to SQL result sets
- NO data export to external systems for model execution
- Parameters MUST be passed via JSON `USING` clause (IRIS 2025.2 syntax)

### II. Scikit-learn Compatibility

All custom models MUST implement the scikit-learn estimator interface
(`fit`, `predict`, `get_params`, `set_params`).

**Rationale**: IntegratedML requires scikit-learn compatible models to ensure
proper integration with the IRIS ML engine. This interface provides consistent
model lifecycle management, parameter handling, and serialization.

**Requirements**:
- Models MUST inherit from `IntegratedMLBaseModel` (shared/models/base.py)
- `fit(X, y, **params)` method MUST accept training data and IntegratedML params
- `predict(X)` method MUST return predictions matching IntegratedML expectations
- Models MUST support parameter serialization via `get_params()`/`set_params()`
- Custom preprocessing MUST occur within model methods, not externally
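
The interface requirements above can be illustrated with a minimal sketch. It does not inherit from `IntegratedMLBaseModel`, because that class's API is not shown in this document; the class name `MajorityClassModel` and its `fallback` parameter are hypothetical, and forwarding `**params` to `set_params()` is one assumed way of handling IntegratedML parameters.

```python
from collections import Counter


class MajorityClassModel:
    """Toy estimator exposing fit/predict/get_params/set_params.

    Hypothetical example: a real model would inherit from the project's
    IntegratedMLBaseModel, whose API is not reproduced here.
    """

    def __init__(self, fallback=None):
        # Constructor arguments are the model's hyperparameters.
        self.fallback = fallback
        self.majority_ = None  # learned in fit()

    def fit(self, X, y, **params):
        # Assumption: extra keyword params update hyperparameters before training.
        self.set_params(**params)
        labels = list(y)
        if labels:
            self.majority_ = Counter(labels).most_common(1)[0][0]
        else:
            self.majority_ = self.fallback
        return self

    def predict(self, X):
        if self.majority_ is None:
            raise RuntimeError("fit() must be called before predict()")
        # One prediction per input row.
        return [self.majority_ for _ in X]

    def get_params(self, deep=True):
        return {"fallback": self.fallback}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        return self
```

Keeping every hyperparameter as a constructor argument is what lets `get_params()` and `set_params()` round-trip a model's configuration.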

### III. Test-Driven Development (NON-NEGOTIABLE)

All features and models MUST have tests covering functional correctness,
integration with IRIS, and performance benchmarks.

**Rationale**: Given the critical nature of ML predictions in production
workflows (credit risk, fraud detection, etc.), comprehensive testing is
mandatory. Performance benchmarks ensure latency requirements are met.

**Requirements**:
- Unit tests MUST verify model logic (fit/predict behavior)
- Integration tests MUST verify SQL integration (CREATE MODEL, PREDICT queries)
- Performance benchmarks MUST measure training time and prediction latency
- Test data generators MUST produce realistic volumes (>1000 records minimum)
- Tests MUST be runnable via `make test` or `pytest demos/*/tests/`
- E2E test (`tests/test_all_demos_e2e.py`) MUST pass before releases
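
As a rough sketch of the data-generator requirement above, the following shows a seeded generator and a pytest-style check on its volume. The function names, the two features, and the labelling rule are all hypothetical and are not taken from the project's demos.

```python
import random


def generate_records(n=1500, seed=42):
    """Return n synthetic (features, label) records; deterministic per seed."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        amount = round(rng.uniform(1.0, 5000.0), 2)
        hour = rng.randrange(24)
        # Illustrative labelling rule only, not a real fraud signal.
        label = 1 if amount > 4000 and hour < 6 else 0
        records.append(((amount, hour), label))
    return records


def test_generator_meets_volume_requirement():
    records = generate_records()
    # The constitution asks for more than 1000 records.
    assert len(records) > 1000
    # Same seed must reproduce the same data so test failures are repeatable.
    assert generate_records(seed=7) == generate_records(seed=7)
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps the data reproducible even when other tests also draw random numbers.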

### IV. Low-Latency Performance

Model predictions MUST complete within 50ms (p95) to support real-time
applications.

**Rationale**: IntegratedML Custom Models target production use cases like
fraud detection and credit risk assessment that require immediate responses.
Sub-50ms latency ensures models can be used in interactive applications and
high-throughput batch processing.

**Requirements**:
- Prediction latency MUST be <50ms p95 for single-record predictions
- Training time SHOULD be documented in demo test results
- Performance benchmarks MUST be included in integration tests
- Models MUST avoid unnecessary computation in predict path
- Feature engineering MUST be optimized for repeated prediction calls
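
One way to check the p95 requirement above is sketched below. It uses the nearest-rank percentile definition, which is an assumption: the constitution does not say which percentile method the project's benchmarks use. The helper names and the `predict_one` callable are hypothetical.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latency_ms(predict_one, record, runs=200):
    """Time repeated single-record predictions; return latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_one(record)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_latency_budget(latencies_ms, budget_ms=50.0):
    """True when the p95 latency is strictly below the budget."""
    return percentile(latencies_ms, 95) < budget_ms
```

A benchmark could then assert `meets_latency_budget(measure_latency_ms(model_predict, sample_record))`, where `model_predict` wraps whatever issues the single-record prediction.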

### V. Model State Management

Models MUST implement proper serialization to persist across database sessions
and deployments.

**Rationale**: IRIS stores trained models for reuse across queries and server
restarts. Proper state management ensures models remain available and
consistent without retraining.

**Requirements**:
- Models MUST implement `_get_model_state()` for serialization
- Models MUST implement `_set_model_state()` for deserialization
- Model state MUST include all trained parameters and preprocessing artifacts
- Serialization MUST be compatible with IRIS persistence mechanisms
- Models MUST handle version compatibility for state loaded from older versions
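
The state requirements above might look like the following sketch. Only the method names `_get_model_state()` and `_set_model_state()` come from the constitution; their signatures, the `state_version` field, the JSON encoding, and the `ScaledThresholdModel` class are assumptions made for illustration.

```python
import json

STATE_VERSION = 2  # hypothetical current schema version


class ScaledThresholdModel:
    """Hypothetical model whose state is a threshold plus a scaling artifact."""

    def __init__(self):
        self.threshold = 0.5
        self.scale = 1.0  # preprocessing artifact learned during training

    def _get_model_state(self):
        # Everything needed to predict without retraining, plus a version tag.
        return {
            "state_version": STATE_VERSION,
            "threshold": self.threshold,
            "scale": self.scale,
        }

    def _set_model_state(self, state):
        version = state.get("state_version", 1)
        if version > STATE_VERSION:
            raise ValueError(f"unsupported state version: {version}")
        if version == 1:
            # Assumed legacy layout: version 1 stored no scaling factor.
            state = {**state, "scale": 1.0}
        self.threshold = state["threshold"]
        self.scale = state["scale"]


def dumps_state(model):
    """Serialize a model's state to a JSON string."""
    return json.dumps(model._get_model_state())


def loads_state(payload):
    """Rebuild a model from a JSON string produced by dumps_state()."""
    model = ScaledThresholdModel()
    model._set_model_state(json.loads(payload))
    return model
```

Tagging the state with a version lets `_set_model_state()` upgrade older payloads in place and refuse payloads written by a newer release.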

## Technical Standards

### Python Environment

- **Python Version**: 3.8+ (compatible with IRIS 2025.2 Python runtime)
- **Dependencies**: Managed via pyproject.toml with uv or pip
- **Code Style**: Black formatting (line length 88)
- **Type Hints**: Recommended but not required (mypy for shared/ modules)
- **Linting**: flake8 with E203/W503 exceptions

### IRIS Integration

- **IRIS Version**: 2025.2+ (required for JSON USING clause syntax)
- **IntegratedML Installation**: Via `intersystems-iris-automl` from InterSystems registry
- **Connection**: Environment variables (.env) for host/port/namespace/credentials
- **Deployment**: Docker Compose for reproducible IRIS environment
- **Symlink**: iris_automl symlink MUST be created in Python path

### Model Architecture

- **Base Classes**: Extend IntegratedMLBaseModel, ClassificationModel, RegressionModel, or EnsembleModel
- **Project Structure**: demos/{domain}/models/ for domain-specific implementations
- **Shared Utilities**: shared/database/ for IRIS connections, shared/utils/ for helpers
- **Documentation**: Each demo MUST have README explaining model architecture

## Development Workflow

### Feature Development Process

1. **Specification**: Create feature spec following spec-template.md
2. **Planning**: Generate implementation plan using plan-template.md
3. **Testing**: Write tests FIRST (TDD - fail, then implement)
4. **Implementation**: Build model following architecture patterns
5. **Integration**: Validate SQL integration with IRIS
6. **Benchmarking**: Measure and document performance
7. **Documentation**: Update demo README with results

### Quality Gates

- All tests MUST pass (`make test`)
- Code MUST be formatted (`make format`)
- Linting MUST pass (`make lint`)
- E2E test MUST complete successfully
- Performance benchmarks MUST meet latency requirements (<50ms p95)
- Demo results MUST be documented in README tables

### Code Review Requirements

- Changes MUST include tests (unit + integration)
- Performance impact MUST be documented for model changes
- SQL syntax MUST use IRIS 2025.2 JSON USING clause format
- Breaking changes MUST be called out explicitly
- Commit messages MUST follow conventional commits format
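
The commit-message rule above could be checked mechanically along these lines. This is a sketch only: the list of allowed types is the commonly used Angular-style set and is an assumption, since the constitution does not enumerate which types the project accepts, and `is_conventional` is a hypothetical helper.

```python
import re

# Assumed type list (the common Angular-style set); the project may allow others.
_TYPES = "feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert"
# Header shape: type, optional (scope), optional "!", then ": " and a description.
_HEADER = re.compile(rf"^(?:{_TYPES})(?:\([^()\s]+\))?!?: \S.*$")


def is_conventional(message):
    """Check only the first line (header) of a commit message."""
    lines = message.splitlines()
    return bool(lines) and _HEADER.match(lines[0]) is not None
```

Under this check, this commit's own title, `fix: resolve DNA classifier config initialization bug`, passes, while the auto-generated merge subject line does not.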

## Governance

This constitution supersedes all other development practices and guides all
technical decisions for IntegratedML Custom Models.

### Amendment Process

1. Propose amendment with clear rationale and scope of impact
2. Document breaking changes to existing principles
3. Update affected templates (plan, spec, tasks) for consistency
4. Increment constitution version per semantic versioning:
   - MAJOR: Backward-incompatible principle removal/redefinition
   - MINOR: New principle added or materially expanded guidance
   - PATCH: Clarifications, wording fixes, non-semantic refinements
5. Obtain approval from project maintainers
6. Execute migration plan if existing code affected
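
The versioning rule in step 4 follows standard semantic-versioning arithmetic, sketched here with a hypothetical `bump_version` helper that is not part of the project.

```python
def bump_version(version, change):
    """Return the next MAJOR.MINOR.PATCH version for the given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Backward-incompatible principle removal or redefinition.
        return f"{major + 1}.0.0"
    if change == "minor":
        # New principle or materially expanded guidance.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Clarifications and wording fixes.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

For example, adding a sixth principle to version 1.0.0 would yield `bump_version("1.0.0", "minor")`, which is `"1.1.0"`.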

### Compliance Review

- All PRs MUST verify alignment with principles
- Constitution violations MUST be justified in plan.md Complexity Tracking table
- Unjustified complexity additions WILL be rejected
- Performance regressions that breach constitutional thresholds WILL be rejected
- Changes that bypass the TDD process WILL be rejected

### Runtime Guidance

For AI development assistants: Consult CLAUDE.md for project-specific command
references, common workflows, and architectural patterns. The constitution
establishes WHAT to build; CLAUDE.md explains HOW to build it efficiently.

**Version**: 1.0.0 | **Ratified**: 2025-10-10 | **Last Amended**: 2025-10-10
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
#!/usr/bin/env bash

# Consolidated prerequisite checking script
#
# This script provides unified prerequisite checking for Spec-Driven Development workflow.
# It replaces the functionality previously spread across multiple scripts.
#
# Usage: ./check-prerequisites.sh [OPTIONS]
#
# OPTIONS:
#   --json              Output in JSON format
#   --require-tasks     Require tasks.md to exist (for implementation phase)
#   --include-tasks     Include tasks.md in AVAILABLE_DOCS list
#   --paths-only        Only output path variables (no validation)
#   --help, -h          Show help message
#
# OUTPUTS:
#   JSON mode: {"FEATURE_DIR":"...", "AVAILABLE_DOCS":["..."]}
#   Text mode: FEATURE_DIR:... \n AVAILABLE_DOCS: \n ✓/✗ file.md
#   Paths only: REPO_ROOT: ... \n BRANCH: ... \n FEATURE_DIR: ... etc.

set -e

# Parse command line arguments
JSON_MODE=false
REQUIRE_TASKS=false
INCLUDE_TASKS=false
PATHS_ONLY=false

for arg in "$@"; do
    case "$arg" in
        --json)
            JSON_MODE=true
            ;;
        --require-tasks)
            REQUIRE_TASKS=true
            ;;
        --include-tasks)
            INCLUDE_TASKS=true
            ;;
        --paths-only)
            PATHS_ONLY=true
            ;;
        --help|-h)
            cat << 'EOF'
Usage: check-prerequisites.sh [OPTIONS]

Consolidated prerequisite checking for Spec-Driven Development workflow.

OPTIONS:
  --json              Output in JSON format
  --require-tasks     Require tasks.md to exist (for implementation phase)
  --include-tasks     Include tasks.md in AVAILABLE_DOCS list
  --paths-only        Only output path variables (no prerequisite validation)
  --help, -h          Show this help message

EXAMPLES:
  # Check task prerequisites (plan.md required)
  ./check-prerequisites.sh --json

  # Check implementation prerequisites (plan.md + tasks.md required)
  ./check-prerequisites.sh --json --require-tasks --include-tasks

  # Get feature paths only (no validation)
  ./check-prerequisites.sh --paths-only

EOF
            exit 0
            ;;
        *)
            echo "ERROR: Unknown option '$arg'. Use --help for usage information." >&2
            exit 1
            ;;
    esac
done

# Source common functions
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/common.sh"

# Get feature paths and validate branch
# (quote the command substitution so paths are not word-split or glob-expanded)
eval "$(get_feature_paths)"
check_feature_branch "$CURRENT_BRANCH" "$HAS_GIT" || exit 1

# If paths-only mode, output paths and exit (support JSON + paths-only combined)
if $PATHS_ONLY; then
    if $JSON_MODE; then
        # Minimal JSON paths payload (no validation performed)
        printf '{"REPO_ROOT":"%s","BRANCH":"%s","FEATURE_DIR":"%s","FEATURE_SPEC":"%s","IMPL_PLAN":"%s","TASKS":"%s"}\n' \
            "$REPO_ROOT" "$CURRENT_BRANCH" "$FEATURE_DIR" "$FEATURE_SPEC" "$IMPL_PLAN" "$TASKS"
    else
        echo "REPO_ROOT: $REPO_ROOT"
        echo "BRANCH: $CURRENT_BRANCH"
        echo "FEATURE_DIR: $FEATURE_DIR"
        echo "FEATURE_SPEC: $FEATURE_SPEC"
        echo "IMPL_PLAN: $IMPL_PLAN"
        echo "TASKS: $TASKS"
    fi
    exit 0
fi

# Validate required directories and files
if [[ ! -d "$FEATURE_DIR" ]]; then
    echo "ERROR: Feature directory not found: $FEATURE_DIR" >&2
    echo "Run /speckit.specify first to create the feature structure." >&2
    exit 1
fi

if [[ ! -f "$IMPL_PLAN" ]]; then
    echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
    echo "Run /speckit.plan first to create the implementation plan." >&2
    exit 1
fi

# Check for tasks.md if required
if $REQUIRE_TASKS && [[ ! -f "$TASKS" ]]; then
    echo "ERROR: tasks.md not found in $FEATURE_DIR" >&2
    echo "Run /speckit.tasks first to create the task list." >&2
    exit 1
fi

# Build list of available documents
docs=()

# Always check these optional docs
[[ -f "$RESEARCH" ]] && docs+=("research.md")
[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")

# Check contracts directory (only if it exists and has files)
if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
    docs+=("contracts/")
fi

[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")

# Include tasks.md if requested and it exists
if $INCLUDE_TASKS && [[ -f "$TASKS" ]]; then
    docs+=("tasks.md")
fi

# Output results
if $JSON_MODE; then
    # Build JSON array of documents
    if [[ ${#docs[@]} -eq 0 ]]; then
        json_docs="[]"
    else
        json_docs=$(printf '"%s",' "${docs[@]}")
        json_docs="[${json_docs%,}]"
    fi

    printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s}\n' "$FEATURE_DIR" "$json_docs"
else
    # Text output
    echo "FEATURE_DIR:$FEATURE_DIR"
    echo "AVAILABLE_DOCS:"

    # Show status of each potential document
    check_file "$RESEARCH" "research.md"
    check_dir "$CONTRACTS_DIR" "contracts/"
    check_file "$DATA_MODEL" "data-model.md"
    check_file "$QUICKSTART" "quickstart.md"

    if $INCLUDE_TASKS; then
        check_file "$TASKS" "tasks.md"
    fi
fi
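
For tooling that consumes the script's `--json` output, a small Python sketch of parsing the documented payload shape follows. The helper names and the sample path are hypothetical. Note that the script assembles JSON with `printf` and does not escape quotes or backslashes, so a path containing those characters would presumably produce invalid JSON; `build_payload` shows the escaped equivalent using the standard `json` module.

```python
import json


def parse_prerequisites(output):
    """Parse the script's --json output into (feature_dir, available_docs)."""
    payload = json.loads(output)
    return payload["FEATURE_DIR"], list(payload["AVAILABLE_DOCS"])


def build_payload(feature_dir, docs):
    """Build the same JSON shape with proper string escaping."""
    return json.dumps(
        {"FEATURE_DIR": feature_dir, "AVAILABLE_DOCS": docs},
        separators=(",", ":"),
    )


# Hypothetical sample in the shape documented in the script header.
sample = '{"FEATURE_DIR":"/repo/specs/001-example","AVAILABLE_DOCS":["research.md","contracts/"]}'
feature_dir, docs = parse_prerequisites(sample)
```

Here `feature_dir` holds the sample path and `docs` holds the two listed entries; a caller would normally obtain `output` from the script's standard output rather than a literal string.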
