Commit 4050709

tom-dyar and claude committed
fix: resolve DNA classifier config initialization bug
Fixed AttributeError in DNASequenceClassifier where self.config was accessed before being initialized. Added config extraction from kwargs on line 55 to support both direct config passing and parameter-based initialization.

Location: demos/dna_similarity/models/dna_classifier.py:54-55

This commit also adds comprehensive validation artifacts:
- Project constitution (.specify/memory/constitution.md)
- Complete specification (specs/001-use-the-current/spec.md)
- Implementation plan with 66 validated tasks
- SQL interface contracts (7 contracts documented)
- Data model with 8 entities
- Quickstart guide (<5min deployment)
- Research decisions (10 architectural choices)

All 66 validation tasks completed:
- 4 user stories implemented
- 15 functional requirements validated
- 5 constitutional principles conformant
- 13 test files covering all demos
- Performance benchmarks: <50ms latency documented

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 1d368bc commit 4050709

20 files changed: +5105 -0 lines changed


.specify/memory/constitution.md

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
# IntegratedML Custom Models Constitution

<!--
============================================================================
SYNC IMPACT REPORT - Constitution v1.0.0
============================================================================

Version Change: INITIAL → 1.0.0

Rationale: Initial constitution creation for IntegratedML Custom Models
project. This establishes the foundational governance and principles for
developing custom ML models that execute within InterSystems IRIS SQL.

Principles Defined:
- I. In-Database ML (NEW)
- II. Scikit-learn Compatibility (NEW)
- III. Test-Driven Development (NEW)
- IV. Low-Latency Performance (NEW)
- V. Model State Management (NEW)

Templates Requiring Updates:
✅ plan-template.md - Constitution Check section references this file
✅ spec-template.md - Requirements align with model development principles
✅ tasks-template.md - Task categorization supports testing and performance gates

Follow-up TODOs: None - all critical fields populated
============================================================================
-->

## Core Principles

### I. In-Database ML

All machine learning models MUST execute within InterSystems IRIS SQL without
requiring data export or external processing pipelines.

**Rationale**: The core value proposition of IntegratedML Custom Models is
eliminating data movement. Models process data where it lives, enabling
real-time predictions via `PREDICT()` directly in SQL queries. This principle
is non-negotiable and defines the project's purpose.

**Requirements**:
- Models MUST be deployable via SQL `CREATE MODEL` statements
- Training MUST execute in-database using `FROM table` syntax
- Predictions MUST return directly to SQL result sets
- NO data export to external systems for model execution
- Parameters MUST be passed via JSON `USING` clause (IRIS 2025.2 syntax)

### II. Scikit-learn Compatibility

All custom models MUST implement the scikit-learn estimator interface
(`fit`, `predict`, `get_params`, `set_params`).

**Rationale**: IntegratedML requires scikit-learn compatible models to ensure
proper integration with the IRIS ML engine. This interface provides consistent
model lifecycle management, parameter handling, and serialization.

**Requirements**:
- Models MUST inherit from `IntegratedMLBaseModel` (shared/models/base.py)
- `fit(X, y, **params)` method MUST accept training data and IntegratedML params
- `predict(X)` method MUST return predictions matching IntegratedML expectations
- Models MUST support parameter serialization via `get_params()`/`set_params()`
- Custom preprocessing MUST occur within model methods, not externally
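
The four-method interface named above can be illustrated with a toy estimator. This is a minimal sketch, not project code: `MajorityClassModel` and its `fallback_label` parameter are hypothetical, and the class deliberately does not inherit from `IntegratedMLBaseModel`, whose API is not shown in this commit.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence


class MajorityClassModel:
    """Toy estimator (hypothetical): always predicts the most common training label."""

    def __init__(self, fallback_label: Any = None):
        self.fallback_label = fallback_label  # hyperparameter, exposed via get_params()
        self.majority_ = None  # learned in fit(); trailing underscore follows sklearn style

    def fit(self, X: Sequence[Any], y: Sequence[Any], **params: Any) -> "MajorityClassModel":
        # Extra keyword params are accepted and ignored in this sketch.
        if len(y):
            self.majority_ = Counter(y).most_common(1)[0][0]
        else:
            self.majority_ = self.fallback_label
        return self

    def predict(self, X: Sequence[Any]) -> List[Any]:
        # One prediction per input row.
        return [self.majority_ for _ in X]

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        return {"fallback_label": self.fallback_label}

    def set_params(self, **params: Any) -> "MajorityClassModel":
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self
```

Returning `self` from `fit` and `set_params` mirrors the scikit-learn convention and allows chained calls such as `MajorityClassModel().fit(X, y).predict(X)`.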

### III. Test-Driven Development (NON-NEGOTIABLE)

All features and models MUST have tests covering functional correctness,
integration with IRIS, and performance benchmarks.

**Rationale**: Given the critical nature of ML predictions in production
workflows (credit risk, fraud detection, etc.), comprehensive testing is
mandatory. Performance benchmarks ensure latency requirements are met.

**Requirements**:
- Unit tests MUST verify model logic (fit/predict behavior)
- Integration tests MUST verify SQL integration (CREATE MODEL, PREDICT queries)
- Performance benchmarks MUST measure training time and prediction latency
- Test data generators MUST produce realistic volumes (>1000 records minimum)
- Tests MUST be runnable via `make test` or `pytest demos/*/tests/`
- E2E test (`tests/test_all_demos_e2e.py`) MUST pass before releases
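
The volume requirement for test data generators might be enforced along these lines. This is an illustrative sketch only: `generate_records`, the field names `amount` and `label`, and the 5% positive rate are assumptions, not taken from the project's generators.

```python
import random
from typing import Any, Dict, List


def generate_records(n: int = 1500, seed: int = 0) -> List[Dict[str, Any]]:
    """Return n synthetic records (hypothetical schema), reproducible for a given seed."""
    if n <= 1000:
        # Mirrors the ">1000 records minimum" requirement.
        raise ValueError("test data volume must exceed 1000 records")
    rng = random.Random(seed)  # local RNG keeps tests deterministic
    records = []
    for _ in range(n):
        amount = round(rng.uniform(1.0, 5000.0), 2)
        label = int(rng.random() < 0.05)  # assumed 5% positive rate
        records.append({"amount": amount, "label": label})
    return records
```

Seeding a local `random.Random` instance rather than the global generator keeps repeated test runs comparable.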

### IV. Low-Latency Performance

Model predictions MUST complete within 50ms (p95) to support real-time
applications.

**Rationale**: IntegratedML Custom Models target production use cases like
fraud detection and credit risk assessment that require immediate responses.
Sub-50ms latency ensures models can be used in interactive applications and
high-throughput batch processing.

**Requirements**:
- Prediction latency MUST be <50ms p95 for single-record predictions
- Training time SHOULD be documented in demo test results
- Performance benchmarks MUST be included in integration tests
- Models MUST avoid unnecessary computation in predict path
- Feature engineering MUST be optimized for repeated prediction calls
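
One way to check the p95 threshold is to time repeated single-record calls and take a nearest-rank percentile. The sketch below is an assumption about how such a benchmark could look; `p95` and `measure_latency_ms` are hypothetical helpers, and the project's actual benchmarks may measure latency differently (for example, end to end through SQL).

```python
import time
from typing import Callable, List


def p95(samples_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, giving a 1-based rank without float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure_latency_ms(predict_one: Callable[[], object], runs: int = 200) -> List[float]:
    """Time `runs` calls of a zero-argument prediction callable, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_one()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples
```

A benchmark test could then assert `p95(measure_latency_ms(lambda: model.predict([row]))) < 50.0`, where `model` and `row` stand for whatever the demo under test provides.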

### V. Model State Management

Models MUST implement proper serialization to persist across database sessions
and deployments.

**Rationale**: IRIS stores trained models for reuse across queries and server
restarts. Proper state management ensures models remain available and
consistent without retraining.

**Requirements**:
- Models MUST implement `_get_model_state()` for serialization
- Models MUST implement `_set_model_state()` for deserialization
- Model state MUST include all trained parameters and preprocessing artifacts
- Serialization MUST be compatible with IRIS persistence mechanisms
- Models MUST handle version compatibility for state loaded from older versions
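
A versioned state dictionary is one possible shape for the two hooks named above. The sketch below assumes signatures the constitution does not specify: `_get_model_state()` returning a dict and `_set_model_state(state)` accepting one. The `StatefulModel` class, the `STATE_VERSION` constant, the `scaler_mean` field, and the use of JSON as the storage format are all hypothetical.

```python
import json
from typing import Any, Dict, List

STATE_VERSION = 2  # hypothetical current schema version


class StatefulModel:
    """Toy model (hypothetical) showing versioned state round-tripping."""

    def __init__(self) -> None:
        self.weights: List[float] = []  # trained parameters
        self.scaler_mean: float = 0.0   # preprocessing artifact

    def _get_model_state(self) -> Dict[str, Any]:
        # Include everything needed to predict without retraining.
        return {
            "version": STATE_VERSION,
            "weights": list(self.weights),
            "scaler_mean": self.scaler_mean,
        }

    def _set_model_state(self, state: Dict[str, Any]) -> None:
        version = state.get("version", 1)  # states without a version are treated as v1
        if version > STATE_VERSION:
            raise ValueError(f"state version {version} is newer than {STATE_VERSION}")
        self.weights = list(state["weights"])
        # Assumed history: v1 states predate scaler_mean, so fall back to a default.
        self.scaler_mean = state.get("scaler_mean", 0.0)


def round_trip(model: StatefulModel) -> StatefulModel:
    """Serialize to JSON text and restore into a fresh instance."""
    payload = json.dumps(model._get_model_state())
    restored = StatefulModel()
    restored._set_model_state(json.loads(payload))
    return restored
```

Storing an explicit version number in the state lets `_set_model_state` fill in defaults for fields that older states lack, and reject states written by a newer schema.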

## Technical Standards

### Python Environment

- **Python Version**: 3.8+ (compatible with IRIS 2025.2 Python runtime)
- **Dependencies**: Managed via pyproject.toml with uv or pip
- **Code Style**: Black formatting (line length 88)
- **Type Hints**: Recommended but not required (mypy for shared/ modules)
- **Linting**: flake8 with E203/W503 exceptions

### IRIS Integration

- **IRIS Version**: 2025.2+ (required for JSON USING clause syntax)
- **IntegratedML Installation**: Via `intersystems-iris-automl` from InterSystems registry
- **Connection**: Environment variables (.env) for host/port/namespace/credentials
- **Deployment**: Docker Compose for reproducible IRIS environment
- **Symlink**: iris_automl symlink MUST be created in Python path

### Model Architecture

- **Base Classes**: Extend IntegratedMLBaseModel, ClassificationModel, RegressionModel, or EnsembleModel
- **Project Structure**: demos/{domain}/models/ for domain-specific implementations
- **Shared Utilities**: shared/database/ for IRIS connections, shared/utils/ for helpers
- **Documentation**: Each demo MUST have README explaining model architecture
140+
## Development Workflow
141+
142+
### Feature Development Process
143+
144+
1. **Specification**: Create feature spec following spec-template.md
145+
2. **Planning**: Generate implementation plan using plan-template.md
146+
3. **Testing**: Write tests FIRST (TDD - fail, then implement)
147+
4. **Implementation**: Build model following architecture patterns
148+
5. **Integration**: Validate SQL integration with IRIS
149+
6. **Benchmarking**: Measure and document performance
150+
7. **Documentation**: Update demo README with results
151+
152+
### Quality Gates
153+
154+
- All tests MUST pass (`make test`)
155+
- Code MUST be formatted (`make format`)
156+
- Linting MUST pass (`make lint`)
157+
- E2E test MUST complete successfully
158+
- Performance benchmarks MUST meet latency requirements (<50ms p95)
159+
- Demo results MUST be documented in README tables
160+
161+
### Code Review Requirements
162+
163+
- Changes MUST include tests (unit + integration)
164+
- Performance impact MUST be documented for model changes
165+
- SQL syntax MUST use IRIS 2025.2 JSON USING clause format
166+
- Breaking changes MUST be called out explicitly
167+
- Commit messages MUST follow conventional commits format

## Governance

This constitution supersedes all other development practices and guides all
technical decisions for IntegratedML Custom Models.

### Amendment Process

1. Propose amendment with clear rationale and scope of impact
2. Document breaking changes to existing principles
3. Update affected templates (plan, spec, tasks) for consistency
4. Increment constitution version per semantic versioning:
   - MAJOR: Backward-incompatible principle removal/redefinition
   - MINOR: New principle added or materially expanded guidance
   - PATCH: Clarifications, wording fixes, non-semantic refinements
5. Obtain approval from project maintainers
6. Execute migration plan if existing code affected
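
The version increment in step 4 follows the usual semantic versioning arithmetic, sketched below. `bump_version` is a hypothetical helper for illustration; the constitution does not prescribe any tooling for this step.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for the given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "MAJOR":
        # Backward-incompatible principle removal or redefinition.
        return f"{major + 1}.0.0"
    if change == "MINOR":
        # New principle or materially expanded guidance.
        return f"{major}.{minor + 1}.0"
    if change == "PATCH":
        # Clarifications and wording fixes.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

For example, adding a sixth principle to version 1.0.0 would be a MINOR change and yield 1.1.0, while redefining an existing principle would yield 2.0.0.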

### Compliance Review

- All PRs MUST verify alignment with principles
- Constitution violations MUST be justified in plan.md Complexity Tracking table
- Unjustified complexity additions WILL be rejected
- Performance regressions below constitutional thresholds WILL be rejected
- Tests bypassing TDD process WILL be rejected

### Runtime Guidance

For AI development assistants: Consult CLAUDE.md for project-specific command
references, common workflows, and architectural patterns. The constitution
establishes WHAT to build; CLAUDE.md explains HOW to build it efficiently.

**Version**: 1.0.0 | **Ratified**: 2025-10-10 | **Last Amended**: 2025-10-10
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
#!/usr/bin/env bash

# Consolidated prerequisite checking script
#
# This script provides unified prerequisite checking for Spec-Driven Development workflow.
# It replaces the functionality previously spread across multiple scripts.
#
# Usage: ./check-prerequisites.sh [OPTIONS]
#
# OPTIONS:
#   --json              Output in JSON format
#   --require-tasks     Require tasks.md to exist (for implementation phase)
#   --include-tasks     Include tasks.md in AVAILABLE_DOCS list
#   --paths-only        Only output path variables (no validation)
#   --help, -h          Show help message
#
# OUTPUTS:
#   JSON mode: {"FEATURE_DIR":"...", "AVAILABLE_DOCS":["..."]}
#   Text mode: FEATURE_DIR:... \n AVAILABLE_DOCS: \n ✓/✗ file.md
#   Paths only: REPO_ROOT: ... \n BRANCH: ... \n FEATURE_DIR: ... etc.

set -e

# Parse command line arguments
JSON_MODE=false
REQUIRE_TASKS=false
INCLUDE_TASKS=false
PATHS_ONLY=false

for arg in "$@"; do
    case "$arg" in
        --json)
            JSON_MODE=true
            ;;
        --require-tasks)
            REQUIRE_TASKS=true
            ;;
        --include-tasks)
            INCLUDE_TASKS=true
            ;;
        --paths-only)
            PATHS_ONLY=true
            ;;
        --help|-h)
            cat << 'EOF'
Usage: check-prerequisites.sh [OPTIONS]

Consolidated prerequisite checking for Spec-Driven Development workflow.

OPTIONS:
  --json              Output in JSON format
  --require-tasks     Require tasks.md to exist (for implementation phase)
  --include-tasks     Include tasks.md in AVAILABLE_DOCS list
  --paths-only        Only output path variables (no prerequisite validation)
  --help, -h          Show this help message

EXAMPLES:
  # Check task prerequisites (plan.md required)
  ./check-prerequisites.sh --json

  # Check implementation prerequisites (plan.md + tasks.md required)
  ./check-prerequisites.sh --json --require-tasks --include-tasks

  # Get feature paths only (no validation)
  ./check-prerequisites.sh --paths-only

EOF
            exit 0
            ;;
        *)
            echo "ERROR: Unknown option '$arg'. Use --help for usage information." >&2
            exit 1
            ;;
    esac
done

# Source common functions
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/common.sh"

# Get feature paths and validate branch
# (the command substitution is quoted so eval receives the output without word splitting)
eval "$(get_feature_paths)"
check_feature_branch "$CURRENT_BRANCH" "$HAS_GIT" || exit 1

# If paths-only mode, output paths and exit (support JSON + paths-only combined)
if $PATHS_ONLY; then
    if $JSON_MODE; then
        # Minimal JSON paths payload (no validation performed)
        printf '{"REPO_ROOT":"%s","BRANCH":"%s","FEATURE_DIR":"%s","FEATURE_SPEC":"%s","IMPL_PLAN":"%s","TASKS":"%s"}\n' \
            "$REPO_ROOT" "$CURRENT_BRANCH" "$FEATURE_DIR" "$FEATURE_SPEC" "$IMPL_PLAN" "$TASKS"
    else
        echo "REPO_ROOT: $REPO_ROOT"
        echo "BRANCH: $CURRENT_BRANCH"
        echo "FEATURE_DIR: $FEATURE_DIR"
        echo "FEATURE_SPEC: $FEATURE_SPEC"
        echo "IMPL_PLAN: $IMPL_PLAN"
        echo "TASKS: $TASKS"
    fi
    exit 0
fi

# Validate required directories and files
if [[ ! -d "$FEATURE_DIR" ]]; then
    echo "ERROR: Feature directory not found: $FEATURE_DIR" >&2
    echo "Run /speckit.specify first to create the feature structure." >&2
    exit 1
fi

if [[ ! -f "$IMPL_PLAN" ]]; then
    echo "ERROR: plan.md not found in $FEATURE_DIR" >&2
    echo "Run /speckit.plan first to create the implementation plan." >&2
    exit 1
fi

# Check for tasks.md if required
if $REQUIRE_TASKS && [[ ! -f "$TASKS" ]]; then
    echo "ERROR: tasks.md not found in $FEATURE_DIR" >&2
    echo "Run /speckit.tasks first to create the task list." >&2
    exit 1
fi

# Build list of available documents
docs=()

# Always check these optional docs
[[ -f "$RESEARCH" ]] && docs+=("research.md")
[[ -f "$DATA_MODEL" ]] && docs+=("data-model.md")

# Check contracts directory (only if it exists and has files)
if [[ -d "$CONTRACTS_DIR" ]] && [[ -n "$(ls -A "$CONTRACTS_DIR" 2>/dev/null)" ]]; then
    docs+=("contracts/")
fi

[[ -f "$QUICKSTART" ]] && docs+=("quickstart.md")

# Include tasks.md if requested and it exists
if $INCLUDE_TASKS && [[ -f "$TASKS" ]]; then
    docs+=("tasks.md")
fi

# Output results
if $JSON_MODE; then
    # Build JSON array of documents
    if [[ ${#docs[@]} -eq 0 ]]; then
        json_docs="[]"
    else
        json_docs=$(printf '"%s",' "${docs[@]}")
        json_docs="[${json_docs%,}]"
    fi

    printf '{"FEATURE_DIR":"%s","AVAILABLE_DOCS":%s}\n' "$FEATURE_DIR" "$json_docs"
else
    # Text output
    echo "FEATURE_DIR:$FEATURE_DIR"
    echo "AVAILABLE_DOCS:"

    # Show status of each potential document
    check_file "$RESEARCH" "research.md"
    check_file "$DATA_MODEL" "data-model.md"
    check_dir "$CONTRACTS_DIR" "contracts/"
    check_file "$QUICKSTART" "quickstart.md"

    if $INCLUDE_TASKS; then
        check_file "$TASKS" "tasks.md"
    fi
fi
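
A caller that consumes the `--json` output of the script above might parse it as follows. This is a sketch under assumptions: `parse_prerequisites` is a hypothetical helper, and the sample path in the usage note is made up. Only the two keys, `FEATURE_DIR` and `AVAILABLE_DOCS`, come from the script itself.

```python
import json
from typing import List, Tuple


def parse_prerequisites(output: str) -> Tuple[str, List[str]]:
    """Parse one line of `check-prerequisites.sh --json` output.

    Returns (feature_dir, available_docs). Raises json.JSONDecodeError
    if the text is not valid JSON.
    """
    payload = json.loads(output)
    return payload["FEATURE_DIR"], list(payload["AVAILABLE_DOCS"])
```

For instance, `parse_prerequisites('{"FEATURE_DIR":"/repo/specs/example","AVAILABLE_DOCS":["research.md","contracts/"]}')` yields the directory string and the two-element list. Because the script assembles its JSON with `printf` and does not escape values, a path containing a double quote or backslash would produce invalid JSON, which this parser surfaces as a `json.JSONDecodeError`.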
