Conservative classification of NYC construction project updates for subcontractors.
This system applies veteran subcontractor judgment to public agency notices, filtering noise from opportunity. It's designed to minimize false positives—not to be smart, but to be useful.
Model-agnostic LLM backend (currently configured for Anthropic Claude Sonnet 4).
NYC construction subcontractors monitor dozens of agency portals (SCA, DDC, HPD, DEP) for supplemental work, change orders, and new scope. Most updates are:
- Administrative revisions with no new work
- Change orders already awarded to incumbents
- Ambiguously worded "under review" scope that never materializes
Real opportunity is rare. Chasing false signals costs estimators hours and damages relationships with GCs.
This system was validated through direct conversations with NYC HVAC and electrical subcontractors who confirmed that change-order and supplemental work is routinely missed due to fragmented portals and buried agency updates.
Classifies project updates into three categories:
- CLOSED – No realistic opportunity for a new subcontractor
- SOFT_OPEN – New scope exists but incumbent or insider advantage likely
- CONTESTABLE – Clearly defined, openly bid-able work
Each classification includes (example sketched after this list):
- Confidence score (0-100)
- One-sentence reasoning
- Risk note (what could make this wrong)
- Recommended next action
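For illustration, a single result might look roughly like this, written as a Python dict. This is a sketch: the exact key names for the risk note and recommended action (`risk_note`, `next_action`) are assumptions, not confirmed output fields.

```python
# Hypothetical output -- values and the risk_note / next_action key names are illustrative.
result = {
    "classification": "SOFT_OPEN",
    "confidence": 55,
    "reasoning": "Supplemental electrical scope is mentioned, but the notice ties it to the incumbent GC's existing contract.",
    "risk_note": "A standalone solicitation with a bid date would make this CONTESTABLE.",
    "next_action": "Watch for a formal RFP before committing estimator time.",
}
```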
- Not a bid automation tool
- Not a scraping farm
- Not a promise of winning work
- Not legal or procurement advice
This system only classifies public signals to reduce wasted attention.
Skeptical by default. When uncertain, downgrade classification.
The prompt engineering embeds 20+ years of field experience (see the prompt sketch after this list):
- Agencies prefer incumbents unless language proves otherwise
- "Under review" usually means politically decided
- Missing context = administrative noise
- Confidence must reflect how a real contractor would bet time and money
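A minimal sketch of how these heuristics could be phrased in the system prompt. The actual wording lives in classifier/agent.py and is not reproduced here; the `SYSTEM_PROMPT` name is an assumption.

```python
# Hypothetical excerpt -- the hardened prompt in classifier/agent.py is longer and field-tested.
SYSTEM_PROMPT = """
You classify NYC agency construction updates for subcontractors.
Default to skepticism:
- Assume the incumbent keeps the work unless the notice explicitly invites new bidders.
- Treat "under review" scope as already decided unless a solicitation date is given.
- Treat updates missing scope, trade, or deadline as administrative noise (CLOSED).
- Set confidence to how a working contractor would bet estimator hours, not model certainty.
When torn between two categories, choose the more conservative one.
Respond only with JSON: classification, confidence, reasoning, risk_note, next_action.
"""
```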
classifier/
    agent.py        # Hardened prompt + retry logic
data/
    simulator.py    # Realistic "ugly" test data
evaluation/
    evaluate.py     # Quality control pipeline
Key technical decisions (a retry-and-validation sketch follows the list):
- Conservative prompt design over model fine-tuning
- Explicit uncertainty handling (downgrade when ambiguous)
- Retry logic with exponential backoff
- JSON schema validation on every response
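A minimal sketch of that retry-and-validation pattern, assuming the model call is passed in as a function; the helper names (`classify_with_retries`, `call_model`, `REQUIRED_KEYS`) are illustrative, not the actual identifiers in classifier/agent.py.

```python
import json
import time

REQUIRED_KEYS = {"classification", "confidence", "reasoning", "risk_note", "next_action"}
VALID_LABELS = {"CLOSED", "SOFT_OPEN", "CONTESTABLE"}

def classify_with_retries(call_model, update_text, trade, max_attempts=4):
    """Call the LLM, validate the JSON response, and back off exponentially on failure."""
    for attempt in range(max_attempts):
        try:
            raw = call_model(update_text, trade)  # returns the model's raw text response
            result = json.loads(raw)
            if not REQUIRED_KEYS.issubset(result):
                raise ValueError(f"missing keys: {REQUIRED_KEYS - result.keys()}")
            if result["classification"] not in VALID_LABELS:
                raise ValueError(f"unknown label: {result['classification']}")
            if not 0 <= int(result["confidence"]) <= 100:
                raise ValueError("confidence out of range")
            return result
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s between retries
```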
- Many change orders are politically or contractually decided before public posting
- Agency language is inconsistent and often deliberately vague
- This system intentionally downgrades uncertain signals
- False negatives are preferred over false optimism
- No access to attachments, drawings, or meeting notes that often contain critical context
Accuracy is not the goal. Conservative filtering is.
The classifier logic must not be changed after validation against real data.
You may:
- Document failures
- Note agency quirks
- Flag systematic bias
You may not:
- Patch the prompt to "fix" misses
- Tune confidence to look better
- Add rules retroactively
Why? This project demonstrates judgment under uncertainty, not optimization. A reviewer would rather see "This failed in these 6 cases, and here's why" than "I tweaked it until it looked good."
Do not poison your own evidence.
# Clone repo
git clone https://github.com/yourusername/scopesignal.git
cd scopesignal
# Set up environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure API key (currently uses Anthropic - easily swappable)
cp .env.example .env
# Edit .env and add your API key

Current backend: Anthropic Claude Sonnet 4
Get your API key at: https://console.anthropic.com/settings/keys
To swap providers: Modify classifier/agent.py (single class, ~100 lines).
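As a rough sketch of what that swap could look like (class and method names below are assumptions, not the actual code in classifier/agent.py), the provider can sit behind a thin interface:

```python
from typing import Protocol

class LLMBackend(Protocol):
    def complete(self, system_prompt: str, user_prompt: str) -> str:
        """Return the model's raw text response."""
        ...

class AnthropicBackend:
    def __init__(self, api_key: str, model: str = "claude-sonnet-4-20250514"):  # model id is a placeholder
        import anthropic  # imported lazily so other backends don't require the package
        self._client = anthropic.Anthropic(api_key=api_key)
        self._model = model

    def complete(self, system_prompt: str, user_prompt: str) -> str:
        response = self._client.messages.create(
            model=self._model,
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": user_prompt}],
        )
        return response.content[0].text
```

Swapping providers then means implementing `complete()` for the new backend and leaving the prompt and validation untouched. The classifier itself is called like this: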
from classifier import classify_update
result = classify_update(
update_text="RFP issued for additional electrical work. Bids due March 15.",
trade="Electrical"
)
print(result["classification"]) # "CONTESTABLE"
print(result["confidence"]) # 85
print(result["reasoning"]) # "Clear RFP with deadline signals open competition."# Full evaluation (50 samples + adversarial cases)
python -m evaluation.evaluate
# Quick test (10 samples, verbose)
python -m evaluation.evaluate --quick

This will (sketched below):
- Generate realistic test data
- Classify each update
- Compare against expected results
- Report accuracy by category
- Analyze mismatches
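A minimal sketch of that loop, assuming the sample format shown in the docstring; the real evaluation/evaluate.py may organize this differently:

```python
from collections import defaultdict

from classifier import classify_update

def evaluate(samples):
    """samples: list of dicts with 'update_text', 'trade', and an 'expected' label."""
    per_label = defaultdict(lambda: {"correct": 0, "total": 0})
    mismatches = []
    for sample in samples:
        result = classify_update(update_text=sample["update_text"], trade=sample["trade"])
        expected = sample["expected"]
        per_label[expected]["total"] += 1
        if result["classification"] == expected:
            per_label[expected]["correct"] += 1
        else:
            mismatches.append((sample, result))  # kept for the mismatch analysis step
    for label, stats in per_label.items():
        accuracy = 100 * stats["correct"] / stats["total"]
        print(f"{label:12s} {accuracy:5.1f}% ({stats['correct']}/{stats['total']})")
    return mismatches
```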
On a simulated set of 100 noisy agency updates, the system classified:
- 78% as CLOSED (administrative noise, incumbent work)
- 19% as SOFT_OPEN (ambiguous scope, likely insider advantage)
- 3% as CONTESTABLE (clearly defined, open bidding)
This distribution aligns with contractor estimates of real opportunity frequency.
Accuracy by category:
- Administrative noise: 94%
- Incumbent-earmarked: 88%
- Soft-open signals: 72% (intentionally conservative)
- Contestable work: 100%
The 72% accuracy on SOFT_OPEN reflects the system's bias toward downgrading ambiguous signals—which is the correct behavior.
Not Planned (intentionally):
- Multi-agency scraping
- Real-time alerting
- User accounts
- Mobile app
Possible Next Steps (if validated by users):
- Integration with one NYC agency portal (SCA or DDC)
- Trade-specific rule overlays (electrical vs. HVAC vs. plumbing)
- Historical analysis of prediction accuracy
- Export to CSV for manual review
This was built as a demonstration of applied AI engineering:
- Real-world problem with economic stakes
- Domain expertise embedded in prompt design
- Conservative classification prioritized over accuracy
- Full evaluation pipeline from simulation to analysis
- Explicit acknowledgment of limitations
It shows how LLMs can provide judgment under uncertainty rather than just pattern matching.
This is a portfolio project, not an open product. If you're interested in the problem space:
- Fork and experiment
- Test with your own agency data
- Share feedback on classification quality
Do not submit PRs that add features. The value is in restraint.
MIT
Questions about the system design or problem space? Open an issue.
Looking to hire? This project demonstrates:
- Prompt engineering for high-stakes domains
- Evaluation methodology for subjective tasks
- Technical judgment (when to use/not use AI)
- Documentation that earns trust
Built by someone who believes AI should be useful, not impressive.