🩺 Immunization Charts (Python Version)

Current version: v0.1.0

📘 Introduction

This project provides a Python- and Bash-based workflow for generating personalized immunization history charts and notice letters for children overdue for vaccinations mandated under the Child Care and Early Years Act (CCEYA) and the Immunization of School Pupils Act (ISPA).

Reports are generated in PDF format using Typst and a custom report template.

⚙️ Environment Setup

Written in Bash and Python
Uses Typst for typesetting
Python dependencies managed via pyproject.toml and uv

Virtual Environment

Install all dependencies (and create the .venv if it doesn't yet exist) before doing anything else:

uv sync
source .venv/bin/activate

ℹ️ uv sync only installs the core runtime packages by default. If you're planning to run tests or other dev tools, include the development group once via uv sync --group dev (or uv sync --all-groups if you prefer everything).

Code Quality & Pre-commit Hooks

To enable automatic code linting and formatting on every commit, initialize pre-commit hooks:

uv sync --group dev                 # Install development tools (pre-commit, pytest, etc.)
uv run pre-commit install           # Initialize git hooks

Now, whenever you commit changes, the pre-commit hook automatically:

Lints your code with ruff check --fix (auto-fixes issues when possible)
Formats your code with ruff format (enforces consistent style)

If any check fails, your commit is blocked until you fix the issues. You can also run checks manually anytime:

uv run pre-commit run --all-files   # Check all files

🛠️ Pipeline Overview & Architecture

This section describes how the pipeline orchestrates data flow and manages state across processing steps.

Module Organization

The pipeline/ package is organized by pipeline function, not by layer. Each step has its own module:

Step	Module	Purpose
1	`prepare_output.py`	Output directory setup
2	`preprocess.py`	Data validation & normalization → JSON artifact
3	`generate_qr_codes.py`	QR code PNG generation (optional)
4	`generate_notices.py`	Typst template rendering
5	`compile_notices.py`	Typst → PDF compilation
6	`validate_pdfs.py`	PDF validation (rules, summary, JSON report)
7	`encrypt_notice.py`	PDF encryption (optional)
8	`bundle_pdfs.py`	PDF bundling & grouping (optional)
9	`cleanup.py`	Intermediate file cleanup

Supporting modules: orchestrator.py (runs the steps in sequence), config_loader.py, data_models.py, enums.py, utils.py.

Template modules (in templates/ package): en_template.py, fr_template.py (Typst template rendering). For module structure questions, see docs/CODE_ANALYSIS_STANDARDS.md.

Orchestration Model

The pipeline follows a sequential, stateless step architecture where each processing step:

Reads fresh input from disk (either Excel files or the preprocessed JSON artifact)
Processes data independently without holding state between steps
Writes output to disk for the next step to discover
Never passes in-memory objects between steps via the orchestrator

This design ensures:

Modularity: Steps can be understood, tested, and modified in isolation
Resilience: Each step can be re-run independently if needed (e.g., if Step 4 fails, fix the code and re-run Steps 4-9 without reprocessing)
Simplicity: No complex data structures passed between components
Reproducibility: Same input always produces same output across runs
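
The disk-based hand-off described above can be sketched roughly as follows. This is an illustration only: the step functions, file names, and the `run_steps` helper are hypothetical and are not the project's actual orchestrator API.

```python
import json
import tempfile
from pathlib import Path


def preprocess_step(workdir: Path) -> None:
    """Hypothetical step: read raw input from disk, write a normalized artifact."""
    raw = json.loads((workdir / "raw.json").read_text())
    clients = sorted(raw, key=lambda c: c["last_name"])
    (workdir / "preprocessed.json").write_text(json.dumps(clients))


def generate_step(workdir: Path) -> None:
    """Hypothetical step: read the artifact fresh, write one file per client."""
    clients = json.loads((workdir / "preprocessed.json").read_text())
    for i, client in enumerate(clients, start=1):
        (workdir / f"notice_{i:05d}.txt").write_text(client["last_name"])


def run_steps(workdir: Path, steps, start: int = 1) -> None:
    """Run steps in order; only the working directory is shared between them."""
    for number, step in enumerate(steps, start=1):
        if number >= start:
            step(workdir)


workdir = Path(tempfile.mkdtemp())
(workdir / "raw.json").write_text(
    json.dumps([{"last_name": "Smith"}, {"last_name": "Adams"}])
)
steps = [preprocess_step, generate_step]
run_steps(workdir, steps)           # full run
run_steps(workdir, steps, start=2)  # re-run from step 2 using the artifact on disk
```

Because each step only touches files, re-running from a later step needs nothing beyond the artifacts already written by earlier steps.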

Data Management

The pipeline produces a single normalized JSON artifact (preprocessed_clients_<run_id>.json) during preprocessing. This artifact serves as the canonical source of truth:

Created by: preprocess.py (Step 2) - contains sorted clients with normalized metadata
Consumed by: generate_qr_codes.py (Step 3), generate_notices.py (Step 4), and bundle_pdfs.py (Step 8)
Format: Single JSON file with run metadata, total client count, warnings, and per-client details

Client data flows through specialized handlers during generation:

Stage	Input	Processing	Output
Preprocessing	Excel file	Data normalization, sorting, age calculation	`preprocessed_clients_<run_id>.json`
QR Generation	Preprocessed JSON	Payload formatting → PNG generation	PNG images in `artifacts/qr_codes/`
Typst Template	Preprocessed JSON	Template rendering with QR reference	`.typ` files in `artifacts/typst/`
PDF Compilation	Filesystem glob of `.typ` files	Typst subprocess	PDF files in `pdf_individual/`
PDF Bundling	Preprocessed JSON, loaded into in-memory `ClientArtifact` objects	Grouping and manifest generation	Bundle PDFs in `pdf_combined/`

Each step reads the JSON fresh when needed—there is no shared in-memory state passed between steps through the orchestrator.

Client Ordering

Clients are deterministically ordered during preprocessing by: school name → last name → first name → client ID, ensuring consistent, reproducible output across pipeline runs. Each client receives a deterministic sequence number (00001, 00002, etc.) that persists through all downstream operations.
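
A minimal sketch of this ordering, assuming flat dictionaries with hypothetical keys (`school`, `last_name`, `first_name`, `client_id`) and plain string comparison; the project's real data model and any case or accent handling may differ.

```python
def assign_sequences(clients):
    """Sort clients deterministically and attach a zero-padded sequence string."""
    ordered = sorted(
        clients,
        key=lambda c: (c["school"], c["last_name"], c["first_name"], c["client_id"]),
    )
    return [{**c, "sequence": f"{i:05d}"} for i, c in enumerate(ordered, start=1)]


# Made-up records; the last two differ only by client ID.
clients = [
    {"school": "Maple PS", "last_name": "Lee", "first_name": "Ana", "client_id": "2"},
    {"school": "Birch PS", "last_name": "Roy", "first_name": "Sam", "client_id": "3"},
    {"school": "Maple PS", "last_name": "Lee", "first_name": "Ana", "client_id": "1"},
]
for client in assign_sequences(clients):
    print(client["sequence"], client["school"], client["client_id"])
# 00001 Birch PS 3
# 00002 Maple PS 1
# 00003 Maple PS 2
```

Including the client ID as the final key is what keeps the order stable when two clients share a school and name.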

🚦 Pipeline Steps

The main pipeline orchestrator (orchestrator.py) automates the end-to-end workflow for generating immunization notices and charts. Below are the nine sequential steps:

1. Output Preparation (prepare_output.py)
   Prepares the output directory, optionally removing existing contents while preserving logs.
2. Preprocessing (preprocess.py)
   Cleans, validates, and structures input data into a normalized JSON artifact (preprocessed_clients_<run_id>.json).
3. Generating QR Codes (generate_qr_codes.py, optional)
   Generates QR code PNG files from templated payloads. Skipped if qr.enabled: false in parameters.yaml.
4. Generating Notices (generate_notices.py)
   Renders Typst templates (.typ files) for each client from the preprocessed artifact, with QR code references.
5. Compiling Notices (compile_notices.py)
   Compiles Typst templates into individual PDF notices using the typst command-line tool.
6. Validating PDFs (validate_pdfs.py)
   Runs rule-based PDF validation and prints a summary. Writes a JSON report to output/metadata/<lang>_validation_<run_id>.json. Rules and severities are configured in config/parameters.yaml (see config README). Default rules include:
   - exactly_two_pages (ensure each notice is 2 pages)
   - signature_overflow (detect signature block on page 2 using invisible markers)
   Severity levels: disabled, warn, error (error halts the pipeline).
7. Encrypting PDFs (encrypt_notice.py, optional)
   When encryption.enabled: true, encrypts individual PDFs using client metadata as password.
8. Bundling PDFs (bundle_pdfs.py, optional)
   When bundling.bundle_size > 0, combines individual PDFs into bundles with optional grouping by school or board. Runs independently of encryption.
9. Cleanup (cleanup.py)
   Removes intermediate files (.typ, .json, per-client PDFs) if pipeline.keep_intermediate_files: false. Optionally deletes unencrypted PDFs if cleanup.delete_unencrypted_pdfs: true.
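
To illustrate how the three validation severity levels could behave, here is a hedged sketch built around the exactly_two_pages rule. The `Severity` enum, the `validate` function, and the page-count input are assumptions made for this example; they are not taken from validate_pdfs.py.

```python
from enum import Enum


class Severity(Enum):
    DISABLED = "disabled"
    WARN = "warn"
    ERROR = "error"


def exactly_two_pages(page_count: int) -> bool:
    """Rule passes when the notice has exactly two pages."""
    return page_count == 2


def validate(page_counts: dict, severity: Severity) -> list:
    """Apply the rule to each PDF; return warnings, or raise when severity is error."""
    if severity is Severity.DISABLED:
        return []
    failures = [name for name, pages in page_counts.items() if not exactly_two_pages(pages)]
    if failures and severity is Severity.ERROR:
        raise RuntimeError(f"exactly_two_pages failed for: {', '.join(failures)}")
    return [f"exactly_two_pages: {name}" for name in failures]


# Made-up page counts keyed by hypothetical file names.
page_counts = {"en_notice_00001.pdf": 2, "en_notice_00002.pdf": 3}
print(validate(page_counts, Severity("warn")))
# ['exactly_two_pages: en_notice_00002.pdf']
```

With `Severity.ERROR` the same input raises instead of returning, which mirrors the "error halts the pipeline" behaviour described for Step 6.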

Usage Example:

uv run viper <input_file> <language> [--input PATH] [--output PATH] [--config PATH] [--template NAME]

Required Arguments:

<input_file>: Name of the input file (e.g., students.xlsx)
<language>: Language code (en or fr)

Optional Arguments:

--input PATH: Input directory (default: ../input)
--output PATH: Output directory (default: ../output)
--config PATH: Configuration directory (default: ../config)
--template NAME: PHU template name within phu_templates/ (e.g., wdgph); defaults to built-in templates/ when omitted
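
For readers who want to see the argument surface in one place, the sketch below reproduces the documented arguments and defaults with argparse. It is an approximation written from this README, not the project's actual CLI code, and `build_parser` is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented positional and optional arguments."""
    parser = argparse.ArgumentParser(prog="viper")
    parser.add_argument("input_file", help="Name of the input file, e.g. students.xlsx")
    parser.add_argument("language", choices=["en", "fr"], help="Language code")
    parser.add_argument("--input", type=Path, default=Path("../input"), metavar="PATH")
    parser.add_argument("--output", type=Path, default=Path("../output"), metavar="PATH")
    parser.add_argument("--config", type=Path, default=Path("../config"), metavar="PATH")
    parser.add_argument("--template", default=None, metavar="NAME")
    return parser


args = build_parser().parse_args(["students.xlsx", "en", "--output", "/tmp/output"])
print(args.input_file, args.language, args.output, args.template)
```

Leaving `--template` at `None` corresponds to the documented fallback to the built-in templates/ directory.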

Configuration: See the complete configuration reference and examples in config/README.md:

Configuration overview and feature flags
QR Code settings (payload templating)
PDF Validation settings (rule-based quality checks)
PDF encryption settings (password templating)
Disease/chart/translation files

Examples:

# Basic usage
uv run viper students.xlsx en

# Override output directory
uv run viper students.xlsx en --output /tmp/output

# Use a PHU-specific template (from phu_templates/my_phu/)
uv run viper students.xlsx en --template my_phu

Using PHU-Specific Templates

Public Health Units can create custom template directories for organization-specific branding and layouts. All PHU templates live under the phu_templates/ directory and are gitignored by default.

# Create your PHU template directory by copying defaults
cp -r templates/ phu_templates/my_phu/

# Customize templates and assets as needed, then run with your PHU template
uv run viper students.xlsx en --template my_phu

The --template argument expects a template name within phu_templates/ (flat names only; no / or \). For example, --template my_phu loads from phu_templates/my_phu/.
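
The flat-name rule could be enforced along these lines. This is a sketch under assumptions: `resolve_template_dir` is a hypothetical helper, and rejecting `.` and `..` is an extra precaution added here rather than documented behaviour.

```python
from pathlib import Path


def resolve_template_dir(name: str, root: Path = Path("phu_templates")) -> Path:
    """Reject nested or empty names, then return root/name."""
    if not name or "/" in name or "\\" in name or name in {".", ".."}:
        raise ValueError(f"Template name must be a flat directory name, got {name!r}")
    return root / name


print(resolve_template_dir("my_phu"))
```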

Each PHU template directory should contain:

conf.typ - Typst configuration and helper functions (required)
{lang}_template.py - Language modules with render_notice() for the languages you intend to generate (e.g., en_template.py for English, fr_template.py for French). Single-language templates are supported.
assets/ - Optional directory for images like logos or signatures if your templates reference them

Templates are loaded dynamically at runtime, enabling different organizations to maintain separate template sets without modifying core code. By default (when --template is not specified), the pipeline uses the built-in templates/ directory. It's recommended to start by copying from templates/ into phu_templates/<your_name>/ and customizing from there.
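
One common way to load a language module from a directory at runtime is `importlib.util.spec_from_file_location`. The sketch below shows that approach; whether the pipeline uses this exact mechanism is an assumption, `load_language_module` is a hypothetical name, and the `render_notice` signature in the demo is made up.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_language_module(template_dir: Path, lang: str):
    """Import {lang}_template.py from a directory and check it exposes render_notice."""
    module_path = template_dir / f"{lang}_template.py"
    if not module_path.is_file():
        raise FileNotFoundError(f"No template module for language {lang!r}: {module_path}")
    spec = importlib.util.spec_from_file_location(f"{lang}_template", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not callable(getattr(module, "render_notice", None)):
        raise AttributeError(f"{module_path} does not define render_notice()")
    return module


# Demo with a throwaway template directory containing only an English module.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "en_template.py").write_text(
    "def render_notice(client):\n    return f\"Notice for {client['name']}\"\n"
)
module = load_language_module(demo_dir, "en")
print(module.render_notice({"name": "Test Client"}))
# Notice for Test Client
```

A single-language template directory, as in the demo, only fails when a language without a matching module is requested.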

ℹ️ Typst preview note: The WDGPH code-server development environments render Typst files via Tinymist. The shared template at templates/conf.typ only defines helper functions, colour tokens, and table layouts that the generated notice .typ files import; it doesn't emit any pages on its own, so Tinymist has nothing to preview when you open this file. To examine the actual markup that uses these helpers, run the pipeline with pipeline.keep_intermediate_files: true in config/parameters.yaml so the generated notice .typ files stay in output/artifacts/ for manual inspection.

Outputs:

Processed notices and charts in the output/ directory
Log and summary information in the terminal

🧪 Running Tests

The test suite is organized in three layers (see docs/TESTING_STANDARDS.md for details):

Quick checks (unit tests, <100ms each):

uv run pytest -m unit

Integration tests (step interactions, 100ms–1s each):

uv run pytest -m integration

End-to-end tests (full pipeline, 1s–30s each):

uv run pytest -m e2e

All tests:

uv run pytest

With coverage report:

uv run pytest --cov=pipeline --cov-report=html

View coverage in htmlcov/index.html.

For CI/local development (skip slow E2E tests):

uv run pytest -m "not e2e"

✅ Before running tests, make sure you've installed the dev group at least once (uv sync --group dev) so that testing dependencies are available.

📂 Input Data

Use data extracts from Panorama PEAR
Place input files in the input/ subfolder (not tracked by Git)
Files must be in .xlsx format with a single worksheet per file
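
If you want to check the single-worksheet requirement before running the pipeline, one standard-library option is to read the sheet list from the workbook part inside the .xlsx archive. This is a sketch, not the pipeline's own validation: the helper names are hypothetical, and it assumes the common (transitional) Office Open XML namespace.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def worksheet_names(xlsx) -> list:
    """Return sheet names listed in xl/workbook.xml of an .xlsx (zip) file."""
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.findall("m:sheets/m:sheet", NS)]


def require_single_worksheet(xlsx) -> str:
    """Return the only sheet name, or raise if there is not exactly one."""
    names = worksheet_names(xlsx)
    if len(names) != 1:
        raise ValueError(f"Expected exactly one worksheet, found {len(names)}: {names}")
    return names[0]


# Demo: a minimal in-memory archive containing only the workbook part.
workbook_xml = (
    '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheets><sheet name="Sheet1" sheetId="1"/></sheets></workbook>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("xl/workbook.xml", workbook_xml)
buffer.seek(0)
print(require_single_worksheet(buffer))
# Sheet1
```

For a real file, pass its path instead of the in-memory buffer.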

Preprocessing

The preprocess.py module (Step 2) reads raw input data and produces a normalized JSON artifact.

Processing Workflow

Input: Excel file with raw client vaccination records
Processing:
- Validates schema (required columns, data types)
- Cleans and transforms client data (dates, addresses, vaccine history)
- Determines whether each client is over or under 16 years old, which decides the notice recipient (age is calculated against date_notice_delivery from parameters.yaml)
- Assigns deterministic per-client sequence numbers sorted by: school → last name → first name → client ID
- Maps vaccine history against disease reference data
- Synthesizes stable school/board identifiers when missing
Output: Single JSON artifact at output/artifacts/preprocessed_clients_<run_id>.json

Logging is written to output/logs/preprocess_<run_id>.log for traceability.
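
The age check in the workflow above can be sketched as follows. Two details are assumptions for illustration: that "over 16" means the client has reached their 16th birthday on the delivery date, and the recipient labels, which are invented here.

```python
from datetime import date


def age_on(date_of_birth: date, reference: date) -> int:
    """Whole years completed on the reference date."""
    had_birthday = (reference.month, reference.day) >= (date_of_birth.month, date_of_birth.day)
    return reference.year - date_of_birth.year - (0 if had_birthday else 1)


def recipient_metadata(date_of_birth: date, date_notice_delivery: date) -> dict:
    """Build the over_16 flag and a (hypothetical) recipient label."""
    over_16 = age_on(date_of_birth, date_notice_delivery) >= 16  # assumed threshold
    return {"over_16": over_16, "recipient": "client" if over_16 else "parent/guardian"}


delivery = date(2025, 10, 23)  # stands in for date_notice_delivery
print(recipient_metadata(date(2009, 11, 1), delivery))   # 15 on delivery date
print(recipient_metadata(date(2009, 10, 23), delivery))  # turns 16 on delivery date
```

Computing age against the delivery date rather than the run date keeps the result stable if the pipeline is re-run later with the same configuration.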

Artifact Structure

The preprocessed artifact contains:

{
  "run_id": "20251023T200355",
  "language": "en",
  "total_clients": 5,
  "warnings": [],
  "clients": [
    {
      "sequence": 1,
      "client_id": "1009876545",
      "person": {"first_name": "...", "last_name": "...", "date_of_birth": "..."},
      "school": {"name": "...", "board": "..."},
      "contact": {"street_address": "...", "city": "...", "postal_code": "...", "province": "..."},
      "vaccines": {"due": "...", "received": [...]},
      "metadata": {"recipient": "...", "over_16": false}
    },
    ...
  ]
}
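
A downstream consumer might read this structure roughly as shown below. The `ClientSummary` dataclass and `load_clients` function are hypothetical and are not the classes in data_models.py; the sample record uses placeholder values and only the fields the sketch needs.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientSummary:
    sequence: int
    client_id: str
    school_name: str
    over_16: bool


def load_clients(artifact_text: str) -> list:
    """Parse the artifact and check total_clients against the clients list."""
    payload = json.loads(artifact_text)
    clients = [
        ClientSummary(
            sequence=c["sequence"],
            client_id=c["client_id"],
            school_name=c["school"]["name"],
            over_16=c["metadata"]["over_16"],
        )
        for c in payload["clients"]
    ]
    if len(clients) != payload["total_clients"]:
        raise ValueError("total_clients does not match the number of client records")
    return clients


sample = json.dumps({
    "run_id": "20251023T200355",
    "language": "en",
    "total_clients": 1,
    "warnings": [],
    "clients": [{
        "sequence": 1,
        "client_id": "1009876545",
        "school": {"name": "Example School", "board": "Example Board"},
        "metadata": {"recipient": "Parent/Guardian", "over_16": False},
    }],
})
print(load_clients(sample))
```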

Configuration quick links

QR Code settings: see QR Code Configuration in config/README.md
PDF Encryption settings: see PDF Encryption Configuration in config/README.md

Changelog

See CHANGELOG.md for details of each release.
