
Conversation

@Kesavaraja67 Kesavaraja67 (Contributor) commented Jan 15, 2026

Summary

Implements a comprehensive JIT compiler benchmarking suite for Cortex operations, as requested in cortexlinux/cortex-pro#3. The suite benchmarks CLI startup, command parsing, cache operations, and response streaming, with support for Python 3.13+'s experimental JIT.

Related Issue

Closes cortexlinux/cortex-pro#3

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update

AI Disclosure

  • AI/IDE/Agents used
  • Used Claude AI to help structure the benchmarking module and ensure proper error handling patterns

Testing

Test Coverage: 89% for jit_benchmark.py

Tested on:

  • Python 3.12.x (baseline, no JIT)
  • Python 3.13.x (without JIT)
  • Python 3.13.x (with PYTHON_JIT=1)
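
For context on the PYTHON_JIT=1 configuration above: CPython 3.13's JIT is experimental and exposes no stable public detection API, so a check like the following sketch (the function name is illustrative, not the actual helper in cortex/jit_benchmark.py) can only approximate detection via the environment variable and interpreter version:

```python
import os
import sys


def jit_probably_enabled() -> bool:
    """Best-effort check for the experimental CPython JIT (illustrative only).

    CPython 3.13 has no stable public API for this, so we fall back to the
    PYTHON_JIT environment variable plus an interpreter version check.
    """
    return sys.version_info >= (3, 13) and os.environ.get("PYTHON_JIT") == "1"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}, JIT requested: {jit_probably_enabled()}")
```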

Commands tested:

cortex jit-benchmark                 # Run all benchmarks
cortex jit-benchmark list            # List available benchmarks  
cortex jit-benchmark info            # Show JIT status
cortex jit-benchmark run -b cli      # Run specific benchmark
cortex jit-benchmark run -i 50       # Custom iterations
cortex jit-benchmark run -o file.json # Export results
cortex jit-benchmark compare --baseline b.json --jit j.json

All benchmarks complete in <60 seconds.
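
For reviewers, a minimal sketch of the measurement pattern these commands exercise: time a callable over N iterations and aggregate with the standard statistics module. The helper names below are illustrative and are not taken from cortex/jit_benchmark.py.

```python
import statistics
import time
from typing import Callable


def time_callable(func: Callable[[], object], iterations: int = 50) -> list[float]:
    """Run func `iterations` times and return per-run wall times in milliseconds."""
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        times.append((time.perf_counter() - start) * 1000.0)
    return times


def summarize(times: list[float]) -> dict[str, float]:
    """Aggregate raw timings into the statistics the suite reports."""
    return {
        "mean_ms": statistics.mean(times),
        "median_ms": statistics.median(times),
        "stdev_ms": statistics.stdev(times) if len(times) > 1 else 0.0,
        "min_ms": min(times),
        "max_ms": max(times),
    }


if __name__ == "__main__":
    print(summarize(time_callable(lambda: sum(range(10_000)), iterations=50)))
```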

Demo

Issue_jit_benchmark.mp4

Changes

New Files

  • cortex/jit_benchmark.py - Main benchmarking module (400+ lines)
  • tests/test_jit_benchmark.py - Comprehensive test suite (300+ lines)

Modified Files

  • cortex/cli.py - Added jit-benchmark command integration

Features

  • 4 benchmark categories (CLI, parsing, cache, streaming)
  • JIT detection and status reporting
  • Statistical analysis (mean, median, stdev, min, max)
  • JSON export for cross-version comparison
  • Visual comparison tables with speedup calculations
  • Performance recommendations
  • Rich UI integration matching Cortex style
  • Comprehensive error handling
  • Type hints and docstrings throughout

Acceptance Criteria Met

  • Runs in under 60 seconds
  • Visual score output (Rich tables)
  • Model recommendations based on score
  • Saves results to history (JSON export)
  • Comparable across systems (standardized JSON format)
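
The exact export schema is defined in cortex/jit_benchmark.py; the sketch below only illustrates what a cross-system-comparable JSON payload could look like (all field names here are assumptions for illustration).

```python
import json
import os
import platform
import sys
from pathlib import Path


def export_results(results: dict[str, dict[str, float]], path: Path) -> None:
    """Write benchmark summaries plus enough metadata to compare runs across systems."""
    payload = {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "jit_requested": os.environ.get("PYTHON_JIT") == "1",
        "benchmarks": results,  # e.g. {"cli_startup": {"mean_ms": 12.3, ...}}
    }
    path.write_text(json.dumps(payload, indent=2))


# export_results({"cli_startup": {"mean_ms": 12.3}}, Path("baseline.json"))
```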

Checklist

  • Tests pass locally (pytest tests/test_jit_benchmark.py -v)
  • Code follows style guide (black formatted)
  • Documentation updated (README included)
  • Commit messages are clear
  • No breaking changes
  • CLA signed (will sign when prompted)

Pytest Coverage

Video.Project.7.mp4

Summary by CodeRabbit

  • New Features

    • Added a JIT benchmarking suite and top-level CLI command to run, list, show status, export, and compare benchmarks (baseline vs JIT). Produces human-readable tables, JSON export, and performance recommendations.
  • Tests

    • Comprehensive test coverage for benchmark execution, comparison, export, CLI flows, and reporting.
  • Documentation

    • README and dedicated docs page with usage, commands, categories, methodology, and examples.


@coderabbitai coderabbitai bot (Contributor) commented Jan 15, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a new JIT benchmarking subsystem and CLI integration: cortex/jit_benchmark.py implements benchmarks, comparison/export, and recommendations; cortex/cli.py exposes jit-benchmark subcommands; tests, README, and docs added for usage and examples.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CLI integration: cortex/cli.py | Add jit_benchmark() handler to CortexCLI, register top-level jit-benchmark command and subcommands (run, list, info, compare), parse options (--benchmark, --iterations, --output, --baseline, --jit), and route to run_jit_benchmark. |
| JIT benchmarking module: cortex/jit_benchmark.py | New module providing data models (BenchmarkResult, BenchmarkComparison, BenchmarkCategory), a JITBenchmark engine with bench methods (_bench_cli_startup, _bench_command_parsing, _bench_cache_operations, _bench_response_streaming), runner APIs (run_all_benchmarks, run_benchmark, list_benchmarks), reporting/export (display_results, export_json), comparison (compare_results), info (show_jit_info), and CLI entry run_jit_benchmark. |
| Tests: tests/test_jit_benchmark.py | New comprehensive test suite covering serialization, comparison logic, JIT detection, time formatting, individual and aggregated benchmarks, JSON export/import, CLI paths (info, list, run, compare), and recommendation generation (uses mocks and temp files). |
| Docs & README: README.md, docs/JIT_BENCHMARK.md | Add feature entry and usage examples to README; add docs/JIT_BENCHMARK.md describing CLI usage, available tests, iterations/output flags, JSON export, comparison workflow, and enabling the Python JIT. |
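
For orientation, a rough sketch of what the data models named in the table above might look like; the field names are guesses for illustration and do not reflect the actual definitions in cortex/jit_benchmark.py.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class BenchmarkCategory(str, Enum):
    CLI = "cli"
    PARSING = "parsing"
    CACHE = "cache"
    STREAMING = "streaming"


@dataclass
class BenchmarkResult:
    name: str
    category: BenchmarkCategory
    iterations: int
    mean_ms: float
    median_ms: float
    stdev_ms: float
    min_ms: float
    max_ms: float

    def to_dict(self) -> dict:
        """Serialize for JSON export."""
        data = asdict(self)
        data["category"] = self.category.value
        return data


@dataclass
class BenchmarkComparison:
    name: str
    baseline_mean_ms: float
    jit_mean_ms: float

    @property
    def speedup(self) -> float:
        """Baseline time divided by JIT time; values above 1.0 mean the JIT run was faster."""
        return self.baseline_mean_ms / self.jit_mean_ms if self.jit_mean_ms else 0.0
```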

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant JITBenchmark
    participant BenchFunc as Benchmark<br/>Functions
    User->>CLI: cortex jit-benchmark run --benchmark startup --iterations 100
    CLI->>JITBenchmark: run_jit_benchmark(action="run", benchmark_name="startup", iterations=100)
    JITBenchmark->>JITBenchmark: _detect_jit()
    JITBenchmark->>BenchFunc: _bench_cli_startup()
    BenchFunc-->>JITBenchmark: BenchmarkResult
    JITBenchmark->>JITBenchmark: aggregate & format results
    JITBenchmark-->>CLI: print results / exit code
    CLI-->>User: formatted output
sequenceDiagram
    participant User
    participant CLI
    participant CompareFunc as compare_results()
    participant FileSystem as File System
    User->>CLI: cortex jit-benchmark compare --baseline baseline.json --jit jit.json
    CLI->>CompareFunc: compare_results(baseline.json, jit.json)
    CompareFunc->>FileSystem: read baseline.json
    FileSystem-->>CompareFunc: baseline data
    CompareFunc->>FileSystem: read jit.json
    FileSystem-->>CompareFunc: jit data
    CompareFunc->>CompareFunc: compute BenchmarkComparison entries & summary
    CompareFunc-->>CLI: render Rich table (comparisons)
    CLI-->>User: comparison report
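
The comparison flow diagrammed above reduces to loading two exported result files and computing per-benchmark speedups. A minimal sketch, assuming the hypothetical "benchmarks"/"mean_ms" JSON layout used in the earlier export example rather than the module's real schema:

```python
import json
from pathlib import Path


def compare_files(baseline_path: Path, jit_path: Path) -> dict[str, dict[str, float]]:
    """Compute speedup (baseline/JIT) and percent improvement for each shared benchmark."""
    baseline = json.loads(baseline_path.read_text())["benchmarks"]
    jit = json.loads(jit_path.read_text())["benchmarks"]
    report = {}
    for name in baseline.keys() & jit.keys():
        base_ms, jit_ms = baseline[name]["mean_ms"], jit[name]["mean_ms"]
        report[name] = {
            "speedup": base_ms / jit_ms if jit_ms else 0.0,
            "improvement_pct": (base_ms - jit_ms) / base_ms * 100.0 if base_ms else 0.0,
        }
    return report


# compare_files(Path("baseline.json"), Path("jit.json"))
```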

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • Suyashd999
  • Anshgrover23

Poem

🐇
I hop through code and time each beat,
Startup, parse, cache—no step too fleet.
Baseline meets JIT, the numbers gleam,
I nibble data, chase the dream.
Benchmark done — a measured leap! 🚀

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly summarizes the main feature: implementing a JIT benchmarking suite for Cortex operations. |
| Description check | ✅ Passed | Description comprehensively covers objectives, testing, features, and acceptance criteria while following the template structure with all required sections. |
| Linked Issues check | ✅ Passed | PR successfully implements all objectives from #275: benchmark suite with 4 categories, JIT detection, visual reports, JSON export, CLI integration, sub-60-second runtime, and cross-version comparisons. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to implementing the JIT benchmarking suite requested in #275; no unrelated modifications detected across new files, CLI integration, or documentation updates. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.10%, above the required threshold of 80.00%. |




📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f13d0c and 2d4dbac.

📒 Files selected for processing (1)
  • cortex/cli.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide for Python code
Include type hints in Python code
Add docstrings for all public APIs in Python code
Use dry-run mode by default for all installation operations
Do not use silent sudo - require explicit user confirmation for privilege escalation
Implement Firejail sandboxing for execution of untrusted code
Log all operations to ~/.cortex/history.db for audit purposes

Files:

  • cortex/cli.py
🧬 Code graph analysis (1)
cortex/cli.py (1)
cortex/jit_benchmark.py (1)
  • run_jit_benchmark (403-429)
🔇 Additional comments (5)
cortex/cli.py (5)

26-26: LGTM — import is clear and scoped.


784-806: Confirm audit logging for JIT benchmark runs.
This path delegates to run_jit_benchmark without any visible audit record. Please verify that benchmark executions are logged to ~/.cortex/history.db (either inside cortex/jit_benchmark.py or here) to meet the audit requirement. As per coding guidelines.
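
If audit logging does turn out to be missing, one possible shape for it is sketched below; the table name, columns, and direct use of sqlite3 are assumptions, since the actual schema of ~/.cortex/history.db is not shown in this PR.

```python
import json
import sqlite3
import time
from pathlib import Path


def log_benchmark_run(action: str, details: dict) -> None:
    """Append a benchmark invocation to a local audit database (illustrative schema only)."""
    db_path = Path.home() / ".cortex" / "history.db"
    db_path.parent.mkdir(parents=True, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS jit_benchmark_audit "
            "(ts REAL, action TEXT, details TEXT)"
        )
        conn.execute(
            "INSERT INTO jit_benchmark_audit VALUES (?, ?, ?)",
            (time.time(), action, json.dumps(details)),
        )
        conn.commit()
    finally:
        conn.close()


# log_benchmark_run("run", {"benchmark": "cli", "iterations": 50})
```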


2891-2894: Help table update looks good.


2979-3008: CLI subcommands and flags look solid.


3656-3661: Nice default action fallback for jit-benchmark.




@gemini-code-assist gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello @Kesavaraja67, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust JIT compiler benchmarking suite for Cortex operations. Its primary purpose is to provide developers and users with tools to measure and compare the performance impact of Python 3.13+'s experimental JIT compilation across critical application areas. By offering detailed metrics, visual comparisons, and actionable recommendations, this feature empowers users to understand and optimize Cortex's execution speed, particularly in environments where JIT can offer significant gains.

Highlights

  • Integrated JIT Benchmarking Suite: A comprehensive benchmarking suite has been implemented to evaluate the performance of Cortex operations with Python 3.13+'s experimental JIT compiler.
  • Diverse Benchmark Categories: The suite includes benchmarks for key Cortex operations such as CLI startup, command parsing, cache operations, and response streaming.
  • Detailed Performance Analysis: Results include statistical analysis (mean, median, standard deviation, min, max), JIT detection, and performance recommendations, presented with Rich UI integration.
  • Comparison and Export Capabilities: The suite supports exporting results to JSON for cross-version comparison and features visual comparison tables to highlight speedup and percentage improvement between baseline and JIT-enabled runs.
  • CLI Integration: New cortex jit-benchmark commands are added to the CLI, allowing users to run all benchmarks, specific benchmarks, list available benchmarks, view JIT status, and compare results.

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a comprehensive and well-structured JIT benchmarking suite. The code is clean, well-documented, and includes a thorough test suite. The CLI integration is also well-implemented. My feedback focuses on improving maintainability by reducing code duplication and magic numbers, increasing robustness against malformed input files, and adopting more modern Python syntax for type hints. Overall, this is a great feature addition.

@Anshgrover23 Anshgrover23 (Collaborator) left a comment

@Kesavaraja67 Docs file is missing.

@Kesavaraja67 Kesavaraja67 (Contributor, Author) commented

@Anshgrover23 I will address that shortly.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
README.md (1)

203-203: Consider showing subcommand structure for consistency.

The command reference entry for cortex jit-benchmark doesn't indicate that it accepts subcommands, unlike other commands in the table that show their parameter structure (e.g., cortex install <query>, cortex rollback <id>).

📋 Suggested improvement

Consider updating to show the subcommand structure:

-| `cortex jit-benchmark` | Run Python 3.13+ JIT performance benchmarks |
+| `cortex jit-benchmark <subcommand>` | Run, compare, and analyze Python 3.13+ JIT performance benchmarks |

Or, if space permits, expand to multiple rows showing key subcommands like the cortex install entries do.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ca71bc and 7196d14.

📒 Files selected for processing (2)
  • README.md
  • docs/JIT_BENCHMARK.md
✅ Files skipped from review due to trivial changes (1)
  • docs/JIT_BENCHMARK.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-12T20:51:13.828Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-12T20:51:13.828Z
Learning: Add documentation for all new features

Applied to files:

  • README.md
🔇 Additional comments (2)
README.md (2)

72-72: LGTM!

The feature table entry is clear and accurately describes the JIT benchmarking capability.


395-395: LGTM!

The completed feature entry accurately reflects the new JIT benchmarking capability.


@Kesavaraja67 Kesavaraja67 (Contributor, Author) commented

Hi @Anshgrover23, I have added the documentation, synced with the main repo, and linted my code, but the lint check is still failing.

@Anshgrover23 Anshgrover23 marked this pull request as draft January 15, 2026 15:37
@Anshgrover23 Anshgrover23 (Collaborator) left a comment

@Kesavaraja67 Lint is failing on main. Will review once it’s fixed.


github-actions bot commented Jan 15, 2026

CLA Verification Passed

All contributors have signed the CLA.

| Contributor | Signed As |
| --- | --- |
| @Kesavaraja67 | @Kesavaraja67 |
| @Anshgrover23 | @Anshgrover23 |

@Anshgrover23 Anshgrover23 (Collaborator) commented

@Kesavaraja67 Kindly pull the latest changes; the lint issue is fixed on main.

@Kesavaraja67 Kesavaraja67 (Contributor, Author) commented

@Anshgrover23 I have added the documentation and updated with the main branch.
The PR is ready for review.


@Kesavaraja67 Kesavaraja67 (Contributor, Author) left a comment

Addressed the requested changes.

@Kesavaraja67 Kesavaraja67 marked this pull request as ready for review January 16, 2026 10:11
@Anshgrover23 Anshgrover23 (Collaborator) commented

@Kesavaraja67 The issue has been moved to Pro, as we are handling it internally now.
