better worker allocation for gunicorn + CUDA environment #265

keighrim · 2025-11-21T03:00:33Z

addresses #243

- test_issue_243.py: Test script to replicate VRAM duplication issue
- ISSUE_243_ANALYSIS.md: Initial analysis of gunicorn/CUDA issue
- ISSUE_243_REAL_WORLD_ANALYSIS.md: Analysis of whisper-wrapper implementation

These are investigation/documentation artifacts, not SDK changes.

Removed outdated files:
- test_issue_243.py (app-level test, no longer relevant)
- ISSUE_243_ANALYSIS.md (superseded)
- ISSUE_243_REAL_WORLD_ANALYSIS.md (superseded)

New consolidated documentation:
- ISSUE_243_INVESTIGATION.md: Complete investigation with SDK-level solution

Key changes from previous analysis:
- Focus on SDK-level VRAM management (not app-level)
- Runtime VRAM checking via enhanced _profile_cuda_memory decorator
- _get_model_requirements() API for apps to declare memory needs
- Conservative worker count when CUDA detected
- Runtime status via ?includeVRAM=true parameter
- Addresses dynamic VRAM availability (not static calculation)
- Process-safe torch.cuda.empty_cache() usage documented
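To illustrate the "conservative worker count when CUDA detected" item, here is a minimal sketch, not the code from this PR. `pick_worker_count` and its arguments are hypothetical names; the CPU-based fallback uses the `2 * cores + 1` rule of thumb from the gunicorn documentation. In a real app the total VRAM could come from something like `torch.cuda.mem_get_info()`, which is left out here so the sketch runs without torch.

```python
import multiprocessing


def pick_worker_count(total_vram_bytes, model_vram_bytes, cpu_count=None):
    """Hypothetical helper: cap gunicorn workers by how many model copies fit in VRAM.

    Each worker process loads its own copy of the model, so the number of
    workers should not exceed total VRAM divided by the per-model requirement.
    """
    if cpu_count is None:
        cpu_count = multiprocessing.cpu_count()
    # Rule of thumb for CPU-bound gunicorn deployments.
    cpu_based = cpu_count * 2 + 1
    if not total_vram_bytes or not model_vram_bytes:
        # No CUDA device detected, or the app declared no requirement.
        return cpu_based
    fits = total_vram_bytes // model_vram_bytes
    # Always keep at least one worker, even if the model may not fit.
    return max(1, min(cpu_based, fits))


GIB = 1024 ** 3
workers = pick_worker_count(24 * GIB, 10 * GIB, cpu_count=8)
```

With 24 GiB of VRAM and a 10 GiB model, only two copies fit, so the sketch picks two workers instead of the 17 the CPU rule alone would give.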

Updated investigation document with:
- Component 5: Automatic Memory Profiling
  - 80% VRAM requirement for first request (conservative)
  - Historical measurement for subsequent requests
  - Hash-based filenames for race-condition-safe persistence
  - Atomic writes via temp file + rename
- Updated request flow to show 3-level priority:
  1. App override (explicit)
  2. Historical measurement
  3. Conservative 80%
- Updated implementation checklist with new components
- Revised open questions and conclusion
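The 3-level priority and the persistence scheme described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the SDK implementation: all function names, the JSON file layout, and the choice of SHA-256 are hypothetical.

```python
import hashlib
import json
import os
import tempfile


def profile_path(directory, params):
    """Hypothetical: derive a filename from a hash of the request parameters."""
    key = json.dumps(params, sort_keys=True).encode("utf-8")
    return os.path.join(directory, hashlib.sha256(key).hexdigest()[:16] + ".json")


def save_peak(path, peak_bytes):
    """Write to a temp file in the same directory, then rename it into place.

    os.replace is atomic on the same filesystem, so concurrent workers never
    observe a half-written profile file.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"peak_bytes": peak_bytes}, f)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_peak(path):
    """Return the recorded peak, or None if nothing usable is on disk."""
    try:
        with open(path) as f:
            return json.load(f)["peak_bytes"]
    except (OSError, ValueError, KeyError):
        return None


def required_vram(total_bytes, app_override=None, historical=None):
    """Resolve the VRAM requirement: app override, then history, then 80%."""
    if app_override is not None:
        return app_override
    if historical is not None:
        return historical
    return total_bytes * 80 // 100  # conservative default for a first request
```

A first request with no recorded history falls through to the 80% default; once `save_peak` has stored a measurement, later requests with the same parameters resolve to that value unless the app declares its own.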

codecov · 2025-11-21T11:07:46Z

Codecov Report

❌ Patch coverage is 19.77401% with 142 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@69bdca9). Learn more about missing BASE report.
⚠️ Report is 2 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `clams/app/__init__.py` | 18.79% | 108 Missing ⚠️ |
| `clams/restify/__init__.py` | 6.06% | 31 Missing ⚠️ |
| `clams/appmetadata/__init__.py` | 72.72% | 3 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             develop     #265   +/-   ##
==========================================
  Coverage           ?   59.45%           
==========================================
  Files              ?        6           
  Lines              ?      846           
  Branches           ?        0           
==========================================
  Hits               ?      503           
  Misses             ?      343           
  Partials           ?        0

| Flag | Coverage Δ |
|---|---|
| unittests | `59.45% <19.77%> (?)` |

keighrim · 2025-11-22T12:35:58Z

This turned out to be massive over-engineering based on a wrong assumption and outdated information. Closing without merge. Will start a new PR to address the issue more neatly.

claude and others added 8 commits November 8, 2025 11:48

added gpu_mem_* fields to app metadata

a56b961

gunicorn workers are configured based on the set gpu_mem_min appmetadata

d0759ac

updated cuda profiler to check available vram, reject request if vram isn't sufficient

6e71270

updated documentation regarding gpu apps

1576fea

added test for gpu-related new features, cleaned up planning document

e1e0d0f

clams-bot added this to infra Nov 21, 2025

github-project-automation bot moved this to Todo in infra Nov 21, 2025

keighrim mentioned this pull request Nov 21, 2025

gunicorn, torch, and cuda #243

Closed

keighrim force-pushed the claude/investigate-issue-243-011CUvLcJcFferWXmFu4nKu1 branch from e2309be to b7579c2 Compare November 21, 2025 03:35

disabled type checker for conditional torch imports, updated mmif-python version that provides parameter hashing

328c4c4

keighrim force-pushed the claude/investigate-issue-243-011CUvLcJcFferWXmFu4nKu1 branch from b7579c2 to 328c4c4 Compare November 21, 2025 11:05

changed vram usage estimation logic, more documentation on env vars

c7b12bd

keighrim closed this Nov 22, 2025

github-project-automation bot moved this from Todo to Done in infra Nov 22, 2025

keighrim deleted the claude/investigate-issue-243-011CUvLcJcFferWXmFu4nKu1 branch November 22, 2025 12:36
