better worker allocation for gunicorn + CUDA environment #265
- test_issue_243.py: Test script to replicate VRAM duplication issue
- ISSUE_243_ANALYSIS.md: Initial analysis of gunicorn/CUDA issue
- ISSUE_243_REAL_WORLD_ANALYSIS.md: Analysis of whisper-wrapper implementation

These are investigation/documentation artifacts, not SDK changes.
Removed outdated files:
- test_issue_243.py (app-level test, no longer relevant)
- ISSUE_243_ANALYSIS.md (superseded)
- ISSUE_243_REAL_WORLD_ANALYSIS.md (superseded)

New consolidated documentation:
- ISSUE_243_INVESTIGATION.md: Complete investigation with SDK-level solution

Key changes from previous analysis:
- Focus on SDK-level VRAM management (not app-level)
- Runtime VRAM checking via enhanced _profile_cuda_memory decorator
- _get_model_requirements() API for apps to declare memory needs
- Conservative worker count when CUDA detected
- Runtime status via ?includeVRAM=true parameter
- Addresses dynamic VRAM availability (not static calculation)
- Process-safe torch.cuda.empty_cache() usage documented
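The "conservative worker count when CUDA detected" item could look roughly like the sketch below. This is not the SDK's code: the function name `conservative_worker_count`, the `cuda_cap` parameter, and the default cap of one worker are hypothetical, chosen only to illustrate the idea that each gunicorn worker is a separate process that would load its own copy of the model into VRAM. The CPU-only branch uses the `(2 x cores) + 1` rule of thumb from the gunicorn documentation.

```python
import os
from typing import Optional


def conservative_worker_count(
    cuda_available: bool,
    cpu_count: Optional[int] = None,
    cuda_cap: int = 1,  # hypothetical default, not taken from the PR
) -> int:
    """Pick a gunicorn worker count, capping it when a GPU is in use.

    Without CUDA, fall back to gunicorn's documented rule of thumb,
    (2 x cores) + 1. With CUDA, cap the count so that per-process model
    copies do not multiply VRAM usage.
    """
    cores = cpu_count or os.cpu_count() or 1
    cpu_default = 2 * cores + 1
    if cuda_available:
        # Never return fewer than one worker, never more than the CPU default.
        return max(1, min(cpu_default, cuda_cap))
    return cpu_default
```

In a real deployment the `cuda_available` flag would presumably come from `torch.cuda.is_available()`; it is passed in explicitly here so the sketch runs without PyTorch installed.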
Updated investigation document with:
- Component 5: Automatic Memory Profiling
- 80% VRAM requirement for first request (conservative)
- Historical measurement for subsequent requests
- Hash-based filenames for race-condition-safe persistence
- Atomic writes via temp file + rename
- Updated request flow to show 3-level priority:
  1. App override (explicit)
  2. Historical measurement
  3. Conservative 80%
- Updated implementation checklist with new components
- Revised open questions and conclusion
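A minimal sketch of how the persistence and priority items above might fit together, using only the standard library. All names here (`profile_path`, `save_measurement`, `load_measurement`, `required_vram`, the `vram_` filename prefix, the JSON layout) are hypothetical and not taken from the PR. It also assumes that "80%" means 80% of the device's total VRAM, which the commit message does not state explicitly.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

CONSERVATIVE_FRACTION = 0.8  # assumed to apply to total VRAM


def profile_path(cache_dir: Path, params: dict) -> Path:
    """Derive a filename from a hash of the parameters.

    Keys are sorted so that equal parameter sets always map to the same file.
    """
    payload = json.dumps(params, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    return cache_dir / f"vram_{digest[:16]}.json"


def save_measurement(path: Path, peak_bytes: int) -> None:
    """Write to a temp file in the same directory, then rename over the target.

    os.replace is atomic when source and target are on the same filesystem,
    so concurrent readers see either the old file or the new one.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({"peak_bytes": peak_bytes}, f)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if the write or rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_measurement(path: Path) -> Optional[int]:
    """Return the stored peak usage, or None if missing or unreadable."""
    try:
        with open(path) as f:
            return int(json.load(f)["peak_bytes"])
    except (FileNotFoundError, ValueError, KeyError):
        return None


def required_vram(
    total_bytes: int,
    app_override: Optional[int],
    measured: Optional[int],
) -> int:
    """Resolve the requirement: app override, then history, then 80%."""
    if app_override is not None:
        return app_override
    if measured is not None:
        return measured
    return int(total_bytes * CONSERVATIVE_FRACTION)
```

On a first request there is no stored file, so `load_measurement` returns `None` and `required_vram` falls through to the conservative fraction; once a measurement has been saved, later requests with the same parameters use it instead.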
… isn't sufficient
e2309be to b7579c2
…hon version that provides parameter hashing
b7579c2 to 328c4c4
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #265 +/- ##
==========================================
Coverage ? 59.45%
==========================================
Files ? 6
Lines ? 846
Branches ? 0
==========================================
Hits ? 503
Misses ? 343
Partials ? 0
This turned out to be massive over-engineering based on a wrong assumption and outdated information. Closing without merging. I will open a new PR to address the issue more neatly.
addresses #243