Description
Is your feature request related to a problem?
Hello team,
I am trying to build a multiplayer game that requires neither a client download nor powerful hardware on the player's side, using Unity Render Streaming. All the examples in this repo are based on one player (user) and multiple streams of the same camera. However, from my research I have come to the conclusion that if I use Unity Render Streaming on a headless GPU server to stream the game to multiple players, each with an individual camera/stream, I will run into streaming bottlenecks.
The issue is mainly the server GPU's hardware encoder (NVENC). Most GPUs that are not meant for server use can only support up to 8 streams. Dedicated GPUs such as the L4 or L40S (and the RTX 5090, etc.) have 2 or 3 NVENC encoders, each of which handles roughly 10-15 1080p30 streams (21-30 at 720p30). These GPUs have 24-80 GB of VRAM, so they can easily hold many Unity instances, but that capacity is wasted because the NVENC stream limit is reached first.
There are GPUs dedicated specifically to video streaming, like the A16, which have low VRAM but can process more than enough streams. However, they lack the GPU power needed to run the Unity instances.
So I was thinking: if URS could use a different GPU in the same system, or multiple GPUs, to offload all of the encoding while the main GPU runs the Unity game instances, that would remove the streaming bottleneck. The encoded stream would then be sent to the signaling server. On a system with more than one GPU, you could define which GPUs handle the encoding.
So that's my feature request. I hope this can be added in the near future.
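To make the idea a bit more concrete, here is a rough sketch of the kind of policy I have in mind: each new player stream is placed on whichever encoder GPU has the most free slots. This is not an existing URS API. `EncoderGpu`, `assign_stream`, the device indices and the per-GPU stream budgets are all hypothetical names and numbers used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EncoderGpu:
    index: int        # hypothetical device index of a GPU reserved for encoding
    max_streams: int  # assumed concurrent-stream budget for that GPU
    streams: list[str] = field(default_factory=list)

    @property
    def free_slots(self) -> int:
        return self.max_streams - len(self.streams)


def assign_stream(stream_id: str, encoders: list[EncoderGpu]) -> int:
    """Place a stream on the encoder GPU with the most free slots.

    Returns the index of the chosen GPU, or raises if every GPU is full.
    """
    candidates = [gpu for gpu in encoders if gpu.free_slots > 0]
    if not candidates:
        raise RuntimeError("all encoder GPUs are at their stream budget")
    target = max(candidates, key=lambda gpu: gpu.free_slots)
    target.streams.append(stream_id)
    return target.index


# GPU 0 would run the Unity instances; GPUs 1 and 2 only encode.
encoders = [EncoderGpu(index=1, max_streams=2), EncoderGpu(index=2, max_streams=3)]
placements = [assign_stream(f"player-{n}", encoders) for n in range(5)]
print(placements)
```

With a setting like this, the rendering GPU would never spend NVENC sessions on player streams, and adding encoding capacity would just mean adding another entry to the list.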
Describe the solution you'd like
The ability to offload the encoding of Unity camera streams to other GPUs in the system.
Describe alternatives you've considered
As of today, I have to launch separate instances once the number of streams approaches the bottleneck, even if plenty of VRAM is still available.
Additional context
| GPU | VRAM | Est. NVENC / notes | Est. max Unity instances (80% VRAM) | Est. max streams @ 720p (conservative) | Est. max streams @ 1080p (conservative) | Quick notes |
|---|---|---|---|---|---|---|
| NVIDIA L4 | 24 GB | NVENC present, good perf for 720p | ~12 | ~64 | ~32 | Cost-efficient starter for mostly 720p. |
| NVIDIA L40S | 48 GB | NVENC (Ada generation — higher throughput) | ~21 | ~96 | ~48 | Best next-step when L4 hits ~80%. |
| RTX 5090 | 32 GB | Blackwell (newer gen NVENC/AV1 capable) | ~14–15 | ~90–96 | ~40–48 | Very high compute; good for mixed workloads. |
| RTX 4090 | 24 GB | Consumer NVENC (Ada) | ~10 | ~60–64 | ~30–32 | Good small-node option; cheaper pods. |
| RTX 3090 | 24 GB | Older consumer NVENC | ~10 | ~60–64 | ~30–32 | Similar to 4090 but older architecture / efficiency. |
| NVIDIA A40 | 48 GB | Data-center Ampere NVENC (lower encoder density than Ada) | ~21 | ~64 | ~32 | Good for GPU compute + some streaming, but NVENC density lower. |
| NVIDIA A100 | 80 GB | No NVENC (compute-only) | ~36 | 0 (no NVENC) | 0 | Not suitable unless you offload encoding elsewhere. |
| NVIDIA T4 | 16 GB | NVENC present (Turing); small form-factor server GPU. | ~7 | ~48 | ~24 | Low-power option for light pods / cheap region capacity. |
| NVIDIA A10 | 24 GB | Data-center GPU; product page shows encoding capabilities; conservative NVENC capacity used here. | ~10 | ~32 | ~16 | Good midrange server GPU: more VRAM than T4, modest NVENC. |
| RTX 6000 Ada (workstation) | 48 GB | Ada-generation workstation GPU (high VRAM); high encoding throughput (Ada NVENC). | ~21 | ~96 | ~48 | Workstation equivalent of L40S; excellent VRAM + encoding throughput. |
| RTX 4080 | 16 GB | GeForce Ada family (NVENC present). | ~7 | ~48 | ~24 | Lower VRAM means fewer Unity instances; OK for small pods. |
| NVIDIA A16 | 4×16 GB (board: 64 GB aggregate, vGPU focus) | Designed for max video density: multi-encoder board (A16 datasheet: multiple on-chip encoders / decoders). Good for many concurrent streams. | ~29 | ~96 | ~48 | Built to host many virtual desktops/streams; excellent encoder density per board. |
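
The two estimate columns only turn into a bottleneck once several players share one Unity instance, so here is a small sketch of how I read the table. It takes the L4 row's estimates as given and assumes every player in an instance needs their own 720p stream. `players_per_instance` is my assumption and is not part of the table, and `GpuEstimate` / `usable_instances` are just illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuEstimate:
    name: str
    max_instances: int     # "Est. max Unity instances (80% VRAM)" column
    max_streams_720p: int  # "Est. max streams @ 720p (conservative)" column


def usable_instances(gpu: GpuEstimate, players_per_instance: int) -> int:
    """Instances that fit before either VRAM or the encoder budget runs out."""
    if players_per_instance <= 0:
        raise ValueError("players_per_instance must be positive")
    encoder_limited = gpu.max_streams_720p // players_per_instance
    return min(gpu.max_instances, encoder_limited)


l4 = GpuEstimate("NVIDIA L4", max_instances=12, max_streams_720p=64)
for players in (4, 8, 16):
    n = usable_instances(l4, players)
    print(f"{players} players/instance: {n} of {l4.max_instances} instances usable")
```

With 4 players per instance, VRAM is the limit and all 12 instances fit. At 8 or 16 players per instance, the encoder budget caps the card at 8 or 4 instances, and the remaining VRAM sits idle. That idle capacity is what offloading the encoding to another GPU would recover.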