
[REQUEST]: Ability to offload stream encoding to a different GPU. #1035

@fegarza7

Description


Is your feature request related to a problem?

Hello team,

I am trying to build a multiplayer game that doesn't require players to download a client or own powerful hardware, by using this Unity Render Streaming technology. All of the examples in the repo are based on one player (user) and multiple streams of the same camera. However, from my research I am arriving at the conclusion that if I wanted to use Unity Render Streaming on a headless GPU server to stream the game to multiple players, each with an individual camera/stream, I would run into a streaming bottleneck.

The issue is mainly with the NVIDIA hardware encoder (NVENC) on the server GPU. Most GPUs that are not intended for server usage can only support up to 8 concurrent encode sessions. Dedicated GPUs like the L4, L40S, RTX 5090, etc. have 2 or 3 NVENC encoders, each of which handles roughly 10-15 1080p30 streams (21-30 at 720p30). These GPUs can easily host many Unity instances since they have 24-80 GB of VRAM, but most of that capacity is wasted because the NVENC stream limit is the real constraint; a back-of-the-envelope calculation follows below.
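
To make the trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. The VRAM-per-instance figure, the players-per-instance count, and the per-GPU stream budgets are illustrative assumptions on my part, not measured values; the point is only that the player count is capped by NVENC sessions long before VRAM runs out.

```python
# Back-of-the-envelope check of which resource runs out first on a single GPU.
# Every number below is an illustrative assumption, not a measured value.

def max_players_per_gpu(nvenc_streams, vram_gb,
                        vram_per_instance_gb=1.8,   # assumed headless Unity footprint
                        players_per_instance=16,    # assumed players hosted per instance
                        usable_vram_fraction=0.8):
    """Each player needs their own camera/stream, so the total player count
    can never exceed the GPU's NVENC session budget, even when VRAM would
    allow far more Unity instances."""
    instances = int(vram_gb * usable_vram_fraction / vram_per_instance_gb)
    players_by_vram = instances * players_per_instance
    return min(players_by_vram, nvenc_streams)

# Consumer-class card: ~24 GB VRAM but only ~8 encode sessions -> NVENC-bound.
print(max_players_per_gpu(nvenc_streams=8, vram_gb=24))    # -> 8
# L40S-class card: ~48 GB VRAM, ~48 conservative 1080p30 streams -> still NVENC-bound.
print(max_players_per_gpu(nvenc_streams=48, vram_gb=48))   # -> 48
```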

There are GPUs dedicated specifically to video streaming, like the A16, that have low per-GPU VRAM but can process more than enough streams; however, they lack the compute needed to run the Unity instances.

So I was thinking: if URS had the ability to use a different GPU (in the same system), or multiple GPUs, to offload all of the encoding while the main GPU runs the Unity game instances, that would remove the streaming bottleneck, and the encoded streams could then be sent on to the signaling server. In other words, if you have more than one GPU, you could define which GPUs handle the encoding; a sketch of what I have in mind follows below.
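
To illustrate the kind of behaviour I am asking for, here is a minimal Python sketch of a least-loaded scheduler that hands each new player stream to a dedicated encoder GPU while the render GPU stays free for Unity. This is not an existing Unity Render Streaming API; `EncoderGpu`, `EncoderPool`, and the capacity numbers are all hypothetical.

```python
from dataclasses import dataclass

# Illustration of the requested behaviour only: Unity Render Streaming has no
# such API today. EncoderGpu, EncoderPool and the capacity numbers are hypothetical.

@dataclass
class EncoderGpu:
    device_index: int   # CUDA/NVENC device index on the host
    max_streams: int    # conservative per-GPU encode-session budget
    active: int = 0

class EncoderPool:
    """Assign each new player stream to the least-loaded encoder GPU,
    leaving the render GPU (e.g. device 0) free for the Unity instances."""

    def __init__(self, gpus):
        self.gpus = list(gpus)

    def assign(self):
        candidates = [g for g in self.gpus if g.active < g.max_streams]
        if not candidates:
            raise RuntimeError("all encoder GPUs are at their NVENC limit")
        gpu = min(candidates, key=lambda g: g.active / g.max_streams)
        gpu.active += 1
        return gpu.device_index

    def release(self, device_index):
        for g in self.gpus:
            if g.device_index == device_index and g.active > 0:
                g.active -= 1
                return

# Example: render on GPU 0, encode on GPUs 1 and 2 (e.g. an A16-style board).
pool = EncoderPool([EncoderGpu(1, max_streams=16), EncoderGpu(2, max_streams=16)])
print(pool.assign())  # -> 1
print(pool.assign())  # -> 2 (least loaded wins)
```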

So that's my feature request; I hope it can be added in the near future.

Describe the solution you'd like

The ability to offload the encoding of Unity camera streams to one or more other GPUs in the same system.

Describe alternatives you've considered

As of today, I will have to launch separate instances once the number of streams approaches the encoder bottleneck, even if plenty of VRAM is still available; a rough sizing sketch follows below.
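
For planning purposes under the current behaviour, here is a minimal sketch of how I size the number of single-GPU servers; the per-GPU stream caps and the 10% headroom are assumptions, not measurements.

```python
import math

# Rough sizing for the current workaround: scale out by NVENC capacity, not by
# VRAM. The per-GPU stream caps and the 10% headroom are assumptions.

def nodes_needed(target_players, streams_per_gpu_1080p30=8, headroom=0.9):
    """Number of single-GPU servers required if every player needs one
    encoded 1080p30 stream and we keep ~10% headroom per encoder."""
    usable = max(int(streams_per_gpu_1080p30 * headroom), 1)
    return math.ceil(target_players / usable)

print(nodes_needed(100))                               # consumer-class cap -> 15 nodes
print(nodes_needed(100, streams_per_gpu_1080p30=48))   # L40S-class cap     -> 3 nodes
```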

Additional context

| GPU | VRAM | Est. NVENC / notes | Est. max Unity instances (80% VRAM) | Est. max streams @ 720p (conservative) | Est. max streams @ 1080p (conservative) | Quick notes |
|---|---|---|---|---|---|---|
| NVIDIA L4 | 24 GB | NVENC present, good performance for 720p | ~12 | ~64 | ~32 | Cost-efficient starter for mostly 720p. |
| NVIDIA L40S | 48 GB | NVENC (Ada generation, higher throughput) | ~21 | ~96 | ~48 | Best next step when the L4 hits ~80%. |
| RTX 5090 | 32 GB | Blackwell (newer-gen NVENC, AV1 capable) | ~14-15 | ~90-96 | ~40-48 | Very high compute; good for mixed workloads. |
| RTX 4090 | 24 GB | Consumer NVENC (Ada) | ~10 | ~60-64 | ~30-32 | Good small-node option; cheaper pods. |
| RTX 3090 | 24 GB | Older consumer NVENC | ~10 | ~60-64 | ~30-32 | Similar to the 4090 but older architecture/efficiency. |
| NVIDIA A40 | 48 GB | Data-center Ampere NVENC (lower encoder density than Ada) | ~21 | ~64 | ~32 | Good for GPU compute plus some streaming, but NVENC density is lower. |
| NVIDIA A100 | 80 GB | No NVENC (compute-only) | ~36 | 0 (no NVENC) | 0 | Not suitable unless encoding is offloaded elsewhere. |
| NVIDIA T4 | 16 GB | NVENC present (Turing); small-form-factor server GPU | ~7 | ~48 | ~24 | Low-power option for light pods / cheap region capacity. |
| NVIDIA A10 | 24 GB | Data-center GPU; conservative NVENC capacity assumed here | ~10 | ~32 | ~16 | Good midrange server GPU: more VRAM than the T4, modest NVENC. |
| RTX 6000 Ada (workstation) | 48 GB | Ada-generation workstation GPU (high VRAM, high NVENC throughput) | ~21 | ~96 | ~48 | Workstation equivalent of the L40S; excellent VRAM plus encoding throughput. |
| RTX 4080 | 16 GB | GeForce Ada family (NVENC present) | ~7 | ~48 | ~24 | Lower VRAM means fewer Unity instances; OK for small pods. |
| NVIDIA A16 | 4×16 GB (64 GB aggregate, vGPU focus) | Designed for maximum video density; multiple on-board encoders/decoders | ~29 | ~96 | ~48 | Built to host many virtual desktops/streams; excellent encoder density per board. |
