[REQUEST]: Ability to offload stream encoding to a different GPU.

### Is your feature request related to a problem?

Hello team,

I am trying to build a multiplayer game that players can run without downloading a client or owning powerful hardware, using Unity Render Streaming. All the examples in this repo are based on one player (user) and multiple streams of the same camera. However, my research leads me to conclude that if I use Unity Render Streaming on a headless GPU server to stream the game to multiple players, each with an individual camera/stream, I will run into streaming bottlenecks.

The issue is mainly the server GPU's hardware encoder (NVENC). Most GPUs that are not intended for server use support at most 8 streams. Dedicated GPUs like the L4, L40S, or RTX 5090 have 2 or 3 NVENC encoders, each of which handles roughly 10-15 1080p30 streams (21-30 at 720p30). These GPUs have 24-80 GB of VRAM, so they could easily host many Unity instances, but that capacity goes to waste because the NVENC stream limit is reached first.

There are also GPUs built specifically for video streaming, like the A16, which have low VRAM but can encode more than enough streams. However, they lack the compute power needed to run the Unity instances.

So I was thinking: if URS could use a different GPU in the same system, or multiple GPUs, to offload all of the encoding while the main GPU runs the Unity game instances, that would remove the streaming bottleneck. The encoded stream would then be sent to the signaling server. With more than one GPU in the system, you could define which GPUs handle the encoding.
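To make the idea concrete, here is a minimal sketch of how streams might be spread across designated encoder GPUs. It is not URS code: the `EncoderGpu` class, the `assign_stream` function, the device indices, and the per-GPU stream budgets are all hypothetical names and values used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EncoderGpu:
    """A GPU reserved for encoding (hypothetical model, not a URS type)."""
    index: int        # assumed device index in the system
    max_streams: int  # assumed stream budget for this GPU
    streams: list[str] = field(default_factory=list)

    @property
    def free(self) -> int:
        return self.max_streams - len(self.streams)


def assign_stream(stream_id: str, encoders: list[EncoderGpu]) -> int:
    """Place a stream on the encoder GPU with the most free slots."""
    best = max(encoders, key=lambda gpu: gpu.free)
    if best.free <= 0:
        raise RuntimeError("all encoder GPUs are at capacity")
    best.streams.append(stream_id)
    return best.index


# GPU 0 would run the Unity instances; GPUs 1 and 2 only encode.
encoders = [EncoderGpu(index=1, max_streams=2), EncoderGpu(index=2, max_streams=1)]
placements = [assign_stream(f"player-{n}", encoders) for n in range(3)]
print(placements)
```

In a real implementation the budgets would have to come from the actual NVENC limits of each card, and the rendered frames would need to be copied from the rendering GPU to the encoding GPU, which this sketch does not model.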

That is my feature request. I hope it can be added in the near future.


### Describe the solution you'd like

**Ability to offload streaming of Unity cameras to different GPUs in the system to be encoded.**

### Describe alternatives you've considered

As of today, I have to launch separate instances when the number of streams approaches the bottleneck, even if plenty of VRAM is available.
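As a rough illustration of what this workaround costs, the sketch below estimates how many separate GPU nodes are needed for a given player count. The `nodes_needed` function and the 80% headroom threshold are assumptions made for this example, and the figure of 32 streams is the estimated 1080p capacity of an L4 from the table under "Additional context", not a measured value.

```python
import math


def nodes_needed(players: int, streams_per_gpu: int, headroom: float = 0.8) -> int:
    """Estimate how many single-GPU nodes are needed for `players` streams.

    `headroom` is the assumed fraction of the encoder budget to fill
    before launching another node.
    """
    usable = math.floor(streams_per_gpu * headroom)
    if usable < 1:
        raise ValueError("no usable encoder capacity per GPU")
    return math.ceil(players / usable)


# 100 players at 1080p on GPUs estimated at ~32 streams each.
print(nodes_needed(100, 32))  # prints 4
```

Each extra node is launched because of the encoder limit alone, regardless of how much VRAM is still free on the existing ones.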

### Additional context

| GPU                            |                                         VRAM | Est. NVENC / notes                                                                                                                                                     | Est. max Unity instances (80% VRAM) | Est. max streams @ **720p** (conservative) | Est. max streams @ **1080p** (conservative) | Quick notes                                                                        |
| ------------------------------ | -------------------------------------------: | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------: | -----------------------------------------: | ------------------------------------------: | ---------------------------------------------------------------------------------- |
| **NVIDIA L4**                  |                                        24 GB | NVENC present, good perf for 720p                                                                                                                                      |                            **~12** |                                   **~64** |                                    **~32** | Cost-efficient starter for mostly 720p.                                            |
| **NVIDIA L40S**                |                                        48 GB | NVENC (Ada generation — higher throughput)                                                                                                                             |                            **~21** |                                   **~96** |                                    **~48** | Best next-step when L4 hits ~80%.                                                 |
| **RTX 5090**                   |                                        32 GB | Blackwell (newer gen NVENC/AV1 capable)                                                                                                                                |                         **~14–15** |                                **~90–96** |                                 **~40–48** | Very high compute; good for mixed workloads.                                       |
| **RTX 4090**                   |                                        24 GB | Consumer NVENC (Ada)                                                                                                                                                   |                            **~10** |                                **~60–64** |                                 **~30–32** | Good small-node option; cheaper pods.                                              |
| **RTX 3090**                   |                                        24 GB | Older consumer NVENC                                                                                                                                                   |                            **~10** |                                **~60–64** |                                 **~30–32** | Similar to 4090 but older architecture / efficiency.                               |
| **NVIDIA A40**                 |                                        48 GB | Data-center Ampere NVENC (lower encoder density than Ada)                                                                                                              |                            **~21** |                                   **~64** |                                    **~32** | Good for GPU compute + some streaming, but NVENC density lower.                    |
| **NVIDIA A100**                |                                        80 GB | **No NVENC** (compute-only)                                                                                                                                            |                            **~36** |                           **0 (no NVENC)** |                                       **0** | Not suitable unless you offload encoding elsewhere.                                |
| **NVIDIA T4**                  |                                        16 GB | NVENC present (Turing); small form-factor server GPU. ([NVIDIA][1])                                                                                                    |                             **~7** |                                   **~48** |                                    **~24** | Low-power option for light pods / cheap region capacity.                           |
| **NVIDIA A10**                 |                                        24 GB | Data-center GPU — product page shows encoding capabilities; conservative NVENC capacity used here. ([NVIDIA][2])                                                       |                            **~10** |                                   **~32** |                                    **~16** | Good midrange server GPU: more VRAM than T4, modest NVENC.                         |
| **RTX 6000 Ada (workstation)** |                                        48 GB | Ada-generation workstation GPU (high VRAM); high encoding throughput (Ada NVENC). ([NVIDIA][3])                                                                        |                            **~21** |                                   **~96** |                                    **~48** | Workstation equivalent of L40S; excellent VRAM + encoding throughput.              |
| **RTX 4080**                   |                                        16 GB | GeForce Ada family (NVENC present). ([NVIDIA][4])                                                                                                                      |                             **~7** |                                   **~48** |                                    **~24** | Lower VRAM means fewer Unity instances; OK for small pods.                         |
| **NVIDIA A16**                 | 4×16 GB (board: 64 GB aggregate, vGPU focus) | **Designed for max video density** — multi-encoder board (A16 datasheet: multiple on-chip encoders / decoders). Good for many concurrent streams. ([NVIDIA Images][5]) |                            **~29** |                                   **~96** |                                    **~48** | Built to host many virtual desktops/streams — excellent encoder density per board. |

[1]: https://www.nvidia.com/en-us/data-center/tesla-t4/ "NVIDIA T4 Tensor Core GPU for AI Inference | NVIDIA Data Center"
[2]: https://www.nvidia.com/en-us/data-center/products/a10-gpu/ "NVIDIA A10 Tensor Core GPU"
[3]: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/rtx-6000/proviz-print-rtx6000-datasheet-web-2504660.pdf "[PDF] NVIDIA RTX 6000 Ada Generation"
[4]: https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4080-family/ "GeForce RTX 4080 SUPER and RTX 4080 Graphics Cards | NVIDIA"
[5]: https://images.nvidia.com/content/Solutions/data-center/vgpu-a16-datasheet.pdf "[PDF] vgpu-a16-datasheet.pdf - NVIDIA"



[REQUEST]: Ability to offload stream encoding to a different GPU. #1035
