Here’s how AMD achieves a 39% performance increase in GPU-driven rendering

More tasks are handed to the GPU.

AMD's GPU-driven rendering demo.

AMD has shared a demo showcasing GPU-driven rendering with Work Graphs. This Direct3D feature aims to alleviate CPU bottlenecks by allowing the GPU to generate work instead of waiting for the CPU to send it.

Work Graphs is a new feature of the Direct3D 12 graphics API that allows the GPU to generate tasks for itself. This increased autonomy cuts CPU overhead by reducing communication between the CPU and the GPU. One Work Graphs feature, called mesh nodes, lets a graph feed directly into a mesh shader, effectively turning the work graph into an amplification shader.
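To make the idea of the GPU generating its own work more concrete, here is a minimal CPU-side sketch in Python. It is only a toy model, not the Direct3D 12 API or AMD's demo code: the `run_work_graph` function, the `cull` and `split` nodes, and the record layout are all hypothetical names invented for illustration. Each node consumes one record and may emit records for other nodes, and the whole graph drains from a single launch without going back to a host loop in between.

```python
from collections import deque

def run_work_graph(nodes, entry, root_records):
    """Toy model of a work graph: each node consumes one record and may
    emit new records for other nodes, all within a single launch."""
    queue = deque((entry, record) for record in root_records)
    draws = []       # records that reached the (hypothetical) draw stage
    processed = 0    # number of work items that passed through the graph
    while queue:
        name, record = queue.popleft()
        processed += 1
        for target, out in nodes[name](record):
            if target == "draw":
                draws.append(out)
            else:
                queue.append((target, out))
    return draws, processed

# Hypothetical two-node graph: "cull" drops invisible chunks,
# "split" expands each visible chunk into two draw records.
def cull(chunk):
    if chunk["visible"]:
        yield ("split", chunk)

def split(chunk):
    for part in range(2):
        yield ("draw", (chunk["id"], part))

nodes = {"cull": cull, "split": split}
chunks = [
    {"id": 0, "visible": True},
    {"id": 1, "visible": False},
    {"id": 2, "visible": True},
]
draws, processed = run_work_graph(nodes, "cull", chunks)
print(draws)      # [(0, 0), (0, 1), (2, 0), (2, 1)]
print(processed)  # 5 (three cull items plus two split items)
```

The amount of downstream work depends on what the earlier nodes decide at run time, which is the kind of dynamic expansion that would otherwise require the CPU to read back results and issue follow-up commands.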

Microsoft announced Work Graphs on March 11, and AMD has now released a demo showing its potential uplift. Developed in collaboration with Coburg University, the demo uses a single Work Graphs dispatch to render everything aside from the skybox and UI. In this specific instance, Work Graphs improved performance by a whopping 39% on a Radeon RX 7900 XTX compared with the conventional approach using ExecuteIndirect.

Work Graphs drawcalls chart.

This demo was run with the following parameters:

  • 6,600 draw calls/frame (after coalescing)
  • 13 million triangles/frame
  • 200,000 work items passing through the graph
  • 37 nodes and 9 draw nodes
  • <200MiB of work graph backing store memory
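For a rough sense of scale, the figures above can be turned into per-draw-call averages with a few lines of Python. This is back-of-the-envelope arithmetic only. It assumes the 39% figure refers to throughput (frames per second), which the demo description does not spell out, and the variable names are purely illustrative.

```python
# Figures quoted for the demo
draw_calls = 6_600        # per frame, after coalescing
triangles = 13_000_000    # per frame
work_items = 200_000      # passing through the graph
uplift = 0.39             # assumed to be a throughput (fps) gain

tris_per_draw = triangles / draw_calls
items_per_draw = work_items / draw_calls

# If throughput rises by 39%, frame time shrinks to 1 / 1.39 of the baseline.
frame_time_ratio = 1 / (1 + uplift)

print(f"{tris_per_draw:.0f} triangles per draw call on average")
print(f"{items_per_draw:.1f} work items per draw call on average")
print(f"frame time is {frame_time_ratio:.1%} of the ExecuteIndirect baseline")
```

Under that assumption, a 39% performance gain corresponds to roughly 28% less time spent per frame, with each draw call covering on the order of 2,000 triangles.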

GPU-driven rendering is something Unreal Engine developers have recommended for some time now. Currently, only AMD’s RDNA 3 and Nvidia’s RTX 30 series or newer GPUs support Work Graphs through their latest drivers. It is unclear whether Intel GPUs support it too, but upcoming chips likely will.

Graham Wihlidal from Epic Games said that before Work Graphs, it was difficult to perform fine-grained memory management on the GPU, which made it hard to support algorithms with dynamic work expansion. But with Work Graphs, complex and highly variable pipelines can run efficiently on the GPU. He added that the programming model also becomes significantly simpler for developers.

“Mesh nodes really close the loop in terms of providing an end-to-end replacement for Execute Indirect and moving the GPU programming model forward. Everything can move into a single graph and execute in a single dispatch, making it very easy to compose large applications from small bits and pieces. Moreover, problems like PSO switching, empty dispatches, and buffer memory management just go away, making full GPU driven pipelines accessible to many more applications and use cases than before,” said Matthäus Chajdas, software/hardware architect at AMD.

Work Graphs is the result of several years of collaboration between Microsoft and its partners. Though still in early stages, it carries many hopes and promises. Until games start implementing it, we can’t be sure of its real-world benefits, since many DirectX 12 features have yet to prove their worth for gamers.