Dunfey · Hotel WWDC as data, est. 1983

WWDC20 · 34 min · Graphics & Games

Bring your Metal app to Apple silicon Macs

Meet the Tile Based Deferred Rendering (TBDR) GPU architecture for Apple silicon Macs — the heart of your Metal app or game’s graphics performance. Learn how you can translate or port your graphics-intensive app over to Apple silicon, and how to take advantage of TBDR and Metal when building natively for the platform. We’ll look at how TBDR compares with the Immediate Mode Rendering pipeline of older Macs, go through common issues you may face when bringing an app or game over, and explore how to offer incredible performance when building with the native SDK. We’ve designed this session in tandem with “Optimize Metal Performance for Apple silicon Macs.” After you’ve watched this session be sure to check that out next.

Watch at developer.apple.com ↗

Code shown on screen · 3 snippets

Metal feature detection swift · at 14:23 ↗
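{
    // True when the GPU supports the Apple GPU family 5 feature set
    self.appleGPUFeatures = metalDevice.supportsFamily(.apple5)

    // SIMD-group width of this compute pipeline; query it instead of hard-coding it
    self.simdgroupSize = computePipeline.threadExecutionWidth

    // Whether the system reports this GPU as low-power (macOS only)
    self.isLowPower = metalDevice.isLowPower
}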
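For context, here is a minimal sketch of where the values in the snippet above could come from. It is not the session's code: the `GPUCapabilities` struct, the `queryCapabilities()` function and the throwaway `probe` kernel are hypothetical names introduced for illustration, and the sketch assumes macOS 10.15 or later as the deployment target. Only `MTLCreateSystemDefaultDevice`, `makeLibrary(source:options:)`, `makeFunction(name:)`, `makeComputePipelineState(function:)`, `supportsFamily(_:)`, `threadExecutionWidth` and `isLowPower` are Metal API.

```swift
#if os(macOS)
import Metal

// Hypothetical container for the three values queried in the snippet above.
struct GPUCapabilities {
    let supportsApple5: Bool
    let simdgroupSize: Int
    let isLowPower: Bool
}

// Throwaway kernel (an assumption of this sketch); it exists only so that a
// compute pipeline can be built and asked for its threadExecutionWidth.
let probeSource = """
#include <metal_stdlib>
using namespace metal;

kernel void probe(device uint *out [[ buffer(0) ]],
                  uint tid [[ thread_position_in_grid ]])
{
    out[tid] = tid;
}
"""

// Returns nil when no Metal device or kernel function is available.
func queryCapabilities() throws -> GPUCapabilities? {
    guard let device = MTLCreateSystemDefaultDevice() else { return nil }

    let library = try device.makeLibrary(source: probeSource, options: nil)
    guard let function = library.makeFunction(name: "probe") else { return nil }
    let pipeline = try device.makeComputePipelineState(function: function)

    return GPUCapabilities(
        supportsApple5: device.supportsFamily(.apple5),
        simdgroupSize: pipeline.threadExecutionWidth,
        isLowPower: device.isLowPower)
}

// Usage: print whatever the current machine reports.
if let caps = try? queryCapabilities() {
    print(caps)
}
#endif
```

The SIMD-group width is read from a pipeline state rather than from the device, which is why the sketch has to compile a kernel before it can report that value.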
Enabling position invariance swift · at 20:58 ↗
// Renderer.swift
let options = MTLCompileOptions()
options.preserveInvariance = true

library = try device.makeLibrary(source: sourceString,
                                 options: options)


// vertex.metal
struct VertexOut
{
    float4 pos [[position, invariant]];
    float data;
};
Threadgroup synchronization metal · at 24:25 ↗
// Correct synchronization

// launched with threadgroup size = 64
kernel void kernelMain(uint tid [[ thread_index_in_threadgroup ]],
                       uint simd_size [[ threads_per_simdgroup ]],
                       device uint * res [[ buffer(0) ]])
{
    threadgroup uint buf[64];
    
    buf[tid] = initBuffer(tid);

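    // A 64-wide SIMD group covers the whole 64-thread threadgroup, so a
    // SIMD-group barrier is enough; otherwise the threadgroup spans several
    // SIMD groups and needs a full threadgroup barrier.
    if (simd_size == 64u)
        simdgroup_barrier(mem_flags::mem_threadgroup);
    else
        threadgroup_barrier(mem_flags::mem_threadgroup);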

    uint index = (tid < 32) ? tid + 32 : tid - 32;
    res[tid] = buf[tid] + buf[index];
}
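
To illustrate why the kernel above branches on the SIMD-group width, the following is a small CPU-side model of its access pattern in plain Swift (no Metal required). It is a sketch, not code from the session: the helper names are hypothetical, and it assumes threads are packed into SIMD groups in index order, so that thread `tid` belongs to SIMD group `tid / simdWidth`.

```swift
// CPU-side model of the kernel's access pattern.
// Assumption: thread `tid` belongs to SIMD group `tid / simdWidth`.

let threadgroupSize = 64

/// Slot that thread `tid` reads in addition to its own, as in the kernel.
func partnerIndex(of tid: Int) -> Int {
    return tid < 32 ? tid + 32 : tid - 32
}

/// True if thread `tid` reads a slot written by a thread in another SIMD group.
func readsAcrossSimdgroups(tid: Int, simdWidth: Int) -> Bool {
    return tid / simdWidth != partnerIndex(of: tid) / simdWidth
}

/// Number of threads whose read crosses a SIMD-group boundary.
func crossGroupReads(simdWidth: Int) -> Int {
    return (0..<threadgroupSize)
        .filter { readsAcrossSimdgroups(tid: $0, simdWidth: simdWidth) }
        .count
}

print(crossGroupReads(simdWidth: 32)) // prints 64
print(crossGroupReads(simdWidth: 64)) // prints 0
```

Under these assumptions, with a 32-wide SIMD group every thread reads a slot written by the other SIMD group, so only `threadgroup_barrier` orders the write before the read. With a 64-wide SIMD group no read leaves the group, which is the case where the kernel uses `simdgroup_barrier`.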

Resources