Lines Matching defs:of

12 3. Issues can be fixed within Mesa releases, independently of the schedule of other projects.
17 In case of GCN / RDNA the parallelism is achieved by executing the shader on several waves, and each wave has several lanes (32 or 64).
24 so in case of divergent control flow, the GPU must execute both code paths, each with some lanes disabled.
29 * logical CFG - directly translated from NIR and shows the intended control flow of the program.
32 Note that all nodes of the logical CFG also participate in the linear CFG, but not vice versa.
40 1. The divergence analysis pass calculates for each SSA definition if its value is guaranteed to be uniform across all threads of the workgroup.
42 3. Actual instruction selection. The advanced divergence analysis allows for better usage of the scalar unit, scalar memory loads and the scalar register file.
44 We have two types of instructions:
51 Temporaries can be fixed to a specific register, or just specify a register class (either a single register, or a vector of several registers).
55 The value numbering pass is necessary for two reasons: the lack of descriptor load representation in NIR,
61 In this phase, simpler instructions are combined into more complex instructions (like the different versions of multiply-add as well as neg, abs, clamp, and output modifiers) and constants are inlined, moves are eliminated, etc.
64 #### Setup of reduction temporaries
71 In the GCN/RDNA architecture, there is a special register called `exec` which is used for manually controlling which VALU threads (a.k.a. *lanes*) are active. The value of `exec` has to change in divergent branches, loops, etc., and it needs to be restored after the branch or loop is complete. This pass ensures that the correct lanes are active in every branch.
75 A live-variable analysis is used to calculate the register need of the shader.
89 The register allocator works on SSA (as opposed to LLVM's, which works on virtual registers). The SSA properties guarantee that there are always as many registers available as needed. The problem is that some instructions require a vector of neighboring registers to be available, but the free registers might be scattered. In this case, the register allocator inserts shuffle code (moving some temporaries to other registers) to make space for the variable. The assumption is that it is (almost) always better to have a few more moves than to sacrifice a wave. The RA does SSA-reconstruction on the fly, which makes its runtime linear.
93 The next step is a pass out of SSA by inserting parallelcopies at the end of blocks to match the phi nodes' semantics.
117 Which software stage gets executed on which hardware stage depends on what kind of software stages are present in the current pipeline.
123 #### Glossary of software stages
132 #### Glossary of hardware stages
135 * HS = Hull Shader, the HW equivalent of a Tessellation Control Shader, runs before the fixed function hardware performs tessellation
150 but from a SW perspective it's not part of the traditional pipeline,
159 This might be confusing due to a mismatch between the number of invocations of these shaders.
162 and there is some code at the beginning of each part to ensure the correct number of invocations by disabling some threads.
186 * HW ES and GS stages are merged, so ES outputs can go to LDS instead of VRAM
255 some bug reports of inexplicable crashes with assertion failures you can't reproduce.
260 To see the full list of downstream compiler flags, you can use e.g. `rpm --eval "%optflags"`
272 Note that if any of these change the output, it does not necessarily mean that the error is there, as register assignment also changes.