New IR, or NIR, is an IR for Mesa intended to sit below GLSL IR and Mesa IR.
Its design inherits from the various IRs that Mesa has used in the past, as
well as Direct3D assembly, and it includes a few new ideas as well. It is a
flat (in terms of using instructions instead of expressions), typeless IR,
similar to TGSI and Mesa IR. It also supports SSA (although it doesn't require
it).

Variables
=========

NIR includes support for source-level GLSL variables through a structure mostly
copied from GLSL IR. These will be used for linking and conversion from GLSL IR
(and later, from an AST), but for the most part, they will be lowered to
registers (see below) and loads/stores.

Registers
=========

Registers are lightweight: each is a structure that contains only its size, its
index for liveness analysis, and an optional name for debugging. In addition,
registers can be local to a function or global to the entire shader; the latter
will be used in ARB_shader_subroutine for passing parameters and getting return
values from subroutines. A register can also be an array, in which case it can
be accessed indirectly. Each ALU instruction (add, subtract, etc.) works
directly with registers or SSA values (see below).

SSA
===

Everywhere a register can be loaded/stored, an SSA value can be used instead.
The only exception is that arrays and indirect addressing are not supported
with SSA. Although extensions of SSA to arrays have been researched, they
usually target parallelization (which we're not interested in) and add overhead
in the form of extra copies or extra arrays (which are much more expensive than
copies between non-array registers). Each SSA use points directly to its
corresponding definition, which in turn points to the instruction it is part
of. This creates an implicit use-def chain and avoids the need for an external
structure for each SSA register.

Functions
=========

Support for function calls is mostly similar to GLSL IR. Each shader contains a
list of functions, and each function has a list of overloads. Each overload
contains a list of parameters, and may contain an implementation which specifies
the variables that correspond to the parameters and return value. Inlining a
function, assuming it has a single return point, is as simple as copying its
instructions, registers, and local variables into the target function and then
inserting copies to and from the new parameters as appropriate. After functions
are inlined and any non-subroutine functions are deleted, parameters and return
variables will be converted to global variables and then global registers. We
don't do this lowering earlier (i.e. the fortranizer idea) for a few reasons:

- If we want to do optimizations before link time, we need to have the function
  signature available at link time.

- If we do any inlining before link time, then we might wind up with the
  inlined function and the non-inlined function using the same global
  variables/registers, which would preclude optimization.

Intrinsics
==========

Any operation (other than function calls and textures) which touches a variable
or is not referentially transparent is represented by an intrinsic. Intrinsics
are similar to the idea of a "builtin function," i.e. a function declaration
whose implementation is provided by the backend, except they are more powerful
in the following ways:

- They can also load and store registers when appropriate, which limits the
  number of variables needed in later stages of the IR while obviating the need
  for a separate load/store variable instruction.

- Intrinsics can be marked as side-effect free, which permits them to be
  treated like any other instruction when it comes to optimizations. This allows
  load intrinsics to be represented as intrinsics while still being optimized
  away by dead code elimination, common subexpression elimination, etc.

Intrinsics are used for:

- Atomic operations
- Memory barriers
- Subroutine calls
- Geometry shader emitVertex and endPrimitive
- Loading and storing variables (before lowering)
- Loading and storing uniforms, shader inputs and outputs, etc. (after lowering)
- Copying variables (cases where in GLSL the destination is a structure or
  array)
- The kitchen sink
- ...
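
The point about side-effect-free intrinsics can be illustrated with a small
sketch. It is written in Python rather than Mesa's C purely for brevity, and
it is not NIR's actual data structure or pass: the `Instr` class, its fields,
and `dead_code_elimination` are hypothetical names invented for this example,
and the pass only handles straight-line code. It shows how a single flag lets a
load intrinsic be removed like any other unused instruction, while stores and
emitVertex-style intrinsics are always kept.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    """Hypothetical stand-in for an IR instruction (not NIR's real layout)."""
    op: str                            # e.g. "load_uniform", "fadd"
    dest: Optional[str] = None         # name of the value defined, if any
    srcs: Tuple[str, ...] = ()         # names of the values read
    side_effect_free: bool = True      # False for stores, barriers, etc.


def dead_code_elimination(instrs):
    """Drop side-effect-free instructions whose result is never read.

    Assumes straight-line code, so one backward walk sees every use
    before the definition it refers to.
    """
    live = set()
    kept = []
    for instr in reversed(instrs):
        needed = (not instr.side_effect_free) or (instr.dest in live)
        if needed:
            live.update(instr.srcs)
            kept.append(instr)
    kept.reverse()
    return kept


shader = [
    Instr("load_uniform", dest="a"),
    Instr("load_uniform", dest="b"),                  # result never read
    Instr("fadd", dest="c", srcs=("a", "a")),
    Instr("store_output", srcs=("c",), side_effect_free=False),
    Instr("emit_vertex", side_effect_free=False),
]

optimized = dead_code_elimination(shader)
print([i.op for i in optimized])
# prints ['load_uniform', 'fadd', 'store_output', 'emit_vertex']
```

The unused load of `b` disappears without the pass knowing anything about
loads specifically; it only consults the flag.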

Textures
========

Unfortunately, there are far too many texture operations to represent each one
of them with an intrinsic, so there's a special texture instruction similar to
the GLSL IR one. The biggest difference is that, while the texture instruction
has a sampler dereference field used just like in GLSL IR, this gets lowered to
a texture unit index (with a possible indirect offset), and the type
information of the original sampler is kept around for backends. Also, all the
non-constant sources are stored in a single array to make it easier for
optimization passes to iterate over all the sources.

Control Flow
============

As in GLSL IR, control flow consists of a tree of "control flow nodes", which
include if statements and loops, and jump instructions (break, continue, and
return). Unlike GLSL IR, though, the leaves of the tree aren't statements but
basic blocks. Each basic block also keeps track of its successors and
predecessors, and function implementations keep track of the beginning basic
block (the first basic block of the function) and the ending basic block (a fake
basic block that every return statement points to). Together, these elements
make up the control flow graph, in this case a redundant piece of information on
top of the control flow tree that will be used by almost all the optimizations.
There are helper functions to add and remove control flow nodes that also update
the control flow graph, so passes that modify control flow nodes usually don't
need to touch it directly.
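
The relationship between the control flow tree and the redundant control flow
graph can be sketched as follows. This is a simplified Python model, not NIR's
implementation: `Block`, `If`, `link`, and `build_edges` are hypothetical names,
loops and jump instructions are left out, and the sketch assumes that every if
node sits between two basic blocks and that each branch begins and ends with a
basic block.

```python
class Block:
    """Leaf of the control flow tree: a basic block with its CFG edges."""
    def __init__(self, name):
        self.name = name
        self.successors = []
        self.predecessors = []


class If:
    """Interior node of the tree: two nested control flow lists."""
    def __init__(self, then_list, else_list):
        self.then_list = then_list
        self.else_list = else_list


def link(pred, succ):
    """Record one CFG edge on both ends so the two lists cannot disagree."""
    pred.successors.append(succ)
    succ.predecessors.append(pred)


def build_edges(cf_list):
    """Derive CFG edges from a control flow list.

    Assumes each If has a Block directly before and after it, and that
    each branch list starts and ends with a Block.
    """
    for i, node in enumerate(cf_list):
        if isinstance(node, If):
            before, after = cf_list[i - 1], cf_list[i + 1]
            for branch in (node.then_list, node.else_list):
                link(before, branch[0])
                build_edges(branch)
                link(branch[-1], after)


entry, then_b = Block("entry"), Block("then")
else_b, merge = Block("else"), Block("merge")
end = Block("end")  # fake ending block that every return points to

body = [entry, If([then_b], [else_b]), merge]
build_edges(body)
link(body[-1], end)  # falling off the end acts as a return

print([b.name for b in entry.successors])    # prints ['then', 'else']
print([b.name for b in merge.predecessors])  # prints ['then', 'else']
print([b.name for b in end.predecessors])    # prints ['merge']
```

Routing every edge through one helper such as `link` is what lets passes edit
the tree without maintaining the successor and predecessor lists by hand.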