New IR, or NIR, is an IR for Mesa intended to sit below GLSL IR and Mesa IR.
Its design inherits from the various IRs that Mesa has used in the past, as
well as Direct3D assembly, and it includes a few new ideas as well. It is a
flat (in terms of using instructions instead of expressions), typeless IR,
similar to TGSI and Mesa IR.  It also supports SSA (although it doesn't require
it).

Variables
=========

NIR includes support for source-level GLSL variables through a structure mostly
copied from GLSL IR. These will be used for linking and conversion from GLSL IR
(and later, from an AST), but for the most part, they will be lowered to
registers (see below) and loads/stores.

Registers
=========

Registers are light-weight; each one is a structure that contains only its
size, its index for liveness analysis, and an optional name for debugging. In
addition, registers can be local to a function or global to the entire shader;
the latter will be used in ARB_shader_subroutine for passing parameters and
getting return values from subroutines. A register can also be an array, in
which case it can be accessed indirectly. Each ALU instruction (add, subtract,
etc.) works directly with registers or SSA values (see below).
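
As a rough illustration of how small such a register record can be, here is a
minimal sketch in C. All names (sketch_register and its fields) are
hypothetical and are not taken from the NIR sources; in particular, treating
"size" as a component count and marking arrays with an element count are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register record; field names are illustrative only. */
struct sketch_register {
    unsigned num_components;  /* assumed meaning of the register's "size" */
    unsigned num_array_elems; /* 0 for a plain register, >0 for an array */
    unsigned index;           /* dense index, e.g. for liveness bitsets */
    const char *name;         /* optional debug name, may be NULL */
    bool is_global;           /* shader-wide rather than function-local */
};

/* Only array registers may be addressed indirectly. */
static bool sketch_register_allows_indirect(const struct sketch_register *reg)
{
    assert(reg != NULL);
    return reg->num_array_elems > 0;
}
```

A pass could call sketch_register_allows_indirect() before accepting an
indirect access; a plain (non-array) register would be rejected.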

SSA
===

Anywhere a register can be read or written, an SSA value can be used instead.
The only exception is that arrays/indirect addressing are not supported with
SSA; although research has been done on extensions of SSA to arrays before, it's
usually for the purpose of parallelization (which we're not interested in), and
adds some overhead in the form of adding copies or extra arrays (which is much
more expensive than introducing copies between non-array registers). Each SSA
use points directly to its corresponding definition, which in turn points to
the instruction it is part of. This creates an implicit use-def chain and avoids
the need for an external structure for each SSA value.
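
The pointer chain described above (use -> definition -> instruction) can be
sketched in C as follows. The type and function names are hypothetical and
chosen only for this example; they are not the names used by NIR itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction; only an opcode is needed for the sketch. */
struct sketch_instr {
    int opcode;
};

/* An SSA definition records the instruction that produces the value. */
struct sketch_ssa_def {
    struct sketch_instr *parent_instr;
};

/* An SSA use points straight at its single definition. */
struct sketch_ssa_use {
    struct sketch_ssa_def *def;
};

/* Walk the implicit use-def chain: use -> def -> producing instruction. */
static struct sketch_instr *sketch_use_producer(const struct sketch_ssa_use *use)
{
    assert(use != NULL && use->def != NULL);
    return use->def->parent_instr;
}
```

No side table is consulted: two pointer dereferences are enough to get from a
use to the instruction that defines the value.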

Functions
=========

Support for function calls is mostly similar to GLSL IR. Each shader contains a
list of functions, and each function has a list of overloads. Each overload
contains a list of parameters, and may contain an implementation which specifies
the variables that correspond to the parameters and return value. Inlining a
function, assuming it has a single return point, is as simple as copying its
instructions, registers, and local variables into the target function and then
inserting copies to and from the new parameters as appropriate. After functions
are inlined and any non-subroutine functions are deleted, parameters and return
variables will be converted to global variables and then global registers. We
don't do this lowering earlier (i.e. the fortranizer idea) for a few reasons:

- If we want to do optimizations before link time, we need to have the function
signature available at link time.

- If we do any inlining before link time, then we might wind up with the
inlined function and the non-inlined function using the same global
variables/registers, which would preclude optimization.

Intrinsics
==========

Any operation (other than function calls and textures) which touches a variable
or is not referentially transparent is represented by an intrinsic. Intrinsics
are similar to the idea of a "builtin function," i.e. a function declaration
whose implementation is provided by the backend, except they are more powerful
in the following ways:

- They can also load and store registers when appropriate, which limits the
number of variables needed in later stages of the IR while obviating the need
for a separate load/store variable instruction.

- Intrinsics can be marked as side-effect free, which permits them to be
treated like any other instruction when it comes to optimizations. This allows
load intrinsics to be represented as intrinsics while still being optimized
away by dead code elimination, common subexpression elimination, etc.
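
To make the "side-effect free" marking concrete, the following C sketch shows
how a dead code elimination pass might consult such a flag. Everything here is
an assumption for illustration: the flag name, the per-intrinsic info table,
and the use counter are hypothetical and not taken from the NIR sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-intrinsic flag: set when the intrinsic has no side effects. */
enum sketch_intrinsic_flags {
    SKETCH_INTRINSIC_CAN_ELIMINATE = 1 << 0,
};

/* Static description shared by every instance of one intrinsic. */
struct sketch_intrinsic_info {
    const char *name;
    unsigned flags;
};

/* One intrinsic instruction in a shader. */
struct sketch_intrinsic {
    const struct sketch_intrinsic_info *info;
    unsigned num_uses; /* how many times the result is read */
};

/* An intrinsic is removable only if nothing reads its result AND it is
 * marked side-effect free; a store with zero uses must be kept. */
static bool sketch_intrinsic_is_dead(const struct sketch_intrinsic *intr)
{
    assert(intr != NULL && intr->info != NULL);
    return intr->num_uses == 0 &&
           (intr->info->flags & SKETCH_INTRINSIC_CAN_ELIMINATE) != 0;
}
```

With this check, an unused load is treated like any other dead instruction,
while an intrinsic without the flag (a store, a barrier) always survives.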

Intrinsics are used for:

- Atomic operations
- Memory barriers
- Subroutine calls
- Geometry shader emitVertex and endPrimitive
- Loading and storing variables (before lowering)
- Loading and storing uniforms, shader inputs and outputs, etc. (after lowering)
- Copying variables (cases where in GLSL the destination is a structure or
array)
- The kitchen sink
- ...

Textures
========

Unfortunately, there are far too many texture operations to represent each one
of them with an intrinsic, so there's a special texture instruction similar to
the GLSL IR one. The biggest difference is that, while the texture instruction
has a sampler dereference field used just like in GLSL IR, this gets lowered to
a texture unit index (with a possible indirect offset) while the type
information of the original sampler is kept around for backends. Also, all the
non-constant sources are stored in a single array to make it easier for
optimization passes to iterate over all the sources.
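
The benefit of keeping the non-constant sources in one array can be sketched
in C as below. The source kinds (coordinate, bias, LOD), the fixed array
bound, and all identifiers are hypothetical choices for the example, not the
actual NIR definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags saying what role each source plays. */
enum sketch_tex_src_type {
    SKETCH_TEX_SRC_COORD,
    SKETCH_TEX_SRC_BIAS,
    SKETCH_TEX_SRC_LOD,
};

struct sketch_tex_src {
    enum sketch_tex_src_type type;
    unsigned value; /* stand-in for a register or SSA value id */
};

#define SKETCH_TEX_MAX_SRCS 4

/* Texture instruction after lowering: a unit index plus one source array. */
struct sketch_tex_instr {
    unsigned sampler_index;
    struct sketch_tex_src srcs[SKETCH_TEX_MAX_SRCS];
    unsigned num_srcs;
};

/* One loop visits every source; returns the slot of `type`, or -1. */
static int sketch_tex_src_index(const struct sketch_tex_instr *tex,
                                enum sketch_tex_src_type type)
{
    assert(tex != NULL && tex->num_srcs <= SKETCH_TEX_MAX_SRCS);
    for (unsigned i = 0; i < tex->num_srcs; i++) {
        if (tex->srcs[i].type == type)
            return (int)i;
    }
    return -1;
}
```

A generic pass (copy propagation, for instance) can use the same single loop
without knowing which texture operation it is looking at.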

Control Flow
============

As in GLSL IR, control flow consists of a tree of "control flow nodes", which
include if statements and loops, and jump instructions (break, continue, and
return). Unlike GLSL IR, though, the leaves of the tree aren't statements but
basic blocks. Each basic block also keeps track of its successors and
predecessors, and function implementations keep track of the beginning basic
block (the first basic block of the function) and the ending basic block (a fake
basic block that every return statement points to). Together, these elements
make up the control flow graph, in this case a redundant piece of information on
top of the control flow tree that will be used by almost all the optimizations.
There are helper functions to add and remove control flow nodes that also update
the control flow graph, and so usually it doesn't need to be touched by passes
that modify control flow nodes.
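
The successor/predecessor bookkeeping can be illustrated with a small C
sketch. It assumes, for the sake of the example, that a block has at most two
successors and a bounded number of predecessors; those limits and every
identifier below are hypothetical rather than taken from the NIR sources.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PREDS 8

/* Hypothetical basic block carrying only its graph edges. */
struct sketch_block {
    struct sketch_block *successors[2]; /* assumed: at most two */
    struct sketch_block *preds[SKETCH_MAX_PREDS];
    unsigned num_preds;
};

/* Add the edge from -> to, updating both directions so the successor and
 * predecessor lists can never disagree. */
static void sketch_link_blocks(struct sketch_block *from,
                               struct sketch_block *to)
{
    assert(from != NULL && to != NULL);
    assert(to->num_preds < SKETCH_MAX_PREDS);

    unsigned slot = (from->successors[0] == NULL) ? 0 : 1;
    assert(from->successors[slot] == NULL); /* no free successor slot left */

    from->successors[slot] = to;
    to->preds[to->num_preds++] = from;
}
```

Helpers that insert an if or a loop into the control flow tree would call
something like sketch_link_blocks() internally, which is why passes rarely
need to edit the edges by hand.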