Welcome to Mesa's GLSL compiler.  A brief overview of how things flow:

1) A lex- and yacc-based preprocessor takes the incoming shader string
and produces a new string containing the preprocessed shader.  This
takes care of things like #if, #ifdef, #define, and preprocessor macro
invocations.  Note that #version, #extension, and some others are
passed straight through.  See glcpp/*.

2) A lex- and yacc-based parser takes the preprocessed string and
generates the AST (abstract syntax tree).  Almost no checking is
performed in this stage.  See glsl_lexer.ll and glsl_parser.yy.

3) The AST is converted to "HIR".  This is the intermediate
representation of the compiler.  Constructors are generated, function
calls are resolved to particular function signatures, and all the
semantic checking is performed.  See ast_*.cpp for the conversion, and
ir.h for the IR structures.

4) The driver (Mesa, or main.cpp for the standalone binary) performs
optimizations.  These include copy propagation, dead code elimination,
constant folding, and others.  Generally the driver will call
optimizations in a loop, as each may open up opportunities for other
optimizations to do additional work.  See most files called ir_*.cpp.

5) Linking is performed.  This checks that the outputs of the vertex
shader match the inputs of the fragment shader, and assigns locations
to uniforms, attributes, and varyings.  See linker.cpp.

6) The driver may perform additional optimization at this point.  For
example, dead code elimination previously couldn't remove functions
or global variable usage because we didn't know what other code would
be linked in.

7) The driver performs code generation out of the IR, taking a linked
shader program and producing a compiled program for each stage.  See
../mesa/program/ir_to_mesa.cpp for Mesa IR code generation.
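
The optimization loop described in step 4 can be sketched roughly as
follows.  This is a toy model, not the driver's code: ToyShader, the
two toy_* passes and optimize_to_fixed_point are hypothetical names,
and the "IR" is just a pair of counters.  The only point is the shape
of the loop, where every pass reports whether it made progress and the
whole set is re-run until nothing changes.

```cpp
#include <functional>
#include <vector>

// Toy stand-in for a shader's IR; the real structures live in ir.h.
struct ToyShader {
    int foldable_constants = 0;  // work left for "constant folding"
    int dead_instructions = 0;   // work left for "dead code elimination"
};

// Each hypothetical pass returns true if it changed the IR.
using Pass = std::function<bool(ToyShader &)>;

// Assumption for the demo: folding one constant leaves one dead instruction.
bool toy_constant_folding(ToyShader &s) {
    if (s.foldable_constants == 0)
        return false;
    s.foldable_constants--;
    s.dead_instructions++;
    return true;
}

bool toy_dead_code(ToyShader &s) {
    if (s.dead_instructions == 0)
        return false;
    s.dead_instructions = 0;
    return true;
}

// Re-run every pass until a full sweep makes no progress.
// Returns the number of sweeps executed.
int optimize_to_fixed_point(ToyShader &s, const std::vector<Pass> &passes) {
    int sweeps = 0;
    bool progress;
    do {
        progress = false;
        for (const Pass &p : passes)
            progress = p(s) || progress;  // always run the pass
        sweeps++;
    } while (progress);
    return sweeps;
}
```

Note that the pass call is on the left of the ||, so every pass runs
on every sweep even after an earlier one has already made progress.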

FAQ:

Q: What is HIR versus IR versus LIR?

A: The idea behind the naming was that ast_to_hir would produce a
high-level IR ("HIR"), with things like matrix operations, structure
assignments, etc., present.  A series of lowering passes would occur
that do things like break matrix multiplication into a series of dot
products/MADs, make structure assignment be a series of assignments of
components, flatten if statements into conditional moves, and such,
producing a low-level IR ("LIR").

However, it now appears that each driver will have different
requirements from a LIR.  A 915-generation chipset wants all functions
inlined, all loops unrolled, all ifs flattened, no variable array
accesses, and matrix multiplication broken down.  The Mesa IR backend
for swrast would like matrices and structure assignment broken down,
but it can support function calls and dynamic branching.  A 965 vertex
shader IR backend could potentially even handle some matrix operations
without breaking them down, but the 965 fragment shader IR backend
would want to break (almost) all operations down channel-wise
and perform optimization on that.  As a result, there's no single
low-level IR that will make everyone happy.  So that usage has fallen
out of favor, and each driver will perform a series of lowering passes
to take the HIR down to whatever restrictions it wants to impose
before doing codegen.
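
One way to picture "each driver picks its own lowering passes" is a
per-backend table of requirements.  The sketch below is purely
illustrative: LoweringNeeds, its field names, the two constants and
lowering_plan are hypothetical and do not correspond to actual Mesa
options.  Where the text above does not state a requirement, the value
chosen is marked as an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of what a backend needs lowered before codegen.
struct LoweringNeeds {
    bool inline_functions;
    bool unroll_loops;
    bool flatten_ifs;
    bool lower_variable_indexing;
    bool lower_matrix_ops;
    bool lower_struct_assignment;
};

// 915-class hardware, per the description above.  Structure assignment
// is not mentioned there; lowering it is an assumption.
const LoweringNeeds kI915Like = {true, true, true, true, true, true};

// Mesa IR for swrast: calls and dynamic branching are supported.
// Variable array indexing is not mentioned; leaving it is an assumption.
const LoweringNeeds kSwrastLike = {false, false, false, false, true, true};

// Build the ordered list of lowering steps a backend would request.
std::vector<std::string> lowering_plan(const LoweringNeeds &n) {
    std::vector<std::string> plan;
    if (n.inline_functions)        plan.push_back("inline functions");
    if (n.unroll_loops)            plan.push_back("unroll loops");
    if (n.flatten_ifs)             plan.push_back("flatten ifs");
    if (n.lower_variable_indexing) plan.push_back("lower variable indexing");
    if (n.lower_matrix_ops)        plan.push_back("lower matrix ops");
    if (n.lower_struct_assignment) plan.push_back("lower struct assignment");
    return plan;
}
```

With these values, lowering_plan(kSwrastLike) asks only for the matrix
and structure-assignment steps, while kI915Like asks for all of them.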

Q: How is the IR structured?

A: The best way to get started seeing it would be to run the
standalone compiler against a shader:

./glsl_compiler --dump-lir \
	~/src/piglit/tests/shaders/glsl-orangebook-ch06-bump.frag

So for example one of the ir_instructions in main() contains:

(assign (constant bool (1)) (var_ref litColor)  (expression vec3 *
 (var_ref SurfaceColor) (var_ref __retval) ) )

Or more visually:
                     (assign)
                 /       |        \
        (var_ref)  (expression *)  (constant bool 1)
         /          /           \
(litColor)      (var_ref)    (var_ref)
                  /                  \
           (SurfaceColor)          (__retval)

which came from:

litColor = SurfaceColor * max(dot(normDelta, LightDir), 0.0);

(the max call is not represented in this expression tree, as it was a
function call that got inlined but not brought into this expression
tree)

Each of those nodes is a subclass of ir_instruction.  A particular
ir_instruction instance may only appear once in the whole IR tree with
the exception of ir_variables, which appear once as variable
declarations:

(declare () vec3 normDelta)

and multiple times as the targets of variable dereferences:
...
(assign (constant bool (1)) (var_ref __retval) (expression float dot
 (var_ref normDelta) (var_ref LightDir) ) )
...
(assign (constant bool (1)) (var_ref __retval) (expression vec3 -
 (var_ref LightDir) (expression vec3 * (constant float (2.000000))
 (expression vec3 * (expression float dot (var_ref normDelta) (var_ref
 LightDir) ) (var_ref normDelta) ) ) ) )
...

Each node has a type.  Expressions may involve several different types:
(declare (uniform ) mat4 gl_ModelViewMatrix)
(assign (constant bool (1)) (var_ref constructor_tmp) (expression
 vec4 * (var_ref gl_ModelViewMatrix) (var_ref gl_Vertex) ) )

An expression tree can be arbitrarily deep, and the compiler tries to
keep them structured like that so that things like algebraic
optimizations ((color * 1.0 == color) and ((mat1 * mat2) * vec == mat1
* (mat2 * vec))) or recognizing operation patterns for code generation
(vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)) are easier.  This comes
at the expense of additional trickery in implementing some
optimizations like CSE where one must navigate an expression tree.
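
To illustrate why deep expression trees make these rewrites easy, here
is a minimal sketch on a toy tree.  The Expr type and the helpers
make, var, constant, is_one, simplify and dump are all hypothetical
and much simpler than the ir_expression classes in ir.h.  The sketch
applies (x * 1.0 == x) and then recognizes (a * b + c) as a mad.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Expr;
using ExprPtr = std::unique_ptr<Expr>;

struct Expr {
    enum Kind { Var, Const, Mul, Add, Mad };
    Kind kind = Var;
    std::string name;    // Var only
    double value = 0.0;  // Const only
    ExprPtr a, b, c;     // operands; c is used by Mad only
};

ExprPtr make(Expr::Kind k, ExprPtr a = nullptr, ExprPtr b = nullptr) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->a = std::move(a);
    e->b = std::move(b);
    return e;
}
ExprPtr var(std::string n) { auto e = make(Expr::Var); e->name = std::move(n); return e; }
ExprPtr constant(double v) { auto e = make(Expr::Const); e->value = v; return e; }

bool is_one(const ExprPtr &e) {
    return e && e->kind == Expr::Const && e->value == 1.0;
}

// Bottom-up rewrite: operands are simplified before their parent.
ExprPtr simplify(ExprPtr e) {
    if (e->a) e->a = simplify(std::move(e->a));
    if (e->b) e->b = simplify(std::move(e->b));
    if (e->kind == Expr::Mul) {
        if (is_one(e->b)) return std::move(e->a);  // x * 1.0 == x
        if (is_one(e->a)) return std::move(e->b);  // 1.0 * x == x
    }
    if (e->kind == Expr::Add && e->a->kind == Expr::Mul) {
        // a * b + c == mad(a, b, c)
        ExprPtr mad = make(Expr::Mad, std::move(e->a->a), std::move(e->a->b));
        mad->c = std::move(e->b);
        return mad;
    }
    return e;
}

// Print the tree in a prefix form loosely modeled on the IR dumps above.
std::string dump(const Expr &e) {
    switch (e.kind) {
    case Expr::Var:   return e.name;
    case Expr::Const: return std::to_string(e.value);
    case Expr::Mul:   return "(* " + dump(*e.a) + " " + dump(*e.b) + ")";
    case Expr::Add:   return "(+ " + dump(*e.a) + " " + dump(*e.b) + ")";
    case Expr::Mad:   return "(mad " + dump(*e.a) + " " + dump(*e.b) + " " + dump(*e.c) + ")";
    }
    return "";
}
```

Because the multiply is still a child of the add when simplify visits
it, the mad pattern is a local check on one node.  If the multiply had
been stored to a temporary first, the pass would have to chase the
variable, which is the kind of trickery mentioned above for CSE.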

Q: Why no SSA representation?

A: Converting an IR tree to SSA form makes dead code elimination,
common subexpression elimination, and many other optimizations much
easier.  However, in our primarily vector-based language, there are
some major questions as to how it would work.  Do we do SSA on the
scalar or vector level?  If we do it at the vector level, we're going
to end up with many different versions of the variable when
encountering code like:

(assign (constant bool (1)) (swiz x (var_ref __retval) ) (var_ref a) )
(assign (constant bool (1)) (swiz y (var_ref __retval) ) (var_ref b) )
(assign (constant bool (1)) (swiz z (var_ref __retval) ) (var_ref c) )

If every masked update of a component relies on the previous value of
the variable, then we're probably going to be quite limited in our
dead code elimination wins, and recognizing common expressions may
just not happen.  On the other hand, if we operate channel-wise, then
we'll be prone to optimizing the operation on one of the channels at
the expense of making its instruction flow different from the other
channels, and a vector-based GPU would end up with worse code than if
we didn't optimize operations on that channel!

Once again, it appears that our optimization requirements are driven
significantly by the target architecture.  For now, targeting the Mesa
IR backend, SSA does not appear to be that important to producing
excellent code, but we do expect to do some SSA-based optimizations
for the 965 fragment shader backend when that is developed.

Q: How should I expand instructions that take multiple backend instructions?

A: Sometimes you'll have to do the expansion in your code generation;
see, for example, ir_to_mesa.cpp's handling of ir_unop_sqrt.  However,
in many cases you'll want to do a pass over the IR to convert
non-native instructions to a series of native instructions.  For
example, for the Mesa backend we have ir_div_to_mul_rcp.cpp because
Mesa IR (and many hardware backends) only have a reciprocal
instruction, not a divide.  Implementing non-native instructions this
way gives the chance for constant folding to occur, so (a / 2.0)
becomes (a * 0.5) after codegen instead of (a * (1.0 / 2.0)).
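
The (a / 2.0) example can be sketched on a toy tree as below.  This is
not the code in ir_div_to_mul_rcp.cpp; Expr, make, var, constant,
lower_div and fold are hypothetical names for illustration.  The
lowering pass rewrites the divide into a multiply by a reciprocal, and
a separate constant folding pass then collapses rcp(2.0) into 0.5.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Expr;
using ExprPtr = std::unique_ptr<Expr>;

struct Expr {
    enum Kind { Var, Const, Div, Mul, Rcp };
    Kind kind = Var;
    std::string name;    // Var only
    double value = 0.0;  // Const only
    ExprPtr a, b;        // operands; Rcp uses only a
};

ExprPtr make(Expr::Kind k, ExprPtr a = nullptr, ExprPtr b = nullptr) {
    auto e = std::make_unique<Expr>();
    e->kind = k;
    e->a = std::move(a);
    e->b = std::move(b);
    return e;
}
ExprPtr var(std::string n) { auto e = make(Expr::Var); e->name = std::move(n); return e; }
ExprPtr constant(double v) { auto e = make(Expr::Const); e->value = v; return e; }

// Lowering pass: rewrite every (a / b) as (a * rcp(b)).
ExprPtr lower_div(ExprPtr e) {
    if (e->a) e->a = lower_div(std::move(e->a));
    if (e->b) e->b = lower_div(std::move(e->b));
    if (e->kind == Expr::Div)
        return make(Expr::Mul, std::move(e->a),
                    make(Expr::Rcp, std::move(e->b)));
    return e;
}

// Constant folding pass: rcp(constant) becomes a constant.
ExprPtr fold(ExprPtr e) {
    if (e->a) e->a = fold(std::move(e->a));
    if (e->b) e->b = fold(std::move(e->b));
    if (e->kind == Expr::Rcp && e->a->kind == Expr::Const)
        return constant(1.0 / e->a->value);
    return e;
}
```

Running fold(lower_div(...)) on the tree for (a / 2.0) yields a Mul
whose second operand is the constant 0.5.  Had the expansion been done
during code generation instead, the folding pass would never have seen
the rcp node.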

Q: How should I handle my special hardware instructions with respect to IR?

A: Our current theory is that if multiple targets have an instruction
for some operation, then we should probably be able to represent that
in the IR.  Generally this is in the form of an ir_{bin,un}op
expression type.  For example, we initially implemented fract() using
(a - floor(a)), but both 945 and 965 have instructions to give that
result, and it would also simplify the implementation of mod(), so
ir_unop_fract was added.  The following areas need updating to add a
new expression type:

ir.h (new enum)
ir.cpp:operator_strs (used for ir_reader)
ir_constant_expression.cpp (you probably want to be able to constant fold)
ir_validate.cpp (check users have the right types)

You may also need to update the backends if they will see the new expr type:

../mesa/program/ir_to_mesa.cpp

You can then use the new expression from builtins (if all backends
would rather see it), or scan the IR and convert to use your new
expression type (see ir_mod_to_floor, for example).
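
The checklist above amounts to keeping an enum, a string table and a
constant folder in step.  The following is a toy mirror of that idea,
assuming a three-entry operation list; every toy_* name is
hypothetical and the real enum and operator_strs table in ir.h and
ir.cpp are far longer.

```cpp
#include <cmath>
#include <cstddef>
#include <cstring>

// Toy mirror of the expression enum (the "new enum" step).
enum toy_expression_operation {
    toy_unop_floor,
    toy_unop_fract,  // the newly added operation
    toy_binop_mod,
    toy_last_opcode = toy_binop_mod
};

// Toy mirror of operator_strs; must stay in the same order as the enum.
static const char *const toy_operator_strs[] = {"floor", "fract", "mod"};

// Catch a forgotten table entry at compile time.
static_assert(sizeof(toy_operator_strs) / sizeof(toy_operator_strs[0]) ==
                  static_cast<std::size_t>(toy_last_opcode) + 1,
              "toy_operator_strs must have one entry per operation");

// Name lookup of the kind a textual IR reader needs; -1 if unknown.
int toy_operator_from_string(const char *name) {
    for (int op = 0; op <= toy_last_opcode; op++)
        if (std::strcmp(toy_operator_strs[op], name) == 0)
            return op;
    return -1;
}

// Constant folding for unary operations, including the new one.
float toy_fold_unop(toy_expression_operation op, float x) {
    switch (op) {
    case toy_unop_floor: return std::floor(x);
    case toy_unop_fract: return x - std::floor(x);
    default:             return x;  // binary ops are not handled in this toy
    }
}
```

The static_assert is a design choice of this sketch, not something the
README says Mesa does; it simply shows one way to notice when the enum
and the string table drift apart.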

Q: How is memory management handled in the compiler?

A: The hierarchical memory allocator "talloc" developed for the Samba
project is used, so that things like optimization passes don't have to
worry about their garbage collection so much.  It has a few nice
features, including low performance overhead and good debugging
support that's trivially available.

Generally, each stage of the compile creates a talloc context and
allocates its memory out of that or children of it.  At the end of the
stage, the pieces still live are stolen to a new context and the old
one freed, or the whole context is kept for use by the next stage.

For IR transformations, a temporary context is used, then at the end
of all transformations, reparent_ir reparents all live nodes under the
shader's IR list, and the old context full of dead nodes is freed.
When developing a single IR transformation pass, this means that you
want to allocate instruction nodes out of the temporary context, so if
it becomes dead it doesn't live on as the child of a live node.  At
the moment, optimization passes aren't passed that temporary context,
so they find it by calling talloc_parent() on a nearby IR node.  The
talloc_parent() call is expensive, so many passes will cache the
result of the first talloc_parent().  Cleaning up all the optimization
passes to take a context argument and not call talloc_parent() is left
as an exercise.
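
The "allocate in a temporary context, then steal what is still live"
pattern can be modeled in a few lines.  The sketch below does not use
the talloc API at all; ToyContext, ToyNode and pretend_pass are
hypothetical, and ownership is modeled with std::unique_ptr only to
show which nodes survive when the temporary context goes away.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct ToyNode { int id; };

// Toy context: owns every node allocated from it and frees them all
// when it is destroyed.
class ToyContext {
public:
    ToyNode *alloc(int id) {
        nodes_.push_back(std::make_unique<ToyNode>(ToyNode{id}));
        return nodes_.back().get();
    }

    // Move ownership of `node` from `from` into this context.
    // Returns false if `from` does not own it.
    bool steal(ToyContext &from, ToyNode *node) {
        if (&from == this)
            return false;  // nothing to move
        auto it = std::find_if(
            from.nodes_.begin(), from.nodes_.end(),
            [node](const std::unique_ptr<ToyNode> &p) { return p.get() == node; });
        if (it == from.nodes_.end())
            return false;
        nodes_.push_back(std::move(*it));
        from.nodes_.erase(it);
        return true;
    }

    std::size_t size() const { return nodes_.size(); }

private:
    std::vector<std::unique_ptr<ToyNode>> nodes_;
};

// A pretend pass: allocate two nodes in a temporary context, keep one.
ToyNode *pretend_pass(ToyContext &shader_ctx) {
    ToyContext temp;
    ToyNode *live = temp.alloc(1);
    temp.alloc(2);                 // never referenced again: dead
    shader_ctx.steal(temp, live);  // reparent the live node
    return live;                   // temp, and node 2 with it, is freed here
}
```

After pretend_pass returns, the shader context owns exactly one node
and the dead one has been released along with the temporary context,
without the pass ever freeing anything by hand.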

Q: What is the file naming convention in this directory?

A: Initially, there really wasn't one.  We have since adopted one:

 - Files that implement code lowering passes should be named lower_*
   (e.g., lower_builtins.cpp).
 - Files that implement optimization passes should be named opt_*.
 - Files that implement a class that is used throughout the code should
   take the name of that class (e.g., ir_hierarchical_visitor.cpp).
 - Files that contain code not fitting in one of the previous
   categories should have a sensible name (e.g., glsl_parser.yy).