Welcome to Mesa's GLSL compiler.  A brief overview of how things flow:

1) A lex- and yacc-based preprocessor takes the incoming shader string
and produces a new string containing the preprocessed shader.  This
takes care of things like #if, #ifdef, #define, and preprocessor macro
invocations.  Note that #version, #extension, and some others are
passed straight through.  See glcpp/*.

2) A lex- and yacc-based parser takes the preprocessed string and
generates the AST (abstract syntax tree).  Almost no checking is
performed in this stage.  See glsl_lexer.ll and glsl_parser.yy.

3) The AST is converted to "HIR".  This is the intermediate
representation of the compiler.  Constructors are generated, function
calls are resolved to particular function signatures, and all the
semantic checking is performed.  See ast_*.cpp for the conversion, and
ir.h for the IR structures.

4) The driver (Mesa, or main.cpp for the standalone binary) performs
optimizations.  These include copy propagation, dead code elimination,
constant folding, and others.  Generally the driver will call
optimizations in a loop, as each may open up opportunities for other
optimizations to do additional work.  See most files called ir_*.cpp.

5) Linking is performed.  This checks that the outputs of the vertex
shader match the inputs of the fragment shader, and assigns locations
to uniforms, attributes, and varyings.  See linker.cpp.

6) The driver may perform additional optimization at this point.  For
example, dead code elimination previously couldn't remove functions or
global variable usage because we didn't know what other code would be
linked in.

7) The driver performs code generation out of the IR, taking a linked
shader program and producing a compiled program for each stage.  See
../mesa/program/ir_to_mesa.cpp for Mesa IR code generation.
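
The "optimizations in a loop" idea from step 4 can be sketched as
follows.  This is a toy model, not the driver's actual code: ToyShader,
the two toy passes, and optimize_to_fixed_point are all hypothetical
names, and the only assumption carried over from the compiler is that
each pass reports whether it changed anything.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a shader's instruction list.
struct ToyShader {
    int foldable_constants = 0;  // expressions constant folding can collapse
    int dead_assignments = 0;    // assignments whose results are never read
};

// Each pass returns true when it changed the shader.
using Pass = std::function<bool(ToyShader &)>;

// Folding one expression leaves the temporary that fed it unused, which
// gives dead code elimination new work on the next sweep.
bool toy_constant_folding(ToyShader &s) {
    if (s.foldable_constants == 0) return false;
    --s.foldable_constants;
    ++s.dead_assignments;
    return true;
}

bool toy_dead_code(ToyShader &s) {
    if (s.dead_assignments == 0) return false;
    s.dead_assignments = 0;
    return true;
}

// Run every pass repeatedly until one full sweep makes no progress.
// Returns the number of sweeps, including the final idle one.
int optimize_to_fixed_point(ToyShader &s, const std::vector<Pass> &passes) {
    int sweeps = 0;
    bool progress;
    do {
        // Guard against passes that keep undoing each other.
        assert(sweeps < 1000);
        progress = false;
        for (const Pass &pass : passes)
            progress = pass(s) || progress;
        ++sweeps;
    } while (progress);
    return sweeps;
}
```

Running the passes only once in the order {toy_dead_code,
toy_constant_folding} would leave a dead assignment behind; the loop is
what lets the earlier pass clean up after the later one.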

FAQ:

Q: What is HIR versus IR versus LIR?

A: The idea behind the naming was that ast_to_hir would produce a
high-level IR ("HIR"), with things like matrix operations, structure
assignments, etc., present.  A series of lowering passes would occur
that do things like break matrix multiplication into a series of dot
products/MADs, make structure assignment be a series of assignments of
components, flatten if statements into conditional moves, and such,
producing a low-level IR ("LIR").

However, it now appears that each driver will have different
requirements from a LIR.  A 915-generation chipset wants all functions
inlined, all loops unrolled, all ifs flattened, no variable array
accesses, and matrix multiplication broken down.  The Mesa IR backend
for swrast would like matrices and structure assignment broken down,
but it can support function calls and dynamic branching.  A 965 vertex
shader IR backend could potentially even handle some matrix operations
without breaking them down, but the 965 fragment shader IR backend
would want to break (almost) all operations down channel-wise and
perform optimization on that.  As a result, there's no single
low-level IR that will make everyone happy.  So that usage has fallen
out of favor, and each driver will perform a series of lowering passes
to take the HIR down to whatever restrictions it wants to impose
before doing codegen.

Q: How is the IR structured?

A: The best way to get started seeing it would be to run the
standalone compiler against a shader:

./glsl_compiler --dump-lir \
	~/src/piglit/tests/shaders/glsl-orangebook-ch06-bump.frag

So for example one of the ir_instructions in main() contains:

(assign (constant bool (1)) (var_ref litColor)  (expression vec3 *
 (var_ref SurfaceColor) (var_ref __retval) ) )

Or more visually:
                     (assign)
                 /       |        \
        (var_ref)  (expression *)  (constant bool 1)
         /          /           \
(litColor)      (var_ref)    (var_ref)
                  /                  \
           (SurfaceColor)          (__retval)

which came from:

litColor = SurfaceColor * max(dot(normDelta, LightDir), 0.0);

(the max call is not represented in this expression tree, as it was a
function call that got inlined but not brought into this expression
tree)

Each of those nodes is a subclass of ir_instruction.  A particular
ir_instruction instance may only appear once in the whole IR tree with
the exception of ir_variables, which appear once as variable
declarations:

(declare () vec3 normDelta)

and multiple times as the targets of variable dereferences:
...
(assign (constant bool (1)) (var_ref __retval) (expression float dot
 (var_ref normDelta) (var_ref LightDir) ) )
...
(assign (constant bool (1)) (var_ref __retval) (expression vec3 -
 (var_ref LightDir) (expression vec3 * (constant float (2.000000))
 (expression vec3 * (expression float dot (var_ref normDelta) (var_ref
 LightDir) ) (var_ref normDelta) ) ) ) )
...
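
The "appears only once" rule can be illustrated with a small checker.
This is a sketch under stated assumptions: ToyVariable, ToyNode, and
each_node_appears_once are hypothetical names, not the compiler's own
classes or its validation code.  It only shows that two distinct
var_ref nodes may share one variable, while reusing the same node
twice breaks the rule.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical variable: declared once, referenced from many nodes.
struct ToyVariable {
    std::string name;
};

// Hypothetical tree node.  A var_ref node points at its variable;
// other nodes leave `var` null and list their operands in `children`.
struct ToyNode {
    const ToyVariable *var = nullptr;
    std::vector<const ToyNode *> children;
};

// Returns false as soon as some node is reachable a second time.
// Shared ToyVariable targets are deliberately not counted as nodes.
bool each_node_appears_once(const ToyNode *n,
                            std::set<const ToyNode *> &seen) {
    assert(n != nullptr);
    if (!seen.insert(n).second)
        return false;
    for (const ToyNode *child : n->children)
        if (!each_node_appears_once(child, seen))
            return false;
    return true;
}
```

A transformation that wants the same subexpression in two places would
therefore clone the subtree rather than link one node into both
parents.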

Each node has a type.  Expressions may involve several different types:
(declare (uniform ) mat4 gl_ModelViewMatrix)
(assign (constant bool (1)) (var_ref constructor_tmp) (expression
 vec4 * (var_ref gl_ModelViewMatrix) (var_ref gl_Vertex) ) )

An expression tree can be arbitrarily deep, and the compiler tries to
keep them structured like that so that things like algebraic
optimizations ((color * 1.0 == color) and ((mat1 * mat2) * vec == mat1
* (mat2 * vec))) or recognizing operation patterns for code generation
(vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)) are easier.  This comes
at the expense of additional trickery in implementing some
optimizations like CSE where one must navigate an expression tree.
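
Two of the rewrites mentioned above, (color * 1.0 == color) and
(vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)), can be sketched on a toy
expression tree.  Expr, var, constant, binop, simplify, and dump are
hypothetical helpers written for this example; the real node classes
are declared in ir.h and the real passes are structured differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node, not one of the classes from ir.h.
struct Expr {
    std::string op;                           // "*", "+", "mad", "var", "const"
    std::string name;                         // variable name when op == "var"
    double value = 0.0;                       // literal when op == "const"
    std::vector<std::unique_ptr<Expr>> args;  // operand subtrees
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr var(std::string name) {
    ExprPtr e = std::make_unique<Expr>();
    e->op = "var";
    e->name = std::move(name);
    return e;
}

ExprPtr constant(double value) {
    ExprPtr e = std::make_unique<Expr>();
    e->op = "const";
    e->value = value;
    return e;
}

ExprPtr binop(std::string op, ExprPtr a, ExprPtr b) {
    ExprPtr e = std::make_unique<Expr>();
    e->op = std::move(op);
    e->args.push_back(std::move(a));
    e->args.push_back(std::move(b));
    return e;
}

bool is_one(const Expr &e) { return e.op == "const" && e.value == 1.0; }

// Bottom-up rewrite: x * 1.0 -> x, then a * b + c -> mad(a, b, c).
ExprPtr simplify(ExprPtr e) {
    for (ExprPtr &arg : e->args)
        arg = simplify(std::move(arg));
    if (e->op == "*") {
        assert(e->args.size() == 2);
        if (is_one(*e->args[1])) return std::move(e->args[0]);
        if (is_one(*e->args[0])) return std::move(e->args[1]);
    }
    if (e->op == "+" && e->args[0]->op == "*") {
        Expr &mul = *e->args[0];
        ExprPtr mad = std::make_unique<Expr>();
        mad->op = "mad";
        mad->args.push_back(std::move(mul.args[0]));
        mad->args.push_back(std::move(mul.args[1]));
        mad->args.push_back(std::move(e->args[1]));
        return mad;
    }
    return e;
}

// Print in a prefix form loosely modelled on the dumps shown above.
std::string dump(const Expr &e) {
    if (e.op == "var") return e.name;
    if (e.op == "const") return std::to_string(e.value);
    std::string s = "(" + e.op;
    for (const ExprPtr &arg : e.args)
        s += " " + dump(*arg);
    return s + ")";
}
```

Because the whole expression is one tree, each pattern is a local test
on a node and its children.  The same shape is what makes CSE harder:
equal subtrees have to be found by walking and comparing trees.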

Q: Why no SSA representation?

A: Converting an IR tree to SSA form makes dead code elimination,
common subexpression elimination, and many other optimizations much
easier.  However, in our primarily vector-based language, there are
some major questions as to how it would work.  Do we do SSA on the
scalar or vector level?  If we do it at the vector level, we're going
to end up with many different versions of the variable when
encountering code like:

(assign (constant bool (1)) (swiz x (var_ref __retval) ) (var_ref a) )
(assign (constant bool (1)) (swiz y (var_ref __retval) ) (var_ref b) )
(assign (constant bool (1)) (swiz z (var_ref __retval) ) (var_ref c) )

If every masked update of a component relies on the previous value of
the variable, then we're probably going to be quite limited in our
dead code elimination wins, and recognizing common expressions may
just not happen.  On the other hand, if we operate channel-wise, then
we'll be prone to optimizing the operation on one of the channels at
the expense of making its instruction flow different from the other
channels, and a vector-based GPU would end up with worse code than if
we didn't optimize operations on that channel!

Once again, it appears that our optimization requirements are driven
significantly by the target architecture.  For now, targeting the Mesa
IR backend, SSA does not appear to be that important to producing
excellent code, but we do expect to do some SSA-based optimizations
for the 965 fragment shader backend when that is developed.

Q: How should I expand instructions that take multiple backend instructions?

A: Sometimes you'll have to do the expansion in your code generation;
see, for example, ir_to_mesa.cpp's handling of ir_unop_sqrt.  However,
in many cases you'll want to do a pass over the IR to convert
non-native instructions to a series of native instructions.  For
example, for the Mesa backend we have ir_div_to_mul_rcp.cpp because
Mesa IR (and many hardware backends) only have a reciprocal
instruction, not a divide.  Implementing non-native instructions this
way gives the chance for constant folding to occur, so (a / 2.0)
becomes (a * 0.5) after codegen instead of (a * (1.0 / 2.0)).
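
The (a / 2.0) example can be traced with a toy lowering pass followed
by a toy constant folder.  This is only a sketch: Node, make,
lower_div, and fold are hypothetical and are not taken from
ir_div_to_mul_rcp.cpp or the compiler's constant folding code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression node.
struct Node {
    enum Kind { Const, Var, Div, Mul, Rcp };
    Kind kind = Const;
    double value = 0.0;         // used when kind == Const
    std::string name;           // used when kind == Var
    std::unique_ptr<Node> lhs;  // first operand (the only one for Rcp)
    std::unique_ptr<Node> rhs;  // second operand of Div and Mul
};
using NodePtr = std::unique_ptr<Node>;

NodePtr make(Node::Kind kind, NodePtr lhs = nullptr, NodePtr rhs = nullptr) {
    NodePtr n = std::make_unique<Node>();
    n->kind = kind;
    n->lhs = std::move(lhs);
    n->rhs = std::move(rhs);
    return n;
}

NodePtr constant(double value) {
    NodePtr n = make(Node::Const);
    n->value = value;
    return n;
}

NodePtr var(std::string name) {
    NodePtr n = make(Node::Var);
    n->name = std::move(name);
    return n;
}

// Lowering pass: a / b  ->  a * rcp(b).
NodePtr lower_div(NodePtr n) {
    if (n->lhs) n->lhs = lower_div(std::move(n->lhs));
    if (n->rhs) n->rhs = lower_div(std::move(n->rhs));
    if (n->kind == Node::Div) {
        n->kind = Node::Mul;
        n->rhs = make(Node::Rcp, std::move(n->rhs));
    }
    return n;
}

// Constant folding: rcp(constant) -> constant.
NodePtr fold(NodePtr n) {
    if (n->lhs) n->lhs = fold(std::move(n->lhs));
    if (n->rhs) n->rhs = fold(std::move(n->rhs));
    if (n->kind == Node::Rcp) {
        assert(n->lhs != nullptr);
        if (n->lhs->kind == Node::Const)
            return constant(1.0 / n->lhs->value);
    }
    return n;
}
```

Calling fold(lower_div(...)) on the tree for (a / 2.0) yields a Mul
node whose right operand is the constant 0.5.  Had the division been
expanded inside code generation instead, the folder would never have
seen the rcp(2.0) subexpression.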

Q: How should I handle my special hardware instructions with respect to IR?

A: Our current theory is that if multiple targets have an instruction
for some operation, then we should probably be able to represent that
in the IR.  Generally this is in the form of an ir_{bin,un}op
expression type.  For example, we initially implemented fract() using
(a - floor(a)), but both 945 and 965 have instructions to give that
result, and it would also simplify the implementation of mod(), so
ir_unop_fract was added.  The following areas need updating to add a
new expression type:

ir.h (new enum)
ir.cpp:operator_strs (used for ir_reader)
ir_constant_expression.cpp (you probably want to be able to constant fold)
ir_validate.cpp (check users have the right types)

You may also need to update the backends if they will see the new expr type:

../mesa/program/ir_to_mesa.cpp

You can then use the new expression from builtins (if all backends
would rather see it), or scan the IR and convert to use your new
expression type (see ir_mod_to_floor, for example).
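
The checklist above can be mirrored in miniature.  The toy_* names
below are hypothetical and do not match the declarations in ir.h or
ir.cpp; the sketch only shows the three pieces that have to stay in
sync when an operation such as fract is added: the enum value, its
printable name, and its constant-folding rule.

```cpp
#include <cassert>
#include <cmath>
#include <cstring>

// Step 1: the new enum value (toy counterpart of the enum in ir.h).
enum toy_unop { toy_unop_floor, toy_unop_fract, toy_unop_count };

// Step 2: one printable name per enum value, in enum order (toy
// counterpart of operator_strs, which a reader uses to parse dumps).
static const char *const toy_operator_strs[] = { "floor", "fract" };
static_assert(sizeof(toy_operator_strs) / sizeof(toy_operator_strs[0])
                  == toy_unop_count,
              "every operator needs a name");

// Step 3: constant folding for the new operation.
double toy_constant_fold(toy_unop op, double a) {
    switch (op) {
    case toy_unop_floor: return std::floor(a);
    case toy_unop_fract: return a - std::floor(a);
    default:             return a;
    }
}

// Reader side: map a printed name back to its enum value, or -1.
int toy_lookup_operator(const char *name) {
    assert(name != nullptr);
    for (int i = 0; i < toy_unop_count; ++i)
        if (std::strcmp(toy_operator_strs[i], name) == 0)
            return i;
    return -1;
}
```

The static_assert is a design choice for this sketch: it turns a
forgotten name-table entry into a compile error instead of a reader
bug found later.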

Q: How is memory management handled in the compiler?

A: The hierarchical memory allocator "talloc" developed for the Samba
project is used, so that things like optimization passes don't have to
worry about their garbage collection so much.  It has a few nice
features, including low performance overhead and good debugging
support that's trivially available.

Generally, each stage of the compile creates a talloc context and
allocates its memory out of that or children of it.  At the end of the
stage, the pieces still live are stolen to a new context and the old
one freed, or the whole context is kept for use by the next stage.

For IR transformations, a temporary context is used, then at the end
of all transformations, reparent_ir reparents all live nodes under the
shader's IR list, and the old context full of dead nodes is freed.
When developing a single IR transformation pass, this means that you
want to allocate instruction nodes out of the temporary context, so if
it becomes dead it doesn't live on as the child of a live node.  At
the moment, optimization passes aren't passed that temporary context,
so they find it by calling talloc_parent() on a nearby IR node.  The
talloc_parent() call is expensive, so many passes will cache the
result of the first talloc_parent().  Cleaning up all the optimization
passes to take a context argument and not call talloc_parent() is left
as an exercise.
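
The "allocate in a temporary context, then reparent what is still
live" pattern can be modelled without talloc.  The sketch below does
not use the talloc API: ToyContext, steal_into, and reparent_live are
hypothetical stand-ins built on std::unique_ptr, meant only to show
the ownership flow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct ToyNode {
    int id;
    bool live;  // still reachable from the shader after the passes ran
};

// Hypothetical stand-in for an allocation context: it owns every node
// allocated from it and frees them all when it is destroyed.
class ToyContext {
public:
    ToyNode *alloc(int id, bool live) {
        owned_.push_back(std::make_unique<ToyNode>(ToyNode{id, live}));
        return owned_.back().get();
    }

    // Move ownership of one node into `dst`.
    void steal_into(ToyContext &dst, ToyNode *n) {
        assert(n != nullptr);
        auto it = std::find_if(
            owned_.begin(), owned_.end(),
            [n](const std::unique_ptr<ToyNode> &p) { return p.get() == n; });
        if (it == owned_.end())
            return;
        dst.owned_.push_back(std::move(*it));
        owned_.erase(it);
    }

    std::size_t size() const { return owned_.size(); }

private:
    std::vector<std::unique_ptr<ToyNode>> owned_;
};

// Toy counterpart of the reparenting step: live nodes move to the
// shader's context; dead ones stay behind and die with `temp`.
void reparent_live(ToyContext &temp, ToyContext &shader,
                   const std::vector<ToyNode *> &nodes) {
    for (ToyNode *n : nodes)
        if (n->live)
            temp.steal_into(shader, n);
}
```

Dead nodes never need to be freed one by one: they are simply left in
the temporary context, which releases them in bulk when it goes out of
scope.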

Q: What is the file naming convention in this directory?

A: Initially, there really wasn't one.  We have since adopted one:

 - Files that implement code lowering passes should be named lower_*
   (e.g., lower_builtins.cpp).
 - Files that implement optimization passes should be named opt_*.
 - Files that implement a class that is used throughout the code should
   take the name of that class (e.g., ir_hierarchical_visitor.cpp).
 - Files that contain code not fitting in one of the previous
   categories should have a sensible name (e.g., glsl_parser.yy).