Welcome to Mesa's GLSL compiler.  A brief overview of how things flow:

1) A lex- and yacc-based preprocessor takes the incoming shader string
and produces a new string containing the preprocessed shader.  This
takes care of things like #if, #ifdef, #define, and preprocessor macro
invocations.  Note that #version, #extension, and some others are
passed straight through.  See glcpp/*.

2) A lex- and yacc-based parser takes the preprocessed string and
generates the AST (abstract syntax tree).  Almost no checking is
performed in this stage.  See glsl_lexer.ll and glsl_parser.yy.

3) The AST is converted to "HIR".  This is the intermediate
representation of the compiler.  Constructors are generated, function
calls are resolved to particular function signatures, and all the
semantic checking is performed.  See ast_*.cpp for the conversion, and
ir.h for the IR structures.

4) The driver (Mesa, or main.cpp for the standalone binary) performs
optimizations.  These include copy propagation, dead code elimination,
constant folding, and others.  Generally the driver will call the
optimizations in a loop until none of them makes further progress, as
each may open up opportunities for other optimizations to do
additional work.  See most files named ir_*.cpp.

5) Linking is performed.  This checks that the outputs of the vertex
shader match the inputs of the fragment shader, and assigns locations
to uniforms, attributes, and varyings.  See linker.cpp.

6) The driver may perform additional optimization at this point,
since, for example, dead code elimination could not previously remove
functions or global variable usage without knowing what other code
would be linked in.

7) The driver performs code generation out of the IR, taking a linked
shader program and producing a compiled program for each stage.  See
../mesa/program/ir_to_mesa.cpp for Mesa IR code generation.

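The loop in step 4 can be pictured with a toy fixed-point driver.  This
is a sketch under stated assumptions, not Mesa code: the IR is a
hypothetical straight-line list of (dest, op, a, b) tuples in which
every variable is assigned exactly once, and the three passes are
simplified stand-ins for copy propagation, constant folding, and dead
code elimination.  Each pass reports whether it changed anything, and
the driver repeats the whole sequence until a full round makes no
progress.

```python
# Toy fixed-point optimization driver (not Mesa code).  Hypothetical IR:
# a list of (dest, op, a, b) tuples; operands are variable names (str) or
# constants (float); every variable is assigned exactly once.

def propagate_copies(ir, outputs):
    """Replace uses of a variable defined by a plain 'mov' with its source."""
    sources = {dest: a for dest, op, a, _ in ir if op == "mov"}
    progress = False
    result = []
    for dest, op, a, b in ir:
        new_a = sources.get(a, a) if isinstance(a, str) else a
        new_b = sources.get(b, b) if isinstance(b, str) else b
        progress = progress or (new_a, new_b) != (a, b)
        result.append((dest, op, new_a, new_b))
    return result, progress


def fold_constants(ir, outputs):
    """Evaluate '+' and '*' whose operands are both constants."""
    progress = False
    result = []
    for dest, op, a, b in ir:
        if op in ("+", "*") and isinstance(a, float) and isinstance(b, float):
            value = a + b if op == "+" else a * b
            result.append((dest, "mov", value, None))
            progress = True
        else:
            result.append((dest, op, a, b))
    return result, progress


def eliminate_dead_code(ir, outputs):
    """Drop assignments whose destination is never read and is not an output."""
    used = set(outputs)
    for _, _, a, b in ir:
        used.update(x for x in (a, b) if isinstance(x, str))
    result = [inst for inst in ir if inst[0] in used]
    return result, len(result) != len(ir)


PASSES = (propagate_copies, fold_constants, eliminate_dead_code)


def optimize(ir, outputs):
    """Run every pass repeatedly until a full round makes no progress."""
    rounds = 0
    progress = True
    while progress:
        rounds += 1
        progress = False
        for opt_pass in PASSES:
            ir, changed = opt_pass(ir, outputs)
            progress = progress or changed
    return ir, rounds


program = [
    ("t1", "mov", 2.0, None),
    ("t2", "*", "t1", 3.0),
    ("t3", "+", "t2", "x"),
    ("out", "mov", "t3", None),
]
optimized, rounds = optimize(program, {"out"})
print(optimized)  # [('t3', '+', 6.0, 'x'), ('out', 'mov', 't3', None)]
print(rounds)     # 3
```

Here t1 becomes dead only once copy propagation has rewritten its
single use, and t2 only once folding has turned it into a plain copy
that a second round of propagation can forward.  Each pass enables the
others, which is why the driver iterates rather than running each pass
once.
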
FAQ:

Q: What is HIR versus IR versus LIR?

A: The idea behind the naming was that ast_to_hir would produce a
high-level IR ("HIR"), with things like matrix operations, structure
assignments, etc., present.  A series of lowering passes would then
break matrix multiplication into a series of dot products/MADs, turn
structure assignments into a series of component assignments, flatten
if statements into conditional moves, and so on, producing a low-level
IR ("LIR").

However, it now appears that each driver will have different
requirements from a LIR.  A 915-generation chipset wants all functions
inlined, all loops unrolled, all ifs flattened, no variable array
accesses, and matrix multiplication broken down.  The Mesa IR backend
for swrast would like matrices and structure assignment broken down,
but it can support function calls and dynamic branching.  A 965 vertex
shader IR backend could potentially even handle some matrix operations
without breaking them down, but the 965 fragment shader IR backend
would want (almost) all operations broken down channel-wise and would
perform optimization on that.  As a result, there's no single
low-level IR that will make everyone happy.  So that usage has fallen
out of favor, and each driver will perform a series of lowering passes
to take the HIR down to whatever restrictions it wants to impose
before doing codegen.

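To make "break matrix multiplication into a series of dot
products/MADs" concrete, here is a toy numeric sketch (plain Python
lists, not Mesa IR) of two equivalent decompositions of matrix * vector.
Which one a lowering pass should emit is the kind of backend-specific
choice described above: a target with a cheap dot-product instruction
may prefer the first form, and one built around MADs the second.

```python
# Two ways a lowering pass might break down `matrix * vector`.
# Matrices are column-major, as in GLSL: columns[i] is column i.

def mat_vec_mul_dots(columns, v):
    """One dot product per result component (one per matrix row)."""
    rows = zip(*columns)
    return [sum(m * s for m, s in zip(row, v)) for row in rows]


def mat_vec_mul_mads(columns, v):
    """A multiply followed by a chain of MADs, one per remaining column."""
    def mad(a, s, c):
        # Component-wise a * s + c, where s is a scalar (one channel of v).
        return [ai * s + ci for ai, ci in zip(a, c)]

    tmp = [m * v[0] for m in columns[0]]
    for column, s in zip(columns[1:], v[1:]):
        tmp = mad(column, s, tmp)
    return tmp


columns = [[1, 2], [3, 4]]  # the 2x2 matrix with rows (1, 3) and (2, 4)
print(mat_vec_mul_dots(columns, [5, 6]))  # [23, 34]
print(mat_vec_mul_mads(columns, [5, 6]))  # [23, 34]
```
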
Q: How is the IR structured?

A: The best way to get started is to run the standalone compiler
against a shader:

./glsl_compiler --dump-lir \
	~/src/piglit/tests/shaders/glsl-orangebook-ch06-bump.frag

For example, one of the ir_instructions in main() is:

(assign (constant bool (1)) (var_ref litColor) (expression vec3 *
 (var_ref SurfaceColor) (var_ref __retval) ) )

Or more visually:
                     (assign)
                 /       |        \
        (var_ref)  (expression *)  (constant bool 1)
         /          /           \
(litColor)      (var_ref)    (var_ref)
                  /                  \
           (SurfaceColor)          (__retval)

which came from:

litColor = SurfaceColor * max(dot(normDelta, LightDir), 0.0);

(The max() call is not represented in this expression tree: it was a
function call that got inlined, and its result reaches this expression
through __retval rather than as a subtree.)

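The tree above can be mirrored with a handful of toy Python classes.
The class names only loosely echo the node kinds in the dump (Mesa's
real nodes are C++ subclasses of ir_instruction declared in ir.h); this
sketch reproduces the printed S-expression format and implies nothing
else about the real API.

```python
# Toy stand-ins for a few IR node kinds; each prints itself in the
# S-expression style of the --dump-lir output shown above.

class Variable:
    def __init__(self, type_name, name):
        self.type = type_name
        self.name = name


class VarRef:
    def __init__(self, var):
        self.var = var            # points at a shared Variable
        self.type = var.type

    def __str__(self):
        return f"(var_ref {self.var.name})"


class Constant:
    def __init__(self, type_name, value):
        self.type = type_name
        self.value = value

    def __str__(self):
        return f"(constant {self.type} ({self.value}))"


class Expression:
    def __init__(self, type_name, op, *operands):
        self.type = type_name
        self.op = op
        self.operands = operands

    def __str__(self):
        inner = " ".join(str(o) for o in self.operands)
        return f"(expression {self.type} {self.op} {inner} )"


class Assign:
    def __init__(self, condition, lhs, rhs):
        self.condition, self.lhs, self.rhs = condition, lhs, rhs

    def __str__(self):
        return f"(assign {self.condition} {self.lhs} {self.rhs} )"


lit_color = Variable("vec3", "litColor")
surface_color = Variable("vec3", "SurfaceColor")
retval = Variable("float", "__retval")   # holds the inlined max() result

tree = Assign(Constant("bool", 1),
              VarRef(lit_color),
              Expression("vec3", "*", VarRef(surface_color), VarRef(retval)))
print(tree)
```
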
Each of those nodes is a subclass of ir_instruction.  A particular
ir_instruction instance may only appear once in the whole IR tree, with
the exception of ir_variables, which appear once as variable
declarations:

(declare () vec3 normDelta)

and multiple times as the targets of variable dereferences:
...
(assign (constant bool (1)) (var_ref __retval) (expression float dot
 (var_ref normDelta) (var_ref LightDir) ) )
...
(assign (constant bool (1)) (var_ref __retval) (expression vec3 -
 (var_ref LightDir) (expression vec3 * (constant float (2.000000))
 (expression vec3 * (expression float dot (var_ref normDelta) (var_ref
 LightDir) ) (var_ref normDelta) ) ) ) )
...

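A toy checker makes this rule concrete.  It is a hypothetical sketch in
the spirit of the consistency checks done in ir_validate.cpp, not its
actual code: it walks the tree, records every node object it visits,
and rejects a tree in which the same node object is reachable twice.
Variables are kept outside the child links, so any number of var_ref
nodes may point at one variable.

```python
# Toy check: every IR node object may be reached only once when walking
# the tree, while a variable may be referenced by many distinct var_refs.

class Variable:
    def __init__(self, name):
        self.name = name


class Node:
    def __init__(self, kind, *children):
        self.kind = kind
        self.children = list(children)


class VarRef(Node):
    def __init__(self, var):
        super().__init__("var_ref")
        self.var = var  # a shared Variable, deliberately not a child node


def check_tree(root):
    """Raise ValueError if any node object appears twice in the tree."""
    seen = set()

    def visit(node):
        if id(node) in seen:
            raise ValueError(f"{node.kind} node appears more than once")
        seen.add(id(node))
        for child in node.children:
            visit(child)

    visit(root)


norm_delta = Variable("normDelta")
light_dir = Variable("LightDir")

# Fine: separate var_ref nodes may all point at the same variable.
check_tree(Node("expression", VarRef(norm_delta),
                Node("expression", VarRef(norm_delta), VarRef(light_dir))))

# Invalid: the very same var_ref node is used as two operands.
shared = VarRef(norm_delta)
try:
    check_tree(Node("expression", shared, shared))
except ValueError as err:
    print(err)  # var_ref node appears more than once
```
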
Each node has a type, and the operands of an expression need not share
its type.  Here a mat4 multiplied by a vec4 produces a vec4:

(declare (uniform ) mat4 gl_ModelViewMatrix)
(assign (constant bool (1)) (var_ref constructor_tmp) (expression
 vec4 * (var_ref gl_ModelViewMatrix) (var_ref gl_Vertex) ) )

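The gl_ModelViewMatrix example follows GLSL's multiplication rules.
Below is a minimal sketch of how a result type could be derived for
"*", restricted (an assumption, to keep it short) to float, vecN, and
square matN operands; it is illustrative only and is not the
type-checking code in ast_*.cpp.

```python
# Toy result-type rule for '*' on float, vecN, and square matN operands
# (non-square matrices and integer types are deliberately left out).

def parse_type(name):
    """'float' -> ('float', 1); 'vec3' -> ('vec', 3); 'mat4' -> ('mat', 4)."""
    if name == "float":
        return ("float", 1)
    return (name[:3], int(name[3:]))


def mul_result_type(a, b):
    (kind_a, n_a), (kind_b, n_b) = parse_type(a), parse_type(b)
    if kind_a == "float":
        return b                  # scalar * anything: the other operand's type
    if kind_b == "float":
        return a
    if n_a != n_b:
        raise TypeError(f"size mismatch in {a} * {b}")
    if kind_a == kind_b:
        return a                  # vecN * vecN (component-wise) or matN * matN
    return a if kind_a == "vec" else b   # matN * vecN and vecN * matN give vecN


print(mul_result_type("mat4", "vec4"))   # vec4
print(mul_result_type("float", "vec3"))  # vec3
```
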
An expression tree can be arbitrarily deep, and the compiler tries to
keep expressions structured that way because it makes algebraic
optimizations (color * 1.0 == color, or (mat1 * mat2) * vec ==
mat1 * (mat2 * vec)) and the recognition of operation patterns for
code generation (vec1 * vec2 + vec3 == mad(vec1, vec2, vec3)) easier.
This comes at the expense of additional trickery in implementing some
optimizations, like CSE, where one must navigate an expression tree.

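Keeping whole trees around is what lets a rewrite see an entire pattern
at once.  A toy sketch of the two rewrites mentioned above, x * 1.0 to
x and a * b + c to mad(a, b, c), using nested Python tuples as a
stand-in for ir_expression nodes (an assumption for brevity):

```python
# Toy expression trees: a leaf is a variable name (str) or a constant
# (float); an interior node is a tuple (op, operand, ...).

def simplify(expr):
    """Bottom-up rewrite: x * 1.0 -> x, and a * b + c -> mad(a, b, c)."""
    if not isinstance(expr, tuple):
        return expr
    op, *operands = expr
    operands = [simplify(o) for o in operands]   # rewrite children first
    if op == "*" and operands[1] == 1.0:
        return operands[0]
    if op == "*" and operands[0] == 1.0:
        return operands[1]
    left = operands[0]
    if op == "+" and isinstance(left, tuple) and left[0] == "*":
        return ("mad", left[1], left[2], operands[1])
    return (op, *operands)


print(simplify(("*", "color", 1.0)))                         # color
print(simplify(("+", ("*", ("*", "v1", 1.0), "v2"), "v3")))  # ('mad', 'v1', 'v2', 'v3')
```

Because children are simplified first, the inner v1 * 1.0 is removed
before the outer pattern is matched.  The sketch only matches the
multiply as the left operand of "+"; c + a * b is left alone, which is
a small example of the pattern-matching care a real pass needs.
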
Q: Why no SSA representation?

A: Converting an IR tree to SSA form makes dead code elimination,
common subexpression elimination, and many other optimizations much
easier.  However, in our primarily vector-based language, there are
some major questions as to how it would work.  Do we do SSA on the
scalar or vector level?  If we do it at the vector level, we're going
to end up with many different versions of the variable when
encountering code like:

(assign (constant bool (1)) (swiz x (var_ref __retval) ) (var_ref a) )
(assign (constant bool (1)) (swiz y (var_ref __retval) ) (var_ref b) )
(assign (constant bool (1)) (swiz z (var_ref __retval) ) (var_ref c) )

If every masked update of a component relies on the previous value of
the variable, then we're probably going to be quite limited in our
dead code elimination wins, and recognizing common expressions may
just not happen.  On the other hand, if we operate channel-wise, then
we'll be prone to optimizing the operation on one of the channels at
the expense of making its instruction flow different from the other
channels, and a vector-based GPU would end up with worse code than if
we didn't optimize operations on that channel!

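The two choices can be contrasted with a toy renaming of the three
masked assignments above into SSA form.  This is only a sketch: the
merge operation and the _N / _x naming scheme are invented for
illustration and do not correspond to anything in the compiler.

```python
# Toy SSA renaming of the masked writes above, at vector and channel
# level.  "merge" is a hypothetical operation meaning "the old value with
# the masked channels replaced by the new source".

writes = [("x", "a"), ("y", "b"), ("z", "c")]   # (write mask, source)


def vector_ssa(var, writes):
    """Each masked write defines a new version that reads the previous one."""
    stmts = []
    for version, (mask, source) in enumerate(writes, start=1):
        stmts.append(f"{var}_{version} = merge({var}_{version - 1}, {source}, mask={mask})")
    return stmts


def channel_ssa(var, writes):
    """Each channel is its own scalar value, so no write depends on another."""
    return [f"{var}_{channel} = {source}"
            for mask, source in writes for channel in mask]


for stmt in vector_ssa("__retval", writes):
    print(stmt)
for stmt in channel_ssa("__retval", writes):
    print(stmt)
```

In the vector form, __retval_3 depends on every earlier version, which
is the chain that limits dead code elimination; in the channel form the
writes are independent, but each channel is now a separate value that
may be optimized differently from its neighbors.
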
Once again, it appears that our optimization requirements are driven
significantly by the target architecture.  For now, targeting the Mesa
IR backend, SSA does not appear to be that important to producing
excellent code, but we do expect to do some SSA-based optimizations
for the 965 fragment shader backend when that is developed.

Q: How should I expand instructions that take multiple backend instructions?

A: Sometimes you'll have to do the expansion in your code generation;
see, for example, ir_to_mesa.cpp's handling of ir_unop_sqrt.  However,
in many cases you'll want to do a pass over the IR to convert
non-native instructions to a series of native instructions.  For
example, for the Mesa backend we have ir_div_to_mul_rcp.cpp because
Mesa IR (and many hardware backends) only have a reciprocal
instruction, not a divide.  Implementing non-native instructions this
way gives constant folding a chance to run on the result, so (a / 2.0)
becomes (a * 0.5) in the generated code instead of (a * (1.0 / 2.0)).

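A toy version of this approach, with nested tuples standing in for IR
and function names chosen only to echo ir_div_to_mul_rcp.cpp (this is
not its code): lowering runs first, and because the reciprocal is now
an ordinary node in the tree, a later constant-folding pass can
evaluate it.

```python
# Toy expression trees: leaves are variable names (str) or constants
# (float); interior nodes are tuples (op, operand, ...).

def lower_div_to_mul_rcp(expr):
    """Rewrite every a / b as a * rcp(b)."""
    if not isinstance(expr, tuple):
        return expr
    op, *operands = expr
    operands = [lower_div_to_mul_rcp(o) for o in operands]
    if op == "/":
        return ("*", operands[0], ("rcp", operands[1]))
    return (op, *operands)


def fold_constants(expr):
    """Evaluate rcp of a nonzero constant (the only folding this toy needs)."""
    if not isinstance(expr, tuple):
        return expr
    op, *operands = expr
    operands = [fold_constants(o) for o in operands]
    if op == "rcp" and isinstance(operands[0], float) and operands[0] != 0.0:
        return 1.0 / operands[0]
    return (op, *operands)


lowered = lower_div_to_mul_rcp(("/", "a", 2.0))
print(lowered)                  # ('*', 'a', ('rcp', 2.0))
print(fold_constants(lowered))  # ('*', 'a', 0.5)
```
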
Q: How should I handle my special hardware instructions with respect to IR?

A: Our current theory is that if multiple targets have an instruction
for some operation, then we should probably be able to represent that
in the IR.  Generally this is in the form of an ir_{bin,un}op
expression type.  For example, we initially implemented fract() using
(a - floor(a)), but both 945 and 965 have instructions to give that
result, and it would also simplify the implementation of mod(), so
ir_unop_fract was added.  The following areas need updating to add a
new expression type:

ir.h (new enum)
ir.cpp:operator_strs (used for ir_reader)
ir_constant_expression.cpp (you probably want to be able to constant fold)
ir_validate.cpp (check users have the right types)

You may also need to update the backends if they will see the new expr type:

../mesa/program/ir_to_mesa.cpp

You can then use the new expression from builtins (if all backends
would rather see it), or write a pass that scans the IR and converts
it to use your new expression type (see ir_mod_to_floor, for example).

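For a new expression type such as fract, the "scan the IR and convert"
route plus the constant-folding item from the checklist might look like
the toy sketch below.  Nested tuples stand in for IR, and use_fract and
fold_constants are hypothetical names; the real changes go in the C++
files listed above.

```python
import math

# Toy expression trees: leaves are variable names (str) or constants
# (float); interior nodes are tuples (op, operand, ...).

def use_fract(expr):
    """Rewrite the pattern (a - floor(a)) into the new (fract a) node."""
    if not isinstance(expr, tuple):
        return expr
    op, *operands = expr
    operands = [use_fract(o) for o in operands]
    if op == "-" and len(operands) == 2 and operands[1] == ("floor", operands[0]):
        return ("fract", operands[0])
    return (op, *operands)


def fold_constants(expr):
    """Constant-fold floor and the new fract expression type."""
    if not isinstance(expr, tuple):
        return expr
    op, *operands = expr
    operands = [fold_constants(o) for o in operands]
    if len(operands) == 1 and isinstance(operands[0], float):
        x = operands[0]
        if op == "floor":
            return float(math.floor(x))
        if op == "fract":
            return x - math.floor(x)   # GLSL defines fract(x) as x - floor(x)
    return (op, *operands)


print(use_fract(("-", "a", ("floor", "a"))))   # ('fract', 'a')
print(fold_constants(("fract", 2.75)))         # 0.75
```
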
Q: How is memory management handled in the compiler?

A: The hierarchical memory allocator "talloc" developed for the Samba
project is used, so that things like optimization passes don't have to
worry about their garbage collection so much.  It has a few nice
features, including low performance overhead and good debugging
support that's trivially available.

Generally, each stage of the compile creates a talloc context and
allocates its memory out of that or children of it.  At the end of the
stage, the pieces still live are stolen to a new context and the old
one freed, or the whole context is kept for use by the next stage.

For IR transformations, a temporary context is used, then at the end
of all transformations, reparent_ir reparents all live nodes under the
shader's IR list, and the old context full of dead nodes is freed.
When developing a single IR transformation pass, this means that you
want to allocate instruction nodes out of the temporary context, so if
it becomes dead it doesn't live on as the child of a live node.  At
the moment, optimization passes aren't passed that temporary context,
so they find it by calling talloc_parent() on a nearby IR node.  The
talloc_parent() call is expensive, so many passes will cache the
result of the first talloc_parent().  Cleaning up all the optimization
passes to take a context argument and not call talloc_parent() is left
as an exercise.

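The reparent-then-free pattern can be modeled with a toy tree of
contexts.  This is not the talloc API (whose real entry points are C
functions such as talloc_steal() and talloc_free()); Context, steal,
and reparent_live are hypothetical names used only for this sketch.

```python
# Toy hierarchical contexts, loosely modeled on talloc's parent/child
# ownership: freeing a context frees everything allocated under it.

class Context:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        self.freed = False
        if parent is not None:
            parent.steal(self)

    def steal(self, node):
        """Move `node` (with its whole subtree) under this context."""
        if node.parent is not None:
            node.parent.children.remove(node)
        node.parent = self
        self.children.append(node)
        return node

    def free(self):
        """Free this context and, recursively, everything under it."""
        for child in list(self.children):
            child.free()
        self.freed = True
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None


def reparent_live(live_nodes, new_owner):
    """Hypothetical stand-in for reparent_ir: keep only the live nodes."""
    for node in live_nodes:
        new_owner.steal(node)


shader_ir = Context("shader IR list")
temp = Context("temporary context for the passes")
a = Context("node a", parent=temp)
b = Context("node b", parent=temp)   # replaced by c during a pass, now dead
c = Context("node c", parent=temp)

reparent_live([a, c], shader_ir)     # a and c are still in the IR list
temp.free()                          # frees b along with the context
print(a.freed, b.freed, c.freed)     # False True False
```

Had b been allocated as a child of a instead of the temporary context,
stealing a would have carried b along with it, which is exactly the
leak the text above warns about.
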
Q: What is the file naming convention in this directory?

A: Initially, there really wasn't one.  We have since adopted one:

 - Files that implement code lowering passes should be named lower_*
   (e.g., lower_noise.cpp).
 - Files that implement optimization passes should be named opt_*.
 - Files that implement a class that is used throughout the code should
   take the name of that class (e.g., ir_hierarchical_visitor.cpp).
 - Files that contain code not fitting in one of the previous
   categories should have a sensible name (e.g., glsl_parser.yy).