TGSI
====

TGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
for describing shaders. Since Gallium is inherently shaderful, shaders are
an important part of the API. TGSI is the only intermediate representation
used by all drivers.

Basics
------

All TGSI instructions, known as *opcodes*, operate on arbitrary-precision
floating-point four-component vectors. An opcode may have up to one
destination register, known as *dst*, and between zero and three source
registers, called *src0* through *src2*, or simply *src* if there is only
one.

Some instructions, like :opcode:`I2F`, permit re-interpretation of vector
components as integers. Other instructions permit using registers as
two-component vectors with double precision; see :ref:`doubleopcodes`.

When an instruction has a scalar result, the result is usually copied into
each of the components of *dst*. When this happens, the result is said to be
*replicated* to *dst*. :opcode:`RCP` is one such instruction.

Modifiers
^^^^^^^^^^^^^^^

TGSI supports modifiers on inputs (as well as saturate and precise modifiers
on instructions).

For arithmetic instructions with a precise modifier, certain optimizations
that may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
optimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
MAD instruction.

For inputs which have a floating point type, both absolute value and
negation modifiers are supported (with absolute value being applied
first).  The only source of TGSI_OPCODE_MOV and the second and third
sources of TGSI_OPCODE_UCMP are considered to have float type for
applying modifiers.

For inputs which have signed or unsigned type, only the negate modifier is
supported.
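
The order of the two float modifiers matters when both are set. The following
is a minimal Python sketch of that order; the helper name
``apply_float_modifiers`` and its flags are hypothetical and are not part of
TGSI or Mesa.

.. code-block:: python

   def apply_float_modifiers(value, absolute=False, negate=False):
       """Apply source modifiers to one float component (illustrative only)."""
       if absolute:          # absolute value is applied first
           value = abs(value)
       if negate:            # negation is applied to the result of abs
           value = -value
       return value

   # With both modifiers set the result is -|x|, regardless of the input sign.
   both_on_negative = apply_float_modifiers(-3.0, absolute=True, negate=True)
   both_on_positive = apply_float_modifiers(3.0, absolute=True, negate=True)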

Instruction Set
---------------

Core ISA
^^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes are guaranteed to be available regardless of the driver being
used.

.. opcode:: ARL - Address Register Load

.. math::

  dst.x = (int) \lfloor src.x\rfloor

  dst.y = (int) \lfloor src.y\rfloor

  dst.z = (int) \lfloor src.z\rfloor

  dst.w = (int) \lfloor src.w\rfloor


.. opcode:: MOV - Move

.. math::

  dst.x = src.x

  dst.y = src.y

  dst.z = src.z

  dst.w = src.w


.. opcode:: LIT - Light Coefficients

.. math::

  dst.x &= 1 \\
  dst.y &= max(src.x, 0) \\
  dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128)} : 0 \\
  dst.w &= 1

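The LIT formula can be read as the following Python sketch, which models a
register as a list of four floats. The function names are illustrative, and
Python doubles stand in for shader floats, so this is an approximation of the
semantics rather than a reference implementation.

.. code-block:: python

   def clamp(value, low, high):
       """Clamp value into the closed range [low, high]."""
       return max(low, min(value, high))

   def lit(src):
       """Model of LIT following the formula above (illustrative only)."""
       x, y, _, w = src
       if x > 0:
           # Note: Python raises ZeroDivisionError for 0.0 ** negative;
           # the formula does not say what happens in that case.
           specular = max(y, 0.0) ** clamp(w, -128.0, 128.0)
       else:
           specular = 0.0
       return [1.0, max(x, 0.0), specular, 1.0]

   lit_front = lit([0.5, 0.5, 0.0, 2.0])   # src.x > 0: dst.z = 0.5 ** 2
   lit_back = lit([-1.0, 0.5, 0.0, 2.0])   # src.x <= 0: dst.y and dst.z are 0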

.. opcode:: RCP - Reciprocal

This instruction replicates its result.

.. math::

  dst = \frac{1}{src.x}

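As a rough illustration of replication, the sketch below models a register as
a list of four floats and copies the scalar reciprocal into every component.
The names ``replicate`` and ``rcp`` are illustrative only.

.. code-block:: python

   def replicate(scalar):
       """Copy a scalar result into all four components (x, y, z, w)."""
       return [scalar] * 4

   def rcp(src):
       """Model of RCP: only src.x is read and the result is replicated."""
       return replicate(1.0 / src[0])

   rcp_result = rcp([4.0, 9.0, 16.0, 25.0])  # y, z and w of src are ignored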

.. opcode:: RSQ - Reciprocal Square Root

This instruction replicates its result. The results are undefined for src.x <= 0.

.. math::

  dst = \frac{1}{\sqrt{src.x}}


.. opcode:: SQRT - Square Root

This instruction replicates its result. The results are undefined for src.x < 0.

.. math::

  dst = {\sqrt{src.x}}


.. opcode:: EXP - Approximate Exponential Base 2

.. math::

  dst.x &= 2^{\lfloor src.x\rfloor} \\
  dst.y &= src.x - \lfloor src.x\rfloor \\
  dst.z &= 2^{src.x} \\
  dst.w &= 1


.. opcode:: LOG - Approximate Logarithm Base 2

.. math::

  dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
  dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
  dst.z &= \log_2{|src.x|} \\
  dst.w &= 1

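EXP and LOG each split a value into an integer part and a remainder. The
Python sketch below follows the two formulas above component by component; the
function names are illustrative, and Python doubles are used in place of
shader floats.

.. code-block:: python

   import math

   def exp_op(src):
       """Model of EXP: integer power, fractional part, full 2**x, one."""
       x = src[0]
       floor_x = math.floor(x)
       return [2.0 ** floor_x, x - floor_x, 2.0 ** x, 1.0]

   def log_op(src):
       """Model of LOG: exponent, mantissa in [1, 2), full log2, one.

       math.log2 raises ValueError for 0, which this sketch does not handle.
       """
       magnitude = abs(src[0])
       log_value = math.log2(magnitude)
       floor_log = math.floor(log_value)
       return [float(floor_log), magnitude / 2.0 ** floor_log, log_value, 1.0]

   exp_result = exp_op([2.5, 0.0, 0.0, 0.0])
   log_result = log_op([-12.0, 0.0, 0.0, 0.0])  # the sign of src.x is ignored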

.. opcode:: MUL - Multiply

.. math::

  dst.x = src0.x \times src1.x

  dst.y = src0.y \times src1.y

  dst.z = src0.z \times src1.z

  dst.w = src0.w \times src1.w


.. opcode:: ADD - Add

.. math::

  dst.x = src0.x + src1.x

  dst.y = src0.y + src1.y

  dst.z = src0.z + src1.z

  dst.w = src0.w + src1.w


.. opcode:: DP3 - 3-component Dot Product

This instruction replicates its result.

.. math::

  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z


.. opcode:: DP4 - 4-component Dot Product

This instruction replicates its result.

.. math::

  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w


.. opcode:: DST - Distance Vector

.. math::

  dst.x &= 1\\
  dst.y &= src0.y \times src1.y\\
  dst.z &= src0.z\\
  dst.w &= src1.w


.. opcode:: MIN - Minimum

.. math::

  dst.x = min(src0.x, src1.x)

  dst.y = min(src0.y, src1.y)

  dst.z = min(src0.z, src1.z)

  dst.w = min(src0.w, src1.w)


.. opcode:: MAX - Maximum

.. math::

  dst.x = max(src0.x, src1.x)

  dst.y = max(src0.y, src1.y)

  dst.z = max(src0.z, src1.z)

  dst.w = max(src0.w, src1.w)


.. opcode:: SLT - Set On Less Than

.. math::

  dst.x = (src0.x < src1.x) ? 1.0F : 0.0F

  dst.y = (src0.y < src1.y) ? 1.0F : 0.0F

  dst.z = (src0.z < src1.z) ? 1.0F : 0.0F

  dst.w = (src0.w < src1.w) ? 1.0F : 0.0F


.. opcode:: SGE - Set On Greater Equal Than

.. math::

  dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F

  dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F

  dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F

  dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F


.. opcode:: MAD - Multiply And Add

Perform a * b + c. The implementation is free to decide whether there is an
intermediate rounding step or not.

.. math::

  dst.x = src0.x \times src1.x + src2.x

  dst.y = src0.y \times src1.y + src2.y

  dst.z = src0.z \times src1.z + src2.z

  dst.w = src0.w \times src1.w + src2.w


.. opcode:: LRP - Linear Interpolate

.. math::

  dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x

  dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y

  dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z

  dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w

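In LRP the first source is the per-component blend weight: a weight of 1
selects *src1* and a weight of 0 selects *src2*. A small Python sketch of the
formula, with an illustrative function name:

.. code-block:: python

   def lrp(src0, src1, src2):
       """Model of LRP: blend src1 and src2 using src0 as the weight."""
       return [w * a + (1.0 - w) * b for w, a, b in zip(src0, src1, src2)]

   weights = [0.0, 1.0, 0.5, 0.25]
   blended = lrp(weights, [10.0] * 4, [20.0] * 4)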

.. opcode:: FMA - Fused Multiply-Add

Perform a * b + c with no intermediate rounding step.

.. math::

  dst.x = src0.x \times src1.x + src2.x

  dst.y = src0.y \times src1.y + src2.y

  dst.z = src0.z \times src1.z + src2.z

  dst.w = src0.w \times src1.w + src2.w

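The difference between a fused and an unfused multiply-add only shows up when
the exact product is not representable. The sketch below demonstrates this
with Python doubles (not 32-bit shader floats) and emulates the single
rounding step with exact rational arithmetic; both function names are
illustrative. An implementation of MAD may behave like either function.

.. code-block:: python

   from fractions import Fraction

   def mad_unfused(a, b, c):
       """Round after the multiply, then again after the add."""
       return a * b + c

   def fma_single_rounding(a, b, c):
       """Compute a * b + c exactly, then round once to a double."""
       return float(Fraction(a) * Fraction(b) + Fraction(c))

   # The exact product is 1 + 2**-29 + 2**-60; the last term is lost when
   # the product is rounded to a double before the addition.
   a = b = 1.0 + 2.0 ** -30
   c = -(1.0 + 2.0 ** -29)

   unfused = mad_unfused(a, b, c)
   fused = fma_single_rounding(a, b, c)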

.. opcode:: FRC - Fraction

.. math::

  dst.x = src.x - \lfloor src.x\rfloor

  dst.y = src.y - \lfloor src.y\rfloor

  dst.z = src.z - \lfloor src.z\rfloor

  dst.w = src.w - \lfloor src.w\rfloor


.. opcode:: FLR - Floor

.. math::

  dst.x = \lfloor src.x\rfloor

  dst.y = \lfloor src.y\rfloor

  dst.z = \lfloor src.z\rfloor

  dst.w = \lfloor src.w\rfloor


.. opcode:: ROUND - Round

.. math::

  dst.x = round(src.x)

  dst.y = round(src.y)

  dst.z = round(src.z)

  dst.w = round(src.w)


.. opcode:: EX2 - Exponential Base 2

This instruction replicates its result.

.. math::

  dst = 2^{src.x}


.. opcode:: LG2 - Logarithm Base 2

This instruction replicates its result.

.. math::

  dst = \log_2{src.x}


.. opcode:: POW - Power

This instruction replicates its result.

.. math::

  dst = src0.x^{src1.x}


.. opcode:: LDEXP - Multiply Number by Integral Power of 2

src1 is an integer.

.. math::

  dst.x = src0.x \times 2^{src1.x}

  dst.y = src0.y \times 2^{src1.y}

  dst.z = src0.z \times 2^{src1.z}

  dst.w = src0.w \times 2^{src1.w}


.. opcode:: COS - Cosine

This instruction replicates its result.

.. math::

  dst = \cos{src.x}


.. opcode:: DDX, DDX_FINE - Derivative Relative To X

The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
advertised. When it is, the fine version guarantees one derivative per row
while DDX is allowed to be the same for the entire 2x2 quad.

.. math::

  dst.x = partialx(src.x)

  dst.y = partialx(src.y)

  dst.z = partialx(src.z)

  dst.w = partialx(src.w)

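The difference between the two variants can be pictured on a single 2x2 quad.
In the Python sketch below the quad is a list of two rows, each holding the
values of two horizontally adjacent pixels. That layout, the function names,
and the choice of the top row for the coarse variant are assumptions made for
illustration; a driver may compute the coarse derivative differently.

.. code-block:: python

   def ddx_fine(quad):
       """One horizontal difference per row, shared by both pixels in it."""
       return [[row[1] - row[0]] * 2 for row in quad]

   def ddx_coarse(quad):
       """A single horizontal difference reused for the whole quad.

       Using the top row here is an assumption of this sketch.
       """
       delta = quad[0][1] - quad[0][0]
       return [[delta, delta], [delta, delta]]

   quad = [[1.0, 3.0],
           [10.0, 15.0]]
   fine = ddx_fine(quad)
   coarse = ddx_coarse(quad)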

.. opcode:: DDY, DDY_FINE - Derivative Relative To Y

The fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
advertised. When it is, the fine version guarantees one derivative per column
while DDY is allowed to be the same for the entire 2x2 quad.

.. math::

  dst.x = partialy(src.x)

  dst.y = partialy(src.y)

  dst.z = partialy(src.z)

  dst.w = partialy(src.w)


.. opcode:: PK2H - Pack Two 16-bit Floats

This instruction replicates its result.

.. math::

  dst = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16

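The packing can be tried out with Python's ``struct`` module, whose ``e``
format converts to and from IEEE 754 half precision. The sketch below packs
two values as in PK2H and unpacks them again in the way the UP2H formula
describes. The function names are illustrative, and the rounding behaviour is
that of ``struct``, which TGSI does not necessarily share.

.. code-block:: python

   import struct

   def f32_to_f16_bits(value):
       """Return the 16-bit half-float encoding of value as an integer."""
       return struct.unpack('<H', struct.pack('<e', value))[0]

   def f16_bits_to_f32(bits):
       """Decode a 16-bit half-float encoding back into a Python float."""
       return struct.unpack('<e', struct.pack('<H', bits))[0]

   def pk2h(src):
       """Model of PK2H: src.x in the low half, src.y in the high half."""
       return f32_to_f16_bits(src[0]) | (f32_to_f16_bits(src[1]) << 16)

   def up2h(packed):
       """Model of UP2H: low half to x and z, high half to y and w."""
       low = f16_bits_to_f32(packed & 0xffff)
       high = f16_bits_to_f32(packed >> 16)
       return [low, high, low, high]

   packed = pk2h([1.0, -2.0, 0.0, 0.0])
   unpacked = up2h(packed)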

.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars

This instruction replicates its result.

.. math::

  dst = f32\_to\_unorm16(src.x) | f32\_to\_unorm16(src.y) << 16


.. opcode:: PK4B - Pack Four Signed 8-bit Scalars

This instruction replicates its result.

.. math::

  dst = f32\_to\_snorm8(src.x) |
        (f32\_to\_snorm8(src.y) << 8) |
        (f32\_to\_snorm8(src.z) << 16) |
        (f32\_to\_snorm8(src.w) << 24)


.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars

This instruction replicates its result.

.. math::

  dst = f32\_to\_unorm8(src.x) |
        (f32\_to\_unorm8(src.y) << 8) |
        (f32\_to\_unorm8(src.z) << 16) |
        (f32\_to\_unorm8(src.w) << 24)

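The byte layout of PK4UB can be sketched in Python as follows. The conversion
used for ``f32_to_unorm8`` (clamp to [0, 1], scale by 255, round to nearest)
is an assumption of this sketch, since the formula above does not define it,
and NaN inputs are not modeled.

.. code-block:: python

   def f32_to_unorm8(value):
       """Assumed conversion: clamp to [0, 1], scale to 0..255, round."""
       clamped = min(max(value, 0.0), 1.0)
       return int(clamped * 255.0 + 0.5)

   def pk4ub(src):
       """Model of PK4UB: x in bits 0-7, y in 8-15, z in 16-23, w in 24-31."""
       x, y, z, w = (f32_to_unorm8(component) for component in src)
       return x | (y << 8) | (z << 16) | (w << 24)

   packed_color = pk4ub([1.0, 0.0, 0.5, 1.0])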

.. opcode:: SEQ - Set On Equal

.. math::

  dst.x = (src0.x == src1.x) ? 1.0F : 0.0F

  dst.y = (src0.y == src1.y) ? 1.0F : 0.0F

  dst.z = (src0.z == src1.z) ? 1.0F : 0.0F

  dst.w = (src0.w == src1.w) ? 1.0F : 0.0F


.. opcode:: SGT - Set On Greater Than

.. math::

  dst.x = (src0.x > src1.x) ? 1.0F : 0.0F

  dst.y = (src0.y > src1.y) ? 1.0F : 0.0F

  dst.z = (src0.z > src1.z) ? 1.0F : 0.0F

  dst.w = (src0.w > src1.w) ? 1.0F : 0.0F


.. opcode:: SIN - Sine

This instruction replicates its result.

.. math::

  dst = \sin{src.x}


.. opcode:: SLE - Set On Less Equal Than

.. math::

  dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F

  dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F

  dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F

  dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F


.. opcode:: SNE - Set On Not Equal

.. math::

  dst.x = (src0.x != src1.x) ? 1.0F : 0.0F

  dst.y = (src0.y != src1.y) ? 1.0F : 0.0F

  dst.z = (src0.z != src1.z) ? 1.0F : 0.0F

  dst.w = (src0.w != src1.w) ? 1.0F : 0.0F


.. opcode:: TEX - Texture Lookup

  For array textures, src0.y contains the slice for 1D,
  and src0.z contains the slice for 2D.

  For shadow textures with no arrays (and not cube map),
  src0.z contains the reference value.

  For shadow textures with arrays, src0.z contains
  the reference value for 1D arrays, and src0.w contains
  the reference value for 2D arrays and cube maps.

  For cube map array shadow textures, the reference value
  cannot be passed in src0.w, and TEX2 must be used instead.

.. math::

  coord = src0

  shadow\_ref = src0.z or src0.w (optional)

  unit = src1

  dst = texture\_sample(unit, coord, shadow\_ref)


.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)

  This is the same as TEX, but uses another register to encode the
  reference value.

.. math::

  coord = src0

  shadow\_ref = src1.x

  unit = src2

  dst = texture\_sample(unit, coord, shadow\_ref)


.. opcode:: TXD - Texture Lookup with Derivatives

.. math::

  coord = src0

  ddx = src1

  ddy = src2

  unit = src3

  dst = texture\_sample\_deriv(unit, coord, ddx, ddy)


.. opcode:: TXP - Projective Texture Lookup

.. math::

  coord.x = src0.x / src0.w

  coord.y = src0.y / src0.w

  coord.z = src0.z / src0.w

  coord.w = src0.w

  unit = src1

  dst = texture\_sample(unit, coord)

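The coordinate setup for TXP amounts to a perspective divide by the fourth
component. A minimal Python sketch of just that step, using an illustrative
function name and leaving out the texture sampling itself:

.. code-block:: python

   def txp_coord(src0):
       """Divide x, y and z by w; w itself is passed through unchanged."""
       x, y, z, w = src0
       return [x / w, y / w, z / w, w]

   projected = txp_coord([2.0, 4.0, 6.0, 2.0])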

.. opcode:: UP2H - Unpack Two 16-Bit Floats

.. math::

  dst.x = f16\_to\_f32(src0.x \& 0xffff)

  dst.y = f16\_to\_f32(src0.x >> 16)

  dst.z = f16\_to\_f32(src0.x \& 0xffff)

  dst.w = f16\_to\_f32(src0.x >> 16)

.. note::

   Considered for removal.

.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars

  TBD

.. note::

   Considered for removal.

.. opcode:: UP4B - Unpack Four Signed 8-Bit Values

  TBD

.. note::

   Considered for removal.

.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars

  TBD

.. note::

   Considered for removal.


.. opcode:: ARR - Address Register Load With Round

.. math::

  dst.x = (int) round(src.x)

  dst.y = (int) round(src.y)

  dst.z = (int) round(src.z)

  dst.w = (int) round(src.w)


.. opcode:: SSG - Set Sign

.. math::

  dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0

  dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0

  dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0

  dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0


.. opcode:: CMP - Compare

.. math::

  dst.x = (src0.x < 0) ? src1.x : src2.x

  dst.y = (src0.y < 0) ? src1.y : src2.y

  dst.z = (src0.z < 0) ? src1.z : src2.z

  dst.w = (src0.w < 0) ? src1.w : src2.w


.. opcode:: KILL_IF - Conditional Discard

  Conditional discard.  Allowed in fragment shaders only.

.. math::

  if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
    discard
  endif

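The KILL_IF condition is true as soon as any one component is negative. A
short Python sketch of the test, with an illustrative function name that
returns whether the fragment would be discarded:

.. code-block:: python

   def kill_if(src):
       """Return True if any of the four components is less than zero."""
       return any(component < 0 for component in src)

   discarded = kill_if([1.0, 0.0, -0.5, 2.0])   # one negative component
   kept = not kill_if([0.0, 0.0, 0.0, 0.0])     # zero is not negative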

.. opcode:: KILL - Discard

  Unconditional discard.  Allowed in fragment shaders only.


.. opcode:: TXB - Texture Lookup With Bias

  For cube map array textures and shadow cube maps, the bias value
  cannot be passed in src0.w, and TXB2 must be used instead.

  If the target is a shadow texture, the reference value is always
  in src0.z (this prevents shadow 3D and shadow 2D arrays from
  using this instruction, but this is not needed).

.. math::

  coord.x = src0.x

  coord.y = src0.y

  coord.z = src0.z

  coord.w = none

  bias = src0.w

  unit = src1

  dst = texture\_sample(unit, coord, bias)


.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)

  This is the same as TXB, but uses another register to encode the
  LOD bias value for cube map arrays and shadow cube maps.
  Presumably shadow 2D arrays and shadow 3D targets could use
  this encoding too, but this is not legal.

  Shadow cube map arrays are neither possible nor required.

.. math::

  coord = src0

  bias = src1.x

  unit = src2

  dst = texture\_sample(unit, coord, bias)


.. opcode:: DIV - Divide

.. math::

  dst.x = \frac{src0.x}{src1.x}

  dst.y = \frac{src0.y}{src1.y}

  dst.z = \frac{src0.z}{src1.z}

  dst.w = \frac{src0.w}{src1.w}


.. opcode:: DP2 - 2-component Dot Product

This instruction replicates its result.

.. math::

  dst = src0.x \times src1.x + src0.y \times src1.y


.. opcode:: TEX_LZ - Texture Lookup With LOD = 0

  This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
  pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
  There is no way to override those two in shaders.

.. math::

  coord.x = src0.x

  coord.y = src0.y

  coord.z = src0.z

  coord.w = none

  lod = 0

  unit = src1

  dst = texture\_sample(unit, coord, lod)


.. opcode:: TXL - Texture Lookup With explicit LOD

  For cube map array textures, the explicit LOD value
  cannot be passed in src0.w, and TXL2 must be used instead.

  If the target is a shadow texture, the reference value is always
  in src0.z (this prevents shadow 3D / 2D array / cube targets from
  using this instruction, but this is not needed).

.. math::

  coord.x = src0.x

  coord.y = src0.y

  coord.z = src0.z

  coord.w = none

  lod = src0.w

  unit = src1

  dst = texture\_sample(unit, coord, lod)


.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)

  This is the same as TXL, but uses another register to encode the
  explicit LOD value.
  Presumably shadow 3D / 2D array / cube targets could use
  this encoding too, but this is not legal.

  Shadow cube map arrays are neither possible nor required.

.. math::

  coord = src0

  lod = src1.x

  unit = src2

  dst = texture\_sample(unit, coord, lod)


Compute ISA
^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes are primarily provided for special-use computational shaders.
Support for these opcodes is indicated by a special pipe capability bit (TBD).

XXX doesn't look like most of the opcodes really belong here.

.. opcode:: CEIL - Ceiling

.. math::

  dst.x = \lceil src.x\rceil

  dst.y = \lceil src.y\rceil

  dst.z = \lceil src.z\rceil

  dst.w = \lceil src.w\rceil


.. opcode:: TRUNC - Truncate

.. math::

  dst.x = trunc(src.x)

  dst.y = trunc(src.y)

  dst.z = trunc(src.z)

  dst.w = trunc(src.w)


.. opcode:: MOD - Modulus

.. math::

  dst.x = src0.x \bmod src1.x

  dst.y = src0.y \bmod src1.y

  dst.z = src0.z \bmod src1.z

  dst.w = src0.w \bmod src1.w


.. opcode:: UARL - Integer Address Register Load

  Moves the contents of the source register, assumed to be an integer, into the
  destination register, which is assumed to be an address (ADDR) register.


.. opcode:: TXF - Texel Fetch

  As per NV_gpu_shader4, extract a single texel from a specified texture
  image or PIPE_BUFFER resource. The source sampler may not be a CUBE or
  SHADOW. src0 is a four-component signed integer vector that identifies
  the single texel accessed: three coordinate components plus the mipmap
  level. If the texture is multisampled, then the fourth component
  indicates the sample, not the mipmap level. Just like texture
  instructions, an optional offset vector is provided, which is subject to
  various driver restrictions (regarding range, source of offsets). This
  instruction ignores the sampler state.

  TXF(uint_vec coord, int_vec offset).


.. opcode:: TXQ - Texture Size Query

  As per NV_gpu_program4, retrieve the dimensions of the texture depending on
  the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
  depth), 1D array (width, layers), 2D array (width, height, layers).
  Also return the number of accessible levels (last_level - first_level + 1)
  in W.

  For components which don't return a resource dimension, their value
  is undefined.

.. math::

  lod = src0.x

  dst.x = texture\_width(unit, lod)

  dst.y = texture\_height(unit, lod)

  dst.z = texture\_depth(unit, lod)

  dst.w = texture\_levels(unit)


.. opcode:: TXQS - Texture Samples Query

  This retrieves the number of samples in the texture, and stores it
  into the x component as an unsigned integer. The other components are
  undefined.  If the texture is not multisampled, this function returns
  (1, undef, undef, undef).

.. math::

  dst.x = texture\_samples(unit)


.. opcode:: TG4 - Texture Gather

  As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
  filtering operation and packs them into a single register.  Only works with
  2D, 2D array, cubemaps, and cubemap arrays.  For 2D textures, only the
  addressing modes of the sampler and the top level of any mip pyramid are
  used. Set W to zero.  It behaves like the TEX instruction, but a filtered
  sample is not generated. The four samples that contribute to filtering are
  placed into xyzw in clockwise order, starting with the (u,v) texture
  coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -),
  where the magnitude of the deltas is half a texel.

  PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
  depth compares, single component selection, and a non-constant offset. It
  doesn't allow support for the GL independent offset to get i0,j0. This would
  require another CAP if hw can do it natively. For now we lower that before
  TGSI.

.. math::

   coord = src0

   component = src1

   dst = texture\_gather4 (unit, coord, component)

(with SM5 - cube array shadow)

.. math::

   coord = src0

   compare = src1

   dst = texture\_gather (unit, coord, compare)

.. opcode:: LODQ - level of detail query

   Compute the LOD information that the texture pipe would use to access the
   texture. The Y component contains the computed LOD lambda_prime. The X
   component contains the LOD that will be accessed, based on min/max LODs
   and mipmap filters.

.. math::

   coord = src0

   dst.xy = lodq(unit, coord);

.. opcode:: CLOCK - retrieve the current shader time

   Invoking this instruction multiple times in the same shader should
   cause monotonically increasing values to be returned. The values
   are implicitly 64-bit, so if fewer than 64 bits of precision are
   available, to provide expected wraparound semantics, the value
   should be shifted up so that the most significant bit of the time
   is the most significant bit of the 64-bit value.

.. math::

   dst.xy = clock()

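The reason for shifting a narrower counter up can be checked with integer
arithmetic: once the counter occupies the top bits, a 64-bit subtraction wraps
at the same point as the counter does. The Python sketch below assumes a
hypothetical 48-bit hardware counter and assumes that the low 32 bits go to
dst.x and the high 32 bits to dst.y; the component order is not stated in the
description above, and all names are illustrative.

.. code-block:: python

   MASK64 = (1 << 64) - 1

   def shift_up(counter, valid_bits):
       """Place a valid_bits-wide counter in the top bits of a 64-bit value."""
       return (counter << (64 - valid_bits)) & MASK64

   def split_xy(value64):
       """Split into (low, high) 32-bit words; the order is an assumption."""
       return value64 & 0xffffffff, value64 >> 32

   def ticks_between(start64, end64, valid_bits):
       """Elapsed counter ticks, computed with 64-bit wraparound."""
       return ((end64 - start64) & MASK64) >> (64 - valid_bits)

   # The 48-bit counter wraps from its maximum value past zero to one.
   start = shift_up(2 ** 48 - 1, 48)
   end = shift_up(1, 48)
   elapsed = ticks_between(start, end, 48)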

Integer ISA
^^^^^^^^^^^^^^^^^^^^^^^^

These opcodes are used for integer operations.
Support for these opcodes is indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)


.. opcode:: I2F - Signed Integer To Float

   Rounding is unspecified (round to nearest even suggested).

.. math::

  dst.x = (float) src.x

  dst.y = (float) src.y

  dst.z = (float) src.z

  dst.w = (float) src.w


.. opcode:: U2F - Unsigned Integer To Float

   Rounding is unspecified (round to nearest even suggested).

.. math::

  dst.x = (float) src.x

  dst.y = (float) src.y

  dst.z = (float) src.z

  dst.w = (float) src.w


.. opcode:: F2I - Float to Signed Integer

   Rounding is towards zero (truncate).
   Values outside signed range (including NaNs) produce undefined results.

.. math::

  dst.x = (int) src.x

  dst.y = (int) src.y

  dst.z = (int) src.z

  dst.w = (int) src.w

848b8605Smrg
848b8605Smrg.. opcode:: F2U - Float to Unsigned Integer
848b8605Smrg
848b8605Smrg   Rounding is towards zero (truncate).
848b8605Smrg   Values outside unsigned range (including NaNs) produce undefined results.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (unsigned) src.x
848b8605Smrg
848b8605Smrg  dst.y = (unsigned) src.y
848b8605Smrg
848b8605Smrg  dst.z = (unsigned) src.z
848b8605Smrg
848b8605Smrg  dst.w = (unsigned) src.w
848b8605Smrg
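A rough model of the float-to-integer conversions, assuming Python's ``math.trunc`` as the round-towards-zero step. The helpers ``f2i`` and ``f2u`` are illustrative names only, and ``None`` stands in for the undefined result the text mentions.

```python
import math


def f2i(x):
    """Float to signed 32-bit integer, truncating towards zero."""
    if not math.isfinite(x):
        return None  # NaN and infinities: undefined result
    result = math.trunc(x)
    return result if -2**31 <= result <= 2**31 - 1 else None


def f2u(x):
    """Float to unsigned 32-bit integer, truncating towards zero."""
    if not math.isfinite(x):
        return None  # NaN and infinities: undefined result
    result = math.trunc(x)
    return result if 0 <= result <= 2**32 - 1 else None
```

Real hardware may return any value where this sketch returns ``None``; the point is only that callers must not rely on the result.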
848b8605Smrg
848b8605Smrg.. opcode:: UADD - Integer Add
848b8605Smrg
   This instruction works the same for signed and unsigned integers.
   The low 32 bits of the result are returned.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x + src1.x
848b8605Smrg
848b8605Smrg  dst.y = src0.y + src1.y
848b8605Smrg
848b8605Smrg  dst.z = src0.z + src1.z
848b8605Smrg
848b8605Smrg  dst.w = src0.w + src1.w
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UMAD - Integer Multiply And Add
848b8605Smrg
   This instruction works the same for signed and unsigned integers.
   Only the low 32 bits of the product are kept, and likewise only the low
   32 bits of the final result.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x \times src1.x + src2.x
848b8605Smrg
848b8605Smrg  dst.y = src0.y \times src1.y + src2.y
848b8605Smrg
848b8605Smrg  dst.z = src0.z \times src1.z + src2.z
848b8605Smrg
848b8605Smrg  dst.w = src0.w \times src1.w + src2.w
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UMUL - Integer Multiply
848b8605Smrg
   This instruction works the same for signed and unsigned integers.
   The low 32 bits of the result are returned.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x \times src1.x
848b8605Smrg
848b8605Smrg  dst.y = src0.y \times src1.y
848b8605Smrg
848b8605Smrg  dst.z = src0.z \times src1.z
848b8605Smrg
848b8605Smrg  dst.w = src0.w \times src1.w
848b8605Smrg
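The claim that UADD, UMAD and UMUL work the same for signed and unsigned operands follows from keeping only the low 32 bits. A small sketch, with registers modelled as Python integers holding raw 32-bit patterns (all helper names here are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def to_signed(bits):
    """Reinterpret a raw 32-bit pattern as a two's complement integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def uadd(a, b):
    return (a + b) & MASK32


def umul(a, b):
    return (a * b) & MASK32


def umad(a, b, c):
    return (a * b + c) & MASK32


minus_three = -3 & MASK32          # 0xFFFFFFFD
product = umul(minus_three, 5)     # same bits for signed and unsigned views
```

Read as a signed value, ``product`` is -15; read as unsigned, it is the low 32 bits of the unsigned product. Both views share one bit pattern, so a single opcode suffices.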
848b8605Smrg
848b8605Smrg.. opcode:: IMUL_HI - Signed Integer Multiply High Bits
848b8605Smrg
   The high 32 bits of the product of two signed integers are returned.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x \times src1.x) >> 32
848b8605Smrg
848b8605Smrg  dst.y = (src0.y \times src1.y) >> 32
848b8605Smrg
848b8605Smrg  dst.z = (src0.z \times src1.z) >> 32
848b8605Smrg
848b8605Smrg  dst.w = (src0.w \times src1.w) >> 32
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
848b8605Smrg
   The high 32 bits of the product of two unsigned integers are returned.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x \times src1.x) >> 32
848b8605Smrg
848b8605Smrg  dst.y = (src0.y \times src1.y) >> 32
848b8605Smrg
848b8605Smrg  dst.z = (src0.z \times src1.z) >> 32
848b8605Smrg
848b8605Smrg  dst.w = (src0.w \times src1.w) >> 32
848b8605Smrg
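Unlike the low half, the high half of a product does depend on signedness, which is why IMUL_HI and UMUL_HI are separate opcodes. A minimal sketch with hypothetical helper names, treating registers as raw 32-bit patterns:

```python
MASK32 = 0xFFFFFFFF


def to_signed(bits):
    """Reinterpret a raw 32-bit pattern as a two's complement integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def umul_hi(a, b):
    """High 32 bits of the 64-bit unsigned product."""
    return ((a & MASK32) * (b & MASK32)) >> 32


def imul_hi(a, b):
    """High 32 bits of the 64-bit signed product, as a raw bit pattern."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32
```

For the inputs ``0xFFFFFFFF`` and ``2`` the unsigned high word is 1, while the signed high word is ``0xFFFFFFFF`` (the sign extension of -2).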
848b8605Smrg
848b8605Smrg.. opcode:: IDIV - Signed Integer Division
848b8605Smrg
848b8605Smrg   TBD: behavior for division by zero.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst.x = \frac{src0.x}{src1.x}
848b8605Smrg
b8e80941Smrg  dst.y = \frac{src0.y}{src1.y}
848b8605Smrg
b8e80941Smrg  dst.z = \frac{src0.z}{src1.z}
848b8605Smrg
b8e80941Smrg  dst.w = \frac{src0.w}{src1.w}
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UDIV - Unsigned Integer Division
848b8605Smrg
848b8605Smrg   For division by zero, 0xffffffff is returned.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst.x = \frac{src0.x}{src1.x}
848b8605Smrg
b8e80941Smrg  dst.y = \frac{src0.y}{src1.y}
848b8605Smrg
b8e80941Smrg  dst.z = \frac{src0.z}{src1.z}
848b8605Smrg
b8e80941Smrg  dst.w = \frac{src0.w}{src1.w}
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UMOD - Unsigned Integer Remainder
848b8605Smrg
   If the second argument is zero, 0xffffffff is returned.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst.x = src0.x \bmod src1.x
848b8605Smrg
b8e80941Smrg  dst.y = src0.y \bmod src1.y
848b8605Smrg
b8e80941Smrg  dst.z = src0.z \bmod src1.z
848b8605Smrg
b8e80941Smrg  dst.w = src0.w \bmod src1.w
848b8605Smrg
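The division-by-zero results stated for UDIV and UMOD can be modelled as below. ``udiv`` and ``umod`` are illustrative helper names; operands are assumed to be non-negative 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def udiv(a, b):
    """Unsigned division; a zero divisor yields 0xffffffff."""
    return MASK32 if b == 0 else a // b


def umod(a, b):
    """Unsigned remainder; a zero divisor yields 0xffffffff."""
    return MASK32 if b == 0 else a % b
```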
848b8605Smrg
848b8605Smrg.. opcode:: NOT - Bitwise Not
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = \sim src.x
848b8605Smrg
848b8605Smrg  dst.y = \sim src.y
848b8605Smrg
848b8605Smrg  dst.z = \sim src.z
848b8605Smrg
848b8605Smrg  dst.w = \sim src.w
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: AND - Bitwise And
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x \& src1.x
848b8605Smrg
848b8605Smrg  dst.y = src0.y \& src1.y
848b8605Smrg
848b8605Smrg  dst.z = src0.z \& src1.z
848b8605Smrg
848b8605Smrg  dst.w = src0.w \& src1.w
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: OR - Bitwise Or
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x | src1.x
848b8605Smrg
848b8605Smrg  dst.y = src0.y | src1.y
848b8605Smrg
848b8605Smrg  dst.z = src0.z | src1.z
848b8605Smrg
848b8605Smrg  dst.w = src0.w | src1.w
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: XOR - Bitwise Xor
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x \oplus src1.x
848b8605Smrg
848b8605Smrg  dst.y = src0.y \oplus src1.y
848b8605Smrg
848b8605Smrg  dst.z = src0.z \oplus src1.z
848b8605Smrg
848b8605Smrg  dst.w = src0.w \oplus src1.w
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: IMAX - Maximum of Signed Integers
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = max(src0.x, src1.x)
848b8605Smrg
848b8605Smrg  dst.y = max(src0.y, src1.y)
848b8605Smrg
848b8605Smrg  dst.z = max(src0.z, src1.z)
848b8605Smrg
848b8605Smrg  dst.w = max(src0.w, src1.w)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UMAX - Maximum of Unsigned Integers
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = max(src0.x, src1.x)
848b8605Smrg
848b8605Smrg  dst.y = max(src0.y, src1.y)
848b8605Smrg
848b8605Smrg  dst.z = max(src0.z, src1.z)
848b8605Smrg
848b8605Smrg  dst.w = max(src0.w, src1.w)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: IMIN - Minimum of Signed Integers
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = min(src0.x, src1.x)
848b8605Smrg
848b8605Smrg  dst.y = min(src0.y, src1.y)
848b8605Smrg
848b8605Smrg  dst.z = min(src0.z, src1.z)
848b8605Smrg
848b8605Smrg  dst.w = min(src0.w, src1.w)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UMIN - Minimum of Unsigned Integers
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = min(src0.x, src1.x)
848b8605Smrg
848b8605Smrg  dst.y = min(src0.y, src1.y)
848b8605Smrg
848b8605Smrg  dst.z = min(src0.z, src1.z)
848b8605Smrg
848b8605Smrg  dst.w = min(src0.w, src1.w)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: SHL - Shift Left
848b8605Smrg
848b8605Smrg   The shift count is masked with 0x1f before the shift is applied.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x << (0x1f \& src1.x)
848b8605Smrg
848b8605Smrg  dst.y = src0.y << (0x1f \& src1.y)
848b8605Smrg
848b8605Smrg  dst.z = src0.z << (0x1f \& src1.z)
848b8605Smrg
848b8605Smrg  dst.w = src0.w << (0x1f \& src1.w)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
848b8605Smrg
848b8605Smrg   The shift count is masked with 0x1f before the shift is applied.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x >> (0x1f \& src1.x)
848b8605Smrg
848b8605Smrg  dst.y = src0.y >> (0x1f \& src1.y)
848b8605Smrg
848b8605Smrg  dst.z = src0.z >> (0x1f \& src1.z)
848b8605Smrg
848b8605Smrg  dst.w = src0.w >> (0x1f \& src1.w)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: USHR - Logical Shift Right
848b8605Smrg
848b8605Smrg   The shift count is masked with 0x1f before the shift is applied.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
  dst.x = (unsigned) src0.x >> (0x1f \& src1.x)

  dst.y = (unsigned) src0.y >> (0x1f \& src1.y)

  dst.z = (unsigned) src0.z >> (0x1f \& src1.z)

  dst.w = (unsigned) src0.w >> (0x1f \& src1.w)
848b8605Smrg
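The three shift opcodes differ only in how the vacated bits are filled; all of them mask the shift count with 0x1f first. A sketch with hypothetical helper names, where registers are raw 32-bit patterns:

```python
MASK32 = 0xFFFFFFFF


def to_signed(bits):
    """Reinterpret a raw 32-bit pattern as a two's complement integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def shl(value, count):
    return (value << (count & 0x1F)) & MASK32


def ishr(value, count):
    # Arithmetic shift: the sign bit is copied into the vacated bits.
    return (to_signed(value) >> (count & 0x1F)) & MASK32


def ushr(value, count):
    # Logical shift: zeros are shifted into the vacated bits.
    return (value & MASK32) >> (count & 0x1F)
```

Note that a count of 33 behaves like a count of 1 because of the mask, so ``shl(1, 33)`` is 2 rather than 0.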
848b8605Smrg
848b8605Smrg.. opcode:: UCMP - Integer Conditional Move
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = src0.x ? src1.x : src2.x
848b8605Smrg
848b8605Smrg  dst.y = src0.y ? src1.y : src2.y
848b8605Smrg
848b8605Smrg  dst.z = src0.z ? src1.z : src2.z
848b8605Smrg
848b8605Smrg  dst.w = src0.w ? src1.w : src2.w
848b8605Smrg
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ISSG - Integer Set Sign
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: FSLT - Float Set On Less Than (ordered)
848b8605Smrg
   Same comparison as SLT, but returns an integer mask (~0 or 0) instead of
   a 1.0/0.0 float.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x < src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y < src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z < src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w < src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ISLT - Signed Integer Set On Less Than
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x < src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y < src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z < src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w < src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: USLT - Unsigned Integer Set On Less Than
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x < src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y < src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z < src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w < src1.w) ? \sim 0 : 0
848b8605Smrg
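The integer comparisons return an all-ones or all-zeros mask rather than 1/0, which makes the result directly usable by :opcode:`UCMP` or by plain bitwise operations. A minimal sketch with hypothetical helper names, using raw 32-bit patterns:

```python
MASK32 = 0xFFFFFFFF


def to_signed(bits):
    """Reinterpret a raw 32-bit pattern as a two's complement integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def islt(a, b):
    return MASK32 if to_signed(a) < to_signed(b) else 0


def uslt(a, b):
    return MASK32 if a < b else 0


def ucmp(cond, if_nonzero, if_zero):
    return if_nonzero if cond != 0 else if_zero


def bitwise_select(mask, if_set, if_clear):
    # Equivalent to ucmp() when mask is exactly ~0 or 0.
    return (mask & if_set) | (~mask & MASK32 & if_clear)
```

The same bit pattern ``0xFFFFFFFF`` compares as less than 1 under ISLT (it is -1) but not under USLT, which is the only difference between the two opcodes.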
848b8605Smrg
848b8605Smrg.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
848b8605Smrg
   Same comparison as SGE, but returns an integer mask (~0 or 0) instead of
   a 1.0/0.0 float.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ISGE - Signed Integer Set On Greater Equal Than
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: FSEQ - Float Set On Equal (ordered)
848b8605Smrg
   Same comparison as SEQ, but returns an integer mask (~0 or 0) instead of
   a 1.0/0.0 float.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x == src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y == src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z == src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w == src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: USEQ - Integer Set On Equal
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x == src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y == src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z == src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w == src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: FSNE - Float Set On Not Equal (unordered)
848b8605Smrg
   Same comparison as SNE, but returns an integer mask (~0 or 0) instead of
   a 1.0/0.0 float.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x != src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y != src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z != src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w != src1.w) ? \sim 0 : 0
848b8605Smrg
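The "ordered" and "unordered" labels describe what happens when an operand is NaN: ordered comparisons are false, the unordered one is true. Python's float comparisons follow the same IEEE 754 rules, so they can serve as a quick model. The helper names are hypothetical, and Python floats are double precision whereas these opcodes operate on single-precision values; the NaN behaviour shown is the same.

```python
MASK32 = 0xFFFFFFFF


def fslt(a, b):
    return MASK32 if a < b else 0


def fsge(a, b):
    return MASK32 if a >= b else 0


def fseq(a, b):
    return MASK32 if a == b else 0


def fsne(a, b):
    return MASK32 if a != b else 0


nan = float("nan")
```

With a NaN operand, ``fslt``, ``fsge`` and ``fseq`` all return 0, while ``fsne`` returns ``0xFFFFFFFF``, even for ``fsne(nan, nan)``.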
848b8605Smrg
848b8605Smrg.. opcode:: USNE - Integer Set On Not Equal
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = (src0.x != src1.x) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.y = (src0.y != src1.y) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.z = (src0.z != src1.z) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg  dst.w = (src0.w != src1.w) ? \sim 0 : 0
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: INEG - Integer Negate
848b8605Smrg
848b8605Smrg  Two's complement.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = -src.x
848b8605Smrg
848b8605Smrg  dst.y = -src.y
848b8605Smrg
848b8605Smrg  dst.z = -src.z
848b8605Smrg
848b8605Smrg  dst.w = -src.w
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: IABS - Integer Absolute Value
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.x = |src.x|
848b8605Smrg
848b8605Smrg  dst.y = |src.y|
848b8605Smrg
848b8605Smrg  dst.z = |src.z|
848b8605Smrg
848b8605Smrg  dst.w = |src.w|
848b8605Smrg
848b8605SmrgBitwise ISA
848b8605Smrg^^^^^^^^^^^
848b8605SmrgThese opcodes are used for bit-level manipulation of integers.
848b8605Smrg
848b8605Smrg.. opcode:: IBFE - Signed Bitfield Extract
848b8605Smrg
b8e80941Smrg  Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
b8e80941Smrg  sign-extends them if the high bit of the extracted window is set.
848b8605Smrg
848b8605Smrg  Pseudocode::
848b8605Smrg
848b8605Smrg    def ibfe(value, offset, bits):
b8e80941Smrg      if offset < 0 or bits < 0 or offset + bits > 32:
b8e80941Smrg        return undefined
848b8605Smrg      if bits == 0: return 0
848b8605Smrg      # Note: >> sign-extends
b8e80941Smrg      return (value << (32 - offset - bits)) >> (32 - bits)
848b8605Smrg
848b8605Smrg.. opcode:: UBFE - Unsigned Bitfield Extract
848b8605Smrg
b8e80941Smrg  Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
b8e80941Smrg  any sign-extension.
848b8605Smrg
848b8605Smrg  Pseudocode::
848b8605Smrg
848b8605Smrg    def ubfe(value, offset, bits):
b8e80941Smrg      if offset < 0 or bits < 0 or offset + bits > 32:
b8e80941Smrg        return undefined
848b8605Smrg      if bits == 0: return 0
848b8605Smrg      # Note: >> does not sign-extend
b8e80941Smrg      return (value << (32 - offset - bits)) >> (32 - bits)
848b8605Smrg
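The two pseudocode fragments above rely on fixed-width 32-bit shifts. A runnable sketch in Python, whose integers are unbounded, has to add the masking and sign reinterpretation explicitly. The names ``to_signed``, ``ubfe`` and ``ibfe`` are illustrative, and raising ``ValueError`` is just one way to model the undefined case.

```python
MASK32 = 0xFFFFFFFF


def to_signed(bits):
    """Reinterpret a raw 32-bit pattern as a two's complement integer."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


def _check(offset, bits):
    if offset < 0 or bits < 0 or offset + bits > 32:
        raise ValueError("result is undefined for this offset/bits pair")


def ubfe(value, offset, bits):
    _check(offset, bits)
    if bits == 0:
        return 0
    shifted = (value << (32 - offset - bits)) & MASK32
    return shifted >> (32 - bits)  # logical shift, zero fill


def ibfe(value, offset, bits):
    _check(offset, bits)
    if bits == 0:
        return 0
    shifted = (value << (32 - offset - bits)) & MASK32
    return (to_signed(shifted) >> (32 - bits)) & MASK32  # sign fill
```

Extracting the four bits at offset 4 from ``0xF0`` gives 15 with ``ubfe`` and ``0xFFFFFFFF`` (that is, -1) with ``ibfe``, since the top bit of the extracted window is set.
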
848b8605Smrg.. opcode:: BFI - Bitfield Insert
848b8605Smrg
b8e80941Smrg  Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
b8e80941Smrg  of 'insert'.
848b8605Smrg
848b8605Smrg  Pseudocode::
848b8605Smrg
848b8605Smrg    def bfi(base, insert, offset, bits):
b8e80941Smrg      if offset < 0 or bits < 0 or offset + bits > 32:
b8e80941Smrg        return undefined
b8e80941Smrg      # << defined such that mask == ~0 when bits == 32, offset == 0
848b8605Smrg      mask = ((1 << bits) - 1) << offset
848b8605Smrg      return ((insert << offset) & mask) | (base & ~mask)
848b8605Smrg
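A runnable counterpart of the BFI pseudocode, under the same conventions: registers are raw 32-bit patterns, ``bfi`` is an illustrative helper, and ``ValueError`` models the undefined case. Python integers are unbounded, so the ``bits == 32`` case works without special handling once the mask is truncated to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def bfi(base, insert, offset, bits):
    if offset < 0 or bits < 0 or offset + bits > 32:
        raise ValueError("result is undefined for this offset/bits pair")
    mask = (((1 << bits) - 1) << offset) & MASK32
    return ((insert << offset) & mask) | (base & ~mask & MASK32)
```

For example, clearing bits 8..15 of an all-ones register is ``bfi(0xFFFFFFFF, 0, 8, 8)``, which yields ``0xFFFF00FF``.
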
848b8605Smrg.. opcode:: BREV - Bitfield Reverse
848b8605Smrg
848b8605Smrg  See SM5 instruction BFREV. Reverses the bits of the argument.
848b8605Smrg
848b8605Smrg.. opcode:: POPC - Population Count
848b8605Smrg
848b8605Smrg  See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
848b8605Smrg
848b8605Smrg.. opcode:: LSB - Index of lowest set bit
848b8605Smrg
848b8605Smrg  See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
848b8605Smrg  bit of the argument. Returns -1 if none are set.
848b8605Smrg
848b8605Smrg.. opcode:: IMSB - Index of highest non-sign bit
848b8605Smrg
848b8605Smrg  See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
848b8605Smrg  non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
848b8605Smrg  highest 1 bit for positive numbers). Returns -1 if all bits are the same
848b8605Smrg  (i.e. for inputs 0 and -1).
848b8605Smrg
848b8605Smrg.. opcode:: UMSB - Index of highest set bit
848b8605Smrg
848b8605Smrg  See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
848b8605Smrg  set bit of the argument. Returns -1 if none are set.
848b8605Smrg
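The bit-counting and bit-scanning opcodes above are described only in prose. The following sketch follows those descriptions literally; it is not taken from any driver, the helper names are hypothetical, and -1 is returned as a Python integer where a register would hold ``0xFFFFFFFF``.

```python
MASK32 = 0xFFFFFFFF


def brev(value):
    """Reverse the order of the 32 bits."""
    return int(format(value & MASK32, "032b")[::-1], 2)


def popc(value):
    """Count the set bits."""
    return bin(value & MASK32).count("1")


def lsb(value):
    """Index of the lowest set bit, or -1 if no bit is set."""
    value &= MASK32
    return -1 if value == 0 else (value & -value).bit_length() - 1


def umsb(value):
    """Index of the highest set bit, or -1 if no bit is set."""
    return (value & MASK32).bit_length() - 1


def imsb(value):
    """Index of the highest bit that differs from the sign bit, or -1."""
    value &= MASK32
    if value & 0x80000000:
        value = ~value & MASK32  # negative: look for the highest 0 bit
    return value.bit_length() - 1
```

``imsb`` reduces to ``umsb`` after inverting negative inputs, which is why both 0 and -1 (``0xFFFFFFFF``) produce -1.
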
848b8605SmrgGeometry ISA
848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgThese opcodes are only supported in geometry shaders; they have no meaning
848b8605Smrgin any other type of shader.
848b8605Smrg
848b8605Smrg.. opcode:: EMIT - Emit
848b8605Smrg
848b8605Smrg  Generate a new vertex for the current primitive into the specified vertex
848b8605Smrg  stream using the values in the output registers.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ENDPRIM - End Primitive
848b8605Smrg
848b8605Smrg  Complete the current primitive in the specified vertex stream (consisting of
848b8605Smrg  the emitted vertices), and start a new one.
848b8605Smrg
848b8605Smrg
848b8605SmrgGLSL ISA
848b8605Smrg^^^^^^^^^^
848b8605Smrg
848b8605SmrgThese opcodes are part of :term:`GLSL`'s opcode set. Support for these
848b8605Smrgopcodes is determined by a special capability bit, ``GLSL``.
Some require GLSL version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH).
848b8605Smrg
848b8605Smrg.. opcode:: CAL - Subroutine Call
848b8605Smrg
848b8605Smrg  push(pc)
848b8605Smrg  pc = target
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: RET - Subroutine Call Return
848b8605Smrg
848b8605Smrg  pc = pop()
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: CONT - Continue
848b8605Smrg
848b8605Smrg  Unconditionally moves the point of execution to the instruction after the
848b8605Smrg  last bgnloop. The instruction must appear within a bgnloop/endloop.
848b8605Smrg
848b8605Smrg.. note::
848b8605Smrg
848b8605Smrg   Support for CONT is determined by a special capability bit,
848b8605Smrg   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: BGNLOOP - Begin a Loop
848b8605Smrg
848b8605Smrg  Start a loop. Must have a matching endloop.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: BGNSUB - Begin Subroutine
848b8605Smrg
848b8605Smrg  Starts definition of a subroutine. Must have a matching endsub.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ENDLOOP - End a Loop
848b8605Smrg
848b8605Smrg  End a loop started with bgnloop.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ENDSUB - End Subroutine
848b8605Smrg
848b8605Smrg  Ends definition of a subroutine.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: NOP - No Operation
848b8605Smrg
848b8605Smrg  Do nothing.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: BRK - Break
848b8605Smrg
  Unconditionally moves the point of execution to the instruction after the
  next endloop or endswitch. The instruction must appear within a
  bgnloop/endloop or switch/endswitch.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: IF - Float If
848b8605Smrg
  Start an IF ... ELSE ... ENDIF block. The condition evaluates to true if
848b8605Smrg
848b8605Smrg    src0.x != 0.0
848b8605Smrg
848b8605Smrg  where src0.x is interpreted as a floating point register.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: UIF - Bitwise If
848b8605Smrg
  Start a UIF ... ELSE ... ENDIF block. The condition evaluates to true if
848b8605Smrg
848b8605Smrg    src0.x != 0
848b8605Smrg
848b8605Smrg  where src0.x is interpreted as an integer register.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ELSE - Else
848b8605Smrg
848b8605Smrg  Starts an else block, after an IF or UIF statement.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ENDIF - End If
848b8605Smrg
848b8605Smrg  Ends an IF or UIF block.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: SWITCH - Switch
848b8605Smrg
   Starts a C-style switch expression. The switch consists of one or more
   CASE statements and at most one DEFAULT statement. Execution of a statement
   ends when a BRK is hit but, just as in C, falling through to other cases
   without a break is allowed. Similarly, the DEFAULT label may appear
   anywhere, not just as the last statement, and fallthrough into and out of
   it is allowed.
   CASE src arguments are compared bitwise against the SWITCH src argument.
848b8605Smrg
848b8605Smrg   Example::
848b8605Smrg
848b8605Smrg     SWITCH src[0].x
848b8605Smrg     CASE src[0].x
848b8605Smrg     (some instructions here)
848b8605Smrg     (optional BRK here)
848b8605Smrg     DEFAULT
848b8605Smrg     (some instructions here)
848b8605Smrg     (optional BRK here)
848b8605Smrg     CASE src[0].x
848b8605Smrg     (some instructions here)
848b8605Smrg     (optional BRK here)
848b8605Smrg     ENDSWITCH
848b8605Smrg
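The fallthrough rules can be checked with a toy model. This is only a sketch of the control flow described above, not an interpreter for TGSI: ``run_switch`` is a hypothetical helper, each block is a ``(label, body, has_brk)`` tuple, and the string ``"DEFAULT"`` stands for the DEFAULT label.

```python
def run_switch(selector, blocks):
    """Return the bodies executed for the given selector value."""
    labels = [label for label, _, _ in blocks]
    if selector in labels:
        start = labels.index(selector)
    elif "DEFAULT" in labels:
        start = labels.index("DEFAULT")
    else:
        return []  # no matching CASE and no DEFAULT: nothing runs
    executed = []
    for _, body, has_brk in blocks[start:]:
        executed.append(body)
        if has_brk:
            break  # BRK jumps past ENDSWITCH
    return executed


# CASE 1 falls through into DEFAULT, which sits in the middle of the switch.
blocks = [
    (1, "case 1 body", False),
    ("DEFAULT", "default body", True),
    (2, "case 2 body", True),
]
```

With these blocks, selector 1 runs the first two bodies, selector 2 runs only the last one, and any other value starts at DEFAULT.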
848b8605Smrg
848b8605Smrg.. opcode:: CASE - Switch case
848b8605Smrg
848b8605Smrg   This represents a switch case label. The src arg must be an integer immediate.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: DEFAULT - Switch default
848b8605Smrg
848b8605Smrg   This represents the default case in the switch, which is taken if no other
848b8605Smrg   case matches.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ENDSWITCH - End of switch
848b8605Smrg
848b8605Smrg   Ends a switch expression.
848b8605Smrg
848b8605Smrg
848b8605SmrgInterpolation ISA
848b8605Smrg^^^^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgThe interpolation instructions allow an input to be interpolated in a
848b8605Smrgdifferent way than its declaration. This corresponds to the GLSL 4.00
848b8605SmrginterpolateAt* functions. The first argument of each of these must come from
848b8605Smrg``TGSI_FILE_INPUT``.
848b8605Smrg
848b8605Smrg.. opcode:: INTERP_CENTROID - Interpolate at the centroid
848b8605Smrg
   Interpolates the varying specified by src0 at the centroid.
848b8605Smrg
848b8605Smrg.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
848b8605Smrg
   Interpolates the varying specified by src0 at the sample id specified by
   src1.x (interpreted as an integer).
848b8605Smrg
848b8605Smrg.. opcode:: INTERP_OFFSET - Interpolate at the specified offset
848b8605Smrg
   Interpolates the varying specified by src0 at the offset src1.xy
   (interpreted as floats) from the pixel center.
848b8605Smrg
848b8605Smrg
848b8605Smrg.. _doubleopcodes:
848b8605Smrg
848b8605SmrgDouble ISA
848b8605Smrg^^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgThe double-precision opcodes reinterpret four-component vectors into
848b8605Smrgtwo-component vectors with doubled precision in each component.
848b8605Smrg
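To make the reinterpretation concrete, the sketch below packs two doubles into four 32-bit components and back using Python's ``struct`` module. The helper names are hypothetical, and the little-endian word order (low word in x and z, high word in y and w) is an assumption made for illustration; the text above does not specify it.

```python
import struct


def pack_doubles(d_xy, d_zw):
    """Two doubles -> four 32-bit components (x, y, z, w)."""
    return struct.unpack("<4I", struct.pack("<2d", d_xy, d_zw))


def unpack_doubles(x, y, z, w):
    """Four 32-bit components -> two doubles (xy, zw)."""
    return struct.unpack("<2d", struct.pack("<4I", x, y, z, w))
```

Under that assumption, ``pack_doubles(1.0, -2.0)`` gives ``(0, 0x3FF00000, 0, 0xC0000000)``.
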
b8e80941Smrg.. opcode:: DABS - Absolute
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = |src0.xy|
b8e80941Smrg
b8e80941Smrg  dst.zw = |src0.zw|
848b8605Smrg
848b8605Smrg.. opcode:: DADD - Add
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.xy = src0.xy + src1.xy
848b8605Smrg
848b8605Smrg  dst.zw = src0.zw + src1.zw
848b8605Smrg
b8e80941Smrg.. opcode:: DSEQ - Set on Equal
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst.x = src0.xy == src1.xy ? \sim 0 : 0
848b8605Smrg
b8e80941Smrg  dst.z = src0.zw == src1.zw ? \sim 0 : 0
848b8605Smrg
b8e80941Smrg.. opcode:: DSNE - Set on Not Equal
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst.x = src0.xy != src1.xy ? \sim 0 : 0
848b8605Smrg
b8e80941Smrg  dst.z = src0.zw != src1.zw ? \sim 0 : 0
848b8605Smrg
848b8605Smrg.. opcode:: DSLT - Set on Less than
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst.x = src0.xy < src1.xy ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg  dst.z = src0.zw < src1.zw ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg.. opcode:: DSGE - Set on Greater equal
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
848b8605Smrg
b8e80941Smrg  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
848b8605Smrg
848b8605Smrg.. opcode:: DFRAC - Fraction
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.xy = src.xy - \lfloor src.xy\rfloor
848b8605Smrg
848b8605Smrg  dst.zw = src.zw - \lfloor src.zw\rfloor
848b8605Smrg
b8e80941Smrg.. opcode:: DTRUNC - Truncate
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = trunc(src.xy)
b8e80941Smrg
b8e80941Smrg  dst.zw = trunc(src.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: DCEIL - Ceiling
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = \lceil src.xy\rceil
b8e80941Smrg
b8e80941Smrg  dst.zw = \lceil src.zw\rceil
b8e80941Smrg
b8e80941Smrg.. opcode:: DFLR - Floor
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = \lfloor src.xy\rfloor
b8e80941Smrg
b8e80941Smrg  dst.zw = \lfloor src.zw\rfloor
b8e80941Smrg
.. opcode:: DROUND - Round
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = round(src.xy)
b8e80941Smrg
b8e80941Smrg  dst.zw = round(src.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: DSSG - Set Sign
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
b8e80941Smrg
b8e80941Smrg  dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
848b8605Smrg
848b8605Smrg.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
848b8605Smrg
Like the ``frexp()`` routine in many math libraries, this opcode stores the
significand of its source to ``dst0``, and the exponent to ``dst1``, such that
:math:`dst0 \times 2^{dst1} = src`. The results are replicated across
channels.

.. math::

  dst0.xy = dst0.zw = frac(src.xy)

  dst1 = exp(src.xy)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: DLDEXP - Multiply Number by Integral Power of 2
848b8605Smrg
b8e80941SmrgThis opcode is the inverse of :opcode:`DFRACEXP`. The second
b8e80941Smrgsource is an integer.
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst.xy = src0.xy \times 2^{src1.x}
848b8605Smrg
b8e80941Smrg  dst.zw = src0.zw \times 2^{src1.z}
848b8605Smrg
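The relationship between DFRACEXP and DLDEXP mirrors ``frexp()`` and ``ldexp()`` in C. Python exposes both in its ``math`` module, so the round trip can be shown directly. The wrapper names below are hypothetical and say nothing about which destination register receives which value.

```python
import math


def dfracexp(src):
    """Split src into (significand, exponent) with src == significand * 2**exponent."""
    return math.frexp(src)


def dldexp(significand, exponent):
    """Recombine: significand * 2**exponent."""
    return math.ldexp(significand, exponent)


significand, exponent = dfracexp(48.0)
restored = dldexp(significand, exponent)
```

Here 48.0 splits into a significand of 0.75 and an exponent of 6, since 0.75 times 64 is 48.
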
848b8605Smrg.. opcode:: DMIN - Minimum
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.xy = min(src0.xy, src1.xy)
848b8605Smrg
848b8605Smrg  dst.zw = min(src0.zw, src1.zw)
848b8605Smrg
848b8605Smrg.. opcode:: DMAX - Maximum
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.xy = max(src0.xy, src1.xy)
848b8605Smrg
848b8605Smrg  dst.zw = max(src0.zw, src1.zw)
848b8605Smrg
848b8605Smrg.. opcode:: DMUL - Multiply
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.xy = src0.xy \times src1.xy
848b8605Smrg
848b8605Smrg  dst.zw = src0.zw \times src1.zw
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: DMAD - Multiply And Add
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg  dst.xy = src0.xy \times src1.xy + src2.xy
848b8605Smrg
848b8605Smrg  dst.zw = src0.zw \times src1.zw + src2.zw
848b8605Smrg
848b8605Smrg
b8e80941Smrg.. opcode:: DFMA - Fused Multiply-Add
b8e80941Smrg
b8e80941SmrgPerform a * b + c with no intermediate rounding step.
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy \times src1.xy + src2.xy
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw \times src1.zw + src2.zw
b8e80941Smrg
b8e80941Smrg
b8e80941Smrg.. opcode:: DDIV - Divide
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = \frac{src0.xy}{src1.xy}
b8e80941Smrg
b8e80941Smrg  dst.zw = \frac{src0.zw}{src1.zw}
b8e80941Smrg
b8e80941Smrg
848b8605Smrg.. opcode:: DRCP - Reciprocal
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg   dst.xy = \frac{1}{src.xy}
848b8605Smrg
848b8605Smrg   dst.zw = \frac{1}{src.zw}
848b8605Smrg
848b8605Smrg.. opcode:: DSQRT - Square Root
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
848b8605Smrg   dst.xy = \sqrt{src.xy}
848b8605Smrg
848b8605Smrg   dst.zw = \sqrt{src.zw}
848b8605Smrg
b8e80941Smrg.. opcode:: DRSQ - Reciprocal Square Root
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = \frac{1}{\sqrt{src.xy}}
b8e80941Smrg
b8e80941Smrg   dst.zw = \frac{1}{\sqrt{src.zw}}
b8e80941Smrg
b8e80941Smrg.. opcode:: F2D - Float to Double
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = double(src0.x)
b8e80941Smrg
b8e80941Smrg   dst.zw = double(src0.y)
b8e80941Smrg
b8e80941Smrg.. opcode:: D2F - Double to Float
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.x = float(src0.xy)
b8e80941Smrg
b8e80941Smrg   dst.y = float(src0.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: I2D - Int to Double
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = double(src0.x)
b8e80941Smrg
b8e80941Smrg   dst.zw = double(src0.y)
b8e80941Smrg
b8e80941Smrg.. opcode:: D2I - Double to Int
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.x = int(src0.xy)
b8e80941Smrg
b8e80941Smrg   dst.y = int(src0.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: U2D - Unsigned Int to Double
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = double(src0.x)
b8e80941Smrg
b8e80941Smrg   dst.zw = double(src0.y)
b8e80941Smrg
b8e80941Smrg.. opcode:: D2U - Double to Unsigned Int
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.x = unsigned(src0.xy)
b8e80941Smrg
b8e80941Smrg   dst.y = unsigned(src0.zw)
b8e80941Smrg
b8e80941Smrg64-bit Integer ISA
b8e80941Smrg^^^^^^^^^^^^^^^^^^
b8e80941Smrg
The 64-bit integer opcodes reinterpret four-component vectors into
two-component vectors with 64 bits in each component.
b8e80941Smrg
b8e80941Smrg.. opcode:: I64ABS - 64-bit Integer Absolute Value
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = |src0.xy|
b8e80941Smrg
b8e80941Smrg  dst.zw = |src0.zw|
b8e80941Smrg
b8e80941Smrg.. opcode:: I64NEG - 64-bit Integer Negate
b8e80941Smrg
b8e80941Smrg  Two's complement.
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = -src.xy
b8e80941Smrg
b8e80941Smrg  dst.zw = -src.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: I64SSG - 64-bit Integer Set Sign
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
b8e80941Smrg
b8e80941Smrg  dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
b8e80941Smrg
b8e80941Smrg.. opcode:: U64ADD - 64-bit Integer Add
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy + src1.xy
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw + src1.zw
b8e80941Smrg
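A 64-bit add over a pair of 32-bit components amounts to a low-word add with carry into the high word. The sketch below assumes the low word lives in the first component of each pair (x or z) and the high word in the second (y or w); that layout and the helper name ``u64add`` are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def u64add(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values given as (low, high) 32-bit words."""
    low_sum = a_lo + b_lo
    carry = low_sum >> 32
    high = (a_hi + b_hi + carry) & MASK32  # overflow past 64 bits is dropped
    return low_sum & MASK32, high
```

As with the 32-bit UADD, the result wraps modulo the register width, so the same bit pattern is correct for signed and unsigned interpretations.
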
b8e80941Smrg.. opcode:: U64MUL - 64-bit Integer Multiply
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy * src1.xy
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw * src1.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: U64SEQ - 64-bit Integer Set on Equal
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.x = src0.xy == src1.xy ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg  dst.z = src0.zw == src1.zw ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg.. opcode:: U64SNE - 64-bit Integer Set on Not Equal
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.x = src0.xy != src1.xy ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg  dst.z = src0.zw != src1.zw ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.x = src0.xy < src1.xy ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg  dst.z = src0.zw < src1.zw ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.x = src0.xy < src1.xy ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg  dst.z = src0.zw < src1.zw ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
b8e80941Smrg
b8e80941Smrg  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
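
The unsigned and signed comparison opcodes share the same formulas and differ
only in how the 64-bit pattern is interpreted. The Python sketch below models
one 64-bit component of U64SLT and I64SLT; the function names are hypothetical,
and ``TRUE`` stands for the ``~0`` value written to a 32-bit channel.

```python
MASK64 = (1 << 64) - 1
TRUE = 0xFFFFFFFF  # ~0 in a 32-bit destination channel


def to_signed64(value):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def u64slt(a, b):
    """Unsigned less-than on 64-bit patterns."""
    return TRUE if (a & MASK64) < (b & MASK64) else 0


def i64slt(a, b):
    """Signed less-than on the same 64-bit patterns."""
    return TRUE if to_signed64(a) < to_signed64(b) else 0


bits = 0xFFFFFFFFFFFFFFFF  # 2**64 - 1 when unsigned, -1 when signed
```

With these definitions, ``u64slt(bits, 1)`` is false while ``i64slt(bits, 1)``
is true, because the all-ones pattern is the largest unsigned value but a
negative signed value.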
b8e80941Smrg
b8e80941Smrg.. opcode:: I64MIN - Minimum of 64-bit Signed Integers
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = min(src0.xy, src1.xy)
b8e80941Smrg
b8e80941Smrg  dst.zw = min(src0.zw, src1.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = min(src0.xy, src1.xy)
b8e80941Smrg
b8e80941Smrg  dst.zw = min(src0.zw, src1.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: I64MAX - Maximum of 64-bit Signed Integers
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = max(src0.xy, src1.xy)
b8e80941Smrg
b8e80941Smrg  dst.zw = max(src0.zw, src1.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = max(src0.xy, src1.xy)
b8e80941Smrg
b8e80941Smrg  dst.zw = max(src0.zw, src1.zw)
b8e80941Smrg
b8e80941Smrg.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
b8e80941Smrg
b8e80941Smrg   The shift count is masked with 0x3f before the shift is applied.
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy << (0x3f \& src1.x)
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw << (0x3f \& src1.y)
b8e80941Smrg
b8e80941Smrg.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
b8e80941Smrg
b8e80941Smrg   The shift count is masked with 0x3f before the shift is applied.
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy >> (0x3f \& src1.x)
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw >> (0x3f \& src1.y)
b8e80941Smrg
b8e80941Smrg.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
b8e80941Smrg
b8e80941Smrg   The shift count is masked with 0x3f before the shift is applied.
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x)
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y)
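
The effect of the 0x3f mask and the difference between the arithmetic and
logical right shifts can be sketched in Python as follows. The sketch models a
single 64-bit component; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit pattern as a two's complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def u64shl(value, count):
    """Shift left; only the low six bits of the count are used."""
    return (value << (count & 0x3F)) & MASK64


def u64shr(value, count):
    """Logical shift right: zeros are shifted in from the top."""
    return (value & MASK64) >> (count & 0x3F)


def i64shr(value, count):
    """Arithmetic shift right: the sign bit is replicated."""
    # Python's >> on a negative int is already an arithmetic shift.
    return (to_signed64(value) >> (count & 0x3F)) & MASK64
```

Because of the mask, a shift count of 64 behaves like a count of 0 and a count
of 65 behaves like a count of 1.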
b8e80941Smrg
b8e80941Smrg.. opcode:: I64DIV - 64-bit Signed Integer Division
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = \frac{src0.xy}{src1.xy}
b8e80941Smrg
b8e80941Smrg  dst.zw = \frac{src0.zw}{src1.zw}
b8e80941Smrg
b8e80941Smrg.. opcode:: U64DIV - 64-bit Unsigned Integer Division
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = \frac{src0.xy}{src1.xy}
b8e80941Smrg
b8e80941Smrg  dst.zw = \frac{src0.zw}{src1.zw}
b8e80941Smrg
b8e80941Smrg.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy \bmod src1.xy
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw \bmod src1.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: I64MOD - 64-bit Signed Integer Remainder
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst.xy = src0.xy \bmod src1.xy
b8e80941Smrg
b8e80941Smrg  dst.zw = src0.zw \bmod src1.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: F2U64 - Float to 64-bit Unsigned Int
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (uint64_t) src0.x
b8e80941Smrg
b8e80941Smrg   dst.zw = (uint64_t) src0.y
b8e80941Smrg
b8e80941Smrg.. opcode:: F2I64 - Float to 64-bit Int
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (int64_t) src0.x
b8e80941Smrg
b8e80941Smrg   dst.zw = (int64_t) src0.y
b8e80941Smrg
b8e80941Smrg.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
b8e80941Smrg
b8e80941Smrg   This is a zero extension.
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (int64_t) src0.x
b8e80941Smrg
b8e80941Smrg   dst.zw = (int64_t) src0.y
b8e80941Smrg
b8e80941Smrg.. opcode:: I2I64 - Signed Integer to 64-bit Integer
b8e80941Smrg
b8e80941Smrg   This is a sign extension.
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (int64_t) src0.x
b8e80941Smrg
b8e80941Smrg   dst.zw = (int64_t) src0.y
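
U2I64 and I2I64 use the same formula and differ only in how the upper 32 bits
are filled. A minimal Python sketch of the two extensions for one 32-bit source
channel, with hypothetical function names:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def u2i64(word):
    """Zero extension: the upper 32 bits of the result are always 0."""
    return word & MASK32


def i2i64(word):
    """Sign extension: the upper 32 bits copy bit 31 of the source."""
    word &= MASK32
    if word & 0x80000000:
        word |= 0xFFFFFFFF00000000
    return word
```

For the source pattern ``0xFFFFFFFF``, zero extension yields
``0x00000000FFFFFFFF`` while sign extension yields ``0xFFFFFFFFFFFFFFFF``.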
b8e80941Smrg
b8e80941Smrg.. opcode:: D2U64 - Double to 64-bit Unsigned Int
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (uint64_t) src0.xy
b8e80941Smrg
b8e80941Smrg   dst.zw = (uint64_t) src0.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: D2I64 - Double to 64-bit Int
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (int64_t) src0.xy
b8e80941Smrg
b8e80941Smrg   dst.zw = (int64_t) src0.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: U642F - 64-bit unsigned integer to float
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.x = (float) src0.xy
b8e80941Smrg
b8e80941Smrg   dst.y = (float) src0.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: I642F - 64-bit Int to Float
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.x = (float) src0.xy
b8e80941Smrg
b8e80941Smrg   dst.y = (float) src0.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: U642D - 64-bit unsigned integer to double
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (double) src0.xy
b8e80941Smrg
b8e80941Smrg   dst.zw = (double) src0.zw
b8e80941Smrg
b8e80941Smrg.. opcode:: I642D - 64-bit Int to double
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg   dst.xy = (double) src0.xy
b8e80941Smrg
b8e80941Smrg   dst.zw = (double) src0.zw
848b8605Smrg
848b8605Smrg.. _samplingopcodes:
848b8605Smrg
848b8605SmrgResource Sampling Opcodes
848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
These opcodes closely follow the semantics of the corresponding Direct3D
instructions. If in doubt, consult the Direct3D documentation.
Note that the swizzle on SVIEW (src1) determines texel swizzling
after lookup.
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE
848b8605Smrg
  Using the provided address, sample data from the specified texture using the
  filtering mode identified by the given sampler. The source data may come from
  any resource type other than buffers.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE dst, address, sampler_view, sampler``
848b8605Smrg
848b8605Smrg  Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_I
848b8605Smrg
848b8605Smrg  Simplified alternative to the SAMPLE instruction.  Using the provided
848b8605Smrg  integer address, SAMPLE_I fetches data from the specified sampler view
848b8605Smrg  without any filtering.  The source data may come from any resource type
848b8605Smrg  other than CUBE.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE_I dst, address, sampler_view``
848b8605Smrg
848b8605Smrg  Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
848b8605Smrg
  The 'address' is specified as unsigned integers. If the 'address' is out of
  range [0...(# texels - 1)], the result of the fetch is always 0 in all
  components.  As such, the instruction doesn't honor address wrap modes; in
  cases where that behavior is desirable, the SAMPLE instruction should be
  used.  address.w always provides an unsigned integer mipmap level. If the
  value is out of range, the instruction always returns 0 in all components.
  address.yz are ignored for buffers and 1D textures.  address.z is ignored
  for 1D texture arrays and 2D textures.
848b8605Smrg
848b8605Smrg  For 1D texture arrays address.y provides the array index (also as unsigned
848b8605Smrg  integer). If the value is out of the range of available array indices
848b8605Smrg  [0... (array size - 1)] then the opcode always returns 0 in all components.
848b8605Smrg  For 2D texture arrays address.z provides the array index, otherwise it
848b8605Smrg  exhibits the same behavior as in the case for 1D texture arrays.  The exact
848b8605Smrg  semantics of the source address are presented in the table below:
848b8605Smrg
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | resource type             | X  |  Y  |  Z  |    W    |
848b8605Smrg  +===========================+====+=====+=====+=========+
848b8605Smrg  | ``PIPE_BUFFER``           | x  |     |     | ignored |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | ``PIPE_TEXTURE_1D``       | x  |     |     |   mpl   |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | ``PIPE_TEXTURE_2D``       | x  |  y  |     |   mpl   |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | ``PIPE_TEXTURE_3D``       | x  |  y  |  z  |   mpl   |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | ``PIPE_TEXTURE_RECT``     | x  |  y  |     |   mpl   |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | ``PIPE_TEXTURE_CUBE``     | not allowed as source    |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | ``PIPE_TEXTURE_1D_ARRAY`` | x  | idx |     |   mpl   |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg  | ``PIPE_TEXTURE_2D_ARRAY`` | x  |  y  | idx |   mpl   |
848b8605Smrg  +---------------------------+----+-----+-----+---------+
848b8605Smrg
848b8605Smrg  Where 'mpl' is a mipmap level and 'idx' is the array index.
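
The addressing rules above can be sketched in Python for the
``PIPE_TEXTURE_2D`` row of the table. The data layout (a list of mipmap
levels, each a list of rows of RGBA tuples) and the function name are
assumptions made for the example, not part of TGSI.

```python
ZERO = (0, 0, 0, 0)


def sample_i_2d(mip_levels, address):
    """Fetch one texel from a 2D texture without filtering.

    mip_levels: list of images, each a list of rows of RGBA tuples.
    address: (x, y, z, w) as unsigned integers; w selects the mipmap level.
    """
    x, y, _z, level = address  # address.z is ignored for 2D textures
    if not 0 <= level < len(mip_levels):
        return ZERO  # out-of-range mipmap level
    image = mip_levels[level]
    height = len(image)
    width = len(image[0])
    if not (0 <= x < width and 0 <= y < height):
        return ZERO  # no wrap modes: out-of-range texels read as 0
    return image[y][x]


level0 = [[(1, 0, 0, 1), (0, 1, 0, 1)],
          [(0, 0, 1, 1), (1, 1, 1, 1)]]
level1 = [[(2, 2, 2, 1)]]
mips = [level0, level1]
```

For instance, ``sample_i_2d(mips, (1, 0, 0, 0))`` reads the texel at x=1, y=0
of level 0, while ``sample_i_2d(mips, (2, 0, 0, 0))`` is out of range and
returns zeros.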
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_I_MS
848b8605Smrg
  Just like SAMPLE_I but allows fetching data from multi-sampled surfaces.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_B
848b8605Smrg
848b8605Smrg  Just like the SAMPLE instruction with the exception that an additional bias
848b8605Smrg  is applied to the level of detail computed as part of the instruction
848b8605Smrg  execution.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
848b8605Smrg
848b8605Smrg  Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_C
848b8605Smrg
  Similar to the SAMPLE instruction but it performs a comparison filter. The
  operands to SAMPLE_C are identical to SAMPLE, except that there is an
  additional float32 operand, the reference value, which must be a
  single-component register or a scalar literal.  SAMPLE_C makes the hardware
  use the current sampler's compare_func (in pipe_sampler_state) to compare
  the reference value against the red component value of the source resource
  at each texel that the currently configured texture filter covers based on
  the provided coordinates.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
848b8605Smrg
848b8605Smrg  Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_C_LZ
848b8605Smrg
848b8605Smrg  Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
848b8605Smrg  for level-zero.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
848b8605Smrg
848b8605Smrg  Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_D
848b8605Smrg
848b8605Smrg  SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
848b8605Smrg  the source address in the x direction and the y direction are provided by
848b8605Smrg  extra parameters.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
848b8605Smrg
848b8605Smrg  Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_L
848b8605Smrg
848b8605Smrg  SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
848b8605Smrg  directly as a scalar value, representing no anisotropy.
848b8605Smrg
848b8605Smrg  Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
848b8605Smrg
848b8605Smrg  Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
848b8605Smrg
848b8605Smrg.. opcode:: GATHER4
848b8605Smrg
  Gathers the four texels to be used in a bi-linear filtering operation and
  packs them into a single register.  Only works with 2D, 2D array, cubemap,
  and cubemap array textures.  For 2D textures, only the addressing modes of
  the sampler and the top level of any mip pyramid are used. Set W to zero.
  It behaves like the SAMPLE instruction, but a filtered sample is not
  generated. The four samples that contribute to filtering are placed into
  xyzw in counter-clockwise order, starting with the (u,v) texture coordinate
  delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
  magnitude of the deltas is half a texel.
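
The sample ordering can be illustrated with a small Python sketch. It assumes
that (u, v) is already expressed in texel units and that a texel is selected by
taking the floor of the offset coordinate; addressing modes are not modeled,
and the names are hypothetical.

```python
import math

# (du, dv) deltas for dst.x, dst.y, dst.z, dst.w, in the order given above.
GATHER4_DELTAS = [(-0.5, +0.5), (+0.5, +0.5), (+0.5, -0.5), (-0.5, -0.5)]


def gather4_texels(u, v):
    """Return the integer texel coordinates gathered into x, y, z and w."""
    return [(math.floor(u + du), math.floor(v + dv))
            for du, dv in GATHER4_DELTAS]
```

For a coordinate lying on the corner shared by four texels, such as
``gather4_texels(2.0, 3.0)``, this selects texels (1, 3), (2, 3), (2, 2) and
(1, 2) for x, y, z and w respectively.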
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: SVIEWINFO
848b8605Smrg
  Query the dimensions of a given sampler view.  dst receives width, height,
  depth or array size, and number of mipmap levels as int4. The dst can have
  a writemask which specifies which info the caller is interested in.
848b8605Smrg
848b8605Smrg  Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
848b8605Smrg
848b8605Smrg  Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
848b8605Smrg
  src_mip_level is an unsigned integer scalar. If it is out of range, the
  instruction returns 0 for width, height and depth/array size, but the total
  number of mipmap levels is still returned correctly for the given sampler
  view.  The returned width, height and depth values are for the mipmap level
  selected by src_mip_level and are in number of texels.  For 1D texture
  arrays, width is in dst.x, array size is in dst.y and dst.z is 0. The number
  of mipmap levels is still in dst.w.  In contrast to d3d10 resinfo, there's
  no way in the TGSI instruction encoding to specify the return type
  (float/rcpfloat/uint), hence uint is always used. Also, unlike the SAMPLE
  instructions, the swizzle on src1 (which in resinfo allows swizzling the dst
  values) is ignored (due to the interaction with the rcpfloat modifier, which
  requires some swizzle handling in the state tracker anyway).
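
A minimal Python sketch of the query for a 3D texture is given below. It
assumes the usual mipmap minification rule, where each dimension at level n is
``max(1, size >> n)``; that rule and the function name are assumptions of the
example rather than something stated in this section.

```python
def sviewinfo_3d(width, height, depth, num_levels, mip_level):
    """Return (width, height, depth, num_levels) for the requested level."""
    if not 0 <= mip_level < num_levels:
        # Out of range: sizes read as 0, the level count is still valid.
        return (0, 0, 0, num_levels)

    def minify(size):
        # Assumed minification rule: halve per level, never below 1.
        return max(1, size >> mip_level)

    return (minify(width), minify(height), minify(depth), num_levels)
```

For a 64x32x16 texture with 7 levels, level 2 is reported as (16, 8, 4, 7) and
an out-of-range level 7 as (0, 0, 0, 7).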
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_POS
848b8605Smrg
b8e80941Smrg  Query the position of a sample in the given resource or render target
b8e80941Smrg  when per-sample fragment shading is in effect.
b8e80941Smrg
b8e80941Smrg  Syntax: ``SAMPLE_POS dst, source, sample_index``
b8e80941Smrg
  dst receives float4 (x, y, undef, undef) indicating where the sample is
  located. Sample locations are in the range [0, 1] where 0.5 is the center
  of the fragment.

  source is either a sampler view (to indicate a shader resource) or temp
  register (to indicate the render target).  The source register may have
  an optional swizzle to apply to the returned result.
b8e80941Smrg
b8e80941Smrg  sample_index is an integer scalar indicating which sample position is to
b8e80941Smrg  be queried.
b8e80941Smrg
b8e80941Smrg  If per-sample shading is not in effect or the source resource or render
b8e80941Smrg  target is not multisampled, the result is (0.5, 0.5, undef, undef).
b8e80941Smrg
b8e80941Smrg  NOTE: no driver has implemented this opcode yet (and no state tracker
b8e80941Smrg  emits it).  This information is subject to change.
848b8605Smrg
848b8605Smrg.. opcode:: SAMPLE_INFO
848b8605Smrg
b8e80941Smrg  Query the number of samples in a multisampled resource or render target.
b8e80941Smrg
b8e80941Smrg  Syntax: ``SAMPLE_INFO dst, source``
b8e80941Smrg
b8e80941Smrg  dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
b8e80941Smrg  resource or the render target.
b8e80941Smrg
b8e80941Smrg  source is either a sampler view (to indicate a shader resource) or temp
b8e80941Smrg  register (to indicate the render target).  The source register may have
  an optional swizzle to apply to the returned result.
b8e80941Smrg
b8e80941Smrg  If per-sample shading is not in effect or the source resource or render
b8e80941Smrg  target is not multisampled, the result is (1, 0, 0, 0).
b8e80941Smrg
b8e80941Smrg  NOTE: no driver has implemented this opcode yet (and no state tracker
b8e80941Smrg  emits it).  This information is subject to change.
b8e80941Smrg
b8e80941Smrg.. opcode:: LOD - level of detail
b8e80941Smrg
b8e80941Smrg   Same syntax as the SAMPLE opcode but instead of performing an actual
b8e80941Smrg   texture lookup/filter, return the computed LOD information that the
b8e80941Smrg   texture pipe would use to access the texture. The Y component contains
b8e80941Smrg   the computed LOD lambda_prime. The X component contains the LOD that will
b8e80941Smrg   be accessed, based on min/max lod's and mipmap filters.
b8e80941Smrg   The Z and W components are set to 0.
b8e80941Smrg
b8e80941Smrg   Syntax: ``LOD dst, address, sampler_view, sampler``
848b8605Smrg
848b8605Smrg
848b8605Smrg.. _resourceopcodes:
848b8605Smrg
848b8605SmrgResource Access Opcodes
848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
b8e80941SmrgFor these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
b8e80941Smrg
b8e80941Smrg.. opcode:: LOAD - Fetch data from a shader buffer or image
848b8605Smrg
848b8605Smrg               Syntax: ``LOAD dst, resource, address``
848b8605Smrg
b8e80941Smrg               Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
848b8605Smrg
848b8605Smrg               Using the provided integer address, LOAD fetches data
848b8605Smrg               from the specified buffer or texture without any
848b8605Smrg               filtering.
848b8605Smrg
848b8605Smrg               The 'address' is specified as a vector of unsigned
848b8605Smrg               integers.  If the 'address' is out of range the result
848b8605Smrg               is unspecified.
848b8605Smrg
848b8605Smrg               Only the first mipmap level of a resource can be read
848b8605Smrg               from using this instruction.
848b8605Smrg
848b8605Smrg               For 1D or 2D texture arrays, the array index is
848b8605Smrg               provided as an unsigned integer in address.y or
848b8605Smrg               address.z, respectively.  address.yz are ignored for
848b8605Smrg               buffers and 1D textures.  address.z is ignored for 1D
848b8605Smrg               texture arrays and 2D textures.  address.w is always
848b8605Smrg               ignored.
848b8605Smrg
               A swizzle suffix may be added to the resource argument;
               this will cause the resource data to be swizzled accordingly.
b8e80941Smrg
848b8605Smrg.. opcode:: STORE - Write data to a shader resource
848b8605Smrg
848b8605Smrg               Syntax: ``STORE resource, address, src``
848b8605Smrg
b8e80941Smrg               Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
848b8605Smrg
848b8605Smrg               Using the provided integer address, STORE writes data
848b8605Smrg               to the specified buffer or texture.
848b8605Smrg
848b8605Smrg               The 'address' is specified as a vector of unsigned
848b8605Smrg               integers.  If the 'address' is out of range the result
848b8605Smrg               is unspecified.
848b8605Smrg
848b8605Smrg               Only the first mipmap level of a resource can be
848b8605Smrg               written to using this instruction.
848b8605Smrg
848b8605Smrg               For 1D or 2D texture arrays, the array index is
848b8605Smrg               provided as an unsigned integer in address.y or
848b8605Smrg               address.z, respectively.  address.yz are ignored for
848b8605Smrg               buffers and 1D textures.  address.z is ignored for 1D
848b8605Smrg               texture arrays and 2D textures.  address.w is always
848b8605Smrg               ignored.
848b8605Smrg
b8e80941Smrg.. opcode:: RESQ - Query information about a resource
848b8605Smrg
b8e80941Smrg  Syntax: ``RESQ dst, resource``
848b8605Smrg
b8e80941Smrg  Example: ``RESQ TEMP[0], BUFFER[0]``
848b8605Smrg
b8e80941Smrg  Returns information about the buffer or image resource. For buffer
b8e80941Smrg  resources, the size (in bytes) is returned in the x component. For
b8e80941Smrg  image resources, .xyz will contain the width/height/layers of the
b8e80941Smrg  image, while .w will contain the number of samples for multi-sampled
b8e80941Smrg  images.
b8e80941Smrg
b8e80941Smrg.. opcode:: FBFETCH - Load data from framebuffer
b8e80941Smrg
b8e80941Smrg  Syntax: ``FBFETCH dst, output``
b8e80941Smrg
b8e80941Smrg  Example: ``FBFETCH TEMP[0], OUT[0]``
848b8605Smrg
b8e80941Smrg  This is only valid on ``COLOR`` semantic outputs. Returns the color
b8e80941Smrg  of the current position in the framebuffer from before this fragment
b8e80941Smrg  shader invocation. May return the same value from multiple calls for
  a particular output within a single invocation. Note that the result may
  be undefined if a fragment is drawn multiple times without a blend
  barrier in between.
848b8605Smrg
848b8605Smrg
b8e80941Smrg.. _bindlessopcodes:
848b8605Smrg
b8e80941SmrgBindless Opcodes
b8e80941Smrg^^^^^^^^^^^^^^^^
848b8605Smrg
b8e80941SmrgThese opcodes are for working with bindless sampler or image handles and
b8e80941Smrgrequire PIPE_CAP_BINDLESS_TEXTURE.
848b8605Smrg
.. opcode:: IMG2HND - Get a bindless handle for an image
848b8605Smrg
b8e80941Smrg  Syntax: ``IMG2HND dst, image``
848b8605Smrg
b8e80941Smrg  Example: ``IMG2HND TEMP[0], IMAGE[0]``
848b8605Smrg
b8e80941Smrg  Sets 'dst' to a bindless handle for 'image'.
848b8605Smrg
b8e80941Smrg.. opcode:: SAMP2HND - Get a bindless handle for a sampler
848b8605Smrg
b8e80941Smrg  Syntax: ``SAMP2HND dst, sampler``
848b8605Smrg
b8e80941Smrg  Example: ``SAMP2HND TEMP[0], SAMP[0]``
848b8605Smrg
b8e80941Smrg  Sets 'dst' to a bindless handle for 'sampler'.
848b8605Smrg
848b8605Smrg
b8e80941Smrg.. _threadsyncopcodes:
b8e80941Smrg
b8e80941SmrgInter-thread synchronization opcodes
b8e80941Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
b8e80941Smrg
b8e80941SmrgThese opcodes are intended for communication between threads running
b8e80941Smrgwithin the same compute grid.  For now they're only valid in compute
b8e80941Smrgprograms.
848b8605Smrg
848b8605Smrg.. opcode:: BARRIER - Thread group barrier
848b8605Smrg
848b8605Smrg  ``BARRIER``
848b8605Smrg
848b8605Smrg  This opcode suspends the execution of the current thread until all
848b8605Smrg  the remaining threads in the working group reach the same point of
848b8605Smrg  the program.  Results are unspecified if any of the remaining
848b8605Smrg  threads terminates or never reaches an executed BARRIER instruction.
848b8605Smrg
b8e80941Smrg.. opcode:: MEMBAR - Memory barrier
b8e80941Smrg
b8e80941Smrg  ``MEMBAR type``
b8e80941Smrg
  This opcode waits for the completion of all memory accesses based on
  the type passed in. The type is an immediate bitfield with the following
  meaning:

  - Bit 0: Shader storage buffers
  - Bit 1: Atomic buffers
  - Bit 2: Images
  - Bit 3: Shared memory
  - Bit 4: Thread group

  These bits may be combined freely. An implementation is free to not
  distinguish between them as it sees fit. However, they map to all the
  possibilities made available by GLSL.
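
The bit assignments listed above can be modeled as a flag set. In the Python
sketch below, the class and member names are hypothetical labels for the five
bits; only the bit positions come from the description of ``MEMBAR``.

```python
from enum import IntFlag


class MembarType(IntFlag):
    """Hypothetical names for the MEMBAR type bits."""
    SHADER_BUFFER = 1 << 0   # Bit 0: shader storage buffers
    ATOMIC_BUFFER = 1 << 1   # Bit 1: atomic buffers
    SHADER_IMAGE = 1 << 2    # Bit 2: images
    SHARED = 1 << 3          # Bit 3: shared memory
    THREAD_GROUP = 1 << 4    # Bit 4: thread group


def describe(mask):
    """List the names of the bits set in an immediate MEMBAR operand."""
    return [flag.name for flag in MembarType if flag & mask]


# Immediate value for a barrier on shader buffers and shared memory.
buffers_and_shared = int(MembarType.SHADER_BUFFER | MembarType.SHARED)
```

Here ``buffers_and_shared`` is the immediate 9 (bits 0 and 3), and
``describe(0b10100)`` names bits 2 and 4.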
848b8605Smrg
848b8605Smrg.. _atomopcodes:
848b8605Smrg
848b8605SmrgAtomic opcodes
848b8605Smrg^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgThese opcodes provide atomic variants of some common arithmetic and
848b8605Smrglogical operations.  In this context atomicity means that another
848b8605Smrgconcurrent memory access operation that affects the same memory
848b8605Smrglocation is guaranteed to be performed strictly before or after the
b8e80941Smrgentire execution of the atomic operation. The resource may be a BUFFER,
b8e80941SmrgIMAGE, HWATOMIC, or MEMORY.  In the case of an image, the offset works
b8e80941Smrgthe same as for ``LOAD`` and ``STORE``, specified above. For atomic
b8e80941Smrgcounters, the offset is an immediate index to the base hw atomic
b8e80941Smrgcounter for this operation.
b8e80941SmrgThese atomic operations may only be used with 32-bit integer image formats.
848b8605Smrg
848b8605Smrg.. opcode:: ATOMUADD - Atomic integer addition
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMUADD dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
b8e80941Smrg
b8e80941Smrg  The following operation is performed atomically:
b8e80941Smrg
b8e80941Smrg.. math::
b8e80941Smrg
b8e80941Smrg  dst_x = resource[offset]
b8e80941Smrg
b8e80941Smrg  resource[offset] = dst_x + src_x
b8e80941Smrg
848b8605Smrg
b8e80941Smrg.. opcode:: ATOMFADD - Atomic floating point addition
b8e80941Smrg
b8e80941Smrg  Syntax: ``ATOMFADD dst, resource, offset, src``
b8e80941Smrg
b8e80941Smrg  Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
b8e80941Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = dst_x + src_x
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMXCHG - Atomic exchange
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMXCHG dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = src_x
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMCAS - Atomic compare-and-exchange
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
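
The read-modify-write pattern shared by these opcodes can be modeled on the
CPU with a lock standing in for the hardware's atomicity guarantee. The Python
sketch below is only an illustration: the class, its method names, and the use
of a word index as the offset are assumptions of the example.

```python
import threading


class AtomicBuffer:
    """Toy model of a buffer of 32-bit words with atomic operations."""

    def __init__(self, size):
        self._words = [0] * size
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self, offset):
        with self._lock:
            return self._words[offset]

    def atomuadd(self, offset, src):
        """Add src to the word and return the previous value (dst.x)."""
        with self._lock:
            old = self._words[offset]
            self._words[offset] = (old + src) & 0xFFFFFFFF
            return old

    def atomcas(self, offset, cmp, src):
        """Store src only if the word equals cmp; return the previous value."""
        with self._lock:
            old = self._words[offset]
            if old == cmp:
                self._words[offset] = src
            return old


buf = AtomicBuffer(4)
first = buf.atomuadd(0, 5)      # word 0 becomes 5, previous value returned
swapped = buf.atomcas(0, 5, 9)  # matches, so word 0 becomes 9
failed = buf.atomcas(0, 5, 1)   # does not match, word 0 stays 9
```

In both cases the value returned to ``dst`` is the content of the location
before the operation, whether or not the compare succeeded.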
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMAND - Atomic bitwise And
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMAND dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = dst_x \& src_x
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMOR - Atomic bitwise Or
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMOR dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = dst_x | src_x
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMXOR - Atomic bitwise Xor
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMXOR dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = dst_x \oplus src_x
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMUMIN - Atomic unsigned minimum
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMUMIN dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = (dst_x < src_x ? dst_x : src_x)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMUMAX - Atomic unsigned maximum
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMUMAX dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = (dst_x > src_x ? dst_x : src_x)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMIMIN - Atomic signed minimum
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMIMIN dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
848b8605Smrg
b8e80941Smrg  resource[offset] = (dst_x < src_x ? dst_x : src_x)
848b8605Smrg
848b8605Smrg
848b8605Smrg.. opcode:: ATOMIMAX - Atomic signed maximum
848b8605Smrg
848b8605Smrg  Syntax: ``ATOMIMAX dst, resource, offset, src``
848b8605Smrg
b8e80941Smrg  Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
848b8605Smrg
b8e80941Smrg  The following operation is performed atomically:
848b8605Smrg
848b8605Smrg.. math::
848b8605Smrg
b8e80941Smrg  dst_x = resource[offset]
b8e80941Smrg
b8e80941Smrg  resource[offset] = (dst_x > src_x ? dst_x : src_x)
b8e80941Smrg
b8e80941Smrg
b8e80941Smrg.. _interlaneopcodes:
b8e80941Smrg
b8e80941SmrgInter-lane opcodes
b8e80941Smrg^^^^^^^^^^^^^^^^^^
b8e80941Smrg
b8e80941SmrgThese opcodes reduce the given value across the shader invocations
b8e80941Smrgrunning in the current SIMD group. Every thread in the subgroup will receive
b8e80941Smrgthe same result. The BALLOT operations accept a single-channel argument that
b8e80941Smrgis treated as a boolean and produce a 64-bit value.
b8e80941Smrg
b8e80941Smrg.. opcode:: VOTE_ANY - Value is set in any of the active invocations
b8e80941Smrg
b8e80941Smrg  Syntax: ``VOTE_ANY dst, value``
b8e80941Smrg
b8e80941Smrg  Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
b8e80941Smrg
b8e80941Smrg
b8e80941Smrg.. opcode:: VOTE_ALL - Value is set in all of the active invocations
b8e80941Smrg
b8e80941Smrg  Syntax: ``VOTE_ALL dst, value``
b8e80941Smrg
b8e80941Smrg  Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
b8e80941Smrg
b8e80941Smrg
b8e80941Smrg.. opcode:: VOTE_EQ - Value is the same in all of the active invocations
b8e80941Smrg
b8e80941Smrg  Syntax: ``VOTE_EQ dst, value``
b8e80941Smrg
b8e80941Smrg  Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
b8e80941Smrg
b8e80941Smrg
b8e80941Smrg.. opcode:: BALLOT - Lanemask of whether the value is set in each active
b8e80941Smrg            invocation
b8e80941Smrg
b8e80941Smrg  Syntax: ``BALLOT dst, value``
b8e80941Smrg
b8e80941Smrg  Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
b8e80941Smrg
b8e80941Smrg  When the argument is a constant true, this produces a bitmask of active
b8e80941Smrg  invocations. In fragment shaders, this can include helper invocations
b8e80941Smrg  (invocations whose outputs and writes to memory are discarded, but which
b8e80941Smrg  are used to compute derivatives).
b8e80941Smrg
b8e80941Smrg
b8e80941Smrg.. opcode:: READ_FIRST - Broadcast the value from the first active
b8e80941Smrg            invocation to all active lanes
848b8605Smrg
b8e80941Smrg  Syntax: ``READ_FIRST dst, value``
848b8605Smrg
b8e80941Smrg  Example: ``READ_FIRST TEMP[0], TEMP[1]``
b8e80941Smrg
b8e80941Smrg
b8e80941Smrg.. opcode:: READ_INVOC - Retrieve the value from the given invocation
b8e80941Smrg            (need not be uniform)
b8e80941Smrg
b8e80941Smrg  Syntax: ``READ_INVOC dst, value, invocation``
b8e80941Smrg
b8e80941Smrg  Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
b8e80941Smrg
b8e80941Smrg  invocation.x controls the invocation number to read from for all channels.
b8e80941Smrg  The invocation number must be the same across all active invocations in a
b8e80941Smrg  sub-group; otherwise, the results are undefined.
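
The inter-lane opcodes can be illustrated with a small Python model. This is
only a sketch under stated assumptions: a subgroup is represented as a list of
per-lane values plus a parallel list of "active" flags, lane numbers are list
indices, and "first active invocation" is taken to mean the lowest-numbered
active lane. All function names are hypothetical, and the split of the 64-bit
BALLOT result across two components is not modelled.

```python
def vote_any(values, active):
    """True if the value is set in at least one active lane."""
    return any(bool(v) for v, a in zip(values, active) if a)


def vote_all(values, active):
    """True if the value is set in every active lane."""
    return all(bool(v) for v, a in zip(values, active) if a)


def vote_eq(values, active):
    """True if every active lane holds the same value."""
    live = [v for v, a in zip(values, active) if a]
    return all(v == live[0] for v in live)


def ballot(values, active):
    """Bitmask with bit N set when lane N is active and its value is set."""
    mask = 0
    for lane, (v, a) in enumerate(zip(values, active)):
        if a and v:
            mask |= 1 << lane
    return mask


def read_first(values, active):
    """Value held by the lowest-numbered active lane (assumes one exists)."""
    return values[active.index(True)]


def read_invoc(values, invocation):
    """Value held by the given lane; the same lane is used by all callers."""
    return values[invocation]
```

Passing a constant true value for every lane to ``ballot`` yields the mask of
active lanes, which matches the BALLOT note about a constant true argument.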
848b8605Smrg
848b8605Smrg
848b8605SmrgExplanation of symbols used
848b8605Smrg------------------------------
848b8605Smrg
848b8605Smrg
848b8605SmrgFunctions
848b8605Smrg^^^^^^^^^^^^^^
848b8605Smrg
848b8605Smrg
848b8605Smrg  :math:`|x|`       Absolute value of `x`.
848b8605Smrg
848b8605Smrg  :math:`\lceil x \rceil` Ceiling of `x`.
848b8605Smrg
848b8605Smrg  clamp(x,y,z)      Clamp x between y and z.
848b8605Smrg                    (x < y) ? y : (x > z) ? z : x
848b8605Smrg
848b8605Smrg  :math:`\lfloor x\rfloor` Floor of `x`.
848b8605Smrg
848b8605Smrg  :math:`\log_2{x}` Logarithm of `x`, base 2.
848b8605Smrg
848b8605Smrg  max(x,y)          Maximum of x and y.
848b8605Smrg                    (x > y) ? x : y
848b8605Smrg
848b8605Smrg  min(x,y)          Minimum of x and y.
848b8605Smrg                    (x < y) ? x : y
848b8605Smrg
848b8605Smrg  partialx(x)       Derivative of x relative to fragment's X.
848b8605Smrg
848b8605Smrg  partialy(x)       Derivative of x relative to fragment's Y.
848b8605Smrg
848b8605Smrg  pop()             Pop from stack.
848b8605Smrg
848b8605Smrg  :math:`x^y`       `x` to the power `y`.
848b8605Smrg
848b8605Smrg  push(x)           Push x on stack.
848b8605Smrg
848b8605Smrg  round(x)          Round x.
848b8605Smrg
848b8605Smrg  trunc(x)          Truncate x, i.e. drop the fraction bits.
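
The ternary definitions of clamp, max and min listed above translate directly
into code. The Python sketch below mirrors them literally; the names
``tgsi_max`` and ``tgsi_min`` are hypothetical and chosen only to avoid
shadowing Python builtins, and ``trunc`` assumes a finite input.

```python
import math


def clamp(x, y, z):
    """(x < y) ? y : (x > z) ? z : x"""
    return y if x < y else (z if x > z else x)


def tgsi_max(x, y):
    """(x > y) ? x : y"""
    return x if x > y else y


def tgsi_min(x, y):
    """(x < y) ? x : y"""
    return x if x < y else y


def trunc(x):
    """Drop the fractional part, rounding toward zero."""
    return float(math.trunc(x))
```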
848b8605Smrg
848b8605Smrg
848b8605SmrgKeywords
848b8605Smrg^^^^^^^^^^^^^
848b8605Smrg
848b8605Smrg
848b8605Smrg  discard           Discard fragment.
848b8605Smrg
848b8605Smrg  pc                Program counter.
848b8605Smrg
848b8605Smrg  target            Label of target instruction.
848b8605Smrg
848b8605Smrg
848b8605SmrgOther tokens
848b8605Smrg---------------
848b8605Smrg
848b8605Smrg
848b8605SmrgDeclaration
848b8605Smrg^^^^^^^^^^^
848b8605Smrg
848b8605Smrg
Declares a register that will be referenced as an operand in Instruction
tokens.
848b8605Smrg
The File field contains the register file that is being declared and is one
of TGSI_FILE_*.

The UsageMask field specifies which of the register components can be accessed
and is one of TGSI_WRITEMASK_*.
848b8605Smrg
848b8605SmrgThe Local flag specifies that a given value isn't intended for
848b8605Smrgsubroutine parameter passing and, as a result, the implementation
848b8605Smrgisn't required to give any guarantees of it being preserved across
848b8605Smrgsubroutine boundaries.  As it's merely a compiler hint, the
848b8605Smrgimplementation is free to ignore it.
848b8605Smrg
848b8605SmrgIf Dimension flag is set to 1, a Declaration Dimension token follows.
848b8605Smrg
848b8605SmrgIf Semantic flag is set to 1, a Declaration Semantic token follows.
848b8605Smrg
848b8605SmrgIf Interpolate flag is set to 1, a Declaration Interpolate token follows.
848b8605Smrg
848b8605SmrgIf file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
848b8605Smrg
848b8605SmrgIf Array flag is set to 1, a Declaration Array token follows.
848b8605Smrg
848b8605SmrgArray Declaration
848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
Declarations can optionally have an ArrayID attribute which can be referred to
by indirect addressing operands. An ArrayID of zero is reserved and treated as
if no ArrayID is specified.

If an indirect addressing operand refers to a specific declaration by using
an ArrayID, only the registers in this declaration are guaranteed to be
accessed; accessing any register outside this declaration results in undefined
behavior. Note that for compatibility the effective index is zero-based and
not relative to the specified declaration.
848b8605Smrg
848b8605SmrgIf no ArrayID is specified with an indirect addressing operand the whole
848b8605Smrgregister file might be accessed by this operand. This is strongly discouraged
848b8605Smrgand will prevent packing of scalar/vec2 arrays and effective alias analysis.
b8e80941SmrgThis is only legal for TEMP and CONST register files.
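
The indexing rule for operands that name an ArrayID can be sketched in Python.
The sketch assumes an indirect operand is formed from an address register
value plus a constant offset, and that the declaration covers an inclusive
register range. The function ``resolve_indirect`` is hypothetical, and raising
an exception is only a stand-in for "undefined behavior".

```python
def resolve_indirect(first, last, addr, offset):
    """Return the effective register index of an indirect operand.

    first, last -- inclusive register range of the declaration that owns
                   the ArrayID
    addr        -- run-time value of the address register
    offset      -- constant offset encoded in the operand
    """
    # The index is zero-based in the register file, not relative to `first`.
    index = addr + offset
    if not first <= index <= last:
        # Stand-in for undefined behavior outside the declaration.
        raise IndexError("access outside the declared array")
    return index
```

For a declaration covering registers 4 through 7, an address value of 2 needs
a constant offset of 4 to reach register 6; an offset of 0 would address
register 2, which lies outside the declaration.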
848b8605Smrg
848b8605SmrgDeclaration Semantic
848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgVertex and fragment shader input and output registers may be labeled
848b8605Smrgwith semantic information consisting of a name and index.
848b8605Smrg
848b8605SmrgFollows Declaration token if Semantic bit is set.
848b8605Smrg
848b8605SmrgSince its purpose is to link a shader with other stages of the pipeline,
848b8605Smrgit is valid to follow only those Declaration tokens that declare a register
848b8605Smrgeither in INPUT or OUTPUT file.
848b8605Smrg
848b8605SmrgSemanticName field contains the semantic name of the register being declared.
848b8605SmrgThere is no default value.
848b8605Smrg
848b8605SmrgSemanticIndex is an optional subscript that can be used to distinguish
848b8605Smrgdifferent register declarations with the same semantic name. The default value
848b8605Smrgis 0.
848b8605Smrg
848b8605SmrgThe meanings of the individual semantic names are explained in the following
848b8605Smrgsections.
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_POSITION
848b8605Smrg""""""""""""""""""""""
848b8605Smrg
848b8605SmrgFor vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
848b8605Smrgoutput register which contains the homogeneous vertex position in the clip
848b8605Smrgspace coordinate system.  After clipping, the X, Y and Z components of the
848b8605Smrgvertex will be divided by the W value to get normalized device coordinates.
848b8605Smrg
848b8605SmrgFor fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
b8e80941Smrgfragment shader input (or system value, depending on which one is
b8e80941Smrgsupported by the driver) contains the fragment's window position.  The X
848b8605Smrgcomponent starts at zero and always increases from left to right.
848b8605SmrgThe Y component starts at zero and always increases but Y=0 may either
848b8605Smrgindicate the top of the window or the bottom depending on the fragment
848b8605Smrgcoordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
848b8605SmrgThe Z coordinate ranges from 0 to 1 to represent depth from the front
b8e80941Smrgto the back of the Z buffer.  The W component contains the interpolated
reciprocal of the vertex position W component (corresponding to gl_FragCoord,
b8e80941Smrgbut unlike d3d10 which interpolates the same 1/w but then gives back
b8e80941Smrgthe reciprocal of the interpolated value).
848b8605Smrg
848b8605SmrgFragment shaders may also declare an output register with
848b8605SmrgTGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
848b8605Smrgthe fragment shader to change the fragment's Z position.
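
The perspective divide and the two conventions for the fragment W component
can be sketched as follows. This is an illustration only: the function names
are hypothetical, interpolation is reduced to two vertices, and ``t`` is
assumed to be a weight that is linear in screen space.

```python
def clip_to_ndc(clip):
    """Divide X, Y and Z of a clip-space position by W."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def fragment_w(w0, w1, t, d3d10_style=False):
    """W seen by the fragment shader between two vertices.

    The value 1/w is interpolated in both conventions. TGSI (like
    gl_FragCoord) returns the interpolated 1/w itself, while d3d10
    returns the reciprocal of that interpolated value.
    """
    inv_w = (1.0 - t) / w0 + t / w1
    return 1.0 / inv_w if d3d10_style else inv_w
```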
848b8605Smrg
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_COLOR
848b8605Smrg"""""""""""""""""""
848b8605Smrg
848b8605SmrgFor vertex shader outputs or fragment shader inputs/outputs, this
b8e80941Smrglabel indicates that the register contains an R,G,B,A color.
848b8605Smrg
848b8605SmrgSeveral shader inputs/outputs may contain colors so the semantic index
848b8605Smrgis used to distinguish them.  For example, color[0] may be the diffuse
848b8605Smrgcolor while color[1] may be the specular color.
848b8605Smrg
848b8605SmrgThis label is needed so that the flat/smooth shading can be applied
848b8605Smrgto the right interpolants during rasterization.
848b8605Smrg
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_BCOLOR
848b8605Smrg""""""""""""""""""""
848b8605Smrg
848b8605SmrgBack-facing colors are only used for back-facing polygons, and are only valid
848b8605Smrgin vertex shader outputs. After rasterization, all polygons are front-facing
848b8605Smrgand COLOR and BCOLOR end up occupying the same slots in the fragment shader,
848b8605Smrgso all BCOLORs effectively become regular COLORs in the fragment shader.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_FOG
848b8605Smrg"""""""""""""""""
848b8605Smrg
848b8605SmrgVertex shader inputs and outputs and fragment shader inputs may be
848b8605Smrglabeled with TGSI_SEMANTIC_FOG to indicate that the register contains
848b8605Smrga fog coordinate.  Typically, the fragment shader will use the fog coordinate
848b8605Smrgto compute a fog blend factor which is used to blend the normal fragment color
848b8605Smrgwith a constant fog color.  But fog coord really is just an ordinary vec4
848b8605Smrgregister like regular semantics.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_PSIZE
848b8605Smrg"""""""""""""""""""
848b8605Smrg
848b8605SmrgVertex shader input and output registers may be labeled with
TGSI_SEMANTIC_PSIZE to indicate that the register contains a point size
848b8605Smrgin the form (S, 0, 0, 1).  The point size controls the width or diameter
848b8605Smrgof points for rasterization.  This label cannot be used in fragment
848b8605Smrgshaders.
848b8605Smrg
848b8605SmrgWhen using this semantic, be sure to set the appropriate state in the
848b8605Smrg:ref:`rasterizer` first.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_TEXCOORD
848b8605Smrg""""""""""""""""""""""
848b8605Smrg
Only available if PIPE_CAP_TGSI_TEXCOORD is exposed!
848b8605Smrg
848b8605SmrgVertex shader outputs and fragment shader inputs may be labeled with
848b8605Smrgthis semantic to make them replaceable by sprite coordinates via the
848b8605Smrgsprite_coord_enable state in the :ref:`rasterizer`.
848b8605SmrgThe semantic index permitted with this semantic is limited to <= 7.
848b8605Smrg
848b8605SmrgIf the driver does not support TEXCOORD, sprite coordinate replacement
848b8605Smrgapplies to inputs with the GENERIC semantic instead.
848b8605Smrg
848b8605SmrgThe intended use case for this semantic is gl_TexCoord.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_PCOORD
848b8605Smrg""""""""""""""""""""
848b8605Smrg
Only available if PIPE_CAP_TGSI_TEXCOORD is exposed!
848b8605Smrg
848b8605SmrgFragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
848b8605Smrgthat the register contains sprite coordinates in the form (x, y, 0, 1), if
848b8605Smrgthe current primitive is a point and point sprites are enabled. Otherwise,
848b8605Smrgthe contents of the register are undefined.
848b8605Smrg
848b8605SmrgThe intended use case for this semantic is gl_PointCoord.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_GENERIC
848b8605Smrg"""""""""""""""""""""
848b8605Smrg
848b8605SmrgAll vertex/fragment shader inputs/outputs not labeled with any other
848b8605Smrgsemantic label can be considered to be generic attributes.  Typical
848b8605Smrguses of generic inputs/outputs are texcoords and user-defined values.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_NORMAL
848b8605Smrg""""""""""""""""""""
848b8605Smrg
848b8605SmrgIndicates that a vertex shader input is a normal vector.  This is
848b8605Smrgtypically only used for legacy graphics APIs.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_FACE
848b8605Smrg""""""""""""""""""
848b8605Smrg
b8e80941SmrgThis label applies to fragment shader inputs (or system values,
b8e80941Smrgdepending on which one is supported by the driver) and indicates that
b8e80941Smrgthe register contains front/back-face information.
b8e80941Smrg
b8e80941SmrgIf it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
b8e80941Smrgwhere F will be positive when the fragment belongs to a front-facing polygon,
b8e80941Smrgand negative when the fragment belongs to a back-facing polygon.
b8e80941Smrg
b8e80941SmrgIf it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
b8e80941Smrgwhere F is 0xffffffff when the fragment belongs to a front-facing polygon and
b8e80941Smrg0 when the fragment belongs to a back-facing polygon.
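
Because the encoding of F depends on whether FACE arrives as an input or as a
system value, code consuming it has to decode the two forms differently. A
minimal Python sketch, with hypothetical helper names:

```python
def is_front_facing_input(f):
    """FACE as an input: a float that is positive for front-facing."""
    return f > 0.0


def is_front_facing_sysval(f):
    """FACE as a system value: 0xffffffff for front-facing, 0 for back."""
    return (f & 0xFFFFFFFF) != 0
```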
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_EDGEFLAG
848b8605Smrg""""""""""""""""""""""
848b8605Smrg
For vertex shaders, this semantic label indicates that an input or
848b8605Smrgoutput is a boolean edge flag.  The register layout is [F, x, x, x]
848b8605Smrgwhere F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
848b8605Smrgsimply copies the edge flag input to the edgeflag output.
848b8605Smrg
848b8605SmrgEdge flags are used to control which lines or points are actually
848b8605Smrgdrawn when the polygon mode converts triangles/quads/polygons into
848b8605Smrgpoints or lines.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_STENCIL
848b8605Smrg"""""""""""""""""""""
848b8605Smrg
848b8605SmrgFor fragment shaders, this semantic label indicates that an output
848b8605Smrgis a writable stencil reference value. Only the Y component is writable.
This allows the fragment shader to change the fragment's stencilref value.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_VIEWPORT_INDEX
848b8605Smrg""""""""""""""""""""""""""""
848b8605Smrg
848b8605SmrgFor geometry shaders, this semantic label indicates that an output
848b8605Smrgcontains the index of the viewport (and scissor) to use.
b8e80941SmrgThis is an integer value, and only the X component is used.
b8e80941Smrg
b8e80941SmrgIf PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
b8e80941Smrgsupported, then this semantic label can also be used in vertex or
b8e80941Smrgtessellation evaluation shaders, respectively. Only the value written in the
b8e80941Smrglast vertex processing stage is used.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_LAYER
848b8605Smrg"""""""""""""""""""
848b8605Smrg
848b8605SmrgFor geometry shaders, this semantic label indicates that an output
848b8605Smrgcontains the layer value to use for the color and depth/stencil surfaces.
b8e80941SmrgThis is an integer value, and only the X component is used.
b8e80941Smrg(Also known as rendertarget array index.)
848b8605Smrg
b8e80941SmrgIf PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
b8e80941Smrgsupported, then this semantic label can also be used in vertex or
b8e80941Smrgtessellation evaluation shaders, respectively. Only the value written in the
b8e80941Smrglast vertex processing stage is used.
848b8605Smrg
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_CLIPDIST
848b8605Smrg""""""""""""""""""""""
848b8605Smrg
b8e80941SmrgNote this covers clipping and culling distances.
b8e80941Smrg
848b8605SmrgWhen components of vertex elements are identified this way, these
848b8605Smrgvalues are each assumed to be a float32 signed distance to a plane.
b8e80941Smrg
b8e80941SmrgFor clip distances:
848b8605SmrgPrimitive setup only invokes rasterization on pixels for which
b8e80941Smrgthe interpolated plane distances are >= 0.
b8e80941Smrg
For cull distances:
Primitives will be completely discarded if the plane distances
for all of the vertices in the primitive are < 0.
If a vertex has a cull distance of NaN, that vertex counts as "out"
(as if it is < 0).
b8e80941Smrg
b8e80941SmrgMultiple clip/cull planes can be implemented simultaneously, by
b8e80941Smrgannotating multiple components of one or more vertex elements with
b8e80941Smrgthe above specified semantic.
b8e80941SmrgThe limits on both clip and cull distances are bound
848b8605Smrgby the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
848b8605Smrgthe maximum number of components that can be used to hold the
848b8605Smrgdistances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
848b8605Smrgwhich specifies the maximum number of registers which can be
848b8605Smrgannotated with those semantics.
b8e80941SmrgThe properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
b8e80941Smrgare used to divide up the 2 x vec4 space between clipping and culling.
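
The clip and cull rules above can be summarised in a short Python sketch. It
considers a single cull plane at a time and treats the clip distances as
already interpolated to the pixel. Both function names are hypothetical.

```python
import math


def primitive_culled(cull_distances):
    """True if every vertex is "out" for one cull plane.

    cull_distances holds one distance per vertex of the primitive.
    A NaN distance counts as "out", like a negative one.
    """
    return all(math.isnan(d) or d < 0.0 for d in cull_distances)


def pixel_rasterized(clip_distances):
    """True if all interpolated clip distances at the pixel are >= 0."""
    return all(d >= 0.0 for d in clip_distances)
```

A primitive with a single vertex at distance 0 or above on the plane survives
culling, even if all of its other vertices are negative or NaN.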
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_SAMPLEID
848b8605Smrg""""""""""""""""""""""
848b8605Smrg
848b8605SmrgFor fragment shaders, this semantic label indicates that a system value
b8e80941Smrgcontains the current sample id (i.e. gl_SampleID) as an unsigned int.
b8e80941SmrgOnly the X component is used.  If per-sample shading is not enabled,
b8e80941Smrgthe result is (0, undef, undef, undef).
b8e80941Smrg
b8e80941SmrgNote that if the fragment shader uses this system value, the fragment
b8e80941Smrgshader is automatically executed at per sample frequency.
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_SAMPLEPOS
848b8605Smrg"""""""""""""""""""""""
848b8605Smrg
b8e80941SmrgFor fragment shaders, this semantic label indicates that a system
b8e80941Smrgvalue contains the current sample's position as float4(x, y, undef, undef)
b8e80941Smrgin the render target (i.e.  gl_SamplePosition) when per-fragment shading
b8e80941Smrgis in effect.  Position values are in the range [0, 1] where 0.5 is
b8e80941Smrgthe center of the fragment.
b8e80941Smrg
b8e80941SmrgNote that if the fragment shader uses this system value, the fragment
b8e80941Smrgshader is automatically executed at per sample frequency.
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_SAMPLEMASK
848b8605Smrg""""""""""""""""""""""""
848b8605Smrg
b8e80941SmrgFor fragment shaders, this semantic label can be applied to either a
b8e80941Smrgshader system value input or output.
b8e80941Smrg
b8e80941SmrgFor a system value, the sample mask indicates the set of samples covered by
b8e80941Smrgthe current primitive.  If MSAA is not enabled, the value is (1, 0, 0, 0).
b8e80941Smrg
b8e80941SmrgFor an output, the sample mask is used to disable further sample processing.
b8e80941Smrg
b8e80941SmrgFor both, the register type is uint[4] but only the X component is used
b8e80941Smrg(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
b8e80941Smrgto 32x MSAA is supported).
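
Decoding the X component of the sample mask is a matter of testing individual
bits. The Python sketch below assumes that bit N corresponds to sample N,
counting from the least significant bit as with gl_SampleMask; the helper
names are hypothetical.

```python
def sample_covered(mask_x, sample):
    """True if the bit for the given sample index is set in the mask."""
    return bool((mask_x >> sample) & 1)


def covered_samples(mask_x, num_samples):
    """List the covered sample indices for a given MSAA sample count."""
    return [s for s in range(num_samples) if sample_covered(mask_x, s)]
```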
848b8605Smrg
848b8605SmrgTGSI_SEMANTIC_INVOCATIONID
848b8605Smrg""""""""""""""""""""""""""
848b8605Smrg
848b8605SmrgFor geometry shaders, this semantic label indicates that a system value
b8e80941Smrgcontains the current invocation id (i.e. gl_InvocationID).
b8e80941SmrgThis is an integer value, and only the X component is used.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_INSTANCEID
b8e80941Smrg""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
b8e80941Smrgthe current instance id (i.e. gl_InstanceID). It does not include the base
b8e80941Smrginstance. This is an integer value, and only the X component is used.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_VERTEXID
b8e80941Smrg""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
b8e80941Smrgthe current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
b8e80941Smrgbase vertex. This is an integer value, and only the X component is used.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_VERTEXID_NOBASE
b8e80941Smrg"""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
b8e80941Smrgthe current vertex id without including the base vertex (this corresponds to
b8e80941Smrgd3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
b8e80941Smrg== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
b8e80941Smrgis used.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_BASEVERTEX
b8e80941Smrg""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
b8e80941Smrgthe base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
b8e80941Smrgthis contains the first (or start) value instead.
b8e80941SmrgThis is an integer value, and only the X component is used.
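
The relationship between the three vertex id related system values can be
written down as a small Python sketch. The function name and the dictionary
layout are hypothetical; only the identity itself comes from the text above.

```python
def vertex_system_values(nobase_id, base):
    """Return the three related system values for one vertex.

    nobase_id -- vertex id without the base (the d3d10 convention)
    base      -- base vertex for indexed draws, or the first/start
                 value for non-indexed draws
    """
    return {
        "VERTEXID_NOBASE": nobase_id,
        "BASEVERTEX": base,
        # VERTEXID includes the base, matching gl_VertexID.
        "VERTEXID": nobase_id + base,
    }
```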
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_PRIMID
b8e80941Smrg""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor geometry and fragment shaders, this semantic label indicates the value
b8e80941Smrgcontains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
b8e80941Smrgand only the X component is used.
FIXME: This can currently be either an ordinary input or a system value...
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_PATCH
b8e80941Smrg"""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates a
b8e80941Smrggeneric per-patch attribute. Such semantics will not implicitly be per-vertex
b8e80941Smrgarrays.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_TESSCOORD
b8e80941Smrg"""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor tessellation evaluation shaders, this semantic label indicates the
b8e80941Smrgcoordinates of the vertex being processed. This is available in XYZ; W is
b8e80941Smrgundefined.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_TESSOUTER
b8e80941Smrg"""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the
b8e80941Smrgouter tessellation levels of the patch. Isoline tessellation will only have XY
b8e80941Smrgdefined, triangle will have XYZ and quads will have XYZW defined. This
b8e80941Smrgcorresponds to gl_TessLevelOuter.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_TESSINNER
b8e80941Smrg"""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the
b8e80941Smrginner tessellation levels of the patch. The X value is only defined for
b8e80941Smrgtriangle tessellation, while quads will have XY defined. This is entirely
b8e80941Smrgundefined for isoline tessellation.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_VERTICESIN
b8e80941Smrg""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the
b8e80941Smrgnumber of vertices provided in the input patch. Only the X value is defined.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_HELPER_INVOCATION
b8e80941Smrg"""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor fragment shaders, this semantic indicates whether the current
b8e80941Smrginvocation is covered or not. Helper invocations are created in order
b8e80941Smrgto properly compute derivatives, however it may be desirable to skip
b8e80941Smrgsome of the logic in those cases. See ``gl_HelperInvocation`` documentation.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_BASEINSTANCE
b8e80941Smrg""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor vertex shaders, the base instance argument supplied for this
b8e80941Smrgdraw. This is an integer value, and only the X component is used.
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_DRAWID
b8e80941Smrg""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor vertex shaders, the zero-based index of the current draw in a
b8e80941Smrg``glMultiDraw*`` invocation. This is an integer value, and only the X
b8e80941Smrgcomponent is used.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_WORK_DIM
b8e80941Smrg""""""""""""""""""""""
b8e80941Smrg
For compute shaders started via OpenCL, this retrieves the work_dim
parameter to the clEnqueueNDRangeKernel call with which the shader
was started.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_GRID_SIZE
b8e80941Smrg"""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor compute shaders, this semantic indicates the maximum (x, y, z) dimensions
b8e80941Smrgof a grid of thread blocks.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_BLOCK_ID
b8e80941Smrg""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor compute shaders, this semantic indicates the (x, y, z) coordinates of the
b8e80941Smrgcurrent block inside of the grid.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_BLOCK_SIZE
b8e80941Smrg""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor compute shaders, this semantic indicates the maximum (x, y, z) dimensions
b8e80941Smrgof a block in threads.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_THREAD_ID
b8e80941Smrg"""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgFor compute shaders, this semantic indicates the (x, y, z) coordinates of the
b8e80941Smrgcurrent thread inside of the block.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_SIZE
b8e80941Smrg"""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgThis semantic indicates the subgroup size for the current invocation. This is
b8e80941Smrgan integer of at most 64, as it indicates the width of lanemasks. It does not
b8e80941Smrgdepend on the number of invocations that are active.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_INVOCATION
b8e80941Smrg"""""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgThe index of the current invocation within its subgroup.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_EQ_MASK
b8e80941Smrg""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgA bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
b8e80941Smrg``1 << subgroup_invocation`` in arbitrary precision arithmetic.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_GE_MASK
b8e80941Smrg""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgA bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
b8e80941Smrg``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
b8e80941Smrgin arbitrary precision arithmetic.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_GT_MASK
b8e80941Smrg""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgA bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
b8e80941Smrg``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
b8e80941Smrgin arbitrary precision arithmetic.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_LE_MASK
b8e80941Smrg""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgA bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
b8e80941Smrg``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
b8e80941Smrg
b8e80941Smrg
b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_LT_MASK
b8e80941Smrg""""""""""""""""""""""""""""""
b8e80941Smrg
b8e80941SmrgA bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
b8e80941Smrg``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
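
The five lanemask formulas can be checked directly in Python, whose integers
are arbitrary precision and so match the wording used above. The function
``subgroup_masks`` and its dictionary keys are hypothetical names used only
for this sketch.

```python
def subgroup_masks(subgroup_size, subgroup_invocation):
    """Compute the EQ/GE/GT/LE/LT lanemasks for one invocation."""
    n, i = subgroup_size, subgroup_invocation
    return {
        "EQ": 1 << i,
        "GE": ((1 << (n - i)) - 1) << i,
        "GT": ((1 << (n - i - 1)) - 1) << (i + 1),
        "LE": (1 << (i + 1)) - 1,
        "LT": (1 << i) - 1,
    }
```

For a subgroup size of 8 and invocation 3, this gives ``EQ = 0b00001000``,
``GE = 0b11111000``, ``GT = 0b11110000``, ``LE = 0b00001111`` and
``LT = 0b00000111``.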
b8e80941Smrg
848b8605Smrg
848b8605SmrgDeclaration Interpolate
848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgThis token is only valid for fragment shader INPUT declarations.
848b8605Smrg
The Interpolate field specifies the way input is being interpolated by
the rasteriser and is one of TGSI_INTERPOLATE_*.
848b8605Smrg
848b8605SmrgThe Location field specifies the location inside the pixel that the
848b8605Smrginterpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
848b8605Smrgwhen per-sample shading is enabled, the implementation may choose to
848b8605Smrginterpolate at the sample irrespective of the Location field.
848b8605Smrg
848b8605SmrgThe CylindricalWrap bitfield specifies which register components
848b8605Smrgshould be subject to cylindrical wrapping when interpolating by the
848b8605Smrgrasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
848b8605Smrgshould be interpolated according to cylindrical wrapping rules.
848b8605Smrg
848b8605Smrg
848b8605SmrgDeclaration Sampler View
848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgFollows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
848b8605Smrg
``DCL SVIEW[#], resource, type(s)``
848b8605Smrg
848b8605SmrgDeclares a shader input sampler view and assigns it to a SVIEW[#]
848b8605Smrgregister.
848b8605Smrg
848b8605Smrgresource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
848b8605Smrg
848b8605Smrgtype must be 1 or 4 entries (if specifying on a per-component
848b8605Smrglevel) out of UNORM, SNORM, SINT, UINT and FLOAT.
848b8605Smrg
For TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
which take an explicit SVIEW[#] source register), there may optionally be
SVIEW[#] declarations.  In this case, the SVIEW index is implied by the
SAMP index, and there must be a corresponding SVIEW[#] declaration for
each SAMP[#] declaration.  Drivers are free to ignore this if they wish.
But note in particular that some drivers need to know the sampler type
(float/int/unsigned) in order to generate the correct code, so in cases
where integer textures are sampled, SVIEW[#] declarations should be
used.
b8e80941Smrg
b8e80941SmrgNOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
b8e80941Smrgin the same shader.
848b8605Smrg
848b8605SmrgDeclaration Resource
848b8605Smrg^^^^^^^^^^^^^^^^^^^^
848b8605Smrg
848b8605SmrgFollows Declaration token if file is TGSI_FILE_RESOURCE.
848b8605Smrg
``DCL RES[#], resource [, WR] [, RAW]``
848b8605Smrg
848b8605SmrgDeclares a shader input resource and assigns it to a RES[#]
848b8605Smrgregister.
848b8605Smrg
848b8605Smrgresource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
848b8605Smrg2DArray.
848b8605Smrg
848b8605SmrgIf the RAW keyword is not specified, the texture data will be
848b8605Smrgsubject to conversion, swizzling and scaling as required to yield
848b8605Smrgthe specified data type from the physical data format of the bound
848b8605Smrgresource.
848b8605Smrg
848b8605SmrgIf the RAW keyword is specified, no channel conversion will be
848b8605Smrgperformed: the values read for each of the channels (X,Y,Z,W) will
848b8605Smrgcorrespond to consecutive words in the same order and format
848b8605Smrgthey're found in memory.  No element-to-address conversion will be
848b8605Smrgperformed either: the value of the provided X coordinate will be
848b8605Smrginterpreted in byte units instead of texel units.  The result of
848b8605Smrgaccessing a misaligned address is undefined.
848b8605Smrg
848b8605SmrgUsage of the STORE opcode is only allowed if the WR (writable) flag
848b8605Smrgis set.
848b8605Smrg
Hardware Atomic Register File
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Hardware atomics are declared as a 2D array with an optional array id.

The first member of the dimension is the buffer resource the atomic
is located in.
The second member is a range into the buffer resource, covering either
one or multiple counters. If this is an array, the declaration will have
a unique array id.

Each counter is 4 bytes in size, and indices and ranges are expressed in
counters, not bytes.

::

   DCL HWATOMIC[0][0]
   DCL HWATOMIC[0][1]

This declares two atomics, one at the start of the buffer and one in the
second 4 bytes.

::

   DCL HWATOMIC[0][0]
   DCL HWATOMIC[1][0]
   DCL HWATOMIC[1][1..3], ARRAY(1)

This declares 5 atomics: one in buffer 0 at counter 0, one in buffer 1 at
counter 0, and an array of 3 atomics in buffer 1, starting at counter 1.
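
Because indices and ranges are counted in counters rather than bytes, the
byte range covered by a declaration follows from the 4-byte counter size.
The following is a minimal sketch of that arithmetic only; the helper name
``hwatomic_byte_range`` is hypothetical and is not part of Gallium or TGSI.

.. code-block:: python

   COUNTER_SIZE = 4  # bytes per counter, as stated above


   def hwatomic_byte_range(first, last=None):
       """Return (byte offset, byte size) for counters first..last, inclusive.

       Hypothetical helper for illustration; a single counter is the
       range first..first.
       """
       if last is None:
           last = first
       if first < 0 or last < first:
           raise ValueError("invalid counter range")
       return first * COUNTER_SIZE, (last - first + 1) * COUNTER_SIZE


   # DCL HWATOMIC[0][0]: one counter at the start of buffer 0
   single = hwatomic_byte_range(0)
   # DCL HWATOMIC[1][1..3], ARRAY(1): three counters in buffer 1
   array = hwatomic_byte_range(1, 3)

Under these assumptions the array declared as ``[1..3]`` starts at byte 4 of
its buffer and spans 12 bytes.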

Properties
^^^^^^^^^^

Properties are general directives that apply to the whole TGSI program.

FS_COORD_ORIGIN
"""""""""""""""

Specifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
The default value is UPPER_LEFT.

If UPPER_LEFT, the position will be (0,0) at the upper left corner and
increase downward and rightward.
If LOWER_LEFT, the position will be (0,0) at the lower left corner and
increase upward and rightward.

OpenGL defaults to LOWER_LEFT, and is configurable with the
GL_ARB_fragment_coord_conventions extension.

DirectX 9/10 use UPPER_LEFT.

FS_COORD_PIXEL_CENTER
"""""""""""""""""""""

Specifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
The default value is HALF_INTEGER.

If HALF_INTEGER, the fractional part of the position will be 0.5.
If INTEGER, the fractional part of the position will be 0.0.

Note that this does not affect the set of fragments generated by
rasterization, which is instead controlled by half_pixel_center in the
rasterizer.

OpenGL defaults to HALF_INTEGER, and is configurable with the
GL_ARB_fragment_coord_conventions extension.

DirectX 9 uses INTEGER.
DirectX 10 uses HALF_INTEGER.
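
The combined effect of FS_COORD_ORIGIN and FS_COORD_PIXEL_CENTER on the X and
Y components of the position can be sketched as below. This is an
illustration of the conventions only, not driver code: the function
``fragment_position`` and its parameters (a pixel column, a pixel row counted
from the top of the framebuffer, and the framebuffer height in pixels) are
assumptions made for the example.

.. code-block:: python

   def fragment_position(col, row_from_top, fb_height,
                         origin="UPPER_LEFT", pixel_center="HALF_INTEGER"):
       """Return the (x, y) position seen by the fragment shader.

       Hypothetical helper: col and row_from_top are integer pixel indices,
       with row 0 being the topmost row of the framebuffer.
       """
       if origin not in ("UPPER_LEFT", "LOWER_LEFT"):
           raise ValueError("unknown origin: %r" % (origin,))
       if pixel_center not in ("HALF_INTEGER", "INTEGER"):
           raise ValueError("unknown pixel center: %r" % (pixel_center,))

       # HALF_INTEGER puts the pixel center at +0.5, INTEGER at +0.0.
       frac = 0.5 if pixel_center == "HALF_INTEGER" else 0.0

       # LOWER_LEFT counts rows upward from the bottom edge instead.
       if origin == "UPPER_LEFT":
           row = row_from_top
       else:
           row = fb_height - 1 - row_from_top

       return (col + frac, row + frac)


   # Top-left pixel of a 4-pixel-high framebuffer under both origins.
   default_pos = fragment_position(0, 0, 4)
   gl_pos = fragment_position(0, 0, 4, origin="LOWER_LEFT")

With the default property values the top-left pixel is reported at
(0.5, 0.5); with LOWER_LEFT the same pixel is reported at (0.5, 3.5) in this
example.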

FS_COLOR0_WRITES_ALL_CBUFS
""""""""""""""""""""""""""

Specifies that writes to the fragment shader color 0 are replicated to all
bound cbufs. This facilitates OpenGL's fragColor output versus fragData[0]:
fragData is directed to a single color buffer, whereas fragColor is broadcast.

VS_PROHIBIT_UCPS
""""""""""""""""

If this property is set on the program bound to the shader stage before the
fragment shader, user clip planes should have no effect (be disabled) even if
that shader does not write to any clip distance outputs and the rasterizer's
clip_plane_enable is non-zero.
This property is only supported by drivers that also support shader clip
distance outputs.
This is useful for APIs that don't have UCPs and where clip distances written
by a shader cannot be disabled.

GS_INVOCATIONS
""""""""""""""

Specifies the number of times a geometry shader should be executed for each
input primitive. Each invocation will have a different
TGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
be 1.

VS_WINDOW_SPACE_POSITION
""""""""""""""""""""""""

If this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
is assumed to contain window space coordinates.
Division of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
taken directly from the fourth component of the shader output.
Naturally, clipping is not performed on window coordinates either.
The effect of this property is undefined if a geometry or tessellation shader
is in use.

TCS_VERTICES_OUT
""""""""""""""""

The number of vertices written by the tessellation control shader. This
effectively defines the patch input size of the tessellation evaluation shader
as well.

TES_PRIM_MODE
"""""""""""""

This sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``,
``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
separate isolines setting; regular lines are assumed to mean isolines.)

TES_SPACING
"""""""""""

This sets the spacing mode of the tessellation generator, one of
``PIPE_TESS_SPACING_*``.

TES_VERTEX_ORDER_CW
"""""""""""""""""""

This sets the vertex order to be clockwise if the value is 1, or
counter-clockwise if set to 0.

TES_POINT_MODE
""""""""""""""

If set to a non-zero value, this turns on point mode for the tessellator,
which means that points will be generated instead of primitives.

NUM_CLIPDIST_ENABLED
""""""""""""""""""""

How many clip distance scalar outputs are enabled.

NUM_CULLDIST_ENABLED
""""""""""""""""""""

How many cull distance scalar outputs are enabled.

FS_EARLY_DEPTH_STENCIL
""""""""""""""""""""""

Whether depth test, stencil test, and occlusion query should run before
the fragment shader (regardless of fragment shader side effects). Corresponds
to GLSL early_fragment_tests.

NEXT_SHADER
"""""""""""

Which shader stage will MOST LIKELY follow this shader when the shader
is bound. This is only a hint to the driver and doesn't have to be precise.
Only set for VS and TES.

CS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
"""""""""""""""""""""""""""""""""""""

Threads per block in each dimension, if known at compile time. If the block size
is known, all three should be at least 1. If it is unknown, they should all be
set to 0 or not set.

MUL_ZERO_WINS
"""""""""""""

The MUL TGSI operation (FP32 multiplication) will return 0 if either
of the operands is equal to 0. That means that 0 * Inf = 0. This
should be set the same way for an entire pipeline. Note that this
applies not only to the literal MUL TGSI opcode, but to all FP32
multiplications implied by other operations, such as MAD, FMA, DP2,
DP3, DP4, DST, LOG, LRP, and possibly others. If there is a
mismatch between shaders, then it is unspecified whether this behavior
will be enabled.
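
The difference from ordinary IEEE multiplication can be sketched as follows.
This is only a model of the rule stated above, under two assumptions: Python
floats are FP64 rather than FP32, and the sign of the returned zero (which
the text above does not specify) is arbitrarily chosen to be positive. The
function name ``mul`` and its ``zero_wins`` parameter are hypothetical.

.. code-block:: python

   import math


   def mul(a, b, zero_wins=False):
       """Multiply two floats, optionally applying the MUL_ZERO_WINS rule."""
       if zero_wins and (a == 0.0 or b == 0.0):
           # A zero operand forces a zero result, even against Inf or NaN.
           return 0.0
       return a * b


   ieee_result = mul(0.0, math.inf)                   # NaN under IEEE rules
   zero_wins_result = mul(0.0, math.inf, zero_wins=True)

Without the property, ``0 * Inf`` yields NaN; with it, the result is 0. An
implied multiplication, such as each term of a DP3, would go through the same
rule.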

FS_POST_DEPTH_COVERAGE
""""""""""""""""""""""

When enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
that have failed the depth/stencil tests. This is only valid when
FS_EARLY_DEPTH_STENCIL is also specified.


Texture Sampling and Texture Formats
------------------------------------

This table shows how texture image components are returned as (x,y,z,w) tuples
by TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
well.

+--------------------+--------------+--------------------+--------------+
| Texture Components | Gallium      | OpenGL             | Direct3D 9   |
+====================+==============+====================+==============+
| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
+--------------------+--------------+--------------------+--------------+
| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
+--------------------+--------------+--------------------+--------------+
| RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
+--------------------+--------------+--------------------+--------------+
| RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
+--------------------+--------------+--------------------+--------------+
| A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
+--------------------+--------------+--------------------+--------------+
| L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
+--------------------+--------------+--------------------+--------------+
| LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
+--------------------+--------------+--------------------+--------------+
| I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
+--------------------+--------------+--------------------+--------------+
| UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
|                    |              | [#envmap-bumpmap]_ |              |
+--------------------+--------------+--------------------+--------------+
| Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
|                    |              | [#depth-tex-mode]_ |              |
+--------------------+--------------+--------------------+--------------+
| S                  | (s, s, s, s) | unknown            | unknown      |
+--------------------+--------------+--------------------+--------------+

.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
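
The Gallium column of the table can be read as a per-format expansion rule.
The sketch below encodes it as a lookup table purely for illustration; the
names ``GALLIUM_EXPANSION`` and ``expand_texel`` are hypothetical, and the UV
and Z rows are omitted because the table leaves them undetermined.

.. code-block:: python

   # Gallium column of the table: a string names a stored component, a number
   # is a constant filled in by the sampler.
   GALLIUM_EXPANSION = {
       "R":    ("r", 0, 0, 1),
       "RG":   ("r", "g", 0, 1),
       "RGB":  ("r", "g", "b", 1),
       "RGBA": ("r", "g", "b", "a"),
       "A":    (0, 0, 0, "a"),
       "L":    ("l", "l", "l", 1),
       "LA":   ("l", "l", "l", "a"),
       "I":    ("i", "i", "i", "i"),
       "S":    ("s", "s", "s", "s"),
   }


   def expand_texel(components, texel):
       """Expand stored components into an (x, y, z, w) tuple.

       texel maps component names (such as "r" or "l") to sampled values.
       """
       rule = GALLIUM_EXPANSION[components]
       return tuple(texel[c] if isinstance(c, str) else c for c in rule)


   rg = expand_texel("RG", {"r": 0.25, "g": 0.5})
   la = expand_texel("LA", {"l": 0.2, "a": 0.7})

For example, an RG texel with r = 0.25 and g = 0.5 expands to
(0.25, 0.5, 0, 1), and an LA texel replicates its luminance into x, y and z.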