1848b8605SmrgTGSI 2848b8605Smrg==== 3848b8605Smrg 4848b8605SmrgTGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language 5848b8605Smrgfor describing shaders. Since Gallium is inherently shaderful, shaders are 6848b8605Smrgan important part of the API. TGSI is the only intermediate representation 7848b8605Smrgused by all drivers. 8848b8605Smrg 9848b8605SmrgBasics 10848b8605Smrg------ 11848b8605Smrg 12848b8605SmrgAll TGSI instructions, known as *opcodes*, operate on arbitrary-precision 13848b8605Smrgfloating-point four-component vectors. An opcode may have up to one 14848b8605Smrgdestination register, known as *dst*, and between zero and three source 15848b8605Smrgregisters, called *src0* through *src2*, or simply *src* if there is only 16848b8605Smrgone. 17848b8605Smrg 18848b8605SmrgSome instructions, like :opcode:`I2F`, permit re-interpretation of vector 19848b8605Smrgcomponents as integers. Other instructions permit using registers as 20848b8605Smrgtwo-component vectors with double precision; see :ref:`doubleopcodes`. 21848b8605Smrg 22848b8605SmrgWhen an instruction has a scalar result, the result is usually copied into 23848b8605Smrgeach of the components of *dst*. When this happens, the result is said to be 24848b8605Smrg*replicated* to *dst*. :opcode:`RCP` is one such instruction. 25848b8605Smrg 26848b8605SmrgModifiers 27848b8605Smrg^^^^^^^^^^^^^^^ 28848b8605Smrg 29b8e80941SmrgTGSI supports modifiers on inputs (as well as saturate and precise modifier 30b8e80941Smrgon instructions). 31848b8605Smrg 32b8e80941SmrgFor arithmetic instruction having a precise modifier certain optimizations 33b8e80941Smrgwhich may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be 34b8e80941Smrgoptimized to TGSI_OPCODE_MAD, because some hardware only supports the fused 35b8e80941SmrgMAD instruction. 36b8e80941Smrg 37b8e80941SmrgFor inputs which have a floating point type, both absolute value and 38b8e80941Smrgnegation modifiers are supported (with absolute value being applied 39b8e80941Smrgfirst). The only source of TGSI_OPCODE_MOV and the second and third 40b8e80941Smrgsources of TGSI_OPCODE_UCMP are considered to have float type for 41b8e80941Smrgapplying modifiers. 42848b8605Smrg 43848b8605SmrgFor inputs which have signed or unsigned type only the negate modifier is 44848b8605Smrgsupported. 45848b8605Smrg 46848b8605SmrgInstruction Set 47848b8605Smrg--------------- 48848b8605Smrg 49848b8605SmrgCore ISA 50848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^ 51848b8605Smrg 52848b8605SmrgThese opcodes are guaranteed to be available regardless of the driver being 53848b8605Smrgused. 54848b8605Smrg 55848b8605Smrg.. opcode:: ARL - Address Register Load 56848b8605Smrg 57848b8605Smrg.. math:: 58848b8605Smrg 59b8e80941Smrg dst.x = (int) \lfloor src.x\rfloor 60848b8605Smrg 61b8e80941Smrg dst.y = (int) \lfloor src.y\rfloor 62848b8605Smrg 63b8e80941Smrg dst.z = (int) \lfloor src.z\rfloor 64848b8605Smrg 65b8e80941Smrg dst.w = (int) \lfloor src.w\rfloor 66848b8605Smrg 67848b8605Smrg 68848b8605Smrg.. opcode:: MOV - Move 69848b8605Smrg 70848b8605Smrg.. math:: 71848b8605Smrg 72848b8605Smrg dst.x = src.x 73848b8605Smrg 74848b8605Smrg dst.y = src.y 75848b8605Smrg 76848b8605Smrg dst.z = src.z 77848b8605Smrg 78848b8605Smrg dst.w = src.w 79848b8605Smrg 80848b8605Smrg 81848b8605Smrg.. opcode:: LIT - Light Coefficients 82848b8605Smrg 83848b8605Smrg.. math:: 84848b8605Smrg 85848b8605Smrg dst.x &= 1 \\ 86848b8605Smrg dst.y &= max(src.x, 0) \\ 87848b8605Smrg dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0 \\ 88848b8605Smrg dst.w &= 1 89848b8605Smrg 90848b8605Smrg 91848b8605Smrg.. opcode:: RCP - Reciprocal 92848b8605Smrg 93848b8605SmrgThis instruction replicates its result. 94848b8605Smrg 95848b8605Smrg.. math:: 96848b8605Smrg 97848b8605Smrg dst = \frac{1}{src.x} 98848b8605Smrg 99848b8605Smrg 100848b8605Smrg.. opcode:: RSQ - Reciprocal Square Root 101848b8605Smrg 102848b8605SmrgThis instruction replicates its result. The results are undefined for src <= 0. 103848b8605Smrg 104848b8605Smrg.. math:: 105848b8605Smrg 106848b8605Smrg dst = \frac{1}{\sqrt{src.x}} 107848b8605Smrg 108848b8605Smrg 109848b8605Smrg.. opcode:: SQRT - Square Root 110848b8605Smrg 111848b8605SmrgThis instruction replicates its result. The results are undefined for src < 0. 112848b8605Smrg 113848b8605Smrg.. math:: 114848b8605Smrg 115848b8605Smrg dst = {\sqrt{src.x}} 116848b8605Smrg 117848b8605Smrg 118848b8605Smrg.. opcode:: EXP - Approximate Exponential Base 2 119848b8605Smrg 120848b8605Smrg.. math:: 121848b8605Smrg 122848b8605Smrg dst.x &= 2^{\lfloor src.x\rfloor} \\ 123848b8605Smrg dst.y &= src.x - \lfloor src.x\rfloor \\ 124848b8605Smrg dst.z &= 2^{src.x} \\ 125848b8605Smrg dst.w &= 1 126848b8605Smrg 127848b8605Smrg 128848b8605Smrg.. opcode:: LOG - Approximate Logarithm Base 2 129848b8605Smrg 130848b8605Smrg.. math:: 131848b8605Smrg 132848b8605Smrg dst.x &= \lfloor\log_2{|src.x|}\rfloor \\ 133848b8605Smrg dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\ 134848b8605Smrg dst.z &= \log_2{|src.x|} \\ 135848b8605Smrg dst.w &= 1 136848b8605Smrg 137848b8605Smrg 138848b8605Smrg.. opcode:: MUL - Multiply 139848b8605Smrg 140848b8605Smrg.. math:: 141848b8605Smrg 142848b8605Smrg dst.x = src0.x \times src1.x 143848b8605Smrg 144848b8605Smrg dst.y = src0.y \times src1.y 145848b8605Smrg 146848b8605Smrg dst.z = src0.z \times src1.z 147848b8605Smrg 148848b8605Smrg dst.w = src0.w \times src1.w 149848b8605Smrg 150848b8605Smrg 151848b8605Smrg.. opcode:: ADD - Add 152848b8605Smrg 153848b8605Smrg.. math:: 154848b8605Smrg 155848b8605Smrg dst.x = src0.x + src1.x 156848b8605Smrg 157848b8605Smrg dst.y = src0.y + src1.y 158848b8605Smrg 159848b8605Smrg dst.z = src0.z + src1.z 160848b8605Smrg 161848b8605Smrg dst.w = src0.w + src1.w 162848b8605Smrg 163848b8605Smrg 164848b8605Smrg.. opcode:: DP3 - 3-component Dot Product 165848b8605Smrg 166848b8605SmrgThis instruction replicates its result. 167848b8605Smrg 168848b8605Smrg.. math:: 169848b8605Smrg 170848b8605Smrg dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z 171848b8605Smrg 172848b8605Smrg 173848b8605Smrg.. opcode:: DP4 - 4-component Dot Product 174848b8605Smrg 175848b8605SmrgThis instruction replicates its result. 176848b8605Smrg 177848b8605Smrg.. math:: 178848b8605Smrg 179848b8605Smrg dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w 180848b8605Smrg 181848b8605Smrg 182848b8605Smrg.. opcode:: DST - Distance Vector 183848b8605Smrg 184848b8605Smrg.. math:: 185848b8605Smrg 186848b8605Smrg dst.x &= 1\\ 187848b8605Smrg dst.y &= src0.y \times src1.y\\ 188848b8605Smrg dst.z &= src0.z\\ 189848b8605Smrg dst.w &= src1.w 190848b8605Smrg 191848b8605Smrg 192848b8605Smrg.. opcode:: MIN - Minimum 193848b8605Smrg 194848b8605Smrg.. math:: 195848b8605Smrg 196848b8605Smrg dst.x = min(src0.x, src1.x) 197848b8605Smrg 198848b8605Smrg dst.y = min(src0.y, src1.y) 199848b8605Smrg 200848b8605Smrg dst.z = min(src0.z, src1.z) 201848b8605Smrg 202848b8605Smrg dst.w = min(src0.w, src1.w) 203848b8605Smrg 204848b8605Smrg 205848b8605Smrg.. opcode:: MAX - Maximum 206848b8605Smrg 207848b8605Smrg.. math:: 208848b8605Smrg 209848b8605Smrg dst.x = max(src0.x, src1.x) 210848b8605Smrg 211848b8605Smrg dst.y = max(src0.y, src1.y) 212848b8605Smrg 213848b8605Smrg dst.z = max(src0.z, src1.z) 214848b8605Smrg 215848b8605Smrg dst.w = max(src0.w, src1.w) 216848b8605Smrg 217848b8605Smrg 218848b8605Smrg.. opcode:: SLT - Set On Less Than 219848b8605Smrg 220848b8605Smrg.. math:: 221848b8605Smrg 222848b8605Smrg dst.x = (src0.x < src1.x) ? 1.0F : 0.0F 223848b8605Smrg 224848b8605Smrg dst.y = (src0.y < src1.y) ? 1.0F : 0.0F 225848b8605Smrg 226848b8605Smrg dst.z = (src0.z < src1.z) ? 1.0F : 0.0F 227848b8605Smrg 228848b8605Smrg dst.w = (src0.w < src1.w) ? 1.0F : 0.0F 229848b8605Smrg 230848b8605Smrg 231848b8605Smrg.. opcode:: SGE - Set On Greater Equal Than 232848b8605Smrg 233848b8605Smrg.. math:: 234848b8605Smrg 235848b8605Smrg dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F 236848b8605Smrg 237848b8605Smrg dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F 238848b8605Smrg 239848b8605Smrg dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F 240848b8605Smrg 241848b8605Smrg dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F 242848b8605Smrg 243848b8605Smrg 244848b8605Smrg.. opcode:: MAD - Multiply And Add 245848b8605Smrg 246b8e80941SmrgPerform a * b + c. The implementation is free to decide whether there is an 247b8e80941Smrgintermediate rounding step or not. 248b8e80941Smrg 249848b8605Smrg.. math:: 250848b8605Smrg 251848b8605Smrg dst.x = src0.x \times src1.x + src2.x 252848b8605Smrg 253848b8605Smrg dst.y = src0.y \times src1.y + src2.y 254848b8605Smrg 255848b8605Smrg dst.z = src0.z \times src1.z + src2.z 256848b8605Smrg 257848b8605Smrg dst.w = src0.w \times src1.w + src2.w 258848b8605Smrg 259848b8605Smrg 260848b8605Smrg.. opcode:: LRP - Linear Interpolate 261848b8605Smrg 262848b8605Smrg.. math:: 263848b8605Smrg 264848b8605Smrg dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x 265848b8605Smrg 266848b8605Smrg dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y 267848b8605Smrg 268848b8605Smrg dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z 269848b8605Smrg 270848b8605Smrg dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w 271848b8605Smrg 272848b8605Smrg 273b8e80941Smrg.. opcode:: FMA - Fused Multiply-Add 274848b8605Smrg 275b8e80941SmrgPerform a * b + c with no intermediate rounding step. 276848b8605Smrg 277848b8605Smrg.. math:: 278848b8605Smrg 279b8e80941Smrg dst.x = src0.x \times src1.x + src2.x 280848b8605Smrg 281b8e80941Smrg dst.y = src0.y \times src1.y + src2.y 282848b8605Smrg 283b8e80941Smrg dst.z = src0.z \times src1.z + src2.z 284848b8605Smrg 285b8e80941Smrg dst.w = src0.w \times src1.w + src2.w 286848b8605Smrg 287848b8605Smrg 288848b8605Smrg.. opcode:: FRC - Fraction 289848b8605Smrg 290848b8605Smrg.. math:: 291848b8605Smrg 292848b8605Smrg dst.x = src.x - \lfloor src.x\rfloor 293848b8605Smrg 294848b8605Smrg dst.y = src.y - \lfloor src.y\rfloor 295848b8605Smrg 296848b8605Smrg dst.z = src.z - \lfloor src.z\rfloor 297848b8605Smrg 298848b8605Smrg dst.w = src.w - \lfloor src.w\rfloor 299848b8605Smrg 300848b8605Smrg 301848b8605Smrg.. opcode:: FLR - Floor 302848b8605Smrg 303848b8605Smrg.. math:: 304848b8605Smrg 305848b8605Smrg dst.x = \lfloor src.x\rfloor 306848b8605Smrg 307848b8605Smrg dst.y = \lfloor src.y\rfloor 308848b8605Smrg 309848b8605Smrg dst.z = \lfloor src.z\rfloor 310848b8605Smrg 311848b8605Smrg dst.w = \lfloor src.w\rfloor 312848b8605Smrg 313848b8605Smrg 314848b8605Smrg.. opcode:: ROUND - Round 315848b8605Smrg 316848b8605Smrg.. math:: 317848b8605Smrg 318848b8605Smrg dst.x = round(src.x) 319848b8605Smrg 320848b8605Smrg dst.y = round(src.y) 321848b8605Smrg 322848b8605Smrg dst.z = round(src.z) 323848b8605Smrg 324848b8605Smrg dst.w = round(src.w) 325848b8605Smrg 326848b8605Smrg 327848b8605Smrg.. opcode:: EX2 - Exponential Base 2 328848b8605Smrg 329848b8605SmrgThis instruction replicates its result. 330848b8605Smrg 331848b8605Smrg.. math:: 332848b8605Smrg 333848b8605Smrg dst = 2^{src.x} 334848b8605Smrg 335848b8605Smrg 336848b8605Smrg.. opcode:: LG2 - Logarithm Base 2 337848b8605Smrg 338848b8605SmrgThis instruction replicates its result. 339848b8605Smrg 340848b8605Smrg.. math:: 341848b8605Smrg 342848b8605Smrg dst = \log_2{src.x} 343848b8605Smrg 344848b8605Smrg 345848b8605Smrg.. opcode:: POW - Power 346848b8605Smrg 347848b8605SmrgThis instruction replicates its result. 348848b8605Smrg 349848b8605Smrg.. math:: 350848b8605Smrg 351848b8605Smrg dst = src0.x^{src1.x} 352848b8605Smrg 353848b8605Smrg 354b8e80941Smrg.. opcode:: LDEXP - Multiply Number by Integral Power of 2 355848b8605Smrg 356b8e80941Smrgsrc1 is an integer. 357848b8605Smrg 358848b8605Smrg.. math:: 359848b8605Smrg 360b8e80941Smrg dst.x = src0.x * 2^{src1.x} 361b8e80941Smrg dst.y = src0.y * 2^{src1.y} 362b8e80941Smrg dst.z = src0.z * 2^{src1.z} 363b8e80941Smrg dst.w = src0.w * 2^{src1.w} 364848b8605Smrg 365848b8605Smrg 366848b8605Smrg.. opcode:: COS - Cosine 367848b8605Smrg 368848b8605SmrgThis instruction replicates its result. 369848b8605Smrg 370848b8605Smrg.. math:: 371848b8605Smrg 372848b8605Smrg dst = \cos{src.x} 373848b8605Smrg 374848b8605Smrg 375848b8605Smrg.. opcode:: DDX, DDX_FINE - Derivative Relative To X 376848b8605Smrg 377848b8605SmrgThe fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is 378848b8605Smrgadvertised. When it is, the fine version guarantees one derivative per row 379848b8605Smrgwhile DDX is allowed to be the same for the entire 2x2 quad. 380848b8605Smrg 381848b8605Smrg.. math:: 382848b8605Smrg 383848b8605Smrg dst.x = partialx(src.x) 384848b8605Smrg 385848b8605Smrg dst.y = partialx(src.y) 386848b8605Smrg 387848b8605Smrg dst.z = partialx(src.z) 388848b8605Smrg 389848b8605Smrg dst.w = partialx(src.w) 390848b8605Smrg 391848b8605Smrg 392848b8605Smrg.. opcode:: DDY, DDY_FINE - Derivative Relative To Y 393848b8605Smrg 394848b8605SmrgThe fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is 395848b8605Smrgadvertised. When it is, the fine version guarantees one derivative per column 396848b8605Smrgwhile DDY is allowed to be the same for the entire 2x2 quad. 397848b8605Smrg 398848b8605Smrg.. math:: 399848b8605Smrg 400848b8605Smrg dst.x = partialy(src.x) 401848b8605Smrg 402848b8605Smrg dst.y = partialy(src.y) 403848b8605Smrg 404848b8605Smrg dst.z = partialy(src.z) 405848b8605Smrg 406848b8605Smrg dst.w = partialy(src.w) 407848b8605Smrg 408848b8605Smrg 409848b8605Smrg.. opcode:: PK2H - Pack Two 16-bit Floats 410848b8605Smrg 411b8e80941SmrgThis instruction replicates its result. 412848b8605Smrg 413b8e80941Smrg.. math:: 414848b8605Smrg 415b8e80941Smrg dst = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16 416848b8605Smrg 417848b8605Smrg 418b8e80941Smrg.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars 419848b8605Smrg 420b8e80941SmrgThis instruction replicates its result. 421848b8605Smrg 422b8e80941Smrg.. math:: 423848b8605Smrg 424b8e80941Smrg dst = f32\_to\_unorm16(src.x) | f32\_to\_unorm16(src.y) << 16 425848b8605Smrg 426848b8605Smrg 427b8e80941Smrg.. opcode:: PK4B - Pack Four Signed 8-bit Scalars 428848b8605Smrg 429b8e80941SmrgThis instruction replicates its result. 430848b8605Smrg 431848b8605Smrg.. math:: 432848b8605Smrg 433b8e80941Smrg dst = f32\_to\_snorm8(src.x) | 434b8e80941Smrg (f32\_to\_snorm8(src.y) << 8) | 435b8e80941Smrg (f32\_to\_snorm8(src.z) << 16) | 436b8e80941Smrg (f32\_to\_snorm8(src.w) << 24) 437848b8605Smrg 438848b8605Smrg 439b8e80941Smrg.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars 440848b8605Smrg 441b8e80941SmrgThis instruction replicates its result. 442848b8605Smrg 443b8e80941Smrg.. math:: 444848b8605Smrg 445b8e80941Smrg dst = f32\_to\_unorm8(src.x) | 446b8e80941Smrg (f32\_to\_unorm8(src.y) << 8) | 447b8e80941Smrg (f32\_to\_unorm8(src.z) << 16) | 448b8e80941Smrg (f32\_to\_unorm8(src.w) << 24) 449848b8605Smrg 450848b8605Smrg 451848b8605Smrg.. opcode:: SEQ - Set On Equal 452848b8605Smrg 453848b8605Smrg.. math:: 454848b8605Smrg 455848b8605Smrg dst.x = (src0.x == src1.x) ? 1.0F : 0.0F 456848b8605Smrg 457848b8605Smrg dst.y = (src0.y == src1.y) ? 1.0F : 0.0F 458848b8605Smrg 459848b8605Smrg dst.z = (src0.z == src1.z) ? 1.0F : 0.0F 460848b8605Smrg 461848b8605Smrg dst.w = (src0.w == src1.w) ? 1.0F : 0.0F 462848b8605Smrg 463848b8605Smrg 464848b8605Smrg.. opcode:: SGT - Set On Greater Than 465848b8605Smrg 466848b8605Smrg.. math:: 467848b8605Smrg 468848b8605Smrg dst.x = (src0.x > src1.x) ? 1.0F : 0.0F 469848b8605Smrg 470848b8605Smrg dst.y = (src0.y > src1.y) ? 1.0F : 0.0F 471848b8605Smrg 472848b8605Smrg dst.z = (src0.z > src1.z) ? 1.0F : 0.0F 473848b8605Smrg 474848b8605Smrg dst.w = (src0.w > src1.w) ? 1.0F : 0.0F 475848b8605Smrg 476848b8605Smrg 477848b8605Smrg.. opcode:: SIN - Sine 478848b8605Smrg 479848b8605SmrgThis instruction replicates its result. 480848b8605Smrg 481848b8605Smrg.. math:: 482848b8605Smrg 483848b8605Smrg dst = \sin{src.x} 484848b8605Smrg 485848b8605Smrg 486848b8605Smrg.. opcode:: SLE - Set On Less Equal Than 487848b8605Smrg 488848b8605Smrg.. math:: 489848b8605Smrg 490848b8605Smrg dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F 491848b8605Smrg 492848b8605Smrg dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F 493848b8605Smrg 494848b8605Smrg dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F 495848b8605Smrg 496848b8605Smrg dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F 497848b8605Smrg 498848b8605Smrg 499848b8605Smrg.. opcode:: SNE - Set On Not Equal 500848b8605Smrg 501848b8605Smrg.. math:: 502848b8605Smrg 503848b8605Smrg dst.x = (src0.x != src1.x) ? 1.0F : 0.0F 504848b8605Smrg 505848b8605Smrg dst.y = (src0.y != src1.y) ? 1.0F : 0.0F 506848b8605Smrg 507848b8605Smrg dst.z = (src0.z != src1.z) ? 1.0F : 0.0F 508848b8605Smrg 509848b8605Smrg dst.w = (src0.w != src1.w) ? 1.0F : 0.0F 510848b8605Smrg 511848b8605Smrg 512848b8605Smrg.. opcode:: TEX - Texture Lookup 513848b8605Smrg 514848b8605Smrg for array textures src0.y contains the slice for 1D, 515848b8605Smrg and src0.z contain the slice for 2D. 516848b8605Smrg 517848b8605Smrg for shadow textures with no arrays (and not cube map), 518848b8605Smrg src0.z contains the reference value. 519848b8605Smrg 520848b8605Smrg for shadow textures with arrays, src0.z contains 521848b8605Smrg the reference value for 1D arrays, and src0.w contains 522848b8605Smrg the reference value for 2D arrays and cube maps. 523848b8605Smrg 524848b8605Smrg for cube map array shadow textures, the reference value 525848b8605Smrg cannot be passed in src0.w, and TEX2 must be used instead. 526848b8605Smrg 527848b8605Smrg.. math:: 528848b8605Smrg 529848b8605Smrg coord = src0 530848b8605Smrg 531848b8605Smrg shadow_ref = src0.z or src0.w (optional) 532848b8605Smrg 533848b8605Smrg unit = src1 534848b8605Smrg 535848b8605Smrg dst = texture\_sample(unit, coord, shadow_ref) 536848b8605Smrg 537848b8605Smrg 538848b8605Smrg.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only) 539848b8605Smrg 540848b8605Smrg this is the same as TEX, but uses another reg to encode the 541848b8605Smrg reference value. 542848b8605Smrg 543848b8605Smrg.. math:: 544848b8605Smrg 545848b8605Smrg coord = src0 546848b8605Smrg 547848b8605Smrg shadow_ref = src1.x 548848b8605Smrg 549848b8605Smrg unit = src2 550848b8605Smrg 551848b8605Smrg dst = texture\_sample(unit, coord, shadow_ref) 552848b8605Smrg 553848b8605Smrg 554848b8605Smrg 555848b8605Smrg 556848b8605Smrg.. opcode:: TXD - Texture Lookup with Derivatives 557848b8605Smrg 558848b8605Smrg.. math:: 559848b8605Smrg 560848b8605Smrg coord = src0 561848b8605Smrg 562848b8605Smrg ddx = src1 563848b8605Smrg 564848b8605Smrg ddy = src2 565848b8605Smrg 566848b8605Smrg unit = src3 567848b8605Smrg 568848b8605Smrg dst = texture\_sample\_deriv(unit, coord, ddx, ddy) 569848b8605Smrg 570848b8605Smrg 571848b8605Smrg.. opcode:: TXP - Projective Texture Lookup 572848b8605Smrg 573848b8605Smrg.. math:: 574848b8605Smrg 575848b8605Smrg coord.x = src0.x / src0.w 576848b8605Smrg 577848b8605Smrg coord.y = src0.y / src0.w 578848b8605Smrg 579848b8605Smrg coord.z = src0.z / src0.w 580848b8605Smrg 581848b8605Smrg coord.w = src0.w 582848b8605Smrg 583848b8605Smrg unit = src1 584848b8605Smrg 585848b8605Smrg dst = texture\_sample(unit, coord) 586848b8605Smrg 587848b8605Smrg 588848b8605Smrg.. opcode:: UP2H - Unpack Two 16-Bit Floats 589848b8605Smrg 590b8e80941Smrg.. math:: 591848b8605Smrg 592b8e80941Smrg dst.x = f16\_to\_f32(src0.x \& 0xffff) 593848b8605Smrg 594b8e80941Smrg dst.y = f16\_to\_f32(src0.x >> 16) 595848b8605Smrg 596b8e80941Smrg dst.z = f16\_to\_f32(src0.x \& 0xffff) 597848b8605Smrg 598b8e80941Smrg dst.w = f16\_to\_f32(src0.x >> 16) 599848b8605Smrg 600848b8605Smrg.. note:: 601848b8605Smrg 602848b8605Smrg Considered for removal. 603848b8605Smrg 604b8e80941Smrg.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars 605848b8605Smrg 606848b8605Smrg TBD 607848b8605Smrg 608848b8605Smrg.. note:: 609848b8605Smrg 610848b8605Smrg Considered for removal. 611848b8605Smrg 612b8e80941Smrg.. opcode:: UP4B - Unpack Four Signed 8-Bit Values 613848b8605Smrg 614848b8605Smrg TBD 615848b8605Smrg 616848b8605Smrg.. note:: 617848b8605Smrg 618848b8605Smrg Considered for removal. 619848b8605Smrg 620b8e80941Smrg.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars 621848b8605Smrg 622848b8605Smrg TBD 623848b8605Smrg 624848b8605Smrg.. note:: 625848b8605Smrg 626848b8605Smrg Considered for removal. 627848b8605Smrg 628b8e80941Smrg 629848b8605Smrg.. opcode:: ARR - Address Register Load With Round 630848b8605Smrg 631848b8605Smrg.. math:: 632848b8605Smrg 633b8e80941Smrg dst.x = (int) round(src.x) 634848b8605Smrg 635b8e80941Smrg dst.y = (int) round(src.y) 636848b8605Smrg 637b8e80941Smrg dst.z = (int) round(src.z) 638848b8605Smrg 639b8e80941Smrg dst.w = (int) round(src.w) 640848b8605Smrg 641848b8605Smrg 642848b8605Smrg.. opcode:: SSG - Set Sign 643848b8605Smrg 644848b8605Smrg.. math:: 645848b8605Smrg 646848b8605Smrg dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0 647848b8605Smrg 648848b8605Smrg dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0 649848b8605Smrg 650848b8605Smrg dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0 651848b8605Smrg 652848b8605Smrg dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0 653848b8605Smrg 654848b8605Smrg 655848b8605Smrg.. opcode:: CMP - Compare 656848b8605Smrg 657848b8605Smrg.. math:: 658848b8605Smrg 659848b8605Smrg dst.x = (src0.x < 0) ? src1.x : src2.x 660848b8605Smrg 661848b8605Smrg dst.y = (src0.y < 0) ? src1.y : src2.y 662848b8605Smrg 663848b8605Smrg dst.z = (src0.z < 0) ? src1.z : src2.z 664848b8605Smrg 665848b8605Smrg dst.w = (src0.w < 0) ? src1.w : src2.w 666848b8605Smrg 667848b8605Smrg 668848b8605Smrg.. opcode:: KILL_IF - Conditional Discard 669848b8605Smrg 670848b8605Smrg Conditional discard. Allowed in fragment shaders only. 671848b8605Smrg 672848b8605Smrg.. math:: 673848b8605Smrg 674848b8605Smrg if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0) 675848b8605Smrg discard 676848b8605Smrg endif 677848b8605Smrg 678848b8605Smrg 679848b8605Smrg.. opcode:: KILL - Discard 680848b8605Smrg 681848b8605Smrg Unconditional discard. Allowed in fragment shaders only. 682848b8605Smrg 683848b8605Smrg 684848b8605Smrg.. opcode:: TXB - Texture Lookup With Bias 685848b8605Smrg 686848b8605Smrg for cube map array textures and shadow cube maps, the bias value 687848b8605Smrg cannot be passed in src0.w, and TXB2 must be used instead. 688848b8605Smrg 689848b8605Smrg if the target is a shadow texture, the reference value is always 690848b8605Smrg in src.z (this prevents shadow 3d and shadow 2d arrays from 691848b8605Smrg using this instruction, but this is not needed). 692848b8605Smrg 693848b8605Smrg.. math:: 694848b8605Smrg 695848b8605Smrg coord.x = src0.x 696848b8605Smrg 697848b8605Smrg coord.y = src0.y 698848b8605Smrg 699848b8605Smrg coord.z = src0.z 700848b8605Smrg 701848b8605Smrg coord.w = none 702848b8605Smrg 703848b8605Smrg bias = src0.w 704848b8605Smrg 705848b8605Smrg unit = src1 706848b8605Smrg 707848b8605Smrg dst = texture\_sample(unit, coord, bias) 708848b8605Smrg 709848b8605Smrg 710848b8605Smrg.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only) 711848b8605Smrg 712848b8605Smrg this is the same as TXB, but uses another reg to encode the 713848b8605Smrg lod bias value for cube map arrays and shadow cube maps. 714848b8605Smrg Presumably shadow 2d arrays and shadow 3d targets could use 715848b8605Smrg this encoding too, but this is not legal. 716848b8605Smrg 717848b8605Smrg shadow cube map arrays are neither possible nor required. 718848b8605Smrg 719848b8605Smrg.. math:: 720848b8605Smrg 721848b8605Smrg coord = src0 722848b8605Smrg 723848b8605Smrg bias = src1.x 724848b8605Smrg 725848b8605Smrg unit = src2 726848b8605Smrg 727848b8605Smrg dst = texture\_sample(unit, coord, bias) 728848b8605Smrg 729848b8605Smrg 730848b8605Smrg.. opcode:: DIV - Divide 731848b8605Smrg 732848b8605Smrg.. math:: 733848b8605Smrg 734848b8605Smrg dst.x = \frac{src0.x}{src1.x} 735848b8605Smrg 736848b8605Smrg dst.y = \frac{src0.y}{src1.y} 737848b8605Smrg 738848b8605Smrg dst.z = \frac{src0.z}{src1.z} 739848b8605Smrg 740848b8605Smrg dst.w = \frac{src0.w}{src1.w} 741848b8605Smrg 742848b8605Smrg 743848b8605Smrg.. opcode:: DP2 - 2-component Dot Product 744848b8605Smrg 745848b8605SmrgThis instruction replicates its result. 746848b8605Smrg 747848b8605Smrg.. math:: 748848b8605Smrg 749848b8605Smrg dst = src0.x \times src1.x + src0.y \times src1.y 750848b8605Smrg 751848b8605Smrg 752b8e80941Smrg.. opcode:: TEX_LZ - Texture Lookup With LOD = 0 753b8e80941Smrg 754b8e80941Smrg This is the same as TXL with LOD = 0. Like every texture opcode, it obeys 755b8e80941Smrg pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod. 756b8e80941Smrg There is no way to override those two in shaders. 757b8e80941Smrg 758b8e80941Smrg.. math:: 759b8e80941Smrg 760b8e80941Smrg coord.x = src0.x 761b8e80941Smrg 762b8e80941Smrg coord.y = src0.y 763b8e80941Smrg 764b8e80941Smrg coord.z = src0.z 765b8e80941Smrg 766b8e80941Smrg coord.w = none 767b8e80941Smrg 768b8e80941Smrg lod = 0 769b8e80941Smrg 770b8e80941Smrg unit = src1 771b8e80941Smrg 772b8e80941Smrg dst = texture\_sample(unit, coord, lod) 773b8e80941Smrg 774b8e80941Smrg 775848b8605Smrg.. opcode:: TXL - Texture Lookup With explicit LOD 776848b8605Smrg 777848b8605Smrg for cube map array textures, the explicit lod value 778848b8605Smrg cannot be passed in src0.w, and TXL2 must be used instead. 779848b8605Smrg 780848b8605Smrg if the target is a shadow texture, the reference value is always 781848b8605Smrg in src.z (this prevents shadow 3d / 2d array / cube targets from 782848b8605Smrg using this instruction, but this is not needed). 783848b8605Smrg 784848b8605Smrg.. math:: 785848b8605Smrg 786848b8605Smrg coord.x = src0.x 787848b8605Smrg 788848b8605Smrg coord.y = src0.y 789848b8605Smrg 790848b8605Smrg coord.z = src0.z 791848b8605Smrg 792848b8605Smrg coord.w = none 793848b8605Smrg 794848b8605Smrg lod = src0.w 795848b8605Smrg 796848b8605Smrg unit = src1 797848b8605Smrg 798848b8605Smrg dst = texture\_sample(unit, coord, lod) 799848b8605Smrg 800848b8605Smrg 801848b8605Smrg.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only) 802848b8605Smrg 803848b8605Smrg this is the same as TXL, but uses another reg to encode the 804848b8605Smrg explicit lod value. 805848b8605Smrg Presumably shadow 3d / 2d array / cube targets could use 806848b8605Smrg this encoding too, but this is not legal. 807848b8605Smrg 808848b8605Smrg shadow cube map arrays are neither possible nor required. 809848b8605Smrg 810848b8605Smrg.. math:: 811848b8605Smrg 812848b8605Smrg coord = src0 813848b8605Smrg 814848b8605Smrg lod = src1.x 815848b8605Smrg 816848b8605Smrg unit = src2 817848b8605Smrg 818848b8605Smrg dst = texture\_sample(unit, coord, lod) 819848b8605Smrg 820848b8605Smrg 821848b8605SmrgCompute ISA 822848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^ 823848b8605Smrg 824848b8605SmrgThese opcodes are primarily provided for special-use computational shaders. 825848b8605SmrgSupport for these opcodes indicated by a special pipe capability bit (TBD). 826848b8605Smrg 827848b8605SmrgXXX doesn't look like most of the opcodes really belong here. 828848b8605Smrg 829848b8605Smrg.. opcode:: CEIL - Ceiling 830848b8605Smrg 831848b8605Smrg.. math:: 832848b8605Smrg 833848b8605Smrg dst.x = \lceil src.x\rceil 834848b8605Smrg 835848b8605Smrg dst.y = \lceil src.y\rceil 836848b8605Smrg 837848b8605Smrg dst.z = \lceil src.z\rceil 838848b8605Smrg 839848b8605Smrg dst.w = \lceil src.w\rceil 840848b8605Smrg 841848b8605Smrg 842848b8605Smrg.. opcode:: TRUNC - Truncate 843848b8605Smrg 844848b8605Smrg.. math:: 845848b8605Smrg 846848b8605Smrg dst.x = trunc(src.x) 847848b8605Smrg 848848b8605Smrg dst.y = trunc(src.y) 849848b8605Smrg 850848b8605Smrg dst.z = trunc(src.z) 851848b8605Smrg 852848b8605Smrg dst.w = trunc(src.w) 853848b8605Smrg 854848b8605Smrg 855848b8605Smrg.. opcode:: MOD - Modulus 856848b8605Smrg 857848b8605Smrg.. math:: 858848b8605Smrg 859848b8605Smrg dst.x = src0.x \bmod src1.x 860848b8605Smrg 861848b8605Smrg dst.y = src0.y \bmod src1.y 862848b8605Smrg 863848b8605Smrg dst.z = src0.z \bmod src1.z 864848b8605Smrg 865848b8605Smrg dst.w = src0.w \bmod src1.w 866848b8605Smrg 867848b8605Smrg 868848b8605Smrg.. opcode:: UARL - Integer Address Register Load 869848b8605Smrg 870848b8605Smrg Moves the contents of the source register, assumed to be an integer, into the 871848b8605Smrg destination register, which is assumed to be an address (ADDR) register. 872848b8605Smrg 873848b8605Smrg 874848b8605Smrg.. opcode:: TXF - Texel Fetch 875848b8605Smrg 876848b8605Smrg As per NV_gpu_shader4, extract a single texel from a specified texture 877b8e80941Smrg image or PIPE_BUFFER resource. The source sampler may not be a CUBE or 878b8e80941Smrg SHADOW. src 0 is a 879848b8605Smrg four-component signed integer vector used to identify the single texel 880b8e80941Smrg accessed. 3 components + level. If the texture is multisampled, then 881b8e80941Smrg the fourth component indicates the sample, not the mipmap level. 882b8e80941Smrg Just like texture instructions, an optional 883848b8605Smrg offset vector is provided, which is subject to various driver restrictions 884b8e80941Smrg (regarding range, source of offsets). This instruction ignores the sampler 885b8e80941Smrg state. 886b8e80941Smrg 887848b8605Smrg TXF(uint_vec coord, int_vec offset). 888848b8605Smrg 889848b8605Smrg 890848b8605Smrg.. opcode:: TXQ - Texture Size Query 891848b8605Smrg 892848b8605Smrg As per NV_gpu_program4, retrieve the dimensions of the texture depending on 893848b8605Smrg the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height, 894848b8605Smrg depth), 1D array (width, layers), 2D array (width, height, layers). 895848b8605Smrg Also return the number of accessible levels (last_level - first_level + 1) 896848b8605Smrg in W. 897848b8605Smrg 898848b8605Smrg For components which don't return a resource dimension, their value 899848b8605Smrg is undefined. 900848b8605Smrg 901848b8605Smrg.. math:: 902848b8605Smrg 903848b8605Smrg lod = src0.x 904848b8605Smrg 905848b8605Smrg dst.x = texture\_width(unit, lod) 906848b8605Smrg 907848b8605Smrg dst.y = texture\_height(unit, lod) 908848b8605Smrg 909848b8605Smrg dst.z = texture\_depth(unit, lod) 910848b8605Smrg 911848b8605Smrg dst.w = texture\_levels(unit) 912848b8605Smrg 913b8e80941Smrg 914b8e80941Smrg.. opcode:: TXQS - Texture Samples Query 915b8e80941Smrg 916b8e80941Smrg This retrieves the number of samples in the texture, and stores it 917b8e80941Smrg into the x component as an unsigned integer. The other components are 918b8e80941Smrg undefined. If the texture is not multisampled, this function returns 919b8e80941Smrg (1, undef, undef, undef). 920b8e80941Smrg 921b8e80941Smrg.. math:: 922b8e80941Smrg 923b8e80941Smrg dst.x = texture\_samples(unit) 924b8e80941Smrg 925b8e80941Smrg 926848b8605Smrg.. opcode:: TG4 - Texture Gather 927848b8605Smrg 928848b8605Smrg As per ARB_texture_gather, gathers the four texels to be used in a bi-linear 929848b8605Smrg filtering operation and packs them into a single register. Only works with 930848b8605Smrg 2D, 2D array, cubemaps, and cubemaps arrays. For 2D textures, only the 931848b8605Smrg addressing modes of the sampler and the top level of any mip pyramid are 932848b8605Smrg used. Set W to zero. It behaves like the TEX instruction, but a filtered 933848b8605Smrg sample is not generated. The four samples that contribute to filtering are 934848b8605Smrg placed into xyzw in clockwise order, starting with the (u,v) texture 935848b8605Smrg coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -), 936848b8605Smrg where the magnitude of the deltas are half a texel. 937848b8605Smrg 938848b8605Smrg PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample 939848b8605Smrg depth compares, single component selection, and a non-constant offset. It 940848b8605Smrg doesn't allow support for the GL independent offset to get i0,j0. This would 941848b8605Smrg require another CAP is hw can do it natively. For now we lower that before 942848b8605Smrg TGSI. 943848b8605Smrg 944848b8605Smrg.. math:: 945848b8605Smrg 946848b8605Smrg coord = src0 947848b8605Smrg 948848b8605Smrg component = src1 949848b8605Smrg 950848b8605Smrg dst = texture\_gather4 (unit, coord, component) 951848b8605Smrg 952848b8605Smrg(with SM5 - cube array shadow) 953848b8605Smrg 954848b8605Smrg.. math:: 955848b8605Smrg 956848b8605Smrg coord = src0 957848b8605Smrg 958848b8605Smrg compare = src1 959848b8605Smrg 960848b8605Smrg dst = texture\_gather (uint, coord, compare) 961848b8605Smrg 962848b8605Smrg.. opcode:: LODQ - level of detail query 963848b8605Smrg 964848b8605Smrg Compute the LOD information that the texture pipe would use to access the 965848b8605Smrg texture. The Y component contains the computed LOD lambda_prime. The X 966848b8605Smrg component contains the LOD that will be accessed, based on min/max lod's 967848b8605Smrg and mipmap filters. 968848b8605Smrg 969848b8605Smrg.. math:: 970848b8605Smrg 971848b8605Smrg coord = src0 972848b8605Smrg 973848b8605Smrg dst.xy = lodq(uint, coord); 974848b8605Smrg 975b8e80941Smrg.. opcode:: CLOCK - retrieve the current shader time 976b8e80941Smrg 977b8e80941Smrg Invoking this instruction multiple times in the same shader should 978b8e80941Smrg cause monotonically increasing values to be returned. The values 979b8e80941Smrg are implicitly 64-bit, so if fewer than 64 bits of precision are 980b8e80941Smrg available, to provide expected wraparound semantics, the value 981b8e80941Smrg should be shifted up so that the most significant bit of the time 982b8e80941Smrg is the most significant bit of the 64-bit value. 983b8e80941Smrg 984b8e80941Smrg.. math:: 985b8e80941Smrg 986b8e80941Smrg dst.xy = clock() 987b8e80941Smrg 988b8e80941Smrg 989848b8605SmrgInteger ISA 990848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^ 991848b8605SmrgThese opcodes are used for integer operations. 992848b8605SmrgSupport for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?) 993848b8605Smrg 994848b8605Smrg 995848b8605Smrg.. opcode:: I2F - Signed Integer To Float 996848b8605Smrg 997848b8605Smrg Rounding is unspecified (round to nearest even suggested). 998848b8605Smrg 999848b8605Smrg.. math:: 1000848b8605Smrg 1001848b8605Smrg dst.x = (float) src.x 1002848b8605Smrg 1003848b8605Smrg dst.y = (float) src.y 1004848b8605Smrg 1005848b8605Smrg dst.z = (float) src.z 1006848b8605Smrg 1007848b8605Smrg dst.w = (float) src.w 1008848b8605Smrg 1009848b8605Smrg 1010848b8605Smrg.. opcode:: U2F - Unsigned Integer To Float 1011848b8605Smrg 1012848b8605Smrg Rounding is unspecified (round to nearest even suggested). 1013848b8605Smrg 1014848b8605Smrg.. math:: 1015848b8605Smrg 1016848b8605Smrg dst.x = (float) src.x 1017848b8605Smrg 1018848b8605Smrg dst.y = (float) src.y 1019848b8605Smrg 1020848b8605Smrg dst.z = (float) src.z 1021848b8605Smrg 1022848b8605Smrg dst.w = (float) src.w 1023848b8605Smrg 1024848b8605Smrg 1025848b8605Smrg.. opcode:: F2I - Float to Signed Integer 1026848b8605Smrg 1027848b8605Smrg Rounding is towards zero (truncate). 1028848b8605Smrg Values outside signed range (including NaNs) produce undefined results. 1029848b8605Smrg 1030848b8605Smrg.. math:: 1031848b8605Smrg 1032848b8605Smrg dst.x = (int) src.x 1033848b8605Smrg 1034848b8605Smrg dst.y = (int) src.y 1035848b8605Smrg 1036848b8605Smrg dst.z = (int) src.z 1037848b8605Smrg 1038848b8605Smrg dst.w = (int) src.w 1039848b8605Smrg 1040848b8605Smrg 1041848b8605Smrg.. opcode:: F2U - Float to Unsigned Integer 1042848b8605Smrg 1043848b8605Smrg Rounding is towards zero (truncate). 1044848b8605Smrg Values outside unsigned range (including NaNs) produce undefined results. 1045848b8605Smrg 1046848b8605Smrg.. math:: 1047848b8605Smrg 1048848b8605Smrg dst.x = (unsigned) src.x 1049848b8605Smrg 1050848b8605Smrg dst.y = (unsigned) src.y 1051848b8605Smrg 1052848b8605Smrg dst.z = (unsigned) src.z 1053848b8605Smrg 1054848b8605Smrg dst.w = (unsigned) src.w 1055848b8605Smrg 1056848b8605Smrg 1057848b8605Smrg.. opcode:: UADD - Integer Add 1058848b8605Smrg 1059848b8605Smrg This instruction works the same for signed and unsigned integers. 1060848b8605Smrg The low 32bit of the result is returned. 1061848b8605Smrg 1062848b8605Smrg.. math:: 1063848b8605Smrg 1064848b8605Smrg dst.x = src0.x + src1.x 1065848b8605Smrg 1066848b8605Smrg dst.y = src0.y + src1.y 1067848b8605Smrg 1068848b8605Smrg dst.z = src0.z + src1.z 1069848b8605Smrg 1070848b8605Smrg dst.w = src0.w + src1.w 1071848b8605Smrg 1072848b8605Smrg 1073848b8605Smrg.. opcode:: UMAD - Integer Multiply And Add 1074848b8605Smrg 1075848b8605Smrg This instruction works the same for signed and unsigned integers. 1076848b8605Smrg The multiplication returns the low 32bit (as does the result itself). 1077848b8605Smrg 1078848b8605Smrg.. math:: 1079848b8605Smrg 1080848b8605Smrg dst.x = src0.x \times src1.x + src2.x 1081848b8605Smrg 1082848b8605Smrg dst.y = src0.y \times src1.y + src2.y 1083848b8605Smrg 1084848b8605Smrg dst.z = src0.z \times src1.z + src2.z 1085848b8605Smrg 1086848b8605Smrg dst.w = src0.w \times src1.w + src2.w 1087848b8605Smrg 1088848b8605Smrg 1089848b8605Smrg.. opcode:: UMUL - Integer Multiply 1090848b8605Smrg 1091848b8605Smrg This instruction works the same for signed and unsigned integers. 1092848b8605Smrg The low 32bit of the result is returned. 1093848b8605Smrg 1094848b8605Smrg.. math:: 1095848b8605Smrg 1096848b8605Smrg dst.x = src0.x \times src1.x 1097848b8605Smrg 1098848b8605Smrg dst.y = src0.y \times src1.y 1099848b8605Smrg 1100848b8605Smrg dst.z = src0.z \times src1.z 1101848b8605Smrg 1102848b8605Smrg dst.w = src0.w \times src1.w 1103848b8605Smrg 1104848b8605Smrg 1105848b8605Smrg.. opcode:: IMUL_HI - Signed Integer Multiply High Bits 1106848b8605Smrg 1107848b8605Smrg The high 32bits of the multiplication of 2 signed integers are returned. 1108848b8605Smrg 1109848b8605Smrg.. math:: 1110848b8605Smrg 1111848b8605Smrg dst.x = (src0.x \times src1.x) >> 32 1112848b8605Smrg 1113848b8605Smrg dst.y = (src0.y \times src1.y) >> 32 1114848b8605Smrg 1115848b8605Smrg dst.z = (src0.z \times src1.z) >> 32 1116848b8605Smrg 1117848b8605Smrg dst.w = (src0.w \times src1.w) >> 32 1118848b8605Smrg 1119848b8605Smrg 1120848b8605Smrg.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits 1121848b8605Smrg 1122848b8605Smrg The high 32bits of the multiplication of 2 unsigned integers are returned. 1123848b8605Smrg 1124848b8605Smrg.. math:: 1125848b8605Smrg 1126848b8605Smrg dst.x = (src0.x \times src1.x) >> 32 1127848b8605Smrg 1128848b8605Smrg dst.y = (src0.y \times src1.y) >> 32 1129848b8605Smrg 1130848b8605Smrg dst.z = (src0.z \times src1.z) >> 32 1131848b8605Smrg 1132848b8605Smrg dst.w = (src0.w \times src1.w) >> 32 1133848b8605Smrg 1134848b8605Smrg 1135848b8605Smrg.. opcode:: IDIV - Signed Integer Division 1136848b8605Smrg 1137848b8605Smrg TBD: behavior for division by zero. 1138848b8605Smrg 1139848b8605Smrg.. math:: 1140848b8605Smrg 1141b8e80941Smrg dst.x = \frac{src0.x}{src1.x} 1142848b8605Smrg 1143b8e80941Smrg dst.y = \frac{src0.y}{src1.y} 1144848b8605Smrg 1145b8e80941Smrg dst.z = \frac{src0.z}{src1.z} 1146848b8605Smrg 1147b8e80941Smrg dst.w = \frac{src0.w}{src1.w} 1148848b8605Smrg 1149848b8605Smrg 1150848b8605Smrg.. opcode:: UDIV - Unsigned Integer Division 1151848b8605Smrg 1152848b8605Smrg For division by zero, 0xffffffff is returned. 1153848b8605Smrg 1154848b8605Smrg.. math:: 1155848b8605Smrg 1156b8e80941Smrg dst.x = \frac{src0.x}{src1.x} 1157848b8605Smrg 1158b8e80941Smrg dst.y = \frac{src0.y}{src1.y} 1159848b8605Smrg 1160b8e80941Smrg dst.z = \frac{src0.z}{src1.z} 1161848b8605Smrg 1162b8e80941Smrg dst.w = \frac{src0.w}{src1.w} 1163848b8605Smrg 1164848b8605Smrg 1165848b8605Smrg.. opcode:: UMOD - Unsigned Integer Remainder 1166848b8605Smrg 1167848b8605Smrg If second arg is zero, 0xffffffff is returned. 1168848b8605Smrg 1169848b8605Smrg.. math:: 1170848b8605Smrg 1171b8e80941Smrg dst.x = src0.x \bmod src1.x 1172848b8605Smrg 1173b8e80941Smrg dst.y = src0.y \bmod src1.y 1174848b8605Smrg 1175b8e80941Smrg dst.z = src0.z \bmod src1.z 1176848b8605Smrg 1177b8e80941Smrg dst.w = src0.w \bmod src1.w 1178848b8605Smrg 1179848b8605Smrg 1180848b8605Smrg.. opcode:: NOT - Bitwise Not 1181848b8605Smrg 1182848b8605Smrg.. math:: 1183848b8605Smrg 1184848b8605Smrg dst.x = \sim src.x 1185848b8605Smrg 1186848b8605Smrg dst.y = \sim src.y 1187848b8605Smrg 1188848b8605Smrg dst.z = \sim src.z 1189848b8605Smrg 1190848b8605Smrg dst.w = \sim src.w 1191848b8605Smrg 1192848b8605Smrg 1193848b8605Smrg.. opcode:: AND - Bitwise And 1194848b8605Smrg 1195848b8605Smrg.. math:: 1196848b8605Smrg 1197848b8605Smrg dst.x = src0.x \& src1.x 1198848b8605Smrg 1199848b8605Smrg dst.y = src0.y \& src1.y 1200848b8605Smrg 1201848b8605Smrg dst.z = src0.z \& src1.z 1202848b8605Smrg 1203848b8605Smrg dst.w = src0.w \& src1.w 1204848b8605Smrg 1205848b8605Smrg 1206848b8605Smrg.. opcode:: OR - Bitwise Or 1207848b8605Smrg 1208848b8605Smrg.. math:: 1209848b8605Smrg 1210848b8605Smrg dst.x = src0.x | src1.x 1211848b8605Smrg 1212848b8605Smrg dst.y = src0.y | src1.y 1213848b8605Smrg 1214848b8605Smrg dst.z = src0.z | src1.z 1215848b8605Smrg 1216848b8605Smrg dst.w = src0.w | src1.w 1217848b8605Smrg 1218848b8605Smrg 1219848b8605Smrg.. opcode:: XOR - Bitwise Xor 1220848b8605Smrg 1221848b8605Smrg.. math:: 1222848b8605Smrg 1223848b8605Smrg dst.x = src0.x \oplus src1.x 1224848b8605Smrg 1225848b8605Smrg dst.y = src0.y \oplus src1.y 1226848b8605Smrg 1227848b8605Smrg dst.z = src0.z \oplus src1.z 1228848b8605Smrg 1229848b8605Smrg dst.w = src0.w \oplus src1.w 1230848b8605Smrg 1231848b8605Smrg 1232848b8605Smrg.. opcode:: IMAX - Maximum of Signed Integers 1233848b8605Smrg 1234848b8605Smrg.. math:: 1235848b8605Smrg 1236848b8605Smrg dst.x = max(src0.x, src1.x) 1237848b8605Smrg 1238848b8605Smrg dst.y = max(src0.y, src1.y) 1239848b8605Smrg 1240848b8605Smrg dst.z = max(src0.z, src1.z) 1241848b8605Smrg 1242848b8605Smrg dst.w = max(src0.w, src1.w) 1243848b8605Smrg 1244848b8605Smrg 1245848b8605Smrg.. opcode:: UMAX - Maximum of Unsigned Integers 1246848b8605Smrg 1247848b8605Smrg.. math:: 1248848b8605Smrg 1249848b8605Smrg dst.x = max(src0.x, src1.x) 1250848b8605Smrg 1251848b8605Smrg dst.y = max(src0.y, src1.y) 1252848b8605Smrg 1253848b8605Smrg dst.z = max(src0.z, src1.z) 1254848b8605Smrg 1255848b8605Smrg dst.w = max(src0.w, src1.w) 1256848b8605Smrg 1257848b8605Smrg 1258848b8605Smrg.. opcode:: IMIN - Minimum of Signed Integers 1259848b8605Smrg 1260848b8605Smrg.. math:: 1261848b8605Smrg 1262848b8605Smrg dst.x = min(src0.x, src1.x) 1263848b8605Smrg 1264848b8605Smrg dst.y = min(src0.y, src1.y) 1265848b8605Smrg 1266848b8605Smrg dst.z = min(src0.z, src1.z) 1267848b8605Smrg 1268848b8605Smrg dst.w = min(src0.w, src1.w) 1269848b8605Smrg 1270848b8605Smrg 1271848b8605Smrg.. opcode:: UMIN - Minimum of Unsigned Integers 1272848b8605Smrg 1273848b8605Smrg.. math:: 1274848b8605Smrg 1275848b8605Smrg dst.x = min(src0.x, src1.x) 1276848b8605Smrg 1277848b8605Smrg dst.y = min(src0.y, src1.y) 1278848b8605Smrg 1279848b8605Smrg dst.z = min(src0.z, src1.z) 1280848b8605Smrg 1281848b8605Smrg dst.w = min(src0.w, src1.w) 1282848b8605Smrg 1283848b8605Smrg 1284848b8605Smrg.. opcode:: SHL - Shift Left 1285848b8605Smrg 1286848b8605Smrg The shift count is masked with 0x1f before the shift is applied. 1287848b8605Smrg 1288848b8605Smrg.. math:: 1289848b8605Smrg 1290848b8605Smrg dst.x = src0.x << (0x1f \& src1.x) 1291848b8605Smrg 1292848b8605Smrg dst.y = src0.y << (0x1f \& src1.y) 1293848b8605Smrg 1294848b8605Smrg dst.z = src0.z << (0x1f \& src1.z) 1295848b8605Smrg 1296848b8605Smrg dst.w = src0.w << (0x1f \& src1.w) 1297848b8605Smrg 1298848b8605Smrg 1299848b8605Smrg.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer) 1300848b8605Smrg 1301848b8605Smrg The shift count is masked with 0x1f before the shift is applied. 1302848b8605Smrg 1303848b8605Smrg.. math:: 1304848b8605Smrg 1305848b8605Smrg dst.x = src0.x >> (0x1f \& src1.x) 1306848b8605Smrg 1307848b8605Smrg dst.y = src0.y >> (0x1f \& src1.y) 1308848b8605Smrg 1309848b8605Smrg dst.z = src0.z >> (0x1f \& src1.z) 1310848b8605Smrg 1311848b8605Smrg dst.w = src0.w >> (0x1f \& src1.w) 1312848b8605Smrg 1313848b8605Smrg 1314848b8605Smrg.. opcode:: USHR - Logical Shift Right 1315848b8605Smrg 1316848b8605Smrg The shift count is masked with 0x1f before the shift is applied. 1317848b8605Smrg 1318848b8605Smrg.. math:: 1319848b8605Smrg 1320848b8605Smrg dst.x = src0.x >> (unsigned) (0x1f \& src1.x) 1321848b8605Smrg 1322848b8605Smrg dst.y = src0.y >> (unsigned) (0x1f \& src1.y) 1323848b8605Smrg 1324848b8605Smrg dst.z = src0.z >> (unsigned) (0x1f \& src1.z) 1325848b8605Smrg 1326848b8605Smrg dst.w = src0.w >> (unsigned) (0x1f \& src1.w) 1327848b8605Smrg 1328848b8605Smrg 1329848b8605Smrg.. opcode:: UCMP - Integer Conditional Move 1330848b8605Smrg 1331848b8605Smrg.. math:: 1332848b8605Smrg 1333848b8605Smrg dst.x = src0.x ? src1.x : src2.x 1334848b8605Smrg 1335848b8605Smrg dst.y = src0.y ? src1.y : src2.y 1336848b8605Smrg 1337848b8605Smrg dst.z = src0.z ? src1.z : src2.z 1338848b8605Smrg 1339848b8605Smrg dst.w = src0.w ? src1.w : src2.w 1340848b8605Smrg 1341848b8605Smrg 1342848b8605Smrg 1343848b8605Smrg.. opcode:: ISSG - Integer Set Sign 1344848b8605Smrg 1345848b8605Smrg.. math:: 1346848b8605Smrg 1347848b8605Smrg dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0 1348848b8605Smrg 1349848b8605Smrg dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0 1350848b8605Smrg 1351848b8605Smrg dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0 1352848b8605Smrg 1353848b8605Smrg dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0 1354848b8605Smrg 1355848b8605Smrg 1356848b8605Smrg 1357848b8605Smrg.. opcode:: FSLT - Float Set On Less Than (ordered) 1358848b8605Smrg 1359848b8605Smrg Same comparison as SLT but returns integer instead of 1.0/0.0 float 1360848b8605Smrg 1361848b8605Smrg.. math:: 1362848b8605Smrg 1363848b8605Smrg dst.x = (src0.x < src1.x) ? \sim 0 : 0 1364848b8605Smrg 1365848b8605Smrg dst.y = (src0.y < src1.y) ? \sim 0 : 0 1366848b8605Smrg 1367848b8605Smrg dst.z = (src0.z < src1.z) ? \sim 0 : 0 1368848b8605Smrg 1369848b8605Smrg dst.w = (src0.w < src1.w) ? \sim 0 : 0 1370848b8605Smrg 1371848b8605Smrg 1372848b8605Smrg.. opcode:: ISLT - Signed Integer Set On Less Than 1373848b8605Smrg 1374848b8605Smrg.. math:: 1375848b8605Smrg 1376848b8605Smrg dst.x = (src0.x < src1.x) ? \sim 0 : 0 1377848b8605Smrg 1378848b8605Smrg dst.y = (src0.y < src1.y) ? \sim 0 : 0 1379848b8605Smrg 1380848b8605Smrg dst.z = (src0.z < src1.z) ? \sim 0 : 0 1381848b8605Smrg 1382848b8605Smrg dst.w = (src0.w < src1.w) ? \sim 0 : 0 1383848b8605Smrg 1384848b8605Smrg 1385848b8605Smrg.. opcode:: USLT - Unsigned Integer Set On Less Than 1386848b8605Smrg 1387848b8605Smrg.. math:: 1388848b8605Smrg 1389848b8605Smrg dst.x = (src0.x < src1.x) ? \sim 0 : 0 1390848b8605Smrg 1391848b8605Smrg dst.y = (src0.y < src1.y) ? \sim 0 : 0 1392848b8605Smrg 1393848b8605Smrg dst.z = (src0.z < src1.z) ? \sim 0 : 0 1394848b8605Smrg 1395848b8605Smrg dst.w = (src0.w < src1.w) ? \sim 0 : 0 1396848b8605Smrg 1397848b8605Smrg 1398848b8605Smrg.. opcode:: FSGE - Float Set On Greater Equal Than (ordered) 1399848b8605Smrg 1400848b8605Smrg Same comparison as SGE but returns integer instead of 1.0/0.0 float 1401848b8605Smrg 1402848b8605Smrg.. math:: 1403848b8605Smrg 1404848b8605Smrg dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1405848b8605Smrg 1406848b8605Smrg dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1407848b8605Smrg 1408848b8605Smrg dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1409848b8605Smrg 1410848b8605Smrg dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1411848b8605Smrg 1412848b8605Smrg 1413848b8605Smrg.. opcode:: ISGE - Signed Integer Set On Greater Equal Than 1414848b8605Smrg 1415848b8605Smrg.. math:: 1416848b8605Smrg 1417848b8605Smrg dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1418848b8605Smrg 1419848b8605Smrg dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1420848b8605Smrg 1421848b8605Smrg dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1422848b8605Smrg 1423848b8605Smrg dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1424848b8605Smrg 1425848b8605Smrg 1426848b8605Smrg.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than 1427848b8605Smrg 1428848b8605Smrg.. math:: 1429848b8605Smrg 1430848b8605Smrg dst.x = (src0.x >= src1.x) ? \sim 0 : 0 1431848b8605Smrg 1432848b8605Smrg dst.y = (src0.y >= src1.y) ? \sim 0 : 0 1433848b8605Smrg 1434848b8605Smrg dst.z = (src0.z >= src1.z) ? \sim 0 : 0 1435848b8605Smrg 1436848b8605Smrg dst.w = (src0.w >= src1.w) ? \sim 0 : 0 1437848b8605Smrg 1438848b8605Smrg 1439848b8605Smrg.. opcode:: FSEQ - Float Set On Equal (ordered) 1440848b8605Smrg 1441848b8605Smrg Same comparison as SEQ but returns integer instead of 1.0/0.0 float 1442848b8605Smrg 1443848b8605Smrg.. math:: 1444848b8605Smrg 1445848b8605Smrg dst.x = (src0.x == src1.x) ? \sim 0 : 0 1446848b8605Smrg 1447848b8605Smrg dst.y = (src0.y == src1.y) ? \sim 0 : 0 1448848b8605Smrg 1449848b8605Smrg dst.z = (src0.z == src1.z) ? \sim 0 : 0 1450848b8605Smrg 1451848b8605Smrg dst.w = (src0.w == src1.w) ? \sim 0 : 0 1452848b8605Smrg 1453848b8605Smrg 1454848b8605Smrg.. opcode:: USEQ - Integer Set On Equal 1455848b8605Smrg 1456848b8605Smrg.. math:: 1457848b8605Smrg 1458848b8605Smrg dst.x = (src0.x == src1.x) ? \sim 0 : 0 1459848b8605Smrg 1460848b8605Smrg dst.y = (src0.y == src1.y) ? \sim 0 : 0 1461848b8605Smrg 1462848b8605Smrg dst.z = (src0.z == src1.z) ? \sim 0 : 0 1463848b8605Smrg 1464848b8605Smrg dst.w = (src0.w == src1.w) ? \sim 0 : 0 1465848b8605Smrg 1466848b8605Smrg 1467848b8605Smrg.. opcode:: FSNE - Float Set On Not Equal (unordered) 1468848b8605Smrg 1469848b8605Smrg Same comparison as SNE but returns integer instead of 1.0/0.0 float 1470848b8605Smrg 1471848b8605Smrg.. math:: 1472848b8605Smrg 1473848b8605Smrg dst.x = (src0.x != src1.x) ? \sim 0 : 0 1474848b8605Smrg 1475848b8605Smrg dst.y = (src0.y != src1.y) ? \sim 0 : 0 1476848b8605Smrg 1477848b8605Smrg dst.z = (src0.z != src1.z) ? \sim 0 : 0 1478848b8605Smrg 1479848b8605Smrg dst.w = (src0.w != src1.w) ? \sim 0 : 0 1480848b8605Smrg 1481848b8605Smrg 1482848b8605Smrg.. opcode:: USNE - Integer Set On Not Equal 1483848b8605Smrg 1484848b8605Smrg.. math:: 1485848b8605Smrg 1486848b8605Smrg dst.x = (src0.x != src1.x) ? \sim 0 : 0 1487848b8605Smrg 1488848b8605Smrg dst.y = (src0.y != src1.y) ? \sim 0 : 0 1489848b8605Smrg 1490848b8605Smrg dst.z = (src0.z != src1.z) ? \sim 0 : 0 1491848b8605Smrg 1492848b8605Smrg dst.w = (src0.w != src1.w) ? \sim 0 : 0 1493848b8605Smrg 1494848b8605Smrg 1495848b8605Smrg.. opcode:: INEG - Integer Negate 1496848b8605Smrg 1497848b8605Smrg Two's complement. 1498848b8605Smrg 1499848b8605Smrg.. math:: 1500848b8605Smrg 1501848b8605Smrg dst.x = -src.x 1502848b8605Smrg 1503848b8605Smrg dst.y = -src.y 1504848b8605Smrg 1505848b8605Smrg dst.z = -src.z 1506848b8605Smrg 1507848b8605Smrg dst.w = -src.w 1508848b8605Smrg 1509848b8605Smrg 1510848b8605Smrg.. opcode:: IABS - Integer Absolute Value 1511848b8605Smrg 1512848b8605Smrg.. math:: 1513848b8605Smrg 1514848b8605Smrg dst.x = |src.x| 1515848b8605Smrg 1516848b8605Smrg dst.y = |src.y| 1517848b8605Smrg 1518848b8605Smrg dst.z = |src.z| 1519848b8605Smrg 1520848b8605Smrg dst.w = |src.w| 1521848b8605Smrg 1522848b8605SmrgBitwise ISA 1523848b8605Smrg^^^^^^^^^^^ 1524848b8605SmrgThese opcodes are used for bit-level manipulation of integers. 1525848b8605Smrg 1526848b8605Smrg.. opcode:: IBFE - Signed Bitfield Extract 1527848b8605Smrg 1528b8e80941Smrg Like GLSL bitfieldExtract. Extracts a set of bits from the input, and 1529b8e80941Smrg sign-extends them if the high bit of the extracted window is set. 1530848b8605Smrg 1531848b8605Smrg Pseudocode:: 1532848b8605Smrg 1533848b8605Smrg def ibfe(value, offset, bits): 1534b8e80941Smrg if offset < 0 or bits < 0 or offset + bits > 32: 1535b8e80941Smrg return undefined 1536848b8605Smrg if bits == 0: return 0 1537848b8605Smrg # Note: >> sign-extends 1538b8e80941Smrg return (value << (32 - offset - bits)) >> (32 - bits) 1539848b8605Smrg 1540848b8605Smrg.. opcode:: UBFE - Unsigned Bitfield Extract 1541848b8605Smrg 1542b8e80941Smrg Like GLSL bitfieldExtract. Extracts a set of bits from the input, without 1543b8e80941Smrg any sign-extension. 1544848b8605Smrg 1545848b8605Smrg Pseudocode:: 1546848b8605Smrg 1547848b8605Smrg def ubfe(value, offset, bits): 1548b8e80941Smrg if offset < 0 or bits < 0 or offset + bits > 32: 1549b8e80941Smrg return undefined 1550848b8605Smrg if bits == 0: return 0 1551848b8605Smrg # Note: >> does not sign-extend 1552b8e80941Smrg return (value << (32 - offset - bits)) >> (32 - bits) 1553848b8605Smrg 1554848b8605Smrg.. opcode:: BFI - Bitfield Insert 1555848b8605Smrg 1556b8e80941Smrg Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits 1557b8e80941Smrg of 'insert'. 1558848b8605Smrg 1559848b8605Smrg Pseudocode:: 1560848b8605Smrg 1561848b8605Smrg def bfi(base, insert, offset, bits): 1562b8e80941Smrg if offset < 0 or bits < 0 or offset + bits > 32: 1563b8e80941Smrg return undefined 1564b8e80941Smrg # << defined such that mask == ~0 when bits == 32, offset == 0 1565848b8605Smrg mask = ((1 << bits) - 1) << offset 1566848b8605Smrg return ((insert << offset) & mask) | (base & ~mask) 1567848b8605Smrg 1568848b8605Smrg.. opcode:: BREV - Bitfield Reverse 1569848b8605Smrg 1570848b8605Smrg See SM5 instruction BFREV. Reverses the bits of the argument. 1571848b8605Smrg 1572848b8605Smrg.. opcode:: POPC - Population Count 1573848b8605Smrg 1574848b8605Smrg See SM5 instruction COUNTBITS. Counts the number of set bits in the argument. 1575848b8605Smrg 1576848b8605Smrg.. opcode:: LSB - Index of lowest set bit 1577848b8605Smrg 1578848b8605Smrg See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set 1579848b8605Smrg bit of the argument. Returns -1 if none are set. 1580848b8605Smrg 1581848b8605Smrg.. opcode:: IMSB - Index of highest non-sign bit 1582848b8605Smrg 1583848b8605Smrg See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest 1584848b8605Smrg non-sign bit of the argument (i.e. highest 0 bit for negative numbers, 1585848b8605Smrg highest 1 bit for positive numbers). Returns -1 if all bits are the same 1586848b8605Smrg (i.e. for inputs 0 and -1). 1587848b8605Smrg 1588848b8605Smrg.. opcode:: UMSB - Index of highest set bit 1589848b8605Smrg 1590848b8605Smrg See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest 1591848b8605Smrg set bit of the argument. Returns -1 if none are set. 1592848b8605Smrg 1593848b8605SmrgGeometry ISA 1594848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1595848b8605Smrg 1596848b8605SmrgThese opcodes are only supported in geometry shaders; they have no meaning 1597848b8605Smrgin any other type of shader. 1598848b8605Smrg 1599848b8605Smrg.. opcode:: EMIT - Emit 1600848b8605Smrg 1601848b8605Smrg Generate a new vertex for the current primitive into the specified vertex 1602848b8605Smrg stream using the values in the output registers. 1603848b8605Smrg 1604848b8605Smrg 1605848b8605Smrg.. opcode:: ENDPRIM - End Primitive 1606848b8605Smrg 1607848b8605Smrg Complete the current primitive in the specified vertex stream (consisting of 1608848b8605Smrg the emitted vertices), and start a new one. 1609848b8605Smrg 1610848b8605Smrg 1611848b8605SmrgGLSL ISA 1612848b8605Smrg^^^^^^^^^^ 1613848b8605Smrg 1614848b8605SmrgThese opcodes are part of :term:`GLSL`'s opcode set. Support for these 1615848b8605Smrgopcodes is determined by a special capability bit, ``GLSL``. 1616b8e80941SmrgSome require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH). 1617848b8605Smrg 1618848b8605Smrg.. opcode:: CAL - Subroutine Call 1619848b8605Smrg 1620848b8605Smrg push(pc) 1621848b8605Smrg pc = target 1622848b8605Smrg 1623848b8605Smrg 1624848b8605Smrg.. opcode:: RET - Subroutine Call Return 1625848b8605Smrg 1626848b8605Smrg pc = pop() 1627848b8605Smrg 1628848b8605Smrg 1629848b8605Smrg.. opcode:: CONT - Continue 1630848b8605Smrg 1631848b8605Smrg Unconditionally moves the point of execution to the instruction after the 1632848b8605Smrg last bgnloop. The instruction must appear within a bgnloop/endloop. 1633848b8605Smrg 1634848b8605Smrg.. note:: 1635848b8605Smrg 1636848b8605Smrg Support for CONT is determined by a special capability bit, 1637848b8605Smrg ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information. 1638848b8605Smrg 1639848b8605Smrg 1640848b8605Smrg.. opcode:: BGNLOOP - Begin a Loop 1641848b8605Smrg 1642848b8605Smrg Start a loop. Must have a matching endloop. 1643848b8605Smrg 1644848b8605Smrg 1645848b8605Smrg.. opcode:: BGNSUB - Begin Subroutine 1646848b8605Smrg 1647848b8605Smrg Starts definition of a subroutine. Must have a matching endsub. 1648848b8605Smrg 1649848b8605Smrg 1650848b8605Smrg.. opcode:: ENDLOOP - End a Loop 1651848b8605Smrg 1652848b8605Smrg End a loop started with bgnloop. 1653848b8605Smrg 1654848b8605Smrg 1655848b8605Smrg.. opcode:: ENDSUB - End Subroutine 1656848b8605Smrg 1657848b8605Smrg Ends definition of a subroutine. 1658848b8605Smrg 1659848b8605Smrg 1660848b8605Smrg.. opcode:: NOP - No Operation 1661848b8605Smrg 1662848b8605Smrg Do nothing. 1663848b8605Smrg 1664848b8605Smrg 1665848b8605Smrg.. opcode:: BRK - Break 1666848b8605Smrg 1667848b8605Smrg Unconditionally moves the point of execution to the instruction after the 1668848b8605Smrg next endloop or endswitch. The instruction must appear within a loop/endloop 1669848b8605Smrg or switch/endswitch. 1670848b8605Smrg 1671848b8605Smrg 1672848b8605Smrg.. opcode:: IF - Float If 1673848b8605Smrg 1674848b8605Smrg Start an IF ... ELSE .. ENDIF block. Condition evaluates to true if 1675848b8605Smrg 1676848b8605Smrg src0.x != 0.0 1677848b8605Smrg 1678848b8605Smrg where src0.x is interpreted as a floating point register. 1679848b8605Smrg 1680848b8605Smrg 1681848b8605Smrg.. opcode:: UIF - Bitwise If 1682848b8605Smrg 1683848b8605Smrg Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if 1684848b8605Smrg 1685848b8605Smrg src0.x != 0 1686848b8605Smrg 1687848b8605Smrg where src0.x is interpreted as an integer register. 1688848b8605Smrg 1689848b8605Smrg 1690848b8605Smrg.. opcode:: ELSE - Else 1691848b8605Smrg 1692848b8605Smrg Starts an else block, after an IF or UIF statement. 1693848b8605Smrg 1694848b8605Smrg 1695848b8605Smrg.. opcode:: ENDIF - End If 1696848b8605Smrg 1697848b8605Smrg Ends an IF or UIF block. 1698848b8605Smrg 1699848b8605Smrg 1700848b8605Smrg.. opcode:: SWITCH - Switch 1701848b8605Smrg 1702848b8605Smrg Starts a C-style switch expression. The switch consists of one or multiple 1703848b8605Smrg CASE statements, and at most one DEFAULT statement. Execution of a statement 1704848b8605Smrg ends when a BRK is hit, but just like in C falling through to other cases 1705848b8605Smrg without a break is allowed. Similarly, DEFAULT label is allowed anywhere not 1706848b8605Smrg just as last statement, and fallthrough is allowed into/from it. 1707848b8605Smrg CASE src arguments are evaluated at bit level against the SWITCH src argument. 1708848b8605Smrg 1709848b8605Smrg Example:: 1710848b8605Smrg 1711848b8605Smrg SWITCH src[0].x 1712848b8605Smrg CASE src[0].x 1713848b8605Smrg (some instructions here) 1714848b8605Smrg (optional BRK here) 1715848b8605Smrg DEFAULT 1716848b8605Smrg (some instructions here) 1717848b8605Smrg (optional BRK here) 1718848b8605Smrg CASE src[0].x 1719848b8605Smrg (some instructions here) 1720848b8605Smrg (optional BRK here) 1721848b8605Smrg ENDSWITCH 1722848b8605Smrg 1723848b8605Smrg 1724848b8605Smrg.. opcode:: CASE - Switch case 1725848b8605Smrg 1726848b8605Smrg This represents a switch case label. The src arg must be an integer immediate. 1727848b8605Smrg 1728848b8605Smrg 1729848b8605Smrg.. opcode:: DEFAULT - Switch default 1730848b8605Smrg 1731848b8605Smrg This represents the default case in the switch, which is taken if no other 1732848b8605Smrg case matches. 1733848b8605Smrg 1734848b8605Smrg 1735848b8605Smrg.. opcode:: ENDSWITCH - End of switch 1736848b8605Smrg 1737848b8605Smrg Ends a switch expression. 1738848b8605Smrg 1739848b8605Smrg 1740848b8605SmrgInterpolation ISA 1741848b8605Smrg^^^^^^^^^^^^^^^^^ 1742848b8605Smrg 1743848b8605SmrgThe interpolation instructions allow an input to be interpolated in a 1744848b8605Smrgdifferent way than its declaration. This corresponds to the GLSL 4.00 1745848b8605SmrginterpolateAt* functions. The first argument of each of these must come from 1746848b8605Smrg``TGSI_FILE_INPUT``. 1747848b8605Smrg 1748848b8605Smrg.. opcode:: INTERP_CENTROID - Interpolate at the centroid 1749848b8605Smrg 1750848b8605Smrg Interpolates the varying specified by src0 at the centroid 1751848b8605Smrg 1752848b8605Smrg.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample 1753848b8605Smrg 1754848b8605Smrg Interpolates the varying specified by src0 at the sample id specified by 1755848b8605Smrg src1.x (interpreted as an integer) 1756848b8605Smrg 1757848b8605Smrg.. opcode:: INTERP_OFFSET - Interpolate at the specified offset 1758848b8605Smrg 1759848b8605Smrg Interpolates the varying specified by src0 at the offset src1.xy from the 1760848b8605Smrg pixel center (interpreted as floats) 1761848b8605Smrg 1762848b8605Smrg 1763848b8605Smrg.. _doubleopcodes: 1764848b8605Smrg 1765848b8605SmrgDouble ISA 1766848b8605Smrg^^^^^^^^^^^^^^^ 1767848b8605Smrg 1768848b8605SmrgThe double-precision opcodes reinterpret four-component vectors into 1769848b8605Smrgtwo-component vectors with doubled precision in each component. 1770848b8605Smrg 1771b8e80941Smrg.. opcode:: DABS - Absolute 1772b8e80941Smrg 1773b8e80941Smrg.. math:: 1774b8e80941Smrg 1775b8e80941Smrg dst.xy = |src0.xy| 1776b8e80941Smrg 1777b8e80941Smrg dst.zw = |src0.zw| 1778848b8605Smrg 1779848b8605Smrg.. opcode:: DADD - Add 1780848b8605Smrg 1781848b8605Smrg.. math:: 1782848b8605Smrg 1783848b8605Smrg dst.xy = src0.xy + src1.xy 1784848b8605Smrg 1785848b8605Smrg dst.zw = src0.zw + src1.zw 1786848b8605Smrg 1787b8e80941Smrg.. opcode:: DSEQ - Set on Equal 1788848b8605Smrg 1789848b8605Smrg.. math:: 1790848b8605Smrg 1791b8e80941Smrg dst.x = src0.xy == src1.xy ? \sim 0 : 0 1792848b8605Smrg 1793b8e80941Smrg dst.z = src0.zw == src1.zw ? \sim 0 : 0 1794848b8605Smrg 1795b8e80941Smrg.. opcode:: DSNE - Set on Not Equal 1796848b8605Smrg 1797848b8605Smrg.. math:: 1798848b8605Smrg 1799b8e80941Smrg dst.x = src0.xy != src1.xy ? \sim 0 : 0 1800848b8605Smrg 1801b8e80941Smrg dst.z = src0.zw != src1.zw ? \sim 0 : 0 1802848b8605Smrg 1803848b8605Smrg.. opcode:: DSLT - Set on Less than 1804848b8605Smrg 1805848b8605Smrg.. math:: 1806848b8605Smrg 1807b8e80941Smrg dst.x = src0.xy < src1.xy ? \sim 0 : 0 1808b8e80941Smrg 1809b8e80941Smrg dst.z = src0.zw < src1.zw ? \sim 0 : 0 1810b8e80941Smrg 1811b8e80941Smrg.. opcode:: DSGE - Set on Greater equal 1812b8e80941Smrg 1813b8e80941Smrg.. math:: 1814b8e80941Smrg 1815b8e80941Smrg dst.x = src0.xy >= src1.xy ? \sim 0 : 0 1816848b8605Smrg 1817b8e80941Smrg dst.z = src0.zw >= src1.zw ? \sim 0 : 0 1818848b8605Smrg 1819848b8605Smrg.. opcode:: DFRAC - Fraction 1820848b8605Smrg 1821848b8605Smrg.. math:: 1822848b8605Smrg 1823848b8605Smrg dst.xy = src.xy - \lfloor src.xy\rfloor 1824848b8605Smrg 1825848b8605Smrg dst.zw = src.zw - \lfloor src.zw\rfloor 1826848b8605Smrg 1827b8e80941Smrg.. opcode:: DTRUNC - Truncate 1828b8e80941Smrg 1829b8e80941Smrg.. math:: 1830b8e80941Smrg 1831b8e80941Smrg dst.xy = trunc(src.xy) 1832b8e80941Smrg 1833b8e80941Smrg dst.zw = trunc(src.zw) 1834b8e80941Smrg 1835b8e80941Smrg.. opcode:: DCEIL - Ceiling 1836b8e80941Smrg 1837b8e80941Smrg.. math:: 1838b8e80941Smrg 1839b8e80941Smrg dst.xy = \lceil src.xy\rceil 1840b8e80941Smrg 1841b8e80941Smrg dst.zw = \lceil src.zw\rceil 1842b8e80941Smrg 1843b8e80941Smrg.. opcode:: DFLR - Floor 1844b8e80941Smrg 1845b8e80941Smrg.. math:: 1846b8e80941Smrg 1847b8e80941Smrg dst.xy = \lfloor src.xy\rfloor 1848b8e80941Smrg 1849b8e80941Smrg dst.zw = \lfloor src.zw\rfloor 1850b8e80941Smrg 1851b8e80941Smrg.. opcode:: DROUND - Fraction 1852b8e80941Smrg 1853b8e80941Smrg.. math:: 1854b8e80941Smrg 1855b8e80941Smrg dst.xy = round(src.xy) 1856b8e80941Smrg 1857b8e80941Smrg dst.zw = round(src.zw) 1858b8e80941Smrg 1859b8e80941Smrg.. opcode:: DSSG - Set Sign 1860b8e80941Smrg 1861b8e80941Smrg.. math:: 1862b8e80941Smrg 1863b8e80941Smrg dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0 1864b8e80941Smrg 1865b8e80941Smrg dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0 1866848b8605Smrg 1867848b8605Smrg.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components 1868848b8605Smrg 1869848b8605SmrgLike the ``frexp()`` routine in many math libraries, this opcode stores the 1870848b8605Smrgexponent of its source to ``dst0``, and the significand to ``dst1``, such that 1871b8e80941Smrg:math:`dst1 \times 2^{dst0} = src` . The results are replicated across 1872b8e80941Smrgchannels. 1873848b8605Smrg 1874848b8605Smrg.. math:: 1875848b8605Smrg 1876b8e80941Smrg dst0.xy = dst.zw = frac(src.xy) 1877848b8605Smrg 1878b8e80941Smrg dst1 = frac(src.xy) 1879848b8605Smrg 1880848b8605Smrg 1881848b8605Smrg.. opcode:: DLDEXP - Multiply Number by Integral Power of 2 1882848b8605Smrg 1883b8e80941SmrgThis opcode is the inverse of :opcode:`DFRACEXP`. The second 1884b8e80941Smrgsource is an integer. 1885848b8605Smrg 1886848b8605Smrg.. math:: 1887848b8605Smrg 1888b8e80941Smrg dst.xy = src0.xy \times 2^{src1.x} 1889848b8605Smrg 1890b8e80941Smrg dst.zw = src0.zw \times 2^{src1.z} 1891848b8605Smrg 1892848b8605Smrg.. opcode:: DMIN - Minimum 1893848b8605Smrg 1894848b8605Smrg.. math:: 1895848b8605Smrg 1896848b8605Smrg dst.xy = min(src0.xy, src1.xy) 1897848b8605Smrg 1898848b8605Smrg dst.zw = min(src0.zw, src1.zw) 1899848b8605Smrg 1900848b8605Smrg.. opcode:: DMAX - Maximum 1901848b8605Smrg 1902848b8605Smrg.. math:: 1903848b8605Smrg 1904848b8605Smrg dst.xy = max(src0.xy, src1.xy) 1905848b8605Smrg 1906848b8605Smrg dst.zw = max(src0.zw, src1.zw) 1907848b8605Smrg 1908848b8605Smrg.. opcode:: DMUL - Multiply 1909848b8605Smrg 1910848b8605Smrg.. math:: 1911848b8605Smrg 1912848b8605Smrg dst.xy = src0.xy \times src1.xy 1913848b8605Smrg 1914848b8605Smrg dst.zw = src0.zw \times src1.zw 1915848b8605Smrg 1916848b8605Smrg 1917848b8605Smrg.. opcode:: DMAD - Multiply And Add 1918848b8605Smrg 1919848b8605Smrg.. math:: 1920848b8605Smrg 1921848b8605Smrg dst.xy = src0.xy \times src1.xy + src2.xy 1922848b8605Smrg 1923848b8605Smrg dst.zw = src0.zw \times src1.zw + src2.zw 1924848b8605Smrg 1925848b8605Smrg 1926b8e80941Smrg.. opcode:: DFMA - Fused Multiply-Add 1927b8e80941Smrg 1928b8e80941SmrgPerform a * b + c with no intermediate rounding step. 1929b8e80941Smrg 1930b8e80941Smrg.. math:: 1931b8e80941Smrg 1932b8e80941Smrg dst.xy = src0.xy \times src1.xy + src2.xy 1933b8e80941Smrg 1934b8e80941Smrg dst.zw = src0.zw \times src1.zw + src2.zw 1935b8e80941Smrg 1936b8e80941Smrg 1937b8e80941Smrg.. opcode:: DDIV - Divide 1938b8e80941Smrg 1939b8e80941Smrg.. math:: 1940b8e80941Smrg 1941b8e80941Smrg dst.xy = \frac{src0.xy}{src1.xy} 1942b8e80941Smrg 1943b8e80941Smrg dst.zw = \frac{src0.zw}{src1.zw} 1944b8e80941Smrg 1945b8e80941Smrg 1946848b8605Smrg.. opcode:: DRCP - Reciprocal 1947848b8605Smrg 1948848b8605Smrg.. math:: 1949848b8605Smrg 1950848b8605Smrg dst.xy = \frac{1}{src.xy} 1951848b8605Smrg 1952848b8605Smrg dst.zw = \frac{1}{src.zw} 1953848b8605Smrg 1954848b8605Smrg.. opcode:: DSQRT - Square Root 1955848b8605Smrg 1956848b8605Smrg.. math:: 1957848b8605Smrg 1958848b8605Smrg dst.xy = \sqrt{src.xy} 1959848b8605Smrg 1960848b8605Smrg dst.zw = \sqrt{src.zw} 1961848b8605Smrg 1962b8e80941Smrg.. opcode:: DRSQ - Reciprocal Square Root 1963b8e80941Smrg 1964b8e80941Smrg.. math:: 1965b8e80941Smrg 1966b8e80941Smrg dst.xy = \frac{1}{\sqrt{src.xy}} 1967b8e80941Smrg 1968b8e80941Smrg dst.zw = \frac{1}{\sqrt{src.zw}} 1969b8e80941Smrg 1970b8e80941Smrg.. opcode:: F2D - Float to Double 1971b8e80941Smrg 1972b8e80941Smrg.. math:: 1973b8e80941Smrg 1974b8e80941Smrg dst.xy = double(src0.x) 1975b8e80941Smrg 1976b8e80941Smrg dst.zw = double(src0.y) 1977b8e80941Smrg 1978b8e80941Smrg.. opcode:: D2F - Double to Float 1979b8e80941Smrg 1980b8e80941Smrg.. math:: 1981b8e80941Smrg 1982b8e80941Smrg dst.x = float(src0.xy) 1983b8e80941Smrg 1984b8e80941Smrg dst.y = float(src0.zw) 1985b8e80941Smrg 1986b8e80941Smrg.. opcode:: I2D - Int to Double 1987b8e80941Smrg 1988b8e80941Smrg.. math:: 1989b8e80941Smrg 1990b8e80941Smrg dst.xy = double(src0.x) 1991b8e80941Smrg 1992b8e80941Smrg dst.zw = double(src0.y) 1993b8e80941Smrg 1994b8e80941Smrg.. opcode:: D2I - Double to Int 1995b8e80941Smrg 1996b8e80941Smrg.. math:: 1997b8e80941Smrg 1998b8e80941Smrg dst.x = int(src0.xy) 1999b8e80941Smrg 2000b8e80941Smrg dst.y = int(src0.zw) 2001b8e80941Smrg 2002b8e80941Smrg.. opcode:: U2D - Unsigned Int to Double 2003b8e80941Smrg 2004b8e80941Smrg.. math:: 2005b8e80941Smrg 2006b8e80941Smrg dst.xy = double(src0.x) 2007b8e80941Smrg 2008b8e80941Smrg dst.zw = double(src0.y) 2009b8e80941Smrg 2010b8e80941Smrg.. opcode:: D2U - Double to Unsigned Int 2011b8e80941Smrg 2012b8e80941Smrg.. math:: 2013b8e80941Smrg 2014b8e80941Smrg dst.x = unsigned(src0.xy) 2015b8e80941Smrg 2016b8e80941Smrg dst.y = unsigned(src0.zw) 2017b8e80941Smrg 2018b8e80941Smrg64-bit Integer ISA 2019b8e80941Smrg^^^^^^^^^^^^^^^^^^ 2020b8e80941Smrg 2021b8e80941SmrgThe 64-bit integer opcodes reinterpret four-component vectors into 2022b8e80941Smrgtwo-component vectors with 64-bits in each component. 2023b8e80941Smrg 2024b8e80941Smrg.. opcode:: I64ABS - 64-bit Integer Absolute Value 2025b8e80941Smrg 2026b8e80941Smrg.. math:: 2027b8e80941Smrg 2028b8e80941Smrg dst.xy = |src0.xy| 2029b8e80941Smrg 2030b8e80941Smrg dst.zw = |src0.zw| 2031b8e80941Smrg 2032b8e80941Smrg.. opcode:: I64NEG - 64-bit Integer Negate 2033b8e80941Smrg 2034b8e80941Smrg Two's complement. 2035b8e80941Smrg 2036b8e80941Smrg.. math:: 2037b8e80941Smrg 2038b8e80941Smrg dst.xy = -src.xy 2039b8e80941Smrg 2040b8e80941Smrg dst.zw = -src.zw 2041b8e80941Smrg 2042b8e80941Smrg.. opcode:: I64SSG - 64-bit Integer Set Sign 2043b8e80941Smrg 2044b8e80941Smrg.. math:: 2045b8e80941Smrg 2046b8e80941Smrg dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0 2047b8e80941Smrg 2048b8e80941Smrg dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0 2049b8e80941Smrg 2050b8e80941Smrg.. opcode:: U64ADD - 64-bit Integer Add 2051b8e80941Smrg 2052b8e80941Smrg.. math:: 2053b8e80941Smrg 2054b8e80941Smrg dst.xy = src0.xy + src1.xy 2055b8e80941Smrg 2056b8e80941Smrg dst.zw = src0.zw + src1.zw 2057b8e80941Smrg 2058b8e80941Smrg.. opcode:: U64MUL - 64-bit Integer Multiply 2059b8e80941Smrg 2060b8e80941Smrg.. math:: 2061b8e80941Smrg 2062b8e80941Smrg dst.xy = src0.xy * src1.xy 2063b8e80941Smrg 2064b8e80941Smrg dst.zw = src0.zw * src1.zw 2065b8e80941Smrg 2066b8e80941Smrg.. opcode:: U64SEQ - 64-bit Integer Set on Equal 2067b8e80941Smrg 2068b8e80941Smrg.. math:: 2069b8e80941Smrg 2070b8e80941Smrg dst.x = src0.xy == src1.xy ? \sim 0 : 0 2071b8e80941Smrg 2072b8e80941Smrg dst.z = src0.zw == src1.zw ? \sim 0 : 0 2073b8e80941Smrg 2074b8e80941Smrg.. opcode:: U64SNE - 64-bit Integer Set on Not Equal 2075b8e80941Smrg 2076b8e80941Smrg.. math:: 2077b8e80941Smrg 2078b8e80941Smrg dst.x = src0.xy != src1.xy ? \sim 0 : 0 2079b8e80941Smrg 2080b8e80941Smrg dst.z = src0.zw != src1.zw ? \sim 0 : 0 2081b8e80941Smrg 2082b8e80941Smrg.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than 2083b8e80941Smrg 2084b8e80941Smrg.. math:: 2085b8e80941Smrg 2086b8e80941Smrg dst.x = src0.xy < src1.xy ? \sim 0 : 0 2087b8e80941Smrg 2088b8e80941Smrg dst.z = src0.zw < src1.zw ? \sim 0 : 0 2089b8e80941Smrg 2090b8e80941Smrg.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal 2091b8e80941Smrg 2092b8e80941Smrg.. math:: 2093b8e80941Smrg 2094b8e80941Smrg dst.x = src0.xy >= src1.xy ? \sim 0 : 0 2095b8e80941Smrg 2096b8e80941Smrg dst.z = src0.zw >= src1.zw ? \sim 0 : 0 2097b8e80941Smrg 2098b8e80941Smrg.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than 2099b8e80941Smrg 2100b8e80941Smrg.. math:: 2101b8e80941Smrg 2102b8e80941Smrg dst.x = src0.xy < src1.xy ? \sim 0 : 0 2103b8e80941Smrg 2104b8e80941Smrg dst.z = src0.zw < src1.zw ? \sim 0 : 0 2105b8e80941Smrg 2106b8e80941Smrg.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal 2107b8e80941Smrg 2108b8e80941Smrg.. math:: 2109b8e80941Smrg 2110b8e80941Smrg dst.x = src0.xy >= src1.xy ? \sim 0 : 0 2111b8e80941Smrg 2112b8e80941Smrg dst.z = src0.zw >= src1.zw ? \sim 0 : 0 2113b8e80941Smrg 2114b8e80941Smrg.. opcode:: I64MIN - Minimum of 64-bit Signed Integers 2115b8e80941Smrg 2116b8e80941Smrg.. math:: 2117b8e80941Smrg 2118b8e80941Smrg dst.xy = min(src0.xy, src1.xy) 2119b8e80941Smrg 2120b8e80941Smrg dst.zw = min(src0.zw, src1.zw) 2121b8e80941Smrg 2122b8e80941Smrg.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers 2123b8e80941Smrg 2124b8e80941Smrg.. math:: 2125b8e80941Smrg 2126b8e80941Smrg dst.xy = min(src0.xy, src1.xy) 2127b8e80941Smrg 2128b8e80941Smrg dst.zw = min(src0.zw, src1.zw) 2129b8e80941Smrg 2130b8e80941Smrg.. opcode:: I64MAX - Maximum of 64-bit Signed Integers 2131b8e80941Smrg 2132b8e80941Smrg.. math:: 2133b8e80941Smrg 2134b8e80941Smrg dst.xy = max(src0.xy, src1.xy) 2135b8e80941Smrg 2136b8e80941Smrg dst.zw = max(src0.zw, src1.zw) 2137b8e80941Smrg 2138b8e80941Smrg.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers 2139b8e80941Smrg 2140b8e80941Smrg.. math:: 2141b8e80941Smrg 2142b8e80941Smrg dst.xy = max(src0.xy, src1.xy) 2143b8e80941Smrg 2144b8e80941Smrg dst.zw = max(src0.zw, src1.zw) 2145b8e80941Smrg 2146b8e80941Smrg.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer 2147b8e80941Smrg 2148b8e80941Smrg The shift count is masked with 0x3f before the shift is applied. 2149b8e80941Smrg 2150b8e80941Smrg.. math:: 2151b8e80941Smrg 2152b8e80941Smrg dst.xy = src0.xy << (0x3f \& src1.x) 2153b8e80941Smrg 2154b8e80941Smrg dst.zw = src0.zw << (0x3f \& src1.y) 2155b8e80941Smrg 2156b8e80941Smrg.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer) 2157b8e80941Smrg 2158b8e80941Smrg The shift count is masked with 0x3f before the shift is applied. 2159b8e80941Smrg 2160b8e80941Smrg.. math:: 2161b8e80941Smrg 2162b8e80941Smrg dst.xy = src0.xy >> (0x3f \& src1.x) 2163b8e80941Smrg 2164b8e80941Smrg dst.zw = src0.zw >> (0x3f \& src1.y) 2165b8e80941Smrg 2166b8e80941Smrg.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer) 2167b8e80941Smrg 2168b8e80941Smrg The shift count is masked with 0x3f before the shift is applied. 2169b8e80941Smrg 2170b8e80941Smrg.. math:: 2171b8e80941Smrg 2172b8e80941Smrg dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x) 2173b8e80941Smrg 2174b8e80941Smrg dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y) 2175b8e80941Smrg 2176b8e80941Smrg.. opcode:: I64DIV - 64-bit Signed Integer Division 2177b8e80941Smrg 2178b8e80941Smrg.. math:: 2179b8e80941Smrg 2180b8e80941Smrg dst.xy = \frac{src0.xy}{src1.xy} 2181b8e80941Smrg 2182b8e80941Smrg dst.zw = \frac{src0.zw}{src1.zw} 2183b8e80941Smrg 2184b8e80941Smrg.. opcode:: U64DIV - 64-bit Unsigned Integer Division 2185b8e80941Smrg 2186b8e80941Smrg.. math:: 2187b8e80941Smrg 2188b8e80941Smrg dst.xy = \frac{src0.xy}{src1.xy} 2189b8e80941Smrg 2190b8e80941Smrg dst.zw = \frac{src0.zw}{src1.zw} 2191b8e80941Smrg 2192b8e80941Smrg.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder 2193b8e80941Smrg 2194b8e80941Smrg.. math:: 2195b8e80941Smrg 2196b8e80941Smrg dst.xy = src0.xy \bmod src1.xy 2197b8e80941Smrg 2198b8e80941Smrg dst.zw = src0.zw \bmod src1.zw 2199b8e80941Smrg 2200b8e80941Smrg.. opcode:: I64MOD - 64-bit Signed Integer Remainder 2201b8e80941Smrg 2202b8e80941Smrg.. math:: 2203b8e80941Smrg 2204b8e80941Smrg dst.xy = src0.xy \bmod src1.xy 2205b8e80941Smrg 2206b8e80941Smrg dst.zw = src0.zw \bmod src1.zw 2207b8e80941Smrg 2208b8e80941Smrg.. opcode:: F2U64 - Float to 64-bit Unsigned Int 2209b8e80941Smrg 2210b8e80941Smrg.. math:: 2211b8e80941Smrg 2212b8e80941Smrg dst.xy = (uint64_t) src0.x 2213b8e80941Smrg 2214b8e80941Smrg dst.zw = (uint64_t) src0.y 2215b8e80941Smrg 2216b8e80941Smrg.. opcode:: F2I64 - Float to 64-bit Int 2217b8e80941Smrg 2218b8e80941Smrg.. math:: 2219b8e80941Smrg 2220b8e80941Smrg dst.xy = (int64_t) src0.x 2221b8e80941Smrg 2222b8e80941Smrg dst.zw = (int64_t) src0.y 2223b8e80941Smrg 2224b8e80941Smrg.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer 2225b8e80941Smrg 2226b8e80941Smrg This is a zero extension. 2227b8e80941Smrg 2228b8e80941Smrg.. math:: 2229b8e80941Smrg 2230b8e80941Smrg dst.xy = (int64_t) src0.x 2231b8e80941Smrg 2232b8e80941Smrg dst.zw = (int64_t) src0.y 2233b8e80941Smrg 2234b8e80941Smrg.. opcode:: I2I64 - Signed Integer to 64-bit Integer 2235b8e80941Smrg 2236b8e80941Smrg This is a sign extension. 2237b8e80941Smrg 2238b8e80941Smrg.. math:: 2239b8e80941Smrg 2240b8e80941Smrg dst.xy = (int64_t) src0.x 2241b8e80941Smrg 2242b8e80941Smrg dst.zw = (int64_t) src0.y 2243b8e80941Smrg 2244b8e80941Smrg.. opcode:: D2U64 - Double to 64-bit Unsigned Int 2245b8e80941Smrg 2246b8e80941Smrg.. math:: 2247b8e80941Smrg 2248b8e80941Smrg dst.xy = (uint64_t) src0.xy 2249b8e80941Smrg 2250b8e80941Smrg dst.zw = (uint64_t) src0.zw 2251b8e80941Smrg 2252b8e80941Smrg.. opcode:: D2I64 - Double to 64-bit Int 2253b8e80941Smrg 2254b8e80941Smrg.. math:: 2255b8e80941Smrg 2256b8e80941Smrg dst.xy = (int64_t) src0.xy 2257b8e80941Smrg 2258b8e80941Smrg dst.zw = (int64_t) src0.zw 2259b8e80941Smrg 2260b8e80941Smrg.. opcode:: U642F - 64-bit unsigned integer to float 2261b8e80941Smrg 2262b8e80941Smrg.. math:: 2263b8e80941Smrg 2264b8e80941Smrg dst.x = (float) src0.xy 2265b8e80941Smrg 2266b8e80941Smrg dst.y = (float) src0.zw 2267b8e80941Smrg 2268b8e80941Smrg.. opcode:: I642F - 64-bit Int to Float 2269b8e80941Smrg 2270b8e80941Smrg.. math:: 2271b8e80941Smrg 2272b8e80941Smrg dst.x = (float) src0.xy 2273b8e80941Smrg 2274b8e80941Smrg dst.y = (float) src0.zw 2275b8e80941Smrg 2276b8e80941Smrg.. opcode:: U642D - 64-bit unsigned integer to double 2277b8e80941Smrg 2278b8e80941Smrg.. math:: 2279b8e80941Smrg 2280b8e80941Smrg dst.xy = (double) src0.xy 2281b8e80941Smrg 2282b8e80941Smrg dst.zw = (double) src0.zw 2283b8e80941Smrg 2284b8e80941Smrg.. opcode:: I642D - 64-bit Int to double 2285b8e80941Smrg 2286b8e80941Smrg.. math:: 2287b8e80941Smrg 2288b8e80941Smrg dst.xy = (double) src0.xy 2289b8e80941Smrg 2290b8e80941Smrg dst.zw = (double) src0.zw 2291848b8605Smrg 2292848b8605Smrg.. _samplingopcodes: 2293848b8605Smrg 2294848b8605SmrgResource Sampling Opcodes 2295848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^ 2296848b8605Smrg 2297848b8605SmrgThose opcodes follow very closely semantics of the respective Direct3D 2298848b8605Smrginstructions. If in doubt double check Direct3D documentation. 2299848b8605SmrgNote that the swizzle on SVIEW (src1) determines texel swizzling 2300848b8605Smrgafter lookup. 2301848b8605Smrg 2302848b8605Smrg.. opcode:: SAMPLE 2303848b8605Smrg 2304848b8605Smrg Using provided address, sample data from the specified texture using the 2305b8e80941Smrg filtering mode identified by the given sampler. The source data may come from 2306848b8605Smrg any resource type other than buffers. 2307848b8605Smrg 2308848b8605Smrg Syntax: ``SAMPLE dst, address, sampler_view, sampler`` 2309848b8605Smrg 2310848b8605Smrg Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]`` 2311848b8605Smrg 2312848b8605Smrg.. opcode:: SAMPLE_I 2313848b8605Smrg 2314848b8605Smrg Simplified alternative to the SAMPLE instruction. Using the provided 2315848b8605Smrg integer address, SAMPLE_I fetches data from the specified sampler view 2316848b8605Smrg without any filtering. The source data may come from any resource type 2317848b8605Smrg other than CUBE. 2318848b8605Smrg 2319848b8605Smrg Syntax: ``SAMPLE_I dst, address, sampler_view`` 2320848b8605Smrg 2321848b8605Smrg Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]`` 2322848b8605Smrg 2323848b8605Smrg The 'address' is specified as unsigned integers. If the 'address' is out of 2324848b8605Smrg range [0...(# texels - 1)] the result of the fetch is always 0 in all 2325848b8605Smrg components. As such the instruction doesn't honor address wrap modes, in 2326848b8605Smrg cases where that behavior is desirable 'SAMPLE' instruction should be used. 2327848b8605Smrg address.w always provides an unsigned integer mipmap level. If the value is 2328848b8605Smrg out of the range then the instruction always returns 0 in all components. 2329848b8605Smrg address.yz are ignored for buffers and 1d textures. address.z is ignored 2330848b8605Smrg for 1d texture arrays and 2d textures. 2331848b8605Smrg 2332848b8605Smrg For 1D texture arrays address.y provides the array index (also as unsigned 2333848b8605Smrg integer). If the value is out of the range of available array indices 2334848b8605Smrg [0... (array size - 1)] then the opcode always returns 0 in all components. 2335848b8605Smrg For 2D texture arrays address.z provides the array index, otherwise it 2336848b8605Smrg exhibits the same behavior as in the case for 1D texture arrays. The exact 2337848b8605Smrg semantics of the source address are presented in the table below: 2338848b8605Smrg 2339848b8605Smrg +---------------------------+----+-----+-----+---------+ 2340848b8605Smrg | resource type | X | Y | Z | W | 2341848b8605Smrg +===========================+====+=====+=====+=========+ 2342848b8605Smrg | ``PIPE_BUFFER`` | x | | | ignored | 2343848b8605Smrg +---------------------------+----+-----+-----+---------+ 2344848b8605Smrg | ``PIPE_TEXTURE_1D`` | x | | | mpl | 2345848b8605Smrg +---------------------------+----+-----+-----+---------+ 2346848b8605Smrg | ``PIPE_TEXTURE_2D`` | x | y | | mpl | 2347848b8605Smrg +---------------------------+----+-----+-----+---------+ 2348848b8605Smrg | ``PIPE_TEXTURE_3D`` | x | y | z | mpl | 2349848b8605Smrg +---------------------------+----+-----+-----+---------+ 2350848b8605Smrg | ``PIPE_TEXTURE_RECT`` | x | y | | mpl | 2351848b8605Smrg +---------------------------+----+-----+-----+---------+ 2352848b8605Smrg | ``PIPE_TEXTURE_CUBE`` | not allowed as source | 2353848b8605Smrg +---------------------------+----+-----+-----+---------+ 2354848b8605Smrg | ``PIPE_TEXTURE_1D_ARRAY`` | x | idx | | mpl | 2355848b8605Smrg +---------------------------+----+-----+-----+---------+ 2356848b8605Smrg | ``PIPE_TEXTURE_2D_ARRAY`` | x | y | idx | mpl | 2357848b8605Smrg +---------------------------+----+-----+-----+---------+ 2358848b8605Smrg 2359848b8605Smrg Where 'mpl' is a mipmap level and 'idx' is the array index. 2360848b8605Smrg 2361848b8605Smrg.. opcode:: SAMPLE_I_MS 2362848b8605Smrg 2363848b8605Smrg Just like SAMPLE_I but allows fetch data from multi-sampled surfaces. 2364848b8605Smrg 2365848b8605Smrg Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample`` 2366848b8605Smrg 2367848b8605Smrg.. opcode:: SAMPLE_B 2368848b8605Smrg 2369848b8605Smrg Just like the SAMPLE instruction with the exception that an additional bias 2370848b8605Smrg is applied to the level of detail computed as part of the instruction 2371848b8605Smrg execution. 2372848b8605Smrg 2373848b8605Smrg Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias`` 2374848b8605Smrg 2375848b8605Smrg Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x`` 2376848b8605Smrg 2377848b8605Smrg.. opcode:: SAMPLE_C 2378848b8605Smrg 2379848b8605Smrg Similar to the SAMPLE instruction but it performs a comparison filter. The 2380848b8605Smrg operands to SAMPLE_C are identical to SAMPLE, except that there is an 2381848b8605Smrg additional float32 operand, reference value, which must be a register with 2382848b8605Smrg single-component, or a scalar literal. SAMPLE_C makes the hardware use the 2383848b8605Smrg current samplers compare_func (in pipe_sampler_state) to compare reference 2384848b8605Smrg value against the red component value for the surce resource at each texel 2385848b8605Smrg that the currently configured texture filter covers based on the provided 2386848b8605Smrg coordinates. 2387848b8605Smrg 2388848b8605Smrg Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value`` 2389848b8605Smrg 2390848b8605Smrg Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x`` 2391848b8605Smrg 2392848b8605Smrg.. opcode:: SAMPLE_C_LZ 2393848b8605Smrg 2394848b8605Smrg Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands 2395848b8605Smrg for level-zero. 2396848b8605Smrg 2397848b8605Smrg Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value`` 2398848b8605Smrg 2399848b8605Smrg Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x`` 2400848b8605Smrg 2401848b8605Smrg 2402848b8605Smrg.. opcode:: SAMPLE_D 2403848b8605Smrg 2404848b8605Smrg SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for 2405848b8605Smrg the source address in the x direction and the y direction are provided by 2406848b8605Smrg extra parameters. 2407848b8605Smrg 2408848b8605Smrg Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y`` 2409848b8605Smrg 2410848b8605Smrg Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]`` 2411848b8605Smrg 2412848b8605Smrg.. opcode:: SAMPLE_L 2413848b8605Smrg 2414848b8605Smrg SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided 2415848b8605Smrg directly as a scalar value, representing no anisotropy. 2416848b8605Smrg 2417848b8605Smrg Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod`` 2418848b8605Smrg 2419848b8605Smrg Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x`` 2420848b8605Smrg 2421848b8605Smrg.. opcode:: GATHER4 2422848b8605Smrg 2423848b8605Smrg Gathers the four texels to be used in a bi-linear filtering operation and 2424848b8605Smrg packs them into a single register. Only works with 2D, 2D array, cubemaps, 2425848b8605Smrg and cubemaps arrays. For 2D textures, only the addressing modes of the 2426848b8605Smrg sampler and the top level of any mip pyramid are used. Set W to zero. It 2427848b8605Smrg behaves like the SAMPLE instruction, but a filtered sample is not 2428848b8605Smrg generated. The four samples that contribute to filtering are placed into 2429848b8605Smrg xyzw in counter-clockwise order, starting with the (u,v) texture coordinate 2430848b8605Smrg delta at the following locations (-, +), (+, +), (+, -), (-, -), where the 2431848b8605Smrg magnitude of the deltas are half a texel. 2432848b8605Smrg 2433848b8605Smrg 2434848b8605Smrg.. opcode:: SVIEWINFO 2435848b8605Smrg 2436848b8605Smrg Query the dimensions of a given sampler view. dst receives width, height, 2437848b8605Smrg depth or array size and number of mipmap levels as int4. The dst can have a 2438848b8605Smrg writemask which will specify what info is the caller interested in. 2439848b8605Smrg 2440848b8605Smrg Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view`` 2441848b8605Smrg 2442848b8605Smrg Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]`` 2443848b8605Smrg 2444848b8605Smrg src_mip_level is an unsigned integer scalar. If it's out of range then 2445848b8605Smrg returns 0 for width, height and depth/array size but the total number of 2446848b8605Smrg mipmap is still returned correctly for the given sampler view. The returned 2447848b8605Smrg width, height and depth values are for the mipmap level selected by the 2448848b8605Smrg src_mip_level and are in the number of texels. For 1d texture array width 2449848b8605Smrg is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is 2450848b8605Smrg still in dst.w. In contrast to d3d10 resinfo, there's no way in the tgsi 2451848b8605Smrg instruction encoding to specify the return type (float/rcpfloat/uint), hence 2452848b8605Smrg always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1 2453848b8605Smrg resinfo allowing swizzling dst values is ignored (due to the interaction 2454848b8605Smrg with rcpfloat modifier which requires some swizzle handling in the state 2455848b8605Smrg tracker anyway). 2456848b8605Smrg 2457848b8605Smrg.. opcode:: SAMPLE_POS 2458848b8605Smrg 2459b8e80941Smrg Query the position of a sample in the given resource or render target 2460b8e80941Smrg when per-sample fragment shading is in effect. 2461b8e80941Smrg 2462b8e80941Smrg Syntax: ``SAMPLE_POS dst, source, sample_index`` 2463b8e80941Smrg 2464b8e80941Smrg dst receives float4 (x, y, undef, undef) indicated where the sample is 2465b8e80941Smrg located. Sample locations are in the range [0, 1] where 0.5 is the center 2466b8e80941Smrg of the fragment. 2467b8e80941Smrg 2468b8e80941Smrg source is either a sampler view (to indicate a shader resource) or temp 2469b8e80941Smrg register (to indicate the render target). The source register may have 2470b8e80941Smrg an optional swizzle to apply to the returned result 2471b8e80941Smrg 2472b8e80941Smrg sample_index is an integer scalar indicating which sample position is to 2473b8e80941Smrg be queried. 2474b8e80941Smrg 2475b8e80941Smrg If per-sample shading is not in effect or the source resource or render 2476b8e80941Smrg target is not multisampled, the result is (0.5, 0.5, undef, undef). 2477b8e80941Smrg 2478b8e80941Smrg NOTE: no driver has implemented this opcode yet (and no state tracker 2479b8e80941Smrg emits it). This information is subject to change. 2480848b8605Smrg 2481848b8605Smrg.. opcode:: SAMPLE_INFO 2482848b8605Smrg 2483b8e80941Smrg Query the number of samples in a multisampled resource or render target. 2484b8e80941Smrg 2485b8e80941Smrg Syntax: ``SAMPLE_INFO dst, source`` 2486b8e80941Smrg 2487b8e80941Smrg dst receives int4 (n, 0, 0, 0) where n is the number of samples in a 2488b8e80941Smrg resource or the render target. 2489b8e80941Smrg 2490b8e80941Smrg source is either a sampler view (to indicate a shader resource) or temp 2491b8e80941Smrg register (to indicate the render target). The source register may have 2492b8e80941Smrg an optional swizzle to apply to the returned result 2493b8e80941Smrg 2494b8e80941Smrg If per-sample shading is not in effect or the source resource or render 2495b8e80941Smrg target is not multisampled, the result is (1, 0, 0, 0). 2496b8e80941Smrg 2497b8e80941Smrg NOTE: no driver has implemented this opcode yet (and no state tracker 2498b8e80941Smrg emits it). This information is subject to change. 2499b8e80941Smrg 2500b8e80941Smrg.. opcode:: LOD - level of detail 2501b8e80941Smrg 2502b8e80941Smrg Same syntax as the SAMPLE opcode but instead of performing an actual 2503b8e80941Smrg texture lookup/filter, return the computed LOD information that the 2504b8e80941Smrg texture pipe would use to access the texture. The Y component contains 2505b8e80941Smrg the computed LOD lambda_prime. The X component contains the LOD that will 2506b8e80941Smrg be accessed, based on min/max lod's and mipmap filters. 2507b8e80941Smrg The Z and W components are set to 0. 2508b8e80941Smrg 2509b8e80941Smrg Syntax: ``LOD dst, address, sampler_view, sampler`` 2510848b8605Smrg 2511848b8605Smrg 2512848b8605Smrg.. _resourceopcodes: 2513848b8605Smrg 2514848b8605SmrgResource Access Opcodes 2515848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^ 2516848b8605Smrg 2517b8e80941SmrgFor these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY. 2518b8e80941Smrg 2519b8e80941Smrg.. opcode:: LOAD - Fetch data from a shader buffer or image 2520848b8605Smrg 2521848b8605Smrg Syntax: ``LOAD dst, resource, address`` 2522848b8605Smrg 2523b8e80941Smrg Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]`` 2524848b8605Smrg 2525848b8605Smrg Using the provided integer address, LOAD fetches data 2526848b8605Smrg from the specified buffer or texture without any 2527848b8605Smrg filtering. 2528848b8605Smrg 2529848b8605Smrg The 'address' is specified as a vector of unsigned 2530848b8605Smrg integers. If the 'address' is out of range the result 2531848b8605Smrg is unspecified. 2532848b8605Smrg 2533848b8605Smrg Only the first mipmap level of a resource can be read 2534848b8605Smrg from using this instruction. 2535848b8605Smrg 2536848b8605Smrg For 1D or 2D texture arrays, the array index is 2537848b8605Smrg provided as an unsigned integer in address.y or 2538848b8605Smrg address.z, respectively. address.yz are ignored for 2539848b8605Smrg buffers and 1D textures. address.z is ignored for 1D 2540848b8605Smrg texture arrays and 2D textures. address.w is always 2541848b8605Smrg ignored. 2542848b8605Smrg 2543b8e80941Smrg A swizzle suffix may be added to the resource argument 2544b8e80941Smrg this will cause the resource data to be swizzled accordingly. 2545b8e80941Smrg 2546848b8605Smrg.. opcode:: STORE - Write data to a shader resource 2547848b8605Smrg 2548848b8605Smrg Syntax: ``STORE resource, address, src`` 2549848b8605Smrg 2550b8e80941Smrg Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]`` 2551848b8605Smrg 2552848b8605Smrg Using the provided integer address, STORE writes data 2553848b8605Smrg to the specified buffer or texture. 2554848b8605Smrg 2555848b8605Smrg The 'address' is specified as a vector of unsigned 2556848b8605Smrg integers. If the 'address' is out of range the result 2557848b8605Smrg is unspecified. 2558848b8605Smrg 2559848b8605Smrg Only the first mipmap level of a resource can be 2560848b8605Smrg written to using this instruction. 2561848b8605Smrg 2562848b8605Smrg For 1D or 2D texture arrays, the array index is 2563848b8605Smrg provided as an unsigned integer in address.y or 2564848b8605Smrg address.z, respectively. address.yz are ignored for 2565848b8605Smrg buffers and 1D textures. address.z is ignored for 1D 2566848b8605Smrg texture arrays and 2D textures. address.w is always 2567848b8605Smrg ignored. 2568848b8605Smrg 2569b8e80941Smrg.. opcode:: RESQ - Query information about a resource 2570848b8605Smrg 2571b8e80941Smrg Syntax: ``RESQ dst, resource`` 2572848b8605Smrg 2573b8e80941Smrg Example: ``RESQ TEMP[0], BUFFER[0]`` 2574848b8605Smrg 2575b8e80941Smrg Returns information about the buffer or image resource. For buffer 2576b8e80941Smrg resources, the size (in bytes) is returned in the x component. For 2577b8e80941Smrg image resources, .xyz will contain the width/height/layers of the 2578b8e80941Smrg image, while .w will contain the number of samples for multi-sampled 2579b8e80941Smrg images. 2580b8e80941Smrg 2581b8e80941Smrg.. opcode:: FBFETCH - Load data from framebuffer 2582b8e80941Smrg 2583b8e80941Smrg Syntax: ``FBFETCH dst, output`` 2584b8e80941Smrg 2585b8e80941Smrg Example: ``FBFETCH TEMP[0], OUT[0]`` 2586848b8605Smrg 2587b8e80941Smrg This is only valid on ``COLOR`` semantic outputs. Returns the color 2588b8e80941Smrg of the current position in the framebuffer from before this fragment 2589b8e80941Smrg shader invocation. May return the same value from multiple calls for 2590b8e80941Smrg a particular output within a single invocation. Note that result may 2591b8e80941Smrg be undefined if a fragment is drawn multiple times without a blend 2592b8e80941Smrg barrier in between. 2593848b8605Smrg 2594848b8605Smrg 2595b8e80941Smrg.. _bindlessopcodes: 2596848b8605Smrg 2597b8e80941SmrgBindless Opcodes 2598b8e80941Smrg^^^^^^^^^^^^^^^^ 2599848b8605Smrg 2600b8e80941SmrgThese opcodes are for working with bindless sampler or image handles and 2601b8e80941Smrgrequire PIPE_CAP_BINDLESS_TEXTURE. 2602848b8605Smrg 2603b8e80941Smrg.. opcode:: IMG2HND - Get a bindless handle for a image 2604848b8605Smrg 2605b8e80941Smrg Syntax: ``IMG2HND dst, image`` 2606848b8605Smrg 2607b8e80941Smrg Example: ``IMG2HND TEMP[0], IMAGE[0]`` 2608848b8605Smrg 2609b8e80941Smrg Sets 'dst' to a bindless handle for 'image'. 2610848b8605Smrg 2611b8e80941Smrg.. opcode:: SAMP2HND - Get a bindless handle for a sampler 2612848b8605Smrg 2613b8e80941Smrg Syntax: ``SAMP2HND dst, sampler`` 2614848b8605Smrg 2615b8e80941Smrg Example: ``SAMP2HND TEMP[0], SAMP[0]`` 2616848b8605Smrg 2617b8e80941Smrg Sets 'dst' to a bindless handle for 'sampler'. 2618848b8605Smrg 2619848b8605Smrg 2620b8e80941Smrg.. _threadsyncopcodes: 2621b8e80941Smrg 2622b8e80941SmrgInter-thread synchronization opcodes 2623b8e80941Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 2624b8e80941Smrg 2625b8e80941SmrgThese opcodes are intended for communication between threads running 2626b8e80941Smrgwithin the same compute grid. For now they're only valid in compute 2627b8e80941Smrgprograms. 2628848b8605Smrg 2629848b8605Smrg.. opcode:: BARRIER - Thread group barrier 2630848b8605Smrg 2631848b8605Smrg ``BARRIER`` 2632848b8605Smrg 2633848b8605Smrg This opcode suspends the execution of the current thread until all 2634848b8605Smrg the remaining threads in the working group reach the same point of 2635848b8605Smrg the program. Results are unspecified if any of the remaining 2636848b8605Smrg threads terminates or never reaches an executed BARRIER instruction. 2637848b8605Smrg 2638b8e80941Smrg.. opcode:: MEMBAR - Memory barrier 2639b8e80941Smrg 2640b8e80941Smrg ``MEMBAR type`` 2641b8e80941Smrg 2642b8e80941Smrg This opcode waits for the completion of all memory accesses based on 2643b8e80941Smrg the type passed in. The type is an immediate bitfield with the following 2644b8e80941Smrg meaning: 2645b8e80941Smrg 2646b8e80941Smrg Bit 0: Shader storage buffers 2647b8e80941Smrg Bit 1: Atomic buffers 2648b8e80941Smrg Bit 2: Images 2649b8e80941Smrg Bit 3: Shared memory 2650b8e80941Smrg Bit 4: Thread group 2651b8e80941Smrg 2652b8e80941Smrg These may be passed in in any combination. An implementation is free to not 2653b8e80941Smrg distinguish between these as it sees fit. However these map to all the 2654b8e80941Smrg possibilities made available by GLSL. 2655848b8605Smrg 2656848b8605Smrg.. _atomopcodes: 2657848b8605Smrg 2658848b8605SmrgAtomic opcodes 2659848b8605Smrg^^^^^^^^^^^^^^ 2660848b8605Smrg 2661848b8605SmrgThese opcodes provide atomic variants of some common arithmetic and 2662848b8605Smrglogical operations. In this context atomicity means that another 2663848b8605Smrgconcurrent memory access operation that affects the same memory 2664848b8605Smrglocation is guaranteed to be performed strictly before or after the 2665b8e80941Smrgentire execution of the atomic operation. The resource may be a BUFFER, 2666b8e80941SmrgIMAGE, HWATOMIC, or MEMORY. In the case of an image, the offset works 2667b8e80941Smrgthe same as for ``LOAD`` and ``STORE``, specified above. For atomic 2668b8e80941Smrgcounters, the offset is an immediate index to the base hw atomic 2669b8e80941Smrgcounter for this operation. 2670b8e80941SmrgThese atomic operations may only be used with 32-bit integer image formats. 2671848b8605Smrg 2672848b8605Smrg.. opcode:: ATOMUADD - Atomic integer addition 2673848b8605Smrg 2674848b8605Smrg Syntax: ``ATOMUADD dst, resource, offset, src`` 2675848b8605Smrg 2676b8e80941Smrg Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2677b8e80941Smrg 2678b8e80941Smrg The following operation is performed atomically: 2679b8e80941Smrg 2680b8e80941Smrg.. math:: 2681b8e80941Smrg 2682b8e80941Smrg dst_x = resource[offset] 2683b8e80941Smrg 2684b8e80941Smrg resource[offset] = dst_x + src_x 2685b8e80941Smrg 2686848b8605Smrg 2687b8e80941Smrg.. opcode:: ATOMFADD - Atomic floating point addition 2688b8e80941Smrg 2689b8e80941Smrg Syntax: ``ATOMFADD dst, resource, offset, src`` 2690b8e80941Smrg 2691b8e80941Smrg Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2692b8e80941Smrg 2693b8e80941Smrg The following operation is performed atomically: 2694848b8605Smrg 2695848b8605Smrg.. math:: 2696848b8605Smrg 2697b8e80941Smrg dst_x = resource[offset] 2698848b8605Smrg 2699b8e80941Smrg resource[offset] = dst_x + src_x 2700848b8605Smrg 2701848b8605Smrg 2702848b8605Smrg.. opcode:: ATOMXCHG - Atomic exchange 2703848b8605Smrg 2704848b8605Smrg Syntax: ``ATOMXCHG dst, resource, offset, src`` 2705848b8605Smrg 2706b8e80941Smrg Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2707848b8605Smrg 2708b8e80941Smrg The following operation is performed atomically: 2709848b8605Smrg 2710848b8605Smrg.. math:: 2711848b8605Smrg 2712b8e80941Smrg dst_x = resource[offset] 2713848b8605Smrg 2714b8e80941Smrg resource[offset] = src_x 2715848b8605Smrg 2716848b8605Smrg 2717848b8605Smrg.. opcode:: ATOMCAS - Atomic compare-and-exchange 2718848b8605Smrg 2719848b8605Smrg Syntax: ``ATOMCAS dst, resource, offset, cmp, src`` 2720848b8605Smrg 2721b8e80941Smrg Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]`` 2722848b8605Smrg 2723b8e80941Smrg The following operation is performed atomically: 2724848b8605Smrg 2725848b8605Smrg.. math:: 2726848b8605Smrg 2727b8e80941Smrg dst_x = resource[offset] 2728848b8605Smrg 2729b8e80941Smrg resource[offset] = (dst_x == cmp_x ? src_x : dst_x) 2730848b8605Smrg 2731848b8605Smrg 2732848b8605Smrg.. opcode:: ATOMAND - Atomic bitwise And 2733848b8605Smrg 2734848b8605Smrg Syntax: ``ATOMAND dst, resource, offset, src`` 2735848b8605Smrg 2736b8e80941Smrg Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2737848b8605Smrg 2738b8e80941Smrg The following operation is performed atomically: 2739848b8605Smrg 2740848b8605Smrg.. math:: 2741848b8605Smrg 2742b8e80941Smrg dst_x = resource[offset] 2743848b8605Smrg 2744b8e80941Smrg resource[offset] = dst_x \& src_x 2745848b8605Smrg 2746848b8605Smrg 2747848b8605Smrg.. opcode:: ATOMOR - Atomic bitwise Or 2748848b8605Smrg 2749848b8605Smrg Syntax: ``ATOMOR dst, resource, offset, src`` 2750848b8605Smrg 2751b8e80941Smrg Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2752848b8605Smrg 2753b8e80941Smrg The following operation is performed atomically: 2754848b8605Smrg 2755848b8605Smrg.. math:: 2756848b8605Smrg 2757b8e80941Smrg dst_x = resource[offset] 2758848b8605Smrg 2759b8e80941Smrg resource[offset] = dst_x | src_x 2760848b8605Smrg 2761848b8605Smrg 2762848b8605Smrg.. opcode:: ATOMXOR - Atomic bitwise Xor 2763848b8605Smrg 2764848b8605Smrg Syntax: ``ATOMXOR dst, resource, offset, src`` 2765848b8605Smrg 2766b8e80941Smrg Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2767848b8605Smrg 2768b8e80941Smrg The following operation is performed atomically: 2769848b8605Smrg 2770848b8605Smrg.. math:: 2771848b8605Smrg 2772b8e80941Smrg dst_x = resource[offset] 2773848b8605Smrg 2774b8e80941Smrg resource[offset] = dst_x \oplus src_x 2775848b8605Smrg 2776848b8605Smrg 2777848b8605Smrg.. opcode:: ATOMUMIN - Atomic unsigned minimum 2778848b8605Smrg 2779848b8605Smrg Syntax: ``ATOMUMIN dst, resource, offset, src`` 2780848b8605Smrg 2781b8e80941Smrg Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2782848b8605Smrg 2783b8e80941Smrg The following operation is performed atomically: 2784848b8605Smrg 2785848b8605Smrg.. math:: 2786848b8605Smrg 2787b8e80941Smrg dst_x = resource[offset] 2788848b8605Smrg 2789b8e80941Smrg resource[offset] = (dst_x < src_x ? dst_x : src_x) 2790848b8605Smrg 2791848b8605Smrg 2792848b8605Smrg.. opcode:: ATOMUMAX - Atomic unsigned maximum 2793848b8605Smrg 2794848b8605Smrg Syntax: ``ATOMUMAX dst, resource, offset, src`` 2795848b8605Smrg 2796b8e80941Smrg Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2797848b8605Smrg 2798b8e80941Smrg The following operation is performed atomically: 2799848b8605Smrg 2800848b8605Smrg.. math:: 2801848b8605Smrg 2802b8e80941Smrg dst_x = resource[offset] 2803848b8605Smrg 2804b8e80941Smrg resource[offset] = (dst_x > src_x ? dst_x : src_x) 2805848b8605Smrg 2806848b8605Smrg 2807848b8605Smrg.. opcode:: ATOMIMIN - Atomic signed minimum 2808848b8605Smrg 2809848b8605Smrg Syntax: ``ATOMIMIN dst, resource, offset, src`` 2810848b8605Smrg 2811b8e80941Smrg Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2812848b8605Smrg 2813b8e80941Smrg The following operation is performed atomically: 2814848b8605Smrg 2815848b8605Smrg.. math:: 2816848b8605Smrg 2817b8e80941Smrg dst_x = resource[offset] 2818848b8605Smrg 2819b8e80941Smrg resource[offset] = (dst_x < src_x ? dst_x : src_x) 2820848b8605Smrg 2821848b8605Smrg 2822848b8605Smrg.. opcode:: ATOMIMAX - Atomic signed maximum 2823848b8605Smrg 2824848b8605Smrg Syntax: ``ATOMIMAX dst, resource, offset, src`` 2825848b8605Smrg 2826b8e80941Smrg Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]`` 2827848b8605Smrg 2828b8e80941Smrg The following operation is performed atomically: 2829848b8605Smrg 2830848b8605Smrg.. math:: 2831848b8605Smrg 2832b8e80941Smrg dst_x = resource[offset] 2833b8e80941Smrg 2834b8e80941Smrg resource[offset] = (dst_x > src_x ? dst_x : src_x) 2835b8e80941Smrg 2836b8e80941Smrg 2837b8e80941Smrg.. _interlaneopcodes: 2838b8e80941Smrg 2839b8e80941SmrgInter-lane opcodes 2840b8e80941Smrg^^^^^^^^^^^^^^^^^^ 2841b8e80941Smrg 2842b8e80941SmrgThese opcodes reduce the given value across the shader invocations 2843b8e80941Smrgrunning in the current SIMD group. Every thread in the subgroup will receive 2844b8e80941Smrgthe same result. The BALLOT operations accept a single-channel argument that 2845b8e80941Smrgis treated as a boolean and produce a 64-bit value. 2846b8e80941Smrg 2847b8e80941Smrg.. opcode:: VOTE_ANY - Value is set in any of the active invocations 2848b8e80941Smrg 2849b8e80941Smrg Syntax: ``VOTE_ANY dst, value`` 2850b8e80941Smrg 2851b8e80941Smrg Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x`` 2852b8e80941Smrg 2853b8e80941Smrg 2854b8e80941Smrg.. opcode:: VOTE_ALL - Value is set in all of the active invocations 2855b8e80941Smrg 2856b8e80941Smrg Syntax: ``VOTE_ALL dst, value`` 2857b8e80941Smrg 2858b8e80941Smrg Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x`` 2859b8e80941Smrg 2860b8e80941Smrg 2861b8e80941Smrg.. opcode:: VOTE_EQ - Value is the same in all of the active invocations 2862b8e80941Smrg 2863b8e80941Smrg Syntax: ``VOTE_EQ dst, value`` 2864b8e80941Smrg 2865b8e80941Smrg Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x`` 2866b8e80941Smrg 2867b8e80941Smrg 2868b8e80941Smrg.. opcode:: BALLOT - Lanemask of whether the value is set in each active 2869b8e80941Smrg invocation 2870b8e80941Smrg 2871b8e80941Smrg Syntax: ``BALLOT dst, value`` 2872b8e80941Smrg 2873b8e80941Smrg Example: ``BALLOT TEMP[0].xy, TEMP[1].x`` 2874b8e80941Smrg 2875b8e80941Smrg When the argument is a constant true, this produces a bitmask of active 2876b8e80941Smrg invocations. In fragment shaders, this can include helper invocations 2877b8e80941Smrg (invocations whose outputs and writes to memory are discarded, but which 2878b8e80941Smrg are used to compute derivatives). 2879b8e80941Smrg 2880b8e80941Smrg 2881b8e80941Smrg.. opcode:: READ_FIRST - Broadcast the value from the first active 2882b8e80941Smrg invocation to all active lanes 2883848b8605Smrg 2884b8e80941Smrg Syntax: ``READ_FIRST dst, value`` 2885848b8605Smrg 2886b8e80941Smrg Example: ``READ_FIRST TEMP[0], TEMP[1]`` 2887b8e80941Smrg 2888b8e80941Smrg 2889b8e80941Smrg.. opcode:: READ_INVOC - Retrieve the value from the given invocation 2890b8e80941Smrg (need not be uniform) 2891b8e80941Smrg 2892b8e80941Smrg Syntax: ``READ_INVOC dst, value, invocation`` 2893b8e80941Smrg 2894b8e80941Smrg Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x`` 2895b8e80941Smrg 2896b8e80941Smrg invocation.x controls the invocation number to read from for all channels. 2897b8e80941Smrg The invocation number must be the same across all active invocations in a 2898b8e80941Smrg sub-group; otherwise, the results are undefined. 2899848b8605Smrg 2900848b8605Smrg 2901848b8605SmrgExplanation of symbols used 2902848b8605Smrg------------------------------ 2903848b8605Smrg 2904848b8605Smrg 2905848b8605SmrgFunctions 2906848b8605Smrg^^^^^^^^^^^^^^ 2907848b8605Smrg 2908848b8605Smrg 2909848b8605Smrg :math:`|x|` Absolute value of `x`. 2910848b8605Smrg 2911848b8605Smrg :math:`\lceil x \rceil` Ceiling of `x`. 2912848b8605Smrg 2913848b8605Smrg clamp(x,y,z) Clamp x between y and z. 2914848b8605Smrg (x < y) ? y : (x > z) ? z : x 2915848b8605Smrg 2916848b8605Smrg :math:`\lfloor x\rfloor` Floor of `x`. 2917848b8605Smrg 2918848b8605Smrg :math:`\log_2{x}` Logarithm of `x`, base 2. 2919848b8605Smrg 2920848b8605Smrg max(x,y) Maximum of x and y. 2921848b8605Smrg (x > y) ? x : y 2922848b8605Smrg 2923848b8605Smrg min(x,y) Minimum of x and y. 2924848b8605Smrg (x < y) ? x : y 2925848b8605Smrg 2926848b8605Smrg partialx(x) Derivative of x relative to fragment's X. 2927848b8605Smrg 2928848b8605Smrg partialy(x) Derivative of x relative to fragment's Y. 2929848b8605Smrg 2930848b8605Smrg pop() Pop from stack. 2931848b8605Smrg 2932848b8605Smrg :math:`x^y` `x` to the power `y`. 2933848b8605Smrg 2934848b8605Smrg push(x) Push x on stack. 2935848b8605Smrg 2936848b8605Smrg round(x) Round x. 2937848b8605Smrg 2938848b8605Smrg trunc(x) Truncate x, i.e. drop the fraction bits. 2939848b8605Smrg 2940848b8605Smrg 2941848b8605SmrgKeywords 2942848b8605Smrg^^^^^^^^^^^^^ 2943848b8605Smrg 2944848b8605Smrg 2945848b8605Smrg discard Discard fragment. 2946848b8605Smrg 2947848b8605Smrg pc Program counter. 2948848b8605Smrg 2949848b8605Smrg target Label of target instruction. 2950848b8605Smrg 2951848b8605Smrg 2952848b8605SmrgOther tokens 2953848b8605Smrg--------------- 2954848b8605Smrg 2955848b8605Smrg 2956848b8605SmrgDeclaration 2957848b8605Smrg^^^^^^^^^^^ 2958848b8605Smrg 2959848b8605Smrg 2960848b8605SmrgDeclares a register that is will be referenced as an operand in Instruction 2961848b8605Smrgtokens. 2962848b8605Smrg 2963848b8605SmrgFile field contains register file that is being declared and is one 2964848b8605Smrgof TGSI_FILE. 2965848b8605Smrg 2966848b8605SmrgUsageMask field specifies which of the register components can be accessed 2967848b8605Smrgand is one of TGSI_WRITEMASK. 2968848b8605Smrg 2969848b8605SmrgThe Local flag specifies that a given value isn't intended for 2970848b8605Smrgsubroutine parameter passing and, as a result, the implementation 2971848b8605Smrgisn't required to give any guarantees of it being preserved across 2972848b8605Smrgsubroutine boundaries. As it's merely a compiler hint, the 2973848b8605Smrgimplementation is free to ignore it. 2974848b8605Smrg 2975848b8605SmrgIf Dimension flag is set to 1, a Declaration Dimension token follows. 2976848b8605Smrg 2977848b8605SmrgIf Semantic flag is set to 1, a Declaration Semantic token follows. 2978848b8605Smrg 2979848b8605SmrgIf Interpolate flag is set to 1, a Declaration Interpolate token follows. 2980848b8605Smrg 2981848b8605SmrgIf file is TGSI_FILE_RESOURCE, a Declaration Resource token follows. 2982848b8605Smrg 2983848b8605SmrgIf Array flag is set to 1, a Declaration Array token follows. 2984848b8605Smrg 2985848b8605SmrgArray Declaration 2986848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^ 2987848b8605Smrg 2988848b8605SmrgDeclarations can optional have an ArrayID attribute which can be referred by 2989b8e80941Smrgindirect addressing operands. An ArrayID of zero is reserved and treated as 2990848b8605Smrgif no ArrayID is specified. 2991848b8605Smrg 2992848b8605SmrgIf an indirect addressing operand refers to a specific declaration by using 2993848b8605Smrgan ArrayID only the registers in this declaration are guaranteed to be 2994848b8605Smrgaccessed, accessing any register outside this declaration results in undefined 2995848b8605Smrgbehavior. Note that for compatibility the effective index is zero-based and 2996848b8605Smrgnot relative to the specified declaration 2997848b8605Smrg 2998848b8605SmrgIf no ArrayID is specified with an indirect addressing operand the whole 2999848b8605Smrgregister file might be accessed by this operand. This is strongly discouraged 3000848b8605Smrgand will prevent packing of scalar/vec2 arrays and effective alias analysis. 3001b8e80941SmrgThis is only legal for TEMP and CONST register files. 3002848b8605Smrg 3003848b8605SmrgDeclaration Semantic 3004848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^ 3005848b8605Smrg 3006848b8605SmrgVertex and fragment shader input and output registers may be labeled 3007848b8605Smrgwith semantic information consisting of a name and index. 3008848b8605Smrg 3009848b8605SmrgFollows Declaration token if Semantic bit is set. 3010848b8605Smrg 3011848b8605SmrgSince its purpose is to link a shader with other stages of the pipeline, 3012848b8605Smrgit is valid to follow only those Declaration tokens that declare a register 3013848b8605Smrgeither in INPUT or OUTPUT file. 3014848b8605Smrg 3015848b8605SmrgSemanticName field contains the semantic name of the register being declared. 3016848b8605SmrgThere is no default value. 3017848b8605Smrg 3018848b8605SmrgSemanticIndex is an optional subscript that can be used to distinguish 3019848b8605Smrgdifferent register declarations with the same semantic name. The default value 3020848b8605Smrgis 0. 3021848b8605Smrg 3022848b8605SmrgThe meanings of the individual semantic names are explained in the following 3023848b8605Smrgsections. 3024848b8605Smrg 3025848b8605SmrgTGSI_SEMANTIC_POSITION 3026848b8605Smrg"""""""""""""""""""""" 3027848b8605Smrg 3028848b8605SmrgFor vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader 3029848b8605Smrgoutput register which contains the homogeneous vertex position in the clip 3030848b8605Smrgspace coordinate system. After clipping, the X, Y and Z components of the 3031848b8605Smrgvertex will be divided by the W value to get normalized device coordinates. 3032848b8605Smrg 3033848b8605SmrgFor fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that 3034b8e80941Smrgfragment shader input (or system value, depending on which one is 3035b8e80941Smrgsupported by the driver) contains the fragment's window position. The X 3036848b8605Smrgcomponent starts at zero and always increases from left to right. 3037848b8605SmrgThe Y component starts at zero and always increases but Y=0 may either 3038848b8605Smrgindicate the top of the window or the bottom depending on the fragment 3039848b8605Smrgcoordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN). 3040848b8605SmrgThe Z coordinate ranges from 0 to 1 to represent depth from the front 3041b8e80941Smrgto the back of the Z buffer. The W component contains the interpolated 3042b8e80941Smrgreciprocal of the vertex position W component (corresponding to gl_Fragcoord, 3043b8e80941Smrgbut unlike d3d10 which interpolates the same 1/w but then gives back 3044b8e80941Smrgthe reciprocal of the interpolated value). 3045848b8605Smrg 3046848b8605SmrgFragment shaders may also declare an output register with 3047848b8605SmrgTGSI_SEMANTIC_POSITION. Only the Z component is writable. This allows 3048848b8605Smrgthe fragment shader to change the fragment's Z position. 3049848b8605Smrg 3050848b8605Smrg 3051848b8605Smrg 3052848b8605SmrgTGSI_SEMANTIC_COLOR 3053848b8605Smrg""""""""""""""""""" 3054848b8605Smrg 3055848b8605SmrgFor vertex shader outputs or fragment shader inputs/outputs, this 3056b8e80941Smrglabel indicates that the register contains an R,G,B,A color. 3057848b8605Smrg 3058848b8605SmrgSeveral shader inputs/outputs may contain colors so the semantic index 3059848b8605Smrgis used to distinguish them. For example, color[0] may be the diffuse 3060848b8605Smrgcolor while color[1] may be the specular color. 3061848b8605Smrg 3062848b8605SmrgThis label is needed so that the flat/smooth shading can be applied 3063848b8605Smrgto the right interpolants during rasterization. 3064848b8605Smrg 3065848b8605Smrg 3066848b8605Smrg 3067848b8605SmrgTGSI_SEMANTIC_BCOLOR 3068848b8605Smrg"""""""""""""""""""" 3069848b8605Smrg 3070848b8605SmrgBack-facing colors are only used for back-facing polygons, and are only valid 3071848b8605Smrgin vertex shader outputs. After rasterization, all polygons are front-facing 3072848b8605Smrgand COLOR and BCOLOR end up occupying the same slots in the fragment shader, 3073848b8605Smrgso all BCOLORs effectively become regular COLORs in the fragment shader. 3074848b8605Smrg 3075848b8605Smrg 3076848b8605SmrgTGSI_SEMANTIC_FOG 3077848b8605Smrg""""""""""""""""" 3078848b8605Smrg 3079848b8605SmrgVertex shader inputs and outputs and fragment shader inputs may be 3080848b8605Smrglabeled with TGSI_SEMANTIC_FOG to indicate that the register contains 3081848b8605Smrga fog coordinate. Typically, the fragment shader will use the fog coordinate 3082848b8605Smrgto compute a fog blend factor which is used to blend the normal fragment color 3083848b8605Smrgwith a constant fog color. But fog coord really is just an ordinary vec4 3084848b8605Smrgregister like regular semantics. 3085848b8605Smrg 3086848b8605Smrg 3087848b8605SmrgTGSI_SEMANTIC_PSIZE 3088848b8605Smrg""""""""""""""""""" 3089848b8605Smrg 3090848b8605SmrgVertex shader input and output registers may be labeled with 3091848b8605SmrgTGIS_SEMANTIC_PSIZE to indicate that the register contains a point size 3092848b8605Smrgin the form (S, 0, 0, 1). The point size controls the width or diameter 3093848b8605Smrgof points for rasterization. This label cannot be used in fragment 3094848b8605Smrgshaders. 3095848b8605Smrg 3096848b8605SmrgWhen using this semantic, be sure to set the appropriate state in the 3097848b8605Smrg:ref:`rasterizer` first. 3098848b8605Smrg 3099848b8605Smrg 3100848b8605SmrgTGSI_SEMANTIC_TEXCOORD 3101848b8605Smrg"""""""""""""""""""""" 3102848b8605Smrg 3103848b8605SmrgOnly available if PIPE_CAP_TGSI_TEXCOORD is exposed ! 3104848b8605Smrg 3105848b8605SmrgVertex shader outputs and fragment shader inputs may be labeled with 3106848b8605Smrgthis semantic to make them replaceable by sprite coordinates via the 3107848b8605Smrgsprite_coord_enable state in the :ref:`rasterizer`. 3108848b8605SmrgThe semantic index permitted with this semantic is limited to <= 7. 3109848b8605Smrg 3110848b8605SmrgIf the driver does not support TEXCOORD, sprite coordinate replacement 3111848b8605Smrgapplies to inputs with the GENERIC semantic instead. 3112848b8605Smrg 3113848b8605SmrgThe intended use case for this semantic is gl_TexCoord. 3114848b8605Smrg 3115848b8605Smrg 3116848b8605SmrgTGSI_SEMANTIC_PCOORD 3117848b8605Smrg"""""""""""""""""""" 3118848b8605Smrg 3119848b8605SmrgOnly available if PIPE_CAP_TGSI_TEXCOORD is exposed ! 3120848b8605Smrg 3121848b8605SmrgFragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate 3122848b8605Smrgthat the register contains sprite coordinates in the form (x, y, 0, 1), if 3123848b8605Smrgthe current primitive is a point and point sprites are enabled. Otherwise, 3124848b8605Smrgthe contents of the register are undefined. 3125848b8605Smrg 3126848b8605SmrgThe intended use case for this semantic is gl_PointCoord. 3127848b8605Smrg 3128848b8605Smrg 3129848b8605SmrgTGSI_SEMANTIC_GENERIC 3130848b8605Smrg""""""""""""""""""""" 3131848b8605Smrg 3132848b8605SmrgAll vertex/fragment shader inputs/outputs not labeled with any other 3133848b8605Smrgsemantic label can be considered to be generic attributes. Typical 3134848b8605Smrguses of generic inputs/outputs are texcoords and user-defined values. 3135848b8605Smrg 3136848b8605Smrg 3137848b8605SmrgTGSI_SEMANTIC_NORMAL 3138848b8605Smrg"""""""""""""""""""" 3139848b8605Smrg 3140848b8605SmrgIndicates that a vertex shader input is a normal vector. This is 3141848b8605Smrgtypically only used for legacy graphics APIs. 3142848b8605Smrg 3143848b8605Smrg 3144848b8605SmrgTGSI_SEMANTIC_FACE 3145848b8605Smrg"""""""""""""""""" 3146848b8605Smrg 3147b8e80941SmrgThis label applies to fragment shader inputs (or system values, 3148b8e80941Smrgdepending on which one is supported by the driver) and indicates that 3149b8e80941Smrgthe register contains front/back-face information. 3150b8e80941Smrg 3151b8e80941SmrgIf it is an input, it will be a floating-point vector in the form (F, 0, 0, 1), 3152b8e80941Smrgwhere F will be positive when the fragment belongs to a front-facing polygon, 3153b8e80941Smrgand negative when the fragment belongs to a back-facing polygon. 3154b8e80941Smrg 3155b8e80941SmrgIf it is a system value, it will be an integer vector in the form (F, 0, 0, 1), 3156b8e80941Smrgwhere F is 0xffffffff when the fragment belongs to a front-facing polygon and 3157b8e80941Smrg0 when the fragment belongs to a back-facing polygon. 3158848b8605Smrg 3159848b8605Smrg 3160848b8605SmrgTGSI_SEMANTIC_EDGEFLAG 3161848b8605Smrg"""""""""""""""""""""" 3162848b8605Smrg 3163848b8605SmrgFor vertex shaders, this sematic label indicates that an input or 3164848b8605Smrgoutput is a boolean edge flag. The register layout is [F, x, x, x] 3165848b8605Smrgwhere F is 0.0 or 1.0 and x = don't care. Normally, the vertex shader 3166848b8605Smrgsimply copies the edge flag input to the edgeflag output. 3167848b8605Smrg 3168848b8605SmrgEdge flags are used to control which lines or points are actually 3169848b8605Smrgdrawn when the polygon mode converts triangles/quads/polygons into 3170848b8605Smrgpoints or lines. 3171848b8605Smrg 3172848b8605Smrg 3173848b8605SmrgTGSI_SEMANTIC_STENCIL 3174848b8605Smrg""""""""""""""""""""" 3175848b8605Smrg 3176848b8605SmrgFor fragment shaders, this semantic label indicates that an output 3177848b8605Smrgis a writable stencil reference value. Only the Y component is writable. 3178848b8605SmrgThis allows the fragment shader to change the fragments stencilref value. 3179848b8605Smrg 3180848b8605Smrg 3181848b8605SmrgTGSI_SEMANTIC_VIEWPORT_INDEX 3182848b8605Smrg"""""""""""""""""""""""""""" 3183848b8605Smrg 3184848b8605SmrgFor geometry shaders, this semantic label indicates that an output 3185848b8605Smrgcontains the index of the viewport (and scissor) to use. 3186b8e80941SmrgThis is an integer value, and only the X component is used. 3187b8e80941Smrg 3188b8e80941SmrgIf PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is 3189b8e80941Smrgsupported, then this semantic label can also be used in vertex or 3190b8e80941Smrgtessellation evaluation shaders, respectively. Only the value written in the 3191b8e80941Smrglast vertex processing stage is used. 3192848b8605Smrg 3193848b8605Smrg 3194848b8605SmrgTGSI_SEMANTIC_LAYER 3195848b8605Smrg""""""""""""""""""" 3196848b8605Smrg 3197848b8605SmrgFor geometry shaders, this semantic label indicates that an output 3198848b8605Smrgcontains the layer value to use for the color and depth/stencil surfaces. 3199b8e80941SmrgThis is an integer value, and only the X component is used. 3200b8e80941Smrg(Also known as rendertarget array index.) 3201848b8605Smrg 3202b8e80941SmrgIf PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is 3203b8e80941Smrgsupported, then this semantic label can also be used in vertex or 3204b8e80941Smrgtessellation evaluation shaders, respectively. Only the value written in the 3205b8e80941Smrglast vertex processing stage is used. 3206848b8605Smrg 3207848b8605Smrg 3208848b8605SmrgTGSI_SEMANTIC_CLIPDIST 3209848b8605Smrg"""""""""""""""""""""" 3210848b8605Smrg 3211b8e80941SmrgNote this covers clipping and culling distances. 3212b8e80941Smrg 3213848b8605SmrgWhen components of vertex elements are identified this way, these 3214848b8605Smrgvalues are each assumed to be a float32 signed distance to a plane. 3215b8e80941Smrg 3216b8e80941SmrgFor clip distances: 3217848b8605SmrgPrimitive setup only invokes rasterization on pixels for which 3218b8e80941Smrgthe interpolated plane distances are >= 0. 3219b8e80941Smrg 3220b8e80941SmrgFor cull distances: 3221b8e80941SmrgPrimitives will be completely discarded if the plane distance 3222b8e80941Smrgfor all of the vertices in the primitive are < 0. 3223b8e80941SmrgIf a vertex has a cull distance of NaN, that vertex counts as "out" 3224b8e80941Smrg(as if its < 0); 3225b8e80941Smrg 3226b8e80941SmrgMultiple clip/cull planes can be implemented simultaneously, by 3227b8e80941Smrgannotating multiple components of one or more vertex elements with 3228b8e80941Smrgthe above specified semantic. 3229b8e80941SmrgThe limits on both clip and cull distances are bound 3230848b8605Smrgby the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines 3231848b8605Smrgthe maximum number of components that can be used to hold the 3232848b8605Smrgdistances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT 3233848b8605Smrgwhich specifies the maximum number of registers which can be 3234848b8605Smrgannotated with those semantics. 3235b8e80941SmrgThe properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED 3236b8e80941Smrgare used to divide up the 2 x vec4 space between clipping and culling. 3237848b8605Smrg 3238848b8605SmrgTGSI_SEMANTIC_SAMPLEID 3239848b8605Smrg"""""""""""""""""""""" 3240848b8605Smrg 3241848b8605SmrgFor fragment shaders, this semantic label indicates that a system value 3242b8e80941Smrgcontains the current sample id (i.e. gl_SampleID) as an unsigned int. 3243b8e80941SmrgOnly the X component is used. If per-sample shading is not enabled, 3244b8e80941Smrgthe result is (0, undef, undef, undef). 3245b8e80941Smrg 3246b8e80941SmrgNote that if the fragment shader uses this system value, the fragment 3247b8e80941Smrgshader is automatically executed at per sample frequency. 3248848b8605Smrg 3249848b8605SmrgTGSI_SEMANTIC_SAMPLEPOS 3250848b8605Smrg""""""""""""""""""""""" 3251848b8605Smrg 3252b8e80941SmrgFor fragment shaders, this semantic label indicates that a system 3253b8e80941Smrgvalue contains the current sample's position as float4(x, y, undef, undef) 3254b8e80941Smrgin the render target (i.e. gl_SamplePosition) when per-fragment shading 3255b8e80941Smrgis in effect. Position values are in the range [0, 1] where 0.5 is 3256b8e80941Smrgthe center of the fragment. 3257b8e80941Smrg 3258b8e80941SmrgNote that if the fragment shader uses this system value, the fragment 3259b8e80941Smrgshader is automatically executed at per sample frequency. 3260848b8605Smrg 3261848b8605SmrgTGSI_SEMANTIC_SAMPLEMASK 3262848b8605Smrg"""""""""""""""""""""""" 3263848b8605Smrg 3264b8e80941SmrgFor fragment shaders, this semantic label can be applied to either a 3265b8e80941Smrgshader system value input or output. 3266b8e80941Smrg 3267b8e80941SmrgFor a system value, the sample mask indicates the set of samples covered by 3268b8e80941Smrgthe current primitive. If MSAA is not enabled, the value is (1, 0, 0, 0). 3269b8e80941Smrg 3270b8e80941SmrgFor an output, the sample mask is used to disable further sample processing. 3271b8e80941Smrg 3272b8e80941SmrgFor both, the register type is uint[4] but only the X component is used 3273b8e80941Smrg(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up 3274b8e80941Smrgto 32x MSAA is supported). 3275848b8605Smrg 3276848b8605SmrgTGSI_SEMANTIC_INVOCATIONID 3277848b8605Smrg"""""""""""""""""""""""""" 3278848b8605Smrg 3279848b8605SmrgFor geometry shaders, this semantic label indicates that a system value 3280b8e80941Smrgcontains the current invocation id (i.e. gl_InvocationID). 3281b8e80941SmrgThis is an integer value, and only the X component is used. 3282b8e80941Smrg 3283b8e80941SmrgTGSI_SEMANTIC_INSTANCEID 3284b8e80941Smrg"""""""""""""""""""""""" 3285b8e80941Smrg 3286b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains 3287b8e80941Smrgthe current instance id (i.e. gl_InstanceID). It does not include the base 3288b8e80941Smrginstance. This is an integer value, and only the X component is used. 3289b8e80941Smrg 3290b8e80941SmrgTGSI_SEMANTIC_VERTEXID 3291b8e80941Smrg"""""""""""""""""""""" 3292b8e80941Smrg 3293b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains 3294b8e80941Smrgthe current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the 3295b8e80941Smrgbase vertex. This is an integer value, and only the X component is used. 3296b8e80941Smrg 3297b8e80941SmrgTGSI_SEMANTIC_VERTEXID_NOBASE 3298b8e80941Smrg""""""""""""""""""""""""""""""" 3299b8e80941Smrg 3300b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains 3301b8e80941Smrgthe current vertex id without including the base vertex (this corresponds to 3302b8e80941Smrgd3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX 3303b8e80941Smrg== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component 3304b8e80941Smrgis used. 3305b8e80941Smrg 3306b8e80941SmrgTGSI_SEMANTIC_BASEVERTEX 3307b8e80941Smrg"""""""""""""""""""""""" 3308b8e80941Smrg 3309b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains 3310b8e80941Smrgthe base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls, 3311b8e80941Smrgthis contains the first (or start) value instead. 3312b8e80941SmrgThis is an integer value, and only the X component is used. 3313b8e80941Smrg 3314b8e80941SmrgTGSI_SEMANTIC_PRIMID 3315b8e80941Smrg"""""""""""""""""""" 3316b8e80941Smrg 3317b8e80941SmrgFor geometry and fragment shaders, this semantic label indicates the value 3318b8e80941Smrgcontains the primitive id (i.e. gl_PrimitiveID). This is an integer value, 3319b8e80941Smrgand only the X component is used. 3320b8e80941SmrgFIXME: This right now can be either a ordinary input or a system value... 3321b8e80941Smrg 3322b8e80941Smrg 3323b8e80941SmrgTGSI_SEMANTIC_PATCH 3324b8e80941Smrg""""""""""""""""""" 3325b8e80941Smrg 3326b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates a 3327b8e80941Smrggeneric per-patch attribute. Such semantics will not implicitly be per-vertex 3328b8e80941Smrgarrays. 3329b8e80941Smrg 3330b8e80941SmrgTGSI_SEMANTIC_TESSCOORD 3331b8e80941Smrg""""""""""""""""""""""" 3332b8e80941Smrg 3333b8e80941SmrgFor tessellation evaluation shaders, this semantic label indicates the 3334b8e80941Smrgcoordinates of the vertex being processed. This is available in XYZ; W is 3335b8e80941Smrgundefined. 3336b8e80941Smrg 3337b8e80941SmrgTGSI_SEMANTIC_TESSOUTER 3338b8e80941Smrg""""""""""""""""""""""" 3339b8e80941Smrg 3340b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the 3341b8e80941Smrgouter tessellation levels of the patch. Isoline tessellation will only have XY 3342b8e80941Smrgdefined, triangle will have XYZ and quads will have XYZW defined. This 3343b8e80941Smrgcorresponds to gl_TessLevelOuter. 3344b8e80941Smrg 3345b8e80941SmrgTGSI_SEMANTIC_TESSINNER 3346b8e80941Smrg""""""""""""""""""""""" 3347b8e80941Smrg 3348b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the 3349b8e80941Smrginner tessellation levels of the patch. The X value is only defined for 3350b8e80941Smrgtriangle tessellation, while quads will have XY defined. This is entirely 3351b8e80941Smrgundefined for isoline tessellation. 3352b8e80941Smrg 3353b8e80941SmrgTGSI_SEMANTIC_VERTICESIN 3354b8e80941Smrg"""""""""""""""""""""""" 3355b8e80941Smrg 3356b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the 3357b8e80941Smrgnumber of vertices provided in the input patch. Only the X value is defined. 3358b8e80941Smrg 3359b8e80941SmrgTGSI_SEMANTIC_HELPER_INVOCATION 3360b8e80941Smrg""""""""""""""""""""""""""""""" 3361b8e80941Smrg 3362b8e80941SmrgFor fragment shaders, this semantic indicates whether the current 3363b8e80941Smrginvocation is covered or not. Helper invocations are created in order 3364b8e80941Smrgto properly compute derivatives, however it may be desirable to skip 3365b8e80941Smrgsome of the logic in those cases. See ``gl_HelperInvocation`` documentation. 3366b8e80941Smrg 3367b8e80941SmrgTGSI_SEMANTIC_BASEINSTANCE 3368b8e80941Smrg"""""""""""""""""""""""""" 3369b8e80941Smrg 3370b8e80941SmrgFor vertex shaders, the base instance argument supplied for this 3371b8e80941Smrgdraw. This is an integer value, and only the X component is used. 3372b8e80941Smrg 3373b8e80941SmrgTGSI_SEMANTIC_DRAWID 3374b8e80941Smrg"""""""""""""""""""" 3375b8e80941Smrg 3376b8e80941SmrgFor vertex shaders, the zero-based index of the current draw in a 3377b8e80941Smrg``glMultiDraw*`` invocation. This is an integer value, and only the X 3378b8e80941Smrgcomponent is used. 3379b8e80941Smrg 3380b8e80941Smrg 3381b8e80941SmrgTGSI_SEMANTIC_WORK_DIM 3382b8e80941Smrg"""""""""""""""""""""" 3383b8e80941Smrg 3384b8e80941SmrgFor compute shaders started via opencl this retrieves the work_dim 3385b8e80941Smrgparameter to the clEnqueueNDRangeKernel call with which the shader 3386b8e80941Smrgwas started. 3387b8e80941Smrg 3388b8e80941Smrg 3389b8e80941SmrgTGSI_SEMANTIC_GRID_SIZE 3390b8e80941Smrg""""""""""""""""""""""" 3391b8e80941Smrg 3392b8e80941SmrgFor compute shaders, this semantic indicates the maximum (x, y, z) dimensions 3393b8e80941Smrgof a grid of thread blocks. 3394b8e80941Smrg 3395b8e80941Smrg 3396b8e80941SmrgTGSI_SEMANTIC_BLOCK_ID 3397b8e80941Smrg"""""""""""""""""""""" 3398b8e80941Smrg 3399b8e80941SmrgFor compute shaders, this semantic indicates the (x, y, z) coordinates of the 3400b8e80941Smrgcurrent block inside of the grid. 3401b8e80941Smrg 3402b8e80941Smrg 3403b8e80941SmrgTGSI_SEMANTIC_BLOCK_SIZE 3404b8e80941Smrg"""""""""""""""""""""""" 3405b8e80941Smrg 3406b8e80941SmrgFor compute shaders, this semantic indicates the maximum (x, y, z) dimensions 3407b8e80941Smrgof a block in threads. 3408b8e80941Smrg 3409b8e80941Smrg 3410b8e80941SmrgTGSI_SEMANTIC_THREAD_ID 3411b8e80941Smrg""""""""""""""""""""""" 3412b8e80941Smrg 3413b8e80941SmrgFor compute shaders, this semantic indicates the (x, y, z) coordinates of the 3414b8e80941Smrgcurrent thread inside of the block. 3415b8e80941Smrg 3416b8e80941Smrg 3417b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_SIZE 3418b8e80941Smrg""""""""""""""""""""""""""" 3419b8e80941Smrg 3420b8e80941SmrgThis semantic indicates the subgroup size for the current invocation. This is 3421b8e80941Smrgan integer of at most 64, as it indicates the width of lanemasks. It does not 3422b8e80941Smrgdepend on the number of invocations that are active. 3423b8e80941Smrg 3424b8e80941Smrg 3425b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_INVOCATION 3426b8e80941Smrg""""""""""""""""""""""""""""""""" 3427b8e80941Smrg 3428b8e80941SmrgThe index of the current invocation within its subgroup. 3429b8e80941Smrg 3430b8e80941Smrg 3431b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_EQ_MASK 3432b8e80941Smrg"""""""""""""""""""""""""""""" 3433b8e80941Smrg 3434b8e80941SmrgA bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3435b8e80941Smrg``1 << subgroup_invocation`` in arbitrary precision arithmetic. 3436b8e80941Smrg 3437b8e80941Smrg 3438b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_GE_MASK 3439b8e80941Smrg"""""""""""""""""""""""""""""" 3440b8e80941Smrg 3441b8e80941SmrgA bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3442b8e80941Smrg``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation`` 3443b8e80941Smrgin arbitrary precision arithmetic. 3444b8e80941Smrg 3445b8e80941Smrg 3446b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_GT_MASK 3447b8e80941Smrg"""""""""""""""""""""""""""""" 3448b8e80941Smrg 3449b8e80941SmrgA bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3450b8e80941Smrg``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)`` 3451b8e80941Smrgin arbitrary precision arithmetic. 3452b8e80941Smrg 3453b8e80941Smrg 3454b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_LE_MASK 3455b8e80941Smrg"""""""""""""""""""""""""""""" 3456b8e80941Smrg 3457b8e80941SmrgA bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3458b8e80941Smrg``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic. 3459b8e80941Smrg 3460b8e80941Smrg 3461b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_LT_MASK 3462b8e80941Smrg"""""""""""""""""""""""""""""" 3463b8e80941Smrg 3464b8e80941SmrgA bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e. 3465b8e80941Smrg``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic. 3466b8e80941Smrg 3467848b8605Smrg 3468848b8605SmrgDeclaration Interpolate 3469848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^ 3470848b8605Smrg 3471848b8605SmrgThis token is only valid for fragment shader INPUT declarations. 3472848b8605Smrg 3473848b8605SmrgThe Interpolate field specifes the way input is being interpolated by 3474848b8605Smrgthe rasteriser and is one of TGSI_INTERPOLATE_*. 3475848b8605Smrg 3476848b8605SmrgThe Location field specifies the location inside the pixel that the 3477848b8605Smrginterpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that 3478848b8605Smrgwhen per-sample shading is enabled, the implementation may choose to 3479848b8605Smrginterpolate at the sample irrespective of the Location field. 3480848b8605Smrg 3481848b8605SmrgThe CylindricalWrap bitfield specifies which register components 3482848b8605Smrgshould be subject to cylindrical wrapping when interpolating by the 3483848b8605Smrgrasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component 3484848b8605Smrgshould be interpolated according to cylindrical wrapping rules. 3485848b8605Smrg 3486848b8605Smrg 3487848b8605SmrgDeclaration Sampler View 3488848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^ 3489848b8605Smrg 3490848b8605SmrgFollows Declaration token if file is TGSI_FILE_SAMPLER_VIEW. 3491848b8605Smrg 3492848b8605SmrgDCL SVIEW[#], resource, type(s) 3493848b8605Smrg 3494848b8605SmrgDeclares a shader input sampler view and assigns it to a SVIEW[#] 3495848b8605Smrgregister. 3496848b8605Smrg 3497848b8605Smrgresource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray. 3498848b8605Smrg 3499848b8605Smrgtype must be 1 or 4 entries (if specifying on a per-component 3500848b8605Smrglevel) out of UNORM, SNORM, SINT, UINT and FLOAT. 3501848b8605Smrg 3502b8e80941SmrgFor TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes 3503b8e80941Smrgwhich take an explicit SVIEW[#] source register), there may be optionally 3504b8e80941SmrgSVIEW[#] declarations. In this case, the SVIEW index is implied by the 3505b8e80941SmrgSAMP index, and there must be a corresponding SVIEW[#] declaration for 3506b8e80941Smrgeach SAMP[#] declaration. Drivers are free to ignore this if they wish. 3507b8e80941SmrgBut note in particular that some drivers need to know the sampler type 3508b8e80941Smrg(float/int/unsigned) in order to generate the correct code, so cases 3509b8e80941Smrgwhere integer textures are sampled, SVIEW[#] declarations should be 3510b8e80941Smrgused. 3511b8e80941Smrg 3512b8e80941SmrgNOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes 3513b8e80941Smrgin the same shader. 3514848b8605Smrg 3515848b8605SmrgDeclaration Resource 3516848b8605Smrg^^^^^^^^^^^^^^^^^^^^ 3517848b8605Smrg 3518848b8605SmrgFollows Declaration token if file is TGSI_FILE_RESOURCE. 3519848b8605Smrg 3520848b8605SmrgDCL RES[#], resource [, WR] [, RAW] 3521848b8605Smrg 3522848b8605SmrgDeclares a shader input resource and assigns it to a RES[#] 3523848b8605Smrgregister. 3524848b8605Smrg 3525848b8605Smrgresource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and 3526848b8605Smrg2DArray. 3527848b8605Smrg 3528848b8605SmrgIf the RAW keyword is not specified, the texture data will be 3529848b8605Smrgsubject to conversion, swizzling and scaling as required to yield 3530848b8605Smrgthe specified data type from the physical data format of the bound 3531848b8605Smrgresource. 3532848b8605Smrg 3533848b8605SmrgIf the RAW keyword is specified, no channel conversion will be 3534848b8605Smrgperformed: the values read for each of the channels (X,Y,Z,W) will 3535848b8605Smrgcorrespond to consecutive words in the same order and format 3536848b8605Smrgthey're found in memory. No element-to-address conversion will be 3537848b8605Smrgperformed either: the value of the provided X coordinate will be 3538848b8605Smrginterpreted in byte units instead of texel units. The result of 3539848b8605Smrgaccessing a misaligned address is undefined. 3540848b8605Smrg 3541848b8605SmrgUsage of the STORE opcode is only allowed if the WR (writable) flag 3542848b8605Smrgis set. 3543848b8605Smrg 3544b8e80941SmrgHardware Atomic Register File 3545b8e80941Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3546b8e80941Smrg 3547b8e80941SmrgHardware atomics are declared as a 2D array with an optional array id. 3548b8e80941Smrg 3549b8e80941SmrgThe first member of the dimension is the buffer resource the atomic 3550b8e80941Smrgis located in. 3551b8e80941SmrgThe second member is a range into the buffer resource, either for 3552b8e80941Smrgone or multiple counters. If this is an array, the declaration will have 3553b8e80941Smrgan unique array id. 3554b8e80941Smrg 3555b8e80941SmrgEach counter is 4 bytes in size, and index and ranges are in counters not bytes. 3556b8e80941SmrgDCL HWATOMIC[0][0] 3557b8e80941SmrgDCL HWATOMIC[0][1] 3558b8e80941Smrg 3559b8e80941SmrgThis declares two atomics, one at the start of the buffer and one in the 3560b8e80941Smrgsecond 4 bytes. 3561b8e80941Smrg 3562b8e80941SmrgDCL HWATOMIC[0][0] 3563b8e80941SmrgDCL HWATOMIC[1][0] 3564b8e80941SmrgDCL HWATOMIC[1][1..3], ARRAY(1) 3565b8e80941Smrg 3566b8e80941SmrgThis declares 5 atomics, one in buffer 0 at 0, 3567b8e80941Smrgone in buffer 1 at 0, and an array of 3 atomics in 3568b8e80941Smrgthe buffer 1, starting at 1. 3569848b8605Smrg 3570848b8605SmrgProperties 3571848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^ 3572848b8605Smrg 3573848b8605SmrgProperties are general directives that apply to the whole TGSI program. 3574848b8605Smrg 3575848b8605SmrgFS_COORD_ORIGIN 3576848b8605Smrg""""""""""""""" 3577848b8605Smrg 3578848b8605SmrgSpecifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin. 3579848b8605SmrgThe default value is UPPER_LEFT. 3580848b8605Smrg 3581848b8605SmrgIf UPPER_LEFT, the position will be (0,0) at the upper left corner and 3582848b8605Smrgincrease downward and rightward. 3583848b8605SmrgIf LOWER_LEFT, the position will be (0,0) at the lower left corner and 3584848b8605Smrgincrease upward and rightward. 3585848b8605Smrg 3586848b8605SmrgOpenGL defaults to LOWER_LEFT, and is configurable with the 3587848b8605SmrgGL_ARB_fragment_coord_conventions extension. 3588848b8605Smrg 3589848b8605SmrgDirectX 9/10 use UPPER_LEFT. 3590848b8605Smrg 3591848b8605SmrgFS_COORD_PIXEL_CENTER 3592848b8605Smrg""""""""""""""""""""" 3593848b8605Smrg 3594848b8605SmrgSpecifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention. 3595848b8605SmrgThe default value is HALF_INTEGER. 3596848b8605Smrg 3597848b8605SmrgIf HALF_INTEGER, the fractionary part of the position will be 0.5 3598848b8605SmrgIf INTEGER, the fractionary part of the position will be 0.0 3599848b8605Smrg 3600848b8605SmrgNote that this does not affect the set of fragments generated by 3601848b8605Smrgrasterization, which is instead controlled by half_pixel_center in the 3602848b8605Smrgrasterizer. 3603848b8605Smrg 3604848b8605SmrgOpenGL defaults to HALF_INTEGER, and is configurable with the 3605848b8605SmrgGL_ARB_fragment_coord_conventions extension. 3606848b8605Smrg 3607848b8605SmrgDirectX 9 uses INTEGER. 3608848b8605SmrgDirectX 10 uses HALF_INTEGER. 3609848b8605Smrg 3610848b8605SmrgFS_COLOR0_WRITES_ALL_CBUFS 3611848b8605Smrg"""""""""""""""""""""""""" 3612848b8605SmrgSpecifies that writes to the fragment shader color 0 are replicated to all 3613848b8605Smrgbound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where 3614848b8605SmrgfragData is directed to a single color buffer, but fragColor is broadcast. 3615848b8605Smrg 3616848b8605SmrgVS_PROHIBIT_UCPS 3617848b8605Smrg"""""""""""""""""""""""""" 3618848b8605SmrgIf this property is set on the program bound to the shader stage before the 3619848b8605Smrgfragment shader, user clip planes should have no effect (be disabled) even if 3620848b8605Smrgthat shader does not write to any clip distance outputs and the rasterizer's 3621848b8605Smrgclip_plane_enable is non-zero. 3622848b8605SmrgThis property is only supported by drivers that also support shader clip 3623848b8605Smrgdistance outputs. 3624848b8605SmrgThis is useful for APIs that don't have UCPs and where clip distances written 3625848b8605Smrgby a shader cannot be disabled. 3626848b8605Smrg 3627848b8605SmrgGS_INVOCATIONS 3628848b8605Smrg"""""""""""""" 3629848b8605Smrg 3630848b8605SmrgSpecifies the number of times a geometry shader should be executed for each 3631848b8605Smrginput primitive. Each invocation will have a different 3632848b8605SmrgTGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to 3633848b8605Smrgbe 1. 3634848b8605Smrg 3635848b8605SmrgVS_WINDOW_SPACE_POSITION 3636848b8605Smrg"""""""""""""""""""""""""" 3637848b8605SmrgIf this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output 3638848b8605Smrgis assumed to contain window space coordinates. 3639848b8605SmrgDivision of X,Y,Z by W and the viewport transformation are disabled, and 1/W is 3640848b8605Smrgdirectly taken from the 4-th component of the shader output. 3641848b8605SmrgNaturally, clipping is not performed on window coordinates either. 3642848b8605SmrgThe effect of this property is undefined if a geometry or tessellation shader 3643848b8605Smrgare in use. 3644848b8605Smrg 3645b8e80941SmrgTCS_VERTICES_OUT 3646b8e80941Smrg"""""""""""""""" 3647b8e80941Smrg 3648b8e80941SmrgThe number of vertices written by the tessellation control shader. This 3649b8e80941Smrgeffectively defines the patch input size of the tessellation evaluation shader 3650b8e80941Smrgas well. 3651b8e80941Smrg 3652b8e80941SmrgTES_PRIM_MODE 3653b8e80941Smrg""""""""""""" 3654b8e80941Smrg 3655b8e80941SmrgThis sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``, 3656b8e80941Smrg``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no 3657b8e80941Smrgseparate isolines settings, the regular lines is assumed to mean isolines.) 3658b8e80941Smrg 3659b8e80941SmrgTES_SPACING 3660b8e80941Smrg""""""""""" 3661b8e80941Smrg 3662b8e80941SmrgThis sets the spacing mode of the tessellation generator, one of 3663b8e80941Smrg``PIPE_TESS_SPACING_*``. 3664b8e80941Smrg 3665b8e80941SmrgTES_VERTEX_ORDER_CW 3666b8e80941Smrg""""""""""""""""""" 3667b8e80941Smrg 3668b8e80941SmrgThis sets the vertex order to be clockwise if the value is 1, or 3669b8e80941Smrgcounter-clockwise if set to 0. 3670b8e80941Smrg 3671b8e80941SmrgTES_POINT_MODE 3672b8e80941Smrg"""""""""""""" 3673b8e80941Smrg 3674b8e80941SmrgIf set to a non-zero value, this turns on point mode for the tessellator, 3675b8e80941Smrgwhich means that points will be generated instead of primitives. 3676b8e80941Smrg 3677b8e80941SmrgNUM_CLIPDIST_ENABLED 3678b8e80941Smrg"""""""""""""""""""" 3679b8e80941Smrg 3680b8e80941SmrgHow many clip distance scalar outputs are enabled. 3681b8e80941Smrg 3682b8e80941SmrgNUM_CULLDIST_ENABLED 3683b8e80941Smrg"""""""""""""""""""" 3684b8e80941Smrg 3685b8e80941SmrgHow many cull distance scalar outputs are enabled. 3686b8e80941Smrg 3687b8e80941SmrgFS_EARLY_DEPTH_STENCIL 3688b8e80941Smrg"""""""""""""""""""""" 3689b8e80941Smrg 3690b8e80941SmrgWhether depth test, stencil test, and occlusion query should run before 3691b8e80941Smrgthe fragment shader (regardless of fragment shader side effects). Corresponds 3692b8e80941Smrgto GLSL early_fragment_tests. 3693b8e80941Smrg 3694b8e80941SmrgNEXT_SHADER 3695b8e80941Smrg""""""""""" 3696b8e80941Smrg 3697b8e80941SmrgWhich shader stage will MOST LIKELY follow after this shader when the shader 3698b8e80941Smrgis bound. This is only a hint to the driver and doesn't have to be precise. 3699b8e80941SmrgOnly set for VS and TES. 3700b8e80941Smrg 3701b8e80941SmrgCS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH 3702b8e80941Smrg""""""""""""""""""""""""""""""""""""" 3703b8e80941Smrg 3704b8e80941SmrgThreads per block in each dimension, if known at compile time. If the block size 3705b8e80941Smrgis known all three should be at least 1. If it is unknown they should all be set 3706b8e80941Smrgto 0 or not set. 3707b8e80941Smrg 3708b8e80941SmrgMUL_ZERO_WINS 3709b8e80941Smrg""""""""""""" 3710b8e80941Smrg 3711b8e80941SmrgThe MUL TGSI operation (FP32 multiplication) will return 0 if either 3712b8e80941Smrgof the operands are equal to 0. That means that 0 * Inf = 0. This 3713b8e80941Smrgshould be set the same way for an entire pipeline. Note that this 3714b8e80941Smrgapplies not only to the literal MUL TGSI opcode, but all FP32 3715b8e80941Smrgmultiplications implied by other operations, such as MAD, FMA, DP2, 3716b8e80941SmrgDP3, DP4, DST, LOG, LRP, and possibly others. If there is a 3717b8e80941Smrgmismatch between shaders, then it is unspecified whether this behavior 3718b8e80941Smrgwill be enabled. 3719b8e80941Smrg 3720b8e80941SmrgFS_POST_DEPTH_COVERAGE 3721b8e80941Smrg"""""""""""""""""""""" 3722b8e80941Smrg 3723b8e80941SmrgWhen enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples 3724b8e80941Smrgthat have failed the depth/stencil tests. This is only valid when 3725b8e80941SmrgFS_EARLY_DEPTH_STENCIL is also specified. 3726b8e80941Smrg 3727b8e80941Smrg 3728848b8605SmrgTexture Sampling and Texture Formats 3729848b8605Smrg------------------------------------ 3730848b8605Smrg 3731848b8605SmrgThis table shows how texture image components are returned as (x,y,z,w) tuples 3732848b8605Smrgby TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and 3733848b8605Smrg:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as 3734848b8605Smrgwell. 3735848b8605Smrg 3736848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3737848b8605Smrg| Texture Components | Gallium | OpenGL | Direct3D 9 | 3738848b8605Smrg+====================+==============+====================+==============+ 3739848b8605Smrg| R | (r, 0, 0, 1) | (r, 0, 0, 1) | (r, 1, 1, 1) | 3740848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3741848b8605Smrg| RG | (r, g, 0, 1) | (r, g, 0, 1) | (r, g, 1, 1) | 3742848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3743848b8605Smrg| RGB | (r, g, b, 1) | (r, g, b, 1) | (r, g, b, 1) | 3744848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3745848b8605Smrg| RGBA | (r, g, b, a) | (r, g, b, a) | (r, g, b, a) | 3746848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3747848b8605Smrg| A | (0, 0, 0, a) | (0, 0, 0, a) | (0, 0, 0, a) | 3748848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3749848b8605Smrg| L | (l, l, l, 1) | (l, l, l, 1) | (l, l, l, 1) | 3750848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3751848b8605Smrg| LA | (l, l, l, a) | (l, l, l, a) | (l, l, l, a) | 3752848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3753848b8605Smrg| I | (i, i, i, i) | (i, i, i, i) | N/A | 3754848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3755848b8605Smrg| UV | XXX TBD | (0, 0, 0, 1) | (u, v, 1, 1) | 3756848b8605Smrg| | | [#envmap-bumpmap]_ | | 3757848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3758848b8605Smrg| Z | XXX TBD | (z, z, z, 1) | (0, z, 0, 1) | 3759848b8605Smrg| | | [#depth-tex-mode]_ | | 3760848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3761848b8605Smrg| S | (s, s, s, s) | unknown | unknown | 3762848b8605Smrg+--------------------+--------------+--------------------+--------------+ 3763848b8605Smrg 3764848b8605Smrg.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt 3765848b8605Smrg.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z) 3766848b8605Smrg or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE. 3767