1848b8605SmrgTGSI
2848b8605Smrg====
3848b8605Smrg
4848b8605SmrgTGSI, Tungsten Graphics Shader Infrastructure, is an intermediate language
5848b8605Smrgfor describing shaders. Since Gallium is inherently shaderful, shaders are
6848b8605Smrgan important part of the API. TGSI is the only intermediate representation
7848b8605Smrgused by all drivers.
8848b8605Smrg
9848b8605SmrgBasics
10848b8605Smrg------
11848b8605Smrg
12848b8605SmrgAll TGSI instructions, known as *opcodes*, operate on arbitrary-precision
13848b8605Smrgfloating-point four-component vectors. An opcode may have up to one
14848b8605Smrgdestination register, known as *dst*, and between zero and three source
15848b8605Smrgregisters, called *src0* through *src2*, or simply *src* if there is only
16848b8605Smrgone.
17848b8605Smrg
18848b8605SmrgSome instructions, like :opcode:`I2F`, permit re-interpretation of vector
19848b8605Smrgcomponents as integers. Other instructions permit using registers as
20848b8605Smrgtwo-component vectors with double precision; see :ref:`doubleopcodes`.
21848b8605Smrg
22848b8605SmrgWhen an instruction has a scalar result, the result is usually copied into
23848b8605Smrgeach of the components of *dst*. When this happens, the result is said to be
24848b8605Smrg*replicated* to *dst*. :opcode:`RCP` is one such instruction.
25848b8605Smrg
26848b8605SmrgModifiers
27848b8605Smrg^^^^^^^^^^^^^^^
28848b8605Smrg
29b8e80941SmrgTGSI supports modifiers on inputs (as well as saturate and precise modifier
30b8e80941Smrgon instructions).
31848b8605Smrg
32b8e80941SmrgFor arithmetic instruction having a precise modifier certain optimizations
33b8e80941Smrgwhich may alter the result are disallowed. Example: *add(mul(a,b),c)* can't be
34b8e80941Smrgoptimized to TGSI_OPCODE_MAD, because some hardware only supports the fused
35b8e80941SmrgMAD instruction.
36b8e80941Smrg
37b8e80941SmrgFor inputs which have a floating point type, both absolute value and
38b8e80941Smrgnegation modifiers are supported (with absolute value being applied
39b8e80941Smrgfirst).  The only source of TGSI_OPCODE_MOV and the second and third
40b8e80941Smrgsources of TGSI_OPCODE_UCMP are considered to have float type for
41b8e80941Smrgapplying modifiers.
42848b8605Smrg
43848b8605SmrgFor inputs which have signed or unsigned type only the negate modifier is
44848b8605Smrgsupported.
45848b8605Smrg
46848b8605SmrgInstruction Set
47848b8605Smrg---------------
48848b8605Smrg
49848b8605SmrgCore ISA
50848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^
51848b8605Smrg
52848b8605SmrgThese opcodes are guaranteed to be available regardless of the driver being
53848b8605Smrgused.
54848b8605Smrg
55848b8605Smrg.. opcode:: ARL - Address Register Load
56848b8605Smrg
57848b8605Smrg.. math::
58848b8605Smrg
59b8e80941Smrg  dst.x = (int) \lfloor src.x\rfloor
60848b8605Smrg
61b8e80941Smrg  dst.y = (int) \lfloor src.y\rfloor
62848b8605Smrg
63b8e80941Smrg  dst.z = (int) \lfloor src.z\rfloor
64848b8605Smrg
65b8e80941Smrg  dst.w = (int) \lfloor src.w\rfloor
66848b8605Smrg
67848b8605Smrg
68848b8605Smrg.. opcode:: MOV - Move
69848b8605Smrg
70848b8605Smrg.. math::
71848b8605Smrg
72848b8605Smrg  dst.x = src.x
73848b8605Smrg
74848b8605Smrg  dst.y = src.y
75848b8605Smrg
76848b8605Smrg  dst.z = src.z
77848b8605Smrg
78848b8605Smrg  dst.w = src.w
79848b8605Smrg
80848b8605Smrg
81848b8605Smrg.. opcode:: LIT - Light Coefficients
82848b8605Smrg
83848b8605Smrg.. math::
84848b8605Smrg
85848b8605Smrg  dst.x &= 1 \\
86848b8605Smrg  dst.y &= max(src.x, 0) \\
87848b8605Smrg  dst.z &= (src.x > 0) ? max(src.y, 0)^{clamp(src.w, -128, 128))} : 0 \\
88848b8605Smrg  dst.w &= 1
89848b8605Smrg
90848b8605Smrg
91848b8605Smrg.. opcode:: RCP - Reciprocal
92848b8605Smrg
93848b8605SmrgThis instruction replicates its result.
94848b8605Smrg
95848b8605Smrg.. math::
96848b8605Smrg
97848b8605Smrg  dst = \frac{1}{src.x}
98848b8605Smrg
99848b8605Smrg
100848b8605Smrg.. opcode:: RSQ - Reciprocal Square Root
101848b8605Smrg
102848b8605SmrgThis instruction replicates its result. The results are undefined for src <= 0.
103848b8605Smrg
104848b8605Smrg.. math::
105848b8605Smrg
106848b8605Smrg  dst = \frac{1}{\sqrt{src.x}}
107848b8605Smrg
108848b8605Smrg
109848b8605Smrg.. opcode:: SQRT - Square Root
110848b8605Smrg
111848b8605SmrgThis instruction replicates its result. The results are undefined for src < 0.
112848b8605Smrg
113848b8605Smrg.. math::
114848b8605Smrg
115848b8605Smrg  dst = {\sqrt{src.x}}
116848b8605Smrg
117848b8605Smrg
118848b8605Smrg.. opcode:: EXP - Approximate Exponential Base 2
119848b8605Smrg
120848b8605Smrg.. math::
121848b8605Smrg
122848b8605Smrg  dst.x &= 2^{\lfloor src.x\rfloor} \\
123848b8605Smrg  dst.y &= src.x - \lfloor src.x\rfloor \\
124848b8605Smrg  dst.z &= 2^{src.x} \\
125848b8605Smrg  dst.w &= 1
126848b8605Smrg
127848b8605Smrg
128848b8605Smrg.. opcode:: LOG - Approximate Logarithm Base 2
129848b8605Smrg
130848b8605Smrg.. math::
131848b8605Smrg
132848b8605Smrg  dst.x &= \lfloor\log_2{|src.x|}\rfloor \\
133848b8605Smrg  dst.y &= \frac{|src.x|}{2^{\lfloor\log_2{|src.x|}\rfloor}} \\
134848b8605Smrg  dst.z &= \log_2{|src.x|} \\
135848b8605Smrg  dst.w &= 1
136848b8605Smrg
137848b8605Smrg
138848b8605Smrg.. opcode:: MUL - Multiply
139848b8605Smrg
140848b8605Smrg.. math::
141848b8605Smrg
142848b8605Smrg  dst.x = src0.x \times src1.x
143848b8605Smrg
144848b8605Smrg  dst.y = src0.y \times src1.y
145848b8605Smrg
146848b8605Smrg  dst.z = src0.z \times src1.z
147848b8605Smrg
148848b8605Smrg  dst.w = src0.w \times src1.w
149848b8605Smrg
150848b8605Smrg
151848b8605Smrg.. opcode:: ADD - Add
152848b8605Smrg
153848b8605Smrg.. math::
154848b8605Smrg
155848b8605Smrg  dst.x = src0.x + src1.x
156848b8605Smrg
157848b8605Smrg  dst.y = src0.y + src1.y
158848b8605Smrg
159848b8605Smrg  dst.z = src0.z + src1.z
160848b8605Smrg
161848b8605Smrg  dst.w = src0.w + src1.w
162848b8605Smrg
163848b8605Smrg
164848b8605Smrg.. opcode:: DP3 - 3-component Dot Product
165848b8605Smrg
166848b8605SmrgThis instruction replicates its result.
167848b8605Smrg
168848b8605Smrg.. math::
169848b8605Smrg
170848b8605Smrg  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z
171848b8605Smrg
172848b8605Smrg
173848b8605Smrg.. opcode:: DP4 - 4-component Dot Product
174848b8605Smrg
175848b8605SmrgThis instruction replicates its result.
176848b8605Smrg
177848b8605Smrg.. math::
178848b8605Smrg
179848b8605Smrg  dst = src0.x \times src1.x + src0.y \times src1.y + src0.z \times src1.z + src0.w \times src1.w
180848b8605Smrg
181848b8605Smrg
182848b8605Smrg.. opcode:: DST - Distance Vector
183848b8605Smrg
184848b8605Smrg.. math::
185848b8605Smrg
186848b8605Smrg  dst.x &= 1\\
187848b8605Smrg  dst.y &= src0.y \times src1.y\\
188848b8605Smrg  dst.z &= src0.z\\
189848b8605Smrg  dst.w &= src1.w
190848b8605Smrg
191848b8605Smrg
192848b8605Smrg.. opcode:: MIN - Minimum
193848b8605Smrg
194848b8605Smrg.. math::
195848b8605Smrg
196848b8605Smrg  dst.x = min(src0.x, src1.x)
197848b8605Smrg
198848b8605Smrg  dst.y = min(src0.y, src1.y)
199848b8605Smrg
200848b8605Smrg  dst.z = min(src0.z, src1.z)
201848b8605Smrg
202848b8605Smrg  dst.w = min(src0.w, src1.w)
203848b8605Smrg
204848b8605Smrg
205848b8605Smrg.. opcode:: MAX - Maximum
206848b8605Smrg
207848b8605Smrg.. math::
208848b8605Smrg
209848b8605Smrg  dst.x = max(src0.x, src1.x)
210848b8605Smrg
211848b8605Smrg  dst.y = max(src0.y, src1.y)
212848b8605Smrg
213848b8605Smrg  dst.z = max(src0.z, src1.z)
214848b8605Smrg
215848b8605Smrg  dst.w = max(src0.w, src1.w)
216848b8605Smrg
217848b8605Smrg
218848b8605Smrg.. opcode:: SLT - Set On Less Than
219848b8605Smrg
220848b8605Smrg.. math::
221848b8605Smrg
222848b8605Smrg  dst.x = (src0.x < src1.x) ? 1.0F : 0.0F
223848b8605Smrg
224848b8605Smrg  dst.y = (src0.y < src1.y) ? 1.0F : 0.0F
225848b8605Smrg
226848b8605Smrg  dst.z = (src0.z < src1.z) ? 1.0F : 0.0F
227848b8605Smrg
228848b8605Smrg  dst.w = (src0.w < src1.w) ? 1.0F : 0.0F
229848b8605Smrg
230848b8605Smrg
231848b8605Smrg.. opcode:: SGE - Set On Greater Equal Than
232848b8605Smrg
233848b8605Smrg.. math::
234848b8605Smrg
235848b8605Smrg  dst.x = (src0.x >= src1.x) ? 1.0F : 0.0F
236848b8605Smrg
237848b8605Smrg  dst.y = (src0.y >= src1.y) ? 1.0F : 0.0F
238848b8605Smrg
239848b8605Smrg  dst.z = (src0.z >= src1.z) ? 1.0F : 0.0F
240848b8605Smrg
241848b8605Smrg  dst.w = (src0.w >= src1.w) ? 1.0F : 0.0F
242848b8605Smrg
243848b8605Smrg
244848b8605Smrg.. opcode:: MAD - Multiply And Add
245848b8605Smrg
246b8e80941SmrgPerform a * b + c. The implementation is free to decide whether there is an
247b8e80941Smrgintermediate rounding step or not.
248b8e80941Smrg
249848b8605Smrg.. math::
250848b8605Smrg
251848b8605Smrg  dst.x = src0.x \times src1.x + src2.x
252848b8605Smrg
253848b8605Smrg  dst.y = src0.y \times src1.y + src2.y
254848b8605Smrg
255848b8605Smrg  dst.z = src0.z \times src1.z + src2.z
256848b8605Smrg
257848b8605Smrg  dst.w = src0.w \times src1.w + src2.w
258848b8605Smrg
259848b8605Smrg
260848b8605Smrg.. opcode:: LRP - Linear Interpolate
261848b8605Smrg
262848b8605Smrg.. math::
263848b8605Smrg
264848b8605Smrg  dst.x = src0.x \times src1.x + (1 - src0.x) \times src2.x
265848b8605Smrg
266848b8605Smrg  dst.y = src0.y \times src1.y + (1 - src0.y) \times src2.y
267848b8605Smrg
268848b8605Smrg  dst.z = src0.z \times src1.z + (1 - src0.z) \times src2.z
269848b8605Smrg
270848b8605Smrg  dst.w = src0.w \times src1.w + (1 - src0.w) \times src2.w
271848b8605Smrg
272848b8605Smrg
273b8e80941Smrg.. opcode:: FMA - Fused Multiply-Add
274848b8605Smrg
275b8e80941SmrgPerform a * b + c with no intermediate rounding step.
276848b8605Smrg
277848b8605Smrg.. math::
278848b8605Smrg
279b8e80941Smrg  dst.x = src0.x \times src1.x + src2.x
280848b8605Smrg
281b8e80941Smrg  dst.y = src0.y \times src1.y + src2.y
282848b8605Smrg
283b8e80941Smrg  dst.z = src0.z \times src1.z + src2.z
284848b8605Smrg
285b8e80941Smrg  dst.w = src0.w \times src1.w + src2.w
286848b8605Smrg
287848b8605Smrg
288848b8605Smrg.. opcode:: FRC - Fraction
289848b8605Smrg
290848b8605Smrg.. math::
291848b8605Smrg
292848b8605Smrg  dst.x = src.x - \lfloor src.x\rfloor
293848b8605Smrg
294848b8605Smrg  dst.y = src.y - \lfloor src.y\rfloor
295848b8605Smrg
296848b8605Smrg  dst.z = src.z - \lfloor src.z\rfloor
297848b8605Smrg
298848b8605Smrg  dst.w = src.w - \lfloor src.w\rfloor
299848b8605Smrg
300848b8605Smrg
301848b8605Smrg.. opcode:: FLR - Floor
302848b8605Smrg
303848b8605Smrg.. math::
304848b8605Smrg
305848b8605Smrg  dst.x = \lfloor src.x\rfloor
306848b8605Smrg
307848b8605Smrg  dst.y = \lfloor src.y\rfloor
308848b8605Smrg
309848b8605Smrg  dst.z = \lfloor src.z\rfloor
310848b8605Smrg
311848b8605Smrg  dst.w = \lfloor src.w\rfloor
312848b8605Smrg
313848b8605Smrg
314848b8605Smrg.. opcode:: ROUND - Round
315848b8605Smrg
316848b8605Smrg.. math::
317848b8605Smrg
318848b8605Smrg  dst.x = round(src.x)
319848b8605Smrg
320848b8605Smrg  dst.y = round(src.y)
321848b8605Smrg
322848b8605Smrg  dst.z = round(src.z)
323848b8605Smrg
324848b8605Smrg  dst.w = round(src.w)
325848b8605Smrg
326848b8605Smrg
327848b8605Smrg.. opcode:: EX2 - Exponential Base 2
328848b8605Smrg
329848b8605SmrgThis instruction replicates its result.
330848b8605Smrg
331848b8605Smrg.. math::
332848b8605Smrg
333848b8605Smrg  dst = 2^{src.x}
334848b8605Smrg
335848b8605Smrg
336848b8605Smrg.. opcode:: LG2 - Logarithm Base 2
337848b8605Smrg
338848b8605SmrgThis instruction replicates its result.
339848b8605Smrg
340848b8605Smrg.. math::
341848b8605Smrg
342848b8605Smrg  dst = \log_2{src.x}
343848b8605Smrg
344848b8605Smrg
345848b8605Smrg.. opcode:: POW - Power
346848b8605Smrg
347848b8605SmrgThis instruction replicates its result.
348848b8605Smrg
349848b8605Smrg.. math::
350848b8605Smrg
351848b8605Smrg  dst = src0.x^{src1.x}
352848b8605Smrg
353848b8605Smrg
354b8e80941Smrg.. opcode:: LDEXP - Multiply Number by Integral Power of 2
355848b8605Smrg
356b8e80941Smrgsrc1 is an integer.
357848b8605Smrg
358848b8605Smrg.. math::
359848b8605Smrg
360b8e80941Smrg  dst.x = src0.x * 2^{src1.x}
361b8e80941Smrg  dst.y = src0.y * 2^{src1.y}
362b8e80941Smrg  dst.z = src0.z * 2^{src1.z}
363b8e80941Smrg  dst.w = src0.w * 2^{src1.w}
364848b8605Smrg
365848b8605Smrg
366848b8605Smrg.. opcode:: COS - Cosine
367848b8605Smrg
368848b8605SmrgThis instruction replicates its result.
369848b8605Smrg
370848b8605Smrg.. math::
371848b8605Smrg
372848b8605Smrg  dst = \cos{src.x}
373848b8605Smrg
374848b8605Smrg
375848b8605Smrg.. opcode:: DDX, DDX_FINE - Derivative Relative To X
376848b8605Smrg
377848b8605SmrgThe fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
378848b8605Smrgadvertised. When it is, the fine version guarantees one derivative per row
379848b8605Smrgwhile DDX is allowed to be the same for the entire 2x2 quad.
380848b8605Smrg
381848b8605Smrg.. math::
382848b8605Smrg
383848b8605Smrg  dst.x = partialx(src.x)
384848b8605Smrg
385848b8605Smrg  dst.y = partialx(src.y)
386848b8605Smrg
387848b8605Smrg  dst.z = partialx(src.z)
388848b8605Smrg
389848b8605Smrg  dst.w = partialx(src.w)
390848b8605Smrg
391848b8605Smrg
392848b8605Smrg.. opcode:: DDY, DDY_FINE - Derivative Relative To Y
393848b8605Smrg
394848b8605SmrgThe fine variant is only used when ``PIPE_CAP_TGSI_FS_FINE_DERIVATIVE`` is
395848b8605Smrgadvertised. When it is, the fine version guarantees one derivative per column
396848b8605Smrgwhile DDY is allowed to be the same for the entire 2x2 quad.
397848b8605Smrg
398848b8605Smrg.. math::
399848b8605Smrg
400848b8605Smrg  dst.x = partialy(src.x)
401848b8605Smrg
402848b8605Smrg  dst.y = partialy(src.y)
403848b8605Smrg
404848b8605Smrg  dst.z = partialy(src.z)
405848b8605Smrg
406848b8605Smrg  dst.w = partialy(src.w)
407848b8605Smrg
408848b8605Smrg
409848b8605Smrg.. opcode:: PK2H - Pack Two 16-bit Floats
410848b8605Smrg
411b8e80941SmrgThis instruction replicates its result.
412848b8605Smrg
413b8e80941Smrg.. math::
414848b8605Smrg
415b8e80941Smrg  dst = f32\_to\_f16(src.x) | f32\_to\_f16(src.y) << 16
416848b8605Smrg
417848b8605Smrg
418b8e80941Smrg.. opcode:: PK2US - Pack Two Unsigned 16-bit Scalars
419848b8605Smrg
420b8e80941SmrgThis instruction replicates its result.
421848b8605Smrg
422b8e80941Smrg.. math::
423848b8605Smrg
424b8e80941Smrg  dst = f32\_to\_unorm16(src.x) | f32\_to\_unorm16(src.y) << 16
425848b8605Smrg
426848b8605Smrg
427b8e80941Smrg.. opcode:: PK4B - Pack Four Signed 8-bit Scalars
428848b8605Smrg
429b8e80941SmrgThis instruction replicates its result.
430848b8605Smrg
431848b8605Smrg.. math::
432848b8605Smrg
433b8e80941Smrg  dst = f32\_to\_snorm8(src.x) |
434b8e80941Smrg        (f32\_to\_snorm8(src.y) << 8) |
435b8e80941Smrg        (f32\_to\_snorm8(src.z) << 16) |
436b8e80941Smrg        (f32\_to\_snorm8(src.w) << 24)
437848b8605Smrg
438848b8605Smrg
439b8e80941Smrg.. opcode:: PK4UB - Pack Four Unsigned 8-bit Scalars
440848b8605Smrg
441b8e80941SmrgThis instruction replicates its result.
442848b8605Smrg
443b8e80941Smrg.. math::
444848b8605Smrg
445b8e80941Smrg  dst = f32\_to\_unorm8(src.x) |
446b8e80941Smrg        (f32\_to\_unorm8(src.y) << 8) |
447b8e80941Smrg        (f32\_to\_unorm8(src.z) << 16) |
448b8e80941Smrg        (f32\_to\_unorm8(src.w) << 24)
449848b8605Smrg
450848b8605Smrg
451848b8605Smrg.. opcode:: SEQ - Set On Equal
452848b8605Smrg
453848b8605Smrg.. math::
454848b8605Smrg
455848b8605Smrg  dst.x = (src0.x == src1.x) ? 1.0F : 0.0F
456848b8605Smrg
457848b8605Smrg  dst.y = (src0.y == src1.y) ? 1.0F : 0.0F
458848b8605Smrg
459848b8605Smrg  dst.z = (src0.z == src1.z) ? 1.0F : 0.0F
460848b8605Smrg
461848b8605Smrg  dst.w = (src0.w == src1.w) ? 1.0F : 0.0F
462848b8605Smrg
463848b8605Smrg
464848b8605Smrg.. opcode:: SGT - Set On Greater Than
465848b8605Smrg
466848b8605Smrg.. math::
467848b8605Smrg
468848b8605Smrg  dst.x = (src0.x > src1.x) ? 1.0F : 0.0F
469848b8605Smrg
470848b8605Smrg  dst.y = (src0.y > src1.y) ? 1.0F : 0.0F
471848b8605Smrg
472848b8605Smrg  dst.z = (src0.z > src1.z) ? 1.0F : 0.0F
473848b8605Smrg
474848b8605Smrg  dst.w = (src0.w > src1.w) ? 1.0F : 0.0F
475848b8605Smrg
476848b8605Smrg
477848b8605Smrg.. opcode:: SIN - Sine
478848b8605Smrg
479848b8605SmrgThis instruction replicates its result.
480848b8605Smrg
481848b8605Smrg.. math::
482848b8605Smrg
483848b8605Smrg  dst = \sin{src.x}
484848b8605Smrg
485848b8605Smrg
486848b8605Smrg.. opcode:: SLE - Set On Less Equal Than
487848b8605Smrg
488848b8605Smrg.. math::
489848b8605Smrg
490848b8605Smrg  dst.x = (src0.x <= src1.x) ? 1.0F : 0.0F
491848b8605Smrg
492848b8605Smrg  dst.y = (src0.y <= src1.y) ? 1.0F : 0.0F
493848b8605Smrg
494848b8605Smrg  dst.z = (src0.z <= src1.z) ? 1.0F : 0.0F
495848b8605Smrg
496848b8605Smrg  dst.w = (src0.w <= src1.w) ? 1.0F : 0.0F
497848b8605Smrg
498848b8605Smrg
499848b8605Smrg.. opcode:: SNE - Set On Not Equal
500848b8605Smrg
501848b8605Smrg.. math::
502848b8605Smrg
503848b8605Smrg  dst.x = (src0.x != src1.x) ? 1.0F : 0.0F
504848b8605Smrg
505848b8605Smrg  dst.y = (src0.y != src1.y) ? 1.0F : 0.0F
506848b8605Smrg
507848b8605Smrg  dst.z = (src0.z != src1.z) ? 1.0F : 0.0F
508848b8605Smrg
509848b8605Smrg  dst.w = (src0.w != src1.w) ? 1.0F : 0.0F
510848b8605Smrg
511848b8605Smrg
512848b8605Smrg.. opcode:: TEX - Texture Lookup
513848b8605Smrg
514848b8605Smrg  for array textures src0.y contains the slice for 1D,
515848b8605Smrg  and src0.z contain the slice for 2D.
516848b8605Smrg
517848b8605Smrg  for shadow textures with no arrays (and not cube map),
518848b8605Smrg  src0.z contains the reference value.
519848b8605Smrg
520848b8605Smrg  for shadow textures with arrays, src0.z contains
521848b8605Smrg  the reference value for 1D arrays, and src0.w contains
522848b8605Smrg  the reference value for 2D arrays and cube maps.
523848b8605Smrg
524848b8605Smrg  for cube map array shadow textures, the reference value
525848b8605Smrg  cannot be passed in src0.w, and TEX2 must be used instead.
526848b8605Smrg
527848b8605Smrg.. math::
528848b8605Smrg
529848b8605Smrg  coord = src0
530848b8605Smrg
531848b8605Smrg  shadow_ref = src0.z or src0.w (optional)
532848b8605Smrg
533848b8605Smrg  unit = src1
534848b8605Smrg
535848b8605Smrg  dst = texture\_sample(unit, coord, shadow_ref)
536848b8605Smrg
537848b8605Smrg
538848b8605Smrg.. opcode:: TEX2 - Texture Lookup (for shadow cube map arrays only)
539848b8605Smrg
540848b8605Smrg  this is the same as TEX, but uses another reg to encode the
541848b8605Smrg  reference value.
542848b8605Smrg
543848b8605Smrg.. math::
544848b8605Smrg
545848b8605Smrg  coord = src0
546848b8605Smrg
547848b8605Smrg  shadow_ref = src1.x
548848b8605Smrg
549848b8605Smrg  unit = src2
550848b8605Smrg
551848b8605Smrg  dst = texture\_sample(unit, coord, shadow_ref)
552848b8605Smrg
553848b8605Smrg
554848b8605Smrg
555848b8605Smrg
556848b8605Smrg.. opcode:: TXD - Texture Lookup with Derivatives
557848b8605Smrg
558848b8605Smrg.. math::
559848b8605Smrg
560848b8605Smrg  coord = src0
561848b8605Smrg
562848b8605Smrg  ddx = src1
563848b8605Smrg
564848b8605Smrg  ddy = src2
565848b8605Smrg
566848b8605Smrg  unit = src3
567848b8605Smrg
568848b8605Smrg  dst = texture\_sample\_deriv(unit, coord, ddx, ddy)
569848b8605Smrg
570848b8605Smrg
571848b8605Smrg.. opcode:: TXP - Projective Texture Lookup
572848b8605Smrg
573848b8605Smrg.. math::
574848b8605Smrg
575848b8605Smrg  coord.x = src0.x / src0.w
576848b8605Smrg
577848b8605Smrg  coord.y = src0.y / src0.w
578848b8605Smrg
579848b8605Smrg  coord.z = src0.z / src0.w
580848b8605Smrg
581848b8605Smrg  coord.w = src0.w
582848b8605Smrg
583848b8605Smrg  unit = src1
584848b8605Smrg
585848b8605Smrg  dst = texture\_sample(unit, coord)
586848b8605Smrg
587848b8605Smrg
588848b8605Smrg.. opcode:: UP2H - Unpack Two 16-Bit Floats
589848b8605Smrg
590b8e80941Smrg.. math::
591848b8605Smrg
592b8e80941Smrg  dst.x = f16\_to\_f32(src0.x \& 0xffff)
593848b8605Smrg
594b8e80941Smrg  dst.y = f16\_to\_f32(src0.x >> 16)
595848b8605Smrg
596b8e80941Smrg  dst.z = f16\_to\_f32(src0.x \& 0xffff)
597848b8605Smrg
598b8e80941Smrg  dst.w = f16\_to\_f32(src0.x >> 16)
599848b8605Smrg
600848b8605Smrg.. note::
601848b8605Smrg
602848b8605Smrg   Considered for removal.
603848b8605Smrg
604b8e80941Smrg.. opcode:: UP2US - Unpack Two Unsigned 16-Bit Scalars
605848b8605Smrg
606848b8605Smrg  TBD
607848b8605Smrg
608848b8605Smrg.. note::
609848b8605Smrg
610848b8605Smrg   Considered for removal.
611848b8605Smrg
612b8e80941Smrg.. opcode:: UP4B - Unpack Four Signed 8-Bit Values
613848b8605Smrg
614848b8605Smrg  TBD
615848b8605Smrg
616848b8605Smrg.. note::
617848b8605Smrg
618848b8605Smrg   Considered for removal.
619848b8605Smrg
620b8e80941Smrg.. opcode:: UP4UB - Unpack Four Unsigned 8-Bit Scalars
621848b8605Smrg
622848b8605Smrg  TBD
623848b8605Smrg
624848b8605Smrg.. note::
625848b8605Smrg
626848b8605Smrg   Considered for removal.
627848b8605Smrg
628b8e80941Smrg
629848b8605Smrg.. opcode:: ARR - Address Register Load With Round
630848b8605Smrg
631848b8605Smrg.. math::
632848b8605Smrg
633b8e80941Smrg  dst.x = (int) round(src.x)
634848b8605Smrg
635b8e80941Smrg  dst.y = (int) round(src.y)
636848b8605Smrg
637b8e80941Smrg  dst.z = (int) round(src.z)
638848b8605Smrg
639b8e80941Smrg  dst.w = (int) round(src.w)
640848b8605Smrg
641848b8605Smrg
642848b8605Smrg.. opcode:: SSG - Set Sign
643848b8605Smrg
644848b8605Smrg.. math::
645848b8605Smrg
646848b8605Smrg  dst.x = (src.x > 0) ? 1 : (src.x < 0) ? -1 : 0
647848b8605Smrg
648848b8605Smrg  dst.y = (src.y > 0) ? 1 : (src.y < 0) ? -1 : 0
649848b8605Smrg
650848b8605Smrg  dst.z = (src.z > 0) ? 1 : (src.z < 0) ? -1 : 0
651848b8605Smrg
652848b8605Smrg  dst.w = (src.w > 0) ? 1 : (src.w < 0) ? -1 : 0
653848b8605Smrg
654848b8605Smrg
655848b8605Smrg.. opcode:: CMP - Compare
656848b8605Smrg
657848b8605Smrg.. math::
658848b8605Smrg
659848b8605Smrg  dst.x = (src0.x < 0) ? src1.x : src2.x
660848b8605Smrg
661848b8605Smrg  dst.y = (src0.y < 0) ? src1.y : src2.y
662848b8605Smrg
663848b8605Smrg  dst.z = (src0.z < 0) ? src1.z : src2.z
664848b8605Smrg
665848b8605Smrg  dst.w = (src0.w < 0) ? src1.w : src2.w
666848b8605Smrg
667848b8605Smrg
668848b8605Smrg.. opcode:: KILL_IF - Conditional Discard
669848b8605Smrg
670848b8605Smrg  Conditional discard.  Allowed in fragment shaders only.
671848b8605Smrg
672848b8605Smrg.. math::
673848b8605Smrg
674848b8605Smrg  if (src.x < 0 || src.y < 0 || src.z < 0 || src.w < 0)
675848b8605Smrg    discard
676848b8605Smrg  endif
677848b8605Smrg
678848b8605Smrg
679848b8605Smrg.. opcode:: KILL - Discard
680848b8605Smrg
681848b8605Smrg  Unconditional discard.  Allowed in fragment shaders only.
682848b8605Smrg
683848b8605Smrg
684848b8605Smrg.. opcode:: TXB - Texture Lookup With Bias
685848b8605Smrg
686848b8605Smrg  for cube map array textures and shadow cube maps, the bias value
687848b8605Smrg  cannot be passed in src0.w, and TXB2 must be used instead.
688848b8605Smrg
689848b8605Smrg  if the target is a shadow texture, the reference value is always
690848b8605Smrg  in src.z (this prevents shadow 3d and shadow 2d arrays from
691848b8605Smrg  using this instruction, but this is not needed).
692848b8605Smrg
693848b8605Smrg.. math::
694848b8605Smrg
695848b8605Smrg  coord.x = src0.x
696848b8605Smrg
697848b8605Smrg  coord.y = src0.y
698848b8605Smrg
699848b8605Smrg  coord.z = src0.z
700848b8605Smrg
701848b8605Smrg  coord.w = none
702848b8605Smrg
703848b8605Smrg  bias = src0.w
704848b8605Smrg
705848b8605Smrg  unit = src1
706848b8605Smrg
707848b8605Smrg  dst = texture\_sample(unit, coord, bias)
708848b8605Smrg
709848b8605Smrg
710848b8605Smrg.. opcode:: TXB2 - Texture Lookup With Bias (some cube maps only)
711848b8605Smrg
712848b8605Smrg  this is the same as TXB, but uses another reg to encode the
713848b8605Smrg  lod bias value for cube map arrays and shadow cube maps.
714848b8605Smrg  Presumably shadow 2d arrays and shadow 3d targets could use
715848b8605Smrg  this encoding too, but this is not legal.
716848b8605Smrg
717848b8605Smrg  shadow cube map arrays are neither possible nor required.
718848b8605Smrg
719848b8605Smrg.. math::
720848b8605Smrg
721848b8605Smrg  coord = src0
722848b8605Smrg
723848b8605Smrg  bias = src1.x
724848b8605Smrg
725848b8605Smrg  unit = src2
726848b8605Smrg
727848b8605Smrg  dst = texture\_sample(unit, coord, bias)
728848b8605Smrg
729848b8605Smrg
730848b8605Smrg.. opcode:: DIV - Divide
731848b8605Smrg
732848b8605Smrg.. math::
733848b8605Smrg
734848b8605Smrg  dst.x = \frac{src0.x}{src1.x}
735848b8605Smrg
736848b8605Smrg  dst.y = \frac{src0.y}{src1.y}
737848b8605Smrg
738848b8605Smrg  dst.z = \frac{src0.z}{src1.z}
739848b8605Smrg
740848b8605Smrg  dst.w = \frac{src0.w}{src1.w}
741848b8605Smrg
742848b8605Smrg
743848b8605Smrg.. opcode:: DP2 - 2-component Dot Product
744848b8605Smrg
745848b8605SmrgThis instruction replicates its result.
746848b8605Smrg
747848b8605Smrg.. math::
748848b8605Smrg
749848b8605Smrg  dst = src0.x \times src1.x + src0.y \times src1.y
750848b8605Smrg
751848b8605Smrg
752b8e80941Smrg.. opcode:: TEX_LZ - Texture Lookup With LOD = 0
753b8e80941Smrg
754b8e80941Smrg  This is the same as TXL with LOD = 0. Like every texture opcode, it obeys
755b8e80941Smrg  pipe_sampler_view::u.tex.first_level and pipe_sampler_state::min_lod.
756b8e80941Smrg  There is no way to override those two in shaders.
757b8e80941Smrg
758b8e80941Smrg.. math::
759b8e80941Smrg
760b8e80941Smrg  coord.x = src0.x
761b8e80941Smrg
762b8e80941Smrg  coord.y = src0.y
763b8e80941Smrg
764b8e80941Smrg  coord.z = src0.z
765b8e80941Smrg
766b8e80941Smrg  coord.w = none
767b8e80941Smrg
768b8e80941Smrg  lod = 0
769b8e80941Smrg
770b8e80941Smrg  unit = src1
771b8e80941Smrg
772b8e80941Smrg  dst = texture\_sample(unit, coord, lod)
773b8e80941Smrg
774b8e80941Smrg
775848b8605Smrg.. opcode:: TXL - Texture Lookup With explicit LOD
776848b8605Smrg
777848b8605Smrg  for cube map array textures, the explicit lod value
778848b8605Smrg  cannot be passed in src0.w, and TXL2 must be used instead.
779848b8605Smrg
780848b8605Smrg  if the target is a shadow texture, the reference value is always
781848b8605Smrg  in src.z (this prevents shadow 3d / 2d array / cube targets from
782848b8605Smrg  using this instruction, but this is not needed).
783848b8605Smrg
784848b8605Smrg.. math::
785848b8605Smrg
786848b8605Smrg  coord.x = src0.x
787848b8605Smrg
788848b8605Smrg  coord.y = src0.y
789848b8605Smrg
790848b8605Smrg  coord.z = src0.z
791848b8605Smrg
792848b8605Smrg  coord.w = none
793848b8605Smrg
794848b8605Smrg  lod = src0.w
795848b8605Smrg
796848b8605Smrg  unit = src1
797848b8605Smrg
798848b8605Smrg  dst = texture\_sample(unit, coord, lod)
799848b8605Smrg
800848b8605Smrg
801848b8605Smrg.. opcode:: TXL2 - Texture Lookup With explicit LOD (for cube map arrays only)
802848b8605Smrg
803848b8605Smrg  this is the same as TXL, but uses another reg to encode the
804848b8605Smrg  explicit lod value.
805848b8605Smrg  Presumably shadow 3d / 2d array / cube targets could use
806848b8605Smrg  this encoding too, but this is not legal.
807848b8605Smrg
808848b8605Smrg  shadow cube map arrays are neither possible nor required.
809848b8605Smrg
810848b8605Smrg.. math::
811848b8605Smrg
812848b8605Smrg  coord = src0
813848b8605Smrg
814848b8605Smrg  lod = src1.x
815848b8605Smrg
816848b8605Smrg  unit = src2
817848b8605Smrg
818848b8605Smrg  dst = texture\_sample(unit, coord, lod)
819848b8605Smrg
820848b8605Smrg
821848b8605SmrgCompute ISA
822848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
823848b8605Smrg
824848b8605SmrgThese opcodes are primarily provided for special-use computational shaders.
825848b8605SmrgSupport for these opcodes indicated by a special pipe capability bit (TBD).
826848b8605Smrg
827848b8605SmrgXXX doesn't look like most of the opcodes really belong here.
828848b8605Smrg
829848b8605Smrg.. opcode:: CEIL - Ceiling
830848b8605Smrg
831848b8605Smrg.. math::
832848b8605Smrg
833848b8605Smrg  dst.x = \lceil src.x\rceil
834848b8605Smrg
835848b8605Smrg  dst.y = \lceil src.y\rceil
836848b8605Smrg
837848b8605Smrg  dst.z = \lceil src.z\rceil
838848b8605Smrg
839848b8605Smrg  dst.w = \lceil src.w\rceil
840848b8605Smrg
841848b8605Smrg
842848b8605Smrg.. opcode:: TRUNC - Truncate
843848b8605Smrg
844848b8605Smrg.. math::
845848b8605Smrg
846848b8605Smrg  dst.x = trunc(src.x)
847848b8605Smrg
848848b8605Smrg  dst.y = trunc(src.y)
849848b8605Smrg
850848b8605Smrg  dst.z = trunc(src.z)
851848b8605Smrg
852848b8605Smrg  dst.w = trunc(src.w)
853848b8605Smrg
854848b8605Smrg
855848b8605Smrg.. opcode:: MOD - Modulus
856848b8605Smrg
857848b8605Smrg.. math::
858848b8605Smrg
859848b8605Smrg  dst.x = src0.x \bmod src1.x
860848b8605Smrg
861848b8605Smrg  dst.y = src0.y \bmod src1.y
862848b8605Smrg
863848b8605Smrg  dst.z = src0.z \bmod src1.z
864848b8605Smrg
865848b8605Smrg  dst.w = src0.w \bmod src1.w
866848b8605Smrg
867848b8605Smrg
868848b8605Smrg.. opcode:: UARL - Integer Address Register Load
869848b8605Smrg
870848b8605Smrg  Moves the contents of the source register, assumed to be an integer, into the
871848b8605Smrg  destination register, which is assumed to be an address (ADDR) register.
872848b8605Smrg
873848b8605Smrg
874848b8605Smrg.. opcode:: TXF - Texel Fetch
875848b8605Smrg
876848b8605Smrg  As per NV_gpu_shader4, extract a single texel from a specified texture
877b8e80941Smrg  image or PIPE_BUFFER resource. The source sampler may not be a CUBE or
878b8e80941Smrg  SHADOW.  src 0 is a
879848b8605Smrg  four-component signed integer vector used to identify the single texel
880b8e80941Smrg  accessed. 3 components + level.  If the texture is multisampled, then
881b8e80941Smrg  the fourth component indicates the sample, not the mipmap level.
882b8e80941Smrg  Just like texture instructions, an optional
883848b8605Smrg  offset vector is provided, which is subject to various driver restrictions
884b8e80941Smrg  (regarding range, source of offsets). This instruction ignores the sampler
885b8e80941Smrg  state.
886b8e80941Smrg
887848b8605Smrg  TXF(uint_vec coord, int_vec offset).
888848b8605Smrg
889848b8605Smrg
890848b8605Smrg.. opcode:: TXQ - Texture Size Query
891848b8605Smrg
892848b8605Smrg  As per NV_gpu_program4, retrieve the dimensions of the texture depending on
893848b8605Smrg  the target. For 1D (width), 2D/RECT/CUBE (width, height), 3D (width, height,
894848b8605Smrg  depth), 1D array (width, layers), 2D array (width, height, layers).
895848b8605Smrg  Also return the number of accessible levels (last_level - first_level + 1)
896848b8605Smrg  in W.
897848b8605Smrg
898848b8605Smrg  For components which don't return a resource dimension, their value
899848b8605Smrg  is undefined.
900848b8605Smrg
901848b8605Smrg.. math::
902848b8605Smrg
903848b8605Smrg  lod = src0.x
904848b8605Smrg
905848b8605Smrg  dst.x = texture\_width(unit, lod)
906848b8605Smrg
907848b8605Smrg  dst.y = texture\_height(unit, lod)
908848b8605Smrg
909848b8605Smrg  dst.z = texture\_depth(unit, lod)
910848b8605Smrg
911848b8605Smrg  dst.w = texture\_levels(unit)
912848b8605Smrg
913b8e80941Smrg
914b8e80941Smrg.. opcode:: TXQS - Texture Samples Query
915b8e80941Smrg
916b8e80941Smrg  This retrieves the number of samples in the texture, and stores it
917b8e80941Smrg  into the x component as an unsigned integer. The other components are
918b8e80941Smrg  undefined.  If the texture is not multisampled, this function returns
919b8e80941Smrg  (1, undef, undef, undef).
920b8e80941Smrg
921b8e80941Smrg.. math::
922b8e80941Smrg
923b8e80941Smrg  dst.x = texture\_samples(unit)
924b8e80941Smrg
925b8e80941Smrg
926848b8605Smrg.. opcode:: TG4 - Texture Gather
927848b8605Smrg
928848b8605Smrg  As per ARB_texture_gather, gathers the four texels to be used in a bi-linear
929848b8605Smrg  filtering operation and packs them into a single register.  Only works with
930848b8605Smrg  2D, 2D array, cubemaps, and cubemaps arrays.  For 2D textures, only the
931848b8605Smrg  addressing modes of the sampler and the top level of any mip pyramid are
932848b8605Smrg  used. Set W to zero.  It behaves like the TEX instruction, but a filtered
933848b8605Smrg  sample is not generated. The four samples that contribute to filtering are
934848b8605Smrg  placed into xyzw in clockwise order, starting with the (u,v) texture
935848b8605Smrg  coordinate delta at the following locations (-, +), (+, +), (+, -), (-, -),
936848b8605Smrg  where the magnitude of the deltas are half a texel.
937848b8605Smrg
938848b8605Smrg  PIPE_CAP_TEXTURE_SM5 enhances this instruction to support shadow per-sample
939848b8605Smrg  depth compares, single component selection, and a non-constant offset. It
940848b8605Smrg  doesn't allow support for the GL independent offset to get i0,j0. This would
941848b8605Smrg  require another CAP is hw can do it natively. For now we lower that before
942848b8605Smrg  TGSI.
943848b8605Smrg
944848b8605Smrg.. math::
945848b8605Smrg
946848b8605Smrg   coord = src0
947848b8605Smrg
948848b8605Smrg   component = src1
949848b8605Smrg
950848b8605Smrg   dst = texture\_gather4 (unit, coord, component)
951848b8605Smrg
952848b8605Smrg(with SM5 - cube array shadow)
953848b8605Smrg
954848b8605Smrg.. math::
955848b8605Smrg
956848b8605Smrg   coord = src0
957848b8605Smrg
958848b8605Smrg   compare = src1
959848b8605Smrg
960848b8605Smrg   dst = texture\_gather (uint, coord, compare)
961848b8605Smrg
962848b8605Smrg.. opcode:: LODQ - level of detail query
963848b8605Smrg
964848b8605Smrg   Compute the LOD information that the texture pipe would use to access the
965848b8605Smrg   texture. The Y component contains the computed LOD lambda_prime. The X
966848b8605Smrg   component contains the LOD that will be accessed, based on min/max lod's
967848b8605Smrg   and mipmap filters.
968848b8605Smrg
969848b8605Smrg.. math::
970848b8605Smrg
971848b8605Smrg   coord = src0
972848b8605Smrg
973848b8605Smrg   dst.xy = lodq(uint, coord);
974848b8605Smrg
975b8e80941Smrg.. opcode:: CLOCK - retrieve the current shader time
976b8e80941Smrg
977b8e80941Smrg   Invoking this instruction multiple times in the same shader should
978b8e80941Smrg   cause monotonically increasing values to be returned. The values
979b8e80941Smrg   are implicitly 64-bit, so if fewer than 64 bits of precision are
980b8e80941Smrg   available, to provide expected wraparound semantics, the value
981b8e80941Smrg   should be shifted up so that the most significant bit of the time
982b8e80941Smrg   is the most significant bit of the 64-bit value.
983b8e80941Smrg
984b8e80941Smrg.. math::
985b8e80941Smrg
986b8e80941Smrg   dst.xy = clock()
987b8e80941Smrg
988b8e80941Smrg
989848b8605SmrgInteger ISA
990848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
991848b8605SmrgThese opcodes are used for integer operations.
992848b8605SmrgSupport for these opcodes indicated by PIPE_SHADER_CAP_INTEGERS (all of them?)
993848b8605Smrg
994848b8605Smrg
995848b8605Smrg.. opcode:: I2F - Signed Integer To Float
996848b8605Smrg
997848b8605Smrg   Rounding is unspecified (round to nearest even suggested).
998848b8605Smrg
999848b8605Smrg.. math::
1000848b8605Smrg
1001848b8605Smrg  dst.x = (float) src.x
1002848b8605Smrg
1003848b8605Smrg  dst.y = (float) src.y
1004848b8605Smrg
1005848b8605Smrg  dst.z = (float) src.z
1006848b8605Smrg
1007848b8605Smrg  dst.w = (float) src.w
1008848b8605Smrg
1009848b8605Smrg
1010848b8605Smrg.. opcode:: U2F - Unsigned Integer To Float
1011848b8605Smrg
1012848b8605Smrg   Rounding is unspecified (round to nearest even suggested).
1013848b8605Smrg
1014848b8605Smrg.. math::
1015848b8605Smrg
1016848b8605Smrg  dst.x = (float) src.x
1017848b8605Smrg
1018848b8605Smrg  dst.y = (float) src.y
1019848b8605Smrg
1020848b8605Smrg  dst.z = (float) src.z
1021848b8605Smrg
1022848b8605Smrg  dst.w = (float) src.w
1023848b8605Smrg
1024848b8605Smrg
1025848b8605Smrg.. opcode:: F2I - Float to Signed Integer
1026848b8605Smrg
1027848b8605Smrg   Rounding is towards zero (truncate).
1028848b8605Smrg   Values outside signed range (including NaNs) produce undefined results.
1029848b8605Smrg
1030848b8605Smrg.. math::
1031848b8605Smrg
1032848b8605Smrg  dst.x = (int) src.x
1033848b8605Smrg
1034848b8605Smrg  dst.y = (int) src.y
1035848b8605Smrg
1036848b8605Smrg  dst.z = (int) src.z
1037848b8605Smrg
1038848b8605Smrg  dst.w = (int) src.w
1039848b8605Smrg
1040848b8605Smrg
1041848b8605Smrg.. opcode:: F2U - Float to Unsigned Integer
1042848b8605Smrg
1043848b8605Smrg   Rounding is towards zero (truncate).
1044848b8605Smrg   Values outside unsigned range (including NaNs) produce undefined results.
1045848b8605Smrg
1046848b8605Smrg.. math::
1047848b8605Smrg
1048848b8605Smrg  dst.x = (unsigned) src.x
1049848b8605Smrg
1050848b8605Smrg  dst.y = (unsigned) src.y
1051848b8605Smrg
1052848b8605Smrg  dst.z = (unsigned) src.z
1053848b8605Smrg
1054848b8605Smrg  dst.w = (unsigned) src.w
1055848b8605Smrg
1056848b8605Smrg
1057848b8605Smrg.. opcode:: UADD - Integer Add
1058848b8605Smrg
1059848b8605Smrg   This instruction works the same for signed and unsigned integers.
1060848b8605Smrg   The low 32bit of the result is returned.
1061848b8605Smrg
1062848b8605Smrg.. math::
1063848b8605Smrg
1064848b8605Smrg  dst.x = src0.x + src1.x
1065848b8605Smrg
1066848b8605Smrg  dst.y = src0.y + src1.y
1067848b8605Smrg
1068848b8605Smrg  dst.z = src0.z + src1.z
1069848b8605Smrg
1070848b8605Smrg  dst.w = src0.w + src1.w
1071848b8605Smrg
1072848b8605Smrg
1073848b8605Smrg.. opcode:: UMAD - Integer Multiply And Add
1074848b8605Smrg
1075848b8605Smrg   This instruction works the same for signed and unsigned integers.
1076848b8605Smrg   The multiplication returns the low 32bit (as does the result itself).
1077848b8605Smrg
1078848b8605Smrg.. math::
1079848b8605Smrg
1080848b8605Smrg  dst.x = src0.x \times src1.x + src2.x
1081848b8605Smrg
1082848b8605Smrg  dst.y = src0.y \times src1.y + src2.y
1083848b8605Smrg
1084848b8605Smrg  dst.z = src0.z \times src1.z + src2.z
1085848b8605Smrg
1086848b8605Smrg  dst.w = src0.w \times src1.w + src2.w
1087848b8605Smrg
1088848b8605Smrg
1089848b8605Smrg.. opcode:: UMUL - Integer Multiply
1090848b8605Smrg
1091848b8605Smrg   This instruction works the same for signed and unsigned integers.
1092848b8605Smrg   The low 32bit of the result is returned.
1093848b8605Smrg
1094848b8605Smrg.. math::
1095848b8605Smrg
1096848b8605Smrg  dst.x = src0.x \times src1.x
1097848b8605Smrg
1098848b8605Smrg  dst.y = src0.y \times src1.y
1099848b8605Smrg
1100848b8605Smrg  dst.z = src0.z \times src1.z
1101848b8605Smrg
1102848b8605Smrg  dst.w = src0.w \times src1.w
1103848b8605Smrg
1104848b8605Smrg
1105848b8605Smrg.. opcode:: IMUL_HI - Signed Integer Multiply High Bits
1106848b8605Smrg
1107848b8605Smrg   The high 32bits of the multiplication of 2 signed integers are returned.
1108848b8605Smrg
1109848b8605Smrg.. math::
1110848b8605Smrg
1111848b8605Smrg  dst.x = (src0.x \times src1.x) >> 32
1112848b8605Smrg
1113848b8605Smrg  dst.y = (src0.y \times src1.y) >> 32
1114848b8605Smrg
1115848b8605Smrg  dst.z = (src0.z \times src1.z) >> 32
1116848b8605Smrg
1117848b8605Smrg  dst.w = (src0.w \times src1.w) >> 32
1118848b8605Smrg
1119848b8605Smrg
1120848b8605Smrg.. opcode:: UMUL_HI - Unsigned Integer Multiply High Bits
1121848b8605Smrg
1122848b8605Smrg   The high 32bits of the multiplication of 2 unsigned integers are returned.
1123848b8605Smrg
1124848b8605Smrg.. math::
1125848b8605Smrg
1126848b8605Smrg  dst.x = (src0.x \times src1.x) >> 32
1127848b8605Smrg
1128848b8605Smrg  dst.y = (src0.y \times src1.y) >> 32
1129848b8605Smrg
1130848b8605Smrg  dst.z = (src0.z \times src1.z) >> 32
1131848b8605Smrg
1132848b8605Smrg  dst.w = (src0.w \times src1.w) >> 32
1133848b8605Smrg
1134848b8605Smrg
1135848b8605Smrg.. opcode:: IDIV - Signed Integer Division
1136848b8605Smrg
1137848b8605Smrg   TBD: behavior for division by zero.
1138848b8605Smrg
1139848b8605Smrg.. math::
1140848b8605Smrg
1141b8e80941Smrg  dst.x = \frac{src0.x}{src1.x}
1142848b8605Smrg
1143b8e80941Smrg  dst.y = \frac{src0.y}{src1.y}
1144848b8605Smrg
1145b8e80941Smrg  dst.z = \frac{src0.z}{src1.z}
1146848b8605Smrg
1147b8e80941Smrg  dst.w = \frac{src0.w}{src1.w}
1148848b8605Smrg
1149848b8605Smrg
1150848b8605Smrg.. opcode:: UDIV - Unsigned Integer Division
1151848b8605Smrg
1152848b8605Smrg   For division by zero, 0xffffffff is returned.
1153848b8605Smrg
1154848b8605Smrg.. math::
1155848b8605Smrg
1156b8e80941Smrg  dst.x = \frac{src0.x}{src1.x}
1157848b8605Smrg
1158b8e80941Smrg  dst.y = \frac{src0.y}{src1.y}
1159848b8605Smrg
1160b8e80941Smrg  dst.z = \frac{src0.z}{src1.z}
1161848b8605Smrg
1162b8e80941Smrg  dst.w = \frac{src0.w}{src1.w}
1163848b8605Smrg
1164848b8605Smrg
1165848b8605Smrg.. opcode:: UMOD - Unsigned Integer Remainder
1166848b8605Smrg
1167848b8605Smrg   If second arg is zero, 0xffffffff is returned.
1168848b8605Smrg
1169848b8605Smrg.. math::
1170848b8605Smrg
1171b8e80941Smrg  dst.x = src0.x \bmod src1.x
1172848b8605Smrg
1173b8e80941Smrg  dst.y = src0.y \bmod src1.y
1174848b8605Smrg
1175b8e80941Smrg  dst.z = src0.z \bmod src1.z
1176848b8605Smrg
1177b8e80941Smrg  dst.w = src0.w \bmod src1.w
1178848b8605Smrg
1179848b8605Smrg
1180848b8605Smrg.. opcode:: NOT - Bitwise Not
1181848b8605Smrg
1182848b8605Smrg.. math::
1183848b8605Smrg
1184848b8605Smrg  dst.x = \sim src.x
1185848b8605Smrg
1186848b8605Smrg  dst.y = \sim src.y
1187848b8605Smrg
1188848b8605Smrg  dst.z = \sim src.z
1189848b8605Smrg
1190848b8605Smrg  dst.w = \sim src.w
1191848b8605Smrg
1192848b8605Smrg
1193848b8605Smrg.. opcode:: AND - Bitwise And
1194848b8605Smrg
1195848b8605Smrg.. math::
1196848b8605Smrg
1197848b8605Smrg  dst.x = src0.x \& src1.x
1198848b8605Smrg
1199848b8605Smrg  dst.y = src0.y \& src1.y
1200848b8605Smrg
1201848b8605Smrg  dst.z = src0.z \& src1.z
1202848b8605Smrg
1203848b8605Smrg  dst.w = src0.w \& src1.w
1204848b8605Smrg
1205848b8605Smrg
1206848b8605Smrg.. opcode:: OR - Bitwise Or
1207848b8605Smrg
1208848b8605Smrg.. math::
1209848b8605Smrg
1210848b8605Smrg  dst.x = src0.x | src1.x
1211848b8605Smrg
1212848b8605Smrg  dst.y = src0.y | src1.y
1213848b8605Smrg
1214848b8605Smrg  dst.z = src0.z | src1.z
1215848b8605Smrg
1216848b8605Smrg  dst.w = src0.w | src1.w
1217848b8605Smrg
1218848b8605Smrg
1219848b8605Smrg.. opcode:: XOR - Bitwise Xor
1220848b8605Smrg
1221848b8605Smrg.. math::
1222848b8605Smrg
1223848b8605Smrg  dst.x = src0.x \oplus src1.x
1224848b8605Smrg
1225848b8605Smrg  dst.y = src0.y \oplus src1.y
1226848b8605Smrg
1227848b8605Smrg  dst.z = src0.z \oplus src1.z
1228848b8605Smrg
1229848b8605Smrg  dst.w = src0.w \oplus src1.w
1230848b8605Smrg
1231848b8605Smrg
1232848b8605Smrg.. opcode:: IMAX - Maximum of Signed Integers
1233848b8605Smrg
1234848b8605Smrg.. math::
1235848b8605Smrg
1236848b8605Smrg  dst.x = max(src0.x, src1.x)
1237848b8605Smrg
1238848b8605Smrg  dst.y = max(src0.y, src1.y)
1239848b8605Smrg
1240848b8605Smrg  dst.z = max(src0.z, src1.z)
1241848b8605Smrg
1242848b8605Smrg  dst.w = max(src0.w, src1.w)
1243848b8605Smrg
1244848b8605Smrg
1245848b8605Smrg.. opcode:: UMAX - Maximum of Unsigned Integers
1246848b8605Smrg
1247848b8605Smrg.. math::
1248848b8605Smrg
1249848b8605Smrg  dst.x = max(src0.x, src1.x)
1250848b8605Smrg
1251848b8605Smrg  dst.y = max(src0.y, src1.y)
1252848b8605Smrg
1253848b8605Smrg  dst.z = max(src0.z, src1.z)
1254848b8605Smrg
1255848b8605Smrg  dst.w = max(src0.w, src1.w)
1256848b8605Smrg
1257848b8605Smrg
1258848b8605Smrg.. opcode:: IMIN - Minimum of Signed Integers
1259848b8605Smrg
1260848b8605Smrg.. math::
1261848b8605Smrg
1262848b8605Smrg  dst.x = min(src0.x, src1.x)
1263848b8605Smrg
1264848b8605Smrg  dst.y = min(src0.y, src1.y)
1265848b8605Smrg
1266848b8605Smrg  dst.z = min(src0.z, src1.z)
1267848b8605Smrg
1268848b8605Smrg  dst.w = min(src0.w, src1.w)
1269848b8605Smrg
1270848b8605Smrg
1271848b8605Smrg.. opcode:: UMIN - Minimum of Unsigned Integers
1272848b8605Smrg
1273848b8605Smrg.. math::
1274848b8605Smrg
1275848b8605Smrg  dst.x = min(src0.x, src1.x)
1276848b8605Smrg
1277848b8605Smrg  dst.y = min(src0.y, src1.y)
1278848b8605Smrg
1279848b8605Smrg  dst.z = min(src0.z, src1.z)
1280848b8605Smrg
1281848b8605Smrg  dst.w = min(src0.w, src1.w)
1282848b8605Smrg
1283848b8605Smrg
1284848b8605Smrg.. opcode:: SHL - Shift Left
1285848b8605Smrg
1286848b8605Smrg   The shift count is masked with 0x1f before the shift is applied.
1287848b8605Smrg
1288848b8605Smrg.. math::
1289848b8605Smrg
1290848b8605Smrg  dst.x = src0.x << (0x1f \& src1.x)
1291848b8605Smrg
1292848b8605Smrg  dst.y = src0.y << (0x1f \& src1.y)
1293848b8605Smrg
1294848b8605Smrg  dst.z = src0.z << (0x1f \& src1.z)
1295848b8605Smrg
1296848b8605Smrg  dst.w = src0.w << (0x1f \& src1.w)
1297848b8605Smrg
1298848b8605Smrg
1299848b8605Smrg.. opcode:: ISHR - Arithmetic Shift Right (of Signed Integer)
1300848b8605Smrg
1301848b8605Smrg   The shift count is masked with 0x1f before the shift is applied.
1302848b8605Smrg
1303848b8605Smrg.. math::
1304848b8605Smrg
1305848b8605Smrg  dst.x = src0.x >> (0x1f \& src1.x)
1306848b8605Smrg
1307848b8605Smrg  dst.y = src0.y >> (0x1f \& src1.y)
1308848b8605Smrg
1309848b8605Smrg  dst.z = src0.z >> (0x1f \& src1.z)
1310848b8605Smrg
1311848b8605Smrg  dst.w = src0.w >> (0x1f \& src1.w)
1312848b8605Smrg
1313848b8605Smrg
1314848b8605Smrg.. opcode:: USHR - Logical Shift Right
1315848b8605Smrg
1316848b8605Smrg   The shift count is masked with 0x1f before the shift is applied.
1317848b8605Smrg
1318848b8605Smrg.. math::
1319848b8605Smrg
1320848b8605Smrg  dst.x = src0.x >> (unsigned) (0x1f \& src1.x)
1321848b8605Smrg
1322848b8605Smrg  dst.y = src0.y >> (unsigned) (0x1f \& src1.y)
1323848b8605Smrg
1324848b8605Smrg  dst.z = src0.z >> (unsigned) (0x1f \& src1.z)
1325848b8605Smrg
1326848b8605Smrg  dst.w = src0.w >> (unsigned) (0x1f \& src1.w)
1327848b8605Smrg
1328848b8605Smrg
1329848b8605Smrg.. opcode:: UCMP - Integer Conditional Move
1330848b8605Smrg
1331848b8605Smrg.. math::
1332848b8605Smrg
1333848b8605Smrg  dst.x = src0.x ? src1.x : src2.x
1334848b8605Smrg
1335848b8605Smrg  dst.y = src0.y ? src1.y : src2.y
1336848b8605Smrg
1337848b8605Smrg  dst.z = src0.z ? src1.z : src2.z
1338848b8605Smrg
1339848b8605Smrg  dst.w = src0.w ? src1.w : src2.w
1340848b8605Smrg
1341848b8605Smrg
1342848b8605Smrg
1343848b8605Smrg.. opcode:: ISSG - Integer Set Sign
1344848b8605Smrg
1345848b8605Smrg.. math::
1346848b8605Smrg
1347848b8605Smrg  dst.x = (src0.x < 0) ? -1 : (src0.x > 0) ? 1 : 0
1348848b8605Smrg
1349848b8605Smrg  dst.y = (src0.y < 0) ? -1 : (src0.y > 0) ? 1 : 0
1350848b8605Smrg
1351848b8605Smrg  dst.z = (src0.z < 0) ? -1 : (src0.z > 0) ? 1 : 0
1352848b8605Smrg
1353848b8605Smrg  dst.w = (src0.w < 0) ? -1 : (src0.w > 0) ? 1 : 0
1354848b8605Smrg
1355848b8605Smrg
1356848b8605Smrg
1357848b8605Smrg.. opcode:: FSLT - Float Set On Less Than (ordered)
1358848b8605Smrg
1359848b8605Smrg   Same comparison as SLT but returns integer instead of 1.0/0.0 float
1360848b8605Smrg
1361848b8605Smrg.. math::
1362848b8605Smrg
1363848b8605Smrg  dst.x = (src0.x < src1.x) ? \sim 0 : 0
1364848b8605Smrg
1365848b8605Smrg  dst.y = (src0.y < src1.y) ? \sim 0 : 0
1366848b8605Smrg
1367848b8605Smrg  dst.z = (src0.z < src1.z) ? \sim 0 : 0
1368848b8605Smrg
1369848b8605Smrg  dst.w = (src0.w < src1.w) ? \sim 0 : 0
1370848b8605Smrg
1371848b8605Smrg
1372848b8605Smrg.. opcode:: ISLT - Signed Integer Set On Less Than
1373848b8605Smrg
1374848b8605Smrg.. math::
1375848b8605Smrg
1376848b8605Smrg  dst.x = (src0.x < src1.x) ? \sim 0 : 0
1377848b8605Smrg
1378848b8605Smrg  dst.y = (src0.y < src1.y) ? \sim 0 : 0
1379848b8605Smrg
1380848b8605Smrg  dst.z = (src0.z < src1.z) ? \sim 0 : 0
1381848b8605Smrg
1382848b8605Smrg  dst.w = (src0.w < src1.w) ? \sim 0 : 0
1383848b8605Smrg
1384848b8605Smrg
1385848b8605Smrg.. opcode:: USLT - Unsigned Integer Set On Less Than
1386848b8605Smrg
1387848b8605Smrg.. math::
1388848b8605Smrg
1389848b8605Smrg  dst.x = (src0.x < src1.x) ? \sim 0 : 0
1390848b8605Smrg
1391848b8605Smrg  dst.y = (src0.y < src1.y) ? \sim 0 : 0
1392848b8605Smrg
1393848b8605Smrg  dst.z = (src0.z < src1.z) ? \sim 0 : 0
1394848b8605Smrg
1395848b8605Smrg  dst.w = (src0.w < src1.w) ? \sim 0 : 0
1396848b8605Smrg
1397848b8605Smrg
1398848b8605Smrg.. opcode:: FSGE - Float Set On Greater Equal Than (ordered)
1399848b8605Smrg
1400848b8605Smrg   Same comparison as SGE but returns integer instead of 1.0/0.0 float
1401848b8605Smrg
1402848b8605Smrg.. math::
1403848b8605Smrg
1404848b8605Smrg  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1405848b8605Smrg
1406848b8605Smrg  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1407848b8605Smrg
1408848b8605Smrg  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1409848b8605Smrg
1410848b8605Smrg  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1411848b8605Smrg
1412848b8605Smrg
1413848b8605Smrg.. opcode:: ISGE - Signed Integer Set On Greater Equal Than
1414848b8605Smrg
1415848b8605Smrg.. math::
1416848b8605Smrg
1417848b8605Smrg  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1418848b8605Smrg
1419848b8605Smrg  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1420848b8605Smrg
1421848b8605Smrg  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1422848b8605Smrg
1423848b8605Smrg  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1424848b8605Smrg
1425848b8605Smrg
1426848b8605Smrg.. opcode:: USGE - Unsigned Integer Set On Greater Equal Than
1427848b8605Smrg
1428848b8605Smrg.. math::
1429848b8605Smrg
1430848b8605Smrg  dst.x = (src0.x >= src1.x) ? \sim 0 : 0
1431848b8605Smrg
1432848b8605Smrg  dst.y = (src0.y >= src1.y) ? \sim 0 : 0
1433848b8605Smrg
1434848b8605Smrg  dst.z = (src0.z >= src1.z) ? \sim 0 : 0
1435848b8605Smrg
1436848b8605Smrg  dst.w = (src0.w >= src1.w) ? \sim 0 : 0
1437848b8605Smrg
1438848b8605Smrg
1439848b8605Smrg.. opcode:: FSEQ - Float Set On Equal (ordered)
1440848b8605Smrg
1441848b8605Smrg   Same comparison as SEQ but returns integer instead of 1.0/0.0 float
1442848b8605Smrg
1443848b8605Smrg.. math::
1444848b8605Smrg
1445848b8605Smrg  dst.x = (src0.x == src1.x) ? \sim 0 : 0
1446848b8605Smrg
1447848b8605Smrg  dst.y = (src0.y == src1.y) ? \sim 0 : 0
1448848b8605Smrg
1449848b8605Smrg  dst.z = (src0.z == src1.z) ? \sim 0 : 0
1450848b8605Smrg
1451848b8605Smrg  dst.w = (src0.w == src1.w) ? \sim 0 : 0
1452848b8605Smrg
1453848b8605Smrg
1454848b8605Smrg.. opcode:: USEQ - Integer Set On Equal
1455848b8605Smrg
1456848b8605Smrg.. math::
1457848b8605Smrg
1458848b8605Smrg  dst.x = (src0.x == src1.x) ? \sim 0 : 0
1459848b8605Smrg
1460848b8605Smrg  dst.y = (src0.y == src1.y) ? \sim 0 : 0
1461848b8605Smrg
1462848b8605Smrg  dst.z = (src0.z == src1.z) ? \sim 0 : 0
1463848b8605Smrg
1464848b8605Smrg  dst.w = (src0.w == src1.w) ? \sim 0 : 0
1465848b8605Smrg
1466848b8605Smrg
1467848b8605Smrg.. opcode:: FSNE - Float Set On Not Equal (unordered)
1468848b8605Smrg
1469848b8605Smrg   Same comparison as SNE but returns integer instead of 1.0/0.0 float
1470848b8605Smrg
1471848b8605Smrg.. math::
1472848b8605Smrg
1473848b8605Smrg  dst.x = (src0.x != src1.x) ? \sim 0 : 0
1474848b8605Smrg
1475848b8605Smrg  dst.y = (src0.y != src1.y) ? \sim 0 : 0
1476848b8605Smrg
1477848b8605Smrg  dst.z = (src0.z != src1.z) ? \sim 0 : 0
1478848b8605Smrg
1479848b8605Smrg  dst.w = (src0.w != src1.w) ? \sim 0 : 0
1480848b8605Smrg
1481848b8605Smrg
1482848b8605Smrg.. opcode:: USNE - Integer Set On Not Equal
1483848b8605Smrg
1484848b8605Smrg.. math::
1485848b8605Smrg
1486848b8605Smrg  dst.x = (src0.x != src1.x) ? \sim 0 : 0
1487848b8605Smrg
1488848b8605Smrg  dst.y = (src0.y != src1.y) ? \sim 0 : 0
1489848b8605Smrg
1490848b8605Smrg  dst.z = (src0.z != src1.z) ? \sim 0 : 0
1491848b8605Smrg
1492848b8605Smrg  dst.w = (src0.w != src1.w) ? \sim 0 : 0
1493848b8605Smrg
1494848b8605Smrg
1495848b8605Smrg.. opcode:: INEG - Integer Negate
1496848b8605Smrg
1497848b8605Smrg  Two's complement.
1498848b8605Smrg
1499848b8605Smrg.. math::
1500848b8605Smrg
1501848b8605Smrg  dst.x = -src.x
1502848b8605Smrg
1503848b8605Smrg  dst.y = -src.y
1504848b8605Smrg
1505848b8605Smrg  dst.z = -src.z
1506848b8605Smrg
1507848b8605Smrg  dst.w = -src.w
1508848b8605Smrg
1509848b8605Smrg
1510848b8605Smrg.. opcode:: IABS - Integer Absolute Value
1511848b8605Smrg
1512848b8605Smrg.. math::
1513848b8605Smrg
1514848b8605Smrg  dst.x = |src.x|
1515848b8605Smrg
1516848b8605Smrg  dst.y = |src.y|
1517848b8605Smrg
1518848b8605Smrg  dst.z = |src.z|
1519848b8605Smrg
1520848b8605Smrg  dst.w = |src.w|
1521848b8605Smrg
1522848b8605SmrgBitwise ISA
1523848b8605Smrg^^^^^^^^^^^
1524848b8605SmrgThese opcodes are used for bit-level manipulation of integers.
1525848b8605Smrg
1526848b8605Smrg.. opcode:: IBFE - Signed Bitfield Extract
1527848b8605Smrg
1528b8e80941Smrg  Like GLSL bitfieldExtract. Extracts a set of bits from the input, and
1529b8e80941Smrg  sign-extends them if the high bit of the extracted window is set.
1530848b8605Smrg
1531848b8605Smrg  Pseudocode::
1532848b8605Smrg
1533848b8605Smrg    def ibfe(value, offset, bits):
1534b8e80941Smrg      if offset < 0 or bits < 0 or offset + bits > 32:
1535b8e80941Smrg        return undefined
1536848b8605Smrg      if bits == 0: return 0
1537848b8605Smrg      # Note: >> sign-extends
1538b8e80941Smrg      return (value << (32 - offset - bits)) >> (32 - bits)
1539848b8605Smrg
1540848b8605Smrg.. opcode:: UBFE - Unsigned Bitfield Extract
1541848b8605Smrg
1542b8e80941Smrg  Like GLSL bitfieldExtract. Extracts a set of bits from the input, without
1543b8e80941Smrg  any sign-extension.
1544848b8605Smrg
1545848b8605Smrg  Pseudocode::
1546848b8605Smrg
1547848b8605Smrg    def ubfe(value, offset, bits):
1548b8e80941Smrg      if offset < 0 or bits < 0 or offset + bits > 32:
1549b8e80941Smrg        return undefined
1550848b8605Smrg      if bits == 0: return 0
1551848b8605Smrg      # Note: >> does not sign-extend
1552b8e80941Smrg      return (value << (32 - offset - bits)) >> (32 - bits)
1553848b8605Smrg
1554848b8605Smrg.. opcode:: BFI - Bitfield Insert
1555848b8605Smrg
1556b8e80941Smrg  Like GLSL bitfieldInsert. Replaces a bit region of 'base' with the low bits
1557b8e80941Smrg  of 'insert'.
1558848b8605Smrg
1559848b8605Smrg  Pseudocode::
1560848b8605Smrg
1561848b8605Smrg    def bfi(base, insert, offset, bits):
1562b8e80941Smrg      if offset < 0 or bits < 0 or offset + bits > 32:
1563b8e80941Smrg        return undefined
1564b8e80941Smrg      # << defined such that mask == ~0 when bits == 32, offset == 0
1565848b8605Smrg      mask = ((1 << bits) - 1) << offset
1566848b8605Smrg      return ((insert << offset) & mask) | (base & ~mask)
1567848b8605Smrg
1568848b8605Smrg.. opcode:: BREV - Bitfield Reverse
1569848b8605Smrg
1570848b8605Smrg  See SM5 instruction BFREV. Reverses the bits of the argument.
1571848b8605Smrg
1572848b8605Smrg.. opcode:: POPC - Population Count
1573848b8605Smrg
1574848b8605Smrg  See SM5 instruction COUNTBITS. Counts the number of set bits in the argument.
1575848b8605Smrg
1576848b8605Smrg.. opcode:: LSB - Index of lowest set bit
1577848b8605Smrg
1578848b8605Smrg  See SM5 instruction FIRSTBIT_LO. Computes the 0-based index of the first set
1579848b8605Smrg  bit of the argument. Returns -1 if none are set.
1580848b8605Smrg
1581848b8605Smrg.. opcode:: IMSB - Index of highest non-sign bit
1582848b8605Smrg
1583848b8605Smrg  See SM5 instruction FIRSTBIT_SHI. Computes the 0-based index of the highest
1584848b8605Smrg  non-sign bit of the argument (i.e. highest 0 bit for negative numbers,
1585848b8605Smrg  highest 1 bit for positive numbers). Returns -1 if all bits are the same
1586848b8605Smrg  (i.e. for inputs 0 and -1).
1587848b8605Smrg
1588848b8605Smrg.. opcode:: UMSB - Index of highest set bit
1589848b8605Smrg
1590848b8605Smrg  See SM5 instruction FIRSTBIT_HI. Computes the 0-based index of the highest
1591848b8605Smrg  set bit of the argument. Returns -1 if none are set.
1592848b8605Smrg
1593848b8605SmrgGeometry ISA
1594848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1595848b8605Smrg
1596848b8605SmrgThese opcodes are only supported in geometry shaders; they have no meaning
1597848b8605Smrgin any other type of shader.
1598848b8605Smrg
1599848b8605Smrg.. opcode:: EMIT - Emit
1600848b8605Smrg
1601848b8605Smrg  Generate a new vertex for the current primitive into the specified vertex
1602848b8605Smrg  stream using the values in the output registers.
1603848b8605Smrg
1604848b8605Smrg
1605848b8605Smrg.. opcode:: ENDPRIM - End Primitive
1606848b8605Smrg
1607848b8605Smrg  Complete the current primitive in the specified vertex stream (consisting of
1608848b8605Smrg  the emitted vertices), and start a new one.
1609848b8605Smrg
1610848b8605Smrg
1611848b8605SmrgGLSL ISA
1612848b8605Smrg^^^^^^^^^^
1613848b8605Smrg
1614848b8605SmrgThese opcodes are part of :term:`GLSL`'s opcode set. Support for these
1615848b8605Smrgopcodes is determined by a special capability bit, ``GLSL``.
1616b8e80941SmrgSome require glsl version 1.30 (UIF/SWITCH/CASE/DEFAULT/ENDSWITCH).
1617848b8605Smrg
1618848b8605Smrg.. opcode:: CAL - Subroutine Call
1619848b8605Smrg
1620848b8605Smrg  push(pc)
1621848b8605Smrg  pc = target
1622848b8605Smrg
1623848b8605Smrg
1624848b8605Smrg.. opcode:: RET - Subroutine Call Return
1625848b8605Smrg
1626848b8605Smrg  pc = pop()
1627848b8605Smrg
1628848b8605Smrg
1629848b8605Smrg.. opcode:: CONT - Continue
1630848b8605Smrg
1631848b8605Smrg  Unconditionally moves the point of execution to the instruction after the
1632848b8605Smrg  last bgnloop. The instruction must appear within a bgnloop/endloop.
1633848b8605Smrg
1634848b8605Smrg.. note::
1635848b8605Smrg
1636848b8605Smrg   Support for CONT is determined by a special capability bit,
1637848b8605Smrg   ``TGSI_CONT_SUPPORTED``. See :ref:`Screen` for more information.
1638848b8605Smrg
1639848b8605Smrg
1640848b8605Smrg.. opcode:: BGNLOOP - Begin a Loop
1641848b8605Smrg
1642848b8605Smrg  Start a loop. Must have a matching endloop.
1643848b8605Smrg
1644848b8605Smrg
1645848b8605Smrg.. opcode:: BGNSUB - Begin Subroutine
1646848b8605Smrg
1647848b8605Smrg  Starts definition of a subroutine. Must have a matching endsub.
1648848b8605Smrg
1649848b8605Smrg
1650848b8605Smrg.. opcode:: ENDLOOP - End a Loop
1651848b8605Smrg
1652848b8605Smrg  End a loop started with bgnloop.
1653848b8605Smrg
1654848b8605Smrg
1655848b8605Smrg.. opcode:: ENDSUB - End Subroutine
1656848b8605Smrg
1657848b8605Smrg  Ends definition of a subroutine.
1658848b8605Smrg
1659848b8605Smrg
1660848b8605Smrg.. opcode:: NOP - No Operation
1661848b8605Smrg
1662848b8605Smrg  Do nothing.
1663848b8605Smrg
1664848b8605Smrg
1665848b8605Smrg.. opcode:: BRK - Break
1666848b8605Smrg
1667848b8605Smrg  Unconditionally moves the point of execution to the instruction after the
1668848b8605Smrg  next endloop or endswitch. The instruction must appear within a loop/endloop
1669848b8605Smrg  or switch/endswitch.
1670848b8605Smrg
1671848b8605Smrg
1672848b8605Smrg.. opcode:: IF - Float If
1673848b8605Smrg
1674848b8605Smrg  Start an IF ... ELSE .. ENDIF block.  Condition evaluates to true if
1675848b8605Smrg
1676848b8605Smrg    src0.x != 0.0
1677848b8605Smrg
1678848b8605Smrg  where src0.x is interpreted as a floating point register.
1679848b8605Smrg
1680848b8605Smrg
1681848b8605Smrg.. opcode:: UIF - Bitwise If
1682848b8605Smrg
1683848b8605Smrg  Start an UIF ... ELSE .. ENDIF block. Condition evaluates to true if
1684848b8605Smrg
1685848b8605Smrg    src0.x != 0
1686848b8605Smrg
1687848b8605Smrg  where src0.x is interpreted as an integer register.
1688848b8605Smrg
1689848b8605Smrg
1690848b8605Smrg.. opcode:: ELSE - Else
1691848b8605Smrg
1692848b8605Smrg  Starts an else block, after an IF or UIF statement.
1693848b8605Smrg
1694848b8605Smrg
1695848b8605Smrg.. opcode:: ENDIF - End If
1696848b8605Smrg
1697848b8605Smrg  Ends an IF or UIF block.
1698848b8605Smrg
1699848b8605Smrg
1700848b8605Smrg.. opcode:: SWITCH - Switch
1701848b8605Smrg
1702848b8605Smrg   Starts a C-style switch expression. The switch consists of one or multiple
1703848b8605Smrg   CASE statements, and at most one DEFAULT statement. Execution of a statement
1704848b8605Smrg   ends when a BRK is hit, but just like in C falling through to other cases
1705848b8605Smrg   without a break is allowed. Similarly, DEFAULT label is allowed anywhere not
1706848b8605Smrg   just as last statement, and fallthrough is allowed into/from it.
1707848b8605Smrg   CASE src arguments are evaluated at bit level against the SWITCH src argument.
1708848b8605Smrg
1709848b8605Smrg   Example::
1710848b8605Smrg
1711848b8605Smrg     SWITCH src[0].x
1712848b8605Smrg     CASE src[0].x
1713848b8605Smrg     (some instructions here)
1714848b8605Smrg     (optional BRK here)
1715848b8605Smrg     DEFAULT
1716848b8605Smrg     (some instructions here)
1717848b8605Smrg     (optional BRK here)
1718848b8605Smrg     CASE src[0].x
1719848b8605Smrg     (some instructions here)
1720848b8605Smrg     (optional BRK here)
1721848b8605Smrg     ENDSWITCH
1722848b8605Smrg
1723848b8605Smrg
1724848b8605Smrg.. opcode:: CASE - Switch case
1725848b8605Smrg
1726848b8605Smrg   This represents a switch case label. The src arg must be an integer immediate.
1727848b8605Smrg
1728848b8605Smrg
1729848b8605Smrg.. opcode:: DEFAULT - Switch default
1730848b8605Smrg
1731848b8605Smrg   This represents the default case in the switch, which is taken if no other
1732848b8605Smrg   case matches.
1733848b8605Smrg
1734848b8605Smrg
1735848b8605Smrg.. opcode:: ENDSWITCH - End of switch
1736848b8605Smrg
1737848b8605Smrg   Ends a switch expression.
1738848b8605Smrg
1739848b8605Smrg
1740848b8605SmrgInterpolation ISA
1741848b8605Smrg^^^^^^^^^^^^^^^^^
1742848b8605Smrg
1743848b8605SmrgThe interpolation instructions allow an input to be interpolated in a
1744848b8605Smrgdifferent way than its declaration. This corresponds to the GLSL 4.00
1745848b8605SmrginterpolateAt* functions. The first argument of each of these must come from
1746848b8605Smrg``TGSI_FILE_INPUT``.
1747848b8605Smrg
1748848b8605Smrg.. opcode:: INTERP_CENTROID - Interpolate at the centroid
1749848b8605Smrg
1750848b8605Smrg   Interpolates the varying specified by src0 at the centroid
1751848b8605Smrg
1752848b8605Smrg.. opcode:: INTERP_SAMPLE - Interpolate at the specified sample
1753848b8605Smrg
1754848b8605Smrg   Interpolates the varying specified by src0 at the sample id specified by
1755848b8605Smrg   src1.x (interpreted as an integer)
1756848b8605Smrg
1757848b8605Smrg.. opcode:: INTERP_OFFSET - Interpolate at the specified offset
1758848b8605Smrg
1759848b8605Smrg   Interpolates the varying specified by src0 at the offset src1.xy from the
1760848b8605Smrg   pixel center (interpreted as floats)
1761848b8605Smrg
1762848b8605Smrg
1763848b8605Smrg.. _doubleopcodes:
1764848b8605Smrg
1765848b8605SmrgDouble ISA
1766848b8605Smrg^^^^^^^^^^^^^^^
1767848b8605Smrg
1768848b8605SmrgThe double-precision opcodes reinterpret four-component vectors into
1769848b8605Smrgtwo-component vectors with doubled precision in each component.
1770848b8605Smrg
1771b8e80941Smrg.. opcode:: DABS - Absolute
1772b8e80941Smrg
1773b8e80941Smrg.. math::
1774b8e80941Smrg
1775b8e80941Smrg  dst.xy = |src0.xy|
1776b8e80941Smrg
1777b8e80941Smrg  dst.zw = |src0.zw|
1778848b8605Smrg
1779848b8605Smrg.. opcode:: DADD - Add
1780848b8605Smrg
1781848b8605Smrg.. math::
1782848b8605Smrg
1783848b8605Smrg  dst.xy = src0.xy + src1.xy
1784848b8605Smrg
1785848b8605Smrg  dst.zw = src0.zw + src1.zw
1786848b8605Smrg
1787b8e80941Smrg.. opcode:: DSEQ - Set on Equal
1788848b8605Smrg
1789848b8605Smrg.. math::
1790848b8605Smrg
1791b8e80941Smrg  dst.x = src0.xy == src1.xy ? \sim 0 : 0
1792848b8605Smrg
1793b8e80941Smrg  dst.z = src0.zw == src1.zw ? \sim 0 : 0
1794848b8605Smrg
1795b8e80941Smrg.. opcode:: DSNE - Set on Not Equal
1796848b8605Smrg
1797848b8605Smrg.. math::
1798848b8605Smrg
1799b8e80941Smrg  dst.x = src0.xy != src1.xy ? \sim 0 : 0
1800848b8605Smrg
1801b8e80941Smrg  dst.z = src0.zw != src1.zw ? \sim 0 : 0
1802848b8605Smrg
1803848b8605Smrg.. opcode:: DSLT - Set on Less than
1804848b8605Smrg
1805848b8605Smrg.. math::
1806848b8605Smrg
1807b8e80941Smrg  dst.x = src0.xy < src1.xy ? \sim 0 : 0
1808b8e80941Smrg
1809b8e80941Smrg  dst.z = src0.zw < src1.zw ? \sim 0 : 0
1810b8e80941Smrg
1811b8e80941Smrg.. opcode:: DSGE - Set on Greater equal
1812b8e80941Smrg
1813b8e80941Smrg.. math::
1814b8e80941Smrg
1815b8e80941Smrg  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
1816848b8605Smrg
1817b8e80941Smrg  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
1818848b8605Smrg
1819848b8605Smrg.. opcode:: DFRAC - Fraction
1820848b8605Smrg
1821848b8605Smrg.. math::
1822848b8605Smrg
1823848b8605Smrg  dst.xy = src.xy - \lfloor src.xy\rfloor
1824848b8605Smrg
1825848b8605Smrg  dst.zw = src.zw - \lfloor src.zw\rfloor
1826848b8605Smrg
1827b8e80941Smrg.. opcode:: DTRUNC - Truncate
1828b8e80941Smrg
1829b8e80941Smrg.. math::
1830b8e80941Smrg
1831b8e80941Smrg  dst.xy = trunc(src.xy)
1832b8e80941Smrg
1833b8e80941Smrg  dst.zw = trunc(src.zw)
1834b8e80941Smrg
1835b8e80941Smrg.. opcode:: DCEIL - Ceiling
1836b8e80941Smrg
1837b8e80941Smrg.. math::
1838b8e80941Smrg
1839b8e80941Smrg  dst.xy = \lceil src.xy\rceil
1840b8e80941Smrg
1841b8e80941Smrg  dst.zw = \lceil src.zw\rceil
1842b8e80941Smrg
1843b8e80941Smrg.. opcode:: DFLR - Floor
1844b8e80941Smrg
1845b8e80941Smrg.. math::
1846b8e80941Smrg
1847b8e80941Smrg  dst.xy = \lfloor src.xy\rfloor
1848b8e80941Smrg
1849b8e80941Smrg  dst.zw = \lfloor src.zw\rfloor
1850b8e80941Smrg
1851b8e80941Smrg.. opcode:: DROUND - Fraction
1852b8e80941Smrg
1853b8e80941Smrg.. math::
1854b8e80941Smrg
1855b8e80941Smrg  dst.xy = round(src.xy)
1856b8e80941Smrg
1857b8e80941Smrg  dst.zw = round(src.zw)
1858b8e80941Smrg
1859b8e80941Smrg.. opcode:: DSSG - Set Sign
1860b8e80941Smrg
1861b8e80941Smrg.. math::
1862b8e80941Smrg
1863b8e80941Smrg  dst.xy = (src.xy > 0) ? 1.0 : (src.xy < 0) ? -1.0 : 0.0
1864b8e80941Smrg
1865b8e80941Smrg  dst.zw = (src.zw > 0) ? 1.0 : (src.zw < 0) ? -1.0 : 0.0
1866848b8605Smrg
1867848b8605Smrg.. opcode:: DFRACEXP - Convert Number to Fractional and Integral Components
1868848b8605Smrg
1869848b8605SmrgLike the ``frexp()`` routine in many math libraries, this opcode stores the
1870848b8605Smrgexponent of its source to ``dst0``, and the significand to ``dst1``, such that
1871b8e80941Smrg:math:`dst1 \times 2^{dst0} = src` . The results are replicated across
1872b8e80941Smrgchannels.
1873848b8605Smrg
1874848b8605Smrg.. math::
1875848b8605Smrg
1876b8e80941Smrg  dst0.xy = dst.zw = frac(src.xy)
1877848b8605Smrg
1878b8e80941Smrg  dst1 = frac(src.xy)
1879848b8605Smrg
1880848b8605Smrg
1881848b8605Smrg.. opcode:: DLDEXP - Multiply Number by Integral Power of 2
1882848b8605Smrg
1883b8e80941SmrgThis opcode is the inverse of :opcode:`DFRACEXP`. The second
1884b8e80941Smrgsource is an integer.
1885848b8605Smrg
1886848b8605Smrg.. math::
1887848b8605Smrg
1888b8e80941Smrg  dst.xy = src0.xy \times 2^{src1.x}
1889848b8605Smrg
1890b8e80941Smrg  dst.zw = src0.zw \times 2^{src1.z}
1891848b8605Smrg
1892848b8605Smrg.. opcode:: DMIN - Minimum
1893848b8605Smrg
1894848b8605Smrg.. math::
1895848b8605Smrg
1896848b8605Smrg  dst.xy = min(src0.xy, src1.xy)
1897848b8605Smrg
1898848b8605Smrg  dst.zw = min(src0.zw, src1.zw)
1899848b8605Smrg
1900848b8605Smrg.. opcode:: DMAX - Maximum
1901848b8605Smrg
1902848b8605Smrg.. math::
1903848b8605Smrg
1904848b8605Smrg  dst.xy = max(src0.xy, src1.xy)
1905848b8605Smrg
1906848b8605Smrg  dst.zw = max(src0.zw, src1.zw)
1907848b8605Smrg
1908848b8605Smrg.. opcode:: DMUL - Multiply
1909848b8605Smrg
1910848b8605Smrg.. math::
1911848b8605Smrg
1912848b8605Smrg  dst.xy = src0.xy \times src1.xy
1913848b8605Smrg
1914848b8605Smrg  dst.zw = src0.zw \times src1.zw
1915848b8605Smrg
1916848b8605Smrg
1917848b8605Smrg.. opcode:: DMAD - Multiply And Add
1918848b8605Smrg
1919848b8605Smrg.. math::
1920848b8605Smrg
1921848b8605Smrg  dst.xy = src0.xy \times src1.xy + src2.xy
1922848b8605Smrg
1923848b8605Smrg  dst.zw = src0.zw \times src1.zw + src2.zw
1924848b8605Smrg
1925848b8605Smrg
1926b8e80941Smrg.. opcode:: DFMA - Fused Multiply-Add
1927b8e80941Smrg
1928b8e80941SmrgPerform a * b + c with no intermediate rounding step.
1929b8e80941Smrg
1930b8e80941Smrg.. math::
1931b8e80941Smrg
1932b8e80941Smrg  dst.xy = src0.xy \times src1.xy + src2.xy
1933b8e80941Smrg
1934b8e80941Smrg  dst.zw = src0.zw \times src1.zw + src2.zw
1935b8e80941Smrg
1936b8e80941Smrg
1937b8e80941Smrg.. opcode:: DDIV - Divide
1938b8e80941Smrg
1939b8e80941Smrg.. math::
1940b8e80941Smrg
1941b8e80941Smrg  dst.xy = \frac{src0.xy}{src1.xy}
1942b8e80941Smrg
1943b8e80941Smrg  dst.zw = \frac{src0.zw}{src1.zw}
1944b8e80941Smrg
1945b8e80941Smrg
1946848b8605Smrg.. opcode:: DRCP - Reciprocal
1947848b8605Smrg
1948848b8605Smrg.. math::
1949848b8605Smrg
1950848b8605Smrg   dst.xy = \frac{1}{src.xy}
1951848b8605Smrg
1952848b8605Smrg   dst.zw = \frac{1}{src.zw}
1953848b8605Smrg
1954848b8605Smrg.. opcode:: DSQRT - Square Root
1955848b8605Smrg
1956848b8605Smrg.. math::
1957848b8605Smrg
1958848b8605Smrg   dst.xy = \sqrt{src.xy}
1959848b8605Smrg
1960848b8605Smrg   dst.zw = \sqrt{src.zw}
1961848b8605Smrg
1962b8e80941Smrg.. opcode:: DRSQ - Reciprocal Square Root
1963b8e80941Smrg
1964b8e80941Smrg.. math::
1965b8e80941Smrg
1966b8e80941Smrg   dst.xy = \frac{1}{\sqrt{src.xy}}
1967b8e80941Smrg
1968b8e80941Smrg   dst.zw = \frac{1}{\sqrt{src.zw}}
1969b8e80941Smrg
1970b8e80941Smrg.. opcode:: F2D - Float to Double
1971b8e80941Smrg
1972b8e80941Smrg.. math::
1973b8e80941Smrg
1974b8e80941Smrg   dst.xy = double(src0.x)
1975b8e80941Smrg
1976b8e80941Smrg   dst.zw = double(src0.y)
1977b8e80941Smrg
1978b8e80941Smrg.. opcode:: D2F - Double to Float
1979b8e80941Smrg
1980b8e80941Smrg.. math::
1981b8e80941Smrg
1982b8e80941Smrg   dst.x = float(src0.xy)
1983b8e80941Smrg
1984b8e80941Smrg   dst.y = float(src0.zw)
1985b8e80941Smrg
1986b8e80941Smrg.. opcode:: I2D - Int to Double
1987b8e80941Smrg
1988b8e80941Smrg.. math::
1989b8e80941Smrg
1990b8e80941Smrg   dst.xy = double(src0.x)
1991b8e80941Smrg
1992b8e80941Smrg   dst.zw = double(src0.y)
1993b8e80941Smrg
1994b8e80941Smrg.. opcode:: D2I - Double to Int
1995b8e80941Smrg
1996b8e80941Smrg.. math::
1997b8e80941Smrg
1998b8e80941Smrg   dst.x = int(src0.xy)
1999b8e80941Smrg
2000b8e80941Smrg   dst.y = int(src0.zw)
2001b8e80941Smrg
2002b8e80941Smrg.. opcode:: U2D - Unsigned Int to Double
2003b8e80941Smrg
2004b8e80941Smrg.. math::
2005b8e80941Smrg
2006b8e80941Smrg   dst.xy = double(src0.x)
2007b8e80941Smrg
2008b8e80941Smrg   dst.zw = double(src0.y)
2009b8e80941Smrg
2010b8e80941Smrg.. opcode:: D2U - Double to Unsigned Int
2011b8e80941Smrg
2012b8e80941Smrg.. math::
2013b8e80941Smrg
2014b8e80941Smrg   dst.x = unsigned(src0.xy)
2015b8e80941Smrg
2016b8e80941Smrg   dst.y = unsigned(src0.zw)
2017b8e80941Smrg
2018b8e80941Smrg64-bit Integer ISA
2019b8e80941Smrg^^^^^^^^^^^^^^^^^^
2020b8e80941Smrg
2021b8e80941SmrgThe 64-bit integer opcodes reinterpret four-component vectors into
2022b8e80941Smrgtwo-component vectors with 64-bits in each component.
2023b8e80941Smrg
2024b8e80941Smrg.. opcode:: I64ABS - 64-bit Integer Absolute Value
2025b8e80941Smrg
2026b8e80941Smrg.. math::
2027b8e80941Smrg
2028b8e80941Smrg  dst.xy = |src0.xy|
2029b8e80941Smrg
2030b8e80941Smrg  dst.zw = |src0.zw|
2031b8e80941Smrg
2032b8e80941Smrg.. opcode:: I64NEG - 64-bit Integer Negate
2033b8e80941Smrg
2034b8e80941Smrg  Two's complement.
2035b8e80941Smrg
2036b8e80941Smrg.. math::
2037b8e80941Smrg
2038b8e80941Smrg  dst.xy = -src.xy
2039b8e80941Smrg
2040b8e80941Smrg  dst.zw = -src.zw
2041b8e80941Smrg
2042b8e80941Smrg.. opcode:: I64SSG - 64-bit Integer Set Sign
2043b8e80941Smrg
2044b8e80941Smrg.. math::
2045b8e80941Smrg
2046b8e80941Smrg  dst.xy = (src0.xy < 0) ? -1 : (src0.xy > 0) ? 1 : 0
2047b8e80941Smrg
2048b8e80941Smrg  dst.zw = (src0.zw < 0) ? -1 : (src0.zw > 0) ? 1 : 0
2049b8e80941Smrg
2050b8e80941Smrg.. opcode:: U64ADD - 64-bit Integer Add
2051b8e80941Smrg
2052b8e80941Smrg.. math::
2053b8e80941Smrg
2054b8e80941Smrg  dst.xy = src0.xy + src1.xy
2055b8e80941Smrg
2056b8e80941Smrg  dst.zw = src0.zw + src1.zw
2057b8e80941Smrg
2058b8e80941Smrg.. opcode:: U64MUL - 64-bit Integer Multiply
2059b8e80941Smrg
2060b8e80941Smrg.. math::
2061b8e80941Smrg
2062b8e80941Smrg  dst.xy = src0.xy * src1.xy
2063b8e80941Smrg
2064b8e80941Smrg  dst.zw = src0.zw * src1.zw
2065b8e80941Smrg
2066b8e80941Smrg.. opcode:: U64SEQ - 64-bit Integer Set on Equal
2067b8e80941Smrg
2068b8e80941Smrg.. math::
2069b8e80941Smrg
2070b8e80941Smrg  dst.x = src0.xy == src1.xy ? \sim 0 : 0
2071b8e80941Smrg
2072b8e80941Smrg  dst.z = src0.zw == src1.zw ? \sim 0 : 0
2073b8e80941Smrg
2074b8e80941Smrg.. opcode:: U64SNE - 64-bit Integer Set on Not Equal
2075b8e80941Smrg
2076b8e80941Smrg.. math::
2077b8e80941Smrg
2078b8e80941Smrg  dst.x = src0.xy != src1.xy ? \sim 0 : 0
2079b8e80941Smrg
2080b8e80941Smrg  dst.z = src0.zw != src1.zw ? \sim 0 : 0
2081b8e80941Smrg
2082b8e80941Smrg.. opcode:: U64SLT - 64-bit Unsigned Integer Set on Less Than
2083b8e80941Smrg
2084b8e80941Smrg.. math::
2085b8e80941Smrg
2086b8e80941Smrg  dst.x = src0.xy < src1.xy ? \sim 0 : 0
2087b8e80941Smrg
2088b8e80941Smrg  dst.z = src0.zw < src1.zw ? \sim 0 : 0
2089b8e80941Smrg
2090b8e80941Smrg.. opcode:: U64SGE - 64-bit Unsigned Integer Set on Greater Equal
2091b8e80941Smrg
2092b8e80941Smrg.. math::
2093b8e80941Smrg
2094b8e80941Smrg  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2095b8e80941Smrg
2096b8e80941Smrg  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2097b8e80941Smrg
2098b8e80941Smrg.. opcode:: I64SLT - 64-bit Signed Integer Set on Less Than
2099b8e80941Smrg
2100b8e80941Smrg.. math::
2101b8e80941Smrg
2102b8e80941Smrg  dst.x = src0.xy < src1.xy ? \sim 0 : 0
2103b8e80941Smrg
2104b8e80941Smrg  dst.z = src0.zw < src1.zw ? \sim 0 : 0
2105b8e80941Smrg
2106b8e80941Smrg.. opcode:: I64SGE - 64-bit Signed Integer Set on Greater Equal
2107b8e80941Smrg
2108b8e80941Smrg.. math::
2109b8e80941Smrg
2110b8e80941Smrg  dst.x = src0.xy >= src1.xy ? \sim 0 : 0
2111b8e80941Smrg
2112b8e80941Smrg  dst.z = src0.zw >= src1.zw ? \sim 0 : 0
2113b8e80941Smrg
2114b8e80941Smrg.. opcode:: I64MIN - Minimum of 64-bit Signed Integers
2115b8e80941Smrg
2116b8e80941Smrg.. math::
2117b8e80941Smrg
2118b8e80941Smrg  dst.xy = min(src0.xy, src1.xy)
2119b8e80941Smrg
2120b8e80941Smrg  dst.zw = min(src0.zw, src1.zw)
2121b8e80941Smrg
2122b8e80941Smrg.. opcode:: U64MIN - Minimum of 64-bit Unsigned Integers
2123b8e80941Smrg
2124b8e80941Smrg.. math::
2125b8e80941Smrg
2126b8e80941Smrg  dst.xy = min(src0.xy, src1.xy)
2127b8e80941Smrg
2128b8e80941Smrg  dst.zw = min(src0.zw, src1.zw)
2129b8e80941Smrg
2130b8e80941Smrg.. opcode:: I64MAX - Maximum of 64-bit Signed Integers
2131b8e80941Smrg
2132b8e80941Smrg.. math::
2133b8e80941Smrg
2134b8e80941Smrg  dst.xy = max(src0.xy, src1.xy)
2135b8e80941Smrg
2136b8e80941Smrg  dst.zw = max(src0.zw, src1.zw)
2137b8e80941Smrg
2138b8e80941Smrg.. opcode:: U64MAX - Maximum of 64-bit Unsigned Integers
2139b8e80941Smrg
2140b8e80941Smrg.. math::
2141b8e80941Smrg
2142b8e80941Smrg  dst.xy = max(src0.xy, src1.xy)
2143b8e80941Smrg
2144b8e80941Smrg  dst.zw = max(src0.zw, src1.zw)
2145b8e80941Smrg
2146b8e80941Smrg.. opcode:: U64SHL - Shift Left 64-bit Unsigned Integer
2147b8e80941Smrg
2148b8e80941Smrg   The shift count is masked with 0x3f before the shift is applied.
2149b8e80941Smrg
2150b8e80941Smrg.. math::
2151b8e80941Smrg
2152b8e80941Smrg  dst.xy = src0.xy << (0x3f \& src1.x)
2153b8e80941Smrg
2154b8e80941Smrg  dst.zw = src0.zw << (0x3f \& src1.y)
2155b8e80941Smrg
2156b8e80941Smrg.. opcode:: I64SHR - Arithmetic Shift Right (of 64-bit Signed Integer)
2157b8e80941Smrg
2158b8e80941Smrg   The shift count is masked with 0x3f before the shift is applied.
2159b8e80941Smrg
2160b8e80941Smrg.. math::
2161b8e80941Smrg
2162b8e80941Smrg  dst.xy = src0.xy >> (0x3f \& src1.x)
2163b8e80941Smrg
2164b8e80941Smrg  dst.zw = src0.zw >> (0x3f \& src1.y)
2165b8e80941Smrg
2166b8e80941Smrg.. opcode:: U64SHR - Logical Shift Right (of 64-bit Unsigned Integer)
2167b8e80941Smrg
2168b8e80941Smrg   The shift count is masked with 0x3f before the shift is applied.
2169b8e80941Smrg
2170b8e80941Smrg.. math::
2171b8e80941Smrg
2172b8e80941Smrg  dst.xy = src0.xy >> (unsigned) (0x3f \& src1.x)
2173b8e80941Smrg
2174b8e80941Smrg  dst.zw = src0.zw >> (unsigned) (0x3f \& src1.y)
2175b8e80941Smrg
2176b8e80941Smrg.. opcode:: I64DIV - 64-bit Signed Integer Division
2177b8e80941Smrg
2178b8e80941Smrg.. math::
2179b8e80941Smrg
2180b8e80941Smrg  dst.xy = \frac{src0.xy}{src1.xy}
2181b8e80941Smrg
2182b8e80941Smrg  dst.zw = \frac{src0.zw}{src1.zw}
2183b8e80941Smrg
2184b8e80941Smrg.. opcode:: U64DIV - 64-bit Unsigned Integer Division
2185b8e80941Smrg
2186b8e80941Smrg.. math::
2187b8e80941Smrg
2188b8e80941Smrg  dst.xy = \frac{src0.xy}{src1.xy}
2189b8e80941Smrg
2190b8e80941Smrg  dst.zw = \frac{src0.zw}{src1.zw}
2191b8e80941Smrg
2192b8e80941Smrg.. opcode:: U64MOD - 64-bit Unsigned Integer Remainder
2193b8e80941Smrg
2194b8e80941Smrg.. math::
2195b8e80941Smrg
2196b8e80941Smrg  dst.xy = src0.xy \bmod src1.xy
2197b8e80941Smrg
2198b8e80941Smrg  dst.zw = src0.zw \bmod src1.zw
2199b8e80941Smrg
2200b8e80941Smrg.. opcode:: I64MOD - 64-bit Signed Integer Remainder
2201b8e80941Smrg
2202b8e80941Smrg.. math::
2203b8e80941Smrg
2204b8e80941Smrg  dst.xy = src0.xy \bmod src1.xy
2205b8e80941Smrg
2206b8e80941Smrg  dst.zw = src0.zw \bmod src1.zw
2207b8e80941Smrg
2208b8e80941Smrg.. opcode:: F2U64 - Float to 64-bit Unsigned Int
2209b8e80941Smrg
2210b8e80941Smrg.. math::
2211b8e80941Smrg
2212b8e80941Smrg   dst.xy = (uint64_t) src0.x
2213b8e80941Smrg
2214b8e80941Smrg   dst.zw = (uint64_t) src0.y
2215b8e80941Smrg
2216b8e80941Smrg.. opcode:: F2I64 - Float to 64-bit Int
2217b8e80941Smrg
2218b8e80941Smrg.. math::
2219b8e80941Smrg
2220b8e80941Smrg   dst.xy = (int64_t) src0.x
2221b8e80941Smrg
2222b8e80941Smrg   dst.zw = (int64_t) src0.y
2223b8e80941Smrg
2224b8e80941Smrg.. opcode:: U2I64 - Unsigned Integer to 64-bit Integer
2225b8e80941Smrg
2226b8e80941Smrg   This is a zero extension.
2227b8e80941Smrg
2228b8e80941Smrg.. math::
2229b8e80941Smrg
2230b8e80941Smrg   dst.xy = (int64_t) src0.x
2231b8e80941Smrg
2232b8e80941Smrg   dst.zw = (int64_t) src0.y
2233b8e80941Smrg
2234b8e80941Smrg.. opcode:: I2I64 - Signed Integer to 64-bit Integer
2235b8e80941Smrg
2236b8e80941Smrg   This is a sign extension.
2237b8e80941Smrg
2238b8e80941Smrg.. math::
2239b8e80941Smrg
2240b8e80941Smrg   dst.xy = (int64_t) src0.x
2241b8e80941Smrg
2242b8e80941Smrg   dst.zw = (int64_t) src0.y
2243b8e80941Smrg
2244b8e80941Smrg.. opcode:: D2U64 - Double to 64-bit Unsigned Int
2245b8e80941Smrg
2246b8e80941Smrg.. math::
2247b8e80941Smrg
2248b8e80941Smrg   dst.xy = (uint64_t) src0.xy
2249b8e80941Smrg
2250b8e80941Smrg   dst.zw = (uint64_t) src0.zw
2251b8e80941Smrg
2252b8e80941Smrg.. opcode:: D2I64 - Double to 64-bit Int
2253b8e80941Smrg
2254b8e80941Smrg.. math::
2255b8e80941Smrg
2256b8e80941Smrg   dst.xy = (int64_t) src0.xy
2257b8e80941Smrg
2258b8e80941Smrg   dst.zw = (int64_t) src0.zw
2259b8e80941Smrg
2260b8e80941Smrg.. opcode:: U642F - 64-bit unsigned integer to float
2261b8e80941Smrg
2262b8e80941Smrg.. math::
2263b8e80941Smrg
2264b8e80941Smrg   dst.x = (float) src0.xy
2265b8e80941Smrg
2266b8e80941Smrg   dst.y = (float) src0.zw
2267b8e80941Smrg
2268b8e80941Smrg.. opcode:: I642F - 64-bit Int to Float
2269b8e80941Smrg
2270b8e80941Smrg.. math::
2271b8e80941Smrg
2272b8e80941Smrg   dst.x = (float) src0.xy
2273b8e80941Smrg
2274b8e80941Smrg   dst.y = (float) src0.zw
2275b8e80941Smrg
2276b8e80941Smrg.. opcode:: U642D - 64-bit unsigned integer to double
2277b8e80941Smrg
2278b8e80941Smrg.. math::
2279b8e80941Smrg
2280b8e80941Smrg   dst.xy = (double) src0.xy
2281b8e80941Smrg
2282b8e80941Smrg   dst.zw = (double) src0.zw
2283b8e80941Smrg
2284b8e80941Smrg.. opcode:: I642D - 64-bit Int to double
2285b8e80941Smrg
2286b8e80941Smrg.. math::
2287b8e80941Smrg
2288b8e80941Smrg   dst.xy = (double) src0.xy
2289b8e80941Smrg
2290b8e80941Smrg   dst.zw = (double) src0.zw
2291848b8605Smrg
2292848b8605Smrg.. _samplingopcodes:
2293848b8605Smrg
2294848b8605SmrgResource Sampling Opcodes
2295848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^^
2296848b8605Smrg
2297848b8605SmrgThose opcodes follow very closely semantics of the respective Direct3D
2298848b8605Smrginstructions. If in doubt double check Direct3D documentation.
2299848b8605SmrgNote that the swizzle on SVIEW (src1) determines texel swizzling
2300848b8605Smrgafter lookup.
2301848b8605Smrg
2302848b8605Smrg.. opcode:: SAMPLE
2303848b8605Smrg
2304848b8605Smrg  Using provided address, sample data from the specified texture using the
2305b8e80941Smrg  filtering mode identified by the given sampler. The source data may come from
2306848b8605Smrg  any resource type other than buffers.
2307848b8605Smrg
2308848b8605Smrg  Syntax: ``SAMPLE dst, address, sampler_view, sampler``
2309848b8605Smrg
2310848b8605Smrg  Example: ``SAMPLE TEMP[0], TEMP[1], SVIEW[0], SAMP[0]``
2311848b8605Smrg
2312848b8605Smrg.. opcode:: SAMPLE_I
2313848b8605Smrg
2314848b8605Smrg  Simplified alternative to the SAMPLE instruction.  Using the provided
2315848b8605Smrg  integer address, SAMPLE_I fetches data from the specified sampler view
2316848b8605Smrg  without any filtering.  The source data may come from any resource type
2317848b8605Smrg  other than CUBE.
2318848b8605Smrg
2319848b8605Smrg  Syntax: ``SAMPLE_I dst, address, sampler_view``
2320848b8605Smrg
2321848b8605Smrg  Example: ``SAMPLE_I TEMP[0], TEMP[1], SVIEW[0]``
2322848b8605Smrg
2323848b8605Smrg  The 'address' is specified as unsigned integers. If the 'address' is out of
2324848b8605Smrg  range [0...(# texels - 1)] the result of the fetch is always 0 in all
2325848b8605Smrg  components.  As such the instruction doesn't honor address wrap modes, in
2326848b8605Smrg  cases where that behavior is desirable 'SAMPLE' instruction should be used.
2327848b8605Smrg  address.w always provides an unsigned integer mipmap level. If the value is
2328848b8605Smrg  out of the range then the instruction always returns 0 in all components.
2329848b8605Smrg  address.yz are ignored for buffers and 1d textures.  address.z is ignored
2330848b8605Smrg  for 1d texture arrays and 2d textures.
2331848b8605Smrg
2332848b8605Smrg  For 1D texture arrays address.y provides the array index (also as unsigned
2333848b8605Smrg  integer). If the value is out of the range of available array indices
2334848b8605Smrg  [0... (array size - 1)] then the opcode always returns 0 in all components.
2335848b8605Smrg  For 2D texture arrays address.z provides the array index, otherwise it
2336848b8605Smrg  exhibits the same behavior as in the case for 1D texture arrays.  The exact
2337848b8605Smrg  semantics of the source address are presented in the table below:
2338848b8605Smrg
2339848b8605Smrg  +---------------------------+----+-----+-----+---------+
2340848b8605Smrg  | resource type             | X  |  Y  |  Z  |    W    |
2341848b8605Smrg  +===========================+====+=====+=====+=========+
2342848b8605Smrg  | ``PIPE_BUFFER``           | x  |     |     | ignored |
2343848b8605Smrg  +---------------------------+----+-----+-----+---------+
2344848b8605Smrg  | ``PIPE_TEXTURE_1D``       | x  |     |     |   mpl   |
2345848b8605Smrg  +---------------------------+----+-----+-----+---------+
2346848b8605Smrg  | ``PIPE_TEXTURE_2D``       | x  |  y  |     |   mpl   |
2347848b8605Smrg  +---------------------------+----+-----+-----+---------+
2348848b8605Smrg  | ``PIPE_TEXTURE_3D``       | x  |  y  |  z  |   mpl   |
2349848b8605Smrg  +---------------------------+----+-----+-----+---------+
2350848b8605Smrg  | ``PIPE_TEXTURE_RECT``     | x  |  y  |     |   mpl   |
2351848b8605Smrg  +---------------------------+----+-----+-----+---------+
2352848b8605Smrg  | ``PIPE_TEXTURE_CUBE``     | not allowed as source    |
2353848b8605Smrg  +---------------------------+----+-----+-----+---------+
2354848b8605Smrg  | ``PIPE_TEXTURE_1D_ARRAY`` | x  | idx |     |   mpl   |
2355848b8605Smrg  +---------------------------+----+-----+-----+---------+
2356848b8605Smrg  | ``PIPE_TEXTURE_2D_ARRAY`` | x  |  y  | idx |   mpl   |
2357848b8605Smrg  +---------------------------+----+-----+-----+---------+
2358848b8605Smrg
2359848b8605Smrg  Where 'mpl' is a mipmap level and 'idx' is the array index.
2360848b8605Smrg
2361848b8605Smrg.. opcode:: SAMPLE_I_MS
2362848b8605Smrg
2363848b8605Smrg  Just like SAMPLE_I but allows fetch data from multi-sampled surfaces.
2364848b8605Smrg
2365848b8605Smrg  Syntax: ``SAMPLE_I_MS dst, address, sampler_view, sample``
2366848b8605Smrg
2367848b8605Smrg.. opcode:: SAMPLE_B
2368848b8605Smrg
2369848b8605Smrg  Just like the SAMPLE instruction with the exception that an additional bias
2370848b8605Smrg  is applied to the level of detail computed as part of the instruction
2371848b8605Smrg  execution.
2372848b8605Smrg
2373848b8605Smrg  Syntax: ``SAMPLE_B dst, address, sampler_view, sampler, lod_bias``
2374848b8605Smrg
2375848b8605Smrg  Example: ``SAMPLE_B TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2376848b8605Smrg
2377848b8605Smrg.. opcode:: SAMPLE_C
2378848b8605Smrg
2379848b8605Smrg  Similar to the SAMPLE instruction but it performs a comparison filter. The
2380848b8605Smrg  operands to SAMPLE_C are identical to SAMPLE, except that there is an
2381848b8605Smrg  additional float32 operand, reference value, which must be a register with
2382848b8605Smrg  single-component, or a scalar literal.  SAMPLE_C makes the hardware use the
2383848b8605Smrg  current samplers compare_func (in pipe_sampler_state) to compare reference
2384848b8605Smrg  value against the red component value for the surce resource at each texel
2385848b8605Smrg  that the currently configured texture filter covers based on the provided
2386848b8605Smrg  coordinates.
2387848b8605Smrg
2388848b8605Smrg  Syntax: ``SAMPLE_C dst, address, sampler_view.r, sampler, ref_value``
2389848b8605Smrg
2390848b8605Smrg  Example: ``SAMPLE_C TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2391848b8605Smrg
2392848b8605Smrg.. opcode:: SAMPLE_C_LZ
2393848b8605Smrg
2394848b8605Smrg  Same as SAMPLE_C, but LOD is 0 and derivatives are ignored. The LZ stands
2395848b8605Smrg  for level-zero.
2396848b8605Smrg
2397848b8605Smrg  Syntax: ``SAMPLE_C_LZ dst, address, sampler_view.r, sampler, ref_value``
2398848b8605Smrg
2399848b8605Smrg  Example: ``SAMPLE_C_LZ TEMP[0], TEMP[1], SVIEW[0].r, SAMP[0], TEMP[2].x``
2400848b8605Smrg
2401848b8605Smrg
2402848b8605Smrg.. opcode:: SAMPLE_D
2403848b8605Smrg
2404848b8605Smrg  SAMPLE_D is identical to the SAMPLE opcode except that the derivatives for
2405848b8605Smrg  the source address in the x direction and the y direction are provided by
2406848b8605Smrg  extra parameters.
2407848b8605Smrg
2408848b8605Smrg  Syntax: ``SAMPLE_D dst, address, sampler_view, sampler, der_x, der_y``
2409848b8605Smrg
2410848b8605Smrg  Example: ``SAMPLE_D TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2], TEMP[3]``
2411848b8605Smrg
2412848b8605Smrg.. opcode:: SAMPLE_L
2413848b8605Smrg
2414848b8605Smrg  SAMPLE_L is identical to the SAMPLE opcode except that the LOD is provided
2415848b8605Smrg  directly as a scalar value, representing no anisotropy.
2416848b8605Smrg
2417848b8605Smrg  Syntax: ``SAMPLE_L dst, address, sampler_view, sampler, explicit_lod``
2418848b8605Smrg
2419848b8605Smrg  Example: ``SAMPLE_L TEMP[0], TEMP[1], SVIEW[0], SAMP[0], TEMP[2].x``
2420848b8605Smrg
2421848b8605Smrg.. opcode:: GATHER4
2422848b8605Smrg
2423848b8605Smrg  Gathers the four texels to be used in a bi-linear filtering operation and
2424848b8605Smrg  packs them into a single register.  Only works with 2D, 2D array, cubemaps,
2425848b8605Smrg  and cubemaps arrays.  For 2D textures, only the addressing modes of the
2426848b8605Smrg  sampler and the top level of any mip pyramid are used. Set W to zero.  It
2427848b8605Smrg  behaves like the SAMPLE instruction, but a filtered sample is not
2428848b8605Smrg  generated. The four samples that contribute to filtering are placed into
2429848b8605Smrg  xyzw in counter-clockwise order, starting with the (u,v) texture coordinate
2430848b8605Smrg  delta at the following locations (-, +), (+, +), (+, -), (-, -), where the
2431848b8605Smrg  magnitude of the deltas are half a texel.
2432848b8605Smrg
2433848b8605Smrg
2434848b8605Smrg.. opcode:: SVIEWINFO
2435848b8605Smrg
2436848b8605Smrg  Query the dimensions of a given sampler view.  dst receives width, height,
2437848b8605Smrg  depth or array size and number of mipmap levels as int4. The dst can have a
2438848b8605Smrg  writemask which will specify what info is the caller interested in.
2439848b8605Smrg
2440848b8605Smrg  Syntax: ``SVIEWINFO dst, src_mip_level, sampler_view``
2441848b8605Smrg
2442848b8605Smrg  Example: ``SVIEWINFO TEMP[0], TEMP[1].x, SVIEW[0]``
2443848b8605Smrg
2444848b8605Smrg  src_mip_level is an unsigned integer scalar. If it's out of range then
2445848b8605Smrg  returns 0 for width, height and depth/array size but the total number of
2446848b8605Smrg  mipmap is still returned correctly for the given sampler view.  The returned
2447848b8605Smrg  width, height and depth values are for the mipmap level selected by the
2448848b8605Smrg  src_mip_level and are in the number of texels.  For 1d texture array width
2449848b8605Smrg  is in dst.x, array size is in dst.y and dst.z is 0. The number of mipmaps is
2450848b8605Smrg  still in dst.w.  In contrast to d3d10 resinfo, there's no way in the tgsi
2451848b8605Smrg  instruction encoding to specify the return type (float/rcpfloat/uint), hence
2452848b8605Smrg  always using uint. Also, unlike the SAMPLE instructions, the swizzle on src1
2453848b8605Smrg  resinfo allowing swizzling dst values is ignored (due to the interaction
2454848b8605Smrg  with rcpfloat modifier which requires some swizzle handling in the state
2455848b8605Smrg  tracker anyway).
2456848b8605Smrg
2457848b8605Smrg.. opcode:: SAMPLE_POS
2458848b8605Smrg
2459b8e80941Smrg  Query the position of a sample in the given resource or render target
2460b8e80941Smrg  when per-sample fragment shading is in effect.
2461b8e80941Smrg
2462b8e80941Smrg  Syntax: ``SAMPLE_POS dst, source, sample_index``
2463b8e80941Smrg
2464b8e80941Smrg  dst receives float4 (x, y, undef, undef) indicated where the sample is
2465b8e80941Smrg  located. Sample locations are in the range [0, 1] where 0.5 is the center
2466b8e80941Smrg  of the fragment.
2467b8e80941Smrg
2468b8e80941Smrg  source is either a sampler view (to indicate a shader resource) or temp
2469b8e80941Smrg  register (to indicate the render target).  The source register may have
2470b8e80941Smrg  an optional swizzle to apply to the returned result
2471b8e80941Smrg
2472b8e80941Smrg  sample_index is an integer scalar indicating which sample position is to
2473b8e80941Smrg  be queried.
2474b8e80941Smrg
2475b8e80941Smrg  If per-sample shading is not in effect or the source resource or render
2476b8e80941Smrg  target is not multisampled, the result is (0.5, 0.5, undef, undef).
2477b8e80941Smrg
2478b8e80941Smrg  NOTE: no driver has implemented this opcode yet (and no state tracker
2479b8e80941Smrg  emits it).  This information is subject to change.
2480848b8605Smrg
2481848b8605Smrg.. opcode:: SAMPLE_INFO
2482848b8605Smrg
2483b8e80941Smrg  Query the number of samples in a multisampled resource or render target.
2484b8e80941Smrg
2485b8e80941Smrg  Syntax: ``SAMPLE_INFO dst, source``
2486b8e80941Smrg
2487b8e80941Smrg  dst receives int4 (n, 0, 0, 0) where n is the number of samples in a
2488b8e80941Smrg  resource or the render target.
2489b8e80941Smrg
2490b8e80941Smrg  source is either a sampler view (to indicate a shader resource) or temp
2491b8e80941Smrg  register (to indicate the render target).  The source register may have
2492b8e80941Smrg  an optional swizzle to apply to the returned result
2493b8e80941Smrg
2494b8e80941Smrg  If per-sample shading is not in effect or the source resource or render
2495b8e80941Smrg  target is not multisampled, the result is (1, 0, 0, 0).
2496b8e80941Smrg
2497b8e80941Smrg  NOTE: no driver has implemented this opcode yet (and no state tracker
2498b8e80941Smrg  emits it).  This information is subject to change.
2499b8e80941Smrg
2500b8e80941Smrg.. opcode:: LOD - level of detail
2501b8e80941Smrg
2502b8e80941Smrg   Same syntax as the SAMPLE opcode but instead of performing an actual
2503b8e80941Smrg   texture lookup/filter, return the computed LOD information that the
2504b8e80941Smrg   texture pipe would use to access the texture. The Y component contains
2505b8e80941Smrg   the computed LOD lambda_prime. The X component contains the LOD that will
2506b8e80941Smrg   be accessed, based on min/max lod's and mipmap filters.
2507b8e80941Smrg   The Z and W components are set to 0.
2508b8e80941Smrg
2509b8e80941Smrg   Syntax: ``LOD dst, address, sampler_view, sampler``
2510848b8605Smrg
2511848b8605Smrg
2512848b8605Smrg.. _resourceopcodes:
2513848b8605Smrg
2514848b8605SmrgResource Access Opcodes
2515848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^
2516848b8605Smrg
2517b8e80941SmrgFor these opcodes, the resource can be a BUFFER, IMAGE, or MEMORY.
2518b8e80941Smrg
2519b8e80941Smrg.. opcode:: LOAD - Fetch data from a shader buffer or image
2520848b8605Smrg
2521848b8605Smrg               Syntax: ``LOAD dst, resource, address``
2522848b8605Smrg
2523b8e80941Smrg               Example: ``LOAD TEMP[0], BUFFER[0], TEMP[1]``
2524848b8605Smrg
2525848b8605Smrg               Using the provided integer address, LOAD fetches data
2526848b8605Smrg               from the specified buffer or texture without any
2527848b8605Smrg               filtering.
2528848b8605Smrg
2529848b8605Smrg               The 'address' is specified as a vector of unsigned
2530848b8605Smrg               integers.  If the 'address' is out of range the result
2531848b8605Smrg               is unspecified.
2532848b8605Smrg
2533848b8605Smrg               Only the first mipmap level of a resource can be read
2534848b8605Smrg               from using this instruction.
2535848b8605Smrg
2536848b8605Smrg               For 1D or 2D texture arrays, the array index is
2537848b8605Smrg               provided as an unsigned integer in address.y or
2538848b8605Smrg               address.z, respectively.  address.yz are ignored for
2539848b8605Smrg               buffers and 1D textures.  address.z is ignored for 1D
2540848b8605Smrg               texture arrays and 2D textures.  address.w is always
2541848b8605Smrg               ignored.
2542848b8605Smrg
2543b8e80941Smrg               A swizzle suffix may be added to the resource argument
2544b8e80941Smrg               this will cause the resource data to be swizzled accordingly.
2545b8e80941Smrg
2546848b8605Smrg.. opcode:: STORE - Write data to a shader resource
2547848b8605Smrg
2548848b8605Smrg               Syntax: ``STORE resource, address, src``
2549848b8605Smrg
2550b8e80941Smrg               Example: ``STORE BUFFER[0], TEMP[0], TEMP[1]``
2551848b8605Smrg
2552848b8605Smrg               Using the provided integer address, STORE writes data
2553848b8605Smrg               to the specified buffer or texture.
2554848b8605Smrg
2555848b8605Smrg               The 'address' is specified as a vector of unsigned
2556848b8605Smrg               integers.  If the 'address' is out of range the result
2557848b8605Smrg               is unspecified.
2558848b8605Smrg
2559848b8605Smrg               Only the first mipmap level of a resource can be
2560848b8605Smrg               written to using this instruction.
2561848b8605Smrg
2562848b8605Smrg               For 1D or 2D texture arrays, the array index is
2563848b8605Smrg               provided as an unsigned integer in address.y or
2564848b8605Smrg               address.z, respectively.  address.yz are ignored for
2565848b8605Smrg               buffers and 1D textures.  address.z is ignored for 1D
2566848b8605Smrg               texture arrays and 2D textures.  address.w is always
2567848b8605Smrg               ignored.
2568848b8605Smrg
2569b8e80941Smrg.. opcode:: RESQ - Query information about a resource
2570848b8605Smrg
2571b8e80941Smrg  Syntax: ``RESQ dst, resource``
2572848b8605Smrg
2573b8e80941Smrg  Example: ``RESQ TEMP[0], BUFFER[0]``
2574848b8605Smrg
2575b8e80941Smrg  Returns information about the buffer or image resource. For buffer
2576b8e80941Smrg  resources, the size (in bytes) is returned in the x component. For
2577b8e80941Smrg  image resources, .xyz will contain the width/height/layers of the
2578b8e80941Smrg  image, while .w will contain the number of samples for multi-sampled
2579b8e80941Smrg  images.
2580b8e80941Smrg
2581b8e80941Smrg.. opcode:: FBFETCH - Load data from framebuffer
2582b8e80941Smrg
2583b8e80941Smrg  Syntax: ``FBFETCH dst, output``
2584b8e80941Smrg
2585b8e80941Smrg  Example: ``FBFETCH TEMP[0], OUT[0]``
2586848b8605Smrg
2587b8e80941Smrg  This is only valid on ``COLOR`` semantic outputs. Returns the color
2588b8e80941Smrg  of the current position in the framebuffer from before this fragment
2589b8e80941Smrg  shader invocation. May return the same value from multiple calls for
2590b8e80941Smrg  a particular output within a single invocation. Note that result may
2591b8e80941Smrg  be undefined if a fragment is drawn multiple times without a blend
2592b8e80941Smrg  barrier in between.
2593848b8605Smrg
2594848b8605Smrg
2595b8e80941Smrg.. _bindlessopcodes:
2596848b8605Smrg
2597b8e80941SmrgBindless Opcodes
2598b8e80941Smrg^^^^^^^^^^^^^^^^
2599848b8605Smrg
2600b8e80941SmrgThese opcodes are for working with bindless sampler or image handles and
2601b8e80941Smrgrequire PIPE_CAP_BINDLESS_TEXTURE.
2602848b8605Smrg
2603b8e80941Smrg.. opcode:: IMG2HND - Get a bindless handle for a image
2604848b8605Smrg
2605b8e80941Smrg  Syntax: ``IMG2HND dst, image``
2606848b8605Smrg
2607b8e80941Smrg  Example: ``IMG2HND TEMP[0], IMAGE[0]``
2608848b8605Smrg
2609b8e80941Smrg  Sets 'dst' to a bindless handle for 'image'.
2610848b8605Smrg
2611b8e80941Smrg.. opcode:: SAMP2HND - Get a bindless handle for a sampler
2612848b8605Smrg
2613b8e80941Smrg  Syntax: ``SAMP2HND dst, sampler``
2614848b8605Smrg
2615b8e80941Smrg  Example: ``SAMP2HND TEMP[0], SAMP[0]``
2616848b8605Smrg
2617b8e80941Smrg  Sets 'dst' to a bindless handle for 'sampler'.
2618848b8605Smrg
2619848b8605Smrg
2620b8e80941Smrg.. _threadsyncopcodes:
2621b8e80941Smrg
2622b8e80941SmrgInter-thread synchronization opcodes
2623b8e80941Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2624b8e80941Smrg
2625b8e80941SmrgThese opcodes are intended for communication between threads running
2626b8e80941Smrgwithin the same compute grid.  For now they're only valid in compute
2627b8e80941Smrgprograms.
2628848b8605Smrg
2629848b8605Smrg.. opcode:: BARRIER - Thread group barrier
2630848b8605Smrg
2631848b8605Smrg  ``BARRIER``
2632848b8605Smrg
2633848b8605Smrg  This opcode suspends the execution of the current thread until all
2634848b8605Smrg  the remaining threads in the working group reach the same point of
2635848b8605Smrg  the program.  Results are unspecified if any of the remaining
2636848b8605Smrg  threads terminates or never reaches an executed BARRIER instruction.
2637848b8605Smrg
2638b8e80941Smrg.. opcode:: MEMBAR - Memory barrier
2639b8e80941Smrg
2640b8e80941Smrg  ``MEMBAR type``
2641b8e80941Smrg
2642b8e80941Smrg  This opcode waits for the completion of all memory accesses based on
2643b8e80941Smrg  the type passed in. The type is an immediate bitfield with the following
2644b8e80941Smrg  meaning:
2645b8e80941Smrg
2646b8e80941Smrg  Bit 0: Shader storage buffers
2647b8e80941Smrg  Bit 1: Atomic buffers
2648b8e80941Smrg  Bit 2: Images
2649b8e80941Smrg  Bit 3: Shared memory
2650b8e80941Smrg  Bit 4: Thread group
2651b8e80941Smrg
2652b8e80941Smrg  These may be passed in in any combination. An implementation is free to not
2653b8e80941Smrg  distinguish between these as it sees fit. However these map to all the
2654b8e80941Smrg  possibilities made available by GLSL.
2655848b8605Smrg
2656848b8605Smrg.. _atomopcodes:
2657848b8605Smrg
2658848b8605SmrgAtomic opcodes
2659848b8605Smrg^^^^^^^^^^^^^^
2660848b8605Smrg
2661848b8605SmrgThese opcodes provide atomic variants of some common arithmetic and
2662848b8605Smrglogical operations.  In this context atomicity means that another
2663848b8605Smrgconcurrent memory access operation that affects the same memory
2664848b8605Smrglocation is guaranteed to be performed strictly before or after the
2665b8e80941Smrgentire execution of the atomic operation. The resource may be a BUFFER,
2666b8e80941SmrgIMAGE, HWATOMIC, or MEMORY.  In the case of an image, the offset works
2667b8e80941Smrgthe same as for ``LOAD`` and ``STORE``, specified above. For atomic
2668b8e80941Smrgcounters, the offset is an immediate index to the base hw atomic
2669b8e80941Smrgcounter for this operation.
2670b8e80941SmrgThese atomic operations may only be used with 32-bit integer image formats.
2671848b8605Smrg
2672848b8605Smrg.. opcode:: ATOMUADD - Atomic integer addition
2673848b8605Smrg
2674848b8605Smrg  Syntax: ``ATOMUADD dst, resource, offset, src``
2675848b8605Smrg
2676b8e80941Smrg  Example: ``ATOMUADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2677b8e80941Smrg
2678b8e80941Smrg  The following operation is performed atomically:
2679b8e80941Smrg
2680b8e80941Smrg.. math::
2681b8e80941Smrg
2682b8e80941Smrg  dst_x = resource[offset]
2683b8e80941Smrg
2684b8e80941Smrg  resource[offset] = dst_x + src_x
2685b8e80941Smrg
2686848b8605Smrg
2687b8e80941Smrg.. opcode:: ATOMFADD - Atomic floating point addition
2688b8e80941Smrg
2689b8e80941Smrg  Syntax: ``ATOMFADD dst, resource, offset, src``
2690b8e80941Smrg
2691b8e80941Smrg  Example: ``ATOMFADD TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2692b8e80941Smrg
2693b8e80941Smrg  The following operation is performed atomically:
2694848b8605Smrg
2695848b8605Smrg.. math::
2696848b8605Smrg
2697b8e80941Smrg  dst_x = resource[offset]
2698848b8605Smrg
2699b8e80941Smrg  resource[offset] = dst_x + src_x
2700848b8605Smrg
2701848b8605Smrg
2702848b8605Smrg.. opcode:: ATOMXCHG - Atomic exchange
2703848b8605Smrg
2704848b8605Smrg  Syntax: ``ATOMXCHG dst, resource, offset, src``
2705848b8605Smrg
2706b8e80941Smrg  Example: ``ATOMXCHG TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2707848b8605Smrg
2708b8e80941Smrg  The following operation is performed atomically:
2709848b8605Smrg
2710848b8605Smrg.. math::
2711848b8605Smrg
2712b8e80941Smrg  dst_x = resource[offset]
2713848b8605Smrg
2714b8e80941Smrg  resource[offset] = src_x
2715848b8605Smrg
2716848b8605Smrg
2717848b8605Smrg.. opcode:: ATOMCAS - Atomic compare-and-exchange
2718848b8605Smrg
2719848b8605Smrg  Syntax: ``ATOMCAS dst, resource, offset, cmp, src``
2720848b8605Smrg
2721b8e80941Smrg  Example: ``ATOMCAS TEMP[0], BUFFER[0], TEMP[1], TEMP[2], TEMP[3]``
2722848b8605Smrg
2723b8e80941Smrg  The following operation is performed atomically:
2724848b8605Smrg
2725848b8605Smrg.. math::
2726848b8605Smrg
2727b8e80941Smrg  dst_x = resource[offset]
2728848b8605Smrg
2729b8e80941Smrg  resource[offset] = (dst_x == cmp_x ? src_x : dst_x)
2730848b8605Smrg
2731848b8605Smrg
2732848b8605Smrg.. opcode:: ATOMAND - Atomic bitwise And
2733848b8605Smrg
2734848b8605Smrg  Syntax: ``ATOMAND dst, resource, offset, src``
2735848b8605Smrg
2736b8e80941Smrg  Example: ``ATOMAND TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2737848b8605Smrg
2738b8e80941Smrg  The following operation is performed atomically:
2739848b8605Smrg
2740848b8605Smrg.. math::
2741848b8605Smrg
2742b8e80941Smrg  dst_x = resource[offset]
2743848b8605Smrg
2744b8e80941Smrg  resource[offset] = dst_x \& src_x
2745848b8605Smrg
2746848b8605Smrg
2747848b8605Smrg.. opcode:: ATOMOR - Atomic bitwise Or
2748848b8605Smrg
2749848b8605Smrg  Syntax: ``ATOMOR dst, resource, offset, src``
2750848b8605Smrg
2751b8e80941Smrg  Example: ``ATOMOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2752848b8605Smrg
2753b8e80941Smrg  The following operation is performed atomically:
2754848b8605Smrg
2755848b8605Smrg.. math::
2756848b8605Smrg
2757b8e80941Smrg  dst_x = resource[offset]
2758848b8605Smrg
2759b8e80941Smrg  resource[offset] = dst_x | src_x
2760848b8605Smrg
2761848b8605Smrg
2762848b8605Smrg.. opcode:: ATOMXOR - Atomic bitwise Xor
2763848b8605Smrg
2764848b8605Smrg  Syntax: ``ATOMXOR dst, resource, offset, src``
2765848b8605Smrg
2766b8e80941Smrg  Example: ``ATOMXOR TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2767848b8605Smrg
2768b8e80941Smrg  The following operation is performed atomically:
2769848b8605Smrg
2770848b8605Smrg.. math::
2771848b8605Smrg
2772b8e80941Smrg  dst_x = resource[offset]
2773848b8605Smrg
2774b8e80941Smrg  resource[offset] = dst_x \oplus src_x
2775848b8605Smrg
2776848b8605Smrg
2777848b8605Smrg.. opcode:: ATOMUMIN - Atomic unsigned minimum
2778848b8605Smrg
2779848b8605Smrg  Syntax: ``ATOMUMIN dst, resource, offset, src``
2780848b8605Smrg
2781b8e80941Smrg  Example: ``ATOMUMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2782848b8605Smrg
2783b8e80941Smrg  The following operation is performed atomically:
2784848b8605Smrg
2785848b8605Smrg.. math::
2786848b8605Smrg
2787b8e80941Smrg  dst_x = resource[offset]
2788848b8605Smrg
2789b8e80941Smrg  resource[offset] = (dst_x < src_x ? dst_x : src_x)
2790848b8605Smrg
2791848b8605Smrg
2792848b8605Smrg.. opcode:: ATOMUMAX - Atomic unsigned maximum
2793848b8605Smrg
2794848b8605Smrg  Syntax: ``ATOMUMAX dst, resource, offset, src``
2795848b8605Smrg
2796b8e80941Smrg  Example: ``ATOMUMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2797848b8605Smrg
2798b8e80941Smrg  The following operation is performed atomically:
2799848b8605Smrg
2800848b8605Smrg.. math::
2801848b8605Smrg
2802b8e80941Smrg  dst_x = resource[offset]
2803848b8605Smrg
2804b8e80941Smrg  resource[offset] = (dst_x > src_x ? dst_x : src_x)
2805848b8605Smrg
2806848b8605Smrg
2807848b8605Smrg.. opcode:: ATOMIMIN - Atomic signed minimum
2808848b8605Smrg
2809848b8605Smrg  Syntax: ``ATOMIMIN dst, resource, offset, src``
2810848b8605Smrg
2811b8e80941Smrg  Example: ``ATOMIMIN TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2812848b8605Smrg
2813b8e80941Smrg  The following operation is performed atomically:
2814848b8605Smrg
2815848b8605Smrg.. math::
2816848b8605Smrg
2817b8e80941Smrg  dst_x = resource[offset]
2818848b8605Smrg
2819b8e80941Smrg  resource[offset] = (dst_x < src_x ? dst_x : src_x)
2820848b8605Smrg
2821848b8605Smrg
2822848b8605Smrg.. opcode:: ATOMIMAX - Atomic signed maximum
2823848b8605Smrg
2824848b8605Smrg  Syntax: ``ATOMIMAX dst, resource, offset, src``
2825848b8605Smrg
2826b8e80941Smrg  Example: ``ATOMIMAX TEMP[0], BUFFER[0], TEMP[1], TEMP[2]``
2827848b8605Smrg
2828b8e80941Smrg  The following operation is performed atomically:
2829848b8605Smrg
2830848b8605Smrg.. math::
2831848b8605Smrg
2832b8e80941Smrg  dst_x = resource[offset]
2833b8e80941Smrg
2834b8e80941Smrg  resource[offset] = (dst_x > src_x ? dst_x : src_x)
2835b8e80941Smrg
2836b8e80941Smrg
2837b8e80941Smrg.. _interlaneopcodes:
2838b8e80941Smrg
2839b8e80941SmrgInter-lane opcodes
2840b8e80941Smrg^^^^^^^^^^^^^^^^^^
2841b8e80941Smrg
2842b8e80941SmrgThese opcodes reduce the given value across the shader invocations
2843b8e80941Smrgrunning in the current SIMD group. Every thread in the subgroup will receive
2844b8e80941Smrgthe same result. The BALLOT operations accept a single-channel argument that
2845b8e80941Smrgis treated as a boolean and produce a 64-bit value.
2846b8e80941Smrg
2847b8e80941Smrg.. opcode:: VOTE_ANY - Value is set in any of the active invocations
2848b8e80941Smrg
2849b8e80941Smrg  Syntax: ``VOTE_ANY dst, value``
2850b8e80941Smrg
2851b8e80941Smrg  Example: ``VOTE_ANY TEMP[0].x, TEMP[1].x``
2852b8e80941Smrg
2853b8e80941Smrg
2854b8e80941Smrg.. opcode:: VOTE_ALL - Value is set in all of the active invocations
2855b8e80941Smrg
2856b8e80941Smrg  Syntax: ``VOTE_ALL dst, value``
2857b8e80941Smrg
2858b8e80941Smrg  Example: ``VOTE_ALL TEMP[0].x, TEMP[1].x``
2859b8e80941Smrg
2860b8e80941Smrg
2861b8e80941Smrg.. opcode:: VOTE_EQ - Value is the same in all of the active invocations
2862b8e80941Smrg
2863b8e80941Smrg  Syntax: ``VOTE_EQ dst, value``
2864b8e80941Smrg
2865b8e80941Smrg  Example: ``VOTE_EQ TEMP[0].x, TEMP[1].x``
2866b8e80941Smrg
2867b8e80941Smrg
2868b8e80941Smrg.. opcode:: BALLOT - Lanemask of whether the value is set in each active
2869b8e80941Smrg            invocation
2870b8e80941Smrg
2871b8e80941Smrg  Syntax: ``BALLOT dst, value``
2872b8e80941Smrg
2873b8e80941Smrg  Example: ``BALLOT TEMP[0].xy, TEMP[1].x``
2874b8e80941Smrg
2875b8e80941Smrg  When the argument is a constant true, this produces a bitmask of active
2876b8e80941Smrg  invocations. In fragment shaders, this can include helper invocations
2877b8e80941Smrg  (invocations whose outputs and writes to memory are discarded, but which
2878b8e80941Smrg  are used to compute derivatives).
2879b8e80941Smrg
2880b8e80941Smrg
2881b8e80941Smrg.. opcode:: READ_FIRST - Broadcast the value from the first active
2882b8e80941Smrg            invocation to all active lanes
2883848b8605Smrg
2884b8e80941Smrg  Syntax: ``READ_FIRST dst, value``
2885848b8605Smrg
2886b8e80941Smrg  Example: ``READ_FIRST TEMP[0], TEMP[1]``
2887b8e80941Smrg
2888b8e80941Smrg
2889b8e80941Smrg.. opcode:: READ_INVOC - Retrieve the value from the given invocation
2890b8e80941Smrg            (need not be uniform)
2891b8e80941Smrg
2892b8e80941Smrg  Syntax: ``READ_INVOC dst, value, invocation``
2893b8e80941Smrg
2894b8e80941Smrg  Example: ``READ_INVOC TEMP[0].xy, TEMP[1].xy, TEMP[2].x``
2895b8e80941Smrg
2896b8e80941Smrg  invocation.x controls the invocation number to read from for all channels.
2897b8e80941Smrg  The invocation number must be the same across all active invocations in a
2898b8e80941Smrg  sub-group; otherwise, the results are undefined.
2899848b8605Smrg
2900848b8605Smrg
2901848b8605SmrgExplanation of symbols used
2902848b8605Smrg------------------------------
2903848b8605Smrg
2904848b8605Smrg
2905848b8605SmrgFunctions
2906848b8605Smrg^^^^^^^^^^^^^^
2907848b8605Smrg
2908848b8605Smrg
2909848b8605Smrg  :math:`|x|`       Absolute value of `x`.
2910848b8605Smrg
2911848b8605Smrg  :math:`\lceil x \rceil` Ceiling of `x`.
2912848b8605Smrg
2913848b8605Smrg  clamp(x,y,z)      Clamp x between y and z.
2914848b8605Smrg                    (x < y) ? y : (x > z) ? z : x
2915848b8605Smrg
2916848b8605Smrg  :math:`\lfloor x\rfloor` Floor of `x`.
2917848b8605Smrg
2918848b8605Smrg  :math:`\log_2{x}` Logarithm of `x`, base 2.
2919848b8605Smrg
2920848b8605Smrg  max(x,y)          Maximum of x and y.
2921848b8605Smrg                    (x > y) ? x : y
2922848b8605Smrg
2923848b8605Smrg  min(x,y)          Minimum of x and y.
2924848b8605Smrg                    (x < y) ? x : y
2925848b8605Smrg
2926848b8605Smrg  partialx(x)       Derivative of x relative to fragment's X.
2927848b8605Smrg
2928848b8605Smrg  partialy(x)       Derivative of x relative to fragment's Y.
2929848b8605Smrg
2930848b8605Smrg  pop()             Pop from stack.
2931848b8605Smrg
2932848b8605Smrg  :math:`x^y`       `x` to the power `y`.
2933848b8605Smrg
2934848b8605Smrg  push(x)           Push x on stack.
2935848b8605Smrg
2936848b8605Smrg  round(x)          Round x.
2937848b8605Smrg
2938848b8605Smrg  trunc(x)          Truncate x, i.e. drop the fraction bits.
2939848b8605Smrg
2940848b8605Smrg
2941848b8605SmrgKeywords
2942848b8605Smrg^^^^^^^^^^^^^
2943848b8605Smrg
2944848b8605Smrg
2945848b8605Smrg  discard           Discard fragment.
2946848b8605Smrg
2947848b8605Smrg  pc                Program counter.
2948848b8605Smrg
2949848b8605Smrg  target            Label of target instruction.
2950848b8605Smrg
2951848b8605Smrg
2952848b8605SmrgOther tokens
2953848b8605Smrg---------------
2954848b8605Smrg
2955848b8605Smrg
2956848b8605SmrgDeclaration
2957848b8605Smrg^^^^^^^^^^^
2958848b8605Smrg
2959848b8605Smrg
2960848b8605SmrgDeclares a register that is will be referenced as an operand in Instruction
2961848b8605Smrgtokens.
2962848b8605Smrg
2963848b8605SmrgFile field contains register file that is being declared and is one
2964848b8605Smrgof TGSI_FILE.
2965848b8605Smrg
2966848b8605SmrgUsageMask field specifies which of the register components can be accessed
2967848b8605Smrgand is one of TGSI_WRITEMASK.
2968848b8605Smrg
2969848b8605SmrgThe Local flag specifies that a given value isn't intended for
2970848b8605Smrgsubroutine parameter passing and, as a result, the implementation
2971848b8605Smrgisn't required to give any guarantees of it being preserved across
2972848b8605Smrgsubroutine boundaries.  As it's merely a compiler hint, the
2973848b8605Smrgimplementation is free to ignore it.
2974848b8605Smrg
2975848b8605SmrgIf Dimension flag is set to 1, a Declaration Dimension token follows.
2976848b8605Smrg
2977848b8605SmrgIf Semantic flag is set to 1, a Declaration Semantic token follows.
2978848b8605Smrg
2979848b8605SmrgIf Interpolate flag is set to 1, a Declaration Interpolate token follows.
2980848b8605Smrg
2981848b8605SmrgIf file is TGSI_FILE_RESOURCE, a Declaration Resource token follows.
2982848b8605Smrg
2983848b8605SmrgIf Array flag is set to 1, a Declaration Array token follows.
2984848b8605Smrg
2985848b8605SmrgArray Declaration
2986848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
2987848b8605Smrg
2988848b8605SmrgDeclarations can optional have an ArrayID attribute which can be referred by
2989b8e80941Smrgindirect addressing operands. An ArrayID of zero is reserved and treated as
2990848b8605Smrgif no ArrayID is specified.
2991848b8605Smrg
2992848b8605SmrgIf an indirect addressing operand refers to a specific declaration by using
2993848b8605Smrgan ArrayID only the registers in this declaration are guaranteed to be
2994848b8605Smrgaccessed, accessing any register outside this declaration results in undefined
2995848b8605Smrgbehavior. Note that for compatibility the effective index is zero-based and
2996848b8605Smrgnot relative to the specified declaration
2997848b8605Smrg
2998848b8605SmrgIf no ArrayID is specified with an indirect addressing operand the whole
2999848b8605Smrgregister file might be accessed by this operand. This is strongly discouraged
3000848b8605Smrgand will prevent packing of scalar/vec2 arrays and effective alias analysis.
3001b8e80941SmrgThis is only legal for TEMP and CONST register files.
3002848b8605Smrg
3003848b8605SmrgDeclaration Semantic
3004848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
3005848b8605Smrg
3006848b8605SmrgVertex and fragment shader input and output registers may be labeled
3007848b8605Smrgwith semantic information consisting of a name and index.
3008848b8605Smrg
3009848b8605SmrgFollows Declaration token if Semantic bit is set.
3010848b8605Smrg
3011848b8605SmrgSince its purpose is to link a shader with other stages of the pipeline,
3012848b8605Smrgit is valid to follow only those Declaration tokens that declare a register
3013848b8605Smrgeither in INPUT or OUTPUT file.
3014848b8605Smrg
3015848b8605SmrgSemanticName field contains the semantic name of the register being declared.
3016848b8605SmrgThere is no default value.
3017848b8605Smrg
3018848b8605SmrgSemanticIndex is an optional subscript that can be used to distinguish
3019848b8605Smrgdifferent register declarations with the same semantic name. The default value
3020848b8605Smrgis 0.
3021848b8605Smrg
3022848b8605SmrgThe meanings of the individual semantic names are explained in the following
3023848b8605Smrgsections.
3024848b8605Smrg
3025848b8605SmrgTGSI_SEMANTIC_POSITION
3026848b8605Smrg""""""""""""""""""""""
3027848b8605Smrg
3028848b8605SmrgFor vertex shaders, TGSI_SEMANTIC_POSITION indicates the vertex shader
3029848b8605Smrgoutput register which contains the homogeneous vertex position in the clip
3030848b8605Smrgspace coordinate system.  After clipping, the X, Y and Z components of the
3031848b8605Smrgvertex will be divided by the W value to get normalized device coordinates.
3032848b8605Smrg
3033848b8605SmrgFor fragment shaders, TGSI_SEMANTIC_POSITION is used to indicate that
3034b8e80941Smrgfragment shader input (or system value, depending on which one is
3035b8e80941Smrgsupported by the driver) contains the fragment's window position.  The X
3036848b8605Smrgcomponent starts at zero and always increases from left to right.
3037848b8605SmrgThe Y component starts at zero and always increases but Y=0 may either
3038848b8605Smrgindicate the top of the window or the bottom depending on the fragment
3039848b8605Smrgcoordinate origin convention (see TGSI_PROPERTY_FS_COORD_ORIGIN).
3040848b8605SmrgThe Z coordinate ranges from 0 to 1 to represent depth from the front
3041b8e80941Smrgto the back of the Z buffer.  The W component contains the interpolated
3042b8e80941Smrgreciprocal of the vertex position W component (corresponding to gl_Fragcoord,
3043b8e80941Smrgbut unlike d3d10 which interpolates the same 1/w but then gives back
3044b8e80941Smrgthe reciprocal of the interpolated value).
3045848b8605Smrg
3046848b8605SmrgFragment shaders may also declare an output register with
3047848b8605SmrgTGSI_SEMANTIC_POSITION.  Only the Z component is writable.  This allows
3048848b8605Smrgthe fragment shader to change the fragment's Z position.
3049848b8605Smrg
3050848b8605Smrg
3051848b8605Smrg
3052848b8605SmrgTGSI_SEMANTIC_COLOR
3053848b8605Smrg"""""""""""""""""""
3054848b8605Smrg
3055848b8605SmrgFor vertex shader outputs or fragment shader inputs/outputs, this
3056b8e80941Smrglabel indicates that the register contains an R,G,B,A color.
3057848b8605Smrg
3058848b8605SmrgSeveral shader inputs/outputs may contain colors so the semantic index
3059848b8605Smrgis used to distinguish them.  For example, color[0] may be the diffuse
3060848b8605Smrgcolor while color[1] may be the specular color.
3061848b8605Smrg
3062848b8605SmrgThis label is needed so that the flat/smooth shading can be applied
3063848b8605Smrgto the right interpolants during rasterization.
3064848b8605Smrg
3065848b8605Smrg
3066848b8605Smrg
3067848b8605SmrgTGSI_SEMANTIC_BCOLOR
3068848b8605Smrg""""""""""""""""""""
3069848b8605Smrg
3070848b8605SmrgBack-facing colors are only used for back-facing polygons, and are only valid
3071848b8605Smrgin vertex shader outputs. After rasterization, all polygons are front-facing
3072848b8605Smrgand COLOR and BCOLOR end up occupying the same slots in the fragment shader,
3073848b8605Smrgso all BCOLORs effectively become regular COLORs in the fragment shader.
3074848b8605Smrg
3075848b8605Smrg
3076848b8605SmrgTGSI_SEMANTIC_FOG
3077848b8605Smrg"""""""""""""""""
3078848b8605Smrg
3079848b8605SmrgVertex shader inputs and outputs and fragment shader inputs may be
3080848b8605Smrglabeled with TGSI_SEMANTIC_FOG to indicate that the register contains
3081848b8605Smrga fog coordinate.  Typically, the fragment shader will use the fog coordinate
3082848b8605Smrgto compute a fog blend factor which is used to blend the normal fragment color
3083848b8605Smrgwith a constant fog color.  But fog coord really is just an ordinary vec4
3084848b8605Smrgregister like regular semantics.
3085848b8605Smrg
3086848b8605Smrg
3087848b8605SmrgTGSI_SEMANTIC_PSIZE
3088848b8605Smrg"""""""""""""""""""
3089848b8605Smrg
3090848b8605SmrgVertex shader input and output registers may be labeled with
3091848b8605SmrgTGIS_SEMANTIC_PSIZE to indicate that the register contains a point size
3092848b8605Smrgin the form (S, 0, 0, 1).  The point size controls the width or diameter
3093848b8605Smrgof points for rasterization.  This label cannot be used in fragment
3094848b8605Smrgshaders.
3095848b8605Smrg
3096848b8605SmrgWhen using this semantic, be sure to set the appropriate state in the
3097848b8605Smrg:ref:`rasterizer` first.
3098848b8605Smrg
3099848b8605Smrg
3100848b8605SmrgTGSI_SEMANTIC_TEXCOORD
3101848b8605Smrg""""""""""""""""""""""
3102848b8605Smrg
3103848b8605SmrgOnly available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3104848b8605Smrg
3105848b8605SmrgVertex shader outputs and fragment shader inputs may be labeled with
3106848b8605Smrgthis semantic to make them replaceable by sprite coordinates via the
3107848b8605Smrgsprite_coord_enable state in the :ref:`rasterizer`.
3108848b8605SmrgThe semantic index permitted with this semantic is limited to <= 7.
3109848b8605Smrg
3110848b8605SmrgIf the driver does not support TEXCOORD, sprite coordinate replacement
3111848b8605Smrgapplies to inputs with the GENERIC semantic instead.
3112848b8605Smrg
3113848b8605SmrgThe intended use case for this semantic is gl_TexCoord.
3114848b8605Smrg
3115848b8605Smrg
3116848b8605SmrgTGSI_SEMANTIC_PCOORD
3117848b8605Smrg""""""""""""""""""""
3118848b8605Smrg
3119848b8605SmrgOnly available if PIPE_CAP_TGSI_TEXCOORD is exposed !
3120848b8605Smrg
3121848b8605SmrgFragment shader inputs may be labeled with TGSI_SEMANTIC_PCOORD to indicate
3122848b8605Smrgthat the register contains sprite coordinates in the form (x, y, 0, 1), if
3123848b8605Smrgthe current primitive is a point and point sprites are enabled. Otherwise,
3124848b8605Smrgthe contents of the register are undefined.
3125848b8605Smrg
3126848b8605SmrgThe intended use case for this semantic is gl_PointCoord.
3127848b8605Smrg
3128848b8605Smrg
3129848b8605SmrgTGSI_SEMANTIC_GENERIC
3130848b8605Smrg"""""""""""""""""""""
3131848b8605Smrg
3132848b8605SmrgAll vertex/fragment shader inputs/outputs not labeled with any other
3133848b8605Smrgsemantic label can be considered to be generic attributes.  Typical
3134848b8605Smrguses of generic inputs/outputs are texcoords and user-defined values.
3135848b8605Smrg
3136848b8605Smrg
3137848b8605SmrgTGSI_SEMANTIC_NORMAL
3138848b8605Smrg""""""""""""""""""""
3139848b8605Smrg
3140848b8605SmrgIndicates that a vertex shader input is a normal vector.  This is
3141848b8605Smrgtypically only used for legacy graphics APIs.
3142848b8605Smrg
3143848b8605Smrg
3144848b8605SmrgTGSI_SEMANTIC_FACE
3145848b8605Smrg""""""""""""""""""
3146848b8605Smrg
3147b8e80941SmrgThis label applies to fragment shader inputs (or system values,
3148b8e80941Smrgdepending on which one is supported by the driver) and indicates that
3149b8e80941Smrgthe register contains front/back-face information.
3150b8e80941Smrg
3151b8e80941SmrgIf it is an input, it will be a floating-point vector in the form (F, 0, 0, 1),
3152b8e80941Smrgwhere F will be positive when the fragment belongs to a front-facing polygon,
3153b8e80941Smrgand negative when the fragment belongs to a back-facing polygon.
3154b8e80941Smrg
3155b8e80941SmrgIf it is a system value, it will be an integer vector in the form (F, 0, 0, 1),
3156b8e80941Smrgwhere F is 0xffffffff when the fragment belongs to a front-facing polygon and
3157b8e80941Smrg0 when the fragment belongs to a back-facing polygon.
3158848b8605Smrg
3159848b8605Smrg
3160848b8605SmrgTGSI_SEMANTIC_EDGEFLAG
3161848b8605Smrg""""""""""""""""""""""
3162848b8605Smrg
3163848b8605SmrgFor vertex shaders, this sematic label indicates that an input or
3164848b8605Smrgoutput is a boolean edge flag.  The register layout is [F, x, x, x]
3165848b8605Smrgwhere F is 0.0 or 1.0 and x = don't care.  Normally, the vertex shader
3166848b8605Smrgsimply copies the edge flag input to the edgeflag output.
3167848b8605Smrg
3168848b8605SmrgEdge flags are used to control which lines or points are actually
3169848b8605Smrgdrawn when the polygon mode converts triangles/quads/polygons into
3170848b8605Smrgpoints or lines.
3171848b8605Smrg
3172848b8605Smrg
3173848b8605SmrgTGSI_SEMANTIC_STENCIL
3174848b8605Smrg"""""""""""""""""""""
3175848b8605Smrg
3176848b8605SmrgFor fragment shaders, this semantic label indicates that an output
3177848b8605Smrgis a writable stencil reference value. Only the Y component is writable.
3178848b8605SmrgThis allows the fragment shader to change the fragments stencilref value.
3179848b8605Smrg
3180848b8605Smrg
3181848b8605SmrgTGSI_SEMANTIC_VIEWPORT_INDEX
3182848b8605Smrg""""""""""""""""""""""""""""
3183848b8605Smrg
3184848b8605SmrgFor geometry shaders, this semantic label indicates that an output
3185848b8605Smrgcontains the index of the viewport (and scissor) to use.
3186b8e80941SmrgThis is an integer value, and only the X component is used.
3187b8e80941Smrg
3188b8e80941SmrgIf PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
3189b8e80941Smrgsupported, then this semantic label can also be used in vertex or
3190b8e80941Smrgtessellation evaluation shaders, respectively. Only the value written in the
3191b8e80941Smrglast vertex processing stage is used.
3192848b8605Smrg
3193848b8605Smrg
3194848b8605SmrgTGSI_SEMANTIC_LAYER
3195848b8605Smrg"""""""""""""""""""
3196848b8605Smrg
3197848b8605SmrgFor geometry shaders, this semantic label indicates that an output
3198848b8605Smrgcontains the layer value to use for the color and depth/stencil surfaces.
3199b8e80941SmrgThis is an integer value, and only the X component is used.
3200b8e80941Smrg(Also known as rendertarget array index.)
3201848b8605Smrg
3202b8e80941SmrgIf PIPE_CAP_TGSI_VS_LAYER_VIEWPORT or PIPE_CAP_TGSI_TES_LAYER_VIEWPORT is
3203b8e80941Smrgsupported, then this semantic label can also be used in vertex or
3204b8e80941Smrgtessellation evaluation shaders, respectively. Only the value written in the
3205b8e80941Smrglast vertex processing stage is used.
3206848b8605Smrg
3207848b8605Smrg
3208848b8605SmrgTGSI_SEMANTIC_CLIPDIST
3209848b8605Smrg""""""""""""""""""""""
3210848b8605Smrg
3211b8e80941SmrgNote this covers clipping and culling distances.
3212b8e80941Smrg
3213848b8605SmrgWhen components of vertex elements are identified this way, these
3214848b8605Smrgvalues are each assumed to be a float32 signed distance to a plane.
3215b8e80941Smrg
3216b8e80941SmrgFor clip distances:
3217848b8605SmrgPrimitive setup only invokes rasterization on pixels for which
3218b8e80941Smrgthe interpolated plane distances are >= 0.
3219b8e80941Smrg
3220b8e80941SmrgFor cull distances:
3221b8e80941SmrgPrimitives will be completely discarded if the plane distance
3222b8e80941Smrgfor all of the vertices in the primitive are < 0.
3223b8e80941SmrgIf a vertex has a cull distance of NaN, that vertex counts as "out"
3224b8e80941Smrg(as if its < 0);
3225b8e80941Smrg
3226b8e80941SmrgMultiple clip/cull planes can be implemented simultaneously, by
3227b8e80941Smrgannotating multiple components of one or more vertex elements with
3228b8e80941Smrgthe above specified semantic.
3229b8e80941SmrgThe limits on both clip and cull distances are bound
3230848b8605Smrgby the PIPE_MAX_CLIP_OR_CULL_DISTANCE_COUNT define which defines
3231848b8605Smrgthe maximum number of components that can be used to hold the
3232848b8605Smrgdistances and by the PIPE_MAX_CLIP_OR_CULL_DISTANCE_ELEMENT_COUNT
3233848b8605Smrgwhich specifies the maximum number of registers which can be
3234848b8605Smrgannotated with those semantics.
3235b8e80941SmrgThe properties NUM_CLIPDIST_ENABLED and NUM_CULLDIST_ENABLED
3236b8e80941Smrgare used to divide up the 2 x vec4 space between clipping and culling.
3237848b8605Smrg
3238848b8605SmrgTGSI_SEMANTIC_SAMPLEID
3239848b8605Smrg""""""""""""""""""""""
3240848b8605Smrg
3241848b8605SmrgFor fragment shaders, this semantic label indicates that a system value
3242b8e80941Smrgcontains the current sample id (i.e. gl_SampleID) as an unsigned int.
3243b8e80941SmrgOnly the X component is used.  If per-sample shading is not enabled,
3244b8e80941Smrgthe result is (0, undef, undef, undef).
3245b8e80941Smrg
3246b8e80941SmrgNote that if the fragment shader uses this system value, the fragment
3247b8e80941Smrgshader is automatically executed at per sample frequency.
3248848b8605Smrg
3249848b8605SmrgTGSI_SEMANTIC_SAMPLEPOS
3250848b8605Smrg"""""""""""""""""""""""
3251848b8605Smrg
3252b8e80941SmrgFor fragment shaders, this semantic label indicates that a system
3253b8e80941Smrgvalue contains the current sample's position as float4(x, y, undef, undef)
3254b8e80941Smrgin the render target (i.e.  gl_SamplePosition) when per-fragment shading
3255b8e80941Smrgis in effect.  Position values are in the range [0, 1] where 0.5 is
3256b8e80941Smrgthe center of the fragment.
3257b8e80941Smrg
3258b8e80941SmrgNote that if the fragment shader uses this system value, the fragment
3259b8e80941Smrgshader is automatically executed at per sample frequency.
3260848b8605Smrg
3261848b8605SmrgTGSI_SEMANTIC_SAMPLEMASK
3262848b8605Smrg""""""""""""""""""""""""
3263848b8605Smrg
3264b8e80941SmrgFor fragment shaders, this semantic label can be applied to either a
3265b8e80941Smrgshader system value input or output.
3266b8e80941Smrg
3267b8e80941SmrgFor a system value, the sample mask indicates the set of samples covered by
3268b8e80941Smrgthe current primitive.  If MSAA is not enabled, the value is (1, 0, 0, 0).
3269b8e80941Smrg
3270b8e80941SmrgFor an output, the sample mask is used to disable further sample processing.
3271b8e80941Smrg
3272b8e80941SmrgFor both, the register type is uint[4] but only the X component is used
3273b8e80941Smrg(i.e. gl_SampleMask[0]). Each bit corresponds to one sample position (up
3274b8e80941Smrgto 32x MSAA is supported).
3275848b8605Smrg
3276848b8605SmrgTGSI_SEMANTIC_INVOCATIONID
3277848b8605Smrg""""""""""""""""""""""""""
3278848b8605Smrg
3279848b8605SmrgFor geometry shaders, this semantic label indicates that a system value
3280b8e80941Smrgcontains the current invocation id (i.e. gl_InvocationID).
3281b8e80941SmrgThis is an integer value, and only the X component is used.
3282b8e80941Smrg
3283b8e80941SmrgTGSI_SEMANTIC_INSTANCEID
3284b8e80941Smrg""""""""""""""""""""""""
3285b8e80941Smrg
3286b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
3287b8e80941Smrgthe current instance id (i.e. gl_InstanceID). It does not include the base
3288b8e80941Smrginstance. This is an integer value, and only the X component is used.
3289b8e80941Smrg
3290b8e80941SmrgTGSI_SEMANTIC_VERTEXID
3291b8e80941Smrg""""""""""""""""""""""
3292b8e80941Smrg
3293b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
3294b8e80941Smrgthe current vertex id (i.e. gl_VertexID). It does (unlike in d3d10) include the
3295b8e80941Smrgbase vertex. This is an integer value, and only the X component is used.
3296b8e80941Smrg
3297b8e80941SmrgTGSI_SEMANTIC_VERTEXID_NOBASE
3298b8e80941Smrg"""""""""""""""""""""""""""""""
3299b8e80941Smrg
3300b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
3301b8e80941Smrgthe current vertex id without including the base vertex (this corresponds to
3302b8e80941Smrgd3d10 vertex id, so TGSI_SEMANTIC_VERTEXID_NOBASE + TGSI_SEMANTIC_BASEVERTEX
3303b8e80941Smrg== TGSI_SEMANTIC_VERTEXID). This is an integer value, and only the X component
3304b8e80941Smrgis used.
3305b8e80941Smrg
3306b8e80941SmrgTGSI_SEMANTIC_BASEVERTEX
3307b8e80941Smrg""""""""""""""""""""""""
3308b8e80941Smrg
3309b8e80941SmrgFor vertex shaders, this semantic label indicates that a system value contains
3310b8e80941Smrgthe base vertex (i.e. gl_BaseVertex). Note that for non-indexed draw calls,
3311b8e80941Smrgthis contains the first (or start) value instead.
3312b8e80941SmrgThis is an integer value, and only the X component is used.
3313b8e80941Smrg
3314b8e80941SmrgTGSI_SEMANTIC_PRIMID
3315b8e80941Smrg""""""""""""""""""""
3316b8e80941Smrg
3317b8e80941SmrgFor geometry and fragment shaders, this semantic label indicates the value
3318b8e80941Smrgcontains the primitive id (i.e. gl_PrimitiveID). This is an integer value,
3319b8e80941Smrgand only the X component is used.
3320b8e80941SmrgFIXME: This right now can be either a ordinary input or a system value...
3321b8e80941Smrg
3322b8e80941Smrg
3323b8e80941SmrgTGSI_SEMANTIC_PATCH
3324b8e80941Smrg"""""""""""""""""""
3325b8e80941Smrg
3326b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates a
3327b8e80941Smrggeneric per-patch attribute. Such semantics will not implicitly be per-vertex
3328b8e80941Smrgarrays.
3329b8e80941Smrg
3330b8e80941SmrgTGSI_SEMANTIC_TESSCOORD
3331b8e80941Smrg"""""""""""""""""""""""
3332b8e80941Smrg
3333b8e80941SmrgFor tessellation evaluation shaders, this semantic label indicates the
3334b8e80941Smrgcoordinates of the vertex being processed. This is available in XYZ; W is
3335b8e80941Smrgundefined.
3336b8e80941Smrg
3337b8e80941SmrgTGSI_SEMANTIC_TESSOUTER
3338b8e80941Smrg"""""""""""""""""""""""
3339b8e80941Smrg
3340b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the
3341b8e80941Smrgouter tessellation levels of the patch. Isoline tessellation will only have XY
3342b8e80941Smrgdefined, triangle will have XYZ and quads will have XYZW defined. This
3343b8e80941Smrgcorresponds to gl_TessLevelOuter.
3344b8e80941Smrg
3345b8e80941SmrgTGSI_SEMANTIC_TESSINNER
3346b8e80941Smrg"""""""""""""""""""""""
3347b8e80941Smrg
3348b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the
3349b8e80941Smrginner tessellation levels of the patch. The X value is only defined for
3350b8e80941Smrgtriangle tessellation, while quads will have XY defined. This is entirely
3351b8e80941Smrgundefined for isoline tessellation.
3352b8e80941Smrg
3353b8e80941SmrgTGSI_SEMANTIC_VERTICESIN
3354b8e80941Smrg""""""""""""""""""""""""
3355b8e80941Smrg
3356b8e80941SmrgFor tessellation evaluation/control shaders, this semantic label indicates the
3357b8e80941Smrgnumber of vertices provided in the input patch. Only the X value is defined.
3358b8e80941Smrg
3359b8e80941SmrgTGSI_SEMANTIC_HELPER_INVOCATION
3360b8e80941Smrg"""""""""""""""""""""""""""""""
3361b8e80941Smrg
3362b8e80941SmrgFor fragment shaders, this semantic indicates whether the current
3363b8e80941Smrginvocation is covered or not. Helper invocations are created in order
3364b8e80941Smrgto properly compute derivatives, however it may be desirable to skip
3365b8e80941Smrgsome of the logic in those cases. See ``gl_HelperInvocation`` documentation.
3366b8e80941Smrg
3367b8e80941SmrgTGSI_SEMANTIC_BASEINSTANCE
3368b8e80941Smrg""""""""""""""""""""""""""
3369b8e80941Smrg
3370b8e80941SmrgFor vertex shaders, the base instance argument supplied for this
3371b8e80941Smrgdraw. This is an integer value, and only the X component is used.
3372b8e80941Smrg
3373b8e80941SmrgTGSI_SEMANTIC_DRAWID
3374b8e80941Smrg""""""""""""""""""""
3375b8e80941Smrg
3376b8e80941SmrgFor vertex shaders, the zero-based index of the current draw in a
3377b8e80941Smrg``glMultiDraw*`` invocation. This is an integer value, and only the X
3378b8e80941Smrgcomponent is used.
3379b8e80941Smrg
3380b8e80941Smrg
3381b8e80941SmrgTGSI_SEMANTIC_WORK_DIM
3382b8e80941Smrg""""""""""""""""""""""
3383b8e80941Smrg
3384b8e80941SmrgFor compute shaders started via opencl this retrieves the work_dim
3385b8e80941Smrgparameter to the clEnqueueNDRangeKernel call with which the shader
3386b8e80941Smrgwas started.
3387b8e80941Smrg
3388b8e80941Smrg
3389b8e80941SmrgTGSI_SEMANTIC_GRID_SIZE
3390b8e80941Smrg"""""""""""""""""""""""
3391b8e80941Smrg
3392b8e80941SmrgFor compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3393b8e80941Smrgof a grid of thread blocks.
3394b8e80941Smrg
3395b8e80941Smrg
3396b8e80941SmrgTGSI_SEMANTIC_BLOCK_ID
3397b8e80941Smrg""""""""""""""""""""""
3398b8e80941Smrg
3399b8e80941SmrgFor compute shaders, this semantic indicates the (x, y, z) coordinates of the
3400b8e80941Smrgcurrent block inside of the grid.
3401b8e80941Smrg
3402b8e80941Smrg
3403b8e80941SmrgTGSI_SEMANTIC_BLOCK_SIZE
3404b8e80941Smrg""""""""""""""""""""""""
3405b8e80941Smrg
3406b8e80941SmrgFor compute shaders, this semantic indicates the maximum (x, y, z) dimensions
3407b8e80941Smrgof a block in threads.
3408b8e80941Smrg
3409b8e80941Smrg
3410b8e80941SmrgTGSI_SEMANTIC_THREAD_ID
3411b8e80941Smrg"""""""""""""""""""""""
3412b8e80941Smrg
3413b8e80941SmrgFor compute shaders, this semantic indicates the (x, y, z) coordinates of the
3414b8e80941Smrgcurrent thread inside of the block.
3415b8e80941Smrg
3416b8e80941Smrg
3417b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_SIZE
3418b8e80941Smrg"""""""""""""""""""""""""""
3419b8e80941Smrg
3420b8e80941SmrgThis semantic indicates the subgroup size for the current invocation. This is
3421b8e80941Smrgan integer of at most 64, as it indicates the width of lanemasks. It does not
3422b8e80941Smrgdepend on the number of invocations that are active.
3423b8e80941Smrg
3424b8e80941Smrg
3425b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_INVOCATION
3426b8e80941Smrg"""""""""""""""""""""""""""""""""
3427b8e80941Smrg
3428b8e80941SmrgThe index of the current invocation within its subgroup.
3429b8e80941Smrg
3430b8e80941Smrg
3431b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_EQ_MASK
3432b8e80941Smrg""""""""""""""""""""""""""""""
3433b8e80941Smrg
3434b8e80941SmrgA bit mask of ``bit index == TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3435b8e80941Smrg``1 << subgroup_invocation`` in arbitrary precision arithmetic.
3436b8e80941Smrg
3437b8e80941Smrg
3438b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_GE_MASK
3439b8e80941Smrg""""""""""""""""""""""""""""""
3440b8e80941Smrg
3441b8e80941SmrgA bit mask of ``bit index >= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3442b8e80941Smrg``((1 << (subgroup_size - subgroup_invocation)) - 1) << subgroup_invocation``
3443b8e80941Smrgin arbitrary precision arithmetic.
3444b8e80941Smrg
3445b8e80941Smrg
3446b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_GT_MASK
3447b8e80941Smrg""""""""""""""""""""""""""""""
3448b8e80941Smrg
3449b8e80941SmrgA bit mask of ``bit index > TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3450b8e80941Smrg``((1 << (subgroup_size - subgroup_invocation - 1)) - 1) << (subgroup_invocation + 1)``
3451b8e80941Smrgin arbitrary precision arithmetic.
3452b8e80941Smrg
3453b8e80941Smrg
3454b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_LE_MASK
3455b8e80941Smrg""""""""""""""""""""""""""""""
3456b8e80941Smrg
3457b8e80941SmrgA bit mask of ``bit index <= TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3458b8e80941Smrg``(1 << (subgroup_invocation + 1)) - 1`` in arbitrary precision arithmetic.
3459b8e80941Smrg
3460b8e80941Smrg
3461b8e80941SmrgTGSI_SEMANTIC_SUBGROUP_LT_MASK
3462b8e80941Smrg""""""""""""""""""""""""""""""
3463b8e80941Smrg
3464b8e80941SmrgA bit mask of ``bit index < TGSI_SEMANTIC_SUBGROUP_INVOCATION``, i.e.
3465b8e80941Smrg``(1 << subgroup_invocation) - 1`` in arbitrary precision arithmetic.
3466b8e80941Smrg
3467848b8605Smrg
3468848b8605SmrgDeclaration Interpolate
3469848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^
3470848b8605Smrg
3471848b8605SmrgThis token is only valid for fragment shader INPUT declarations.
3472848b8605Smrg
3473848b8605SmrgThe Interpolate field specifes the way input is being interpolated by
3474848b8605Smrgthe rasteriser and is one of TGSI_INTERPOLATE_*.
3475848b8605Smrg
3476848b8605SmrgThe Location field specifies the location inside the pixel that the
3477848b8605Smrginterpolation should be done at, one of ``TGSI_INTERPOLATE_LOC_*``. Note that
3478848b8605Smrgwhen per-sample shading is enabled, the implementation may choose to
3479848b8605Smrginterpolate at the sample irrespective of the Location field.
3480848b8605Smrg
3481848b8605SmrgThe CylindricalWrap bitfield specifies which register components
3482848b8605Smrgshould be subject to cylindrical wrapping when interpolating by the
3483848b8605Smrgrasteriser. If TGSI_CYLINDRICAL_WRAP_X is set to 1, the X component
3484848b8605Smrgshould be interpolated according to cylindrical wrapping rules.
3485848b8605Smrg
3486848b8605Smrg
3487848b8605SmrgDeclaration Sampler View
3488848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
3489848b8605Smrg
3490848b8605SmrgFollows Declaration token if file is TGSI_FILE_SAMPLER_VIEW.
3491848b8605Smrg
3492848b8605SmrgDCL SVIEW[#], resource, type(s)
3493848b8605Smrg
3494848b8605SmrgDeclares a shader input sampler view and assigns it to a SVIEW[#]
3495848b8605Smrgregister.
3496848b8605Smrg
3497848b8605Smrgresource can be one of BUFFER, 1D, 2D, 3D, 1DArray and 2DArray.
3498848b8605Smrg
3499848b8605Smrgtype must be 1 or 4 entries (if specifying on a per-component
3500848b8605Smrglevel) out of UNORM, SNORM, SINT, UINT and FLOAT.
3501848b8605Smrg
3502b8e80941SmrgFor TEX\* style texture sample opcodes (as opposed to SAMPLE\* opcodes
3503b8e80941Smrgwhich take an explicit SVIEW[#] source register), there may be optionally
3504b8e80941SmrgSVIEW[#] declarations.  In this case, the SVIEW index is implied by the
3505b8e80941SmrgSAMP index, and there must be a corresponding SVIEW[#] declaration for
3506b8e80941Smrgeach SAMP[#] declaration.  Drivers are free to ignore this if they wish.
3507b8e80941SmrgBut note in particular that some drivers need to know the sampler type
3508b8e80941Smrg(float/int/unsigned) in order to generate the correct code, so cases
3509b8e80941Smrgwhere integer textures are sampled, SVIEW[#] declarations should be
3510b8e80941Smrgused.
3511b8e80941Smrg
3512b8e80941SmrgNOTE: It is NOT legal to mix SAMPLE\* style opcodes and TEX\* opcodes
3513b8e80941Smrgin the same shader.
3514848b8605Smrg
3515848b8605SmrgDeclaration Resource
3516848b8605Smrg^^^^^^^^^^^^^^^^^^^^
3517848b8605Smrg
3518848b8605SmrgFollows Declaration token if file is TGSI_FILE_RESOURCE.
3519848b8605Smrg
3520848b8605SmrgDCL RES[#], resource [, WR] [, RAW]
3521848b8605Smrg
3522848b8605SmrgDeclares a shader input resource and assigns it to a RES[#]
3523848b8605Smrgregister.
3524848b8605Smrg
3525848b8605Smrgresource can be one of BUFFER, 1D, 2D, 3D, CUBE, 1DArray and
3526848b8605Smrg2DArray.
3527848b8605Smrg
3528848b8605SmrgIf the RAW keyword is not specified, the texture data will be
3529848b8605Smrgsubject to conversion, swizzling and scaling as required to yield
3530848b8605Smrgthe specified data type from the physical data format of the bound
3531848b8605Smrgresource.
3532848b8605Smrg
3533848b8605SmrgIf the RAW keyword is specified, no channel conversion will be
3534848b8605Smrgperformed: the values read for each of the channels (X,Y,Z,W) will
3535848b8605Smrgcorrespond to consecutive words in the same order and format
3536848b8605Smrgthey're found in memory.  No element-to-address conversion will be
3537848b8605Smrgperformed either: the value of the provided X coordinate will be
3538848b8605Smrginterpreted in byte units instead of texel units.  The result of
3539848b8605Smrgaccessing a misaligned address is undefined.
3540848b8605Smrg
3541848b8605SmrgUsage of the STORE opcode is only allowed if the WR (writable) flag
3542848b8605Smrgis set.
3543848b8605Smrg
3544b8e80941SmrgHardware Atomic Register File
3545b8e80941Smrg^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
3546b8e80941Smrg
3547b8e80941SmrgHardware atomics are declared as a 2D array with an optional array id.
3548b8e80941Smrg
3549b8e80941SmrgThe first member of the dimension is the buffer resource the atomic
3550b8e80941Smrgis located in.
3551b8e80941SmrgThe second member is a range into the buffer resource, either for
3552b8e80941Smrgone or multiple counters. If this is an array, the declaration will have
3553b8e80941Smrgan unique array id.
3554b8e80941Smrg
3555b8e80941SmrgEach counter is 4 bytes in size, and index and ranges are in counters not bytes.
3556b8e80941SmrgDCL HWATOMIC[0][0]
3557b8e80941SmrgDCL HWATOMIC[0][1]
3558b8e80941Smrg
3559b8e80941SmrgThis declares two atomics, one at the start of the buffer and one in the
3560b8e80941Smrgsecond 4 bytes.
3561b8e80941Smrg
3562b8e80941SmrgDCL HWATOMIC[0][0]
3563b8e80941SmrgDCL HWATOMIC[1][0]
3564b8e80941SmrgDCL HWATOMIC[1][1..3], ARRAY(1)
3565b8e80941Smrg
3566b8e80941SmrgThis declares 5 atomics, one in buffer 0 at 0,
3567b8e80941Smrgone in buffer 1 at 0, and an array of 3 atomics in
3568b8e80941Smrgthe buffer 1, starting at 1.
3569848b8605Smrg
3570848b8605SmrgProperties
3571848b8605Smrg^^^^^^^^^^^^^^^^^^^^^^^^
3572848b8605Smrg
3573848b8605SmrgProperties are general directives that apply to the whole TGSI program.
3574848b8605Smrg
3575848b8605SmrgFS_COORD_ORIGIN
3576848b8605Smrg"""""""""""""""
3577848b8605Smrg
3578848b8605SmrgSpecifies the fragment shader TGSI_SEMANTIC_POSITION coordinate origin.
3579848b8605SmrgThe default value is UPPER_LEFT.
3580848b8605Smrg
3581848b8605SmrgIf UPPER_LEFT, the position will be (0,0) at the upper left corner and
3582848b8605Smrgincrease downward and rightward.
3583848b8605SmrgIf LOWER_LEFT, the position will be (0,0) at the lower left corner and
3584848b8605Smrgincrease upward and rightward.
3585848b8605Smrg
3586848b8605SmrgOpenGL defaults to LOWER_LEFT, and is configurable with the
3587848b8605SmrgGL_ARB_fragment_coord_conventions extension.
3588848b8605Smrg
3589848b8605SmrgDirectX 9/10 use UPPER_LEFT.
3590848b8605Smrg
3591848b8605SmrgFS_COORD_PIXEL_CENTER
3592848b8605Smrg"""""""""""""""""""""
3593848b8605Smrg
3594848b8605SmrgSpecifies the fragment shader TGSI_SEMANTIC_POSITION pixel center convention.
3595848b8605SmrgThe default value is HALF_INTEGER.
3596848b8605Smrg
3597848b8605SmrgIf HALF_INTEGER, the fractionary part of the position will be 0.5
3598848b8605SmrgIf INTEGER, the fractionary part of the position will be 0.0
3599848b8605Smrg
3600848b8605SmrgNote that this does not affect the set of fragments generated by
3601848b8605Smrgrasterization, which is instead controlled by half_pixel_center in the
3602848b8605Smrgrasterizer.
3603848b8605Smrg
3604848b8605SmrgOpenGL defaults to HALF_INTEGER, and is configurable with the
3605848b8605SmrgGL_ARB_fragment_coord_conventions extension.
3606848b8605Smrg
3607848b8605SmrgDirectX 9 uses INTEGER.
3608848b8605SmrgDirectX 10 uses HALF_INTEGER.
3609848b8605Smrg
3610848b8605SmrgFS_COLOR0_WRITES_ALL_CBUFS
3611848b8605Smrg""""""""""""""""""""""""""
3612848b8605SmrgSpecifies that writes to the fragment shader color 0 are replicated to all
3613848b8605Smrgbound cbufs. This facilitates OpenGL's fragColor output vs fragData[0] where
3614848b8605SmrgfragData is directed to a single color buffer, but fragColor is broadcast.
3615848b8605Smrg
3616848b8605SmrgVS_PROHIBIT_UCPS
3617848b8605Smrg""""""""""""""""""""""""""
3618848b8605SmrgIf this property is set on the program bound to the shader stage before the
3619848b8605Smrgfragment shader, user clip planes should have no effect (be disabled) even if
3620848b8605Smrgthat shader does not write to any clip distance outputs and the rasterizer's
3621848b8605Smrgclip_plane_enable is non-zero.
3622848b8605SmrgThis property is only supported by drivers that also support shader clip
3623848b8605Smrgdistance outputs.
3624848b8605SmrgThis is useful for APIs that don't have UCPs and where clip distances written
3625848b8605Smrgby a shader cannot be disabled.
3626848b8605Smrg
3627848b8605SmrgGS_INVOCATIONS
3628848b8605Smrg""""""""""""""
3629848b8605Smrg
3630848b8605SmrgSpecifies the number of times a geometry shader should be executed for each
3631848b8605Smrginput primitive. Each invocation will have a different
3632848b8605SmrgTGSI_SEMANTIC_INVOCATIONID system value set. If not specified, assumed to
3633848b8605Smrgbe 1.
3634848b8605Smrg
3635848b8605SmrgVS_WINDOW_SPACE_POSITION
3636848b8605Smrg""""""""""""""""""""""""""
3637848b8605SmrgIf this property is set on the vertex shader, the TGSI_SEMANTIC_POSITION output
3638848b8605Smrgis assumed to contain window space coordinates.
3639848b8605SmrgDivision of X,Y,Z by W and the viewport transformation are disabled, and 1/W is
3640848b8605Smrgdirectly taken from the 4-th component of the shader output.
3641848b8605SmrgNaturally, clipping is not performed on window coordinates either.
3642848b8605SmrgThe effect of this property is undefined if a geometry or tessellation shader
3643848b8605Smrgare in use.
3644848b8605Smrg
3645b8e80941SmrgTCS_VERTICES_OUT
3646b8e80941Smrg""""""""""""""""
3647b8e80941Smrg
3648b8e80941SmrgThe number of vertices written by the tessellation control shader. This
3649b8e80941Smrgeffectively defines the patch input size of the tessellation evaluation shader
3650b8e80941Smrgas well.
3651b8e80941Smrg
3652b8e80941SmrgTES_PRIM_MODE
3653b8e80941Smrg"""""""""""""
3654b8e80941Smrg
3655b8e80941SmrgThis sets the tessellation primitive mode, one of ``PIPE_PRIM_TRIANGLES``,
3656b8e80941Smrg``PIPE_PRIM_QUADS``, or ``PIPE_PRIM_LINES``. (Unlike in GL, there is no
3657b8e80941Smrgseparate isolines settings, the regular lines is assumed to mean isolines.)
3658b8e80941Smrg
3659b8e80941SmrgTES_SPACING
3660b8e80941Smrg"""""""""""
3661b8e80941Smrg
3662b8e80941SmrgThis sets the spacing mode of the tessellation generator, one of
3663b8e80941Smrg``PIPE_TESS_SPACING_*``.
3664b8e80941Smrg
3665b8e80941SmrgTES_VERTEX_ORDER_CW
3666b8e80941Smrg"""""""""""""""""""
3667b8e80941Smrg
3668b8e80941SmrgThis sets the vertex order to be clockwise if the value is 1, or
3669b8e80941Smrgcounter-clockwise if set to 0.
3670b8e80941Smrg
3671b8e80941SmrgTES_POINT_MODE
3672b8e80941Smrg""""""""""""""
3673b8e80941Smrg
3674b8e80941SmrgIf set to a non-zero value, this turns on point mode for the tessellator,
3675b8e80941Smrgwhich means that points will be generated instead of primitives.
3676b8e80941Smrg
3677b8e80941SmrgNUM_CLIPDIST_ENABLED
3678b8e80941Smrg""""""""""""""""""""
3679b8e80941Smrg
3680b8e80941SmrgHow many clip distance scalar outputs are enabled.
3681b8e80941Smrg
3682b8e80941SmrgNUM_CULLDIST_ENABLED
3683b8e80941Smrg""""""""""""""""""""
3684b8e80941Smrg
3685b8e80941SmrgHow many cull distance scalar outputs are enabled.
3686b8e80941Smrg
3687b8e80941SmrgFS_EARLY_DEPTH_STENCIL
3688b8e80941Smrg""""""""""""""""""""""
3689b8e80941Smrg
3690b8e80941SmrgWhether depth test, stencil test, and occlusion query should run before
3691b8e80941Smrgthe fragment shader (regardless of fragment shader side effects). Corresponds
3692b8e80941Smrgto GLSL early_fragment_tests.
3693b8e80941Smrg
3694b8e80941SmrgNEXT_SHADER
3695b8e80941Smrg"""""""""""
3696b8e80941Smrg
3697b8e80941SmrgWhich shader stage will MOST LIKELY follow after this shader when the shader
3698b8e80941Smrgis bound. This is only a hint to the driver and doesn't have to be precise.
3699b8e80941SmrgOnly set for VS and TES.
3700b8e80941Smrg
3701b8e80941SmrgCS_FIXED_BLOCK_WIDTH / HEIGHT / DEPTH
3702b8e80941Smrg"""""""""""""""""""""""""""""""""""""
3703b8e80941Smrg
3704b8e80941SmrgThreads per block in each dimension, if known at compile time. If the block size
3705b8e80941Smrgis known all three should be at least 1. If it is unknown they should all be set
3706b8e80941Smrgto 0 or not set.
3707b8e80941Smrg
3708b8e80941SmrgMUL_ZERO_WINS
3709b8e80941Smrg"""""""""""""
3710b8e80941Smrg
3711b8e80941SmrgThe MUL TGSI operation (FP32 multiplication) will return 0 if either
3712b8e80941Smrgof the operands are equal to 0. That means that 0 * Inf = 0. This
3713b8e80941Smrgshould be set the same way for an entire pipeline. Note that this
3714b8e80941Smrgapplies not only to the literal MUL TGSI opcode, but all FP32
3715b8e80941Smrgmultiplications implied by other operations, such as MAD, FMA, DP2,
3716b8e80941SmrgDP3, DP4, DST, LOG, LRP, and possibly others. If there is a
3717b8e80941Smrgmismatch between shaders, then it is unspecified whether this behavior
3718b8e80941Smrgwill be enabled.
3719b8e80941Smrg
3720b8e80941SmrgFS_POST_DEPTH_COVERAGE
3721b8e80941Smrg""""""""""""""""""""""
3722b8e80941Smrg
3723b8e80941SmrgWhen enabled, the input for TGSI_SEMANTIC_SAMPLEMASK will exclude samples
3724b8e80941Smrgthat have failed the depth/stencil tests. This is only valid when
3725b8e80941SmrgFS_EARLY_DEPTH_STENCIL is also specified.
3726b8e80941Smrg
3727b8e80941Smrg
3728848b8605SmrgTexture Sampling and Texture Formats
3729848b8605Smrg------------------------------------
3730848b8605Smrg
3731848b8605SmrgThis table shows how texture image components are returned as (x,y,z,w) tuples
3732848b8605Smrgby TGSI texture instructions, such as :opcode:`TEX`, :opcode:`TXD`, and
3733848b8605Smrg:opcode:`TXP`. For reference, OpenGL and Direct3D conventions are shown as
3734848b8605Smrgwell.
3735848b8605Smrg
3736848b8605Smrg+--------------------+--------------+--------------------+--------------+
3737848b8605Smrg| Texture Components | Gallium      | OpenGL             | Direct3D 9   |
3738848b8605Smrg+====================+==============+====================+==============+
3739848b8605Smrg| R                  | (r, 0, 0, 1) | (r, 0, 0, 1)       | (r, 1, 1, 1) |
3740848b8605Smrg+--------------------+--------------+--------------------+--------------+
3741848b8605Smrg| RG                 | (r, g, 0, 1) | (r, g, 0, 1)       | (r, g, 1, 1) |
3742848b8605Smrg+--------------------+--------------+--------------------+--------------+
3743848b8605Smrg| RGB                | (r, g, b, 1) | (r, g, b, 1)       | (r, g, b, 1) |
3744848b8605Smrg+--------------------+--------------+--------------------+--------------+
3745848b8605Smrg| RGBA               | (r, g, b, a) | (r, g, b, a)       | (r, g, b, a) |
3746848b8605Smrg+--------------------+--------------+--------------------+--------------+
3747848b8605Smrg| A                  | (0, 0, 0, a) | (0, 0, 0, a)       | (0, 0, 0, a) |
3748848b8605Smrg+--------------------+--------------+--------------------+--------------+
3749848b8605Smrg| L                  | (l, l, l, 1) | (l, l, l, 1)       | (l, l, l, 1) |
3750848b8605Smrg+--------------------+--------------+--------------------+--------------+
3751848b8605Smrg| LA                 | (l, l, l, a) | (l, l, l, a)       | (l, l, l, a) |
3752848b8605Smrg+--------------------+--------------+--------------------+--------------+
3753848b8605Smrg| I                  | (i, i, i, i) | (i, i, i, i)       | N/A          |
3754848b8605Smrg+--------------------+--------------+--------------------+--------------+
3755848b8605Smrg| UV                 | XXX TBD      | (0, 0, 0, 1)       | (u, v, 1, 1) |
3756848b8605Smrg|                    |              | [#envmap-bumpmap]_ |              |
3757848b8605Smrg+--------------------+--------------+--------------------+--------------+
3758848b8605Smrg| Z                  | XXX TBD      | (z, z, z, 1)       | (0, z, 0, 1) |
3759848b8605Smrg|                    |              | [#depth-tex-mode]_ |              |
3760848b8605Smrg+--------------------+--------------+--------------------+--------------+
3761848b8605Smrg| S                  | (s, s, s, s) | unknown            | unknown      |
3762848b8605Smrg+--------------------+--------------+--------------------+--------------+
3763848b8605Smrg
3764848b8605Smrg.. [#envmap-bumpmap] http://www.opengl.org/registry/specs/ATI/envmap_bumpmap.txt
3765848b8605Smrg.. [#depth-tex-mode] the default is (z, z, z, 1) but may also be (0, 0, 0, z)
3766848b8605Smrg   or (z, z, z, z) depending on the value of GL_DEPTH_TEXTURE_MODE.
3767