Tiling
======

The naive view of an image in memory is that the pixels are stored one after
another in memory, usually in an X-major order.  An image that is arranged in
this way is called "linear".  Linear images, while easy to reason about, can
have very bad cache locality.  Graphics operations tend to act on pixels that
are close together in 2-D euclidean space.  If you move one pixel to the right
or left in a linear image, you only move a few bytes to one side or the other
in memory.  However, if you move one pixel up or down you can end up kilobytes
or even megabytes away.

Tiling (sometimes referred to as swizzling) is a method of re-arranging the
pixels of a surface so that pixels which are close in 2-D euclidean space are
likely to be close in memory.

Basics
------

The basic idea of a tiled image is that the image is first divided into
two-dimensional blocks or tiles.  Each tile takes up a chunk of contiguous
memory and the tiles are arranged like pixels in a linear surface.  This is
best demonstrated with a specific example.  Suppose we have an RGBA8888
X-tiled surface on Intel graphics.  Then the surface is divided into 128x8
pixel tiles each of which is 4KB of memory.  Within each tile, the pixels are
laid out like a 128x8 linear image.  The tiles themselves are laid out
row-major in memory like giant pixels.  This means that, as long as you don't
leave your 128x8 tile, you can move in both dimensions without leaving the
same 4K page in memory.

.. image:: tiling-basic.svg
   :alt: Example of an X-tiled image

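The X-tiled RGBA8888 layout described above can be sketched in a few lines of
C.  This is only an illustration: the function name is made up for this
example, the surface width is assumed to be a whole number of tiles, and any
additional address swizzling is ignored.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Illustrative only: byte offset of pixel (x, y) in an X-tiled RGBA8888
    * surface.  Assumes width_px is a multiple of the 128-pixel tile width.
    */
   static inline uint32_t
   xtiled_rgba8888_offset(uint32_t x, uint32_t y, uint32_t width_px)
   {
      const uint32_t tile_w_px = 128, tile_h_px = 8;
      const uint32_t tile_size_B = 4096, bytes_per_px = 4;

      assert(width_px % tile_w_px == 0);
      assert(x < width_px);

      /* Which tile are we in?  Tiles are laid out row-major. */
      uint32_t tiles_per_row = width_px / tile_w_px;
      uint32_t tile_idx = (y / tile_h_px) * tiles_per_row + (x / tile_w_px);

      /* Inside the tile, the pixels form a 128x8 linear image. */
      uint32_t x_in_tile = x % tile_w_px, y_in_tile = y % tile_h_px;
      uint32_t offset_in_tile =
         (y_in_tile * tile_w_px + x_in_tile) * bytes_per_px;

      return tile_idx * tile_size_B + offset_in_tile;
   }

For a 256-pixel-wide surface, pixel (0, 1) lands at byte 512 and pixel
(128, 0), the first pixel of the second tile, lands at byte 4096.
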
You can, however, do even better than this.  Suppose that same image is,
instead, Y-tiled.  Then the surface is divided into 32x32 pixel tiles each of
which is 4KB of memory.  Within a tile, each 64B cache line corresponds to a
4x4 pixel region of the image (you can think of it as a tile within a tile).
This means that very small deviations don't even leave the cache line.  This
added bit of pixel shuffling is known to have a substantial performance impact
in most real-world applications.

Intel graphics has several different tiling formats that we'll discuss in
detail in later sections.  The most commonly used as of the writing of this
chapter is Y-tiling.  In all tiling formats the basic principle is the same:
The image is divided into tiles of a particular size and, within those tiles,
the data is re-arranged (or swizzled) based on a particular pattern.  A tile
size will always be specified in bytes by rows and the actual X-dimension of
the tile in elements depends on the size of the element in bytes.

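As a small worked example of that last point, the width of a tile in elements
is its width in bytes divided by the element size.  The helper below is a
hypothetical sketch for illustration and is not an ISL function.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Illustrative helper: width of a tile in elements, given the tile width
    * in bytes and the element size in bits.
    */
   static inline uint32_t
   tile_width_el(uint32_t tile_width_B, uint32_t format_bpb)
   {
      assert(format_bpb >= 8 && format_bpb % 8 == 0);
      return tile_width_B / (format_bpb / 8);
   }

A 512B-wide X-tile therefore holds 128 elements per row for a 32-bit format
but only 32 elements per row for a 128-bit format.
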
Bit-6 Swizzling
^^^^^^^^^^^^^^^

On some older hardware, there is an additional address swizzle that is applied
on top of the tiling format.  This has been removed starting with Broadwell
because, as it says in the Broadwell PRM Vol 5 "Tiling Algorithm" (p. 17):

   Address Swizzling for Tiled-Surfaces is no longer used because the main
   memory controller has a more effective address swizzling algorithm.

Whether or not swizzling is enabled depends on the memory configuration of the
system.  Generally, systems with dual-channel RAM have swizzling enabled and
single-channel do not.  Supposedly, this swizzling allows for better balancing
between the two memory channels and increases performance. Because it depends
on the memory configuration which may change from one boot to the next, it
requires a run-time check.

The best documentation for bit-6 swizzling can be found in the Haswell PRM Vol.
5 "Memory Views" in the section entitled "Address Swizzling for Tiled-Y
Surfaces".  It exists on older platforms but the docs get progressively worse
the further you go back.

ISL Representation
------------------

The structure of any given tiling format is represented by ISL using the
:cpp:enum:`isl_tiling` enum and the :cpp:struct:`isl_tile_info` structure:

.. doxygenenum:: isl_tiling

.. doxygenfunction:: isl_tiling_get_info

.. doxygenstruct:: isl_tile_info
   :members:

The `isl_tile_info` structure has two different sizes for a tile: a logical
size in surface elements and a physical size in bytes.  In order to determine
the proper logical size, the bits-per-block of the underlying format has to be
passed into `isl_tiling_get_info`. The proper way to compute the size of an
image in bytes given a width and height in elements is as follows:

.. code-block:: c

   /* Surface size in whole tiles, computed from the logical tile extent. */
   uint32_t width_tl = DIV_ROUND_UP(width_el * (format_bpb / tile_info.format_bpb),
                                    tile_info.logical_extent_el.w);
   uint32_t height_tl = DIV_ROUND_UP(height_el, tile_info.logical_extent_el.h);
   /* Pitch and total size in bytes come from the physical tile extent. */
   uint32_t row_pitch = width_tl * tile_info.phys_extent_B.w;
   uint32_t size = height_tl * tile_info.phys_extent_B.h * row_pitch;

It is very important to note that there is no direct conversion between
:cpp:member:`isl_tile_info::logical_extent_el` and
:cpp:member:`isl_tile_info::phys_extent_B`.  It is tempting to assume that the
logical and physical heights are the same and simply divide the width of
:cpp:member:`isl_tile_info::phys_extent_B` by the size of the format (which is
what the PRM does) to get :cpp:member:`isl_tile_info::logical_extent_el` but
this is not at all correct. Some tiling formats have logical and physical
heights that differ and so no such calculation will work in general.  The
easiest case study for this is W-tiling. From the Sky Lake PRM Vol. 2d,
"RENDER_SURFACE_STATE" (p. 427):

   If the surface is a stencil buffer (and thus has Tile Mode set to
   TILEMODE_WMAJOR), the pitch must be set to 2x the value computed based on
   width, as the stencil buffer is stored with two rows interleaved.

What does this mean?  Why are we multiplying the pitch by two?  What does it
mean that "the stencil buffer is stored with two rows interleaved"?  The
explanation for all these questions is that a W-tile (which is only used for
stencil) has a logical size of 64el x 64el but a physical size of 128B
x 32rows.  In memory, a W-tile has the same footprint as a Y-tile (128B
x 32rows) but every pair of rows in the stencil buffer is interleaved into
a single row of bytes yielding a two-dimensional area of 64el x 64el.  You can
consider this as its own tiling format or as a modification of Y-tiling.  The
interpretation in the PRMs varies by hardware generation; on Sandy Bridge they
simply said it was Y-tiled but by Sky Lake there is almost no mention of
Y-tiling in connection with stencil buffers and they are always W-tiled. This
mismatch between logical and physical tile sizes is also relevant for
hierarchical depth buffers as well as single-channel MCS and CCS buffers.

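To make the W-tile case concrete, here is a minimal sketch of the pitch and
size calculation for an 8-bit stencil surface.  The ``tile_extents`` struct and
the function names are hypothetical stand-ins for the relevant
``isl_tile_info`` fields, and ``DIV_ROUND_UP`` is defined locally so that the
example is self-contained.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

   /* Hypothetical stand-in for the relevant isl_tile_info fields. */
   struct tile_extents {
      uint32_t logical_w_el, logical_h_el; /* logical size in elements */
      uint32_t phys_w_B, phys_h_rows;      /* physical size in bytes x rows */
   };

   /* W-tile: 64el x 64el logical, 128B x 32rows physical. */
   static const struct tile_extents w_tile = { 64, 64, 128, 32 };

   /* Row pitch in bytes for a surface of 8-bit elements. */
   static inline uint32_t
   row_pitch_B(const struct tile_extents *t, uint32_t width_el)
   {
      assert(t->logical_w_el > 0);
      return DIV_ROUND_UP(width_el, t->logical_w_el) * t->phys_w_B;
   }

   /* Total size in bytes for a surface of 8-bit elements. */
   static inline uint32_t
   surface_size_B(const struct tile_extents *t,
                  uint32_t width_el, uint32_t height_el)
   {
      uint32_t height_tl = DIV_ROUND_UP(height_el, t->logical_h_el);
      return height_tl * t->phys_h_rows * row_pitch_B(t, width_el);
   }

A 64x64 stencil surface gets a pitch of 128B, which is twice its width in
bytes and so matches the "2x" rule quoted from the PRM, and it occupies
32 rows x 128B = 4096 bytes, exactly one tile.
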
X-tiling
--------

The simplest tiling format available on Intel graphics (which has been
available since gen4) is X-tiling.  An X-tile is 512B x 8rows and, within the
tile, the data is arranged in an X-major linear fashion.  You can also look at
X-tiling as being an 8x8 cache line grid where the cache lines are arranged
X-major as follows:

===== ===== ===== ===== ===== ===== ===== =====
===== ===== ===== ===== ===== ===== ===== =====
0x000 0x040 0x080 0x0c0 0x100 0x140 0x180 0x1c0
0x200 0x240 0x280 0x2c0 0x300 0x340 0x380 0x3c0
0x400 0x440 0x480 0x4c0 0x500 0x540 0x580 0x5c0
0x600 0x640 0x680 0x6c0 0x700 0x740 0x780 0x7c0
0x800 0x840 0x880 0x8c0 0x900 0x940 0x980 0x9c0
0xa00 0xa40 0xa80 0xac0 0xb00 0xb40 0xb80 0xbc0
0xc00 0xc40 0xc80 0xcc0 0xd00 0xd40 0xd80 0xdc0
0xe00 0xe40 0xe80 0xec0 0xf00 0xf40 0xf80 0xfc0
===== ===== ===== ===== ===== ===== ===== =====

Each cache line represents a piece of a single row of pixels within the image.
The memory locations of two vertically adjacent pixels within the same X-tile
always differ by 512B or 8 cache lines.

As mentioned above, X-tiling is slower than Y-tiling (though still faster than
linear).  However, until Sky Lake, the display scan-out hardware could only do
X-tiling so we have historically used X-tiling for all window-system buffers
(because X or a Wayland compositor may want to put it in a plane).

Bit-6 Swizzling
^^^^^^^^^^^^^^^

When bit-6 swizzling is enabled, bits 9 and 10 are XOR'd in with bit 6 of the
tiled address:

.. code-block:: c

   addr[6] ^= addr[9] ^ addr[10];

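The ``addr[n]`` notation above is shorthand for individual address bits.  In
plain C, the same X-tiling swizzle might be written as in the sketch below.
The function name is made up for this example, and the run-time check for
whether swizzling is enabled at all is left to the caller.

.. code-block:: c

   #include <stdint.h>

   /* Sketch: apply the X-tiling bit-6 swizzle to a tiled byte address. */
   static inline uint64_t
   swizzle_bit6_xtile(uint64_t addr)
   {
      uint64_t bit9 = (addr >> 9) & 1;
      uint64_t bit10 = (addr >> 10) & 1;

      /* XOR bits 9 and 10 into bit 6; all other bits are unchanged. */
      return addr ^ ((bit9 ^ bit10) << 6);
   }

Because the swizzle only toggles bit 6 based on bits it does not modify,
applying it twice returns the original address.
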
Y-tiling
--------

The Y-tiling format, also available since gen4, is substantially different from
X-tiling and performs much better in practice.  Each Y-tile is an 8x8 grid of
cache lines arranged Y-major as follows:

===== ===== ===== ===== ===== ===== ===== =====
===== ===== ===== ===== ===== ===== ===== =====
0x000 0x200 0x400 0x600 0x800 0xa00 0xc00 0xe00
0x040 0x240 0x440 0x640 0x840 0xa40 0xc40 0xe40
0x080 0x280 0x480 0x680 0x880 0xa80 0xc80 0xe80
0x0c0 0x2c0 0x4c0 0x6c0 0x8c0 0xac0 0xcc0 0xec0
0x100 0x300 0x500 0x700 0x900 0xb00 0xd00 0xf00
0x140 0x340 0x540 0x740 0x940 0xb40 0xd40 0xf40
0x180 0x380 0x580 0x780 0x980 0xb80 0xd80 0xf80
0x1c0 0x3c0 0x5c0 0x7c0 0x9c0 0xbc0 0xdc0 0xfc0
===== ===== ===== ===== ===== ===== ===== =====

Each 64B cache line within the tile is laid out as 4 rows of 16B each:

==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ====
==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ====
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27 0x28 0x29 0x2a 0x2b 0x2c 0x2d 0x2e 0x2f
0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x3a 0x3b 0x3c 0x3d 0x3e 0x3f
==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ====

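Putting the two tables together, the byte offset within a single Y-tile can be
computed with simple arithmetic.  The sketch below is illustrative only: the
function name is hypothetical, the coordinates are a byte column and a row
within one tile, and bit-6 swizzling is ignored.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Sketch: byte offset of byte column u (0..127) and row v (0..31) within
    * a single 128B x 32rows Y-tile.
    */
   static inline uint32_t
   ytile_offset_B(uint32_t u, uint32_t v)
   {
      assert(u < 128 && v < 32);

      /* Locate the cache line in the 8x8 Y-major grid: consecutive cache
       * lines run down a column, so each column is 8 * 64B = 512B apart.
       */
      uint32_t cl_col = u / 16, cl_row = v / 4;
      uint32_t cl_base = cl_col * 512 + cl_row * 64;

      /* Inside the cache line: 4 rows of 16B each. */
      return cl_base + (v % 4) * 16 + (u % 16);
   }

Moving down one row within a cache line changes the offset by only 16 bytes,
and moving right by 16 bytes jumps to the next column of cache lines at an
offset of 0x200.
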
Y-tiling is widely regarded as being substantially faster than X-tiling so it
is generally preferred.  However, prior to Sky Lake, Y-tiling was not available
for scanout so X-tiling was used for any sort of window-system buffers.
Starting with Sky Lake, we can scan out from Y-tiled buffers.

Bit-6 Swizzling
^^^^^^^^^^^^^^^

When bit-6 swizzling is enabled, bit 9 is XOR'd in with bit 6 of the tiled
address:

.. code-block:: c

   addr[6] ^= addr[9];

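Written out with explicit shifts and masks, the Y-tiling swizzle is a
one-liner.  As before, this is only a sketch with a made-up function name, and
it assumes the caller has already determined that swizzling is enabled.

.. code-block:: c

   #include <stdint.h>

   /* Sketch: apply the Y-tiling bit-6 swizzle to a tiled byte address. */
   static inline uint64_t
   swizzle_bit6_ytile(uint64_t addr)
   {
      /* XOR bit 9 into bit 6; all other bits are unchanged. */
      return addr ^ (((addr >> 9) & 1) << 6);
   }
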
W-tiling
--------

W-tiling is a new tiling format added on Sandy Bridge for use in stencil
buffers.  W-tiling is similar to Y-tiling in that it's arranged as an 8x8
Y-major grid of cache lines.  The bytes within each cache line are arranged as
follows:

==== ==== ==== ==== ==== ==== ==== ====
==== ==== ==== ==== ==== ==== ==== ====
0x00 0x01 0x04 0x05 0x10 0x11 0x14 0x15
0x02 0x03 0x06 0x07 0x12 0x13 0x16 0x17
0x08 0x09 0x0c 0x0d 0x18 0x19 0x1c 0x1d
0x0a 0x0b 0x0e 0x0f 0x1a 0x1b 0x1e 0x1f
0x20 0x21 0x24 0x25 0x30 0x31 0x34 0x35
0x22 0x23 0x26 0x27 0x32 0x33 0x36 0x37
0x28 0x29 0x2c 0x2d 0x38 0x39 0x3c 0x3d
0x2a 0x2b 0x2e 0x2f 0x3a 0x3b 0x3e 0x3f
==== ==== ==== ==== ==== ==== ==== ====

While W-tiling has been required for stencil all the way back to Sandy Bridge,
the docs are somewhat confused as to whether stencil buffers are W or Y-tiled.
This seems to stem from the fact that the hardware seems to implement W-tiling
as a sort of modified Y-tiling.  One example of this is the somewhat odd
requirement that W-tiled buffers have their pitch multiplied by 2.  From the
Sky Lake PRM Vol. 2d, "RENDER_SURFACE_STATE" (p. 427):

   If the surface is a stencil buffer (and thus has Tile Mode set to
   TILEMODE_WMAJOR), the pitch must be set to 2x the value computed based on
   width, as the stencil buffer is stored with two rows interleaved.

The last phrase holds the key here: "the stencil buffer is stored with two rows
interleaved".  More accurately, a W-tiled buffer can be viewed as a Y-tiled
buffer with each set of 4 W-tiled lines interleaved to form 2 Y-tiled lines. In
ISL, we represent a W-tile as a tiling with a logical dimension of 64el x 64el
but a physical size of 128B x 32rows.  This cleanly takes care of the pitch
issue above and seems to nicely model the hardware.

Tile4
-----

The tile4 format, introduced on Xe-HP, is somewhat similar to Y but with more
internal shuffling.  Each tile4 tile is an 8x8 grid of cache lines arranged
as follows:

===== ===== ===== ===== ===== ===== ===== =====
===== ===== ===== ===== ===== ===== ===== =====
0x000 0x040 0x080 0x0c0 0x200 0x240 0x280 0x2c0
0x100 0x140 0x180 0x1c0 0x300 0x340 0x380 0x3c0
0x400 0x440 0x480 0x4c0 0x600 0x640 0x680 0x6c0
0x500 0x540 0x580 0x5c0 0x700 0x740 0x780 0x7c0
0x800 0x840 0x880 0x8c0 0xa00 0xa40 0xa80 0xac0
0x900 0x940 0x980 0x9c0 0xb00 0xb40 0xb80 0xbc0
0xc00 0xc40 0xc80 0xcc0 0xe00 0xe40 0xe80 0xec0
0xd00 0xd40 0xd80 0xdc0 0xf00 0xf40 0xf80 0xfc0
===== ===== ===== ===== ===== ===== ===== =====

Each 64B cache line within the tile is laid out the same way as for a Y-tile,
as 4 rows of 16B each:

==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ====
==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ====
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27 0x28 0x29 0x2a 0x2b 0x2c 0x2d 0x2e 0x2f
0x30 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x3a 0x3b 0x3c 0x3d 0x3e 0x3f
==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== ====

Tiling as a bit pattern
-----------------------

There is one more important angle on tiling that should be discussed before we
finish.  Every tiling can be described by three things:

 1. A logical width and height in elements
 2. A physical width in bytes and height in rows
 3. A mapping from logical elements to physical bytes within the tile

We have spent a good deal of time on the first two because this is what you
really need for doing surface layout calculations.  However, there are cases in
which the map from logical to physical elements is critical.  One example is
W-tiling where we have code to do W-tiled encoding and decoding in the shader
for doing stencil blits because the hardware does not allow us to render to
W-tiled surfaces.

There are many ways to mathematically describe the mapping from logical
elements to physical bytes.  In the PRMs they give a very complicated set of
formulas involving lots of multiplication, modulus, and sums that show you how
to compute the mapping.  With a little creativity, you can easily reduce those
to a set of bit shifts and ORs.  By far the simplest formulation, however, is
as a mapping from the bits of the texture coordinates to bits in the address.
Suppose that :math:`(u, v)` is the location of a 1-byte element within a tile.
If you represent :math:`u` as :math:`u_n u_{n-1} \cdots u_2 u_1 u_0` where
:math:`u_0` is the LSB and :math:`u_n` is the MSB of :math:`u` and similarly
:math:`v = v_m v_{m-1} \cdots v_2 v_1 v_0`, then the bits of the address within
the tile are given by the table below:

=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
 Tiling                                          11          10          9           8           7           6           5           4           3           2           1           0
=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========
:cpp:enumerator:`isl_tiling::ISL_TILING_X`  :math:`v_2` :math:`v_1` :math:`v_0` :math:`u_8` :math:`u_7` :math:`u_6` :math:`u_5` :math:`u_4` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
:cpp:enumerator:`isl_tiling::ISL_TILING_Y0` :math:`u_6` :math:`u_5` :math:`u_4` :math:`v_4` :math:`v_3` :math:`v_2` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
:cpp:enumerator:`isl_tiling::ISL_TILING_W`  :math:`u_5` :math:`u_4` :math:`u_3` :math:`v_5` :math:`v_4` :math:`v_3` :math:`v_2` :math:`u_2` :math:`v_1` :math:`u_1` :math:`v_0` :math:`u_0`
:cpp:enumerator:`isl_tiling::ISL_TILING_4`  :math:`v_4` :math:`v_3` :math:`u_6` :math:`v_2` :math:`u_5` :math:`u_4` :math:`v_1` :math:`v_0` :math:`u_3` :math:`u_2` :math:`u_1` :math:`u_0`
=========================================== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== =========== ===========

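The table translates almost directly into code.  The following is a minimal
table-driven sketch, not ISL's implementation: the ``addr_bit`` struct, the
``U_BIT``/``V_BIT`` macros, and the function name are all invented for this
example, and each array simply transcribes one row of the table from address
bit 11 down to bit 0.

.. code-block:: c

   #include <stdint.h>

   /* One address bit: bit `idx` of u (is_v == 0) or of v (is_v == 1). */
   struct addr_bit { uint8_t is_v, idx; };

   #define U_BIT(n) { 0, n }
   #define V_BIT(n) { 1, n }

   /* Rows of the table above, listed from address bit 11 down to bit 0. */
   static const struct addr_bit pattern_x[12] = {
      V_BIT(2), V_BIT(1), V_BIT(0), U_BIT(8), U_BIT(7), U_BIT(6),
      U_BIT(5), U_BIT(4), U_BIT(3), U_BIT(2), U_BIT(1), U_BIT(0) };
   static const struct addr_bit pattern_y0[12] = {
      U_BIT(6), U_BIT(5), U_BIT(4), V_BIT(4), V_BIT(3), V_BIT(2),
      V_BIT(1), V_BIT(0), U_BIT(3), U_BIT(2), U_BIT(1), U_BIT(0) };
   static const struct addr_bit pattern_w[12] = {
      U_BIT(5), U_BIT(4), U_BIT(3), V_BIT(5), V_BIT(4), V_BIT(3),
      V_BIT(2), U_BIT(2), V_BIT(1), U_BIT(1), V_BIT(0), U_BIT(0) };
   static const struct addr_bit pattern_4[12] = {
      V_BIT(4), V_BIT(3), U_BIT(6), V_BIT(2), U_BIT(5), U_BIT(4),
      V_BIT(1), V_BIT(0), U_BIT(3), U_BIT(2), U_BIT(1), U_BIT(0) };

   /* Byte offset within a 4KB tile of the 1-byte element at (u, v). */
   static inline uint32_t
   tile_offset_from_pattern(const struct addr_bit pattern[12],
                            uint32_t u, uint32_t v)
   {
      uint32_t addr = 0;
      for (unsigned i = 0; i < 12; i++) {
         /* pattern[0] describes address bit 11, pattern[11] bit 0. */
         const struct addr_bit b = pattern[i];
         uint32_t bit = ((b.is_v ? v : u) >> b.idx) & 1;
         addr |= bit << (11 - i);
      }
      return addr;
   }

For instance, ``tile_offset_from_pattern(pattern_y0, 16, 0)`` yields 0x200,
which matches the second column of the Y-tile cache line grid, and
``tile_offset_from_pattern(pattern_w, 0, 1)`` yields 0x02, which matches the
second row of the W-tile cache line layout.
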
Constructing the mapping this way makes a lot of sense when you think about
hardware.  It may seem complex on paper but "simple" things such as addition
are relatively expensive in hardware while interleaving bits in a well-defined
pattern is practically free. For a format that has more than one byte per
element, you simply chop bits off the bottom of the pattern, hard-code them to
0, and adjust bit indices as needed.  For a 128-bit format, for instance, the
Y-tiled pattern becomes :math:`u_2 u_1 u_0 v_4 v_3 v_2 v_1 v_0`.  The Sky Lake
PRM Vol. 5 in the section "2D Surfaces" contains an expanded version of the
above table (which we will not repeat here) that also includes the bit patterns
for the Ys and Yf tiling formats.