Auxiliary surface compression
=============================

Most lossless image compression on Intel hardware, be that CCS, MCS, or HiZ,
works by way of some chunk of auxiliary data (often a surface) which is used
together with the main surface to provide compression.  Even though this means
more memory is allocated, the scheme allows us to reduce our overall memory
bandwidth since the auxiliary data is much smaller than the main surface.

The simplest example of this is single-sample fast clears
(:cpp:enumerator:`isl_aux_usage::ISL_AUX_USAGE_CCS_D`) on Ivy Bridge through
Broadwell and later.  For this scheme, the auxiliary surface stores a single
bit for each cache-line pair in the main surface.  If that bit is set, then the
entire cache-line pair contains only the clear color as provided in the
``RENDER_SURFACE_STATE`` for the image.  If the bit is unset, then it's not
clear and you should look at the main surface.  Since a cache line is 64B, a
cache-line pair is 128B, or 1024 bits, per auxiliary bit, which yields a
scale-down factor of 1:1024.
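
The 1:1024 ratio can be checked with a little arithmetic.  The following is a
minimal sketch, not ISL code: the ``toy_ccs_d_*`` helpers are hypothetical
names introduced only for illustration, and they ignore the tiling and
alignment requirements that a real auxiliary surface layout has to satisfy.

.. code-block:: c

   #include <stddef.h>

   /* A cache line is 64B; one aux bit covers a pair of cache lines. */
   #define TOY_CACHE_LINE_BYTES      64u
   #define TOY_CACHE_LINE_PAIR_BYTES (2u * TOY_CACHE_LINE_BYTES)

   /* Hypothetical helper: number of aux bits needed to cover a main
    * surface of the given size, one bit per (possibly partial) pair.
    */
   static size_t
   toy_ccs_d_aux_bits(size_t main_size_B)
   {
      return (main_size_B + TOY_CACHE_LINE_PAIR_BYTES - 1) /
             TOY_CACHE_LINE_PAIR_BYTES;
   }

   /* Hypothetical helper: the same quantity rounded up to whole bytes. */
   static size_t
   toy_ccs_d_aux_bytes(size_t main_size_B)
   {
      return (toy_ccs_d_aux_bits(main_size_B) + 7) / 8;
   }

Under these assumptions, a 1 MiB main surface needs 8192 aux bits, which is
1 KiB of auxiliary data.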

Even the simple fast-clear scheme saves us bandwidth in two places.  The first
is when we go to clear the surface.  If we're doing a full-surface clear or
clearing to the same color that was used to clear before, we don't have to
touch the main surface at all.  All we have to do is record the clear color and
smash the aux data to ``0xff``.  The hardware then knows to ignore whatever is
in the main surface and look at the clear color instead.  The second is when we
go to render.  Say we're doing some color blending.  Instead of the blend unit
having to read back actual surface contents to blend with, it looks at the
clear bit and blends with the clear color recorded with the surface state
instead.  Depending on the geometry and cache utilization, this can save as
much as one whole read of the surface worth of bandwidth.

The difficulty with a scheme like this comes when we want to do something else
with that surface.  What happens if the sampler doesn't support this fast-clear
scheme (it doesn't on IVB)?  In that case, we have to do a *resolve* where we
run a special pipeline that reads the auxiliary data and applies it to the main
surface.  In the case of fast clears, this means that, for every set bit in the
auxiliary surface, the corresponding pair of cache lines in the main surface
gets filled with the clear color.  At the end of the resolve operation, the
main surface contents are the actual contents of the surface.
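
As a rough CPU-side model of the fast clear and resolve described above, the
sketch below may help.  It is only an illustration under stated assumptions:
``struct toy_surf`` and the ``toy_*`` functions are hypothetical and not part
of ISL, the surface is treated as a linear array of cache-line pairs, the
clear color is a single 32-bit value, and the resolve is assumed to clear each
aux bit once the corresponding pair has been filled.  Real hardware performs
the resolve with a special pipeline rather than a loop like this.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   #define TOY_PAIR_BYTES 128u   /* two 64B cache lines */

   /* Hypothetical model of a surface with one aux bit per cache-line pair. */
   struct toy_surf {
      uint8_t *main;          /* num_pairs * TOY_PAIR_BYTES bytes          */
      uint8_t *aux;           /* (num_pairs + 7) / 8 bytes                 */
      size_t num_pairs;
      uint32_t clear_color;   /* stands in for the surface-state color     */
   };

   /* Fast clear: record the color and set every aux bit.  The main
    * surface is not touched at all.
    */
   static void
   toy_fast_clear(struct toy_surf *s, uint32_t color)
   {
      s->clear_color = color;
      memset(s->aux, 0xff, (s->num_pairs + 7) / 8);
   }

   /* Resolve: for every set aux bit, fill the matching pair of cache
    * lines with the clear color, then (by assumption) unset the bit.
    */
   static void
   toy_resolve(struct toy_surf *s)
   {
      for (size_t p = 0; p < s->num_pairs; p++) {
         const uint8_t mask = (uint8_t)(1u << (p % 8));
         if (!(s->aux[p / 8] & mask))
            continue;

         uint8_t *dst = s->main + p * TOY_PAIR_BYTES;
         for (size_t i = 0; i < TOY_PAIR_BYTES; i += sizeof(uint32_t))
            memcpy(dst + i, &s->clear_color, sizeof(uint32_t));

         s->aux[p / 8] &= (uint8_t)~mask;
      }
   }

In this model, a unit that does not understand the aux data can safely read
``main`` only after ``toy_resolve()`` has run.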

Types of surface compression
----------------------------

Intel hardware has several different compression schemes that all work along
similar lines:

.. doxygenenum:: isl_aux_usage
.. doxygenfunction:: isl_aux_usage_has_fast_clears
.. doxygenfunction:: isl_aux_usage_has_compression
.. doxygenfunction:: isl_aux_usage_has_hiz
.. doxygenfunction:: isl_aux_usage_has_mcs
.. doxygenfunction:: isl_aux_usage_has_ccs

Creating auxiliary surfaces
---------------------------

Each type of data compression requires some type of auxiliary data on the side.
For most, this involves a second auxiliary surface.  ISL provides helpers for
creating each of these types of surfaces:

.. doxygenfunction:: isl_surf_get_hiz_surf
.. doxygenfunction:: isl_surf_get_mcs_surf
.. doxygenfunction:: isl_surf_supports_ccs
.. doxygenfunction:: isl_surf_get_ccs_surf

Compression state tracking
--------------------------

All of the Intel auxiliary surface compression schemes share a common concept
of a main surface which may or may not contain correct up-to-date data and some
auxiliary data which says how to interpret it.  The main surface is divided
into blocks of some fixed size and some smaller block in the auxiliary data
controls how that main surface block is to be interpreted.  We then have to do
resolves depending on the different HW units which need to interact with a
given surface.

To help drivers keep track of what is going on and when resolves need to be
inserted, ISL provides a finite state machine which tracks the current state of
the main surface and auxiliary data and their relationship to each other.  The
states are encoded with the :cpp:enum:`isl_aux_state` enum.  ISL also provides
helper functions for operating the state machine and determining what aux op
(if any) is required to get to the right state for a given operation.
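
To give a feel for how such a state machine is driven, here is a deliberately
tiny sketch.  It is not the ISL API: the ``toy_*`` enums and functions are
hypothetical, and only two states and three ops are modeled, whereas
:cpp:enum:`isl_aux_state` distinguishes more cases.  The pattern it shows is
the one described above: ask which op is needed before an access, perform it,
then record the resulting state.

.. code-block:: c

   #include <stdbool.h>

   /* Hypothetical, heavily simplified states. */
   enum toy_aux_state {
      TOY_AUX_STATE_CLEAR,      /* aux says "clear"; main surface is stale */
      TOY_AUX_STATE_RESOLVED,   /* main surface holds the actual contents  */
   };

   /* Hypothetical ops that move the surface between those states. */
   enum toy_aux_op {
      TOY_AUX_OP_NONE,
      TOY_AUX_OP_FAST_CLEAR,
      TOY_AUX_OP_RESOLVE,
   };

   /* Which op, if any, must run before a unit accesses the surface?
    * A unit that cannot interpret the aux data needs a resolve first.
    */
   static enum toy_aux_op
   toy_prepare_access(enum toy_aux_state state, bool unit_understands_aux)
   {
      if (state == TOY_AUX_STATE_CLEAR && !unit_understands_aux)
         return TOY_AUX_OP_RESOLVE;
      return TOY_AUX_OP_NONE;
   }

   /* What state is the surface in after the given op has run? */
   static enum toy_aux_state
   toy_transition_aux_op(enum toy_aux_state state, enum toy_aux_op op)
   {
      switch (op) {
      case TOY_AUX_OP_FAST_CLEAR:
         return TOY_AUX_STATE_CLEAR;
      case TOY_AUX_OP_RESOLVE:
         return TOY_AUX_STATE_RESOLVED;
      case TOY_AUX_OP_NONE:
      default:
         return state;
      }
   }

A driver following this pattern would call the prepare function before each
access, emit the returned op if it is not the no-op, and store the state
returned by the transition function alongside the surface.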

.. doxygenenum:: isl_aux_state
.. doxygenfunction:: isl_aux_state_has_valid_primary
.. doxygenfunction:: isl_aux_state_has_valid_aux
.. doxygenenum:: isl_aux_op
.. doxygenfunction:: isl_aux_prepare_access
.. doxygenfunction:: isl_aux_state_transition_aux_op
.. doxygenfunction:: isl_aux_state_transition_write