Auxiliary surface compression
=============================

Most lossless image compression on Intel hardware, be that CCS, MCS, or HiZ,
works by way of some chunk of auxiliary data (often a surface) which is used
together with the main surface to provide compression. Even though this means
more memory is allocated, the scheme allows us to reduce our over-all memory
bandwidth since the auxiliary data is much smaller than the main surface.

The simplest example of this is single-sample fast clears
(:cpp:enumerator:`isl_aux_usage::ISL_AUX_USAGE_CCS_D`) on Ivy Bridge through
Broadwell and later. For this scheme, the auxiliary surface stores a single
bit for each cache-line-pair in the main surface. If that bit is set, then the
entire cache line pair contains only the clear color as provided in the
``RENDER_SURFACE_STATE`` for the image. If the bit is unset, then it's not
clear and you should look at the main surface. Since a cache line is 64B, this
yields a scale-down factor of 1:1024.

Even the simple fast-clear scheme saves us bandwidth in two places. The first
is when we go to clear the surface. If we're doing a full-surface clear or
clearing to the same color that was used to clear before, we don't have to
touch the main surface at all. All we have to do is record the clear color and
smash the aux data to ``0xff``. The hardware then knows to ignore whatever is
in the main surface and look at the clear color instead. The second is when we
go to render. Say we're doing some color blending. Instead of the blend unit
having to read back actual surface contents to blend with, it looks at the
clear bit and blends with the clear color recorded with the surface state
instead. Depending on the geometry and cache utilization, this can save as
much as one whole read of the surface worth of bandwidth.

The difficulty with a scheme like this comes when we want to do something else
with that surface. What happens if the sampler doesn't support this fast-clear
scheme (it doesn't on IVB)? In that case, we have to do a *resolve* where we
run a special pipeline that reads the auxiliary data and applies it to the main
surface. In the case of fast clears, this means that, for every 1 bit in the
auxiliary surface, the corresponding pair of cache lines in the main surface
gets filled with the clear color. At the end of the resolve operation, the
main surface contents are the actual contents of the surface.

Types of surface compression
----------------------------

Intel hardware has several different compression schemes that all work along
similar lines:

.. doxygenenum:: isl_aux_usage
.. doxygenfunction:: isl_aux_usage_has_fast_clears
.. doxygenfunction:: isl_aux_usage_has_compression
.. doxygenfunction:: isl_aux_usage_has_hiz
.. doxygenfunction:: isl_aux_usage_has_mcs
.. doxygenfunction:: isl_aux_usage_has_ccs

Creating auxiliary surfaces
---------------------------

Each type of data compression requires some type of auxiliary data on the side.
For most, this involves a second auxiliary surface. ISL provides helpers for
creating each of these types of surfaces:

.. doxygenfunction:: isl_surf_get_hiz_surf
.. doxygenfunction:: isl_surf_get_mcs_surf
.. doxygenfunction:: isl_surf_supports_ccs
.. doxygenfunction:: isl_surf_get_ccs_surf

Compression state tracking
--------------------------

All of the Intel auxiliary surface compression schemes share a common concept
of a main surface which may or may not contain correct up-to-date data and some
auxiliary data which says how to interpret it. The main surface is divided
into blocks of some fixed size and some smaller block in the auxiliary data
controls how that main surface block is to be interpreted. We then have to do
resolves depending on the different HW units which need to interact with a
given surface.

To help drivers keep track of what all is going on and when resolves need to be
inserted, ISL provides a finite state machine which tracks the current state of
the main surface and auxiliary data and their relationship to each other. The
states are encoded with the :cpp:enum:`isl_aux_state` enum. ISL also provides
helper functions for operating the state machine and determining what aux op
(if any) is required to get to the right state for a given operation.

.. doxygenenum:: isl_aux_state
.. doxygenfunction:: isl_aux_state_has_valid_primary
.. doxygenfunction:: isl_aux_state_has_valid_aux
.. doxygenenum:: isl_aux_op
.. doxygenfunction:: isl_aux_prepare_access
.. doxygenfunction:: isl_aux_state_transition_aux_op
.. doxygenfunction:: isl_aux_state_transition_write
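
To make the relationship between main-surface blocks and auxiliary bits more
concrete, here is a rough CPU-side sketch of the single-bit fast-clear scheme
described at the top of this page. It is a toy model, not ISL code: every
``toy_*`` name is hypothetical, the surface is assumed to be a linear array of
32-bit pixels, and the write path (resolve the affected block, then write) is
a simplification chosen for illustration rather than a description of what
the hardware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model; none of these names exist in ISL. */
#define TOY_CACHE_LINE_B 64u
#define TOY_BLOCK_B      (2u * TOY_CACHE_LINE_B) /* one cache-line pair */

struct toy_surf {
   uint8_t *main_data;   /* main surface, assumed linear and 32bpp */
   uint8_t *aux_data;    /* one bit per cache-line pair */
   size_t main_size;     /* bytes; must be a multiple of 1024 here */
   uint32_t clear_color; /* stands in for the surface-state clear color */
};

/* 128 bytes of main surface per aux bit gives the 1:1024 ratio. */
static size_t toy_aux_size(size_t main_size)
{
   assert(main_size % (TOY_BLOCK_B * 8u) == 0);
   return main_size / TOY_BLOCK_B / 8u;
}

static bool toy_block_is_clear(const struct toy_surf *s, size_t block)
{
   return (s->aux_data[block / 8] >> (block % 8)) & 1u;
}

/* Fast clear: record the color and set every aux bit. The main surface
 * is never touched.
 */
static void toy_fast_clear(struct toy_surf *s, uint32_t color)
{
   s->clear_color = color;
   memset(s->aux_data, 0xff, toy_aux_size(s->main_size));
}

/* Resolve one block: if its bit is set, fill the cache-line pair with
 * the clear color and unset the bit.
 */
static void toy_resolve_block(struct toy_surf *s, size_t block)
{
   if (!toy_block_is_clear(s, block))
      return;

   uint8_t *dst = s->main_data + block * TOY_BLOCK_B;
   for (size_t off = 0; off < TOY_BLOCK_B; off += sizeof(uint32_t))
      memcpy(dst + off, &s->clear_color, sizeof(uint32_t));

   s->aux_data[block / 8] &= (uint8_t)~(1u << (block % 8));
}

/* Full resolve: afterwards the main surface alone holds the contents. */
static void toy_resolve(struct toy_surf *s)
{
   for (size_t b = 0; b < s->main_size / TOY_BLOCK_B; b++)
      toy_resolve_block(s, b);
}

/* An aux-aware reader consults the clear bit before the main surface. */
static uint32_t toy_read_pixel(const struct toy_surf *s, size_t byte_offset)
{
   if (toy_block_is_clear(s, byte_offset / TOY_BLOCK_B))
      return s->clear_color;

   uint32_t px;
   memcpy(&px, s->main_data + byte_offset, sizeof(px));
   return px;
}

/* Simplified write: resolve just the affected block, then store. */
static void toy_write_pixel(struct toy_surf *s, size_t byte_offset,
                            uint32_t px)
{
   toy_resolve_block(s, byte_offset / TOY_BLOCK_B);
   memcpy(s->main_data + byte_offset, &px, sizeof(px));
}
```

In this model, a reader that does not understand the auxiliary data (the
sampler on IVB, in the example above) would read ``main_data`` directly, so
``toy_resolve`` has to run first. Deciding when that is actually necessary is
the bookkeeping that the state machine in this section is meant to handle.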