SandyBridge's New Acceleration
------------------------------

The guiding principle behind the design is to avoid GPU context switches.
On SandyBridge (and beyond), these are especially pernicious because the
RENDER and BLT engines are now on different rings and require
synchronisation of the various execution units when switching contexts.
They were not cheap on earlier generations, but with the increasing
complexity of the GPU, avoiding such serialisations is important.

Furthermore, we try very hard to avoid migrating between the CPU and GPU.
Every pixmap (apart from temporary "scratch" surfaces which we intend to
use on the GPU) is created in system memory. All operations are then done
upon this shadow copy until we are forced to move it onto the GPU. Such
migration can only be first triggered by: setting the pixmap as the
scanout (we obviously need a GPU buffer here), using the pixmap as a DRI
buffer (the client expects to perform hardware acceleration and we do not
want to disappoint) and lastly using the pixmap as a RENDER target. This
last is chosen because when we know we are going to perform hardware
acceleration and will continue to do so without fallbacks, using the GPU
is much, much faster than the CPU. The heuristic I chose therefore was
that if the application uses RENDER, e.g. via cairo, then it will only be
using those paths and not intermixing core drawing operations, and so is
unlikely to trigger a fallback.

The complicating case is front-buffer rendering. In order to accommodate
using RENDER in one application whilst running xterm without a composite
manager redirecting all the pixmaps to backing surfaces, we have to
perform damage tracking to avoid excess migration of portions of the
buffer.
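
The migration triggers and the damage tracking described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the driver's code: every name here (struct pixmap,
pixmap_gpu_use, pixmap_cpu_draw, the use_hint values, the upload
counters) is hypothetical, and damage is reduced to a single bounding
box where a real implementation would more likely track a region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical, chosen for illustration only. */
enum use_hint { USE_CORE_DRAW, USE_SCANOUT, USE_DRI_BUFFER, USE_RENDER_TARGET };

struct box { int x1, y1, x2, y2; };

struct pixmap {
    bool on_gpu;            /* false: only the system-memory shadow copy exists */
    bool cpu_dirty;         /* true while cpu_damage holds pending CPU writes */
    struct box cpu_damage;  /* bounding box of CPU writes not yet on the GPU */
    int full_uploads;       /* simulated whole-pixmap migrations */
    int partial_uploads;    /* simulated damage-only migrations */
};

/* The three uses named in the text that may first move a pixmap to the GPU. */
static bool triggers_migration(enum use_hint hint)
{
    return hint == USE_SCANOUT ||
           hint == USE_DRI_BUFFER ||
           hint == USE_RENDER_TARGET;
}

/* Record a CPU (fallback) write so that only this area needs migrating later. */
static void pixmap_cpu_draw(struct pixmap *p, struct box b)
{
    assert(p != NULL);
    if (!p->on_gpu)
        return;             /* no GPU copy yet, so there is nothing to track */
    if (!p->cpu_dirty) {
        p->cpu_damage = b;
        p->cpu_dirty = true;
        return;
    }
    /* Grow the bounding box to cover both the old damage and the new write. */
    if (b.x1 < p->cpu_damage.x1) p->cpu_damage.x1 = b.x1;
    if (b.y1 < p->cpu_damage.y1) p->cpu_damage.y1 = b.y1;
    if (b.x2 > p->cpu_damage.x2) p->cpu_damage.x2 = b.x2;
    if (b.y2 > p->cpu_damage.y2) p->cpu_damage.y2 = b.y2;
}

/* Returns true if the operation should run on the GPU. */
static bool pixmap_gpu_use(struct pixmap *p, enum use_hint hint)
{
    assert(p != NULL);
    if (!p->on_gpu) {
        if (!triggers_migration(hint))
            return false;   /* stay on the shadow copy */
        p->on_gpu = true;
        p->full_uploads++;  /* first migration moves the whole pixmap */
        return true;
    }
    if (p->cpu_dirty) {
        p->partial_uploads++;   /* migrate only the damaged box */
        p->cpu_dirty = false;
    }
    return true;
}
```

In this sketch a pixmap touched only by core drawing never leaves system
memory, the first RENDER use costs one full upload, and a later CPU
fallback (for example xterm drawing to the front buffer) costs only an
upload of the damaged box rather than of the whole buffer.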