FAQ
===

Why another software rasterizer?
--------------------------------

Good question, given there are already three (swrast, softpipe,
llvmpipe) in the Mesa tree. There are two important reasons for this:

 * Architecture - given our focus on scientific visualization, our
   workloads are very different from those of a typical game; we have
   heavy vertex load and relatively simple shaders. In addition, the
   core counts of the machines we run on are much higher. These
   parameters led to design decisions quite different from llvmpipe's.

 * Historical - Intel had developed a high-performance software
   graphics stack for internal purposes. Later we adapted this
   graphics stack for use in visualization and decided to move forward
   with Mesa to provide a high-quality API layer while at the same
   time benefiting from the excellent performance the software
   rasterizer gives us.

What's the architecture?
------------------------

SWR is a tile-based immediate-mode renderer with a sort-free threading
model arranged as a ring of queues. Each entry in the ring represents
a draw context that contains all of the draw state and work queues. An
API thread sets up each draw context, and worker threads execute both
the frontend (vertex/geometry processing) and backend (fragment) work
as required. The ring allows backend threads to pull work in order.
Large draws are split into chunks so that vertex processing can happen
in parallel, with the backend work pickup preserving draw ordering.
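The ordering guarantee described above can be sketched as follows.
This is a hypothetical, single-threaded illustration of the ring
logic only; the type and member names (``DrawContext``, ``DrawRing``,
``frontendDone``) are invented for this example and are not SWR's
actual internals:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One entry in the ring: a draw plus a flag for completed frontend work.
struct DrawContext {
    uint64_t drawId = 0;
    bool frontendDone = false;
};

template <size_t N>
class DrawRing {
    std::array<DrawContext, N> ring{};
    uint64_t head = 0;  // next slot the API thread fills
    uint64_t tail = 0;  // next draw the backend may retire
public:
    // API thread: claim the next draw context in submission order.
    DrawContext *enqueue(uint64_t drawId) {
        if (head - tail == N)
            return nullptr;  // ring full; the API thread would stall here
        DrawContext &dc = ring[head % N];
        dc = DrawContext{drawId, false};
        ++head;
        return &dc;
    }

    // Backend: retire draws strictly in order, even when a later draw's
    // frontend work (run in parallel) happened to finish first.
    bool retireNext(uint64_t *outDrawId) {
        if (tail == head || !ring[tail % N].frontendDone)
            return false;
        *outDrawId = ring[tail % N].drawId;
        ++tail;
        return true;
    }
};
```

Even if draw 2's vertex work finishes before draw 1's, the backend
cannot pick up draw 2 until draw 1 retires, which is how the design
keeps draw ordering despite parallel frontend processing.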

Our pipeline uses just-in-time compiled code for the fetch shader,
which does vertex attribute gathering and AOS-to-SOA conversion, as
well as for the vertex and fragment shaders, streamout, and fragment
blending. The SWR core also supports geometry and compute shaders,
but we haven't exposed them through our driver yet. The fetch shader,
streamout, and blending are built internally to the SWR core using
LLVM directly, while for the vertex and pixel shaders we reuse bits of
llvmpipe from ``gallium/auxiliary/gallivm`` to build the kernels,
which we wrap differently from llvmpipe's ``auxiliary/draw`` code.

What's the performance?
-----------------------

For the types of high-geometry workloads we're interested in, we are
significantly faster than llvmpipe. This is to be expected, as
llvmpipe only threads the fragment processing, not the geometry
frontend. The performance advantage over llvmpipe roughly scales
linearly with the number of cores available.

While our current performance is quite good, we know there is more
potential in this architecture. When we switched from a prototype
OpenGL driver to Mesa we regressed performance severely, some of it
due to interface issues that need tuning, some due to differences in
shader code generation, and some due to conformance and feature
additions to the core SWR. We are working to recover most of this
performance.

What's the conformance?
-----------------------

The major applications we are targeting are all based on the
Visualization Toolkit (VTK), and as such our development efforts have
been focused on making sure these work as well as possible. Our
current code passes VTK's rendering tests with their new "OpenGL2"
(really OpenGL 3.2) backend at 99%.

piglit testing shows a much lower pass rate, roughly 80% at the time
of writing. Core SWR undergoes rigorous unit testing, so we are quite
confident in the rasterizer, and we understand the areas where it
currently has issues (for example, line rendering is done with
triangles, so it doesn't match the strict line-rendering rules). The
majority of the piglit failures are errors in our driver layer
interfacing Mesa and SWR. Fixing these issues is one of our major
future development goals.

Why are you open sourcing this?
-------------------------------

 * Our customers prefer open source, and allowing them to simply
   download the Mesa source and enable our driver makes life much
   easier for them.

 * The internal Gallium APIs are not stable, so we'd like our driver
   to be visible to those making changes.

 * It's easier to work with the Mesa community when the source we're
   working with can be used as a reference.

What are your development plans?
--------------------------------

 * Performance - see the performance section earlier for details.

 * Conformance - see the conformance section earlier for details.

 * Features - core SWR has a lot of functionality we have yet to
   expose through our driver, such as MSAA, geometry shaders, compute
   shaders, and tessellation.

 * AVX-512 support

What is the licensing of the code?
----------------------------------

 * All code is under the normal Mesa MIT license.

Will this work on AMD?
----------------------

 * If you are using an AMD processor with AVX or AVX2, it should
   work, though we don't have that hardware on hand to test. Patches,
   if needed, would be welcome.

Will this work on ARM, MIPS, POWER, <other non-x86 architecture>?
-----------------------------------------------------------------

 * Not without a lot of work. We make extensive use of AVX and AVX2
   intrinsics in our code and the in-tree JIT creation. It is not the
   intention for this codebase to support non-x86 architectures.

What hardware do I need?
------------------------

 * Any x86 processor with at least AVX (introduced in the Intel
   Sandy Bridge and AMD Bulldozer microarchitectures in 2011) will
   work.

 * You don't need a fire-breathing Xeon machine to work on SWR - we do
   day-to-day development with laptops and desktop CPUs.

Does one build work on both AVX and AVX2?
-----------------------------------------

Yes.
The build system creates two shared libraries, ``libswrAVX.so`` and
``libswrAVX2.so``, and ``swr_create_screen()`` loads the appropriate
one at runtime.
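A minimal sketch of what that runtime selection could look like. The
function name ``pickSwrLibrary`` and its feature-flag parameters are
invented for illustration; the real ``swr_create_screen()`` differs in
detail:

```cpp
#include <cassert>
#include <string>

// Hypothetical dispatch: given the host CPU's feature bits, choose
// which of the two built libraries to load. Prefer AVX2 when present,
// fall back to AVX, and report failure on CPUs with neither.
std::string pickSwrLibrary(bool hasAVX, bool hasAVX2) {
    if (hasAVX2)
        return "libswrAVX2.so";
    if (hasAVX)
        return "libswrAVX.so";
    return "";  // no AVX at all: SWR cannot run on this CPU
}
```

On GCC or Clang the flags could be obtained with
``__builtin_cpu_supports("avx")`` and ``__builtin_cpu_supports("avx2")``,
with the chosen name then handed to ``dlopen()``.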