FAQ
===

Why another software rasterizer?
--------------------------------

Good question, given there are already three (swrast, softpipe,
llvmpipe) in the Mesa tree. There are two important reasons:

 * Architecture - given our focus on scientific visualization, our
   workloads are very different from the typical game; we have a heavy
   vertex load and relatively simple shaders.  In addition, the core
   counts of the machines we run on are much higher.  These parameters
   led to design decisions very different from llvmpipe's.

 * Historical - Intel had developed a high-performance software
   graphics stack for internal purposes.  Later we adapted this
   graphics stack for use in visualization and decided to move forward
   with Mesa to provide a high-quality API layer while at the same
   time benefiting from the excellent performance the software
   rasterizer gives us.

What's the architecture?
------------------------

SWR is a tile-based immediate-mode renderer with a sort-free threading
model arranged as a ring of queues.  Each entry in the ring
represents a draw context that contains all of the draw state and work
queues.  An API thread sets up each draw context, and worker threads
execute both the frontend (vertex/geometry processing) and
backend (fragment) work as required.  The ring allows backend
threads to pull work in order.  Large draws are split into chunks so
that vertex processing can happen in parallel, with the backend work
pickup preserving draw ordering.
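The in-order pickup described above can be sketched as follows. This is a
single-threaded illustration with hypothetical names (``DrawContext``,
``DrawRing``, ``finishFrontend``, ``retireBackend``); it is not SWR's actual
API, and it leaves out the synchronization and per-tile work queues that a
real multi-threaded implementation would need.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration only; not taken from the SWR source.
struct DrawContext {
    std::uint64_t drawId = 0;   // monotonically increasing draw number
    bool frontendDone = false;  // set once vertex/geometry work has finished
};

template <std::size_t N>
class DrawRing {
public:
    // API thread: claim the next slot. Returns false if the ring is full.
    bool enqueue(std::uint64_t& drawId) {
        if (head_ - tail_ == N) return false;
        DrawContext& dc = slots_[head_ % N];
        dc.drawId = head_;
        dc.frontendDone = false;
        drawId = head_++;
        return true;
    }

    // Worker: mark a draw's frontend as complete. Frontend work may
    // finish in any order; draws outside the live window are ignored.
    void finishFrontend(std::uint64_t drawId) {
        if (drawId >= tail_ && drawId < head_)
            slots_[drawId % N].frontendDone = true;
    }

    // Worker: retire the oldest live draw, but only if its frontend is
    // done. Draws therefore leave the ring strictly in submission order.
    bool retireBackend(std::uint64_t& drawId) {
        if (tail_ == head_) return false;                   // ring is empty
        if (!slots_[tail_ % N].frontendDone) return false;  // oldest not ready
        drawId = tail_++;
        return true;
    }

private:
    std::array<DrawContext, N> slots_{};
    std::uint64_t head_ = 0;  // next draw number to hand out
    std::uint64_t tail_ = 0;  // oldest draw not yet retired
};
```

With a ring of two slots, finishing the frontend of draw 1 before draw 0
still leaves ``retireBackend`` returning false until draw 0 is done, which is
the ordering property the ring is meant to provide.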

Our pipeline uses just-in-time compiled code for the fetch shader (which
does vertex attribute gathering and AOS-to-SOA conversions), the vertex
and fragment shaders, streamout, and fragment blending. SWR
core also supports geometry and compute shaders, but we haven't exposed
them through our driver yet. The fetch shader, streamout, and blend are
built internally to SWR core using LLVM directly, while for the vertex
and pixel shaders we reuse bits of llvmpipe from
``gallium/auxiliary/gallivm`` to build the kernels, which we wrap
differently than llvmpipe's ``auxiliary/draw`` code.
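To make the AOS-to-SOA conversion mentioned above concrete, here is a
minimal scalar sketch. SWR performs this step in JIT-compiled code; the plain
C++ below only illustrates the change in data layout. The names
(``VertexAOS``, ``VertexBatchSOA``, ``gatherToSOA``) and the batch width of 8
(eight floats fill one 256-bit AVX register) are assumptions made for the
example.

```cpp
#include <array>
#include <cstddef>

// Array-of-structures layout: one struct per vertex, which is how
// applications typically supply vertex data.
struct VertexAOS {
    float x, y, z, w;
};

// Assumed batch width for this sketch: 8 floats per 256-bit register.
constexpr std::size_t kSimdWidth = 8;

// Structure-of-arrays layout: each component of up to kSimdWidth
// vertices is stored contiguously, so one SIMD load fetches the same
// component for a whole batch.
struct VertexBatchSOA {
    std::array<float, kSimdWidth> x, y, z, w;
};

// Gather up to kSimdWidth vertices into SOA form. Lanes beyond `count`
// stay zero because `out` is value-initialized.
VertexBatchSOA gatherToSOA(const VertexAOS* vertices, std::size_t count) {
    VertexBatchSOA out{};
    for (std::size_t lane = 0; lane < count && lane < kSimdWidth; ++lane) {
        out.x[lane] = vertices[lane].x;
        out.y[lane] = vertices[lane].y;
        out.z[lane] = vertices[lane].z;
        out.w[lane] = vertices[lane].w;
    }
    return out;
}
```

After the conversion, a shader operating on the ``x`` component can process
all lanes of a batch with a single vector instruction instead of striding
through per-vertex structs.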

What's the performance?
-----------------------

For the types of high-geometry workloads we're interested in, we are
significantly faster than llvmpipe.  This is to be expected, as
llvmpipe only threads the fragment processing and not the geometry
frontend.  The performance advantage over llvmpipe scales roughly
linearly with the number of cores available.

While our current performance is quite good, we know there is more
potential in this architecture.  When we switched from a prototype
OpenGL driver to Mesa we regressed performance severely, some due to
interface issues that need tuning, some due to differences in shader
code generation, and some due to conformance and feature additions to
the core SWR.  We aim to recover most of this performance.

What's the conformance?
-----------------------

The major applications we are targeting are all based on the
Visualization Toolkit (VTK), and as such our development efforts have
been focused on making sure these work as well as possible.  Our
current code passes VTK's rendering tests with their new "OpenGL2"
(really OpenGL 3.2) backend at 99%.

piglit testing shows a much lower pass rate, roughly 80% at the time
of writing.  Core SWR undergoes rigorous unit testing, and we are quite
confident in the rasterizer and understand the areas where it
currently has issues (for example, line rendering is done with
triangles, so it doesn't match the strict line rendering rules).  The
majority of the piglit failures are errors in our driver layer
interfacing Mesa and SWR.  Fixing these issues is one of our major
future development goals.

Why are you open sourcing this?
-------------------------------

 * Our customers prefer open source, and allowing them to simply
   download the Mesa source and enable our driver makes life much
   easier for them.

 * The internal gallium APIs are not stable, so we'd like our driver
   to be visible for changes.

 * It's easier to work with the Mesa community when the source we're
   working with can be used as reference.

What are your development plans?
--------------------------------

 * Performance - see the performance section earlier for details.

 * Conformance - see the conformance section earlier for details.

 * Features - core SWR has a lot of functionality we have yet to
   expose through our driver, such as MSAA, geometry shaders, compute
   shaders, and tessellation.

 * AVX512 support

What is the licensing of the code?
----------------------------------

 * All code is under the normal Mesa MIT license.

Will this work on AMD?
----------------------

 * If using an AMD processor with AVX or AVX2, it should work, though
   we don't have that hardware around to test.  Patches, if needed,
   would be welcome.

Will this work on ARM, MIPS, POWER, <other non-x86 architecture>?
-------------------------------------------------------------------------

 * Not without a lot of work.  We make extensive use of AVX and AVX2
   intrinsics in our code and the in-tree JIT creation.  It is not the
   intention for this codebase to support non-x86 architectures.

What hardware do I need?
------------------------

 * Any x86 processor with at least AVX (introduced in the Intel
   Sandy Bridge and AMD Bulldozer microarchitectures in 2011) will
   work.

 * You don't need a fire-breathing Xeon machine to work on SWR - we do
   day-to-day development with laptops and desktop CPUs.

Does one build work on both AVX and AVX2?
-----------------------------------------

Yes. The build system creates two shared libraries, ``libswrAVX.so`` and
``libswrAVX2.so``, and ``swr_create_screen()`` loads the appropriate one at
runtime.