FAQ
===

Why another software rasterizer?
--------------------------------

Good question, given there are already three (swrast, softpipe,
llvmpipe) in the Mesa tree. There are two important reasons for this:

 * Architecture - given our focus on scientific visualization, our
   workloads are very different from those of a typical game; we have
   heavy vertex load and relatively simple shaders. In addition, the
   core counts of the machines we run on are much higher. These
   parameters led to design decisions quite different from llvmpipe's.

 * Historical - Intel had developed a high-performance software
   graphics stack for internal purposes. Later we adapted this
   graphics stack for use in visualization and decided to move forward
   with Mesa to provide a high-quality API layer while at the same
   time benefiting from the excellent performance the software
   rasterizer gives us.

What's the architecture?
------------------------

SWR is a tile-based immediate-mode renderer with a sort-free threading
model arranged as a ring of queues. Each entry in the ring represents
a draw context that contains all of the draw state and work queues. An
API thread sets up each draw context, and worker threads execute both
the frontend (vertex/geometry processing) and backend (fragment) work
as required. The ring allows backend threads to pull work in order.
Large draws are split into chunks so that vertex processing can happen
in parallel, with the backend work pickup preserving draw ordering.
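The ordering guarantee described above can be sketched as follows.
This is a hypothetical, single-threaded illustration of the ring
logic only; the type and member names (``DrawContext``, ``DrawRing``,
``frontendDone``) are invented for this example and are not SWR's
actual internals:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One entry in the ring: a draw plus a flag for completed frontend work.
struct DrawContext {
    uint64_t drawId = 0;
    bool frontendDone = false;
};

template <size_t N>
class DrawRing {
    std::array<DrawContext, N> ring{};
    uint64_t head = 0;  // next slot the API thread fills
    uint64_t tail = 0;  // next draw the backend may retire
public:
    // API thread: claim the next draw context in submission order.
    DrawContext *enqueue(uint64_t drawId) {
        if (head - tail == N)
            return nullptr;  // ring full; the API thread would stall here
        DrawContext &dc = ring[head % N];
        dc = DrawContext{drawId, false};
        ++head;
        return &dc;
    }

    // Backend: retire draws strictly in order, even when a later draw's
    // frontend work (run in parallel) happened to finish first.
    bool retireNext(uint64_t *outDrawId) {
        if (tail == head || !ring[tail % N].frontendDone)
            return false;
        *outDrawId = ring[tail % N].drawId;
        ++tail;
        return true;
    }
};
```

Even if draw 2's vertex work finishes before draw 1's, the backend
cannot pick up draw 2 until draw 1 retires, which is how the design
keeps draw ordering despite parallel frontend processing.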

Our pipeline uses just-in-time compiled code for the fetch shader,
which does vertex attribute gathering and AOS-to-SOA conversion, as
well as for the vertex and fragment shaders, streamout, and fragment
blending. The SWR core also supports geometry and compute shaders,
but we haven't exposed them through our driver yet. The fetch shader,
streamout, and blending are built internally to the SWR core using
LLVM directly, while for the vertex and pixel shaders we reuse bits of
llvmpipe from ``gallium/auxiliary/gallivm`` to build the kernels,
which we wrap differently from llvmpipe's ``auxiliary/draw`` code.

What's the performance?
-----------------------

For the types of high-geometry workloads we're interested in, we are
significantly faster than llvmpipe. This is to be expected, as
llvmpipe only threads the fragment processing, not the geometry
frontend. The performance advantage over llvmpipe roughly scales
linearly with the number of cores available.

While our current performance is quite good, we know there is more
potential in this architecture. When we switched from a prototype
OpenGL driver to Mesa we regressed performance severely, some of it
due to interface issues that need tuning, some due to differences in
shader code generation, and some due to conformance and feature
additions to the core SWR. We are working to recover most of this
performance.

What's the conformance?
-----------------------

The major applications we are targeting are all based on the
Visualization Toolkit (VTK), and as such our development efforts have
been focused on making sure these work as well as possible. Our
current code passes VTK's rendering tests with their new "OpenGL2"
(really OpenGL 3.2) backend at 99%.

piglit testing shows a much lower pass rate, roughly 80% at the time
of writing. Core SWR undergoes rigorous unit testing, so we are quite
confident in the rasterizer, and we understand the areas where it
currently has issues (for example, line rendering is done with
triangles, so it doesn't match the strict line-rendering rules). The
majority of the piglit failures are errors in our driver layer
interfacing Mesa and SWR. Fixing these issues is one of our major
future development goals.

Why are you open sourcing this?
-------------------------------

 * Our customers prefer open source, and allowing them to simply
   download the Mesa source and enable our driver makes life much
   easier for them.

 * The internal Gallium APIs are not stable, so we'd like our driver
   to be visible to those making changes.

 * It's easier to work with the Mesa community when the source we're
   working with can be used as a reference.

What are your development plans?
--------------------------------

 * Performance - see the performance section earlier for details.

 * Conformance - see the conformance section earlier for details.

 * Features - core SWR has a lot of functionality we have yet to
   expose through our driver, such as MSAA, geometry shaders, compute
   shaders, and tessellation.

 * AVX-512 support

What is the licensing of the code?
----------------------------------

 * All code is under the normal Mesa MIT license.

Will this work on AMD?
----------------------

 * If you are using an AMD processor with AVX or AVX2, it should
   work, though we don't have that hardware on hand to test. Patches,
   if needed, would be welcome.

Will this work on ARM, MIPS, POWER, <other non-x86 architecture>?
-----------------------------------------------------------------

 * Not without a lot of work. We make extensive use of AVX and AVX2
   intrinsics in our code and the in-tree JIT creation. It is not the
   intention for this codebase to support non-x86 architectures.

What hardware do I need?
------------------------

 * Any x86 processor with at least AVX (introduced in the Intel
   Sandy Bridge and AMD Bulldozer microarchitectures in 2011) will
   work.

 * You don't need a fire-breathing Xeon machine to work on SWR - we do
   day-to-day development with laptops and desktop CPUs.

Does one build work on both AVX and AVX2?
-----------------------------------------

Yes.
The build system creates two shared libraries, ``libswrAVX.so`` and
``libswrAVX2.so``, and ``swr_create_screen()`` loads the appropriate
one at runtime.
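A minimal sketch of what that runtime selection could look like. The
function name ``pickSwrLibrary`` and its feature-flag parameters are
invented for illustration; the real ``swr_create_screen()`` differs in
detail:

```cpp
#include <cassert>
#include <string>

// Hypothetical dispatch: given the host CPU's feature bits, choose
// which of the two built libraries to load. Prefer AVX2 when present,
// fall back to AVX, and report failure on CPUs with neither.
std::string pickSwrLibrary(bool hasAVX, bool hasAVX2) {
    if (hasAVX2)
        return "libswrAVX2.so";
    if (hasAVX)
        return "libswrAVX.so";
    return "";  // no AVX at all: SWR cannot run on this CPU
}
```

On GCC or Clang the flags could be obtained with
``__builtin_cpu_supports("avx")`` and ``__builtin_cpu_supports("avx2")``,
with the chosen name then handed to ``dlopen()``.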