<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html lang="en">
<head>
  <meta http-equiv="content-type" content="text/html; charset=utf-8">
  <title>llvmpipe</title>
  <link rel="stylesheet" type="text/css" href="mesa.css">
</head>
<body>

<div class="header">
  <h1>The Mesa 3D Graphics Library</h1>
</div>

<iframe src="contents.html"></iframe>
<div class="content">

<h1>Introduction</h1>

<p>
The Gallium llvmpipe driver is a software rasterizer that uses LLVM to
do runtime code generation.
Shaders, point/line/triangle rasterization and vertex processing are
implemented with LLVM IR which is translated to x86, x86-64, or ppc64le machine
code.
Also, the driver is multithreaded to take advantage of multiple CPU cores
(up to 8 at this time).
It's the fastest software rasterizer for Mesa.
</p>


<h1>Requirements</h1>

<ul>
<li>
   <p>
   For x86 or amd64 processors, 64-bit mode is recommended.
   Support for SSE2 is strongly encouraged.  Support for SSE3 and SSE4.1 will
   yield the most efficient code.  The fewer features the CPU has, the more
   likely it is that you will run into underperforming, buggy, or incomplete code.
   </p>
   <p>
   For ppc64le processors, use of the Altivec feature (the Vector
   Facility) is recommended if supported; use of the VSX feature (the
   Vector-Scalar Facility) is recommended if supported and Mesa is
   built with LLVM version 4.0 or later.
   </p>
   <p>
   Check /proc/cpuinfo to see what your CPU supports.
   </p>
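   <p>
   As a quick check on x86, something along these lines should list which of
   the features named above are advertised.  This is only a sketch:
   <code>simd_flags</code> is a made-up helper name, and the flag spellings
   are the ones the Linux kernel uses, where SSE3 appears as <code>pni</code>.
   </p>
<pre>
     # List the SIMD features named above that the CPU advertises.
     # The Linux kernel reports SSE3 as "pni" in /proc/cpuinfo.
     simd_flags() {
         # $1: cpuinfo-style file to read (defaults to /proc/cpuinfo)
         grep -o -w -E 'sse2|pni|sse4_1' "${1:-/proc/cpuinfo}" | sort -u
     }
     simd_flags
</pre>
   <p>
   A flag that is missing from the output is not supported by the CPU (or is
   hidden by a virtual machine).
   </p>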
</li>
<li>
   <p>Unless otherwise stated, LLVM version 3.4 is recommended; 3.3 or later is required.</p>
   <p>
   On a recent Debian-based Linux distribution, run:
   </p>
<pre>
     aptitude install llvm-dev
</pre>
   <p>
   If you want development snapshot builds of LLVM for Debian and derived
   distributions like Ubuntu, you can use the APT repository at <a
   href="https://apt.llvm.org/" title="Debian Development packages for LLVM"
   >apt.llvm.org</a>, which is maintained by Debian's LLVM maintainer.
   </p>
   <p>
   On an RPM-based distribution, run:
   </p>
<pre>
     yum install llvm-devel
</pre>

   <p>
   For Windows you will need to build LLVM from source with MSVC or MinGW
   (either natively or through cross compilers) and CMake, and set the LLVM
   environment variable to the directory you installed it to.

   LLVM will be statically linked, so when building with MSVC it needs to be
   built with the same CRT as Mesa, and you'll need to pass
   <code>-DLLVM_USE_CRT_xxx=yyy</code> to CMake as shown in the table below.
   </p>

   <table border="1">
     <tr>
       <th rowspan="2">LLVM build-type</th>
       <th colspan="2" align="center">Mesa build-type</th>
     </tr>
     <tr>
       <th>debug,checked</th>
       <th>release,profile</th>
     </tr>
     <tr>
       <th>Debug</th>
       <td><code>-DLLVM_USE_CRT_DEBUG=MTd</code></td>
       <td><code>-DLLVM_USE_CRT_DEBUG=MT</code></td>
     </tr>
     <tr>
       <th>Release</th>
       <td><code>-DLLVM_USE_CRT_RELEASE=MTd</code></td>
       <td><code>-DLLVM_USE_CRT_RELEASE=MT</code></td>
     </tr>
   </table>

   <p>
   You can build only the x86 target by passing <code>-DLLVM_TARGETS_TO_BUILD=X86</code>
   to cmake.
   </p>
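   <p>
   As a rough sketch, a Release build of LLVM intended for a release or
   profile build of Mesa might be configured like this from an empty build
   directory inside the LLVM source tree.  The install prefix
   <code>C:/llvm</code> is only a placeholder, the generator is left to
   CMake's default, and the <code>INSTALL</code> target name assumes a
   Visual Studio generator.
   </p>
<pre>
     cmake -DCMAKE_INSTALL_PREFIX=C:/llvm -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_USE_CRT_RELEASE=MT ..
     cmake --build . --config Release --target INSTALL
</pre>
   <p>
   The LLVM environment variable would then point at the chosen install prefix.
   </p>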
</li>

<li>
   <p>scons (optional)</p>
</li>
</ul>


<h1>Building</h1>

To build everything on Linux, invoke scons as:

<pre>
  scons build=debug libgl-xlib
</pre>

Alternatively, you can build with meson:
<pre>
  mkdir build
  cd build
  meson -D glx=gallium-xlib -D gallium-drivers=swrast
  ninja
</pre>

The rest of these instructions assume that scons is used.

On Windows, the procedure is the same except for the target:

<pre>
  scons platform=windows build=debug libgl-gdi
</pre>


<h1>Using</h1>

<h2>Linux</h2>

<p>On Linux, building will create a drop-in alternative for libGL.so in</p>

<pre>
  build/foo/gallium/targets/libgl-xlib/libGL.so
</pre>
or
<pre>
  lib/gallium/libGL.so
</pre>

<p>To use it, set the LD_LIBRARY_PATH environment variable to the directory that contains it.</p>
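<p>For instance, assuming a 64-bit x86 debug build made with scons from the
top of the Mesa tree, and assuming <code>glxinfo</code> is installed, the
following sketch selects the library and checks which renderer is in use.
The directory name is an assumption; substitute the one your build produced.</p>

<pre>
  # Assumed output directory of a 64-bit x86 debug scons build; adjust as needed.
  export LD_LIBRARY_PATH=$PWD/build/linux-x86_64-debug/gallium/targets/libgl-xlib
  # The renderer string should mention llvmpipe if the library was picked up.
  glxinfo | grep "OpenGL renderer"
</pre>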

<p>For performance evaluation, pass build=release to scons, and use the corresponding
lib directory without the "-debug" suffix.</p>


<h2>Windows</h2>

<p>
On Windows, building will create
<code>build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll</code>
which is a drop-in alternative for the system's <code>opengl32.dll</code>.  To use
it, put it in the same directory as your application.  It can also be used by
replacing the native ICD driver, but that is quite an advanced usage, so if you
need to ask, don't even try it.
</p>

<p>
There is, however, an easy way to replace the OpenGL software renderer that comes
with Microsoft Windows 7 (or later) with llvmpipe (that is, on systems without
any OpenGL drivers):
</p>

<ul>
  <li><p>copy build/windows-x86-debug/gallium/targets/libgl-gdi/opengl32.dll to C:\Windows\SysWOW64\mesadrv.dll</p></li>
  <li><p>load these registry settings:</p>
  <pre>REGEDIT4

; https://technet.microsoft.com/en-us/library/cc749368.aspx
; https://www.msfn.org/board/topic/143241-portable-windows-7-build-from-winpe-30/page-5#entry942596
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Windows NT\CurrentVersion\OpenGLDrivers\MSOGL]
"DLL"="mesadrv.dll"
"DriverVersion"=dword:00000001
"Flags"=dword:00000001
"Version"=dword:00000002
</pre>
  </li>
  <li>Ditto for 64-bit drivers if you need them.</li>
</ul>


<h1>Profiling</h1>

<p>
To profile llvmpipe, build with:
</p>
<pre>
  scons build=profile &lt;same-as-before&gt;
</pre>

<p>
This will ensure that frame pointers are used both in C and JIT functions, and
that no tail call optimizations are done by gcc.
</p>

<h2>Linux perf integration</h2>

<p>
On Linux, it is possible to have symbol resolution of JIT code with <a href="https://perf.wiki.kernel.org/">Linux perf</a>:
</p>

<pre>
	perf record -g /my/application
	perf report
</pre>

<p>
When run inside Linux perf, llvmpipe will create a /tmp/perf-XXXXX.map file with
a symbol address table.  It also dumps assembly code to /tmp/perf-XXXXX.map.asm,
which can be used by the bin/perf-annotate-jit.py script to produce disassembly of
the generated code annotated with the samples.
</p>

<p>You can obtain a call graph via
<a href="https://github.com/jrfonseca/gprof2dot#linux-perf">Gprof2Dot</a>.</p>
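<p>
For example, after a <code>perf record -g</code> run, a pipeline along the
lines of the one in the Gprof2Dot documentation can render the graph.  It
assumes that <code>c++filt</code>, <code>gprof2dot.py</code> and Graphviz's
<code>dot</code> are on the PATH; <code>callgraph.png</code> is an arbitrary
output name.
</p>

<pre>
	perf script | c++filt | gprof2dot.py -f perf | dot -Tpng -o callgraph.png
</pre>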


<h1>Unit testing</h1>

<p>
Building will also create several unit tests in
build/linux-???-debug/gallium/drivers/llvmpipe:
</p>

<ul>
<li> lp_test_blend: blending
<li> lp_test_conv: SIMD vector conversion
<li> lp_test_format: pixel unpacking/packing
</ul>

<p>
Some of these tests can output results and benchmarks to a tab-separated file
for later analysis, e.g.:
</p>
<pre>
  build/linux-x86_64-debug/gallium/drivers/llvmpipe/lp_test_blend -o blend.tsv
</pre>


<h1>Development Notes</h1>

<ul>
<li>
  When looking at this code for the first time, start in lp_state_fs.c, and
  then skim through the lp_bld_* functions called there, and the comments
  at the top of the lp_bld_*.c files.
</li>
<li>
  The driver-independent parts of the LLVM / Gallium code are found in
  src/gallium/auxiliary/gallivm/.  The filenames and function prefixes
  need to be renamed from "lp_bld_" to something else though.
</li>
<li>
  We use LLVM-C bindings for now. They are not documented, but follow the C++
  interfaces very closely, and appear to be complete enough for code
  generation. See
  <a href="https://npcontemplation.blogspot.com/2008/06/secret-of-llvm-c-bindings.html">
  this stand-alone example</a>.  See the llvm-c/Core.h file for reference.
</li>
</ul>

<h1 id="recommended_reading">Recommended Reading</h1>

<ul>
  <li>
    <p>Rasterization</p>
    <ul>
      <li><a href="https://www.cs.unc.edu/~olano/papers/2dh-tri/">Triangle Scan Conversion using 2D Homogeneous Coordinates</a></li>
      <li><a href="http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602">Rasterization on Larrabee</a> (<a href="http://devmaster.net/posts/2887/rasterization-on-larrabee">DevMaster copy</a>)</li>
      <li><a href="http://devmaster.net/posts/6133/rasterization-using-half-space-functions">Rasterization using half-space functions</a></li>
      <li><a href="http://devmaster.net/posts/6145/advanced-rasterization">Advanced Rasterization</a></li>
      <li><a href="https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/">Optimizing Software Occlusion Culling</a></li>
    </ul>
  </li>
  <li>
    <p>Texture sampling</p>
    <ul>
      <li><a href="http://chrishecker.com/Miscellaneous_Technical_Articles#Perspective_Texture_Mapping">Perspective Texture Mapping</a></li>
      <li><a href="https://www.flipcode.com/archives/Texturing_As_In_Unreal.shtml">Texturing As In Unreal</a></li>
      <li><a href="http://www.gamasutra.com/view/feature/3301/runtime_mipmap_filtering.php">Run-Time MIP-Map Filtering</a></li>
      <li><a href="http://alt.3dcenter.org/artikel/2003/10-26_a_english.php">Will "brilinear" filtering persist?</a></li>
      <li><a href="http://ixbtlabs.com/articles2/gffx/nv40-rx800-3.html">Trilinear filtering</a></li>
      <li><a href="http://devmaster.net/posts/12785/texture-swizzling">Texture Swizzling</a></li>
    </ul>
  </li>
  <li>
    <p>SIMD</p>
    <ul>
      <li><a href="http://www.cdl.uni-saarland.de/projects/wfv/#header4">Whole-Function Vectorization</a></li>
    </ul>
  </li>
  <li>
    <p>Optimization</p>
    <ul>
      <li><a href="http://www.drdobbs.com/optimizing-pixomatic-for-modern-x86-proc/184405807">Optimizing Pixomatic For Modern x86 Processors</a></li>
      <li><a href="http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html">Intel 64 and IA-32 Architectures Optimization Reference Manual</a></li>
      <li><a href="http://www.agner.org/optimize/">Software optimization resources</a></li>
      <li><a href="https://software.intel.com/en-us/articles/intel-intrinsics-guide">Intel Intrinsics Guide</a></li>
    </ul>
  </li>
  <li>
    <p>LLVM</p>
    <ul>
      <li><a href="http://llvm.org/docs/LangRef.html">LLVM Language Reference Manual</a></li>
      <li><a href="https://npcontemplation.blogspot.co.uk/2008/06/secret-of-llvm-c-bindings.html">The secret of LLVM C bindings</a></li>
    </ul>
  </li>
  <li>
    <p>General</p>
    <ul>
      <li><a href="https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/">A trip through the Graphics Pipeline</a></li>
      <li><a href="https://msdn.microsoft.com/en-us/library/gg615082.aspx#architecture">WARP Architecture and Performance</a></li>
    </ul>
  </li>
</ul>

</div>
</body>
</html>