GL Dispatch
===========

Several factors combine to make efficient dispatch of OpenGL functions
fairly complicated. This document attempts to explain some of the issues
and introduce the reader to Mesa's implementation. Readers already
familiar with the issues around GL dispatch can safely skip ahead to the
:ref:`overview of Mesa's implementation <overview>`.

1. Complexity of GL Dispatch
----------------------------

Every GL application has at least one object called a GL *context*. This
object, which is an implicit parameter to every GL function, stores all
of the GL-related state for the application. Every texture, every buffer
object, every enable, and much, much more is stored in the context.
Since an application can have more than one context, the context to be
used is selected by a window-system dependent function such as
``glXMakeContextCurrent``.

In environments that implement OpenGL on the X Window System using GLX,
every GL function, including each pointer returned by
``glXGetProcAddress``, is *context independent*. This means that no
matter what context is currently active, the same ``glVertex3fv``
function is used.

This creates the first bit of dispatch complexity. An application can
have two GL contexts. One context is a direct rendering context where
function calls are routed directly to a driver loaded within the
application's address space. The other context is an indirect rendering
context where function calls are converted to GLX protocol and sent to a
server.
The same ``glVertex3fv`` has to do the right thing depending on
which context is current.

Highly optimized drivers or GLX protocol implementations may want to
change the behavior of GL functions depending on current state. For
example, ``glFogCoordf`` may operate differently depending on whether or
not fog is enabled.

In multi-threaded environments, it is possible for each thread to have a
different GL context current. This means that poor old ``glVertex3fv``
has to know which GL context is current in the thread where it is being
called.

.. _overview:

2. Overview of Mesa's Implementation
------------------------------------

Mesa uses two per-thread pointers. The first pointer stores the address
of the context current in the thread, and the second pointer stores the
address of the *dispatch table* associated with that context. The
dispatch table stores pointers to functions that actually implement
specific GL functions. Each time a new context is made current in a
thread, these pointers are updated.

The implementation of functions such as ``glVertex3fv`` becomes
conceptually simple:

- Fetch the current dispatch table pointer.
- Fetch the pointer to the real ``glVertex3fv`` function from the
  table.
- Call the real function.

This can be implemented in just a few lines of C code. The file
``src/mesa/glapi/glapitemp.h`` contains code very similar to this.

.. code-block:: c
   :caption: Sample dispatch function

   void glVertex3f(GLfloat x, GLfloat y, GLfloat z)
   {
       const struct _glapi_table * const dispatch = GET_DISPATCH();

       (*dispatch->Vertex3f)(x, y, z);
   }

The problem with this simple implementation is the large amount of
overhead that it adds to every GL function call.

In a multithreaded environment, a naive implementation of
``GET_DISPATCH`` involves a call to ``pthread_getspecific`` or a similar
function. Mesa provides a wrapper function called
``_glapi_get_dispatch`` that is used by default.

3. Optimizations
----------------

A number of optimizations have been made over the years to diminish the
performance hit imposed by GL dispatch. This section describes these
optimizations. The benefits of each optimization and the situations
where each can or cannot be used are listed.

3.1. Dual dispatch table pointers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The vast majority of OpenGL applications use the API in a single
threaded manner. That is, the application has only one thread that makes
calls into the GL. In these cases, not only do the calls to
``pthread_getspecific`` hurt performance, but they are completely
unnecessary! It is possible to detect this common case and avoid these
calls.

Each time a new dispatch table is set, Mesa examines and records the ID
of the executing thread.
If the same thread ID is always seen, Mesa
knows that the application is, from OpenGL's point of view, single
threaded.

As long as an application is single threaded, Mesa stores a pointer to
the dispatch table in a global variable called ``_glapi_Dispatch``. The
pointer is also stored in a per-thread location via
``pthread_setspecific``. When Mesa detects that an application has
become multithreaded, ``NULL`` is stored in ``_glapi_Dispatch``.

Using this simple mechanism the dispatch functions can detect the
multithreaded case by comparing ``_glapi_Dispatch`` to ``NULL``. The
resulting implementation of ``GET_DISPATCH`` is slightly more complex,
but it avoids the expensive ``pthread_getspecific`` call in the common
case.

.. code-block:: c
   :caption: Improved ``GET_DISPATCH`` Implementation

   #define GET_DISPATCH() \
      ((_glapi_Dispatch != NULL) \
       ? _glapi_Dispatch : pthread_getspecific(_glapi_Dispatch_key))

3.2. ELF TLS
~~~~~~~~~~~~

Starting with the 2.4.20 Linux kernel, each thread is allocated an area
of per-thread, global storage. Variables can be put in this area using
some extensions to GCC. By storing the dispatch table pointer in this
area, the expensive call to ``pthread_getspecific`` and the test of
``_glapi_Dispatch`` can be avoided.

The dispatch table pointer is stored in a new variable called
``_glapi_tls_Dispatch``. A new variable name is used so that a single
libGL can implement both interfaces.
This allows libGL to operate
with direct rendering drivers that use either interface. Once the
pointer is properly declared, ``GET_DISPATCH`` becomes a simple variable
reference.

.. code-block:: c
   :caption: TLS ``GET_DISPATCH`` Implementation

   extern __thread struct _glapi_table *_glapi_tls_Dispatch
       __attribute__((tls_model("initial-exec")));

   #define GET_DISPATCH() _glapi_tls_Dispatch

Use of this path is controlled by the preprocessor define
``USE_ELF_TLS``. Any platform capable of using ELF TLS should use this
as the default dispatch method.

Windows has a similar concept, and beginning with Windows Vista, shared
libraries can take advantage of compiler-assisted TLS. This TLS data
has no fixed size and does not compete with API-based TLS (``TlsAlloc``)
for the limited number of slots available there, and so ``USE_ELF_TLS`` can
be used on Windows too, even though it's not truly ELF.

3.3. Assembly Language Dispatch Stubs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Many platforms have difficulty properly optimizing the tail-call in the
dispatch stubs. Platforms like x86 that pass parameters on the stack
seem to have even more difficulty optimizing these routines. All of the
dispatch routines are very short, and it is trivial to create optimal
assembly language versions. The amount of optimization provided by using
assembly stubs varies from platform to platform and application to
application.
However, by using the assembly stubs, many platforms can
use an additional space optimization (see :ref:`below <fixedsize>`).

The biggest hurdle to creating assembly stubs is handling the various
ways that the dispatch table pointer can be accessed. There are four
different methods that can be used:

#. Using ``_glapi_Dispatch`` directly in builds for non-multithreaded
   environments.
#. Using ``_glapi_Dispatch`` and ``_glapi_get_dispatch`` in
   multithreaded environments.
#. Using ``_glapi_Dispatch`` and ``pthread_getspecific`` in
   multithreaded environments.
#. Using ``_glapi_tls_Dispatch`` directly in TLS enabled multithreaded
   environments.

People wishing to implement assembly stubs for new platforms should
focus on #4 if the new platform supports TLS. Otherwise, implement #2
followed by #3. Environments that do not support multithreading are
uncommon and not terribly relevant.

Selection of the dispatch table pointer access method is controlled by a
few preprocessor defines.

- If ``USE_ELF_TLS`` is defined, method #4 is used.
- If ``HAVE_PTHREAD`` is defined, method #2 is used.
- If none of the preceding are defined, method #1 is used.

Two different techniques are used to handle the various cases.
On x86 and SPARC, a macro called ``GL_STUB`` is used. In the preamble of
the assembly source file different implementations of the macro are
selected based on the defined preprocessor variables.
The assembly code
then consists of a series of invocations of the macro such as:

.. code-block:: c
   :caption: SPARC Assembly Implementation of ``glColor3fv``

   GL_STUB(Color3fv, _gloffset_Color3fv)

The benefit of this technique is that changes to the calling pattern
(i.e., addition of a new dispatch table pointer access method) require
fewer changed lines in the assembly code.

However, this technique can only be used on platforms where the function
implementation does not change based on the parameters passed to the
function. For example, since x86 passes all parameters on the stack, no
additional code is needed to save and restore function parameters around
a call to ``pthread_getspecific``. Since x86-64 passes parameters in
registers, varying amounts of code need to be inserted around the call
to ``pthread_getspecific`` to save and restore the GL function's
parameters.

The other technique, used by platforms like x86-64 that cannot use the
first technique, is to insert ``#ifdef`` within the assembly
implementation of each function. This makes the assembly file
considerably larger (e.g., 29,332 lines for ``glapi_x86-64.S`` versus
1,155 lines for ``glapi_x86.S``) and causes simple changes to the
function implementation to generate many lines of diffs. Since the
assembly files are typically generated by scripts, this isn't a
significant problem.

Once a new assembly file is created, it must be inserted in the build
system. There are two steps to this.
The file must first be added to
``src/mesa/sources``. That gets the file built and linked. The second
step is to add the correct ``#ifdef`` magic to
``src/mesa/glapi/glapi_dispatch.c`` to prevent the C version of the
dispatch functions from being built.

.. _fixedsize:

3.4. Fixed-Length Dispatch Stubs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To implement ``glXGetProcAddress``, Mesa stores a table that associates
function names with pointers to those functions. This table is stored in
``src/mesa/glapi/glprocs.h``. For different reasons on different
platforms, storing all of those pointers is inefficient. On most
platforms, including all known platforms that support TLS, we can avoid
this added overhead.

If the assembly stubs are all the same size, the pointer need not be
stored for every function. The location of the function can instead be
calculated by multiplying the size of the dispatch stub by the offset of
the function in the table. This value is then added to the address of
the first dispatch stub.

This path is activated by adding the correct ``#ifdef`` magic to
``src/mesa/glapi/glapi.c`` just before ``glprocs.h`` is included.
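
The address calculation used for fixed-length stubs can be sketched in a
few lines of C. This is only an illustration under stated assumptions,
not Mesa's code: the names ``STUB_SIZE``, ``STUB_COUNT``, ``stub_block``,
``stub_address``, and ``lookup_stub`` are hypothetical, the stub size of
16 bytes is an arbitrary choice, and a plain byte array stands in for the
real block of assembly stubs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for illustration only; real stub sizes and counts
 * depend on the platform and on the generated assembly. */
#define STUB_SIZE  16u  /* assumed size of one dispatch stub, in bytes */
#define STUB_COUNT 4u   /* assumed number of stubs in the block */

/* Stand-in for the contiguous block of equally sized assembly stubs. */
unsigned char stub_block[STUB_SIZE * STUB_COUNT];

/* Address of the stub at dispatch-table offset `offset`:
 * address of the first stub + (stub size * offset). */
const void *stub_address(const void *first_stub, size_t stub_size,
                         size_t offset)
{
    return (const unsigned char *)first_stub + stub_size * offset;
}

/* Convenience wrapper over the stand-in block, with a bounds check. */
const void *lookup_stub(size_t offset)
{
    assert(offset < STUB_COUNT);
    return stub_address(stub_block, STUB_SIZE, offset);
}
```

Because every stub has the same size, only the address of the first stub
and the per-function offset are needed; no per-function pointer has to be
stored alongside the function name.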