Buffer mapping patterns
-----------------------

The driver has two main strategies for CPU access to GL buffer objects. One is
for the GL calls to allocate temporary storage and blit to the GPU at
``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
time. This makes it easy to match the behavior GL expects. However, it may be
more costly than direct mapping of the GL BO on some platforms, and is
essentially not available to tiling GPUs (since tiling involves running through
the command stream multiple times). The other strategy is to map the GL BO
directly, and GL has additional interfaces that let apps access that memory
while avoiding implicit blocking on the GPU rendering from those BOs.

Rendering engines have a variety of knobs to set on those GL interfaces for data
upload, and as a whole they seem to take just about every path available. Let's
look at some examples to see how they might constrain GL driver buffer upload
behavior.

Portal 2
========

.. code-block:: console

  1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
  1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
  1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
  1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
  1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
  1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
  1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
  1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
  1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
  1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
  [... repeated draws at increasing offsets]
  1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)

From this sequence, we can see that it is important that the driver either
implements ``glBufferSubData()`` as a blit from a streaming uploader in sequence
with the ``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly
those with dedicated memory), or does both of the following:

1) Track the valid range of the buffer so that you don't have to flush the draws
   and synchronize on each following ``glBufferSubData()``.

2) Reallocate the buffer storage on ``glBufferData()`` so that your first
   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
   rendering completing.

You can't just empty your valid range on ``glBufferData()`` unless you know that
the GPU access from the previous frame has completed. This pattern of
incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
is common among newer Valve games.
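
As a rough illustration of points 1 and 2 above, the following C sketch models
the bookkeeping for a single buffer. It is not code from any Mesa driver: the
``sketch_*`` names, the struct layout, and the ``generation`` counter standing
in for a fresh BO allocation are all assumptions made for this example. The
valid range is tracked simply as a byte count starting from offset 0.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical per-buffer bookkeeping; not a real Mesa structure. */
  struct sketch_buffer {
     size_t size;         /* size of the current storage in bytes */
     size_t valid_end;    /* bytes [0, valid_end) were written since allocation */
     bool gpu_busy;       /* queued or submitted draws may still read the storage */
     unsigned generation; /* stand-in for "which BO backs this GL buffer" */
  };

  /* Model of glBufferData(..., NULL, ...): swap in fresh storage if the old
   * storage may still be in use (or has a different size), then empty the
   * valid range. */
  static void
  sketch_buffer_data_null(struct sketch_buffer *buf, size_t size)
  {
     if (buf->gpu_busy || size != buf->size) {
        buf->generation++;     /* pretend a new BO was allocated */
        buf->gpu_busy = false; /* nothing references the new storage yet */
     }
     buf->size = size;
     buf->valid_end = 0;
  }

  /* Model of a draw call that sources data from the buffer. */
  static void
  sketch_draw(struct sketch_buffer *buf)
  {
     buf->gpu_busy = true;
  }

  /* Model of glBufferSubData(): returns true when the write would have to
   * flush and wait for the GPU because it touches bytes that earlier draws
   * may read. Writes that start at or past valid_end never need to wait. */
  static bool
  sketch_buffer_sub_data_stalls(struct sketch_buffer *buf,
                                size_t offset, size_t size)
  {
     assert(offset <= buf->size && size <= buf->size - offset);

     bool stalls = buf->gpu_busy && offset < buf->valid_end;
     if (stalls)
        buf->gpu_busy = false; /* after waiting, the GPU is done with it */
     if (offset + size > buf->valid_end)
        buf->valid_end = offset + size;
     return stalls;
  }

Replaying the element array uploads from the trace above through this model
(offsets 0, 576 and 588, with draws in between), none of them reports a stall,
whereas rewriting offset 0 after a draw would.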

.. code-block:: console

  [ during setup ]

  679259 glGenBuffersARB(n = 1, buffers = &1314)
  679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
  679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
  679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup of other buffers on this binding point]

  679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
  679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
  679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
  679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
  679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
  679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
  679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

  [... setup completes and we start drawing later]

  761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
  761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)

This suggests that, for non-blitting drivers, resetting your "might be used on
the GPU" range after a stall could save you a number of additional GPU stalls
during setup, since the same offsets are mapped again before any draw uses
them.

Terraria
========

.. code-block:: console

  167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)

  167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
  167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
  167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
  167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
  167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
  167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
  [...]

In this game, we can see ``glBufferData()`` being used on the same array buffer
throughout, to get new storage so that the ``glBufferSubData()`` doesn't cause
synchronization.

Don't Starve
============

.. code-block:: console

  7251917 glGenBuffers(n = 1, buffers = &115052)
  7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
  7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
  7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
  7251938 glGenBuffers(n = 1, buffers = &115053)
  7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
  7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
  7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
  [... drawing next frame]
  7252388 glDeleteBuffers(n = 1, buffers = &115052)
  7252389 glDeleteBuffers(n = 1, buffers = &115053)
  7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)

In this game we have a lot of tiny ``glBufferData()`` calls, suggesting that we
could see working set wins and possibly CPU overhead reduction by packing small
GL buffers in the same BO. Interestingly, the deletes of the temporary buffers
always happen at the end of the next frame.
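
One way to picture the packing idea is a bump allocator that hands out aligned
sub-ranges of a single larger BO. The sketch below is illustrative only:
``SKETCH_SLAB_SIZE``, the 64-byte alignment, and the ``sketch_slab_*`` names are
assumptions made for this example, and a real driver would also have to defer
each free until the GPU has finished reading the range.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>

  #define SKETCH_SLAB_SIZE     ((size_t)65536) /* assumed size of the shared BO */
  #define SKETCH_SLAB_ALIGN    ((size_t)64)    /* assumed sub-range alignment */
  #define SKETCH_SLAB_NO_SPACE ((size_t)-1)

  /* Hypothetical shared BO that small GL buffers are packed into. */
  struct sketch_slab {
     size_t used;   /* bytes handed out so far, from the start of the BO */
     unsigned live; /* number of sub-ranges not yet freed */
  };

  /* Returns the offset of a new sub-range inside the shared BO, or
   * SKETCH_SLAB_NO_SPACE when the request does not fit. */
  static size_t
  sketch_slab_alloc(struct sketch_slab *slab, size_t size)
  {
     size_t start = (slab->used + SKETCH_SLAB_ALIGN - 1) &
                    ~(SKETCH_SLAB_ALIGN - 1);

     if (size == 0 || size > SKETCH_SLAB_SIZE ||
         start > SKETCH_SLAB_SIZE - size)
        return SKETCH_SLAB_NO_SPACE;

     slab->used = start + size;
     slab->live++;
     return start;
  }

  /* Releases one sub-range. The whole BO is recycled once every sub-range
   * is gone. The caller must only call this after the GPU is done with
   * the range. */
  static void
  sketch_slab_free(struct sketch_slab *slab)
  {
     assert(slab->live > 0);
     if (--slab->live == 0)
        slab->used = 0;
  }

With this scheme the two 144-byte uploads in the trace above would land at
offsets 0 and 192 of one BO instead of occupying two separate allocations.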

Euro Truck Simulator
====================

.. code-block:: console

  [usage of VBO 14,15]
  [...]
  885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  885203 glInvalidateBufferData(buffer = 14)
  885204 glInvalidateBufferData(buffer = 15)
  [...]
  889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  889334 glInvalidateBufferData(buffer = 12)
  889335 glInvalidateBufferData(buffer = 16)
  [...]
  893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
  893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  893463 glDeleteSync(sync = 0x780a630)
  893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
  893465 glInvalidateBufferData(buffer = 13)
  893466 glInvalidateBufferData(buffer = 17)
  893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
  893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
  893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
  893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
  893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
  893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
  893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
  893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
  893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
  893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
  893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
  893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
  893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
  893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
  893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)

At the start of this frame, buffers 14 and 15 haven't been used in the previous
2 frames, and the ``GL_ARB_sync`` fence has ensured that the GPU has at least
started frame n-1 as the CPU starts the current frame. The first map is
``offset = 0, INVALIDATE_BUFFER | UNSYNCHRONIZED``, which suggests that the
driver should reallocate storage for the mapping even in the ``UNSYNCHRONIZED``
case. Here, however, the buffer is definitely going to be idle, making
reallocation unnecessary (you may need to empty your valid range, though, to
prevent unnecessary batch flushes).

Also note the use of a totally unrelated binding point for the mapping of the
vertex array -- you can't effectively use it as a hint for any buffer placement
in memory. The game does also use ``glCopyBufferSubData()``, but only on a
different buffer.

Plague Inc
==========

.. code-block:: console

  1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
  1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1640734 glDeleteSync(sync = 0xb4141430)
  1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430

  1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
  1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
  1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
  1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
  1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
  1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
  1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
  1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)

  1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
  1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
  1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
  1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
  1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
  1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
  1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)

At the start of this frame, the VBOs haven't been used in about 6 frames, and
the ``GL_ARB_sync`` fence has ensured that the GPU has started frame n-1.

Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
of the VBO -- it is important that a blitting driver make use of the flush
ranges when in explicit mode.
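
To make the difference concrete, here is a small C model of how a blitting
driver might pick the range to upload at unmap time. The ``sketch_*`` names and
the struct are hypothetical, and the model keeps a single bounding range of all
flushes, which is a simplification: disjoint flushes would be over-uploaded, and
a real driver could instead blit at each flush. As with
``glFlushMappedBufferRange()``, flush offsets are relative to the start of the
mapping.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>
  #include <stddef.h>

  /* Hypothetical record of one mapping of a buffer. */
  struct sketch_transfer {
     size_t map_offset;   /* start of the mapping within the buffer */
     size_t map_length;   /* length of the mapping in bytes */
     bool flush_explicit; /* mapping was created in explicit-flush mode */
     size_t dirty_start;  /* bounding range of flushed bytes, buffer-relative */
     size_t dirty_end;    /* dirty_start == dirty_end: nothing flushed yet */
  };

  static struct sketch_transfer
  sketch_map(size_t offset, size_t length, bool flush_explicit)
  {
     struct sketch_transfer t = { offset, length, flush_explicit, 0, 0 };
     return t;
  }

  /* Record an explicit flush; rel_offset is relative to the mapping start. */
  static void
  sketch_flush_region(struct sketch_transfer *t, size_t rel_offset,
                      size_t length)
  {
     assert(t->flush_explicit);
     assert(rel_offset <= t->map_length &&
            length <= t->map_length - rel_offset);
     if (length == 0)
        return;

     size_t start = t->map_offset + rel_offset;
     size_t end = start + length;

     if (t->dirty_start == t->dirty_end) {
        t->dirty_start = start;
        t->dirty_end = end;
     } else {
        if (start < t->dirty_start)
           t->dirty_start = start;
        if (end > t->dirty_end)
           t->dirty_end = end;
     }
  }

  /* Compute the buffer-relative range that has to be blitted at unmap. */
  static void
  sketch_unmap(const struct sketch_transfer *t, size_t *blit_offset,
               size_t *blit_length)
  {
     if (t->flush_explicit) {
        *blit_offset = t->dirty_start;
        *blit_length = t->dirty_end - t->dirty_start;
     } else {
        *blit_offset = t->map_offset;
        *blit_length = t->map_length;
     }
  }

For the second map of buffer 1096 in the trace above (offset 352, length 67584,
flush of 352 bytes), this model uploads 352 bytes rather than the full 67584
that were mapped.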

Darkest Dungeon
===============

.. code-block:: console

  938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)

  938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
  938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
  938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
  938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
  938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
  938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
  938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
  938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
  938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
  [... more maps and draws at increasing offsets]

An interesting note for this game: after the initial ``glBufferData()`` in the
frame to reallocate the storage, it maps the whole buffer unsynchronized each
time and just changes which region it flushes. The same GL buffer name is used
in every frame.

Tabletop Simulator
==================

.. code-block:: console

  1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
  1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1287596 glDeleteSync(sync = 0x7abf554e37b0)
  1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490

  1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
  1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
  1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
  1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
  1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
  1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
  [... more draw calls]
  1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
  1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
  1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
  1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
  1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
  1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)

In this app, buffer 480 gets used like this every other frame. The
``GL_ARB_sync`` fence ensures that frame n-1 has started on the GPU before CPU
work starts on the current frame, so the unsynchronized access to the buffers is
safe.

Hollow Knight
=============

.. code-block:: console

  1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
  1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
  1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
  1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
  1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
  1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
  1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
  1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
  1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
  1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
  1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
  1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
  1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
  1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
  1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
  1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
  1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
  1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)

In this app, buffers 29 and 30 get used like this, starting from offset 0, every
other frame. The ``GL_ARB_sync`` fence is used to make sure that the GPU has
reached the start of the previous frame before we go unsynchronized writing over
the n-2 frame's buffer.

Borderlands 2
=============

.. code-block:: console

  3561998 glFlush()
  3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
  3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
  3562007 glDeleteSync(sync = 0x231c2ab0)
  3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0

  3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
  3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
  3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
  3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
  [... unrelated draws]
  3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
  3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
  3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)

The ``GL_ARB_sync`` fence ensures that the GPU has started frame n-1 before the
CPU starts on the current frame.

This sequence of buffer uploads appears in each frame with the same buffer
names, so you do need to handle the ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
reallocate if the buffer is GPU-busy (it wasn't in this trace capture) to avoid
stalls on the n-1 frame completing.

Note that this is just one small buffer. Most of the vertex data goes through a
``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
frames, with a ``glBufferData()`` when needing to wrap.
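
The map-time choices discussed for this game and for Euro Truck Simulator can be
summarized as a small decision function. This is only a sketch of the reasoning
in the text, not code from Mesa: the ``SKETCH_*`` flag bits are made-up values
(not the GL or Gallium constants), and ``gpu_busy`` stands for whatever
busy-tracking the driver has.

.. code-block:: c

  #include <assert.h>
  #include <stdbool.h>

  /* Hypothetical flag bits for this sketch only. */
  enum sketch_map_flags {
     SKETCH_MAP_WRITE             = 1 << 0,
     SKETCH_MAP_INVALIDATE_BUFFER = 1 << 1,
     SKETCH_MAP_UNSYNCHRONIZED    = 1 << 2,
  };

  enum sketch_map_action {
     SKETCH_MAP_DIRECT,     /* map the existing storage without waiting */
     SKETCH_MAP_REALLOCATE, /* swap in fresh storage, then map that */
     SKETCH_MAP_WAIT,       /* wait for the GPU, then map existing storage */
  };

  /* Decide how to service a write mapping of a buffer. */
  static enum sketch_map_action
  sketch_choose_map_action(unsigned flags, bool gpu_busy)
  {
     assert(flags & SKETCH_MAP_WRITE);

     /* An idle buffer never needs new storage or a wait. */
     if (!gpu_busy)
        return SKETCH_MAP_DIRECT;

     /* Whole-buffer invalidation on a busy buffer: reallocate instead of
      * stalling, even if the app also asked for unsynchronized access. */
     if (flags & SKETCH_MAP_INVALIDATE_BUFFER)
        return SKETCH_MAP_REALLOCATE;

     /* The app promised to handle synchronization itself. */
     if (flags & SKETCH_MAP_UNSYNCHRONIZED)
        return SKETCH_MAP_DIRECT;

     return SKETCH_MAP_WAIT;
  }

In the trace above the buffers were idle, so the function would return
``SKETCH_MAP_DIRECT``; had they been busy, it would pick
``SKETCH_MAP_REALLOCATE`` rather than waiting on frame n-1.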

Buffer mapping conclusions
--------------------------

* Non-blitting drivers must track the valid range of a freshly allocated buffer
  as it gets uploaded in ``pipe_transfer_map()`` and avoid stalling on the GPU
  when mapping an undefined portion of the buffer when ``glBufferSubData()`` is
  interleaved with drawing.

* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
  call will appear in the driver as an ``invalidate_resource()`` call if
  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that flag is not set, then
  mesa/st will create a new ``pipe_resource`` for you.) Storage reallocation may
  be skipped if you for some reason know that the buffer is idle, in which case
  you can just empty the valid region.

* Blitting drivers must use the ``transfer_flush_region()`` region
  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
  blitting too much data. (When that bit is unset, you just blit the whole
  mapped range at unmap time.)

* Buffer valid range tracking in non-blitting drivers must use the
  ``transfer_flush_region()`` region instead of the mapped range when
  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.

* Buffer valid range tracking doesn't need to be fancy: "number of bytes
  valid starting from 0" is sufficient for all examples found.

* Use the ``pipe_debug_callback`` to report stalls on buffer mapping to ease
  debugging.

* Buffer binding points are not useful for tuning buffer placement (see all the
  ``GL_COPY_WRITE_BUFFER`` instances); you have to track the actual usage
  history of a GL BO name. mesa/st does this for optimizing its state updates
  on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and if you set
  ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own internal state
  updates (VBO addresses, XFB addresses, texture buffer addresses, etc.) on
  reallocation based on usage history.