Buffer mapping patterns
-----------------------

There are two main strategies the driver has for CPU access to GL buffer
objects. One is that the GL calls allocate temporary storage and blit to the GPU
at
``glBufferSubData()``/``glBufferData()``/``glFlushMappedBufferRange()``/``glUnmapBuffer()``
time. This makes it easy to match the behavior GL specifies. However, it may be
more costly than the other strategy, direct mapping of the GL BO, on some
platforms, and is essentially not available to tiling GPUs (since tiling
involves running through the command stream multiple times). Thus, GL has
additional interfaces that let apps access memory directly while avoiding
implicit blocking on the GPU rendering from those BOs.

Rendering engines have a variety of knobs to set on those GL interfaces for data
upload, and as a whole they seem to take just about every path available. Let's
look at some examples to see how they might constrain GL driver buffer upload
behavior.

Portal 2
========

.. code-block:: console

    1030842 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)
    1030876 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 65536, data = NULL, usage = GL_DYNAMIC_DRAW)
    1030877 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, size = 576, data = blob(576))
    1030896 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 526, count = 252, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
    1030915 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 19657, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x1f8, basevertex = 0)
    1030917 glBufferDataARB(target = GL_ARRAY_BUFFER, size = 1572864, data = NULL, usage = GL_DYNAMIC_DRAW)
    1030918 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 128, data = blob(128))
    1030919 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 576, size = 12, data = blob(12))
    1030936 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x240, basevertex = 0)
    1030937 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 128, size = 128, data = blob(128))
    1030938 glBufferSubData(target = GL_ELEMENT_ARRAY_BUFFER, offset = 588, size = 12, data = blob(12))
    1030940 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 4, end = 7, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x24c, basevertex = 0)
    [... repeated draws at increasing offsets]
    1033097 glXSwapBuffers(dpy = 0x82a8000, drawable = 20971540)

From this sequence, we can see that it is important that the driver either
implement ``glBufferSubData()`` as a blit from a streaming uploader in sequence with
the ``glDraw*()`` calls (a common behavior for non-tiled GPUs, particularly those with
dedicated memory), or that you:

1) Track the valid range of the buffer so that you don't have to flush the draws
   and synchronize on each following ``glBufferSubData()``.

2) Reallocate the buffer storage on ``glBufferData()`` so that your first
   ``glBufferSubData()`` of the frame doesn't stall on the last frame's
   rendering completing.

You can't just empty your valid range on ``glBufferData()`` unless you know that
the GPU access from the previous frame has completed. This pattern of
incrementing ``glBufferSubData()`` offsets interleaved with draws from that data
is common among newer Valve games.

.. code-block:: console

    [ during setup ]

    679259 glGenBuffersARB(n = 1, buffers = &1314)
    679260 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
    679261 glBufferDataARB(target = GL_ELEMENT_ARRAY_BUFFER, size = 3072, data = NULL, usage = GL_STATIC_DRAW)
    679264 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
    679269 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 3072)
    679270 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

    [... setup of other buffers on this binding point]

    679343 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
    679344 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384000
    679346 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679347 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
    679348 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 768, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384300
    679350 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679351 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
    679352 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 1536, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384600
    679354 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679355 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE
    679356 glMapBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 2304, length = 768, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT) = 0xd7384900
    679358 glFlushMappedBufferRange(target = GL_ELEMENT_ARRAY_BUFFER, offset = 0, length = 768)
    679359 glUnmapBuffer(target = GL_ELEMENT_ARRAY_BUFFER) = GL_TRUE

    [... setup completes and we start drawing later]

    761845 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1314)
    761846 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 323, count = 384, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)

This suggests that, for non-blitting drivers, resetting your "might be used on
the GPU" range after a stall could save you a bunch of additional GPU stalls
during setup.

Terraria
========

.. code-block:: console

    167581 glXSwapBuffers(dpy = 0x3004630, drawable = 25165844)

    167585 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
    167586 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 1728, data = blob(1728))
    167588 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 71, count = 108, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
    167589 glBufferData(target = GL_ARRAY_BUFFER, size = 196608, data = NULL, usage = GL_STREAM_DRAW)
    167590 glBufferSubData(target = GL_ARRAY_BUFFER, offset = 0, size = 27456, data = blob(27456))
    167592 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 7, count = 12, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0)
    167594 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 8)
    167596 glDrawRangeElementsBaseVertex(mode = GL_TRIANGLES, start = 0, end = 3, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 12)
    [...]

In this game, we can see ``glBufferData()`` being used on the same array buffer
throughout, to get new storage so that the ``glBufferSubData()`` doesn't cause
synchronization.

Don't Starve
============

.. code-block:: console

    7251917 glGenBuffers(n = 1, buffers = &115052)
    7251918 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
    7251919 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
    7251921 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115052)
    7251928 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    7251930 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 114872)
    7251936 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 18)
    7251938 glGenBuffers(n = 1, buffers = &115053)
    7251939 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
    7251940 glBufferData(target = GL_ARRAY_BUFFER, size = 144, data = blob(144), usage = GL_STREAM_DRAW)
    7251942 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 115053)
    7251949 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    7251973 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)
    [... drawing next frame]
    7252388 glDeleteBuffers(n = 1, buffers = &115052)
    7252389 glDeleteBuffers(n = 1, buffers = &115053)
    7252390 glXSwapBuffers(dpy = 0x86dd860, drawable = 20971540)

In this game we have a lot of tiny ``glBufferData()`` calls, suggesting that we
could see working set wins and possibly CPU overhead reduction by packing small
GL buffers in the same BO. Interestingly, the deletes of the temporary buffers
always happen at the end of the next frame.

Euro Truck Simulator
====================

.. code-block:: console

    [usage of VBO 14,15]
    [...]
    885199 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
    885203 glInvalidateBufferData(buffer = 14)
    885204 glInvalidateBufferData(buffer = 15)
    [...]
    889330 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
    889334 glInvalidateBufferData(buffer = 12)
    889335 glInvalidateBufferData(buffer = 16)
    [...]
    893461 glXSwapBuffers(dpy = 0x379a3e0, drawable = 20971527)
    893462 glClientWaitSync(sync = 0x77eee10, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    893463 glDeleteSync(sync = 0x780a630)
    893464 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x78ec730
    893465 glInvalidateBufferData(buffer = 13)
    893466 glInvalidateBufferData(buffer = 17)
    893505 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
    893506 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1000
    893508 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
    893509 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 15)
    893510 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 0, length = 32, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034e5df000
    893512 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
    893532 glBindVertexBuffers(first = 0, count = 2, buffers = {10, 15}, offsets = {0, 0}, strides = {52, 16})
    893552 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
    893609 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    893732 glBindVertexBuffers(first = 0, count = 1, buffers = &14, offsets = &0, strides = &48)
    893733 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 14)
    893744 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0xf0, basevertex = 0)
    893759 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x2e0, basevertex = 6)
    893786 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 600, type = GL_UNSIGNED_SHORT, indices = 0xe87b0, basevertex = 21515)
    893822 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)
    893845 glBindBuffer(target = GL_COPY_READ_BUFFER, buffer = 14)
    893846 glMapBufferRange(target = GL_COPY_READ_BUFFER, offset = 788, length = 788, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b034efd1314
    893848 glUnmapBuffer(target = GL_COPY_READ_BUFFER) = GL_TRUE
    893886 glDrawElementsInstancedBaseVertex(mode = GL_TRIANGLES, count = 18, type = GL_UNSIGNED_SHORT, indices = 0x13f280, instancecount = 1, basevertex = 25131)
    893943 glDrawArrays(mode = GL_TRIANGLES, first = 0, count = 6)

At the start of this frame, buffers 14 and 15 haven't been used in the previous 2
frames, and the ``GL_ARB_sync`` fence has ensured that the GPU has at least started
frame n-1 as the CPU starts the current frame. The first map is ``offset = 0,
INVALIDATE_BUFFER | UNSYNCHRONIZED``, which suggests that the driver should
reallocate storage for the mapping even in the ``UNSYNCHRONIZED`` case, except
that the buffer is definitely going to be idle, making reallocation unnecessary
(you may need to empty your valid range, though, to prevent unnecessary batch
flushes).

Also note the use of a totally unrelated binding point for the mapping of the
vertex array; you can't effectively use it as a hint for any buffer placement
in memory. The game does also use ``glCopyBufferSubData()``, but only on a
different buffer.


Plague Inc
==========

.. code-block:: console

    1640732 glXSwapBuffers(dpy = 0xb218f20, drawable = 23068674)
    1640733 glClientWaitSync(sync = 0xb4141430, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    1640734 glDeleteSync(sync = 0xb4141430)
    1640735 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0xb4141430

    1640780 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 78)
    1640787 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 79)
    1640788 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
    1640795 glDrawElements(mode = GL_TRIANGLES, count = 9636, type = GL_UNSIGNED_SHORT, indices = NULL)
    1640813 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640814 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4000
    1640815 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640816 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998000
    1640817 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640819 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
    1640820 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640821 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640823 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
    1640824 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640825 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 1096)
    1640831 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1091)
    1640832 glDrawElements(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = NULL)

    1640847 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640848 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 352, length = 67584, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xbfef4160
    1640849 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640850 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 88, length = 12, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0xc3998058
    1640851 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1096)
    1640853 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 352)
    1640854 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640855 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 1091)
    1640857 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 12)
    1640858 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1640863 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 6, type = GL_UNSIGNED_SHORT, indices = 0x58, basevertex = 4)

At the start of this frame, the VBOs haven't been used in about 6 frames, and
the ``GL_ARB_sync`` fence has ensured that the GPU has started frame n-1.

Note the use of ``glFlushMappedBufferRange()`` on a small fraction of the size
of the VBO: it is important that a blitting driver make use of the flush
ranges when in explicit mode.

Darkest Dungeon
===============

.. code-block:: console

    938384 glXSwapBuffers(dpy = 0x377fcd0, drawable = 23068692)

    938385 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938386 glBufferData(target = GL_ARRAY_BUFFER, size = 1048576, data = NULL, usage = GL_STREAM_DRAW)
    938511 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938512 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
    938514 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 512)
    938515 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
    938523 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
    938524 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938525 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = NULL)
    938527 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938528 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1048576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7a73fcaa7000
    938530 glFlushMappedBufferRange(target = GL_ARRAY_BUFFER, offset = 512, length = 512)
    938531 glUnmapBuffer(target = GL_ARRAY_BUFFER) = GL_TRUE
    938539 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 1)
    938540 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 2)
    938541 glDrawElements(mode = GL_TRIANGLES, count = 24, type = GL_UNSIGNED_SHORT, indices = 0x30)
    [... more maps and draws at increasing offsets]

An interesting note for this game: after the initial ``glBufferData()`` in the
frame to reallocate the storage, it maps the whole buffer unsynchronized each
time and just changes which region it flushes. The same GL buffer name is used
in every frame.

Tabletop Simulator
==================

.. code-block:: console

    1287594 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)
    1287595 glClientWaitSync(sync = 0x7abf554e37b0, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    1287596 glDeleteSync(sync = 0x7abf554e37b0)
    1287597 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7abf56647490

    1287614 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
    1287615 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7abf2e79a000
    1287642 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 614)
    1287650 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 5)
    1287651 glBufferSubData(target = GL_COPY_WRITE_BUFFER, offset = 0, size = 1088, data = blob(1088))
    1287652 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 615)
    1287653 glDrawElements(mode = GL_TRIANGLES, count = 1788, type = GL_UNSIGNED_SHORT, indices = NULL)
    [... more draw calls]
    1289055 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 480)
    1289057 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 384)
    1289058 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1289059 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 480)
    1289066 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 12, count = 4)
    1289068 glDrawArrays(mode = GL_TRIANGLE_STRIP, first = 8, count = 4)
    1289553 glXSwapBuffers(dpy = 0x3e10810, drawable = 23068692)

In this app, buffer 480 gets used like this every other frame. The ``GL_ARB_sync``
fence ensures that frame n-1 has started on the GPU before CPU work starts on
the current frame, so the unsynchronized access to the buffers is safe.

Hollow Knight
=============

.. code-block:: console

    1873034 glXSwapBuffers(dpy = 0x28609d0, drawable = 23068692)
    1873035 glClientWaitSync(sync = 0x7b1a5ca6e130, flags = 0x0, timeout = 0) = GL_ALREADY_SIGNALED
    1873036 glDeleteSync(sync = 0x7b1a5ca6e130)
    1873037 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x7b1a5ca6e130
    1873038 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873039 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c7e000
    1873040 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873041 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a07430000
    1873065 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873067 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 8640)
    1873068 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873069 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873071 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 720)
    1873072 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873073 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873074 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 8640, length = 576, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a04c801c0
    1873075 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873076 glMapBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 720, length = 72, access = GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT | GL_MAP_UNSYNCHRONIZED_BIT) = 0x7b1a074302d0
    1873077 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 29)
    1873079 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 576)
    1873080 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873081 glBindBuffer(target = GL_COPY_WRITE_BUFFER, buffer = 30)
    1873083 glFlushMappedBufferRange(target = GL_COPY_WRITE_BUFFER, offset = 0, length = 72)
    1873084 glUnmapBuffer(target = GL_COPY_WRITE_BUFFER) = GL_TRUE
    1873085 glBindBuffer(target = GL_ARRAY_BUFFER, buffer = 29)
    1873096 glBindBuffer(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 30)
    1873097 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 36, type = GL_UNSIGNED_SHORT, indices = 0x2d0, basevertex = 240)

In this app, buffers 29 and 30 get used like this starting from offset 0 every
other frame. The ``GL_ARB_sync`` fence is used to make sure that the GPU has
reached the start of the previous frame before we write, unsynchronized, over
the n-2 frame's buffer.

Borderlands 2
=============

.. code-block:: console

    3561998 glFlush()
    3562004 glXSwapBuffers(dpy = 0xbaf0f90, drawable = 23068705)
    3562006 glClientWaitSync(sync = 0x231c2ab0, flags = GL_SYNC_FLUSH_COMMANDS_BIT, timeout = 10000000000) = GL_ALREADY_SIGNALED
    3562007 glDeleteSync(sync = 0x231c2ab0)
    3562008 glFenceSync(condition = GL_SYNC_GPU_COMMANDS_COMPLETE, flags = 0) = 0x231aadc0

    3562050 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
    3562051 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1792, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xde056000
    3562053 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
    3562054 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1194)
    3562055 glMapBufferRange(target = GL_ARRAY_BUFFER, offset = 0, length = 1280, access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT) = 0xd9426000
    3562057 glUnmapBufferARB(target = GL_ARRAY_BUFFER) = GL_TRUE
    [... unrelated draws]
    3563051 glBindBufferARB(target = GL_ARRAY_BUFFER, buffer = 1193)
    3563064 glBindBufferARB(target = GL_ELEMENT_ARRAY_BUFFER, buffer = 875)
    3563065 glDrawElementsInstancedARB(mode = GL_TRIANGLES, count = 72, type = GL_UNSIGNED_SHORT, indices = NULL, instancecount = 28)

The ``GL_ARB_sync`` fence ensures that the GPU has started frame n-1 before the CPU
starts on the current frame.

This sequence of buffer uploads appears in each frame with the same buffer
names, so you do need to handle the ``GL_MAP_INVALIDATE_BUFFER_BIT`` as a
reallocate if the buffer is GPU-busy (it wasn't in this trace capture) to avoid
stalls on the n-1 frame completing.
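
The ``GL_MAP_INVALIDATE_BUFFER_BIT`` handling described above can be sketched
roughly as follows. This is a minimal illustration under stated assumptions, not
Mesa code: ``struct sketch_buffer``, ``sketch_map_invalidate()`` and the
``gpu_busy`` flag are hypothetical names, and ``malloc()`` stands in for
allocating a new BO.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical driver-side buffer; none of these names are Mesa API. */
    struct sketch_buffer {
        void *storage;    /* stands in for the backing BO */
        size_t size;      /* size of the storage in bytes */
        size_t valid_end; /* bytes [0, valid_end) may hold data the GPU reads */
        bool gpu_busy;    /* a real driver would query a fence or BO status */
    };

    /*
     * Handle a write mapping that invalidates the whole buffer.
     *
     * Returns the CPU pointer to write through, or NULL if new storage could
     * not be allocated (the caller would then have to wait for the GPU).
     * If the old storage is still in use by the GPU, it is handed back in
     * *retired so the caller can release it once the GPU is done with it.
     */
    static void *
    sketch_map_invalidate(struct sketch_buffer *buf, void **retired)
    {
        assert(buf != NULL && retired != NULL);
        *retired = NULL;

        if (buf->gpu_busy) {
            /* Busy: reallocate instead of stalling on the previous frame. */
            void *fresh = malloc(buf->size);
            if (fresh == NULL)
                return NULL;
            *retired = buf->storage;
            buf->storage = fresh;
            buf->gpu_busy = false;
        }

        /* Fresh or idle storage: nothing in it is pending on the GPU. */
        buf->valid_end = 0;
        return buf->storage;
    }

When the buffer is idle, as in the trace capture, the sketch keeps the existing
storage and only empties the valid range.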

Note that this is just one small buffer. Most of the vertex data goes through a
``glBufferSubData()``/``glDraw*()`` path with the VBO used across multiple
frames, with a ``glBufferData()`` when needing to wrap.

Buffer mapping conclusions
--------------------------

* Non-blitting drivers must track the valid range of a freshly allocated buffer
  as it gets uploaded in ``pipe_transfer_map()`` and avoid stalling on the GPU
  when mapping an undefined portion of the buffer when ``glBufferSubData()`` is
  interleaved with drawing.

* Non-blitting drivers must reallocate storage on ``glBufferData(NULL)`` so that
  the following ``glBufferSubData()`` won't stall. That ``glBufferData(NULL)``
  call will appear in the driver as an ``invalidate_resource()`` call if
  ``PIPE_CAP_INVALIDATE_BUFFER`` is available. (If that flag is not set, then
  mesa/st will create a new ``pipe_resource`` for you.) Storage reallocation may
  be skipped if you for some reason know that the buffer is idle, in which case
  you can just empty the valid region.

* Blitting drivers must use the ``transfer_flush_region()`` region
  instead of the mapped range when ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid
  blitting too much data. (When that bit is unset, you just blit the whole
  mapped range at unmap time.)

* Buffer valid range tracking in non-blitting drivers must use the
  ``transfer_flush_region()`` region instead of the mapped range when
  ``PIPE_MAP_FLUSH_EXPLICIT`` is set, to avoid excess stalls.

* Buffer valid range tracking doesn't need to be fancy; "number of bytes
  valid starting from 0" is sufficient for all examples found.

* Use the ``pipe_debug_callback`` to report stalls on buffer mapping to ease
  debugging.

* Buffer binding points are not useful for tuning buffer placement (see all the
  ``GL_COPY_WRITE_BUFFER`` instances); you have to track the actual usage
  history of a GL BO name. mesa/st does this for optimizing its state updates
  on reallocation in the ``!PIPE_CAP_INVALIDATE_BUFFER`` case, and if you set
  ``PIPE_CAP_INVALIDATE_BUFFER`` then you have to flag your own internal state
  updates (VBO addresses, XFB addresses, texture buffer addresses, etc.) on
  reallocation based on usage history.
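
As a rough illustration of the valid range tracking points above, the "number of
bytes valid starting from 0" scheme might look like the following sketch. All
names here (``struct sketch_range_buffer`` and the ``sketch_*()`` helpers) are
hypothetical and are not part of Mesa or Gallium; a real driver would call the
equivalent logic from its transfer map, flush and invalidate hooks.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical tracking state; not a Mesa structure. */
    struct sketch_range_buffer {
        size_t size;      /* total buffer size in bytes */
        size_t valid_end; /* bytes [0, valid_end) may be read by the GPU */
    };

    /*
     * True if a write to [offset, offset + length) overlaps bytes the GPU
     * might still be reading, so a synchronized map would have to flush and
     * wait. Writes entirely past valid_end can proceed without stalling.
     */
    static bool
    sketch_write_needs_sync(const struct sketch_range_buffer *buf,
                            size_t offset, size_t length)
    {
        assert(offset <= buf->size && length <= buf->size - offset);
        return length > 0 && offset < buf->valid_end;
    }

    /*
     * Record a completed write. With explicit flushing, pass the flushed
     * region here rather than the whole mapped range. Any gap below the
     * write is conservatively treated as valid too.
     */
    static void
    sketch_mark_written(struct sketch_range_buffer *buf,
                        size_t offset, size_t length)
    {
        assert(offset <= buf->size && length <= buf->size - offset);
        if (length > 0 && offset + length > buf->valid_end)
            buf->valid_end = offset + length;
    }

    /* Only safe once the storage is freshly allocated or known to be idle. */
    static void
    sketch_reset_valid(struct sketch_range_buffer *buf)
    {
        buf->valid_end = 0;
    }

With the element array uploads from the Portal 2 trace (576 bytes at offset 0,
then 12 bytes at offsets 576 and 588), none of the writes overlap the valid
range, so none of them would need to synchronize in this scheme.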