Thread-local storage.

Each thread has a thread control block, or TCB.  The TCB is a
variable-size structure headed by `struct tls_tcb' from <sys/tls.h>,
with:

(a) static thread-local storage for the TLS data of initial objects,
    i.e., those loaded at startup rather than those dynamically loaded
    by dlopen

(b) a pointer to a dynamic thread vector (DTV) for the TLS data
    pointers of objects that use global-dynamic or local-dynamic models
    (typically shared libraries or dlopenable modules)

(c) the pthread_t pointer

The per-thread lwp private pointer, also sometimes called TP (thread
pointer), managed by the _lwp_getprivate and _lwp_setprivate syscalls,
either points at the TCB directly, or, on some architectures, points at

	tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET.

This bias is chosen for architectures where signed displacements from
TP enable twice the range of static TLS offsets when biased like this.
Architectures with such a tp/tcb offset must provide

void *__lwp_gettcb_fast(void);

in machine/mcontext.h and must define __HAVE___LWP_GETTCB_FAST in
machine/types.h to reflect this; otherwise they must provide
__lwp_getprivate_fast to return the TCB pointer.

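As a rough illustration of the tp/tcb bias, the following sketch
converts between the two pointers.  The struct demo_tcb layout, the
helper names, and the bias value 0x7000 are assumptions made for the
example; the real definitions come from <sys/tls.h> and the machine
headers and differ per architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct tls_tcb; field names are illustrative only. */
struct demo_tcb {
	void	**tcb_dtv;
	void	*tcb_pthread;
};

/* Assumed bias for the example; the real TLS_TP_OFFSET is MD. */
#define	DEMO_TLS_TP_OFFSET	0x7000

/* tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET */
static inline uintptr_t
demo_tcb_to_tp(const struct demo_tcb *tcb)
{
	assert(tcb != NULL);
	return (uintptr_t)tcb + sizeof(struct demo_tcb) + DEMO_TLS_TP_OFFSET;
}

/* Inverse mapping, roughly what a __lwp_gettcb_fast would have to undo. */
static inline struct demo_tcb *
demo_tp_to_tcb(uintptr_t tp)
{
	return (struct demo_tcb *)(tp - DEMO_TLS_TP_OFFSET -
	    sizeof(struct demo_tcb));
}
```

With a signed 16-bit displacement, a bias like this lets code reach
static TLS from the start of the block up to about 64KB past it,
instead of only the 32KB above an unbiased pointer.
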
Each architecture has one of two TLS variants, variant I or variant II.
Variant I places the static thread-local storage _after_ the fixed
content of the TCB, at increasing addresses (increasing addresses grow
down in diagram):

	+---------------+
	| dtv pointer   |       tcb points here (struct tls_tcb)
	+---------------+
	| pthread_t     |
	+---------------+
	| obj0 tls      |       obj0->tlsoffset = 0
	|               |
	|               |
	+---------------+
	| obj1 tls      |       obj1->tlsoffset = 3
	+---------------+
	| obj2 tls      |       obj2->tlsoffset = 4
	|               |
	.		.
	.		.
	.		.
	|               |
	+---------------+
	| objN tls      |       objN->tlsoffset = k
	+---------------+

Variant II places the static thread-local storage _before_ the fixed
content of the TCB, at decreasing addresses:

	+---------------+
	| objN tls      |       objN->tlsoffset = k
	+---------------+
	| obj(N-1) tls  |       obj(N-1)->tlsoffset = k - 1
	.               .
	.               .
	.               .
	|               |
	+---------------+
	| obj2 tls      |       obj2->tlsoffset = 4
	+---------------+
	| obj1 tls      |       obj1->tlsoffset = 3
	+---------------+
	| obj0 tls      |       obj0->tlsoffset = 0
	|               |
	|               |
	+---------------+
	| tcb pointer   |       tcb points here (struct tls_tcb)
	+---------------+
	| dtv pointer   |
	+---------------+
	| pthread_t     |
	+---------------+

See [ELFTLS] Sec. 3 `Run-Time Handling of TLS', Figs 1 and 2, for
bigger pictures including the DTV and dynamically allocated TLS blocks.

Each architecture also has its own ELF ABI processor supplement with
the architecture-specific relocations and TLS details.

References:

	[ELFTLS] Ulrich Drepper, `ELF Handling For Thread-Local
	Storage', Version 0.21, 2023-08-22.
	https://akkadia.org/drepper/tls.pdf
	https://web.archive.org/web/20240718081934/https://akkadia.org/drepper/tls.pdf

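The two layouts can be summarised as pointer arithmetic from the TCB.
This is only a sketch: the demo_tcb_v1/demo_tcb_v2 structs mirror the
boxes in the diagrams and the helper names are made up for the example;
they are not the definitions in <sys/tls.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative TCB shapes following the two diagrams. */
struct demo_tcb_v1 {		/* Variant I */
	void	**tcb_dtv;
	void	*tcb_pthread;
};

struct demo_tcb_v2 {		/* Variant II */
	void	*tcb_self;
	void	**tcb_dtv;
	void	*tcb_pthread;
};

/* Variant I: static blocks start after the fixed TCB and grow upward. */
static inline char *
demo_static_block_v1(struct demo_tcb_v1 *tcb, size_t tlsoffset)
{
	assert(tcb != NULL);
	return (char *)tcb + sizeof(*tcb) + tlsoffset;
}

/* Variant II: static blocks sit below the TCB; tlsoffset counts down. */
static inline char *
demo_static_block_v2(struct demo_tcb_v2 *tcb, size_t tlsoffset)
{
	assert(tcb != NULL);
	return (char *)tcb - tlsoffset;
}
```
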
Steps for adding TLS support for a new platform:

(1) Declare TLS variant in machine/types.h by defining either
__HAVE_TLS_VARIANT_I or __HAVE_TLS_VARIANT_II.

(2) _lwp_makecontext has to set the reserved register or kernel
transfer variable in uc_mcontext according to the provided value of
`private'.  Note that _lwp_makecontext takes tcb, not tp, as an
argument, so make sure to adjust it if needed for the tp/tcb offset.
See src/lib/libc/arch/$PLATFORM/gen/_lwp.c.

This is not possible on the VAX as there is no free space in ucontext_t.
This requires either a special version of _lwp_create or versioning
everything using ucontext_t. Debug support depends on getting the data from
ucontext_t, so the second option is possibly required.

(3) _lwp_setprivate(2) has to update the same register as
_lwp_makecontext uses for the private area pointer. Normally
cpu_lwp_setprivate is provided by MD to reflect the kernel view and
enabled by defining __HAVE_CPU_LWP_SETPRIVATE in machine/types.h.
cpu_setmcontext is responsible for keeping the MI l_private field
synchronised by calling lwp_setprivate as needed.

cpu_switchto has to update the mapping.

_lwp_setprivate is used for the initial thread; all other threads
created by libpthread use _lwp_makecontext for this purpose.

(4) Provide __tls_get_addr and possibly other MD functions for dynamic
TLS offset computation. If such alternative entry points exist (currently
only i386), also add a weak reference to 0 in src/lib/libc/tls/tls.c.

The generic implementation can be found in tls.c and is used with
__HAVE_COMMON___TLS_GET_ADDR. It depends on __lwp_getprivate_fast
(see below).

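For orientation, here is a simplified model of what a __tls_get_addr
style lookup does, following the description in [ELFTLS].  It is not
the code in tls.c: the demo_* names, the fixed-size DTV, and the block
sizes are invented for the example, and the thread is passed explicitly
instead of being found through the thread pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Argument shape from [ELFTLS]: module index plus offset in its block. */
struct demo_tls_index {
	unsigned long	ti_module;
	unsigned long	ti_offset;
};

#define	DEMO_MAX_MODULES	8

/* Per-thread DTV: one slot per module, filled in on first use. */
struct demo_thread {
	void	*dtv[DEMO_MAX_MODULES];
};

/* Hypothetical per-module TLS block sizes; a real loader reads PT_TLS. */
static const size_t demo_tls_size[DEMO_MAX_MODULES] = { 0, 64, 32 };

static void *
demo_tls_get_addr(struct demo_thread *t, const struct demo_tls_index *ti)
{
	assert(ti->ti_module < DEMO_MAX_MODULES);
	assert(ti->ti_offset < demo_tls_size[ti->ti_module]);

	if (t->dtv[ti->ti_module] == NULL) {
		/*
		 * Lazily allocate a zeroed block for this module.  A real
		 * implementation copies the module's TLS init image here.
		 */
		t->dtv[ti->ti_module] =
		    calloc(1, demo_tls_size[ti->ti_module]);
		if (t->dtv[ti->ti_module] == NULL)
			abort();
	}
	/* The DTV slot points at offset 0 of the module's TLS block. */
	return (char *)t->dtv[ti->ti_module] + ti->ti_offset;
}
```
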
(5) Implement the necessary relocation records in mdreloc.c.  There are
typically three relocation types found in dynamic binaries:

(a) R_TYPE(TLS_DTPOFF): Offset inside the module.  The common TLS code
ensures that the DTV vector points to offset 0 inside the module TLS block.
This is normally def->st_value + rela->r_addend.

(b) R_TYPE(TLS_DTPMOD): Module index.

(c) R_TYPE(TLS_TPOFF): Static TLS offset.  The code has to check whether
the static TLS offset for this module has been allocated
(defobj->tls_static) and otherwise call _rtld_tls_offset_allocate().  This
may fail if no static space is available and the object has been pulled
in via dlopen(3). It can also fail if the TLS area has already been used
via a global-dynamic allocation.

For TLS Variant I, this is typically:

def->st_value + rela->r_addend + defobj->tlsoffset + sizeof(struct tls_tcb)

i.e. the relocation doesn't include the fixed TCB.

For TLS Variant II, this is typically:

def->st_value - defobj->tlsoffset + rela->r_addend

i.e. starting offset is counting down from the TCB.

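The relocation values above can be written out as plain arithmetic.
The sketch below only restates the formulas; the function names and
DEMO_TCB_SIZE (a stand-in for sizeof(struct tls_tcb)) are assumptions,
and real mdreloc.c code also has to handle symbol lookup, static TLS
allocation, and any tp/tcb bias.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for sizeof(struct tls_tcb) in the Variant I layout. */
#define	DEMO_TCB_SIZE	(2 * sizeof(void *))

/* TLS_DTPOFF: offset inside the module's own TLS block. */
static inline intptr_t
demo_dtpoff(uintptr_t st_value, intptr_t addend)
{
	return (intptr_t)st_value + addend;
}

/* TLS_TPOFF, Variant I: skip the fixed TCB, then count upward. */
static inline intptr_t
demo_tpoff_variant1(uintptr_t st_value, intptr_t addend, size_t tlsoffset)
{
	assert(tlsoffset <= (size_t)INTPTR_MAX);
	return (intptr_t)st_value + addend + (intptr_t)tlsoffset +
	    (intptr_t)DEMO_TCB_SIZE;
}

/* TLS_TPOFF, Variant II: the block starts tlsoffset bytes below the TCB. */
static inline intptr_t
demo_tpoff_variant2(uintptr_t st_value, intptr_t addend, size_t tlsoffset)
{
	assert(tlsoffset <= (size_t)INTPTR_MAX);
	return (intptr_t)st_value - (intptr_t)tlsoffset + addend;
}
```

For example, a symbol at st_value 8 with a zero addend in an object
whose tlsoffset is 32 gets a Variant II offset of -24, i.e. 24 bytes
below the TCB.
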
(6) If there is a tp/tcb offset, implement

	__lwp_gettcb_fast()
	__lwp_settcb()

in machine/mcontext.h and set

	__HAVE___LWP_GETTCB_FAST
	__HAVE___LWP_SETTCB

in machine/types.h.

Otherwise, implement __lwp_getprivate_fast() in machine/mcontext.h and
set __HAVE___LWP_GETPRIVATE_FAST in machine/types.h.

(7) Test using src/tests/lib/libc/tls and src/tests/libexec/ld.elf_so.
Make sure with "objdump -R" that t_tls_dynamic has two TPOFF
relocations and that h_tls_dlopen.so.1 and libh_tls_dynamic.so.1 both
have two DTPMOD and DTPOFF relocations.