README.TLS
Thread-local storage.

Each thread has a thread control block, or TCB. The TCB is a
variable-size structure headed by `struct tls_tcb' from <sys/tls.h>,
with:

(a) static thread-local storage for the TLS data of initial objects,
i.e., those loaded at startup rather than those dynamically loaded
by dlopen

(b) a pointer to a dynamic thread vector (DTV) for the TLS data
pointers of objects that use global-dynamic or local-dynamic models
(typically shared libraries or dlopenable modules)

(c) the pthread_t pointer

The per-thread lwp private pointer, also sometimes called TP (thread
pointer), managed by the _lwp_setprivate and _lwp_getprivate syscalls,
either points at the TCB directly, or, on some architectures, points at

    tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET.

This bias is chosen on architectures that address static TLS with
signed displacements from TP: biasing TP lets both the negative and the
positive half of the displacement range reach static TLS, roughly
doubling the range of usable offsets.
Architectures with such a tp/tcb offset must provide

    void *__lwp_gettcb_fast(void);

in machine/mcontext.h and must define __HAVE___LWP_GETTCB_FAST in
machine/types.h to reflect this; otherwise they must provide
__lwp_getprivate_fast to return the TCB pointer.
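
As a rough illustration of the bias arithmetic, the sketch below computes
tp from tcb and checks which static TLS offsets a signed displacement can
reach. The struct layout, the 0x7000 bias, the 16-bit displacement width,
and all sketch_ names are assumptions made for this example; they are not
taken from any particular port.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct tls_tcb; only its size matters (assumption). */
struct sketch_tcb {
	void **dtv;
	void *pthread;
};

/* Hypothetical bias; a real port defines its own TLS_TP_OFFSET. */
#define SKETCH_TP_OFFSET 0x7000

/* tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET */
static uintptr_t
sketch_tcb_to_tp(uintptr_t tcb)
{
	return tcb + sizeof(struct sketch_tcb) + SKETCH_TP_OFFSET;
}

/*
 * Can a signed 16-bit displacement from tp reach the static TLS byte
 * at `off', counted from the end of the fixed TCB, for a given bias?
 */
static int
sketch_reachable(long off, long bias)
{
	long disp = off - bias;		/* displacement relative to tp */

	return disp >= INT16_MIN && disp <= INT16_MAX;
}
```

Under these assumptions, an unbiased tp reaches offsets 0 through 0x7fff
only, while the 0x7000 bias extends the reachable range up to 0xefff.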

Each architecture has one of two TLS variants, variant I or variant II.
Variant I places the static thread-local storage _after_ the fixed
content of the TCB, at increasing addresses (addresses increase
downward in the diagram):

    +---------------+
    | dtv pointer   |       tcb points here (struct tls_tcb)
    +---------------+
    | pthread_t     |
    +---------------+
    | obj0 tls      |       obj0->tlsoffset = 0
    |               |
    |               |
    +---------------+
    | obj1 tls      |       obj1->tlsoffset = 3
    +---------------+
    | obj2 tls      |       obj2->tlsoffset = 4
    |               |
    .               .
    .               .
    .               .
    |               |
    +---------------+
    | objN tls      |       objN->tlsoffset = k
    +---------------+
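
To make the offsets in the variant I diagram concrete, here is a minimal
sketch of assigning static TLS offsets in load order. Sizes are counted in
the diagram's row units and alignment is ignored; struct sketch_obj and
sketch_layout_variant1 are hypothetical names, and this simple bump
allocation is an illustration rather than the allocator ld.elf_so uses.

```c
#include <stddef.h>

/* Hypothetical per-object record; only the fields the sketch needs. */
struct sketch_obj {
	size_t tlssize;		/* size of this object's TLS block */
	size_t tlsoffset;	/* assigned offset past the fixed TCB */
};

/*
 * Variant I: blocks follow the fixed TCB at increasing addresses, so
 * each object starts where the previous one ended.  Returns the total
 * static TLS size.
 */
static size_t
sketch_layout_variant1(struct sketch_obj *objs, size_t n)
{
	size_t next = 0;

	for (size_t i = 0; i < n; i++) {
		objs[i].tlsoffset = next;
		next += objs[i].tlssize;
	}
	return next;
}
```

With block sizes of 3, 1 and 2 rows, the assigned offsets are 0, 3 and 4,
matching obj0, obj1 and obj2 in the diagram.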

Variant II places the static thread-local storage _before_ the fixed
content of the TCB, at decreasing addresses:

    +---------------+
    | objN tls      |       objN->tlsoffset = k
    +---------------+
    | obj(N-1) tls  |       obj(N-1)->tlsoffset = k - 1
    .               .
    .               .
    .               .
    |               |
    +---------------+
    | obj2 tls      |       obj2->tlsoffset = 4
    +---------------+
    | obj1 tls      |       obj1->tlsoffset = 3
    +---------------+
    | obj0 tls      |       obj0->tlsoffset = 0
    |               |
    |               |
    +---------------+
    | tcb pointer   |       tcb points here (struct tls_tcb)
    +---------------+
    | dtv pointer   |
    +---------------+
    | pthread_t     |
    +---------------+

See [ELFTLS] Sec. 3 `Run-Time Handling of TLS', Figs 1 and 2, for
bigger pictures including the DTV and dynamically allocated TLS blocks.

Each architecture also has its own ELF ABI processor supplement with
the architecture-specific relocations and TLS details.

References:

    [ELFTLS] Ulrich Drepper, `ELF Handling For Thread-Local
    Storage', Version 0.21, 2023-08-22.
    https://akkadia.org/drepper/tls.pdf
    https://web.archive.org/web/20240718081934/https://akkadia.org/drepper/tls.pdf

Steps for adding TLS support for a new platform:

(1) Declare the TLS variant in machine/types.h by defining either
__HAVE_TLS_VARIANT_I or __HAVE_TLS_VARIANT_II.

(2) _lwp_makecontext has to set the reserved register or kernel
transfer variable in uc_mcontext according to the provided value of
`private'. Note that _lwp_makecontext takes tcb, not tp, as an
argument, so make sure to adjust it if needed for the tp/tcb offset.
See src/lib/libc/arch/$PLATFORM/gen/_lwp.c.

This is not possible on the VAX as there is no free space in ucontext_t.
Supporting it requires either a special version of _lwp_create or
versioning everything using ucontext_t. Debug support depends on getting
the data from ucontext_t, so the second option may be required.

(3) _lwp_setprivate(2) has to update the same register that
_lwp_makecontext uses for the private area pointer. Normally
cpu_lwp_setprivate is provided by MD code to reflect the kernel view and
is enabled by defining __HAVE_CPU_LWP_SETPRIVATE in machine/types.h.
cpu_setmcontext is responsible for keeping the MI l_private field
synchronised by calling lwp_setprivate as needed.

cpu_switchto has to update the mapping.

_lwp_setprivate is used for the initial thread; all other threads
created by libpthread use _lwp_makecontext for this purpose.

(4) Provide __tls_get_addr and possibly other MD functions for dynamic
TLS offset computation. If such alternative entry points exist (currently
only i386), also add a weak reference to 0 in src/lib/libc/tls/tls.c.

The generic implementation can be found in tls.c and is used with
__HAVE_COMMON___TLS_GET_ADDR. It depends on __lwp_getprivate_fast
(see below).
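
The lookup that a __tls_get_addr-style function performs can be sketched
roughly as follows. This is a simplified illustration, not the code in
tls.c: the thread state is passed in explicitly instead of being fetched
through __lwp_getprivate_fast, the DTV has a fixed size, and every
sketch_ name and struct layout here is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NMODULES 4

/* Hypothetical argument: module index plus offset inside the module. */
struct sketch_tls_index {
	size_t module;
	size_t offset;
};

/* Hypothetical per-thread state: DTV slots and per-module block sizes. */
struct sketch_thread {
	void *dtv[SKETCH_NMODULES];
	size_t tlssize[SKETCH_NMODULES];
};

/*
 * Fast path: the DTV slot is already populated.  Slow path: allocate a
 * zeroed block for the module on first use.  Returns NULL on a bad
 * module index or allocation failure.
 */
static void *
sketch_tls_get_addr(struct sketch_thread *t,
    const struct sketch_tls_index *ti)
{
	if (ti->module >= SKETCH_NMODULES)
		return NULL;
	if (t->dtv[ti->module] == NULL) {
		t->dtv[ti->module] = calloc(1, t->tlssize[ti->module]);
		if (t->dtv[ti->module] == NULL)
			return NULL;
	}
	return (uint8_t *)t->dtv[ti->module] + ti->offset;
}
```

Because each DTV slot points at offset 0 of its module's block, the
result is simply the slot value plus the offset, and repeated calls for
the same index return the same address.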

(5) Implement the necessary relocation records in mdreloc.c. There are
typically three relocation types found in dynamic binaries:

(a) R_TYPE(TLS_DTPOFF): Offset inside the module. The common TLS code
ensures that the DTV entry points to offset 0 inside the module TLS block.
The relocation value is normally def->st_value + rela->r_addend.

(b) R_TYPE(TLS_DTPMOD): Module index.

(c) R_TYPE(TLS_TPOFF): Static TLS offset. The code has to check whether
the static TLS offset for this module has been allocated
(defobj->tls_static) and, if not, call _rtld_tls_offset_allocate(). This
may fail if no static space is available and the object has been pulled
in via dlopen(3). It can also fail if the TLS area has already been used
via a global-dynamic allocation.

For TLS Variant I, this is typically:

    def->st_value + rela->r_addend + defobj->tlsoffset + sizeof(struct tls_tcb)

i.e., tlsoffset does not include the fixed TCB, so its size is added.

For TLS Variant II, this is typically:

    def->st_value - defobj->tlsoffset + rela->r_addend

i.e., the offset counts down from the TCB.
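
The relocation values described in this step can be written out as plain
arithmetic. The sketch below is only an illustration under assumptions:
struct sketch_tcb stands in for struct tls_tcb, the function names are
hypothetical, inputs are passed as integers rather than read from symbol
and relocation records, and any tp/tcb bias is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct tls_tcb; only its size matters here. */
struct sketch_tcb {
	void **dtv;
	void *pthread;
};

/* DTPOFF: offset of the symbol inside its module's TLS block. */
static intptr_t
sketch_dtpoff(intptr_t st_value, intptr_t r_addend)
{
	return st_value + r_addend;
}

/* TPOFF, variant I: skip the fixed TCB, then count upwards. */
static intptr_t
sketch_tpoff_variant1(intptr_t st_value, intptr_t r_addend,
    intptr_t tlsoffset)
{
	return st_value + r_addend + tlsoffset +
	    (intptr_t)sizeof(struct sketch_tcb);
}

/* TPOFF, variant II: count downwards from the TCB. */
static intptr_t
sketch_tpoff_variant2(intptr_t st_value, intptr_t r_addend,
    intptr_t tlsoffset)
{
	return st_value - tlsoffset + r_addend;
}
```

For a symbol with st_value 8 and addend 4 in a module whose tlsoffset is
32, the variant II value is -20, a negative displacement from the TCB,
while the variant I value is 44 plus the size of the fixed TCB.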

(6) If there is a tp/tcb offset, implement

    __lwp_gettcb_fast()
    __lwp_settcb()

in machine/mcontext.h and set

    __HAVE___LWP_GETTCB_FAST
    __HAVE___LWP_SETTCB

in machine/types.h.

Otherwise, implement __lwp_getprivate_fast() in machine/mcontext.h and
set __HAVE___LWP_GETPRIVATE_FAST in machine/types.h.
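
For ports with a tp/tcb offset, the accessor pair might look roughly like
the sketch below. A plain variable simulates the thread-pointer register,
where a real port would use inline assembly or a syscall; the 0x7000
bias, struct sketch_tcb, and the sketch_ function names are assumptions
for illustration only.

```c
#include <stdint.h>

/* Stand-in for struct tls_tcb (assumption). */
struct sketch_tcb {
	void **dtv;
	void *pthread;
};

#define SKETCH_TP_OFFSET 0x7000	/* hypothetical bias */

/* Simulated thread-pointer register; real ports read a CPU register. */
static uintptr_t sketch_tp_register;

/* Counterpart of __lwp_settcb: store the biased pointer. */
static void
sketch_settcb(struct sketch_tcb *tcb)
{
	sketch_tp_register = (uintptr_t)(tcb + 1) + SKETCH_TP_OFFSET;
}

/* Counterpart of __lwp_gettcb_fast: undo the bias to recover the TCB. */
static struct sketch_tcb *
sketch_gettcb_fast(void)
{
	return (struct sketch_tcb *)
	    (sketch_tp_register - SKETCH_TP_OFFSET) - 1;
}
```

The two functions must be exact inverses, so that code receiving a tcb
(such as _lwp_makecontext) and code reading tp agree on the same block.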

(7) Test using src/tests/lib/libc/tls and src/tests/libexec/ld.elf_so.
Make sure with "objdump -R" that t_tls_dynamic has two TPOFF
relocations and that h_tls_dlopen.so.1 and libh_tls_dynamic.so.1 both
have two DTPMOD and DTPOFF relocations.