$NetBSD: TODO,v 1.6 2007/02/15 15:39:33 yamt Exp $

Bugs to fix, mostly with SA:

- some blocking routines (like sem_wait()) don't work if SAs aren't
  running yet, because the alarm system isn't up and running or there is
  no thread context to switch to.  It would be weird to use them that
  way, but it's perfectly legal.
- There is a race between pthread_cancel() and pthread_cond_broadcast()
  or pthread_exit() about removing an item from the sleep queue.  The
  locking protocols there need a little adjustment.
- pthread_sig.c: pthread__kill_self() passes a bogus ucontext to the
  handler.  This is probably not very important.
- pthread_sig.c: Come up with a signal trampoline naming convention like
  libc's, so that GDB will have an easier time with things.
- Consider moving pthread__signal_tramp() to its own file, and building
  it with -fasync-unwind-tables, so that DWARF2 EH unwinding works
  through it.  (This is required for e.g. GCC's libjava.)
- Add locking to ld.elf_so so that multiple threads doing lazy binding
  don't trash things.
- Verify the cancel stub symbol trickery.


Interfaces/features to implement:
- pthread_atfork()
- priority scheduling
- libc integration:
  - foo_r interfaces
- system integration:
  - some macros and prototypes belong in headers other than pthread.h


Features that need more/better regression tests:
- pthread_cond_broadcast()
- pthread_once()
- pthread_get/setspecific()
- signals


Things that need fixing:
- Recycle dead threads for new threads.

Ideas to play with:
- Explore the trapcontext vs. usercontext distinction in ucontext_t.
- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)
- Adaptive spin/sleep locks for mutexes.
- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st.  If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread.  Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?
- Figure out whether/how to expose the inline version of pthread_self()
  (a sketch of the stack-pointer trick appears at the end of this file).
- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific data to implement pthread_self().
- Figure out what to do with changing stack sizes.

Future work for 1:1 threads:

- Stress testing, particularly with multiple CPUs.

- Verify that gdb still works well (basic functionality seems to be OK).

- A race between pthread_exit() and pthread_create() for detached LWPs,
  where the stack (and pthread structure) could be reclaimed before the
  thread has a chance to call _lwp_exit(), is currently prevented by
  checking the return of _lwp_kill(target, 0).  It could be done more
  efficiently.  (See shared page item; a sketch of the probe appears at
  the end of this file.)

- Adaptive mutexes and spinlocks (see shared page item).  These need to
  implement exponential backoff to reduce bus contention.  On x86 we
  need to issue the 'pause' instruction while spinning, and perhaps on
  other SMT processors too.  (A sketch of the backoff loop appears at
  the end of this file.)

- Have a shared page that:

  o Allows an LWP to request that it not be preempted by the kernel.
    This would be used over critical sections like pthread_cond_wait(),
    where we can acquire a bunch of spin locks: being preempted while
    holding them would suck.  _lwp_park() would reset the flag once in
    kernel mode, and there would need to be an equivalent way to do
    this from user mode.  The user path would probably need to notice
    deferred preemption and call sched_yield() on exit from the
    critical section.  (A sketch appears at the end of this file.)

  o Perhaps has some kind of hint mechanism that gives us a clue about
    whether an LWP is currently running on another CPU.  This could be
    used for adaptive locks, but would need to be cheap to do in-kernel.

  o Perhaps has a flag value that's reset once a detached LWP has
    entered the kernel and called lwp_exit1(), meaning that its stack
    can be reclaimed.  Again, this may or may not be worth it.

- Keep a pool of dead LWPs so that we do not have to take the full hit
  of _lwp_create() every time pthread_create() is called.  If nothing
  else this is important for benchmarks.  There are a few different
  ways this could be implemented, but it needs to be clear whether the
  advantages are real.  Lots of thought and benchmarking required.
  (A sketch of a simple free list appears at the end of this file.)

- LWPs that are parked or that have called nanosleep() (common) burn up
  kernel resources.  "struct lwp" itself isn't a big deal, but the VA
  space and swap used by kernel stacks is.  _lwp_park() takes a
  ucontext_t pointer in the expectation that at some point we may be
  able to recycle the kernel stack and restart the LWP at the correct
  point, using pageable user memory to hold state.  It might also be
  useful to have a nanosleep call that does something similar.  Again,
  lots of thought and benchmarking required.  (Original idea from matt@)

- It's possible that we don't need to take so many spinlocks around
  cancellation points like pthread_cond_wait(), given that
  _lwp_wakeup() and _lwp_unpark() need to synchronise anyway.

- Need to give consideration to the order in which threads enter and
  exit synchronisation objects, both in the pthread library and in the
  kernel.  Commonly, locks are acquired and released in nested order
  (a, b, c -> c, b, a).  The pthread spec probably has something to say
  about this.

- The kernel scheduler needs improving to handle LWPs and processor
  affinity better, and user space tools like top(1) and ps(1) need to
  be changed to report LWPs correctly.  Tied into that is the need for
  a mechanism to impose limits on various aspects of LWPs.

- Streamlining of the park/unpark path.

- Priority inheritance and similar nasties.
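

Illustrative sketches:

These are sketches only, for some of the items above; any name,
constant or structure layout that does not appear elsewhere in this
file is invented for the example.

The inline pthread_self() from "Ideas to play with", assuming stacks
are a power-of-two size and allocated aligned to that size, with the
pthread_st in the page at the bottom of the stack; the size and mask
shown are examples only:

#include <stdint.h>

struct __pthread_st;				/* the library's thread structure */

#define	EXAMPLE_STACKSIZE	(1UL << 18)	/* example: 256 kB, power of two */
#define	EXAMPLE_STACKMASK	(EXAMPLE_STACKSIZE - 1)

static inline struct __pthread_st *
pthread__self_inline(void)
{
	char local;			/* any object on the current stack */
	uintptr_t sp = (uintptr_t)&local;

	/* Round the stack pointer down to find the pthread_st. */
	return (struct __pthread_st *)(sp & ~(uintptr_t)EXAMPLE_STACKMASK);
}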
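
The _lwp_kill(target, 0) probe from the detached-LWP race item, pulled
out as a helper; the function name is invented and the real code would
fold this into the stack-recycling path:

#include <sys/types.h>
#include <errno.h>
#include <lwp.h>

/*
 * Return non-zero once it is safe to reclaim the stack and pthread
 * structure of a detached thread.  Signal 0 performs only the
 * existence check: once the LWP has called _lwp_exit(), the probe
 * fails with ESRCH.
 */
static int
pthread__safe_to_reclaim(lwpid_t lid)
{

	if (_lwp_kill(lid, 0) == -1 && errno == ESRCH)
		return 1;
	return 0;			/* still running; try again later */
}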
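
The exponential-backoff spin loop from the adaptive mutex item.  C11
atomics stand in for the library's own atomic operations and the
backoff limits are arbitrary; on non-x86 the pause hint compiles away:

#include <stdatomic.h>

#define	SPIN_BACKOFF_MIN	4
#define	SPIN_BACKOFF_MAX	4096		/* arbitrary cap */

static inline void
spin_pause(void)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm volatile("pause");		/* spin-wait hint for SMT/HT CPUs */
#endif
}

static void
spin_lock_backoff(atomic_int *lock)
{
	unsigned int backoff = SPIN_BACKOFF_MIN;

	while (atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0) {
		/* Lock is held: back off, spinning on plain reads only. */
		do {
			for (unsigned int i = 0; i < backoff; i++)
				spin_pause();
			if (backoff < SPIN_BACKOFF_MAX)
				backoff <<= 1;	/* exponential backoff */
		} while (atomic_load_explicit(lock, memory_order_relaxed) != 0);
	}
}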
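
How the "don't preempt me" flag on the proposed shared page might wrap
a critical section.  The structure, the flag values and pthread__lwpctl
are all invented; only the shape (set flag, do work, clear flag, yield
if the kernel deferred a preemption) comes from the item above:

#include <sched.h>

struct lwpctl {				/* hypothetical per-LWP shared-page block */
	volatile int	lc_preempt;	/* written by us, read by the kernel */
};
#define	PREEMPT_DEFER	0x01		/* LWP asks not to be preempted */
#define	PREEMPT_PENDING	0x02		/* kernel deferred a preemption for us */

extern struct lwpctl *pthread__lwpctl;	/* hypothetical; mapped at startup */

static void
pthread__critical_enter(void)
{

	pthread__lwpctl->lc_preempt = PREEMPT_DEFER;
}

static void
pthread__critical_exit(void)
{
	int flags = pthread__lwpctl->lc_preempt;

	pthread__lwpctl->lc_preempt = 0;
	if (flags & PREEMPT_PENDING)	/* the kernel wanted the CPU back */
		(void)sched_yield();
}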
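
One possible shape for the pool of dead LWPs: a free list that
pthread_create() consults before falling back to _lwp_create().  The
names are invented and locking is omitted:

#include <sys/types.h>
#include <sys/queue.h>
#include <stddef.h>

struct dead_thread {				/* hypothetical cache entry */
	TAILQ_ENTRY(dead_thread) dt_chain;
	void		*dt_stack;		/* stack left mapped for reuse */
	lwpid_t		dt_lid;			/* LWP kept parked in the kernel */
};

static TAILQ_HEAD(, dead_thread) dead_queue =
    TAILQ_HEAD_INITIALIZER(dead_queue);

/* On thread exit: park the carcass for reuse instead of tearing down. */
static void
pthread__dead_cache(struct dead_thread *dt)
{

	TAILQ_INSERT_TAIL(&dead_queue, dt, dt_chain);
}

/* In pthread_create(): recycle if possible, otherwise _lwp_create(). */
static struct dead_thread *
pthread__dead_get(void)
{
	struct dead_thread *dt;

	if ((dt = TAILQ_FIRST(&dead_queue)) != NULL)
		TAILQ_REMOVE(&dead_queue, dt, dt_chain);
	return dt;			/* NULL: caller calls _lwp_create() */
}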