$NetBSD: TODO,v 1.8 2007/03/02 18:53:51 ad Exp $

Bugs to fix:

- Add locking to ld.elf_so so that multiple threads doing lazy binding
  don't trash things.  XXX Still the case?
- Verify the cancel stub symbol trickery.

Interfaces/features to implement:

- priority scheduling
- libc integration:
  - foo_r interfaces
- system integration
  - some macros and prototypes belong in headers other than pthread.h

Features that need more/better regression tests:

- pthread_cond_broadcast()
- pthread_once()
- pthread_get/setspecific()
- signals

Ideas to play with:

- Explore the trapcontext vs. usercontext distinction in ucontext_t.

- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)

- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st.  If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread.  Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?

- Figure out whether/how to expose the inline version of
  pthread_self().

- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific data to implement pthread_self().

- Figure out what to do with changing stack sizes.

- Stress testing, particularly with multiple CPUs.
- A race between pthread_exit() and pthread_create() for detached LWPs,
  where the stack (and pthread structure) could be reclaimed before the
  thread has a chance to call _lwp_exit(), is currently prevented by
  checking the return of _lwp_kill(target, 0).  It could be done more
  efficiently.  (See shared page item.)

- Adaptive mutexes and spinlocks (see shared page item).  These need
  to implement exponential backoff to reduce bus contention.  On x86 we
  need to issue the 'pause' instruction while spinning, and perhaps on
  other SMT processors too.

- Have a shared page that:

  o Allows an LWP to request that it not be preempted by the kernel.  This
    would be used over critical sections like pthread_cond_wait(), where we
    can acquire a bunch of spin locks: being preempted while holding them
    would suck.  _lwp_park() would reset the flag once in kernel mode, and
    there would need to be an equivalent way to do this from user mode.  The
    user path would probably need to notice deferred preemption and call
    sched_yield() on exit from the critical section.

  o Perhaps has some kind of hint mechanism that gives us a clue about
    whether an LWP is currently running on another CPU.  This could be used
    for adaptive locks, but would need to be cheap to do in-kernel.

  o Perhaps has a flag value that's reset once a detached LWP has entered
    the kernel and called lwp_exit1(), meaning that its stack can be
    reclaimed.  Again, may or may not be worth it.

- Keep a pool of dead LWPs so that we do not have to take the full hit of
  _lwp_create() every time pthread_create() is called.  If nothing else,
  this is important for benchmarks.
  There are a few different ways this could be implemented, but it needs
  to be clear if the advantages are real.  Lots of thought and
  benchmarking required.

- LWPs that are parked or that have called nanosleep() (common) burn up
  kernel resources.  "struct lwp" itself isn't a big deal, but the VA space
  and swap used by kernel stacks is.  _lwp_park() takes a ucontext_t pointer
  in expectation that at some point we may be able to recycle the kernel
  stack and re-start the LWP at the correct point, using pageable user
  memory to hold state.  It might also be useful to have a nanosleep call
  that does something similar.  Again, lots of thought and benchmarking
  required.  (Original idea from matt@)

- Need to give consideration to the order in which threads enter and exit
  synchronisation objects, both in the pthread library and in the kernel.
  Commonly, locks are acquired/released in order (a, b, c -> c, b, a).

- The kernel scheduler needs improving to handle LWPs and processor affinity
  better, and user space tools like top(1) and ps(1) need to be changed to
  report correctly.  Tied into that is the need for a mechanism to impose
  limits on various aspects of LWPs.

- Streamlining of the park/unpark path.

- Priority inheritance and similar nasties.