$NetBSD: TODO,v 1.8 2007/03/02 18:53:51 ad Exp $

Bugs to fix:

- Add locking to ld.elf_so so that multiple threads doing lazy binding
  don't trash things. XXX Still the case?
- Verify the cancel stub symbol trickery.

Interfaces/features to implement:

- priority scheduling
- libc integration:
   - foo_r interfaces
- system integration
   - some macros and prototypes belong in headers other than pthread.h

Features that need more/better regression tests:

 - pthread_cond_broadcast()
 - pthread_once()
 - pthread_get/setspecific()
 - signals

Ideas to play with:

- Explore the trapcontext vs. usercontext distinction in ucontext_t.

- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)

- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st. If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread. Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?

- Figure out whether/how to expose the inline version of
  pthread_self().
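
  A minimal sketch of one way the inline version could work, assuming
  stacks are power-of-two sized and aligned, with the pthread_st in the
  page at the base (the layout described above).  PT_STACKSIZE and the
  function name here are illustrative, not the library's real symbols:

	#include <stdint.h>

	struct __pthread_st;

	#define	PT_STACKSIZE	(1 << 18)	/* assumed stack size */
	#define	PT_STACKMASK	(PT_STACKSIZE - 1)

	static inline struct __pthread_st *
	pthread__self_inline(void)
	{
		int dummy;

		/* Mask any stack address down to the stack base, which
		 * lands on the pthread_st in the lowest page. */
		return (struct __pthread_st *)
		    ((uintptr_t)&dummy & ~(uintptr_t)PT_STACKMASK);
	}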

- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific-data to implement pthread_self().
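
  A rough sketch of the register approach: on ABIs that reserve a
  thread pointer register (tpidr_el0 on AArch64, tp on RISC-V, the
  %fs/%gs segment base on x86), pthread_self() collapses to a register
  read.  __builtin_thread_pointer() is a real GCC/Clang builtin on
  several targets; assuming the pthread_st sits exactly where the
  thread register points is an illustration only:

	struct __pthread_st;

	static inline struct __pthread_st *
	pthread__self_tls(void)
	{

		/* Assumed layout: the thread register points directly
		 * at the pthread_st (or its TCB header). */
		return (struct __pthread_st *)__builtin_thread_pointer();
	}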

- Figure out what to do with changing stack sizes.

- Stress testing, particularly with multiple CPUs.

- A race between pthread_exit() and pthread_create() for detached LWPs,
  where the stack (and pthread structure) could be reclaimed before the
  thread has a chance to call _lwp_exit(), is currently prevented by
  checking the return of _lwp_kill(target, 0).  It could be done more
  efficiently.  (See shared page item.)
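
  A sketch of that check, assuming the caller holds the target's LWP
  id: signal 0 validates the target without delivering anything, so an
  ESRCH failure means the LWP is gone and its stack is safe to reclaim.
  The function name is made up for illustration:

	#include <errno.h>
	#include <lwp.h>

	static int
	pthread__lwp_gone(lwpid_t lid)
	{

		return _lwp_kill(lid, 0) == -1 && errno == ESRCH;
	}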

- Adaptive mutexes and spinlocks (see shared page item). These need
  to implement exponential backoff to reduce bus contention. On x86 we
  need to issue the 'pause' instruction while spinning, perhaps on other
  SMT processors too.
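
  A minimal sketch of the backoff loop, using C11 atomics to stand in
  for whatever primitives the library would really use; the bound and
  the x86-only pause are illustrative:

	#include <stdatomic.h>

	#define	BACKOFF_MAX	1024

	static inline void
	pthread__smt_pause(void)
	{
	#if defined(__i386__) || defined(__x86_64__)
		__asm volatile("pause");
	#endif
	}

	static void
	pthread__spin_lock_backoff(atomic_int *lock)
	{
		unsigned int count = 1, i;

		while (atomic_exchange_explicit(lock, 1,
		    memory_order_acquire) != 0) {
			/* Spin on plain reads to keep the cache line
			 * shared, doubling the pause count each pass. */
			do {
				for (i = 0; i < count; i++)
					pthread__smt_pause();
				if (count < BACKOFF_MAX)
					count += count;
			} while (atomic_load_explicit(lock,
			    memory_order_relaxed) != 0);
		}
	}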

- Have a shared page that (one possible slot layout is sketched after
  this list):

  o Allows an LWP to request that it not be preempted by the kernel.
    This would be used over critical sections like pthread_cond_wait(),
    where we can acquire a bunch of spin locks: being preempted while
    holding them would suck. _lwp_park() would reset the flag once in
    kernel mode, and there would need to be an equivalent way to do this
    from user mode. The user path would probably need to notice deferred
    preemption and call sched_yield() on exit from the critical section.

  o Perhaps has some kind of hint mechanism that gives us a clue about
    whether an LWP is currently running on another CPU. This could be
    used for adaptive locks, but would need to be cheap to do in-kernel.

  o Perhaps has a flag that is reset once a detached LWP has entered the
    kernel and reached lwp_exit1(), meaning that its stack can be
    reclaimed. Again, may or may not be worth it.
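
  One possible layout for a per-LWP slot in such a page; every field
  name here is made up for illustration:

	#include <stdint.h>

	struct lwp_shared {
		volatile uint32_t ls_nopreempt;	/* don't preempt me */
		volatile uint32_t ls_deferred;	/* kernel deferred a
						   preemption; yield on
						   leaving the section */
		volatile uint32_t ls_running;	/* hint: on a CPU now */
		volatile uint32_t ls_exited;	/* detached LWP reached
						   lwp_exit1(); stack
						   reclaimable */
	};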

- Keep a pool of dead LWPs so that we do not have to take the full hit
  of _lwp_create() every time pthread_create() is called. If nothing
  else this is important for benchmarks. There are a few different ways
  this could be implemented, but it needs to be clear if the advantages
  are real. Lots of thought and benchmarking required.
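
  One of the simpler ways: a locked LIFO of exited threads whose stack
  and pthread_st are kept intact for reuse.  All names are
  illustrative, and whether the cache actually pays off is exactly what
  the benchmarking would have to show:

	#include <pthread.h>
	#include <stddef.h>

	struct dead_thread {
		struct dead_thread *dt_next;	/* embedded in the dead
						   thread's own memory */
	};

	static pthread_mutex_t deadq_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct dead_thread *deadq;

	/* On pthread_create(), pop a cached LWP and recycle its stack;
	 * fall back to _lwp_create() only when the pool is empty. */
	static struct dead_thread *
	deadq_get(void)
	{
		struct dead_thread *dt;

		pthread_mutex_lock(&deadq_lock);
		if ((dt = deadq) != NULL)
			deadq = dt->dt_next;
		pthread_mutex_unlock(&deadq_lock);
		return dt;
	}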

- LWPs that are parked or that have called nanosleep() (common) burn up
  kernel resources. "struct lwp" itself isn't a big deal, but the VA space
  and swap used by kernel stacks is. _lwp_park() takes a ucontext_t pointer
  in expectation that at some point we may be able to recycle the kernel
  stack and re-start the LWP at the correct point, using pageable user
  memory to hold state. It might also be useful to have a nanosleep call
  that does something similar. Again, lots of thought and benchmarking
  required. (Original idea from matt@)

- Need to give consideration to the order in which threads enter and exit
  synchronisation objects, both in the pthread library and in the kernel.
  Commonly, locks are acquired and released in LIFO order
  (a, b, c -> c, b, a).

- The kernel scheduler needs improving to handle LWPs and processor affinity
  better, and user space tools like top(1) and ps(1) need to be changed to
  report correctly.  Tied into that is the need for a mechanism to impose
  limits on various aspects of LWPs.

- Streamlining of the park/unpark path.

- Priority inheritance and similar nasties.