$NetBSD: TODO,v 1.8 2007/03/02 18:53:51 ad Exp $

Bugs to fix:

- Add locking to ld.elf_so so that multiple threads doing lazy binding
  don't trash things. XXX Still the case?
- Verify the cancel stub symbol trickery.

Interfaces/features to implement:

- priority scheduling
- libc integration:
   - foo_r interfaces
- system integration
   - some macros and prototypes belong in headers other than pthread.h

Features that need more/better regression tests:

 - pthread_cond_broadcast()
 - pthread_once()
 - pthread_get/setspecific()
 - signals

Ideas to play with:

- Explore the trapcontext vs. usercontext distinction in ucontext_t.

- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)

- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st. If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread. Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?

- Figure out whether/how to expose the inline version of
  pthread_self().
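
  A minimal sketch of one way the inline version could work, assuming
  stacks are power-of-two sized and aligned, with the pthread_st in the
  page at the base (the layout described above).  PT_STACKSIZE and the
  function name here are illustrative, not the library's real symbols:

	#include <stdint.h>

	struct __pthread_st;

	#define	PT_STACKSIZE	(1 << 18)	/* assumed stack size */
	#define	PT_STACKMASK	(PT_STACKSIZE - 1)

	static inline struct __pthread_st *
	pthread__self_inline(void)
	{
		int dummy;

		/* Mask any stack address down to the stack base, which
		 * lands on the pthread_st in the lowest page. */
		return (struct __pthread_st *)
		    ((uintptr_t)&dummy & ~(uintptr_t)PT_STACKMASK);
	}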

- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific-data to implement pthread_self().
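
  A rough sketch of the register approach: on ABIs that reserve a
  thread pointer register (tpidr_el0 on AArch64, tp on RISC-V, the
  %fs/%gs segment base on x86), pthread_self() collapses to a register
  read.  __builtin_thread_pointer() is a real GCC/Clang builtin on
  several targets; assuming the pthread_st sits exactly where the
  thread register points is an illustration only:

	struct __pthread_st;

	static inline struct __pthread_st *
	pthread__self_tls(void)
	{

		/* Assumed layout: the thread register points directly
		 * at the pthread_st (or its TCB header). */
		return (struct __pthread_st *)__builtin_thread_pointer();
	}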

- Figure out what to do with changing stack sizes.

- Stress testing, particularly with multiple CPUs.

- A race between pthread_exit() and pthread_create() for detached LWPs,
  where the stack (and pthread structure) could be reclaimed before the
  thread has a chance to call _lwp_exit(), is currently prevented by
  checking the return of _lwp_kill(target, 0).  It could be done more
  efficiently.  (See shared page item.)
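
  A sketch of that check, assuming the caller holds the target's LWP
  id: signal 0 validates the target without delivering anything, so an
  ESRCH failure means the LWP is gone and its stack is safe to reclaim.
  The function name is made up for illustration:

	#include <errno.h>
	#include <lwp.h>

	static int
	pthread__lwp_gone(lwpid_t lid)
	{

		return _lwp_kill(lid, 0) == -1 && errno == ESRCH;
	}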

- Adaptive mutexes and spinlocks (see shared page item). These need
  to implement exponential backoff to reduce bus contention. On x86 we
  need to issue the 'pause' instruction while spinning, perhaps on other
  SMT processors too.
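
  A minimal sketch of the backoff loop, using C11 atomics to stand in
  for whatever primitives the library would really use; the bound and
  the x86-only pause are illustrative:

	#include <stdatomic.h>

	#define	BACKOFF_MAX	1024

	static inline void
	pthread__smt_pause(void)
	{
	#if defined(__i386__) || defined(__x86_64__)
		__asm volatile("pause");
	#endif
	}

	static void
	pthread__spin_lock_backoff(atomic_int *lock)
	{
		unsigned int count = 1, i;

		while (atomic_exchange_explicit(lock, 1,
		    memory_order_acquire) != 0) {
			/* Spin on plain reads to keep the cache line
			 * shared, doubling the pause count each pass. */
			do {
				for (i = 0; i < count; i++)
					pthread__smt_pause();
				if (count < BACKOFF_MAX)
					count += count;
			} while (atomic_load_explicit(lock,
			    memory_order_relaxed) != 0);
		}
	}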

- Have a shared page that (one possible slot layout is sketched after
  this list):

  o Allows an LWP to request that it not be preempted by the kernel.
    This would be used over critical sections like pthread_cond_wait(),
    where we can acquire a bunch of spin locks: being preempted while
    holding them would suck. _lwp_park() would reset the flag once in
    kernel mode, and there would need to be an equivalent way to do this
    from user mode. The user path would probably need to notice deferred
    preemption and call sched_yield() on exit from the critical section.

  o Perhaps has some kind of hint mechanism that gives us a clue about
    whether an LWP is currently running on another CPU. This could be
    used for adaptive locks, but would need to be cheap to do in-kernel.

  o Perhaps has a flag that is reset once a detached LWP has entered the
    kernel and reached lwp_exit1(), meaning that its stack can be
    reclaimed. Again, may or may not be worth it.
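
  One possible layout for a per-LWP slot in such a page; every field
  name here is made up for illustration:

	#include <stdint.h>

	struct lwp_shared {
		volatile uint32_t ls_nopreempt;	/* don't preempt me */
		volatile uint32_t ls_deferred;	/* kernel deferred a
						   preemption; yield on
						   leaving the section */
		volatile uint32_t ls_running;	/* hint: on a CPU now */
		volatile uint32_t ls_exited;	/* detached LWP reached
						   lwp_exit1(); stack
						   reclaimable */
	};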

- Keep a pool of dead LWPs so that we do not have to take the full hit
  of _lwp_create() every time pthread_create() is called. If nothing
  else this is important for benchmarks. There are a few different ways
  this could be implemented, but it needs to be clear if the advantages
  are real. Lots of thought and benchmarking required.
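
  One of the simpler ways: a locked LIFO of exited threads whose stack
  and pthread_st are kept intact for reuse.  All names are
  illustrative, and whether the cache actually pays off is exactly what
  the benchmarking would have to show:

	#include <pthread.h>
	#include <stddef.h>

	struct dead_thread {
		struct dead_thread *dt_next;	/* embedded in the dead
						   thread's own memory */
	};

	static pthread_mutex_t deadq_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct dead_thread *deadq;

	/* On pthread_create(), pop a cached LWP and recycle its stack;
	 * fall back to _lwp_create() only when the pool is empty. */
	static struct dead_thread *
	deadq_get(void)
	{
		struct dead_thread *dt;

		pthread_mutex_lock(&deadq_lock);
		if ((dt = deadq) != NULL)
			deadq = dt->dt_next;
		pthread_mutex_unlock(&deadq_lock);
		return dt;
	}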

- LWPs that are parked or that have called nanosleep() (common) burn up
  kernel resources. "struct lwp" itself isn't a big deal, but the VA space
  and swap used by kernel stacks is. _lwp_park() takes a ucontext_t pointer
  in expectation that at some point we may be able to recycle the kernel
  stack and re-start the LWP at the correct point, using pageable user
  memory to hold state. It might also be useful to have a nanosleep call
  that does something similar. Again, lots of thought and benchmarking
  required. (Original idea from matt@)

- Need to give consideration to the order in which threads enter and exit
  synchronisation objects, both in the pthread library and in the kernel.
  Commonly, locks are acquired and released in LIFO order
  (a, b, c -> c, b, a).

- The kernel scheduler needs improving to handle LWPs and processor affinity
  better, and user space tools like top(1) and ps(1) need to be changed to
  report correctly.  Tied into that is the need for a mechanism to impose
  limits on various aspects of LWPs.

- Streamlining of the park/unpark path.

- Priority inheritance and similar nasties.