$NetBSD: TODO,v 1.6 2007/02/15 15:39:33 yamt Exp $

Bugs to fix, mostly with SA:

- some blocking routines (like sem_wait()) don't work if SAs aren't
  running yet, because the alarm system isn't up and running or there is no
  thread context to switch to. It would be weird to use them that
  way, but it's perfectly legal.
- There is a race between pthread_cancel() and
  pthread_cond_broadcast() or pthread_exit() over removing an item
  from the sleep queue. The locking protocols there need a little
  adjustment.
- pthread_sig.c: pthread__kill_self() passes a bogus ucontext to the handler.
  This is probably not very important.
- pthread_sig.c: Come up with a signal trampoline naming convention like
  libc's, so that GDB will have an easier time with things.
- Consider moving pthread__signal_tramp() to its own file, and building
  it with -fasynchronous-unwind-tables, so that DWARF2 EH unwinding works
  through it.  (This is required for e.g. GCC's libjava.)
- Add locking to ld.elf_so so that multiple threads doing lazy binding
  don't trash things.
- Verify the cancel stub symbol trickery.


Interfaces/features to implement:
- pthread_atfork()
- priority scheduling
- libc integration:
   - foo_r interfaces
- system integration
   - some macros and prototypes belong in headers other than pthread.h


Features that need more/better regression tests:
 - pthread_cond_broadcast()
 - pthread_once()
 - pthread_get/setspecific()
 - signals


Things that need fixing:
- Recycle dead threads for new threads.

Ideas to play with:
- Explore the trapcontext vs. usercontext distinction in ucontext_t.
- Get rid of thread structures when too many accumulate (is this
  actually a good idea?)
- Adaptive spin/sleep locks for mutexes.
- Currently, each thread uses two real pages of memory: one at the top
  of the stack for actual stack data, and one at the bottom for the
  pthread_st. If we can get suitable space above the initial stack for
  main(), we can cut this to one page per thread. Perhaps crt0 should
  do something different (give us more space) if libpthread is linked
  in?
- Figure out whether/how to expose the inline version of
  pthread_self().
- Along the same lines, figure out whether/how to use registers reserved
  in the ABI for thread-specific data to implement pthread_self().  (A
  sketch of one approach follows this list.)
- Figure out what to do with changing stack sizes.
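
For the two pthread_self() items above, a minimal sketch of the register
approach, assuming the x86_64 ELF TLS arrangement where the thread pointer
lives in %fs and the word at offset 0 points back at the thread block.  The
struct name, the offset and the register are illustrative assumptions, not
libpthread's actual layout:

      /*
       * Hypothetical inline pthread_self(): read our own thread pointer
       * straight from the ABI-reserved register.  Offset 0 is assumed to
       * hold a self-pointer, as in the common x86_64 TLS arrangement.
       */
      struct __pthread_st;              /* real definition in pthread_int.h */

      static inline struct __pthread_st *
      pthread__self_inline(void)
      {
              struct __pthread_st *self;

              __asm __volatile("movq %%fs:0, %0" : "=r" (self));
              return self;
      }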

Future work for 1:1 threads:

- Stress testing, particularly with multiple CPUs.

- Verify that gdb still works well (basic functionality seems to be OK).

- A race between pthread_exit() and pthread_create() for detached LWPs,
  where the stack (and pthread structure) could be reclaimed before the
  thread has a chance to call _lwp_exit(), is currently prevented by
  checking the return of _lwp_kill(target, 0).  It could be done more
  efficiently.  (See the shared page item.)
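  A minimal sketch of that existence check, assuming the probe happens just
  before the stack is reclaimed; the helper name is illustrative and other
  errno values are ignored:

      /*
       * Probe a detached LWP with signal 0.  ESRCH means it has already
       * called _lwp_exit(), so its stack and pthread structure can be
       * reused.  Illustrative helper, not the library's actual code.
       */
      #include <sys/types.h>
      #include <errno.h>
      #include <lwp.h>

      static int
      pthread__lwp_gone(lwpid_t lid)
      {
              return _lwp_kill(lid, 0) == -1 && errno == ESRCH;
      }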

- Adaptive mutexes and spinlocks (see the shared page item). These need
  to implement exponential backoff to reduce bus contention. On x86 we
  need to issue the 'pause' instruction while spinning, and perhaps on
  other SMT processors too.
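  A rough sketch of the backoff loop, assuming a bare test-and-set word; the
  8192-iteration cap, the GCC __sync_lock_test_and_set() builtin and the
  fallback to sched_yield() are placeholders for whatever the library ends
  up using:

      /*
       * Illustrative spin with exponential backoff.  A real adaptive
       * mutex would check whether the owner is running (shared page
       * hint) and fall back to _lwp_park() rather than sched_yield().
       */
      #include <sched.h>

      static void
      pthread__spin_backoff(volatile unsigned int *lock)
      {
              unsigned int count = 1, i;

              while (__sync_lock_test_and_set(lock, 1) != 0) {
                      for (i = 0; i < count; i++) {
      #if defined(__i386__) || defined(__x86_64__)
                              __asm __volatile("pause");
      #else
                              __asm __volatile("" ::: "memory");
      #endif
                      }
                      if (count < 8192)
                              count <<= 1;    /* exponential backoff */
                      else
                              sched_yield();  /* stop hogging the bus */
              }
      }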

- Have a shared page that:

  o Allows an LWP to request that it not be preempted by the kernel. This
    would be used over critical sections like pthread_cond_wait(), where we
    can acquire a bunch of spin locks: being preempted while holding them
    would suck. _lwp_park() would reset the flag once in kernel mode, and
    there would need to be an equivalent way to do this from user mode. The
    user path would probably need to notice deferred preemption and call
    sched_yield() on exit from the critical section. (See the sketch after
    this list.)

  o Perhaps has some kind of hint mechanism that gives us a clue about
    whether an LWP is currently running on another CPU. This could be used
    for adaptive locks, but would need to be cheap to do in-kernel.

  o Perhaps has a flag value that's reset once a detached LWP has entered
    the kernel and reached lwp_exit1(), meaning that its stack can be
    reclaimed. Again, may or may not be worth it.
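
  A rough sketch of how the user-visible side of that page might look; the
  structure, the field names and the protocol comments are all assumptions
  about a design that does not exist yet:

      /*
       * Hypothetical per-LWP words in the shared page.  The kernel would
       * map one of these per LWP and honour/clear the flags; only the
       * user-side critical section wrappers are sketched here.
       */
      #include <sched.h>

      struct lwp_shared_page {
              volatile unsigned int lsp_nopreempt;  /* don't preempt me */
              volatile unsigned int lsp_deferred;   /* kernel deferred a preemption */
              volatile unsigned int lsp_running;    /* hint: currently on a CPU */
              volatile unsigned int lsp_exited;     /* detached LWP reached lwp_exit1() */
      };

      static void
      pthread__nopreempt_enter(struct lwp_shared_page *sp)
      {
              sp->lsp_nopreempt = 1;
      }

      static void
      pthread__nopreempt_exit(struct lwp_shared_page *sp)
      {
              sp->lsp_nopreempt = 0;
              if (sp->lsp_deferred) {
                      /* The kernel wanted the CPU back; give it up now. */
                      sp->lsp_deferred = 0;
                      sched_yield();
              }
      }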

- Keep a pool of dead LWPs so that we do not have to take the full hit of
  _lwp_create() every time pthread_create() is called. If nothing else,
  this is important for benchmarks. There are a few different ways this
  could be implemented, but it needs to be clear whether the advantages
  are real. Lots of thought and benchmarking required.
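  One possible shape, as a sketch: a LIFO of cached, already-exited threads
  consulted before paying for _lwp_create().  The struct, the list head and
  the absence of locking are all illustrative:

      /*
       * Sketch of a dead-LWP cache: the exit path pushes a reusable
       * thread record instead of tearing everything down, and
       * pthread_create() pops one before calling _lwp_create().
       */
      #include <sys/queue.h>
      #include <stddef.h>

      struct pt_cached {
              SLIST_ENTRY(pt_cached) pt_freelink;
              void *pt_stack;                 /* stack to hand to the new thread */
      };

      static SLIST_HEAD(, pt_cached) pt_freelist =
          SLIST_HEAD_INITIALIZER(pt_freelist);

      static void
      pt_cache_put(struct pt_cached *pt)      /* called on the exit path */
      {
              SLIST_INSERT_HEAD(&pt_freelist, pt, pt_freelink);
      }

      static struct pt_cached *
      pt_cache_get(void)                      /* NULL => _lwp_create() a fresh one */
      {
              struct pt_cached *pt;

              if ((pt = SLIST_FIRST(&pt_freelist)) != NULL)
                      SLIST_REMOVE_HEAD(&pt_freelist, pt_freelink);
              return pt;
      }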

- LWPs that are parked or that have called nanosleep() (common) burn up
  kernel resources. "struct lwp" itself isn't a big deal, but the VA space
  and swap used by kernel stacks is. _lwp_park() takes a ucontext_t pointer
  in the expectation that at some point we may be able to recycle the
  kernel stack and restart the LWP at the correct point, using pageable
  user memory to hold state. It might also be useful to have a nanosleep
  call that does something similar. Again, lots of thought and benchmarking
  required. (Original idea from matt@.)

- It's possible that we don't need to take so many spinlocks around
  cancellation points like pthread_cond_wait(), given that _lwp_wakeup()
  and _lwp_unpark() need to synchronise anyway.

- Need to give consideration to the order in which threads enter and exit
  synchronisation objects, both in the pthread library and in the kernel.
  Commonly, locks are acquired and released in LIFO order (a, b, c ->
  c, b, a). The pthread spec probably has something to say about this.

- The kernel scheduler needs improving to handle LWPs and processor affinity
  better, and user space tools like top(1) and ps(1) need to be changed to
  report correctly.  Tied into that is the need for a mechanism to impose
  limits on various aspects of LWPs.

- Streamlining of the park/unpark path.

- Priority inheritance and similar nasties.