TODO.smpnet revision 1.23
$NetBSD: TODO.smpnet,v 1.23 2018/08/14 14:49:13 maxv Exp $

MP-safe components
==================

They work without the big kernel lock (KERNEL_LOCK), i.e., with the NET_MPSAFE
kernel option.  Some components scale up and some don't.

 - Device drivers
   - vioif(4)
   - vmx(4)
   - wm(4)
   - ixg(4)
   - ixv(4)
 - Layer 2
   - Ethernet (if_ethersubr.c)
   - bridge(4)
     - STP
   - Fast forward (ipflow)
 - Layer 3
   - Everything except the items listed in the section below
 - Interfaces
   - gif(4)
   - ipsecif(4)
   - l2tp(4)
   - pppoe(4)
     - if_spppsubr.c
   - tun(4)
   - vlan(4)
 - Packet filters
   - npf(7)
 - Others
   - bpf(4)
   - ipsec(4)
   - opencrypto(9)
   - pfil(9)

Non MP-safe components and kernel options
=========================================

These components and options aren't MP-safe yet, i.e., they require the big
kernel lock.  Some of them can be used safely even if NET_MPSAFE is enabled
because they're still protected by the big kernel lock.  The others aren't
protected and so are unsafe, e.g., they may crash the kernel.

Protected ones
--------------

 - Device drivers
   - Most drivers other than the ones listed in the above section
 - Layer 4
   - DCCP
   - SCTP
   - TCP
   - UDP

Unprotected ones
----------------

 - Layer 2
   - ARCNET (if_arcsubr.c)
   - ATM (if_atmsubr.c)
   - BRIDGE_IPF
   - FDDI (if_fddisubr.c)
   - HIPPI (if_hippisubr.c)
   - IEEE 1394 (if_ieee1394subr.c)
   - IEEE 802.11 (ieee80211(4))
   - Token ring (if_tokensubr.c)
 - Layer 3
   - IPSELSRC
   - MROUTING
   - PIM
   - MPLS (mpls(4))
   - IPv6 address selection policy
 - Interfaces
   - agr(4)
   - carp(4)
   - faith(4)
   - gre(4)
   - ppp(4)
   - sl(4)
   - stf(4)
   - strip(4)
   - if_srt
   - tap(4)
 - Packet filters
   - ipf(4)
   - pf(4)
 - Others
   - AppleTalk (sys/netatalk/)
   - ATM (sys/netnatm/)
   - Bluetooth (sys/netbt/)
   - altq(4)
   - CIFS (sys/netsmb/)
   - ISDN (sys/netisdn/)
   - kttcp(4)
   - NFS

Known issues
============

NOMPSAFE
--------

We use "NOMPSAFE" as a mark indicating that the code around it isn't MP-safe
yet.  We use it in comments and also as part of function names, for example
m_get_rcvif_NOMPSAFE.  Let's use "NOMPSAFE" consistently so that non-MP-safe
code is easy to find with grep.

bpf
---

MP-ification of bpf requires that all bpf_mtap* calls are made in normal LWP
context or softint context, i.e., not in hardware interrupt context.  For Tx,
all bpf_mtap calls satisfy the requirement.  For Rx, most bpf_mtap calls are
made in softint.  Unfortunately some bpf_mtap calls on Rx are still made in
hardware interrupt context.

This is the list of the functions that have such bpf_mtap calls:

 - sca_frame_process() @ sys/dev/ic/hd64570.c
 - en_intr() @ sys/dev/ic/midway.c
 - rxintr_cleanup() @ sys/dev/pci/if_lmc.c
 - ipr_rx_data_rdy() @ sys/netisdn/i4b_ipr.c

Ideally we should make the functions run in softint somehow, but we have
neither the actual devices nor the time (or interest/love) to work on the task,
so instead we provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap
in softint context.  It's a workaround; once the functions run in softint, we
should use the original bpf_mtap again.
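
The deferral idea can be sketched in user space as follows.  This is only an
illustration, not the kernel implementation: struct defer_queue, defer_tap(),
defer_softint() and tap_packet() are hypothetical names, and the sketch is
single-threaded, so it leaves out the interrupt masking or locking that a real
queue shared with a hardware interrupt handler would need.

```c
#include <assert.h>
#include <stddef.h>

#define DEFER_QLEN 8	/* hypothetical queue depth */

struct pkt {
	int id;
};

/* Packets queued from "hardware interrupt" context and drained later. */
struct defer_queue {
	struct pkt *slots[DEFER_QLEN];
	size_t head, tail, count;
	int softint_pending;	/* set when the "softint" has been requested */
	int tapped;		/* packets handed to the tap so far */
	int dropped;		/* packets dropped because the queue was full */
};

/* Stand-in for the real tap; it must not run in hard interrupt context. */
static void
tap_packet(struct defer_queue *q, struct pkt *p)
{
	(void)p;
	q->tapped++;
}

/* Called from the simulated hardware interrupt: enqueue only, never tap. */
static int
defer_tap(struct defer_queue *q, struct pkt *p)
{
	if (q->count == DEFER_QLEN) {
		q->dropped++;
		return -1;
	}
	q->slots[q->tail] = p;
	q->tail = (q->tail + 1) % DEFER_QLEN;
	q->count++;
	q->softint_pending = 1;
	return 0;
}

/* Simulated softint handler: drain the queue and run the tap on each packet. */
static void
defer_softint(struct defer_queue *q)
{
	q->softint_pending = 0;
	while (q->count > 0) {
		struct pkt *p = q->slots[q->head];

		q->head = (q->head + 1) % DEFER_QLEN;
		q->count--;
		tap_packet(q, p);
	}
}
```

The point of the split is that the interrupt-side function does a bounded
amount of work and never calls into the tap, while the softint-side function
is the only place where the tap runs.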

Lingering obsolete variables
----------------------------

Some obsolete global variables and member variables of structures remain to
avoid breaking old userland programs which directly access such variables via
kvm(3).

The following programs still use kvm(3) to get some information related to
the network stack.

 - netstat(1)
 - vmstat(1)
 - fstat(1)

netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses each object through the ifnet#if_list member
variable. ifnet_list and ifnet#if_list are obsoleted by ifnet_pslist and
ifnet#if_pslist_entry respectively. netstat also accesses the IP address list
of an interface through ifnet#if_addrlist. struct ifaddr, struct in_ifaddr
and struct in6_ifaddr are accessed and the following obsolete member variables
are stuck: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs. Note that netstat already
implements alternative methods to fetch the above information via sysctl(3).

vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel.
The statistics are retrieved via kvm(3). The global variables in_ifaddrhash
and in_ifaddrhashtbl, which are for a hash table of IPv4 addresses and are
obsoleted by in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist, are kept for
this purpose. We should provide a means to fetch statistics of hash tables
via sysctl(3).

fstat(1) shows information about bpf instances. Each bpf instance (struct
bpf_d) is obtained via kvm(3). The bpf_d#_bd_next, bpf_d#_bd_filter and
bpf_d#_bd_list member variables are obsolete but remain. ifnet#if_xname is
also accessed via struct bpf_if, and the obsolete ifnet#if_list must remain so
that the offset of ifnet#if_xname doesn't change. The statistic counters
(bpf_d#bd_rcount, bpf_d#bd_dcount and bpf_d#bd_ccount) are also victims of
this restriction; for scalability the counters should be per-CPU and we should
stop using atomic operations for them, but we have to keep the counters and
the atomic operations as they are.
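
For reference, the per-CPU counter layout mentioned above could look roughly
like the following user-space sketch.  Everything in it is an assumption made
for illustration: NCPU, struct percpu_counter, counter_inc() and
counter_read() are hypothetical names, and in the kernel such counters would
presumably be built on a facility like percpu(9) instead.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for the sketch */

/* One slot per CPU, padded so that slots don't share a cache line. */
struct percpu_counter {
	struct {
		uint64_t value;
		char pad[64 - sizeof(uint64_t)];
	} slot[NCPU];
};

/* Fast path: a CPU touches only its own slot, so no atomic op is needed. */
static void
counter_inc(struct percpu_counter *c, unsigned cpu)
{
	assert(cpu < NCPU);
	c->slot[cpu].value++;
}

/* Slow path: sum all slots when the statistic is actually read. */
static uint64_t
counter_read(const struct percpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned i = 0; i < NCPU; i++)
		sum += c->slot[i].value;
	return sum;
}
```

A reader that goes through kvm(3) expects one counter at a fixed offset in the
structure, which is why a layout like this can't simply replace the existing
fields.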

Scalability
-----------

 - Per-CPU rtcaches (used in, say, IP forwarding) don't scale with multiple
   flows per CPU
 - ipsec(4) doesn't scale with the number of SAs/SPs; the cost of a look-up
   is O(n)
 - opencrypto(9)'s crypto_newsession()/crypto_freesession() don't scale
   as they are serialized by one mutex

ec_multi* of ethercom
---------------------

ec_multiaddrs and ec_multicnt of struct ethercom and the items listed in
ec_multiaddrs must be protected by ec_lock.  The core of the Ethernet subsystem
is already MP-safe; however, device drivers that use the data also need to be
fixed.  A typical change is to wrap manipulations of the data via ETHER_*
macros such as ETHER_FIRST_MULTI in ETHER_LOCK and ETHER_UNLOCK.
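
The locking pattern can be illustrated with a user-space analogue.  This is
not driver code and it does not use the real ETHER_* macros: struct mc_set,
struct mc_entry, mc_add() and mc_walk() are hypothetical stand-ins, with a
pthread mutex playing the role of ec_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for one multicast address entry. */
struct mc_entry {
	unsigned char addr[6];
	struct mc_entry *next;
};

/* Hypothetical stand-in for the multicast state kept in struct ethercom. */
struct mc_set {
	pthread_mutex_t lock;	/* plays the role of ec_lock */
	struct mc_entry *head;	/* plays the role of ec_multiaddrs */
	int count;		/* plays the role of ec_multicnt */
};

/* Add an entry; the list and the counter change together under the lock. */
static void
mc_add(struct mc_set *s, struct mc_entry *e)
{
	pthread_mutex_lock(&s->lock);
	e->next = s->head;
	s->head = e;
	s->count++;
	pthread_mutex_unlock(&s->lock);
}

/*
 * Driver-style walk, e.g. to program a hardware filter: the lock is held
 * for the whole traversal, not just for fetching the first entry.
 */
static int
mc_walk(struct mc_set *s, void (*fn)(const struct mc_entry *, void *),
    void *arg)
{
	int visited = 0;

	pthread_mutex_lock(&s->lock);
	for (struct mc_entry *e = s->head; e != NULL; e = e->next) {
		if (fn != NULL)
			fn(e, arg);
		visited++;
	}
	assert(visited == s->count);	/* list and counter stay consistent */
	pthread_mutex_unlock(&s->lock);
	return visited;
}
```

In a driver the corresponding change would put the lock and unlock around the
existing ETHER_FIRST_MULTI-style loop rather than introduce new helpers.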

ALTQ
----

If ALTQ is enabled in the kernel, it forces the use of just one Tx queue
(if_snd) for packet transmissions, which serializes all Tx packet processing
on that queue.  We should probably design and implement an alternative queuing
mechanism that deals with multi-core systems in the first place, rather than
making the existing ALTQ MP-safe, because that's just annoying.