TODO.smpnet revision 1.30
$NetBSD: TODO.smpnet,v 1.30 2020/01/06 05:38:59 msaitoh Exp $

MP-safe components
==================

They work without the big kernel lock (KERNEL_LOCK), i.e., with NET_MPSAFE
kernel option.  Some components scale up and some don't.

 - Device drivers
   - aq(4)
   - vioif(4)
   - vmx(4)
   - wm(4)
   - ixg(4)
   - ixl(4)
   - ixv(4)
 - Layer 2
   - Ethernet (if_ethersubr.c)
   - bridge(4)
     - STP
   - Fast forward (ipflow)
 - Layer 3
   - All except for items in the below section
 - Interfaces
   - gif(4)
   - ipsecif(4)
   - l2tp(4)
   - pppoe(4)
     - if_spppsubr.c
   - tun(4)
   - vlan(4)
 - Packet filters
   - npf(7)
 - Others
   - bpf(4)
   - ipsec(4)
   - opencrypto(9)
   - pfil(9)

Non MP-safe components and kernel options
=========================================

The following components and options aren't MP-safe yet, i.e., they require
the big kernel lock.  Some of them can be used safely even if NET_MPSAFE is
enabled because they're still protected by the big kernel lock.  The others
aren't protected and so are unsafe, e.g., they may crash the kernel.

Protected ones
--------------

 - Device drivers
   - Most drivers other than ones listed in the above section
 - Layer 4
   - DCCP
   - SCTP
   - TCP
   - UDP

Unprotected ones
----------------

 - Layer 2
   - ARCNET (if_arcsubr.c)
   - BRIDGE_IPF
   - FDDI (if_fddisubr.c)
   - HIPPI (if_hippisubr.c)
   - IEEE 1394 (if_ieee1394subr.c)
   - IEEE 802.11 (ieee80211(4))
   - Token ring (if_tokensubr.c)
 - Layer 3
   - IPSELSRC
   - MROUTING
   - PIM
   - MPLS (mpls(4))
   - IPv6 address selection policy
 - Interfaces
   - agr(4)
   - carp(4)
   - faith(4)
   - gre(4)
   - ppp(4)
   - sl(4)
   - stf(4)
   - strip(4)
   - if_srt
   - tap(4)
 - Packet filters
   - ipf(4)
   - pf(4)
 - Others
   - AppleTalk (sys/netatalk/)
   - Bluetooth (sys/netbt/)
   - altq(4)
   - CIFS (sys/netsmb/)
   - kttcp(4)
   - NFS

Known issues
============

NOMPSAFE
--------

We use "NOMPSAFE" as a marker indicating that the code around it isn't MP-safe
yet.  We use it in comments and also as part of function names, for example
m_get_rcvif_NOMPSAFE.  Let's use "NOMPSAFE" to make it easy to find non-MP-safe
code by grep.

bpf
---

MP-ification of bpf requires that all bpf_mtap* calls are made in normal LWP
context or softint context, i.e., not in hardware interrupt context.  For Tx,
all bpf_mtap calls satisfy the requirement.  For Rx, most bpf_mtap calls are
made in softint context.  Unfortunately, some bpf_mtap calls on Rx are still
made in hardware interrupt context.

This is the list of the functions that have such bpf_mtap calls:

 - sca_frame_process() @ sys/dev/ic/hd64570.c

Ideally we should make the functions run in softint somehow, but we don't have
the actual devices and have no time (or interest/love) to work on the task, so
instead we provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap in
softint context.  It's a workaround, and once the functions run in softint, we
should use the original bpf_mtap again.
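
The deferral idea above can be sketched in userland C.  This is only a
single-threaded model under stated assumptions, not the kernel's actual
mechanism: struct tap_defer, tap_defer_enqueue() and tap_defer_softint() are
hypothetical names, the "tap" is just a counter, and a real implementation
would need a queue that is safe to fill from interrupt context.

```c
#include <assert.h>
#include <stddef.h>

#define TAP_DEFER_QLEN 4	/* hypothetical queue length */

struct pkt {
	int id;
};

/* Hypothetical fixed-size FIFO of packets waiting to be tapped. */
struct tap_defer {
	struct pkt *q[TAP_DEFER_QLEN];
	size_t head;	/* index of the next packet to drain */
	size_t count;	/* number of queued packets */
	int tapped;	/* packets handed to the tap so far */
};

/*
 * Models the hardware interrupt path: only queue the packet, never run
 * the tap here.  Returns 0 on success, -1 if the queue is full.
 */
static int
tap_defer_enqueue(struct tap_defer *td, struct pkt *p)
{
	assert(p != NULL);
	if (td->count == TAP_DEFER_QLEN)
		return -1;
	td->q[(td->head + td->count) % TAP_DEFER_QLEN] = p;
	td->count++;
	return 0;
}

/*
 * Models the softint handler: drain the queue and run the tap (here just
 * a counter) on each packet.  Returns the number of packets drained.
 */
static int
tap_defer_softint(struct tap_defer *td)
{
	int n = 0;

	while (td->count > 0) {
		struct pkt *p = td->q[td->head];

		td->head = (td->head + 1) % TAP_DEFER_QLEN;
		td->count--;
		(void)p;	/* a real tap would inspect the packet here */
		td->tapped++;
		n++;
	}
	return n;
}
```

In this model the Rx interrupt path would call tap_defer_enqueue() and then
schedule the softint, so the tap itself never runs in hardware interrupt
context.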

Lingering obsolete variables
----------------------------

Some obsolete global variables and member variables of structures remain to
avoid breaking old userland programs which directly access such variables via
kvm(3).

The following programs still use kvm(3) to get some information related to
the network stack.

 - netstat(1)
 - vmstat(1)
 - fstat(1)

netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses each object through the ifnet#if_list member
variable.  ifnet_list and ifnet#if_list are obsoleted by ifnet_pslist and
ifnet#if_pslist_entry respectively.  netstat also accesses the IP address list
of an interface through ifnet#if_addrlist.  struct ifaddr, struct in_ifaddr
and struct in6_ifaddr are accessed, so the following obsolete member variables
are stuck: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs.  Note that netstat already
implements alternative methods to fetch the above information via sysctl(3).

vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel.
The statistics are retrieved via kvm(3).  The global variables in_ifaddrhash
and in_ifaddrhashtbl, which are for a hash table of IPv4 addresses and are
obsoleted by in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist, are kept for
this purpose.  We should provide a means to fetch statistics of hash tables via
sysctl(3).

fstat(1) shows information about bpf instances.  Each bpf instance
(struct bpf_d) is obtained via kvm(3).  The bpf_d#_bd_next, bpf_d#_bd_filter
and bpf_d#_bd_list member variables are obsolete but remain.  ifnet#if_xname is
also accessed via struct bpf_if, and the obsolete ifnet#if_list has to remain
so as not to change the offset of ifnet#if_xname.  The statistic counters
(bpf_d#bd_rcount, bpf_d#bd_dcount and bpf_d#bd_ccount) are also victims of this
restriction; for scalability the statistic counters should be per-CPU and we
should stop using atomic operations for them, but we have to keep both the
counters and the atomic operations.
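
To illustrate what "per-CPU" counters would mean here, the following is a
minimal userland sketch, not kernel code.  NCPU_MODEL, struct pcpu_counter,
pcpu_counter_inc() and pcpu_counter_sum() are hypothetical names; a real
kernel change would presumably build on something like percpu(9) and would
have to make sure a CPU only touches its own slot.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_MODEL 4	/* hypothetical number of CPUs in this model */

/* One slot per CPU; a CPU only ever increments its own slot. */
struct pcpu_counter {
	uint64_t slot[NCPU_MODEL];
};

/* Update path: no atomic operation, because the slot is CPU-private. */
static void
pcpu_counter_inc(struct pcpu_counter *c, unsigned int cpu)
{
	assert(cpu < NCPU_MODEL);
	c->slot[cpu]++;
}

/* Read path: sum all slots; the total may be slightly stale. */
static uint64_t
pcpu_counter_sum(const struct pcpu_counter *c)
{
	uint64_t total = 0;

	for (unsigned int i = 0; i < NCPU_MODEL; i++)
		total += c->slot[i];
	return total;
}
```

A reader that fetches one fixed-offset counter through kvm(3) cannot follow
such a layout, which is why the single atomically updated counters have to
stay for now.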

Scalability
-----------

 - Per-CPU rtcaches (used in say IP forwarding) aren't scalable on multiple
   flows per CPU
 - ipsec(4) isn't scalable on the number of SA/SP; the cost of a look-up
   is O(n)
 - opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable
   as they are serialized by one mutex

ec_multi* of ethercom
---------------------

ec_multiaddrs and ec_multicnt of struct ethercom and the items listed in
ec_multiaddrs must be protected by ec_lock.  The core of the Ethernet subsystem
is already MP-safe; however, device drivers that use the data should also be
fixed.  A typical change is to protect manipulations of the data via ETHER_*
macros such as ETHER_FIRST_MULTI with ETHER_LOCK and ETHER_UNLOCK.

ALTQ
----

If ALTQ is enabled in the kernel, it forces the use of just one Tx queue
(if_snd) for packet transmissions, which serializes all Tx packet processing on
that queue.  We should probably design and implement an alternative queuing
mechanism that deals with multi-core systems in the first place, rather than
making the existing ALTQ MP-safe, because that would just be annoying.

Using kernel modules
--------------------

Please note that if you enable NET_MPSAFE in your kernel and you use any
loadable kernel modules (including compat_xx modules or individual network
interface if_xxx device driver modules), you will need to build custom
modules.  For each module you will need to add the following line to its
Makefile:

	CPPFLAGS+=	-DNET_MPSAFE

Failure to do this may result in unpredictable behavior.

IPv4 address initialization atomicity
-------------------------------------

An IPv4 address is referenced by several data structures: an associated
interface, its local route, a connected route (if necessary), the global list,
the global hash table, etc.  These data structures are not updated atomically,
i.e., an IPv4 address can be in an inconsistent state in the kernel while it
is being initialized.

One known failure caused by the issue is that incoming packets destined for an
initializing address can loop in the network stack for a short period of time.
The address initialization creates a local route first and then registers the
initializing address to the global hash table, which is used to decide whether
an incoming packet is destined for the host by checking whether the destination
of the packet is registered to the hash table.  So, if the host allows
forwarding, an incoming packet can match the local route of an initializing
address at ip_output while it fails the to-self check described above at
ip_input.  Because a matched local route points to a loopback interface as its
destination interface, the incoming packet is sent to the network stack
(ip_input) again, which results in looping.  The loop stops once the
initializing address is registered to the hash table.

One solution to the issue is to reorder the address initialization steps:
first register an address to the hash table, then create its routes.  Another
solution is to use the routing table for the to-self check instead of using the
global hash table, like IPv6.
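
The effect of the ordering can be modelled with a small userland sketch.  It
is a simplification under stated assumptions, not kernel code: forwarding is
assumed to be enabled, and struct addr_state, enum verdict and classify() are
hypothetical names that only encode the two checks described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what the stack knows about one IPv4 address. */
struct addr_state {
	bool has_local_route;	/* local route already created */
	bool in_hash;		/* registered to the global hash table */
};

enum verdict {
	VERDICT_DELIVER,	/* to-self check at ip_input succeeds */
	VERDICT_LOOP,		/* local route sends the packet back to ip_input */
	VERDICT_FORWARD		/* forwarded by whatever other route matches */
};

/* Fate of a packet for this address on a host with forwarding enabled. */
static enum verdict
classify(const struct addr_state *s)
{
	assert(s != NULL);
	if (s->in_hash)
		return VERDICT_DELIVER;
	if (s->has_local_route)
		return VERDICT_LOOP;
	return VERDICT_FORWARD;
}
```

With the current order the intermediate state is "route but no hash entry",
which classifies as VERDICT_LOOP.  With the reordered steps the intermediate
state is "hash entry but no route", which classifies as VERDICT_DELIVER, so
the looping window does not appear in this model.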

if_flags
--------

To avoid data races on if_flags, it should be protected by a lock (currently
IFNET_LOCK).  Thus, if_flags should not be accessed during packet processing,
to avoid performance degradation caused by lock contention.  Traditionally the
IFF_RUNNING, IFF_UP and IFF_OACTIVE flags of if_flags are checked during packet
processing.  If you make a driver MP-safe you must remove such checks.

IFF_ALLMULTI can be set/unset via if_mcast_op.  To protect updates of the flag,
we added IFNET_LOCK around if_mcast_op.  However, that was not a good approach
because if_mcast_op is typically called in the middle of a call path and
holding IFNET_LOCK in such places is problematic.  In fact, a deadlock has been
observed.  Probably we should remove IFNET_LOCK and manage IFF_ALLMULTI
somewhere other than if_flags, for example in ethercom or the driver itself (or
a common driver framework once it appears).  Such a change is feasible because
IFF_ALLMULTI is only set/unset by a driver and not accessed from any common
components such as network protocols.

Also, IFF_PROMISC is checked in ether_input and we should get rid of that check
somehow.