TODO.smpnet revision 1.21
$NetBSD: TODO.smpnet,v 1.21 2018/08/07 07:19:09 ozaki-r Exp $

MP-safe components
==================

The following components work without the big kernel lock (KERNEL_LOCK), i.e.,
with the NET_MPSAFE kernel option.  Some components scale up and some don't.

 - Device drivers
   - vioif(4)
   - vmx(4)
   - wm(4)
   - ixg(4)
   - ixv(4)
 - Layer 2
   - Ethernet (if_ethersubr.c)
   - bridge(4)
     - STP
   - Fast forward (ipflow)
 - Layer 3
   - All except for items in the below section
 - Interfaces
   - gif(4)
   - l2tp(4)
   - pppoe(4)
     - if_spppsubr.c
   - tun(4)
   - vlan(4)
 - Packet filters
   - npf(7)
 - Others
   - bpf(4)
   - ipsec(4)
   - opencrypto(9)
   - pfil(9)

Non MP-safe components and kernel options
=========================================

The following components and options aren't MP-safe yet, i.e., they still
require the big kernel lock.  Some of them can be used safely even if
NET_MPSAFE is enabled because they're still protected by the big kernel lock.
The others aren't protected and so are unsafe, e.g., they may crash the kernel.

Protected ones
--------------

 - Device drivers
   - Most drivers other than ones listed in the above section
 - Layer 4
   - DCCP
   - SCTP
   - TCP
   - UDP

Unprotected ones
----------------

 - Layer 2
   - ARCNET (if_arcsubr.c)
   - ATM (if_atmsubr.c)
   - BRIDGE_IPF
   - FDDI (if_fddisubr.c)
   - HIPPI (if_hippisubr.c)
   - IEEE 1394 (if_ieee1394subr.c)
   - IEEE 802.11 (ieee80211(4))
   - Token ring (if_tokensubr.c)
 - Layer 3
   - IPSELSRC
   - MROUTING
   - PIM
   - MPLS (mpls(4))
   - IPv6 address selection policy
 - Interfaces
   - agr(4)
   - carp(4)
   - etherip(4)
   - faith(4)
   - gre(4)
   - ppp(4)
   - sl(4)
   - stf(4)
   - strip(4)
   - if_srt
   - tap(4)
 - Packet filters
   - ipf(4)
   - pf(4)
 - Others
   - AppleTalk (sys/netatalk/)
   - ATM (sys/netnatm/)
   - Bluetooth (sys/netbt/)
   - altq(4)
   - CIFS (sys/netsmb/)
   - ISDN (sys/netisdn/)
   - kttcp(4)
   - NFS

Known issues
============

NOMPSAFE
--------

We use "NOMPSAFE" as a mark that indicates that the code around it isn't MP-safe
yet.  We use it in comments and also as part of function names, for example
m_get_rcvif_NOMPSAFE.  Let's keep using "NOMPSAFE" so that non-MP-safe code is
easy to find with grep.

bpf
---

MP-ification of bpf requires that all bpf_mtap* calls be made in normal LWP
context or softint context, i.e., not in hardware interrupt context.  For Tx,
all bpf_mtap calls satisfy the requirement.  For Rx, most bpf_mtap calls are
made in softint.  Unfortunately, some bpf_mtap calls on Rx are still made in
hardware interrupt context.

This is the list of the functions that still call bpf_mtap in hardware
interrupt context:

 - sca_frame_process() @ sys/dev/ic/hd64570.c
 - en_intr() @ sys/dev/ic/midway.c
 - rxintr_cleanup() @ sys/dev/pci/if_lmc.c
 - ipr_rx_data_rdy() @ sys/netisdn/i4b_ipr.c

Ideally we should make the functions run in softint somehow, but we have
neither the actual devices nor the time (or interest/love) to work on the task,
so instead we provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap
in softint context.  It's a workaround, and once the functions run in softint
we should use the original bpf_mtap again.
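
The idea behind such a deferred tap can be sketched in plain userland C as
follows.  This is only an illustration under stated assumptions, not the kernel
implementation: struct pkt, struct defer_queue, defer_tap() and defer_softint()
are hypothetical names, the queue is a fixed-size ring, and the sketch ignores
locking and packet lifetime, both of which a real implementation has to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the NetBSD kernel API. */
#define DEFER_QLEN	8

struct pkt {
	int	id;
};

struct defer_queue {
	struct pkt	*slots[DEFER_QLEN];
	size_t		head;		/* next slot to drain */
	size_t		tail;		/* next slot to fill */
	int		softint_pending;/* set when a drain was requested */
	int		tapped;		/* packets handed to the tap so far */
};

/* Stand-in for the real tap; must only run in "softint" context. */
static void
tap_packet(struct defer_queue *q, struct pkt *p)
{
	(void)p;
	q->tapped++;
}

/* Called from "hardware interrupt" context: only queue, never tap. */
static int
defer_tap(struct defer_queue *q, struct pkt *p)
{
	if (q->tail - q->head == DEFER_QLEN)
		return -1;	/* queue full: skip the tap for this packet */
	q->slots[q->tail++ % DEFER_QLEN] = p;
	q->softint_pending = 1;
	return 0;
}

/* Called from "softint" context: drain the queue and tap each packet. */
static void
defer_softint(struct defer_queue *q)
{
	q->softint_pending = 0;
	while (q->head != q->tail)
		tap_packet(q, q->slots[q->head++ % DEFER_QLEN]);
}
```

The interrupt path does a bounded amount of work (one enqueue), and the tap
itself only ever runs from the drain routine.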

Lingering obsolete variables
----------------------------

Some obsolete global variables and member variables of structures remain to
avoid breaking old userland programs which directly access such variables via
kvm(3).

The following programs still use kvm(3) to get some information related to
the network stack.

 - netstat(1)
 - vmstat(1)
 - fstat(1)

netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses each object through the ifnet#if_list member
variable.  ifnet_list and ifnet#if_list are obsoleted by ifnet_pslist and
ifnet#if_pslist_entry respectively.  netstat also accesses the IP address list
of an interface through ifnet#if_addrlist.  struct ifaddr, struct in_ifaddr
and struct in6_ifaddr are accessed, so the following obsolete member variables
are stuck: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs.  Note that netstat already
implements alternative methods to fetch the above information via sysctl(3).

vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel.
The statistics are retrieved via kvm(3).  The global variables in_ifaddrhash
and in_ifaddrhashtbl, which implement a hash table of IPv4 addresses and are
obsoleted by in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist, are kept for
this purpose.  We should provide a means to fetch statistics of hash tables
via sysctl(3).

fstat(1) shows information of bpf instances.  Each bpf instance (struct bpf_d)
is obtained via kvm(3).  The bpf_d#_bd_next, bpf_d#_bd_filter and
bpf_d#_bd_list member variables are obsolete but remain.  ifnet#if_xname is
also accessed via struct bpf_if, so the obsolete ifnet#if_list must remain in
order not to change the offset of ifnet#if_xname.  The statistic counters
(bpf_d#bd_rcount, bpf_d#bd_dcount and bpf_d#bd_ccount) are also victims of this
restriction: for scalability the counters should be per-CPU and we should stop
using atomic operations for them, but for now we have to keep both the counters
and the atomic operations.
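
For comparison, a per-CPU counter of the kind the paragraph above asks for
could look roughly like this in userland C.  It is a sketch under assumptions:
NCPU, CACHE_LINE, struct percpu_counter and both functions are hypothetical
names chosen for illustration, and the kernel would use its own per-CPU
facilities rather than a fixed array.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU		4	/* assumed CPU count, for illustration only */
#define CACHE_LINE	64	/* assumed cache line size in bytes */

/* One slot per CPU, padded so that each slot fills one assumed cache line. */
struct percpu_slot {
	uint64_t	count;
	char		pad[CACHE_LINE - sizeof(uint64_t)];
};

struct percpu_counter {
	struct percpu_slot	slot[NCPU];
};

/*
 * Increment the slot owned by the given CPU.  No atomic operation is
 * needed as long as only that CPU ever writes to its own slot.
 */
static void
percpu_counter_inc(struct percpu_counter *c, unsigned int cpu)
{
	assert(cpu < NCPU);
	c->slot[cpu].count++;
}

/* Sum all slots.  The result is a snapshot and may be slightly stale. */
static uint64_t
percpu_counter_sum(const struct percpu_counter *c)
{
	uint64_t sum = 0;

	for (unsigned int i = 0; i < NCPU; i++)
		sum += c->slot[i].count;
	return sum;
}
```

Writers never touch a shared cache line, so the cost moves to the reader, which
has to walk every slot.  That trade-off suits statistics that are updated per
packet but read rarely.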

Scalability
-----------

 - Per-CPU rtcaches (used in, say, IP forwarding) aren't scalable on multiple
   flows per CPU
 - ipsec(4) doesn't scale with the number of SAs/SPs; the cost of a look-up
   is O(n)
 - opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable
   as they are serialized by one mutex

ec_multi* of ethercom
---------------------

ec_multiaddrs and ec_multicnt of struct ethercom, and the items listed in
ec_multiaddrs, must be protected by ec_lock.  The core of the Ethernet
subsystem is already MP-safe; however, device drivers that use the data should
also be fixed.  A typical change is to wrap manipulations of the data via
ETHER_* macros such as ETHER_FIRST_MULTI in ETHER_LOCK and ETHER_UNLOCK.
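
The locking discipline can be illustrated with a userland mock.  Everything
prefixed with mock_ or MOCK_ below is a hypothetical stand-in written for this
sketch, not the kernel definition: the "lock" is a plain flag, and the accessor
asserts that the flag is held, which is the property a fixed driver has to
guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userland stand-ins; the real definitions live in the kernel. */
struct mock_multi {
	struct mock_multi	*next;
};

struct mock_ethercom {
	int			locked;		/* stand-in for ec_lock */
	struct mock_multi	*multiaddrs;	/* stand-in for ec_multiaddrs */
};

#define MOCK_ETHER_LOCK(ec) \
	do { assert(!(ec)->locked); (ec)->locked = 1; } while (0)
#define MOCK_ETHER_UNLOCK(ec) \
	do { assert((ec)->locked); (ec)->locked = 0; } while (0)

/* Accessor that checks the caller holds the lock before exposing the list. */
static struct mock_multi *
mock_first_multi(struct mock_ethercom *ec)
{
	assert(ec->locked);
	return ec->multiaddrs;
}

/* What a driver's filter setup might do: walk the list under the lock. */
static int
mock_count_multi(struct mock_ethercom *ec)
{
	struct mock_multi *enm;
	int n = 0;

	MOCK_ETHER_LOCK(ec);
	for (enm = mock_first_multi(ec); enm != NULL; enm = enm->next)
		n++;
	MOCK_ETHER_UNLOCK(ec);
	return n;
}
```

The point is that the whole traversal sits between the lock and unlock calls,
not just the first access, because the list may change as soon as the lock is
dropped.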

ALTQ
----

If ALTQ is enabled in the kernel, it forces the use of just one Tx queue
(if_snd) for packet transmissions, which serializes all Tx packet processing on
that queue.  We should probably design and implement an alternative queuing
mechanism that deals with multi-core systems in the first place, rather than
making the existing ALTQ MP-safe, because it's just annoying.