TODO.smpnet revision 1.22
$NetBSD: TODO.smpnet,v 1.22 2018/08/07 07:19:28 ozaki-r Exp $

MP-safe components
==================

They work without the big kernel lock (KERNEL_LOCK), i.e., with the
NET_MPSAFE kernel option. Some components scale up and some don't.

 - Device drivers
   - vioif(4)
   - vmx(4)
   - wm(4)
   - ixg(4)
   - ixv(4)
 - Layer 2
   - Ethernet (if_ethersubr.c)
   - bridge(4)
     - STP
   - Fast forward (ipflow)
 - Layer 3
   - All except for the items listed in the section below
 - Interfaces
   - gif(4)
   - ipsecif(4)
   - l2tp(4)
   - pppoe(4)
     - if_spppsubr.c
   - tun(4)
   - vlan(4)
 - Packet filters
   - npf(7)
 - Others
   - bpf(4)
   - ipsec(4)
   - opencrypto(9)
   - pfil(9)

Non MP-safe components and kernel options
=========================================

The components and options below aren't MP-safe yet, i.e., they require the
big kernel lock. Some of them can be used safely even if NET_MPSAFE is
enabled because they're still protected by the big kernel lock. The others
aren't protected and so are unsafe, e.g., they may crash the kernel.

Protected ones
--------------

 - Device drivers
   - Most drivers other than the ones listed in the above section
 - Layer 4
   - DCCP
   - SCTP
   - TCP
   - UDP

Unprotected ones
----------------

 - Layer 2
   - ARCNET (if_arcsubr.c)
   - ATM (if_atmsubr.c)
   - BRIDGE_IPF
   - FDDI (if_fddisubr.c)
   - HIPPI (if_hippisubr.c)
   - IEEE 1394 (if_ieee1394subr.c)
   - IEEE 802.11 (ieee80211(4))
   - Token ring (if_tokensubr.c)
 - Layer 3
   - IPSELSRC
   - MROUTING
   - PIM
   - MPLS (mpls(4))
   - IPv6 address selection policy
 - Interfaces
   - agr(4)
   - carp(4)
   - etherip(4)
   - faith(4)
   - gre(4)
   - ppp(4)
   - sl(4)
   - stf(4)
   - strip(4)
   - if_srt
   - tap(4)
 - Packet filters
   - ipf(4)
   - pf(4)
 - Others
   - AppleTalk (sys/netatalk/)
   - ATM (sys/netnatm/)
   - Bluetooth (sys/netbt/)
   - altq(4)
   - CIFS (sys/netsmb/)
   - ISDN (sys/netisdn/)
   - kttcp(4)
   - NFS

Known issues
============

NOMPSAFE
--------

We use "NOMPSAFE" as a marker indicating that the code around it isn't
MP-safe yet. We use it in comments and also as part of function names, for
example m_get_rcvif_NOMPSAFE. Let's use "NOMPSAFE" to make it easy to find
non-MP-safe code by grep.

bpf
---

MP-ification of bpf requires that all bpf_mtap* calls be made in normal LWP
context or softint context, i.e., not in hardware interrupt context. For Tx,
all bpf_mtap calls satisfy the requirement. For Rx, most bpf_mtap calls are
made in softint. Unfortunately some bpf_mtap calls on Rx are still made in
hardware interrupt context.

This is the list of the functions that have such bpf_mtap calls:

 - sca_frame_process() @ sys/dev/ic/hd64570.c
 - en_intr() @ sys/dev/ic/midway.c
 - rxintr_cleanup() @ sys/dev/pci/if_lmc.c
 - ipr_rx_data_rdy() @ sys/netisdn/i4b_ipr.c

Ideally we should make the functions run in softint somehow, but we have
neither the actual devices nor the time (or interest/love) to work on the
task, so instead we provide a deferred bpf_mtap mechanism that forcibly runs
bpf_mtap in softint context. It's a workaround; once the functions run in
softint, we should use the original bpf_mtap again.

Lingering obsolete variables
----------------------------

Some obsolete global variables and member variables of structures remain to
avoid breaking old userland programs which directly access such variables via
kvm(3).

The following programs still use kvm(3) to get some information related to
the network stack.

 - netstat(1)
 - vmstat(1)
 - fstat(1)

netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses each object through the ifnet#if_list member
variable.
ifnet_list and ifnet#if_list are obsoleted by ifnet_pslist and
ifnet#if_pslist_entry respectively. netstat also accesses the IP address list
of an interface through ifnet#if_addrlist. struct ifaddr, struct in_ifaddr
and struct in6_ifaddr are accessed and the following obsolete member variables
have to stay: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs. Note that netstat already
implements alternative methods to fetch the above information via sysctl(3).

vmstat(1) shows statistics of hash tables created by hashinit(9) in the
kernel. The statistics are retrieved via kvm(3). The global variables
in_ifaddrhash and in_ifaddrhashtbl, which are for a hash table of IPv4
addresses and are obsoleted by in_ifaddrhash_pslist and
in_ifaddrhashtbl_pslist, are kept for this purpose. We should provide a means
to fetch statistics of hash tables via sysctl(3).

fstat(1) shows information about bpf instances. Each bpf instance (struct
bpf_d) is obtained via kvm(3). The bpf_d#_bd_next, bpf_d#_bd_filter and
bpf_d#_bd_list member variables are obsolete but remain. ifnet#if_xname is
also accessed via struct bpf_if, and the obsolete ifnet#if_list has to remain
so that the offset of ifnet#if_xname doesn't change. The statistic counters
(bpf_d#bd_rcount, bpf_d#bd_dcount and bpf_d#bd_ccount) are also victims of
this restriction; for scalability the statistic counters should be per-CPU
and we should stop using atomic operations for them, however, we have to keep
both the counters and the atomic operations.
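
As a rough illustration of the per-CPU idea above (not the actual bpf code):
each CPU increments only its own slot without atomic operations, and a reader
sums all slots. The names pcpu_stats, pcpu_slot, NCPU_SKETCH and CACHE_LINE
are made up for this sketch, and it is plain userland C; in the kernel
something like percpu(9) would presumably be used instead.

```c
#include <stddef.h>
#include <stdint.h>

#define NCPU_SKETCH 4   /* hypothetical CPU count, for the sketch only */
#define CACHE_LINE  64  /* assumed cache-line size */

/* One slot per CPU, padded so that two slots never share a cache line. */
struct pcpu_slot {
	uint64_t rcount;    /* packets received */
	uint64_t dcount;    /* packets dropped */
	uint64_t ccount;    /* packets captured */
	char pad[CACHE_LINE - 3 * sizeof(uint64_t)];
};

struct pcpu_stats {
	struct pcpu_slot slot[NCPU_SKETCH];
};

/*
 * Writer side: a plain increment.  This is only safe if the caller is the
 * CPU that owns the slot and cannot migrate while it runs (an assumption
 * the sketch does not enforce).
 */
static void
pcpu_stats_recv(struct pcpu_stats *st, unsigned int cpu)
{
	st->slot[cpu % NCPU_SKETCH].rcount++;
}

/* Reader side: the total is the sum over all per-CPU slots. */
static uint64_t
pcpu_stats_rcount(const struct pcpu_stats *st)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < NCPU_SKETCH; i++)
		sum += st->slot[i].rcount;
	return sum;
}
```

The trade-off is that a reader sees a sum that may be slightly stale, which
is usually acceptable for statistics.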

Scalability
-----------

 - Per-CPU rtcaches (used in, say, IP forwarding) aren't scalable on multiple
   flows per CPU
 - ipsec(4) isn't scalable on the number of SA/SP; the cost of a look-up
   is O(n)
 - opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable
   as they are serialized by one mutex

ec_multi* of ethercom
---------------------

ec_multiaddrs and ec_multicnt of struct ethercom and items listed in
ec_multiaddrs must be protected by ec_lock. The core of the ethernet
subsystem is already MP-safe, however, device drivers that use the data
should also be fixed. A typical change should be to protect manipulations of
the data via ETHER_* macros such as ETHER_FIRST_MULTI by ETHER_LOCK and
ETHER_UNLOCK.

ALTQ
----

If ALTQ is enabled in the kernel, it forces the use of just one Tx queue
(if_snd) for packet transmissions, resulting in serializing all Tx packet
processing on the queue. We should probably design and implement an
alternative queuing mechanism that deals with multi-core systems in the first
place, rather than making the existing ALTQ MP-safe, because that's just
annoying.
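
Returning to the ec_multi* item: the typical driver change could look
roughly like the sketch below. It is a userland mock, not kernel code. The
struct layouts, the toy ec_lock_held flag and the macro bodies are stand-ins
invented for the sketch; only the macro names and the call pattern (take
ETHER_LOCK, walk the list via ETHER_FIRST_MULTI/ETHER_NEXT_MULTI, release
with ETHER_UNLOCK) are meant to mirror the real thing.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types; the real definitions live in the kernel headers. */
struct ether_multi {
	struct ether_multi *enm_next;      /* hypothetical link field */
};

struct ethercom {
	struct ether_multi *ec_multiaddrs; /* simplified list head */
	int ec_multicnt;
	int ec_lock_held;                  /* toy lock: 1 while held */
};

struct ether_multistep {
	struct ether_multi *e_enm;
};

/* Mock macro bodies; the asserts check the locking discipline. */
#define ETHER_LOCK(ec) \
	do { assert(!(ec)->ec_lock_held); (ec)->ec_lock_held = 1; } while (0)
#define ETHER_UNLOCK(ec) \
	do { assert((ec)->ec_lock_held); (ec)->ec_lock_held = 0; } while (0)
#define ETHER_NEXT_MULTI(step, enm) \
	do { \
		(enm) = (step).e_enm; \
		if ((enm) != NULL) \
			(step).e_enm = (enm)->enm_next; \
	} while (0)
#define ETHER_FIRST_MULTI(step, ec, enm) \
	do { \
		assert((ec)->ec_lock_held); /* must hold the lock to walk */ \
		(step).e_enm = (ec)->ec_multiaddrs; \
		ETHER_NEXT_MULTI(step, enm); \
	} while (0)

/* Driver-side pattern: touch the multicast list only under the lock. */
static int
count_multi(struct ethercom *ec)
{
	struct ether_multistep step;
	struct ether_multi *enm;
	int n = 0;

	ETHER_LOCK(ec);
	ETHER_FIRST_MULTI(step, ec, enm);
	while (enm != NULL) {
		n++;
		ETHER_NEXT_MULTI(step, enm);
	}
	ETHER_UNLOCK(ec);
	return n;
}
```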