History log of /src/sys/kern/uipc_mbuf.c
Revision  Date  Author  Comments
 1.255  15-Dec-2024  skrll KNF
 1.254  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sprinkle SET_ERROR dtrace probes.

PR kern/58378: Kernel error code origination lacks dtrace probes
 1.253  06-Dec-2024  riastradh sys/kern/sys_socket.c, uipc_*.c: Sort includes.

No functional change intended.
 1.252  27-Nov-2023  ozaki-r mbuf: avoid assertion failure when splitting mbuf cluster

From OpenBSD:

commit 7b4d35e0a60ba1dd4daf4b1c2932020a22463a89
Author: bluhm <bluhm@openbsd.org>
Date: Fri Oct 20 16:25:15 2023 +0000

Avoid assertion failure when splitting mbuf cluster.

m_split() calls m_align() to initialize the data pointer of newly
allocated mbuf. If the new mbuf will be converted to a cluster,
this is not necessary. If additionally the new mbuf is larger than
MLEN, this can lead to a panic.
Only call m_align() when a valid m_data is needed. This is the
case if we do not reference the existing cluster, but memcpy() the
data into the new mbuf.

Reported-by: syzbot+0e6817f5877926f0e96a@syzkaller.appspotmail.com
OK claudio@ deraadt@

The issue is harmless if DIAGNOSTIC is not enabled.

XXX pullup-10
XXX pullup-9
 1.251  12-Apr-2023  riastradh mbuf(9): New m_get_n, m_gethdr_n.

m_get_n(how, type, alignbytes, nbytes) returns an mbuf with no packet
header having space for nbytes, with an internal buffer pointer
aligned by alignbytes (typically ETHER_ALIGN or similar, if not
zero).

m_gethdr_n(how, type, alignbytes, nbytes) does the same but for an
mbuf with a packet header.

These return NULL on failure, which can happen either:
(a) because how is M_DONTWAIT and allocating memory would sleep, or
(b) because alignbytes + nbytes > MCLBYTES.

On exit, m_len is set to nbytes, as is m_pkthdr.len for m_gethdr_n.

These should be used to systematically replace all calls to m_get,
m_gethdr, MGET, MGETHDR, and m_getcl. Most calls to m_clget and
MCLGET will probably evaporate as a consequence.

Proposed on tech-net last year:
https://mail-index.netbsd.org/tech-net/2022/07/16/msg008285.html
 1.250  01-Apr-2023  skrll 0x%p -> %p in KASSERTMSGs
 1.249  31-Mar-2023  riastradh mbuf(9): Sprinkle KASSERTMSG.

No functional change intended.
 1.248  24-Feb-2023  riastradh kern: Eliminate most __HAVE_ATOMIC_AS_MEMBAR conditionals.

I'm leaving in the conditional around the legacy membar_enters
(store-before-load, store-before-store) in kern_mutex.c and in
kern_lock.c because they may still matter: store-before-load barriers
tend to be the most expensive kind, so eliding them is probably
worthwhile on x86. (It also may not matter; I just don't care to do
measurements right now, and it's a single valid and potentially
justifiable use case in the whole tree.)

However, membar_release/acquire can be mere instruction barriers on
all TSO platforms including x86, so there's no need to go out of our
way with a bad API to conditionalize them. If the procedure call
overhead is measurable we just could change them to be macros on x86
that expand into __insn_barrier.

Discussed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2023/02/23/msg028729.html
 1.247  16-Dec-2022  msaitoh branches: 1.247.2;
Add new "kern.mbuf.nmbclusters_limit" sysctl.

- Used to know the upper limit of nmbclusters.
- It's read only.
 1.246  09-Apr-2022  riastradh sys: Use membar_release/acquire around reference drop.

This just goes through my recent reference count membar audit and
changes membar_exit to membar_release and membar_enter to
membar_acquire -- this should make everything cheaper on most CPUs
without hurting correctness, because membar_acquire is generally
cheaper than membar_enter.
 1.245  12-Mar-2022  riastradh sys: Membar audit around reference count releases.

If two threads are using an object that is freed when the reference
count goes to zero, we need to ensure that all memory operations
related to the object happen before freeing the object.

Using an atomic_dec_uint_nv(&refcnt) == 0 ensures that only one
thread takes responsibility for freeing, but it's not enough to
ensure that the other thread's memory operations happen before the
freeing.

Consider:

    Thread A                        Thread B
    obj->foo = 42;                  obj->baz = 73;
    mumble(&obj->bar);              grumble(&obj->quux);
    /* membar_exit(); */            /* membar_exit(); */
    atomic_dec -- not last          atomic_dec -- last
                                    /* membar_enter(); */
                                    KASSERT(invariant(obj->foo,
                                        obj->bar));
                                    free_stuff(obj);

The memory barriers ensure that

obj->foo = 42;
mumble(&obj->bar);

in thread A happens before

KASSERT(invariant(obj->foo, obj->bar));
free_stuff(obj);

in thread B. Without them, this ordering is not guaranteed.

So in general it is necessary to do

membar_exit();
if (atomic_dec_uint_nv(&obj->refcnt) != 0)
return;
membar_enter();

to release a reference, for the `last one out hit the lights' style
of reference counting. (This is in contrast to the style where one
thread blocks new references and then waits under a lock for existing
ones to drain with a condvar -- no membar needed thanks to mutex(9).)

I searched for atomic_dec to find all these. Obviously we ought to
have a better abstraction for this because there's so much copypasta.
This is a stop-gap measure to fix actual bugs until we have that. It
would be nice if an abstraction could gracefully handle the different
styles of reference counting in use -- some years ago I drafted an
API for this, but making it cover everything got a little out of hand
(particularly with struct vnode::v_usecount) and I ended up setting
it aside to work on psref/localcount instead for better scalability.

I got bored of adding #ifdef __HAVE_ATOMIC_AS_MEMBAR everywhere, so I
only put it on things that look performance-critical on 5sec review.
We should really adopt membar_enter_preatomic/membar_exit_postatomic
or something (except they are applicable only to atomic r/m/w, not to
atomic_load/store_*, making the naming annoying) and get rid of all
the ifdefs.
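
The release pattern described above can be sketched in userland C11. This is only an illustration under stated assumptions: the C11 fences stand in for membar_exit/membar_enter (later membar_release/membar_acquire), and struct obj and obj_release() are hypothetical names, not NetBSD code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; not a NetBSD structure. */
struct obj {
	atomic_uint refcnt;
	int foo;
};

/*
 * Drop one reference.  Returns true if the caller dropped the last
 * reference and is therefore responsible for freeing the object.
 */
static bool
obj_release(struct obj *o)
{
	/* Release: order our earlier accesses to *o before the decrement. */
	atomic_thread_fence(memory_order_release);

	/* fetch_sub returns the OLD value, so 1 means we were the last. */
	if (atomic_fetch_sub_explicit(&o->refcnt, 1,
	    memory_order_relaxed) != 1)
		return false;

	/* Acquire: order the other threads' accesses before our teardown. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

Only the caller that sees obj_release() return true may inspect and free the object; every other caller must not touch it again after the call.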
 1.244  06-Oct-2021  msaitoh Fix a bug that NMBCLUSTERS(kern.mbuf.nmbclusters) can't be changed by sysctl.
 1.243  04-Mar-2021  msaitoh Revert accidentally committed debug code. Sorry.
 1.242  04-Mar-2021  msaitoh Add missing opt_inet.h.
 1.241  05-May-2020  jdolecek branches: 1.241.2;
fix KASSERT() for MHLEN case in m_defrag() - network stack usually does
m_adj(ETHER_ALIGN) so check that the mbuf chain data fits
M_LEADINGSPACE() + M_TRAILINGSPACE()
 1.240  25-Apr-2020  jdolecek in m_defrag() must copy data elsewhere before adding cluster, the
data part of mbuf gets reused and hence overwritten by extbuf
 1.239  24-Apr-2020  jdolecek add KASSERT() that the whole data buffer in a mbuf or the mbuf
cluster fits within the same page

pools actually never return items whose memory cross page boundary for item
sizes smaller than PAGE_SIZE
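
A check of this kind might look roughly like the following sketch. The page size constant and the helper name within_one_page() are assumptions made for illustration; the actual KASSERT in the kernel may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size, illustration only */

/*
 * Return true if the buffer [addr, addr + len) lies within a single
 * page, i.e. its first and last bytes share the same page number.
 */
static bool
within_one_page(uintptr_t addr, size_t len)
{
	if (len == 0)
		return true;
	return addr / MODEL_PAGE_SIZE == (addr + len - 1) / MODEL_PAGE_SIZE;
}
```

For example, a 2048-byte buffer starting at offset 3072 would straddle two 4096-byte pages and fail the check, while the same buffer at offset 4096 or 6144 would pass.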
 1.238  24-Apr-2020  jdolecek change m_defrag() to coalesce the chain to single mbuf if it's short enough
and first mbuf doesn't use external storage

most fragmented packets end up with first short mbuf containing
frame + protocol header only, and second mbuf containing the data;
m_defrag() previously always returned chain of at least two mbufs,
now it should actually return all data in single mbuf for typical
mbuf chain with length < MCLBYTES
 1.237  15-Mar-2020  thorpej branches: 1.237.2;
Add and use a new function, mowner_init_owner(), that initializes an
MBUFTRACE mowner structure (so that providers of it don't have to
grovel the internals).
 1.236  06-Dec-2019  maxv Minor changes, reported by the LGTM bot.
 1.235  19-Oct-2019  tnn mcl_cache: align items to COHERENCY_UNIT

Because we do cache incoherent DMA to/from mbufs we cannot safely
share cache lines with adjacent items that may be concurrently accessed.
 1.234  28-Sep-2019  jmcneill mbstat_conver_to_user_cb -> mbstat_convert_to_user_cb
 1.233  18-Sep-2019  maxv Handle M_EXT with M_BUFADDR, and introduce M_BUFSIZE. Use them to dedup
code.
 1.232  17-Jan-2019  knakahara branches: 1.232.4;
Fix ipsecif(4) being unable to apply an input direction packet filter. Reviewed by ozaki-r@n.o and ryo@n.o.

Add ATF later.
 1.231  16-Jan-2019  knakahara Initialize m_pkthdr members explicitly.
 1.230  27-Dec-2018  maxv Remove M_COPY_PKTHDR, M_MOVE_PKTHDR, M_ALIGN and MH_ALIGN.
 1.229  22-Dec-2018  maxv Replace M_ALIGN and MH_ALIGN by m_align.
 1.228  22-Dec-2018  maxv Replace: M_COPY_PKTHDR -> m_copy_pkthdr. No functional change, since the
former is a macro to the latter.
 1.227  22-Dec-2018  maxv Move m_align() back into the kernel, and switch M_ALIGN and MH_ALIGN to it.
Forcing a distinction between M_ALIGN and MH_ALIGN is too bug-friendly and
serves no particular purpose.
 1.226  22-Dec-2018  maxv Replace: M_MOVE_PKTHDR -> m_move_pkthdr. No functional change, since the
former is a macro to the latter.
 1.225  15-Nov-2018  maxv Remove the 'copy' argument from m_devget(), unused. While here rename
off0->off.
 1.224  15-Nov-2018  maxv Add KASSERTs.
 1.223  15-Nov-2018  maxv Remove the 't' argument from m_tag_find().
 1.222  15-Nov-2018  maxv Simplify the mtag API:

- Remove m_tag_init(), m_tag_first(), m_tag_next() and
m_tag_delete_nonpersistent().

- Remove the 't' argument from m_tag_delete_chain().
 1.221  15-Nov-2018  maxv Merge uipc_mbuf2.c into uipc_mbuf.c. Reorder the latter a little to gather
similar functions. No functional change.
 1.220  05-Oct-2018  msaitoh s/conver_to/convert_to/. No functional change.
 1.219  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
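
The truncation hazard described above can be reproduced in a small userland sketch. The function names uimin_model() and u64min() are hypothetical and chosen for illustration; the first mirrors a minimum defined on unsigned int, the second keeps the full 64-bit width.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Model of a minimum defined on unsigned int, as the log describes. */
static unsigned int
uimin_model(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Width-preserving minimum for 64-bit values (hypothetical helper). */
static uint64_t
u64min(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}
```

Passing a 64-bit value such as (uint64_t)UINT_MAX + 6 to uimin_model() compiles without complaint by default, but the argument is reduced modulo UINT_MAX + 1 to 5 before the comparison, so the call returns 5 rather than the other operand. u64min() compares the untruncated values.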
 1.218  09-Aug-2018  maxv Localify mcl_cache.
 1.217  18-Jul-2018  msaitoh - Fix compile error for kernel configuration files that have no Ethernet
device driver.
- Add missing default label.
- Fix NetBSD RCS Id.
 1.216  17-Jul-2018  msaitoh Add /d(dump) and /v(verbose) modifiers to DDB's "show mbuf" command. Mainly
written by Hiroki SUENAGA. Currently, /v supports Ethernet, PPP, PPPoE, ARP,
IPv4, ICMP, IPv6, ICMPv6, TCP and UDP.
 1.215  07-May-2018  maxv branches: 1.215.2;
Copy some KASSERTs from m_move_pkthdr into m_copy_pkthdr, and reorder the
latter to reduce the diff with the former.
 1.214  03-May-2018  maxv Revert my rev1.190, remove the M_READONLY check. The initial code was
correct: what is read-only is the mbuf storage, not the mbuf itself. The
storage contains the packet payload, and never has anything related to
mbufs. So it is fine to remove M_PKTHDR on mbufs that have a read-only
storage.

In fact it was kind of obvious, since several places already manually
remove M_PKTHDR without taking care of the external storage.
 1.213  03-May-2018  maxv Rename m_pkthdr_remove -> m_remove_pkthdr, to match the existing naming
convention, eg m_copy_pkthdr and m_move_pkthdr.
 1.212  28-Apr-2018  maxv Rename the 'flags' and 'nowait' arguments to 'how'. The other BSDs did the
same. Also, in m_defrag, rename 'mold' to 'm'.
 1.211  28-Apr-2018  maxv Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
 1.210  27-Apr-2018  maxv Remove unused debug code.
 1.209  27-Apr-2018  maxv Remove reference to m_ext.ext_type (doesn't exist).
 1.208  27-Apr-2018  maxv Remove unused ext_flags field in struct _m_ext_storage.

Also, simplify MEXTMALLOC, mbtypes[] doesn't exist anymore, but the code
still compiled correctly because "malloc" is a macro and the argument
was dropped.
 1.207  27-Apr-2018  maxv Stop passing the pool as argument of the storage. M_EXT_CLUSTER mbufs
are supposed to take their area from mcl_cache only.
 1.206  27-Apr-2018  maxv Remove _MCLGET, merge its content into m_clget(). The code is slightly
modified to reduce the indentation level.
 1.205  27-Apr-2018  maxv Reorder, to group related functions.
 1.204  27-Apr-2018  maxv M_CLUSTER -> M_EXT_CLUSTER
 1.203  27-Apr-2018  maxv Rename m_reclaim -> mb_drain, and localify.
 1.202  27-Apr-2018  maxv Implement M_COPY_PKTHDR as a function, like m_move_pkthdr.
 1.201  27-Apr-2018  maxv Move m_align and m_append into ieee80211_netbsd.c. They are part of
net80211, and shouldn't be used outside.
 1.200  27-Apr-2018  maxv Simplify m_copydata, use unsigned int, and change its last argument to
match that of the man page.
 1.199  27-Apr-2018  maxv Style and simplify.
 1.198  27-Apr-2018  maxv Panic in m_copypacket if no header is present, that's a requirement.
 1.197  26-Apr-2018  maxv Change MCLGET, so that it calls m_clget instead of doing the work in a
macro. Macros are inefficient when they contain too many instructions and
are used too often, because of cache coherency (and also register use).

This change saves 32KB of kernel .text.
 1.196  26-Apr-2018  maxv Rename

m_copyback0 -> m_copyback_internal
M_COPYBACK0_* -> CB_*

That's a lot less misleading. While here, fix a bunch of panic messages.
 1.195  26-Apr-2018  maxv Stop adding '0's in parameter and function names, that's just misleading.
Some remain, they need more investigation.
 1.194  26-Apr-2018  maxv Change comment, to clearly say that m_prepend should not be used directly.
 1.193  20-Apr-2018  maxv Cast to int, to properly handle dstoff > MHLEN (which never happens).
 1.192  19-Apr-2018  maxv The mbuf length is allowed to be zero.
 1.191  17-Apr-2018  maxv change the comment
 1.190  17-Apr-2018  maxv If the mbuf is shared leave M_PKTHDR in place. Given where this function
is called from that's not supposed to happen, but I'm growing unconfident
about our mbuf code.
 1.189  16-Apr-2018  maxv Disable the M_PKTHDR check for now. It causes PR/53189 (which is also
reproducible on i386).

It seems that someone is giving looutput a malformed chain.
 1.188  15-Apr-2018  maxv Introduce a m_verify_packet function, that verifies the mbuf chain of a
packet to ensure it is not malformed. Call this function in "points of
interest", that are the IPv4/IPv6/IPsec entry points. There could be more.

We use M_VERIFY_PACKET(m), declared under DIAGNOSTIC only.

This function should not be called everywhere, especially not in places
that temporarily manipulate (and clobber) the mbuf structure; once they're
done they put the mbuf back in a correct format.
 1.187  10-Apr-2018  maxv Remove m_getclr. It is unused, confusing (vs m_clget), and is a weak
implementation (eg you can't request a zeroed pkthdr mbuf).
 1.186  10-Apr-2018  maxv Put the "free" functions close to one another. No functional change.
 1.185  10-Apr-2018  maxv Localify m_ext_free.
 1.184  21-Mar-2018  maxv Localify and remove unused prototypes.
 1.183  21-Mar-2018  maxv Remove these global variables. They are unused, racy, and the only thing
they do is triggering cache synchronization latencies between CPUs.
 1.182  09-Mar-2018  maxv Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:

m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);

m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.181  22-Jan-2018  maxv branches: 1.181.2;
Style and clarify.
 1.180  22-Jan-2018  maxv m_prepend does not tolerate being given len > MHLEN, so add a panic, and
document this behavior.
 1.179  22-Jan-2018  maxv Style, no functional change.
 1.178  22-Jan-2018  maxv Fix m_prepend(). If 'm' is not a pkthdr, it doesn't make sense to use
MH_ALIGN, it should rather be M_ALIGN.

I'm wondering whether there should not be a KASSERT to make sure 'm' is
always a pkthdr.
 1.177  14-Jan-2018  maxv style
 1.176  01-Jan-2018  maxv Detect use-after-frees on mbufs with external storage, too. This is done
even when the refcount is > 1.

Again, this code is enabled by default, because it is fast and quite
useful.
 1.175  01-Jan-2018  maxv Don't use macros, rather inline, much clearer.

For the record, I was partly mistaken in my previous commit: even though
the macros were local, the function names were still the ones of the real
callers.

However, setting the name in m_data was not a good thing; this was a
valid pointer, and the kernel could execute a long time before figuring
out the mbuf was already freed - therefore making debugging more difficult.
And information on the caller can be obtained via ddb anyway.
 1.174  31-Dec-2017  maxv Check MT_FREE by default, and not just under DEBUG (or DIAGNOSTIC). This
code is fast, with a nonexistent overhead - and we already take care of
setting MT_FREE, so why not check it.

In addition, stop registering the function name, that's not helpful since
the MBUFFREE macro is local. Instead, set m_data to NULL, so that any
access to a freed mbuf's data after mtod() or similar will page fault.

The combination of these two changes provides a fast and efficient way of
detecting use-after-frees in the network stack.
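
The two mechanisms described above can be modeled in userland as follows. This is a simplified sketch, not the kernel code: struct model_mbuf, the MODEL_MT_* constants and both functions are hypothetical, and where the kernel would panic the model just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type tags standing in for MT_FREE and MT_DATA. */
enum { MODEL_MT_FREE = 0, MODEL_MT_DATA = 1 };

/* Simplified stand-in for an mbuf with internal storage. */
struct model_mbuf {
	int type;
	char *data;
	char buf[64];
};

static void
model_init(struct model_mbuf *m)
{
	m->type = MODEL_MT_DATA;
	m->data = m->buf;
}

/*
 * Free the model mbuf.  Returns false on a double free, which is the
 * point where the real kernel check would fire.
 */
static bool
model_free(struct model_mbuf *m)
{
	if (m->type == MODEL_MT_FREE)
		return false;		/* already freed */
	m->type = MODEL_MT_FREE;	/* mark as free for later checks */
	m->data = NULL;			/* stale data accesses now fault */
	return true;
}
```

A second model_free() on the same object is caught by the type check, and any later dereference of the data pointer hits NULL instead of silently reading recycled memory.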
 1.173  09-Nov-2017  christos Don't use 0 for PR_NOWAIT
 1.172  31-Mar-2017  msaitoh branches: 1.172.6;
Remove extra 0x in m_print().
 1.171  14-Mar-2017  ozaki-r Use if_acquire and if_release instead of using psref API directly

- Provide if_release for consistency to if_acquire
- Use if_acquire and if_release for ifp iterations
- Make ifnet_psref_class static
 1.170  09-Jan-2017  christos branches: 1.170.2;
If we had an error, don't do the debug checks because they will most certainly
fail and we'll panic.
 1.169  04-Oct-2016  christos Hide MFREE now that it is not being used anymore and provide some debugging
for the location of the last free for debugging kernels.
 1.168  16-Jun-2016  ozaki-r branches: 1.168.2;
Use curlwp_bind and curlwp_bindx instead of open-coding LP_BOUND
 1.167  10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, which is NOT MP-safe and exists for the
transition period, i.e., it is intended to be used in places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.166  10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.165  12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.164  20-Apr-2016  knakahara Add init function for mbuf.

some functions use mbuf as stack variable instead of allocating by m_get().
They should use this function(s) to prevent access to uninitialized fields.

Currently, the mbuf stack allocating functions are the following.
+ sys/dev/ic/bwi.c
- bwi_rxeof()
- bwi_encap()
+ sys/dev/ic/dp8390.c
- dp8390_ipkdb_send()
+ sys/dev/pci/if_txp.c
- txp_download_fw_section()
+ sys/dev/ppbus/if_plip.c
- lptap()
+ sys/net/bpf.c
- _pf_mtap2()
- _pf_mtap_af()
- _pf_mtap_sl_out()
+ sys/netisdn/i4b_ipr.c
- ipr_rx_data_rdy()
- ipr_tx_queue_empty()

Reviewed by kre@n.o and christos@n.o, thanks.
 1.163  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.162  24-Jul-2015  maxv typo (comment)
 1.161  08-Feb-2015  mlelstv Correct m_len calculation for m_dup() with mbuf clusters.
Fixes kern/49650.
 1.160  02-Dec-2014  ozaki-r Revert "Pull if_drain routine out of m_reclaim"

The commit broke dlopen()'d rumpnet on platforms where ld.so does not
override weak aliases (e.g. musl, Solaris, potentially OS X, ...).

Requested by pooka@.
 1.159  27-Nov-2014  ozaki-r branches: 1.159.2;
Pull if_drain routine out of m_reclaim

It's if-specific and should be in if.c.

No functional change.
 1.158  25-Feb-2014  pooka branches: 1.158.4;
Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.157  15-Nov-2013  christos remove trigger happy assertion. in m_adj negative lengths are valid.
 1.156  14-Nov-2013  christos - add KASSERTS on functions that don't accept M_COPYALL
- compute length for m_copyback0, m_makewritable used from ipf, is using
M_COPYALL.
 1.155  14-Nov-2013  skrll Deal with M_COPYALL becoming -ve properly in m_copym0.

I can now mount via nfs again.
 1.154  14-Nov-2013  christos change M_COPYALL to be -1 instead of depending on it to be "too large",
so that we check explicitly against it in all places. ok gimpy
 1.153  09-Oct-2013  christos - initialize m_len and m_pkthdr.len to 0 in constructors, as discussed in tech-net.
- s/MGET/m_get
- s/0/NULL
 1.152  20-Sep-2013  christos mark mbuf as free when we return it to the pool (Beverly Schwartz)
 1.151  28-Jun-2013  matt branches: 1.151.2;
Make m_copydata panics more verbose
 1.150  27-Jun-2013  christos - add m_add() that puts an mbuf to end of a chain
- m_append() and m_align() with their family
- remove parameters from prototypes
 1.149  08-May-2013  pooka print more diagnostic info in panic message
 1.148  19-Jan-2013  rmind Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
 1.147  18-Oct-2012  para bring comment up to reality

kmem_map => kmem_arena
 1.146  29-Apr-2012  dsl branches: 1.146.2;
Remove the unused 'struct malloc_type' args to kern_malloc/realloc/free
The M_xxx arg is left on the calls to malloc() and free(),
maybe they could be converted to an enumeration and just saved in
the malloc header (for deep diag use).
Remove the malloc_type from mbuf extension.
Fixes rump build as well.
Welcome to 6.99.6
 1.145  10-Feb-2012  para branches: 1.145.2; 1.145.6;
proper sizing of kmem_arena on different ports

PR port-i386/45946: Kernel locks up in VMEM system
 1.144  27-Jan-2012  para extending vmem(9) to be able to allocate resources for its own needs.
simplifying uvm_map handling (no special kernel entries anymore no relocking)
make malloc(9) a thin wrapper around kmem(9)
(with private interface for interrupt safety reasons)

releng@ acknowledged
 1.143  31-Aug-2011  plunky branches: 1.143.2; 1.143.6;
NULL does not need a cast
 1.142  08-Aug-2011  dyoung Miscellaneous mbuf changes:

1 Add some protection against double-freeing mbufs in DIAGNOSTIC kernels.

2 Add a m_defrag() that's derived from
sys/dev/pci/if_vge.c:vge_m_defrag(). This one copies the packet
header.

3 Constify m_tag_find().
 1.141  27-Jul-2011  uebayasi These don't need uvm/uvm_extern.h.
 1.140  24-Apr-2011  rmind - Replace few malloc(9) uses with kmem(9).
- Rename buf_malloc() to buf_alloc(), fix comments.
- Remove some unnecessary inclusions.
 1.139  17-Jan-2011  uebayasi Include internal definitions (uvm/uvm.h) only where necessary.
 1.138  24-Nov-2010  cegger branches: 1.138.2;
No need to print '0x' twice in the printing of
the mbuf flags via 'show mbuf'
 1.137  28-Oct-2010  seanb Always use m_split() in m_copyback() instead of its
local, abridged, version. This closes a window where
a new mbuf (n) can be inserted where n->m_next == n.
 1.136  11-May-2010  pooka remove unnecessary #ifdef
 1.135  16-Apr-2010  rmind Remove mclpool_allocator, which is unnecessary since mb_map removal.
 1.134  08-Feb-2010  joerg branches: 1.134.2;
Handle rump like the direct mapping case.
 1.133  08-Feb-2010  joerg Remove separate mb_map. The nmbclusters is computed at boot time based
on the amount of physical memory and limited by NMBCLUSTERS if present.
Architectures without direct mapping also limit it based on the kmem_map
size, which is used as backing store. On i386 and ARM, the maximum KVA
used for mbuf clusters is limited to 64MB by default.

The old default limits and limits based on GATEWAY have been removed.
key_registered_sb_max is hard-wired to a value derived from 2048
clusters.
 1.132  05-Apr-2009  bouyer branches: 1.132.2;
m_split0(): If the newly allocated mbuf holds only the header,
don't forget to set m_len to 0. Otherwise whatever will compute the size
of this chain (including m_split() itself if called again on this chain)
will get it wrong, leading to various issues.

Bug exposed by the NFS server code with linux clients using TCP mounts.
 1.131  15-Mar-2009  cegger ansify function definitions
 1.130  16-Dec-2008  christos branches: 1.130.2;
replace bitmask_snprintf(9) with snprintb(3)
 1.129  07-Dec-2008  pooka Move some sysctl node creations away from linksets and into the
constructors for subsystems.

XXX: CTLFLAG_PERMANENT is non-sensible.
 1.128  02-Jul-2008  matt branches: 1.128.2; 1.128.4; 1.128.6;
Switch from KASSERT to CTASSERT for those asserts testing sizes of types.
 1.127  28-Apr-2008  martin branches: 1.127.2; 1.127.4;
Remove clause 3 and 4 from TNF licenses
 1.126  09-Apr-2008  thorpej branches: 1.126.2; 1.126.4;
Make the percpu API a little more friendly:
- percpu_getptr() is now called percpu_getref() and implicitly disables
preemption (via crit_enter()) when it is called.
- Added percpu_putref() which implicitly reenables preemption (via
crit_exit()).
 1.125  24-Mar-2008  yamt merge yamt-lazymbuf branch.
 1.124  17-Jan-2008  yamt branches: 1.124.6;
make some mbuf related statistics per-cpu.
 1.123  14-Nov-2007  yamt branches: 1.123.6;
m_print: avoid sign extension of m_flags.
 1.122  07-Nov-2007  ad Merge from vmlocking:

- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
 1.121  12-Mar-2007  ad branches: 1.121.12; 1.121.14; 1.121.18; 1.121.20;
Pass an ipl argument to pool_init/POOL_INIT to be used when initializing
the pool's lock.
 1.120  04-Mar-2007  yamt branches: 1.120.2;
fix a fallout from caddr_t changes.
 1.119  04-Mar-2007  christos Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.118  22-Feb-2007  thorpej TRUE -> true, FALSE -> false
 1.117  21-Feb-2007  thorpej Replace the Mach-derived boolean_t type with the C99 bool type. A
future commit will replace use of TRUE and FALSE with true and false.
 1.116  01-Nov-2006  yamt branches: 1.116.4;
remove some __unused from function parameters.
 1.115  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.114  10-Oct-2006  dogcow change the MOWNER_INIT define to take two args; fix extant struct mowner
decls to use it. Makes options MBUFTRACE compile again and not whinge about
missing structure declarations. (Also makes initialization consistent.)
 1.113  03-Sep-2006  christos branches: 1.113.2; 1.113.4;
use c99 initializers
 1.112  08-Aug-2006  pavel MCLAIM the correct mbuf. PR kern/34162.
 1.111  25-May-2006  yamt branches: 1.111.4;
move wait points for kva from upper layers to vm_map. PR/33185 #1.

XXX there is a concern about interaction with kva fragmentation.
see: http://mail-index.NetBSD.org/tech-kern/2006/05/11/0000.html
 1.110  15-Apr-2006  christos branches: 1.110.2;
Coverity CID 848: Protect against NULL deref.
 1.109  19-Mar-2006  yamt m_copyback0:
- unify two copies of code to extend a chain.
- when extending a chain,
- use trailing space of the last mbuf if any.
- use mbuf cluster if appropriate.
 1.108  18-Mar-2006  yamt m_print: fix the previous correctly.
 1.107  18-Mar-2006  chris Fix Coverity CID 1473: Static buffer overrun.

Add a counter for the number of pages, so that we print out the ext_pgs
values.
 1.106  15-Mar-2006  yamt branches: 1.106.2;
m_copyback0: add comments and assertions.
 1.105  24-Jan-2006  yamt branches: 1.105.2; 1.105.4; 1.105.6; 1.105.8;
add ddb "sh mbuf" command.
 1.104  26-Dec-2005  perry branches: 1.104.2;
u_intN_t -> uintN_t
 1.103  08-Dec-2005  thorpej Sprinkle static.
 1.102  09-Nov-2005  skrll Typo in comment.
 1.101  18-Aug-2005  yamt - introduce M_MOVE_PKTHDR and use it where appropriate.
intended to be mostly API compatible with openbsd/freebsd.
- remove a glue #define in netipsec/ipsec_osdep.h.
 1.100  06-Jun-2005  martin branches: 1.100.2;
Since we decided "const struct mbuf *" would not do the right thing (tm),
remove ~all const from mbuf pointers.
 1.99  06-Jun-2005  martin Constify the source arg of m_copydata
 1.98  02-Jun-2005  explorer restore NetBSD RCS tag in __KERNEL_RCSID() macro
 1.97  02-Jun-2005  tron Change first argument of m_copydata() back to "struct mbuf *" because
m_copydata() might eventually modify the "mbuf" structure to support
lazy mbuf mapping as pointed out by YAMAMOTO Takashi on "tech-net".
 1.96  02-Jun-2005  tron Add missing RCS id. Problem pointed out by Jukka Salmi.
 1.95  02-Jun-2005  tron Fix bad botch invented in last change.
 1.94  02-Jun-2005  tron Change the first argument of m_copydata() to "const struct mbuf *" (which
doesn't require any implementation changes). This will allow us to get
rid of a lot of nasty type casts.
 1.93  01-Apr-2005  yamt merge yamt-km branch.
- don't use managed mappings/backing objects for wired memory allocations.
save some resources like pv_entry. also fix (most of) PR/27030.
- simplify kernel memory management API.
- simplify pmap bootstrap of some ports.
- some related cleanups.
 1.92  24-Jan-2005  matt branches: 1.92.2; 1.92.6;
Add IFNET_FOREACH and IFADDR_FOREACH macros and start using them.
 1.91  23-Jan-2005  matt Change initialization of domains to use link sets. Switch to using STAILQ.
Add a convenience macro DOMAIN_FOREACH to iterate through the domain.
 1.90  20-Oct-2004  matt branches: 1.90.4;
Make panic messages print out what condition they thought was panic-worthy
instead of a 1 word message.
 1.89  05-Oct-2004  is Some code likes to mix MT_HEADER and MT_DATA. Revert this assertion until
the usage of MT_HEADER vs. MT_DATA is better defined and implemented.
 1.88  17-Sep-2004  enami Delete m_tag from a mbuf being non-pkthdr mbuf rather than newly becoming
pkthdr mbuf.
 1.87  11-Sep-2004  yamt m_split: restore a behaviour on M_PKTHDR, which was unintentionally
changed when i added m_copyback_cow.
 1.86  08-Sep-2004  yamt m_copyback, m_copyback_cow, m_copydata:
- caddr_t -> void *
- constify.
partly from openbsd.
 1.85  06-Sep-2004  yamt add m_copyback_cow and m_makewritable.
 1.84  21-Jul-2004  yamt m_copyback: add an assertion to detect write attempts to a read-only mbuf.
 1.83  24-Jun-2004  jonathan Rename MBUFTRACE helper function m_claim() to m_claimm(),
for consistency with M_FREE() and m_freem(). Affected files:

sys/mbuf.h
kern/uipc_socket2.c
kern/uipc_mbuf.c
net/if_ethersubr.c
netatalk/ddp_input.c
nfs/nfs_socket.c
 1.82  25-May-2004  atatat Remaining sysctl descriptions under kern subtree
 1.81  22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.80  24-Mar-2004  atatat branches: 1.80.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.79  23-Mar-2004  junyoung Nuke __P().
 1.78  09-Mar-2004  yamt m_cat: assert mbuf types only when coalescing them by copying.
mbuf n often have 0-sized "headers" and their types don't matter much.

PR/24713 from Darrin B. Jewell.
 1.77  26-Feb-2004  itojun m_cat() - if it is safe, copy data portion into 1st mbuf even if 1st mbuf
is M_EXT mbuf.
 1.76  21-Jan-2004  atatat Fix the kern.mbuf tunables.
 1.75  04-Dec-2003  atatat Dynamic sysctl.

Gone are the old kern_sysctl(), cpu_sysctl(), hw_sysctl(),
vfs_sysctl(), etc, routines, along with sysctl_int() et al. Now all
nodes are registered with the tree, and nodes can be added (or
removed) easily, and I/O to and from the tree is handled generically.

Since the nodes are registered with the tree, the mapping from name to
number (and back again) can now be discovered, instead of having to be
hard coded. Adding new nodes to the tree is likewise much simpler --
the new infrastructure handles almost all the work for simple types,
and just about anything else can be done with a small helper function.

All existing nodes are where they were before (numerically speaking),
so all existing consumers of sysctl information should notice no
difference.

PS - I'm sorry, but there's a distinct lack of documentation at the
moment. I'm working on sysctl(3/8/9) right now, and I promise to
watch out for buses.
 1.74  03-Oct-2003  itojun when dropping M_PKTHDR, need to free m_tag associated with it.
 1.73  07-Sep-2003  yamt assert mbuf chains m_cat'ed are of the same type.
 1.72  04-Sep-2003  itojun clarify comment on m_cat().
 1.71  15-Aug-2003  simonb Return NULL instead of 0 for functions that return pointers.
Sprinkle some KNF whitespace.
 1.70  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.69  23-Jun-2003  martin branches: 1.69.2;
Make sure to include opt_foo.h if a defflag option FOO is used.
 1.68  27-May-2003  simonb Fix tyop in a comment.
 1.67  18-Apr-2003  simonb Add a KASSERT to make sure that "sizeof(struct mbuf)" is MSIZE.
Extra insurance for Steve Woodford's recent <sys/mbuf.h> patch.
 1.66  12-Apr-2003  thorpej Add two new mbuf routines:
* m_apply(), which applies a function to each mbuf in chain
starting at a specified offset, for a specified length.
* m_getptr(), which returns a pointer to the mbuf, as well as
the offset into that mbuf, corresponding to an offset from
the beginning of an mbuf chain.

From OpenBSD, cleaned up slightly by me.
 1.65  09-Apr-2003  thorpej * Use a pool_cache constructor to record the physical address of mbufs
in the mbuf header.
* Use the new cached paddr feature of the pool_cache API to record
the physical address of mbuf clusters. (We cannot use a ctor for
clusters, since clusters have no constructed form; they are merely
buffers).

Bus_dma back-ends may use the cached physical addresses to save having to
extract the physical address from virtual.

* Provide space in m_ext recording the vm_page *'s for an SOSEND_LOAN_CHUNK-
sized non-cluster external buffer. Use this in the sosend_loan code to
save having to extract the physical address from virtual and then look
up the vm_page *'s.

* Provide an indication that an external buffer is mapped read-only at
the MMU. Set this flag for the external buffer in the sosend_loan
case, since loaned pages are always mapped read-only. Bus_dma back-ends
may use this information to save cache flushing, since a cache flush of
a read-only mapping is redundant on some architectures (the cache would
have already been flushed when making the mapping read-only).

Part 2 in a series of simple patches contributed by Wasabi Systems
to improve network performance.
 1.64  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.63  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.62  31-Jan-2003  thorpej ANSI'ify.
 1.61  25-Sep-2002  thorpej Don't include <sys/map.h>.
 1.60  30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
 1.59  09-Mar-2002  thorpej branches: 1.59.6;
Make mbpool and mclpool use the new drain hook facility. Adjust
m_reclaim() to match the drain hook signature. This allows us to
delete m_retry() and m_retryhdr(), as the pool allocator will now
perform the reclamation step for us.

From art@openbsd.org.
 1.58  08-Mar-2002  thorpej Pool deals fairly well with physical memory shortage, but it doesn't
deal with shortages of the VM maps where the backing pages are mapped
(usually kmem_map). Try to deal with this:

* Group all information about the backend allocator for a pool in a
separate structure. The pool references this structure, rather than
the individual fields.
* Change the pool_init() API accordingly, and adjust all callers.
* Link all pools using the same backend allocator on a list.
* The backend allocator is responsible for waiting for physical memory
to become available, but will still fail if it cannot allocate KVA
space for the pages. If this happens, carefully drain all pools using
the same backend allocator, so that some KVA space can be freed.
* Change pool_reclaim() to indicate if it actually succeeded in freeing
some pages, and use that information to make draining easier and more
efficient.
* Get rid of PR_URGENT. There was only one use of it, and it could be
dealt with by the caller.

From art@openbsd.org.
 1.57  12-Feb-2002  thorpej const char *mclpool_warnmsg -> const char mclpool_warnmsg[]

Noted by Matt Thomas.
 1.56  12-Nov-2001  lukem add RCSIDs
 1.55  29-Oct-2001  simonb Don't need to include <uvm/uvm_extern.h> just to include <sys/sysctl.h>
anymore.
 1.54  15-Sep-2001  chs branches: 1.54.2;
a whole bunch of changes to improve performance and robustness under load:

- remove special treatment of pager_map mappings in pmaps. this is
required now, since I've removed the globals that expose the address range.
pager_map now uses pmap_kenter_pa() instead of pmap_enter(), so there's
no longer any need to special-case it.
- eliminate struct uvm_vnode by moving its fields into struct vnode.
- rewrite the pageout path. the pager is now responsible for handling the
high-level requests instead of only getting control after a bunch of work
has already been done on its behalf. this will allow us to UBCify LFS,
which needs tighter control over its pages than other filesystems do.
writing a page to disk no longer requires making it read-only, which
allows us to write wired pages without causing all kinds of havoc.
- use a new PG_PAGEOUT flag to indicate that a page should be freed
on behalf of the pagedaemon when it's unlocked. this flag is very similar
to PG_RELEASED, but unlike PG_RELEASED, PG_PAGEOUT can be cleared if the
pageout fails due to eg. an indirect-block buffer being locked.
this allows us to remove the "version" field from struct vm_page,
and together with shrinking "loan_count" from 32 bits to 16,
struct vm_page is now 4 bytes smaller.
- no longer use PG_RELEASED for swap-backed pages. if the page is busy
because it's being paged out, we can't release the swap slot to be
reallocated until that write is complete, but unlike with vnodes we
don't keep a count of in-progress writes so there's no good way to
know when the write is done. instead, when we need to free a busy
swap-backed page, just sleep until we can get it busy ourselves.
- implement a fast-path for extending writes which allows us to avoid
zeroing new pages. this substantially reduces cpu usage.
- encapsulate the data used by the genfs code in a struct genfs_node,
which must be the first element of the filesystem-specific vnode data
for filesystems which use genfs_{get,put}pages().
- eliminate many of the UVM pagerops, since they aren't needed anymore
now that the pager "put" operation is a higher-level operation.
- enhance the genfs code to allow NFS to use the genfs_{get,put}pages
instead of a modified copy.
- clean up struct vnode by removing all the fields that used to be used by
the vfs_cluster.c code (which we don't use anymore with UBC).
- remove kmem_object and mb_object since they were useless.
instead of allocating pages to these objects, we now just allocate
pages with no object. such pages are mapped in the kernel until they
are freed, so we can use the mapping to find the page to free it.
this allows us to remove splvm() protection in several places.

The sum of all these changes improves write throughput on my
decstation 5000/200 to within 1% of the rate of NetBSD 1.5
and reduces the elapsed time for "make release" of a NetBSD 1.5
source tree on my 128MB pc to 10% less than a 1.5 kernel took.
 1.53  26-Jul-2001  thorpej branches: 1.53.2;
Use pool_cache_*() for mbufs and clusters. While we don't use the
ctor/dtor feature, it's still faster to allocate from the cache groups
than it is from the pool (cache groups are analogous to "magazines"
in the Solaris SLAB allocator).
 1.52  14-Jan-2001  thorpej branches: 1.52.2; 1.52.4;
Change some low-hanging splimp() calls to splvm().
 1.51  14-Nov-2000  itojun make sure every m_aux will be freed.
there are direct use of MFREE() from sys/kern.
(we experienced no memory leak so far, but if we use m_aux for other purposes,
we will need this change)
 1.50  18-Aug-2000  itojun repair m_dup(). specifically, now it is safe against non-MCLBYTES cluster
mbuf. no one seems to be using this function at this moment.
 1.49  18-Aug-2000  itojun disable m_dup(), as it makes false assumption on cluster mbuf and unsafe
(does not do the right thing).
 1.48  18-Aug-2000  itojun add a comment about false assumption made by m_dup()
 1.47  27-Jun-2000  mrg remove include of <vm/vm.h>
 1.46  26-Jun-2000  mrg remove/move more mach vm header files:

<vm/pglist.h> -> <uvm/uvm_pglist.h>
<vm/vm_inherit.h> -> <uvm/uvm_inherit.h>
<vm/vm_kern.h> -> into <uvm/uvm_extern.h>
<vm/vm_object.h> -> nothing
<vm/vm_pager.h> -> into <uvm/uvm_pager.h>

also includes a bunch of <vm/vm_page.h> include removals (due to redundancy
with <vm/vm.h>), and a scattering of other similar headers.
 1.45  01-Mar-2000  itojun branches: 1.45.4;
introduce m->m_pkthdr.aux to hold random data which needs to be passed
between protocol handlers.

ipsec socket pointers, ipsec decryption/auth information, tunnel
decapsulation information are in my mind - there can be several other usage.
at this moment, we use this for ipsec socket pointer passing. this will
avoid reuse of m->m_pkthdr.rcvif in ipsec code.

due to the change, MHLEN will be decreased by sizeof(void *) - for example,
for i386, MHLEN was 100 bytes, but is now 96 bytes.
we may want to increase MSIZE from 128 to 256 for some of our architectures.

take caution if you use it for keeping some data item for long period
of time - use extra caution on M_PREPEND() or m_adj(), as they may result
in loss of m->m_pkthdr.aux pointer (and mbuf leak).

this will bump kernel version.

(as discussed in tech-net, tested in kame tree)
 1.44  27-Oct-1999  itojun add mbuf deep-copy function, m_dup().
NOTE: if you use m_dup(), your additional kernel code can become
incompatible with 4.xBSD or other *BSD.
 1.43  05-Aug-1999  thorpej branches: 1.43.2; 1.43.4; 1.43.6;
Add some more diagnostic information to the 3 different `panic("m_copym")'
calls.
 1.42  26-Apr-1999  thorpej More improvements to mbuf and mbuf cluster allocation:

- Initialize mbpool and mclpool with msize and mclbytes, respectively,
so that those values may be patched and have an actual effect on the
next system reboot.

- Set low water marks on mbpool (default: 16) and mclpool (default: 8).
This should be of great help for diskless systems, which need to allocate
mbufs in order to clean dirty pages; the low water marks increase the
chances of this being possible to do in memory starvation situations.

- Add support for getting/setting some mbuf-related parameters via sysctl.
* msize and mclsize (read-only)
* nmbclusters (read-only unless the platform has direct-mapped pool pages,
in which case the value can be increased).
* mblowat and mcllowat (read/write)
 1.41  25-Apr-1999  simonb Use the nmbclusters variable and not the NMBCLUSTERS constant when setting
the mclpool hardlimit.
 1.40  01-Apr-1999  thorpej branches: 1.40.4;
mbinit() can now allocate memory. Update a comment accordingly.
 1.39  31-Mar-1999  thorpej Set a hard limit (rather than an advisory high water mark for pages) of
NMBCLUSTERS for the mbuf cluster pool. On platforms which use direct-mapped
segments for pool pages (MIPS and Alpha), this makes NMBCLUSTERS actually
meaningful (such ports don't even allocate mb_map, as it is not used to
map mbuf cluster pages).

Improve the message logged at a maximum rate of once per second. The
new message: "WARNING: mclpool limit reached; increase NMBCLUSTERS".

In the back-end pool page allocator, remove the message about mb_map
being full. The message was not necessarily correct as the allocator
may have been starved for pages, rather than for space in the map. Also,
the hard limit on the mbuf cluster pool will be reached before the map
fills (the last cluster will always fit into the map), so the message
is redundant.

Add a comment in mbinit() about considering setting low water marks on
the mbuf and mbuf cluster pools.
 1.38  24-Mar-1999  mrg completely remove Mach VM support. all that is left is all the
header files as UVM still uses (most of) these.
 1.37  23-Mar-1999  thorpej Set the high water mark on the mbuf cluster pool to NMBCLUSTERS.
 1.36  22-Mar-1999  thorpej Put back the code to log `mb_map full' that was lost when mbuf clusters
were converted to use the pool allocator.
 1.35  09-Jan-1999  thorpej Garbage-collect `mbutl'.
 1.34  09-Jan-1999  thorpej Garbage-collect `union mcluster' and `mclfree'.
 1.33  18-Dec-1998  thorpej Reverse the stopgap change made in revision 1.29:

date: 1998/08/01 01:47:24; author: thorpej; state: Exp; lines: +18 -8
Don't call the protocol drain routines if how == M_NOWAIT, which typically
means we're in interrupt context. Since we can be called from a network
hardware interrupt, we could corrupt the protocol queues if we try to drain
them at that time.

The problem has been addressed by letting the drain'able protocols use
a locking scheme to prevent queue corruption.
 1.32  28-Aug-1998  thorpej branches: 1.32.4;
Add a waitok boolean argument to the VM system's pool page allocator backend.
 1.31  13-Aug-1998  thorpej Oops, this got missed in the vm_offset_t -> vaddr_t change.
 1.30  04-Aug-1998  perry Abolition of bcopy, ovbcopy, bcmp, and bzero, phase one.
bcopy(x, y, z) -> memcpy(y, x, z)
ovbcopy(x, y, z) -> memmove(y, x, z)
bcmp(x, y, z) -> memcmp(x, y, z)
bzero(x, y) -> memset(x, 0, y)
 1.29  01-Aug-1998  thorpej Don't call the protocol drain routines if how == M_NOWAIT, which typically
means we're in interrupt context. Since we can be called from a network
hardware interrupt, we could corrupt the protocol queues if we try to drain
them at that time.
 1.28  01-Aug-1998  thorpej Use the pool allocator for mbufs and mbuf clusters (two pools, one for
each). Partially from pk@netbsd.org.
 1.27  22-May-1998  matt branches: 1.27.2;
Add an if_drain to the ifnet structure (call when the system is low
on mbufs). Add code to m_reclaim to call if_drain in each ifnet
that has one set. Remove register from declarations.
 1.26  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.25  12-Feb-1998  kleink Fix variable declarations: register -> register int.
 1.24  10-Feb-1998  mrg - add defopt's for UVM, UVMHIST and PMAP_NEW.
- remove unnecessary UVMHIST_DECL's.
 1.23  05-Feb-1998  mrg initial import of the new virtual memory system, UVM, into -current.

UVM was written by chuck cranor <chuck@maria.wustl.edu>, with some
minor portions derived from the old Mach code. i provided some help
getting swap and paging working, and other bug fixes/ideas. chuck
silvers <chuq@chuq.com> also provided some other fixes.

this is the rest of the MI portion changes.

this will be KNF'd shortly. :-)
 1.22  20-Nov-1997  thorpej In m_split(), restore m_pkthdr.len if an error occurs. From Koji Imada,
PR #3986.
 1.21  06-Jun-1997  pk branches: 1.21.8;
Get `canwait' argument to kmem_malloc() right.
 1.20  28-Apr-1997  mycroft Oops; forgot to GC the last mbuf allocated when out of clusters.
 1.19  24-Apr-1997  mycroft If we fail to allocate a cluster to hold a large packet, simply
drop it rather than using a chain of tiny mbufs.
 1.18  27-Mar-1997  thorpej Update and enhancement to the mbuf code, to support use of non-cluster
external storage. Highlights:

- additional "void *" argument to (*ext_free)(), an opaque
cookie for use by the free function.
- MCLALLOC() and MCLFREE() calls are gone. They are replaced
by MEXTADD() (add external storage to mbuf), MEXTMALLOC()
(malloc() external storage and attach to mbuf), and
MEXTREMOVE() (remove external storage from mbuf).
- completely new external storage reference counting
mechanism; mclrefcnt[] is gone.

These changes will eventually be used to pass driver DMA buffers up
the network stack, and reduce/eliminate copies in certain code paths
(e.g. NFS writes).

From Matt Thomas <matt@3am-software.com> and myself <thorpej@nas.nasa.gov>,
with some input from Chris Demetriou <cgd@cs.cmu.edu> and review by
Charles Hannum <mycroft@mit.edu>.
 1.17  18-Dec-1996  gwr Move `static' to the beginning of the storage class specifiers.
 1.16  13-Jun-1996  cgd if kmem_malloc() fails while trying to allocate an mbuf cluster, try
and free some space by calling m_reclaim(). Also, log the "mb_map full"
error message (at most) every 60-seconds. The old code would log it
once over the lifetime of the system, but that's not a useful diagnostic.
(More useful is the new behaviour, which roughly indicates how often
periods of heavy load occur, without spamming the console and system
logs with messages.)
 1.15  09-Feb-1996  christos branches: 1.15.4;
More proto fixes
 1.14  04-Feb-1996  christos First pass at prototyping
 1.13  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.12  28-Sep-1994  deraadt don't play with CLBYTES in cpp
 1.11  19-Sep-1994  mycroft m_adj() returns void.
 1.10  29-Jun-1994  cgd branches: 1.10.2;
New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8  14-Apr-1994  deraadt the packet header is at the start of the mbuf chain, not the end.
 1.7  08-Jan-1994  mycroft #include vm_kern.h.
 1.6  18-Dec-1993  mycroft Canonicalize all #includes.
 1.5  22-Oct-1993  cgd slightly clean up ws's original patch to this file for the sense
of wait vs. nowait. this patch from torek.
 1.4  04-Sep-1993  jtc branches: 1.4.2;
Include systm.h to get prototypes (and possibly inlines) of *max functions.
Change mbinit() to match prototype.
 1.3  20-May-1993  cgd add $Id$ strings, and clean up file headers where necessary
 1.2  21-Mar-1993  cgd after 0.2.2 "stable" patches applied
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.4.2.3  14-Nov-1993  mycroft Canonicalize all #includes.
 1.4.2.2  26-Oct-1993  mycroft Merge changes from trunk.
 1.4.2.1  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
init_main.c: New method of pseudo-device initialization.
kern_clock.c: hardclock() and softclock() now take a pointer to a clockframe.
softclock() only does callouts.
kern_synch.c: Remove spurious declaration of endtsleep(). Adjust uses of
averunnable for new struct loadav.
subr_prf.c: Allow printf() formats in panic().
tty.c: averunnable changes.
vfs_subr.c: va_size and va_bytes are now quads.
 1.10.2.1  06-Oct-1994  mycroft Update from trunk.
 1.15.4.1  13-Jun-1996  cgd pull up from trunk:
>if kmem_malloc() fails while trying to allocate an mbuf cluster, try
>and free some space by calling m_reclaim(). Also, log the "mb_map full"
>error message (at most) every 60-seconds. The old code would log it
>once over the lifetime of the system, but that's not a useful diagnostic.
>(More useful is the new behaviour, which roughly indicates how often
>periods of heavy load occur, without spamming the console and system
>logs with messages.)
 1.21.8.1  20-Nov-1997  thorpej Pull up from trunk: restore m_pkthdr.len in m_split() on error.
 1.27.2.1  08-Aug-1998  eeh Revert cdevsw mmap routines to return int.
 1.32.4.1  11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.40.4.1  21-Jun-1999  thorpej Sync w/ -current.
 1.43.6.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.43.4.1  15-Nov-1999  fvdl Sync with -current
 1.43.2.3  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.43.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.43.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.45.4.2  04-Feb-2001  he Pull up revision 1.51 (requested by itojun):
Make sure every m_aux will be freed.
 1.45.4.1  19-Aug-2000  itojun pullup 1.48 -> 1.50 (approved by releng-1-5)
repair m_dup(). specifically, now it is safe against non-MCLBYTES external
mbuf. no one seems to be using this function at this moment.
 1.52.4.5  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.52.4.4  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.52.4.3  16-Mar-2002  jdolecek Catch up with -current.
 1.52.4.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.52.4.1  03-Aug-2001  lukem update to -current
 1.52.2.7  18-Oct-2002  nathanw Catch up to -current.
 1.52.2.6  01-Aug-2002  nathanw Catch up to -current.
 1.52.2.5  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.52.2.4  28-Feb-2002  nathanw Catch up to -current.
 1.52.2.3  14-Nov-2001  nathanw Catch up to -current.
 1.52.2.2  21-Sep-2001  nathanw Catch up to -current.
 1.52.2.1  24-Aug-2001  nathanw Catch up with -current.
 1.53.2.1  01-Oct-2001  fvdl Catch up with -current.
 1.54.2.1  12-Nov-2001  thorpej Sync the thorpej-mips-cache branch with -current.
 1.59.6.1  15-Jul-2002  gehenna catch up with -current.
 1.69.2.10  11-Dec-2005  christos Sync with head.
 1.69.2.9  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.69.2.8  01-Apr-2005  skrll Sync with HEAD.
 1.69.2.7  04-Feb-2005  skrll Sync with HEAD.
 1.69.2.6  24-Jan-2005  skrll Sync with HEAD.
 1.69.2.5  02-Nov-2004  skrll Sync with HEAD.
 1.69.2.4  19-Oct-2004  skrll Sync with HEAD
 1.69.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.69.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.69.2.1  03-Aug-2004  skrll Sync with HEAD
 1.80.2.5  08-Oct-2004  jmc Pullup rev 1.89 (requested by is in ticket #895)

Some code likes to mix MT_HEADER and MT_DATA. Revert this assertion until
the usage of MT_HEADER vs. MT_DATA is better defined and implemented.
 1.80.2.4  11-Sep-2004  he Pull up revision 1.87 (requested by yamt in ticket #841):
Restore behaviour of m_split() on M_PKTHDR which was
unintentionally changed when m_copyback_cow() was added.
 1.80.2.3  11-Sep-2004  he Pull up revisions 1.84-1.85 (requested by yamt in ticket #831):
Add an assertion to detect write to a read-only mbuf.
Add m_copyback_cow and m_makewritable.
 1.80.2.2  14-Jul-2004  tron Pull up revision 1.83 (requested by jonathan in ticket #648):
Rename MBUFTRACE helper function m_claim() to m_claimm(),
for consistency with M_FREE() and m_freem(). Affected files:
sys/mbuf.h
kern/uipc_socket2.c
kern/uipc_mbuf.c
net/if_ethersubr.c
netatalk/ddp_input.c
nfs/nfs_socket.c
 1.80.2.1  26-May-2004  he Pull up revision 1.82 (requested by atatat in ticket #388):
Add remaining sysctl descriptions under kern subtree.
 1.90.4.1  29-Apr-2005  kent sync with -current
 1.92.6.3  08-Sep-2006  ghen Pull up following revision(s) (requested by pavel in ticket #1503):
sys/kern/uipc_mbuf.c: revision 1.112
MCLAIM the correct mbuf. PR kern/34162.
 1.92.6.2  09-Jun-2005  snj Pull up revision 1.98 (requested by tron in ticket #387):
restore NetBSD RCS tag in __KERNEL_RCSID() macro
 1.92.6.1  09-Jun-2005  snj Pull up revision 1.96 (requested by tron in ticket #387):
Add missing RCS id. Problem pointed out by Jukka Salmi.
 1.92.2.1  25-Jan-2005  yamt convert to new apis.
 1.100.2.26  27-Feb-2008  yamt remove mbuf ext_lock which is no longer used.
 1.100.2.25  27-Feb-2008  yamt drop lazy mapping of mbuf external storage for now, because:
- it's currently broken wrt asm code. (cpu_in_cksum)
- there are other approaches worth considering. eg. sf_buf
 1.100.2.24  14-Feb-2008  yamt m_ext_free: optimize the common case.
 1.100.2.23  11-Feb-2008  yamt m_ext_free: don't use atomic op where unnecessary.
 1.100.2.22  05-Feb-2008  yamt use mutex_spin_enter.
 1.100.2.21  21-Jan-2008  yamt sync with head
 1.100.2.20  07-Dec-2007  yamt use atomic ops unconditionally.
 1.100.2.19  15-Nov-2007  yamt mcl_inc_reference, mcl_dec_and_test_reference: use atomic ops if x86.
 1.100.2.18  15-Nov-2007  yamt update a comment
 1.100.2.17  15-Nov-2007  yamt mbpool_cache -> mb_cache
 1.100.2.16  15-Nov-2007  yamt sync with head.
 1.100.2.15  27-Oct-2007  yamt make ext_lock kmutex_t.
 1.100.2.14  03-Sep-2007  yamt kill caddr_t.
 1.100.2.13  03-Sep-2007  yamt sync with head.
 1.100.2.12  26-Feb-2007  yamt sync with head.
 1.100.2.11  30-Dec-2006  yamt sync with head.
 1.100.2.10  07-Jul-2006  yamt - fix typos and compilation problems in uipc_mbuf.c rev.1.100.2.8.
- m_ext_free: fix the recursive call case.
- change return value of mcl_dec_and_test_reference.
- tweak assertions.
 1.100.2.9  07-Jul-2006  yamt m_print: print raw ext_refcnt rather than MCLISREFERENCED.
 1.100.2.8  06-Jul-2006  yamt tweak code so that it can be switched to atomic operations later easily.
 1.100.2.7  06-Jul-2006  yamt - move some macros from mbuf.h to uipc_mbuf.c.
- remove unused MCLBUFREF.
 1.100.2.6  21-Jun-2006  yamt sync with head.
 1.100.2.5  15-Jul-2005  yamt m_mapin: fix an spl botch.
 1.100.2.4  07-Jul-2005  yamt defer mapping only when defined(__HAVE_LAZY_MBUF).
 1.100.2.3  07-Jul-2005  yamt sosend_loan: defer mapping of mbuf external data pages.
mtod: map mbuf external data pages if needed.
 1.100.2.2  07-Jul-2005  yamt de-inline m_ext_free.
 1.100.2.1  07-Jul-2005  yamt adapt to mbuf.h changes.
 1.104.2.1  01-Feb-2006  yamt sync with head.
 1.105.8.1  19-Apr-2006  elad sync with head.
 1.105.6.5  14-Sep-2006  yamt sync with head.
 1.105.6.4  11-Aug-2006  yamt sync with head
 1.105.6.3  26-Jun-2006  yamt sync with head.
 1.105.6.2  24-May-2006  yamt sync with head.
 1.105.6.1  01-Apr-2006  yamt sync with head.
 1.105.4.2  01-Jun-2006  kardel Sync with head.
 1.105.4.1  22-Apr-2006  simonb Sync with head.
 1.105.2.1  09-Sep-2006  rpaulo sync with head
 1.106.2.2  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.106.2.1  28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.110.2.1  19-Jun-2006  chap Sync with head.
 1.111.4.1  08-Sep-2006  rpaulo Pull up following revision(s) (requested by pavel in ticket #135):
sys/kern/uipc_mbuf.c: revision 1.112
MCLAIM the correct mbuf. PR kern/34162.
 1.113.4.2  10-Dec-2006  yamt sync with head.
 1.113.4.1  22-Oct-2006  yamt sync with head
 1.113.2.1  18-Nov-2006  ad Sync with head.
 1.116.4.3  24-Mar-2007  yamt sync with head.
 1.116.4.2  12-Mar-2007  rmind Sync with HEAD.
 1.116.4.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.120.2.3  01-Nov-2007  ad m_reclaim: acquire kernel_lock as this can be called from the pagedaemon.
 1.120.2.2  01-Sep-2007  ad Update for pool_cache API changes.
 1.120.2.1  13-Mar-2007  ad Sync with head.
 1.121.20.2  18-Feb-2008  mjf Sync with HEAD.
 1.121.20.1  19-Nov-2007  mjf Sync with HEAD.
 1.121.18.2  18-Nov-2007  bouyer Sync with HEAD
 1.121.18.1  13-Nov-2007  bouyer Sync with HEAD
 1.121.14.3  23-Mar-2008  matt sync with HEAD
 1.121.14.2  09-Jan-2008  matt sync with HEAD
 1.121.14.1  08-Nov-2007  matt sync with -HEAD
 1.121.12.2  14-Nov-2007  joerg Sync with HEAD.
 1.121.12.1  11-Nov-2007  joerg Sync with HEAD.
 1.123.6.1  19-Jan-2008  bouyer Sync with HEAD
 1.124.6.4  17-Jan-2009  mjf Sync with HEAD.
 1.124.6.3  02-Jul-2008  mjf Sync with HEAD.
 1.124.6.2  02-Jun-2008  mjf Sync with HEAD.
 1.124.6.1  03-Apr-2008  mjf Sync with HEAD.
 1.126.4.4  11-Aug-2010  yamt sync with head.
 1.126.4.3  11-Mar-2010  yamt sync with head
 1.126.4.2  04-May-2009  yamt sync with head.
 1.126.4.1  16-May-2008  yamt sync with head.
 1.126.2.1  18-May-2008  yamt sync with head.
 1.127.4.1  03-Jul-2008  simonb Sync with head.
 1.127.2.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.128.6.1  07-Apr-2009  snj Pull up following revision(s) (requested by bouyer in ticket #674):
sys/kern/uipc_mbuf.c: revision 1.132
m_split0(): If the newly allocated mbuf holds only the header,
don't forget to set m_len to 0. Otherwise whatever will compute the size
of this chain (including m_split() itself if called again on this chain)
will get it wrong, leading to various issues.
Bug exposed by the NFS server code with linux clients using TCP mounts.
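
The effect of a stale m_len on a header-only mbuf can be illustrated with a toy model. This is a minimal sketch, not the kernel code: struct toy_mbuf, toy_chain_len() and toy_prepend_hdr() are hypothetical stand-ins for struct mbuf, for any routine that sums m_len over a chain, and for the step in which a header-only mbuf is linked in front of existing data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mbuf (not the kernel type). */
struct toy_mbuf {
	struct toy_mbuf *m_next;	/* next mbuf in the chain */
	int m_len;			/* bytes of data held by this mbuf */
};

/* Sum m_len over the chain, as any chain-length computation would. */
static int
toy_chain_len(const struct toy_mbuf *m)
{
	int len = 0;

	for (; m != NULL; m = m->m_next)
		len += m->m_len;
	return len;
}

/*
 * Link a header-only mbuf in front of an existing chain.
 * 'stale_len' models whatever m_len held if it was never reset;
 * passing 0 models the fixed behaviour.
 */
static void
toy_prepend_hdr(struct toy_mbuf *hdr, struct toy_mbuf *rest, int stale_len)
{
	hdr->m_next = rest;
	hdr->m_len = stale_len;
}
```

With a 100-byte data mbuf behind the header, resetting m_len to 0 gives a chain length of 100, while a leftover value of 40 would make the same chain appear to hold 140 bytes.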
 1.128.4.2  28-Apr-2009  skrll Sync with HEAD.
 1.128.4.1  19-Jan-2009  skrll Sync with HEAD.
 1.128.2.1  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.130.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.132.2.3  06-Nov-2010  uebayasi Sync with HEAD.
 1.132.2.2  17-Aug-2010  uebayasi Sync with HEAD.
 1.132.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.134.2.3  31-May-2011  rmind sync with head
 1.134.2.2  05-Mar-2011  rmind sync with head
 1.134.2.1  30-May-2010  rmind sync with head
 1.138.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.143.6.2  29-Apr-2012  mrg sync to latest -current.
 1.143.6.1  18-Feb-2012  mrg merge to -current.
 1.143.2.5  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.143.2.4  23-Jan-2013  yamt sync with head
 1.143.2.3  30-Oct-2012  yamt sync with head
 1.143.2.2  23-May-2012  yamt sync with head.
 1.143.2.1  17-Apr-2012  yamt sync with head
 1.145.6.1  03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1547):

sys/kern/uipc_mbuf.c: revision 1.211 (via patch)

Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
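
The property described above (the caller's pointer to the first mbuf stays valid) can be sketched with a toy chain. This is an illustration under stated assumptions, not m_defrag itself: struct dnode, TOY_DATA_MAX, toy_node() and toy_defrag_tail() are hypothetical names, the tail is merged into a single heap-allocated node, and the real routine's allocation strategy is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DATA_MAX 64		/* hypothetical per-node capacity */

/* Hypothetical, simplified chain node (not struct mbuf). */
struct dnode {
	struct dnode *next;
	size_t len;
	unsigned char data[TOY_DATA_MAX];
};

/* Allocate a node holding the bytes of 's'; NULL if it does not fit. */
static struct dnode *
toy_node(const char *s)
{
	size_t len = strlen(s);
	struct dnode *n;

	if (len > TOY_DATA_MAX)
		return NULL;
	n = calloc(1, sizeof(*n));
	if (n == NULL)
		return NULL;
	n->len = len;
	memcpy(n->data, s, len);
	return n;
}

/*
 * Merge every node after 'head' into one new node. 'head' itself is
 * never freed or replaced, so a queue that stores a pointer to it
 * remains valid. Returns 0 on success, -1 on failure (chain untouched).
 * Nodes after 'head' must have been allocated with toy_node().
 */
static int
toy_defrag_tail(struct dnode *head)
{
	struct dnode *n, *next, *merged;
	size_t total = 0;

	if (head->next == NULL || head->next->next == NULL)
		return 0;	/* zero or one tail node: nothing to merge */
	for (n = head->next; n != NULL; n = n->next)
		total += n->len;
	if (total > TOY_DATA_MAX)
		return -1;
	merged = calloc(1, sizeof(*merged));
	if (merged == NULL)
		return -1;
	for (n = head->next; n != NULL; n = next) {
		next = n->next;
		memcpy(merged->data + merged->len, n->data, n->len);
		merged->len += n->len;
		free(n);
	}
	head->next = merged;
	return 0;
}
```

A caller that keeps only a pointer to the head, as a driver queue would, sees the same address before and after the call; only the nodes behind it change.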
 1.145.2.2  03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1547):

sys/kern/uipc_mbuf.c: revision 1.211 (via patch)

Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
 1.145.2.1  08-Feb-2013  riz branches: 1.145.2.1.2;
Pull up following revision(s) (requested by rmind in ticket #777):
usr.sbin/npf/npfctl/npfctl.c: revision 1.27
sys/net/npf/npf_session.c: revision 1.19
usr.sbin/npf/npftest/libnpftest/npf_mbuf_subr.c: revision 1.4
sys/net/npf/npf_rproc.c: revision 1.5
usr.sbin/npf/npftest/README: revision 1.3
sys/sys/mbuf.h: revision 1.151
sys/net/npf/npf_ruleset.c: revision 1.15
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.3
sys/net/npf/npf_ruleset.c: revision 1.16
usr.sbin/npf/npftest/libnpftest/npf_state_test.c: revision 1.4
usr.sbin/npf/npftest/libnpftest/npf_nbuf_test.c: revision 1.4
sys/net/npf/npf_inet.c: revision 1.19
sys/net/npf/npf_instr.c: revision 1.15
sys/net/npf/npf_handler.c: revision 1.24
sys/net/npf/npf_handler.c: revision 1.25
sys/net/npf/npf_state_tcp.c: revision 1.12
sys/net/npf/npf_processor.c: revision 1.13
sys/net/npf/npf_impl.h: revision 1.25
sys/net/npf/npf_processor.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.10
sys/net/npf/npf_alg_icmp.c: revision 1.14
sys/net/npf/npf_mbuf.c: revision 1.9
usr.sbin/npf/npftest/libnpftest/npf_nat_test.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_rule_test.c: revision 1.3
sys/net/npf/npf_session.c: revision 1.20
sys/net/npf/npf_alg.c: revision 1.6
sys/kern/uipc_mbuf.c: revision 1.148
sys/net/npf/npf_inet.c: revision 1.20
sys/net/npf/npf.h: revision 1.25
sys/net/npf/npf_nat.c: revision 1.18
sys/net/npf/npf_state.c: revision 1.13
sys/net/npf/npf_sendpkt.c: revision 1.13
sys/net/npf/npf_ext_log.c: revision 1.2
usr.sbin/npf/npftest/libnpftest/npf_processor_test.c: revision 1.4
sys/net/npf/npf_ext_normalise.c: revision 1.2
- Rework NPF's nbuf interface: use advancing and ensuring as a main method.
Eliminate unnecessary copy and simplify. Adapt regression tests.
- Simplify ICMP ALG a little. While here, handle ICMP ECHO for traceroute.
- Minor fixes, misc cleanup.
Silence gcc in npf_recache().
Add m_ensure_contig() routine, which is equivalent to m_pullup, but does not
destroy the mbuf chain on failure (it is kept valid).
- nbuf_ensure_contig: rework to use m_ensure_contig(9), which will not free
the mbuf chain on failure. Fixes some corner cases. Improve regression
test and sprinkle some asserts.
- npf_reassembly: clear nbuf on IPv6 reassembly failure path (partial fix).
The problem was found and fix provided by Anthony Mallet.
 1.145.2.1.2.1  03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1547):

sys/kern/uipc_mbuf.c: revision 1.211 (via patch)

Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
 1.146.2.5  03-Dec-2017  jdolecek update from HEAD
 1.146.2.4  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.146.2.3  23-Jun-2013  tls resync from head
 1.146.2.2  25-Feb-2013  tls resync with head
 1.146.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.151.2.1  18-May-2014  rmind sync with head
 1.158.4.5  22-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1606):

sys/kern/uipc_mbuf.c: revision 1.214

Revert my rev1.190, remove the M_READONLY check. The initial code was
correct: what is read-only is the mbuf storage, not the mbuf itself. The
storage contains the packet payload, and never has anything related to
mbufs. So it is fine to remove M_PKTHDR on mbufs that have a read-only
storage.

In fact it was kind of obvious, since several places already manually
remove M_PKTHDR without taking care of the external storage.
 1.158.4.4  03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1602):

sys/kern/uipc_mbuf.c: revision 1.211 (via patch)

Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
 1.158.4.3  17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1598):

sys/kern/uipc_mbuf.c: revision 1.190

If the mbuf is shared leave M_PKTHDR in place. Given where this function
is called from that's not supposed to happen, but I'm growing unconfident
about our mbuf code.
 1.158.4.2  05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
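
The failure mode described above, a recorded packet length larger than the data actually present in the chain, can be shown with a toy model. This is a sketch under assumptions, not the kernel's m_copydata: struct pkt_mbuf and pkt_copydata_checked() are hypothetical, and where the log reports a crash the sketch returns -1 instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct mbuf (not the kernel type). */
struct pkt_mbuf {
	struct pkt_mbuf *m_next;	/* next mbuf in the chain */
	const unsigned char *m_data;	/* start of this mbuf's data */
	int m_len;			/* bytes of data in this mbuf */
};

/*
 * Copy 'len' bytes starting at chain offset 'off' into 'dst'.
 * Returns 0 on success, -1 if the requested range runs past the
 * end of the chain (the sketch's choice of error reporting).
 */
static int
pkt_copydata_checked(const struct pkt_mbuf *m, int off, int len,
    unsigned char *dst)
{
	if (off < 0 || len < 0)
		return -1;
	/* Skip whole mbufs that lie entirely before the offset. */
	while (m != NULL && off >= m->m_len) {
		off -= m->m_len;
		m = m->m_next;
	}
	while (len > 0) {
		int n;

		if (m == NULL)
			return -1;	/* range extends past the chain */
		n = m->m_len - off;
		if (n > len)
			n = len;
		memcpy(dst, m->m_data + off, (size_t)n);
		dst += n;
		len -= n;
		off = 0;
		m = m->m_next;
	}
	return 0;
}
```

For a 7-byte chain, reading the last three bytes at offset 7 - 3 succeeds, while the same read computed from an inflated length of 10 starts at offset 7 and falls off the end of the chain.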
 1.158.4.1  09-Feb-2015  martin branches: 1.158.4.1.2; 1.158.4.1.6;
Pull up following revision(s) (requested by mlelstv in ticket #501):
sys/kern/uipc_mbuf.c: revision 1.161
Correct m_len calculation for m_dup() with mbuf clusters.
Fixes kern/49650.
 1.158.4.1.6.4  22-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1606):

sys/kern/uipc_mbuf.c: revision 1.214

Revert my rev1.190, remove the M_READONLY check. The initial code was
correct: what is read-only is the mbuf storage, not the mbuf itself. The
storage contains the packet payload, and never has anything related to
mbufs. So it is fine to remove M_PKTHDR on mbufs that have a read-only
storage.

In fact it was kind of obvious, since several places already manually
remove M_PKTHDR without taking care of the external storage.
 1.158.4.1.6.3  03-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1602):

sys/kern/uipc_mbuf.c: revision 1.211 (via patch)

Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
 1.158.4.1.6.2  17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1598):

sys/kern/uipc_mbuf.c: revision 1.190

If the mbuf is shared leave M_PKTHDR in place. Given where this function
is called from that's not supposed to happen, but I'm growing unconfident
about our mbuf code.
 1.158.4.1.6.1  05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.158.4.1.2.4  22-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1606):

sys/kern/uipc_mbuf.c: revision 1.214

Revert my rev1.190, remove the M_READONLY check. The initial code was
correct: what is read-only is the mbuf storage, not the mbuf itself. The
storage contains the packet payload, and never has anything related to
mbufs. So it is fine to remove M_PKTHDR on mbufs that have a read-only
storage.

In fact it was kind of obvious, since several places already manually
remove M_PKTHDR without taking care of the external storage.
 1.158.4.1.2.3  15-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #1602):

sys/kern/uipc_mbuf.c: revision 1.211 (via patch)

Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
 1.158.4.1.2.2  17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1598):

sys/kern/uipc_mbuf.c: revision 1.190

If the mbuf is shared leave M_PKTHDR in place. Given where this function
is called from that's not supposed to happen, but I'm growing unconfident
about our mbuf code.
 1.158.4.1.2.1  05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #1594):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.159.2.8  28-Aug-2017  skrll Sync with HEAD
 1.159.2.7  05-Feb-2017  skrll Sync with HEAD
 1.159.2.6  05-Oct-2016  skrll Sync with HEAD
 1.159.2.5  09-Jul-2016  skrll Sync with HEAD
 1.159.2.4  29-May-2016  skrll Sync with HEAD
 1.159.2.3  22-Apr-2016  skrll Sync with HEAD
 1.159.2.2  22-Sep-2015  skrll Sync with HEAD
 1.159.2.1  06-Apr-2015  skrll Sync with HEAD
 1.168.2.3  26-Apr-2017  pgoyette Sync with HEAD
 1.168.2.2  20-Mar-2017  pgoyette Sync with HEAD
 1.168.2.1  04-Nov-2016  pgoyette Sync with HEAD
 1.170.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.172.6.6  25-Oct-2021  martin Pull up following revision(s) (requested by msaitoh in ticket #1703):

sys/conf/files: revision 1.1288
sys/kern/uipc_mbuf.c: revision 1.244
share/man/man4/options.4: revision 1.520

Fix a bug where NMBCLUSTERS (kern.mbuf.nmbclusters) can't be changed by sysctl.

Update the description of the NMBCLUSTERS. Add NMBCLUSTERS_MAX.

defparam NMBCLUSTERS_MAX.
 1.172.6.5  22-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #833):

sys/kern/uipc_mbuf.c: revision 1.214

Revert my rev1.190, remove the M_READONLY check. The initial code was
correct: what is read-only is the mbuf storage, not the mbuf itself. The
storage contains the packet payload, and never has anything related to
mbufs. So it is fine to remove M_PKTHDR on mbufs that have a read-only
storage.

In fact it was kind of obvious, since several places already manually
remove M_PKTHDR without taking care of the external storage.
 1.172.6.4  06-May-2018  martin Pull up following revision(s) (requested by maxv in ticket #802):

sys/kern/uipc_mbuf.c: revision 1.211 (via patch)

Modify m_defrag, so that it never frees the first mbuf of the chain. While
here use the given 'flags' argument, and not M_DONTWAIT.

We have a problem with several drivers: they poll an mbuf chain from their
queues and call m_defrag on them, but m_defrag could update the mbuf
pointer, so the mbuf in the queue is no longer valid. It is not easy to
fix each driver, because doing pop+push will reorder the queue, and we
don't really want that to happen.

This problem was independently spotted by me, Kengo, Masanobu, and other
people too it seems (perhaps PR/53218).

Now m_defrag leaves the first mbuf in place, and compresses the chain
only starting from the second mbuf in the chain.

It is important not to compress the first mbuf with hacks, because the
storage of this first mbuf may be shared with other mbufs.
 1.172.6.3  17-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #770):

sys/kern/uipc_mbuf.c: revision 1.190

If the mbuf is shared leave M_PKTHDR in place. Given where this function
is called from that's not supposed to happen, but I'm growing unconfident
about our mbuf code.
 1.172.6.2  05-Apr-2018  martin Pull up following revision(s) (requested by maxv in ticket #695):

sys/kern/uipc_mbuf.c: revision 1.182
sys/netinet6/frag6.c: revision 1.67
sys/netinet/ip_reass.c: revision 1.14
sys/sys/mbuf.h: revision 1.179

Remove M_PKTHDR from secondary mbufs when reassembling packets.

This is a real problem, because I found at least one component that relies
on the fact that only the first mbuf has M_PKTHDR: far from here, in
m_splithdr, we don't update m->m_pkthdr.len if M_PKTHDR is found in a
secondary mbuf. (The initial intention there was to avoid updating
m_pkthdr.len twice, the assumption was that if M_PKTHDR is set then we're
dealing with the first mbuf.) Therefore, when handling fragmented IPsec
packets (in particular IPv6, IPv4 is a bit more complicated), we may end
up with an incorrect m_pkthdr.len after authentication or decryption. In
the case of ESP, this can lead to a remote crash on this instruction:
m_copydata(m, m->m_pkthdr.len - 3, 3, lastthree);
m_pkthdr.len is bigger than the actual mbuf chain.

It seems possible to me to trigger this bug even if you don't have the ESP
key, because the fragmentation part is outside of the encrypted ESP
payload.

So if you MITM the target, and intercept an incoming ESP packet (which you
can't decrypt), you should be able to forge a new specially-crafted,
fragmented packet and stuff the ESP payload (still encrypted, as you
intercepted it) into it. The decryption succeeds and the target crashes.
 1.172.6.1  27-Feb-2018  martin Pull up following revision(s) (requested by mrg in ticket #593):
sys/dev/marvell/mvxpsec.c: revision 1.2
sys/arch/m68k/m68k/pmap_motorola.c: revision 1.70
sys/opencrypto/crypto.c: revision 1.102
sys/arch/sparc64/sparc64/pmap.c: revision 1.308
sys/ufs/chfs/chfs_malloc.c: revision 1.5
sys/arch/powerpc/oea/pmap.c: revision 1.95
sys/sys/pool.h: revision 1.80,1.82
sys/kern/subr_pool.c: revision 1.209-1.216,1.219-1.220
sys/arch/alpha/alpha/pmap.c: revision 1.262
sys/kern/uipc_mbuf.c: revision 1.173
sys/uvm/uvm_fault.c: revision 1.202
sys/sys/mbuf.h: revision 1.172
sys/kern/subr_extent.c: revision 1.86
sys/arch/x86/x86/pmap.c: revision 1.266 (via patch)
sys/dev/dtv/dtv_scatter.c: revision 1.4

Allow only one pending call to a pool's backing allocator at a time.
Candidate fix for problems with hanging after kva fragmentation related
to PR kern/45718.

Proposed on tech-kern:
https://mail-index.NetBSD.org/tech-kern/2017/10/23/msg022472.html
Tested by bouyer@ on i386.

This makes one small change to the semantics of pool_prime and
pool_setlowat: they may fail with EWOULDBLOCK instead of ENOMEM, if
there is a pending call to the backing allocator in another thread but
we are not actually out of memory. That is unlikely because nearly
always these are used during initialization, when the pool is not in
use.

Define the new flag too for previous commit.

pool_grow can now fail even when sleeping is ok. Catch this case in pool_get
and retry.

Assert that pool_get failure happens only with PR_NOWAIT.
This would have caught the mistake I made last week leading to null
pointer dereferences all over the place, a mistake which I evidently
poorly scheduled alongside maxv's change to the panic message on x86
for null pointer dereferences.

Since pr_lock is now used to wait for two things now (PR_GROWING and
PR_WANTED) we need to loop for the condition we wanted.
make the KASSERTMSG/panic strings consistent as '%s: [%s], __func__, wchan'
Handle the ERESTART case from pool_grow()

don't pass 0 to the pool flags
Guess pool_cache_get(pc, 0) means PR_WAITOK here.
Earlier on in the same context we use kmem_alloc(sz, KM_SLEEP).

use PR_WAITOK everywhere.
use PR_NOWAIT.

Don't use 0 for PR_NOWAIT

use PR_NOWAIT instead of 0

panic ex nihilo -- PR_NOWAITing for zerot

Add assertions that either PR_WAITOK or PR_NOWAIT are set.
- fix an assert; we can reach there if we are nowait or limitfail.
- when priming the pool and failing with ERESTART, don't decrement the number
of pages; this avoids the issue of returning an ERESTART when we get to 0,
and is more correct.
- simplify the pool_grow code, and don't wakeup things if we ENOMEM.

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. This implements the requirement that
pmap_enter(PMAP_CANFAIL) must not fail when replacing an existing
mapping with the first mapping of a new page, which is an unintended
consequence of the changes from the rmind-uvmplock branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706, as well as the failing assertion
about "uvm_page_locked_p(old_pg)". (but only on x86, various other platforms
will need their own changes to handle this issue.)
In uvm_fault_upper_enter(), if pmap_enter(PMAP_CANFAIL) fails, assert that
the pmap did not leave around a now-stale pmap mapping for an old page.
If such a pmap mapping still existed after we unlocked the vm_map,
the UVM code would not know later that it would need to lock the
lower layer object while calling the pmap to remove or replace that
stale pmap mapping. See PR 52706 for further details.
hopefully workaround the irregularly "fork fails in init" problem.
if a pool is growing, and the grower is PR_NOWAIT, mark this.
if another caller wants to grow the pool and is also PR_NOWAIT,
busy-wait for the original caller, which should either succeed
or hard-fail fairly quickly.

implement the busy-wait by unlocking and relocking this pool's
mutex and returning ERESTART. other methods (such as having
the caller do this) were significantly more code and this hack
is fairly localised.
ok chs@ riastradh@

Don't release the lock in the PR_NOWAIT allocation. Move flags setting
after acquiring the mutex. (from Tobias Nygren)
apply the change from arch/x86/x86/pmap.c rev. 1.266 commitid vZRjvmxG7YTHLOfA:

In pmap_enter_ma(), only try to allocate pves if we might need them,
and even if that fails, only fail the operation if we later discover
that we really do need them. If we are replacing an existing mapping,
reuse the pv structure where possible.

This implements the requirement that pmap_enter(PMAP_CANFAIL) must not fail
when replacing an existing mapping with the first mapping of a new page,
which is an unintended consequence of the changes from the rmind-uvmplock
branch in 2011.

The problem arises when pmap_enter(PMAP_CANFAIL) is used to replace an existing
pmap mapping with a mapping of a different page (eg. to resolve a copy-on-write).
If that fails and leaves the old pmap entry in place, then UVM won't hold
the right locks when it eventually retries. This entanglement of the UVM and
pmap locking was done in rmind-uvmplock in order to improve performance,
but it also means that the UVM state and pmap state need to be kept in sync
more than they did before. It would be possible to handle this in the UVM code
instead of in the pmap code, but these pmap changes improve the handling of
low memory situations in general, and handling this in UVM would be clunky,
so this seemed like the better way to go.

This somewhat indirectly fixes PR 52706 on the remaining platforms where
this problem existed.
 1.181.2.12  18-Jan-2019  pgoyette Synch with HEAD
 1.181.2.11  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.181.2.10  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.181.2.9  20-Oct-2018  pgoyette Sync with head
 1.181.2.8  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.181.2.7  28-Jul-2018  pgoyette Sync with HEAD
 1.181.2.6  21-May-2018  pgoyette Sync with HEAD
 1.181.2.5  02-May-2018  pgoyette Synch with HEAD
 1.181.2.4  22-Apr-2018  pgoyette Sync with HEAD
 1.181.2.3  16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.181.2.2  22-Mar-2018  pgoyette Synch with HEAD, resolve conflicts
 1.181.2.1  15-Mar-2018  pgoyette Synch with HEAD
 1.215.2.2  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.215.2.1  10-Jun-2019  christos Sync with HEAD
 1.232.4.3  27-Nov-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1768):
sys/kern/uipc_mbuf.c: revision 1.252

mbuf: avoid assertion failure when splitting mbuf cluster

From OpenBSD:
commit 7b4d35e0a60ba1dd4daf4b1c2932020a22463a89
Author: bluhm <bluhm@openbsd.org>
Date: Fri Oct 20 16:25:15 2023 +0000
Avoid assertion failure when splitting mbuf cluster.
m_split() calls m_align() to initialize the data pointer of newly
allocated mbuf. If the new mbuf will be converted to a cluster,
this is not necessary. If additionally the new mbuf is larger than
MLEN, this can lead to a panic.
Only call m_align() when a valid m_data is needed. This is the
case if we do not reference the existing cluster, but memcpy() the
data into the new mbuf.
Reported-by: syzbot+0e6817f5877926f0e96a@syzkaller.appspotmail.com
OK claudio@ deraadt@

The issue is harmless if DIAGNOSTIC is not enabled.
 1.232.4.2  25-Oct-2021  martin Pull up following revision(s) (requested by msaitoh in ticket #1368):

sys/conf/files: revision 1.1288
sys/kern/uipc_mbuf.c: revision 1.244
share/man/man4/options.4: revision 1.520

Fix a bug where NMBCLUSTERS (kern.mbuf.nmbclusters) can't be changed by sysctl.

Update the description of the NMBCLUSTERS. Add NMBCLUSTERS_MAX.

defparam NMBCLUSTERS_MAX.
 1.232.4.1  11-Aug-2020  martin Pull up following revision(s) (requested by mrg in ticket #1045):

sys/kern/uipc_mbuf.c: revision 1.235
sys/dev/ic/dwc_gmac.c: revision 1.70
sys/dev/ic/dwc_gmac_reg.h: revision 1.20
sys/dev/ic/dwc_gmac.c: revision 1.66
sys/dev/ic/dwc_gmac.c: revision 1.67
sys/dev/ic/dwc_gmac.c: revision 1.68

awge: fix issue that caused rx packets to be corrupt with DIAGNOSTIC kernel

It seems the hardware can only reliably do rx DMA to addresses that are
dcache size aligned. This is hinted at by some GMAC data sheets, but an
authoritative source is hard to find.

On non-DIAGNOSTIC kernels we always implicitly get MCLBYTES-aligned mbuf
data pointers, but with the reintroduction of POOL_REDZONE for DIAGNOSTIC
we can get 8-byte alignment due to redzone padding. So align rx pointers to
64 bytes which should be good for both arm32 and aarch64.
While here change some bus_dmamap_load() to bus_dmamap_load_mbuf() and add
one missing bus_dmamap_sync(). Also fixes the code to not assume that
MCLBYTES == AWGE_MAX_PACKET. User may override MCLSHIFT in kernel config.
correct pointer arithmetics

mcl_cache: align items to COHERENCY_UNIT

Because we do cache incoherent DMA to/from mbufs we cannot safely
share cache lines with adjacent items that may be concurrently accessed.

awge: drop redundant m_adj(). Handled via uipc_mbuf.c r1.235 instead.

Mask all the MMC counter interrupts if the MMC module is present.
 1.237.2.1  25-Apr-2020  bouyer Sync with bouyer-xenpvh-base2 (HEAD)
 1.241.2.1  03-Apr-2021  thorpej Sync with HEAD.
 1.247.2.2  20-Sep-2024  martin Pull up following revision(s) (requested by rin in ticket #882):

sys/kern/uipc_mbuf.c: revision 1.250
sys/kern/uipc_mbuf.c: revision 1.249

mbuf(9): Sprinkle KASSERTMSG.
No functional change intended.

0x%p -> %p in KASSERTMSGs
 1.247.2.1  27-Nov-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #475):
sys/kern/uipc_mbuf.c: revision 1.252

mbuf: avoid assertion failure when splitting mbuf cluster

From OpenBSD:
commit 7b4d35e0a60ba1dd4daf4b1c2932020a22463a89
Author: bluhm <bluhm@openbsd.org>
Date: Fri Oct 20 16:25:15 2023 +0000
Avoid assertion failure when splitting mbuf cluster.
m_split() calls m_align() to initialize the data pointer of newly
allocated mbuf. If the new mbuf will be converted to a cluster,
this is not necessary. If additionally the new mbuf is larger than
MLEN, this can lead to a panic.
Only call m_align() when a valid m_data is needed. This is the
case if we do not reference the existing cluster, but memcpy() the
data into the new mbuf.
Reported-by: syzbot+0e6817f5877926f0e96a@syzkaller.appspotmail.com
OK claudio@ deraadt@

The issue is harmless if DIAGNOSTIC is not enabled.
