History log of /src/sys/net/bpf.c
Revision  Date  Author  Comments
 1.258  20-Oct-2024  mlelstv MBUFTRACE
 1.257  19-Aug-2024  ozaki-r bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.256  19-Aug-2024  ozaki-r bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
 1.255  15-Aug-2024  riastradh bpf(4): KNF whitespace fixes. No functional change intended.

Preparation for:

kern/58596: bpf(4) MP-safety issues
 1.254  15-Aug-2024  riastradh bpf(4): Sort includes. No functional change intended.

Preparation for:

kern/58596: bpf(4) MP-safety issues
 1.253  15-Aug-2024  rin bpf: Mark bpfread_filtops FILTEROP_MPSAFE

Fix deadlock for non-NET_MPSAFE kernel, reported as
PR kern/58531 (thanks manu@ for test).

I've confirmed that there is no new regression for ATF with
any combination of -HEAD/netbsd-10 and default/NET_MPSAFE
rump kernels (aarch64).

Some MP-safety problems have also been reported for bpf(4) in
PR kern/58596, but those will take some time to fix.
For now, commit this part in advance.

OK ozaki-r@
 1.252  31-Jul-2023  christos Don't call versioned stuff "old". Follow the naming convention for versioning
and name them after the last version of the OS they appeared on.
 1.251  08-Feb-2023  gutteridge bpf.c: support loopback writes when BIOCSHDRCMPLT is set

Following changes in r. 1.249 "bpf: support sending packets on loopback
interfaces", also allow for this to succeed when the "header complete"
flag is set, which is the practice of some tools, e.g., tcpreplay and
Scapy. With this change, both of those example tools now work, e.g.,
Scapy passes "L3bpfSocket - send and sniff on loopback" in its test
suite.

There are several ways of addressing this issue; this commit is
intended to be the most conservative and consistent with the previous
changes. (E.g., FreeBSD instead has special handling of this condition
in its if_loop.c.)
 1.250  07-Feb-2023  gutteridge bpf.c: fix a few typos and grammatical issues in comments
 1.249  30-Nov-2022  ozaki-r branches: 1.249.2;
bpf: support sending packets on loopback interfaces

Previously sending packets on a loopback interface via bpf failed
because the packets are treated as AF_UNSPEC by bpf and the loopback
interface couldn't handle such packets.

This fix enables user programs to prepend a protocol family (AF_INET or
AF_INET6) to a payload. bpf interprets it and treats a packet as so,
not just AF_UNSPEC. The protocol family is encoded as 4 bytes, host byte
order as per DLT_NULL in the specification(*).

(*) https://www.tcpdump.org/linktypes.html

Proposed on tech-net and tech-kern
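
The framing described in r. 1.249 can be sketched from the userland side as follows. This is only an illustration, assuming a caller that already holds a payload in memory; build_null_frame is a hypothetical helper and is not part of bpf.c or of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/*
 * Hypothetical helper (not from bpf.c): build a DLT_NULL-style frame by
 * prepending a 4-byte protocol family, in host byte order, to a payload.
 * Returns the total frame length, or 0 if the output buffer is too small.
 */
static size_t
build_null_frame(uint32_t af, const void *payload, size_t paylen,
    void *out, size_t outlen)
{
	assert(payload != NULL && out != NULL);

	if (outlen < sizeof(af) || paylen > outlen - sizeof(af))
		return 0;

	/* Host byte order: copy the value as-is, no htonl(). */
	memcpy(out, &af, sizeof(af));
	memcpy((uint8_t *)out + sizeof(af), payload, paylen);
	return sizeof(af) + paylen;
}
```

A caller would pass AF_INET or AF_INET6 as af and then write(2) the resulting frame to a bpf descriptor bound to the loopback interface; that last step is omitted here because it depends on the system.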
 1.248  19-Nov-2022  yamt bpf: refresh bd_pid in a few more places as well

This made "netstat -B" show hostapd and wpa_supplicant for me.

kingcrab# netstat -B
Active BPF peers
PID Int Recv Drop Capt Flags Bufsize Comm
433 urtwn0 102 0 2 I-RSH 524288 hostapd
211 urtwn0 102 0 4 I-RS- 32768 dhcpd
670 bwfm0 295 0 2 I-RSH 524288 wpa_supplicant
kingcrab#
 1.247  03-Sep-2022  riastradh bpf(4): Reject bogus timeout values before arithmetic overflows.

Reported-by: syzbot+fbd86bdf579944b64a98@syzkaller.appspotmail.com
https://syzkaller.appspot.com/bug?id=60d46fd4863952897cbf67c6b1bcc8b20ec7bde6

XXX pullup-8
XXX pullup-9
 1.246  15-Mar-2022  riastradh bpf(4): Handle null bf_insn on free.

This is not guaranteed by bpf_setf to be nonnull.

Reported-by: syzbot+de1ec9471dfc2f283dda@syzkaller.appspotmail.com
 1.245  12-Mar-2022  riastradh bpf(4): Nix KM_NOSLEEP and prune dead branch.

https://syzkaller.appspot.com/bug?id=0fa7029d5565d9670a24c364d44bd116c76d7e7f
 1.244  12-Mar-2022  riastradh bpf(4): Clamp read timeout to INT_MAX ticks to avoid overflow.

Reported-by: syzbot+c543d35064d3492b9091@syzkaller.appspotmail.com
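
The idea behind the timeout fixes in r. 1.244 and r. 1.247 (reject bogus values, then clamp to INT_MAX ticks) can be sketched as below. This is not the code from bpf.c: timeout_to_ticks, its hz parameter, and the choice to round partial ticks up are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical sketch: convert a (sec, usec) timeout to ticks at hz
 * ticks per second. Returns -1 for bogus input and saturates at
 * INT_MAX instead of overflowing.
 */
static int
timeout_to_ticks(int64_t sec, int64_t usec, int hz)
{
	int64_t ticks;

	assert(hz > 0);

	/* Reject bogus values before doing any arithmetic on them. */
	if (sec < 0 || usec < 0 || usec >= 1000000)
		return -1;

	/* Anything this large would not fit in an int worth of ticks. */
	if (sec >= INT_MAX / hz)
		return INT_MAX;

	/* Safe in 64 bits: sec * hz < INT_MAX and usec * hz < 1e6 * INT_MAX. */
	ticks = sec * hz + (usec * hz + 999999) / 1000000;
	return ticks > INT_MAX ? INT_MAX : (int)ticks;
}
```

The order of the checks is the point of the sketch: validation and the saturation test both happen before the multiplication that could overflow.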
 1.243  26-Sep-2021  thorpej Change the kqueue filterops::f_isfd field to filterops::f_flags, and
define a flag FILTEROP_ISFD that has the meaning of the prior f_isfd.
Field and flag name aligned with OpenBSD.

This does not constitute a functional or ABI change, as the field location
and size, and the value placed in that field, are the same as the previous
code, but we're bumping __NetBSD_Version__ so 3rd-party module source code
can adapt, as needed.

NetBSD 9.99.89
 1.242  16-Sep-2021  andvar fix typos in word "successful".
 1.241  14-Jul-2021  yamaguchi unset IFF_PROMISC at bpf_detach()

Setting "d->bd_promisc = 0" meant that bpf_detach() did not call
ifpromisc(ifp, 0). There is currently no reason for
this behavior, so it is removed.
In addition to the change, the workaround for it in vlan(4)
is also removed.
 1.240  09-Jun-2021  martin Add a bpf_register_track_event() function (and deregister equivalent)
that allows a driver to track listeners attaching/detaching from tap
points.

This is useful for drivers that would have to do extra work for some
taps and can not easily decide (at the driver level) if the work would
be needed further up the stack.

An example is providing radiotap headers for IEEE 802.11 frames.
 1.239  18-Dec-2020  thorpej branches: 1.239.4;
Use sel{record,remove}_knote().
 1.238  02-Aug-2020  maxv branches: 1.238.2;
Use a more informative panic message.
 1.237  11-Jun-2020  roy bpf(4): Add ioctls BIOCSETWF and BIOCLOCK

Once BIOCLOCK is executed, the device becomes locked which prevents the
execution of ioctl(2) commands which can change the underlying parameters
of the bpf(4) device. An example might be the setting of bpf(4) filter
programs or attaching to different network interfaces.

BIOCSETWF can be used to set write filters for outgoing packets.
Currently if a bpf(4) consumer is compromised, the bpf(4) descriptor can
essentially be used as a raw socket, regardless of consumer's UID.
Write filters give users the ability to constrain which packets can be sent
through the bpf(4) descriptor.

Taken from OpenBSD.
 1.236  16-Mar-2020  pgoyette Use the module subsystem's ability to process SYSCTL_SETUP() entries to
automate installation of sysctl nodes.

Note that there are still a number of device and pseudo-device modules
that create entries tied to individual device units, rather than to the
module itself. These are not changed.
 1.235  07-Feb-2020  thorpej Use percpu_foreach_xcall() to gather volatile per-cpu counters. These
must be serialized against the interrupts / soft-interrupts in which
they're manipulated, as well as protected from non-atomic 64-bit memory
loads on 32-bit platforms.
 1.234  01-Feb-2020  riastradh Fix wrong memory order and switch bpf to atomic_load/store_*.
 1.233  19-Jan-2020  thorpej Stop including strip.h (it's no longer generated).
 1.232  29-Nov-2019  ryo branches: 1.232.2;
bpf can send a packet greater than MCLBYTES (JumboFrame) using multiple mbufs.
 1.231  13-Sep-2019  maxv As I suspected, the KASSERT I added yesterday can fire if we try to process
zero-sized packets. Skip them to prevent a type confusion that can trigger
random page faults later.

Reported-by: syzbot+3e447ebdcb2bcfa402ac@syzkaller.appspotmail.com
 1.230  12-Sep-2019  maxv Add KASSERT to catch bugs. Something tells me it could easily fire.
 1.229  10-Jul-2019  maxv branches: 1.229.2;
Fix info leak: use kmem_zalloc, because we align the buffers, and the
otherwise uninitialized padding bytes get copied to userland in bpf_read().
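
A userland analogue of the leak fixed in r. 1.229 may make the reasoning clearer. The sketch below uses calloc(3) in place of kmem_zalloc, and alloc_capture_buffer, append_record and ALIGN_SKETCH are hypothetical names, not identifiers from bpf.c. Because records are aligned, the gap after each record is never written; it is only safe to copy out if the buffer started zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN_SKETCH 8	/* hypothetical alignment, not BPF_WORDALIGN */

static size_t
align_up(size_t n)
{
	return (n + ALIGN_SKETCH - 1) & ~(size_t)(ALIGN_SKETCH - 1);
}

/* Zeroed allocation, so padding never holds stale heap contents. */
static unsigned char *
alloc_capture_buffer(size_t len)
{
	return calloc(1, len);
}

/*
 * Copy one record to buf at *off and advance *off to the next aligned
 * offset. The padding bytes in between are deliberately left untouched.
 * Returns 0 on success, -1 if the record does not fit.
 */
static int
append_record(unsigned char *buf, size_t buflen, size_t *off,
    const void *rec, size_t reclen)
{
	size_t next;

	assert(buf != NULL && off != NULL && rec != NULL);

	if (*off > buflen || reclen > buflen - *off)
		return -1;
	next = align_up(*off + reclen);
	if (next > buflen)
		return -1;

	memcpy(buf + *off, rec, reclen);
	*off = next;
	return 0;
}
```

With malloc(3) instead of calloc(3), the bytes between a record's end and the next aligned offset would be indeterminate, which is the class of problem the commit describes.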
 1.228  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
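
The truncation hazard that motivates the rename in r. 1.228 can be shown with a small sketch. The names uimin_sketch, MIN_SKETCH and min_via_function are stand-ins invented for this example, and it assumes a platform where unsigned int is narrower than 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in with the same shape as a min() defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Macro form: may evaluate its arguments more than once, but it
 * compares in the operands' own type and does not truncate.
 */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/*
 * Passing 64-bit values through the unsigned int function. The casts
 * spell out the conversion that would otherwise happen silently at
 * the call site.
 */
static uint64_t
min_via_function(uint64_t a, uint64_t b)
{
	assert(sizeof(unsigned int) < sizeof(uint64_t));
	return uimin_sketch((unsigned int)a, (unsigned int)b);
}
```

On such a platform, min_via_function(0x100000001, 5) compares 1 with 5 because the upper 32 bits of the first argument are dropped, while MIN_SKETCH on the same 64-bit operands compares the full values.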
 1.227  25-Jul-2018  msaitoh Initialize some members in a mbuf which is on stack.
 1.226  26-Jun-2018  msaitoh branches: 1.226.2;
Implement the BPF direction filter (BIOC[GS]DIRECTION). It provides backward
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.

This change also fixes a bug that the direction is misunderstood in some
environment by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
 1.225  25-Jun-2018  msaitoh Removal of bpf_tap().
 1.224  14-May-2018  ozaki-r Protect packet input routines with KERNEL_LOCK and splsoftnet

if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.

if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect non-MP-safe
codes (e.g., carp_input and agr_input).

Pointed out by mlelstv@
 1.223  25-Jan-2018  ozaki-r branches: 1.223.2;
Abandon unnecessary softint

The softint was introduced to defer fownsignal that was called in bpf_wakeup to
softint at v1.139, but now bpf_wakeup always runs in softint so we don't need
the softint anymore.
 1.222  15-Dec-2017  ozaki-r Make softint and callout MP-safe
 1.221  12-Dec-2017  ozaki-r Fix panic in callout_halt (fix typo)

Reported by wiz@
 1.220  30-Nov-2017  christos add fo_name so we can identify the fileops in a simple way.
 1.219  17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.218  25-Oct-2017  maya Use C99 initializer for filterops

Mostly done with spatch with touchups for indentation

@@
expression a;
identifier b,c,d;
identifier p;
@@
const struct filterops p =
- { a, b, c, d
+ {
+ .f_isfd = a,
+ .f_attach = b,
+ .f_detach = c,
+ .f_event = d,
};
 1.217  19-Oct-2017  ozaki-r Turn on D_MPSAFE flag of bpf_cdevsw that is already MP-safe

Pointed out by k-goda@IIJ
 1.216  20-Feb-2017  ozaki-r branches: 1.216.4; 1.216.6;
Reinit a pslist entry before inserting it to a pslist again

Fix PR kern/51984
Tested by nonaka@
 1.215  19-Feb-2017  christos typo
 1.214  13-Feb-2017  ozaki-r Update comments to reflect bpf MP-ification
 1.213  09-Feb-2017  ozaki-r Make bpf MP-safe

By the change, bpf_mtap can run without any locks as long as its bpf filter
doesn't match a target packet. Pushing data to a bpf buffer still needs
a lock. Removing the lock requires big changes and it's a future work.

Another known issue is that we need to keep some obsolete variables to
avoid breaking kvm(3) users such as netstat and fstat. One problem for
MP-ification is that in order to keep statistic counters of bpf_d we need
to use atomic operations for them. Once we retire the kvm(3) users, we
should make the counters per-CPU and remove the atomic operations.
 1.212  01-Feb-2017  ozaki-r Reduce return points
 1.211  01-Feb-2017  ozaki-r Kill tsleep/wakeup and use cv
 1.210  01-Feb-2017  ozaki-r Make bpf_gstats percpu
 1.209  01-Feb-2017  ozaki-r Use pslist(9) instead of queue(9) for psz/psref

As usual some member variables of struct bpf_d and bpf_if remain to avoid
breaking kvm(3) users (netstat and fstat).
 1.208  01-Feb-2017  ozaki-r Use kmem(9) instead of malloc/free
 1.207  01-Feb-2017  ozaki-r Make global variables static
 1.206  25-Jan-2017  ozaki-r Use bpf_ops for bpf_mtap_softint

By doing so we don't need to care whether a kernel enables bpfilter or not.
 1.205  24-Jan-2017  ozaki-r Defer bpf_mtap in Rx interrupt context to softint

bpf_mtap of some drivers is still called in hardware interrupt context.
We want to run them in softint as well as bpf_mtap of most drivers
(see if_percpuq_softint and if_input).

To this end, bpf_mtap_softint mechanism is implemented; it defers
bpf_mtap processing to a dedicated softint for a target driver.
By using the mechanism, we can move bpf_mtap processing to softint
without changing target drivers much while it adds some overhead
on CPU and memory. Once target drivers are changed to softint-based,
we should return to normal bpf_mtap.

Proposed on tech-kern and tech-net
 1.204  23-Jan-2017  ozaki-r Make bpf_setf static
 1.203  19-Jul-2016  pgoyette branches: 1.203.2;
Fix regression introduced in tests/net/bpf and tests/net/bpfilter

The rump code needs to call devsw_attach() in order to assign a dev_major
for bpf; it then uses this to create rumps /dev/bpf node. Unfortunately,
this leaves the devsw attached, so when the bpf module tries to initialize
itself, it gets an EEXIST error and fails.

So, once rump has figured what the dev_major should be, call devsw_detach()
to remove the devsw. Then, when the module initialization code calls
devsw_attach() it will succeed.
 1.202  17-Jul-2016  pgoyette Now that we're only calling devsw_attach() in the modular driver, it
is not ok for the driver/module to already exist. So don't ignore
EEXIST.
 1.201  17-Jul-2016  pgoyette Don't initialize variables that no longer exist in built-in module.
 1.200  17-Jul-2016  pgoyette Don't try to call devsw_attach() for built-in driver code.
 1.199  20-Jun-2016  knakahara branches: 1.199.2;
apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.198  10-Jun-2016  ozaki-r Avoid storing a pointer of an interface in a mbuf

Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of a
pointer.

The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places that are not
planned to be MP-ified soon.

The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.

Proposed on tech-kern and tech-net.
 1.197  10-Jun-2016  ozaki-r Introduce m_set_rcvif and m_reset_rcvif

The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.

No functional change.
 1.196  07-Jun-2016  pgoyette Create separate modules for i2c_bitbang and bpf_filter so these files
can be included in kernels which need them without also duplicating
them in other modules. Removes the duplicate symbols I found which
prevented loading i2c and bpf modules after having fixed PR 45125.
 1.195  09-Feb-2016  ozaki-r Introduce softint-based if_input

This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.

This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utilizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does the rest of the packet processing.

To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.

Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html

Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
 1.194  01-Feb-2016  christos Do less work under the kernel lock, otherwise dhcpcd aborting causes us
to deadlock.
 1.193  16-Dec-2015  christos don't free mbuf twice.
XXX: pullup 7.
 1.192  14-Oct-2015  christos PR/49386: Ryota Ozaki: Add a mutex for bpf creation/removal to avoid races.
Add M_CANFAIL to malloc.
 1.191  30-May-2015  joerg Improve wording.
 1.190  29-Dec-2014  ozaki-r Remove unnecessary variable bc
 1.189  13-Sep-2014  rmind branches: 1.189.2;
PR/49190: bpf_deliver: set scratch memory store in bpf_args_t.
 1.188  05-Sep-2014  matt Try not to use f_data, use f_{vnode,socket,pipe,mqueue,kqueue,ksem} to get
a correctly typed pointer.
 1.187  07-Aug-2014  ozaki-r branches: 1.187.2;
Use NULL instead of 0 for pointers
 1.186  28-Jul-2014  alnsn Enable net.bpf.jit only if MODULAR and BPFJIT. Tweak a warning about postponed
jit activation.
 1.185  25-Jul-2014  dholland Add d_discard to all struct cdevsw instances I could find.

All have been set to "nodiscard"; some should get a real implementation.
 1.184  10-Jul-2014  christos initialize args the same way we do in filter.
 1.183  24-Jun-2014  alnsn Implement copfuncs and external memory in bpfjit.
 1.182  16-Mar-2014  dholland branches: 1.182.2;
Change (mostly mechanically) every cdevsw/bdevsw I can find to use
designated initializers.

I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
 1.181  25-Feb-2014  pooka Ensure that the top level sysctl nodes (kern, vfs, net, ...) exist before
the sysctl link sets are processed, and remove redundancy.

Shaves >13kB off of an amd64 GENERIC, not to mention >1k duplicate
lines of code.
 1.180  05-Dec-2013  christos It is silly to kill the system when an interface failed to clear promiscuous
mode. Some return EINVAL when they are dying, but others like USB return EIO.
Downgrade to a DIAGNOSTIC printf. Same should be done for the malloc/NOWAIT,
but this is rarely hit.
 1.179  16-Nov-2013  rmind bpf_deliver: convert to bpf_filter_ext().
 1.178  15-Nov-2013  rmind - Add bpf_args_t and convert bpf_filter_ext() to use it. This allows the
caller to initialise (and re-use) the memory store.
- Add bpf_jit_generate() and bpf_jit_freecode() wrappers.
 1.177  18-Sep-2013  rmind Add bpf_filter_ext() to use with BPF COP, restore bpf_filter() as it was
originally to preserve compatibility. Similarly, add bpf_validate_ext()
which takes bpf_ctx_t.
 1.176  09-Sep-2013  christos PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
XXX: Pullup -6
 1.175  30-Aug-2013  rmind bpf_filter: add a custom argument which can be passed to coprocessor routine.
 1.174  29-Aug-2013  rmind Implement BPF_COP/BPF_COPX instructions in the misc category (BPF_MISC)
which add a capability to call external functions in a predetermined way.

It can be thought as a BPF "coprocessor" -- a generic mechanism to offload
more complex packet inspection operations. There is no default coprocessor
and this functionality is not targeted to the /dev/bpf. This is primarily
targeted to the kernel subsystems, therefore there is no way to set a custom
coprocessor at the userlevel.

Discussed on: tech-net@
OK: core@
 1.173  27-Oct-2012  alnsn branches: 1.173.2;
Add bpfjit and enable it for amd64.
 1.172  27-Sep-2012  alnsn Remove bpf_jit which was ported from FreeBSD recently.

It will soon be replaced with the new bpfjit kernel module.
 1.171  15-Aug-2012  alnsn branches: 1.171.2;
Fix two bugs introduced by recent commit.

- When handling contiguous buffer in _bpf_tap(), pass its real size
rather than 0 to avoid reading packet data as mbuf struct on
out-of-bounds loads.
- Correctly pass pktlen and buflen arguments from bpf_deliver() to
bpf_filter() to avoid reading mbuf struct as packet data.
JIT case is still broken.

Also, test pointers against NULL.
 1.170  02-Aug-2012  rmind Build fix for some ports.
 1.169  01-Aug-2012  rmind Add BPF JIT compiler, currently supporting amd64 and i386. Code obtained
from FreeBSD. Also, make few BPF fixes and simplifications while here.
Note that bpf_jit_enable is false for now.

OK dyoung@, some feedback from matt@
 1.168  16-Dec-2011  christos branches: 1.168.2; 1.168.6; 1.168.8;
make comment reflect reality
 1.167  15-Dec-2011  christos don't leak mbufs.
 1.166  30-Aug-2011  bouyer branches: 1.166.2; 1.166.6;
Provide netbsd32 compat for bpf. Beside the ioctls, the structure
returned to userland by read(2) also needs to be converted.
For this, the bpf descriptor is flagged as compat32 (or not) in the
open and ioctl functions (where the user process's pid is also updated
in the descriptor). When the bpf buffer is filled in, the 32bits or native
header is used depending on the information stored in the descriptor.

This won't work if a 64bit binary does the open and ioctls, and then
exec a 32bit program which will do the read. But this is very
unlikely to happen in real life ...

Tested on i386 and loongson; with these changes my loongson can run
dhclient and tcpdump with a n32 userland.
 1.165  10-Jun-2011  christos setting things once is enough.
 1.164  30-Mar-2011  christos branches: 1.164.2;
lib/44807: something broken in stat(2), return that we are a character
device in st_mode.
 1.163  30-Mar-2011  bouyer Allocate buffers with (M_WAITOK | M_CANFAIL) instead of M_NOWAIT.
M_NOWAIT cause dhcpd on a low-memory server with lots of interfaces to
occasionally fail to start with ENOBUFS; (M_WAITOK | M_CANFAIL) seems to
fix this.
Tested on 3 different dhcp servers.
 1.162  22-Jan-2011  christos undo previous. Read the diff wrong.
 1.161  22-Jan-2011  christos fix comment
 1.160  02-Jan-2011  christos branches: 1.160.2; 1.160.4;
kern/44310: Alexander Nasonov: write to /dev/bpf truncates size_t to int
 1.159  08-Dec-2010  pooka linkset no more
 1.158  14-Apr-2010  pooka Add a little comment on how bpf can be made unloadable, per pointer from ad.
 1.157  05-Apr-2010  joerg Push the bpf_ops usage back into bpf.h. Push the common ifp->if_bpf
check into the inline functions as well the fourth argument for
bpf_attach.
 1.156  13-Mar-2010  christos branches: 1.156.2;
add BIOC{G,S}FEEDBACK which allows one to receive injected outgoing packets
via bpf.
 1.155  26-Jan-2010  pooka branches: 1.155.2;
Include sys/atomic.h now that it's used but gets stealth-included
only on some archs.
 1.154  25-Jan-2010  pooka Make bpf dynamically loadable.
 1.153  19-Jan-2010  pooka Redefine bpf linkage through an always present op vector, i.e.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.

Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.

Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
 1.152  17-Jan-2010  pooka Forward declare struct bpf_if and use that as the type for bpf_if
instead of "void *". Buys us oo times the type-safety for 0 times
the price.
(no functional change)
 1.151  15-Jan-2010  pooka * remove just-for-kicks locking
* KNF
* remove outdated comment (quite a funny one to read in 2010, though)
 1.150  20-Dec-2009  dsl If a multithreaded app closes an fd while another thread is blocked in
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
 1.149  09-Dec-2009  dsl Rename fo_drain() to fo_abort(), 'drain' is used to mean 'wait for output
to drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
 1.148  23-Nov-2009  rmind Remove some unnecessary includes of the sys/user.h header.
 1.147  05-Oct-2009  christos add the error from ifpromisc to the panic.
 1.146  11-Apr-2009  christos Fix locking as Andy explained. Also fill in uid and gid like sys_pipe did.
 1.145  11-Apr-2009  christos Fix PR/37878 and PR/37550: Provide stat(2) for all devices and don't use
fbadop_stat.
 1.144  04-Apr-2009  ad Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.

Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.

thr0 accept(fd, ...)
thr1 close(fd)
 1.143  11-Mar-2009  mrg like KERN_FILE2: *do* update "needed" when there is no count. we want
userland to know what sort of size to provide..

while here, slightly normalise the previous to init_sysctl.c.
 1.142  11-Jan-2009  christos branches: 1.142.2;
merge christos-time_t
 1.141  15-Jun-2008  christos branches: 1.141.4; 1.141.6;
- add if_alloc (ours just mallocs), and if_initname and use them (from FreeBSD)
- kill memsets where M_ZERO can be used.
 1.140  21-May-2008  ad branches: 1.140.2;
Acquire kernel_lock in the bpf fileops.
 1.139  24-Apr-2008  ad branches: 1.139.2; 1.139.4;
Network protocol interrupts can now block on locks, so merge the globals
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:

- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.

- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.

- The system spends less time at IPL_SCHED, and there is less lock activity.
 1.138  20-Apr-2008  scw Pull in a couple of fixes from FreeBSD, the first of which addresses a
failure of wpa_supplicant(8) to re-key promptly, as reported in
http://mail-index.netbsd.org/tech-net/2008/04/18/msg000459.html

- Make bpf's read timeout work more correctly with select/poll.

- A fix for catchpacket() which delays calling bpf_wakeup() until
the state has been updated.
 1.137  26-Mar-2008  christos branches: 1.137.2; 1.137.4;
- put const back, no reason to modify the prototype.
1. Please don't cast function pointers to (void *), use the full function
prototype cast; this is for archs where a function pointer is not a regular
pointer.
2. Compare pointers to NULL not 0.
 1.136  24-Mar-2008  yamt merge yamt-lazymbuf branch.
 1.135  21-Mar-2008  ad Catch up with descriptor handling changes. See kern_descrip.c revision
1.173 for details.
 1.134  01-Mar-2008  rmind Welcome to 4.99.55:

- Add a lot of missing selinit() and seldestroy() calls.

- Merge selwakeup() and selnotify() calls into a single selnotify().

- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc.) happened. If unknown,
zero may be used.

Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
 1.133  20-Feb-2008  matt branches: 1.133.2; 1.133.6;
s/u_\(int[0-9]*_t\)/u\1/g
(change u_int*_t to uint*_t)
 1.132  20-Dec-2007  dyoung Use LIST_FOREACH().
 1.131  05-Dec-2007  pooka branches: 1.131.4;
Do not "return 1" from kqfilter for errors. That value is passed
directly to the userland caller and results in a mysterious EPERM.
Instead, return EINVAL or something else sensible depending on the
case.
 1.130  11-Jul-2007  xtraeme branches: 1.130.6; 1.130.8; 1.130.14; 1.130.16;
Replace a simple lock with a mutex and make it static (as it's only used
on this file). Ok by ad@.
 1.129  09-Jul-2007  ad Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.128  30-May-2007  christos Move the nasty ifdefs in one place. Requested by ad and dyoung.
 1.127  29-May-2007  christos Add a sockaddr_storage member to "struct ifreq" maintaining backwards
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
 1.126  04-Mar-2007  christos branches: 1.126.2; 1.126.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.125  16-Nov-2006  christos branches: 1.125.4;
__unused removal on arguments; approved by core.
 1.124  25-Oct-2006  elad Kill some KAUTH_GENERIC_ISSUSER uses.
 1.123  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.122  28-Aug-2006  christos branches: 1.122.2; 1.122.4;
add missing initializer
 1.121  04-Aug-2006  martin Fix typo in comment
 1.120  26-Jul-2006  christos Patch from Dheeraj S, inspired by the following FreeBSD change:

Rather than calling microtime() in catchpacket(), make catchpacket()
take a timeval indicating when the packet was captured. Move
microtime() to the calling functions and grab the timestamp as soon
as we know that we're going to call catchpacket at least once.

This means that we call microtime() once per matched packet, as
opposed to once per matched packet per bpf listener. It also means
that we return the same timestamp to all bpf listeners, rather than
slightly different ones.

It would be more accurate to call microtime() even earlier for all
packets, as you have to grab (1+#listener) locks before you can
determine if the packet will be logged. You could always grab a
timestamp before the locks, but microtime() can be costly, so this
didn't seem like a good idea.

(I guess most ethernet interfaces will have a bpf listener these
days because of dhclient. That means that we could be doing two bpf
locks on most packets going through the interface.)
 1.119  23-Jul-2006  ad Use the LWP cached credentials where sane.
 1.118  27-Jun-2006  tron Make this build with GCC 4.x.
 1.117  14-May-2006  elad branches: 1.117.4;
integrate kauth.
 1.116  10-May-2006  mrg quell GCC 4.1 uninitialised variable warnings.

XXX: we should audit the tree for which old ones are no longer needed
after getting the older compilers out of the tree..
 1.115  26-Dec-2005  rpaulo branches: 1.115.4; 1.115.6; 1.115.8; 1.115.10; 1.115.12;
Kill BPF_KERN_FILTER. Seems like it died with the new pppd import.
No replies from tech-kern@, but the person who introduced this option 8 years ago
(Christos) said it's ok to remove it.
 1.114  24-Dec-2005  perry Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.113  14-Dec-2005  rpaulo Correct typo in comments.
 1.112  11-Dec-2005  christos merge ktrace-lwp.
 1.111  05-Sep-2005  rpaulo Use ANSI function declarations everywhere and a consistent indentation on
them.
 1.110  04-Aug-2005  rpaulo Implemented the kernel part of BPF statistics and BPF peers, net.bpf.stats
and net.bpf.peers sysctls respectively.

A new structure was added to describe the external (user viewable)
representation of a BPF file; a new entry was added to the bpf_d
structure to store the PID of the calling process; a simple_lock was added
to protect the insert/removal from the net.bpf.peers sysctl handler.

This idea came from FreeBSD (Christian S.J. Peron) but while it is
implemented with sysctl's it differs a bit.

Reviewed by: christos@ and atatat@ (who gave me the tip for the net.bpf.peers
sysctl helper function).
 1.109  22-Jun-2005  peter branches: 1.109.2;
Missing m_freem() in bpf_write. PR/29138.
 1.108  20-Jun-2005  atatat Change the rest of the sysctl subsystem to use const consistently.
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
 1.107  26-Feb-2005  perry nuke trailing whitespace
 1.106  12-Feb-2005  christos pass the flag to fdclone.
 1.105  30-Nov-2004  christos branches: 1.105.4; 1.105.6;
Clonify bpf. I am not changing /dev/bpfX -> /dev/bpf until all userland
programs have been fixed.
 1.104  19-Aug-2004  christos Factor out the hand-crafting of mbufs from the interface files. Reviewed by
gimpy. XXX: I could have used bpf_mtap2 on some of the new functions, but I
chose not to, because I just wanted to do what amounts to a code move.
 1.103  19-Aug-2004  christos - ansify
- remove unnecessary casts
- change caddr_t to void *
- no functional change.
 1.102  05-Aug-2004  enami Don't refuse to attach an interface even if it is down so that one can
capture the very first packet when an interface is up.
 1.101  06-Jun-2004  dyoung Per Matt Thomas' and Darren Reed's suggestions:

Add bpf_deliver prototype.

Rename bpf_measure to m_length and move it to sys/sys/mbuf.h. I
make m_length an inline function in the header file to preserve
its performance characteristics, for better or for worse.

Optimize m_length: use the length in m_pkthdr.len, if M_PKTHDR.

In bpf_deliver, zero the on-stack mbuf before we do anything else
with it.
 1.100  29-May-2004  darrenr back out previous change - these diffs aren't what I'd tested.
 1.99  29-May-2004  darrenr add mmap(2) interface to bpf(4) devices, along with BIOCMMAPINFO ioctl call
for applications to interact with the bpf device for the purpose of using
mmap to examine captured data.
 1.98  25-May-2004  atatat Sysctl descriptions under net subtree (net.key not done)
 1.97  19-May-2004  darrenr reapply a change that got undone with more recent changes to bpf to wakeup
any sleepers _after_ the device info has been updated, not before.
 1.96  30-Apr-2004  dyoung Add bpf_mtap2, which taps a packet whose head is in a void *buffer
and whose tail is in an mbuf chain.
 1.95  20-Apr-2004  darrenr If we timeout waiting for data on the bpf device, allow data in the current
storage buffer (bd_sbuf) to indicate that there is data present.
 1.94  15-Apr-2004  darrenr Add a count of the number of packets that match the bpf filter applied to a
particular device. In doing this, make a new bpf_stat structure with
members that are u_long rather than u_int, matching the counters in the bpf_d.
The original bpf_stat is now bpf_stat_old, and the original ioctl
is preserved as BIOCGSTATSOLD.
 1.93  14-Apr-2004  darrenr * from bpf 1.2a1, use the IO_NDELAY flag in bpfread() to indicate whether or
not a read operation should be allowed to sleep. This allows the use of
bd_rtout with a value of "-1" to be eliminated (signed comparison and
assignment to an unsigned long.)
* in 1.91, a change was introduced that had bpfpoll() returning POLLRDNORM
set when the timeout expired. This impacted poorly on performance as well
as causing select to return an fd available for reading when it wasn't.
Change the behaviour here to only allow the possibility of POLLIN being
returned as active in the event of a timeout.
 1.92  11-Apr-2004  darrenr from freebsd's kern/36219, the if expression in deciding whether or not
to return something checks the value of bd_state in the wrong place.
 1.91  10-Apr-2004  darrenr Fix bpf so that select will return for a timeout (from FreeBSD.)

Fix the behaviour of BIOCIMMEDIATE (fix from LBL BPF code via FreeBSD.)

In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf (based on
similar change FreeBSD but fixes BIOC*SEESENT issue with that.)

Copy the implementation of BIOCSSEESENT, BIOCGSEESENT by FreeBSD.

Review Assistance: Guy Harris

PRs: kern/8674, kern/12170
 1.90  24-Mar-2004  atatat branches: 1.90.2;
Tango on sysctl_createv() and flags. The flags have all been renamed,
and sysctl_createv() now uses more arguments.
 1.89  22-Jan-2004  jonathan Make bpf_maxbufsize writable via sysctl, as written by Andrew Brown.
 1.88  21-Jan-2004  jonathan Fix an Emacs finger-glitch (missing semicolon#).
 1.87  21-Jan-2004  jonathan Update bpf buffer parameters, as per recent discussion on tech-net.

Increase the default bpf buffer size used by naive apps that don't do
BIOCSBLEN, from 8k to 32k. The former value of 8192 is too small to
hold a normal jumbo Ethernet frame (circa 9k), 16k is a little small
for Large-jumbo (~16k) frames supported by newer gigabit
Ethernet/10Gbe, so (somewhat arbitrarily) increase the default to 32k.

Increase the upper limit to which BIOCSBLEN can raise bpf buffer-size
drastically, to 1 Mbyte. State-of-the-art for packet capture circa
1999 was around 256k; savvy NetBSD developers now use 1 Mbyte.
Note that libpcap has been updated to do binary-search on BIOCSBLEN
values up to 1 Mbyte.

Work is in progress to make both values sysctl'able. Source comments
note that consensus on tech-net is that we should find some heuristic
to set the boot-time default values dynamically, based on system memory.
 1.86  22-Sep-2003  christos - pass signo to fownsignal [ok by jd]
- make urg signal handling use fownsignal
- remove out of band detection in sowakeup
 1.85  21-Sep-2003  jdolecek cleanup & uniform descriptor owner handling:
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate semantics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl

change discussed on tech-kern@
 1.84  13-Aug-2003  wrstuden Include correct file for defopt.
 1.83  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.82  29-Jun-2003  fvdl branches: 1.82.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.81  28-Jun-2003  darrenr From OpenBSD 1.33-1.34:
When using bpf(4) in immediate mode, and using kevent(2) to receive
notification of packet arrival, the usermode application isn't notified
until a second packet arrives.

This is because KNOTE() calls filt_bpfread() before bd_slen has been
updated with the newly arrived packet length, so it looks like there
is no data there.

Moving the bpf_wakeup() call for immediate mode to after bd_slen is set
fixes it.

From: wayne@epipe.com.au in pr 3175
 1.80  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.79  19-Jun-2003  itojun avoid panic in malloc() under extremely low memory situation.
OpenBSD problem report 2235, 2236, 2640. fix by Otto Moerbeek.
 1.78  13-Mar-2003  dsl Check that the process/process group id passed to TIOCSPGRP is in the session
of the current process.
 1.77  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.76  26-Nov-2002  christos si_ -> sel_
 1.75  23-Oct-2002  jdolecek merge kqueue branch into -current

kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals

kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)

based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
 1.74  25-Sep-2002  thorpej Don't include <sys/map.h>.
 1.73  24-Sep-2002  itojun backout recent changes, for PR 18392.
bpf_mtap() gets called with not-well-initialized mbuf, so we need to go through
it without touching m->m_pkthdr.len and such. it's part of our bpf_mtap() API
(at least today).
 1.72  19-Sep-2002  atatat Add a missing semi-colon.
 1.71  19-Sep-2002  darrenr For the trivial case where the packet is only in one mbuf, call bpf_tap()
(idea from FreeBSD) - alternative to changing bpf_filter() to be aware of
kernel calling convention where 0 is passed as the length for mbufs.
 1.70  19-Sep-2002  darrenr If M_PKTHDR is set we can use m_pkthdr.len instead of the for loop.
 1.69  15-Sep-2002  thorpej In bpf_setdlt(), preserve the promiscuous mode setting of the
descriptor.

From David Young <dyoung@ojctech.com>, slight change by me.
 1.68  11-Sep-2002  itojun KNF - return is not a function.
 1.67  06-Sep-2002  gehenna Merge the gehenna-devsw branch into the trunk.

This merge changes the device switch tables from static array to
dynamically generated by config(8).

- All device switches are defined as constant structures in device drivers.

- The new grammar ``device-major'' is introduced to ``files''.

device-major <prefix> char <num> [block <num>] [<rules>]

- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammar.

- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.

- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.

- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.

- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.
 1.66  28-Aug-2002  onoe Define new kernel interface bpfattach2() to register another data link
type for the driver, which will be used for 802.11 drivers.
Also add 2 APIs to get a list of available DLTs and use one of them.
BIOCGDLTLIST (struct bpf_dltlist)
BIOCSDLT (u_int)
 1.65  06-Jun-2002  wrstuden defparam BPF_BUFSIZE
 1.64  23-Mar-2002  darrenr branches: 1.64.2;
If someone is poll'ing to write to bpf, assume that it can always be done
and include POLLOUT and POLLWRNORM in the returned events flag set.
Derived from FreeBSD.
 1.63  12-Nov-2001  lukem add RCSIDs
 1.62  10-Sep-2001  bjh21 Add MI Econet support. This is lacking any interfaces to higher-layer
protocols, and lacking any timeouts, but it basically works, doing four-way
handshakes in both directions and incoming Machine Peek operations.

Oh, and Econet is Acorn's ancient, proprietary 500kbit/s networking
technology.
 1.61  13-Apr-2001  thorpej branches: 1.61.2; 1.61.4;
Remove the use of splimp() from the NetBSD kernel. splnet()
and only splnet() is allowed for the protection of data structures
used by network devices.
 1.60  29-Dec-2000  thorpej branches: 1.60.2;
Fix non-blocking BPF reads, from Guy Harris, kern/11836.
 1.59  12-Dec-2000  thorpej Use <net/dlt.h> to get the DLT_* constants. Also change bpfattach()
and bpf_change_type() to take just a pointer to the ifnet, rather than
a pointer to the ifnet and a pointer to a member of the ifnet (the bpf
pointer).

We'll let this ride on the Dec 12 1.5N version bump.
 1.58  04-Jul-2000  thorpej Move ifpromisc() to if.c
 1.57  28-May-2000  jhawk branches: 1.57.2;
Ensure that all callers of pfind() can deal with pfind(0) returning
a real procp* rather than NULL.
 1.56  28-May-2000  matt Fix bpf output on fddi to actually work. Make it compatible with ULTRIX
and Tru64.
 1.55  12-May-2000  jonathan branches: 1.55.2;
Make BPF_BUFSIZE overridable: 8192 is smaller than MTU of some devices.
TODO: defopt, or make sysctl'able (c.f. FreeBSD).
 1.54  12-Apr-2000  chs remove support for sunos and ancient BSDs.
 1.53  30-Mar-2000  augustss Kill some more register declarations.
 1.52  13-Mar-2000  soren Fix doubled 'the's in comments.
 1.51  02-Feb-2000  enami Revoke bpf device on detach.
 1.50  02-Feb-2000  enami Since we are allowed to wait, no need to check the return value.
 1.49  02-Feb-2000  enami Remove duplicated forward declarations.
 1.48  31-Jan-2000  thorpej Implement bpfdetach().
 1.47  11-May-1999  thorpej branches: 1.47.2;
* Add the ability to change the data link type on the fly.
* Define two more data link types: NetBSD PPP-over-serial and NetBSD
PPP-over-Ethernet. (Different PPP encaps have different header formats!)
 1.46  04-Dec-1998  bouyer branches: 1.46.2; 1.46.6;
Init the descriptors at boot time rather than at interface attach time.
Now that we have pcmcia hot-plug, it's not the same. Fixes kern/3189.
 1.45  05-Nov-1998  jonathan Increase compiled-in default bpf buffer size from 4096 to 8192.
(the libpcap API provides no way to resize the in-kernel buffer, and
4096 is too small to capture maximum-sized FDDI frames.)
 1.44  18-Aug-1998  thorpej Add some braces to make egcs happy (ambiguous else warning).
 1.43  06-Aug-1998  perry Sigh. "consts in prototypes can be quite a drag..."
fix last two fixes one more time, this time dealing with ugly
prototype issues, including the fact that the bcopy returns nothing,
but memcpy returns a void *. Never mind that we don't use it...
 1.42  06-Aug-1998  perry Fix botched prototype decl in last fix.
 1.41  06-Aug-1998  perry Convert bcopy,bzero to memcpy,memset
This was semi-nontrivial, since a function pointer to bcopy gets used
in this file.
Note #1: The catchpacket routine, which takes a function pointer to
bpf_mcpy or memcpy, should probably be converted to take a
flag that just says which is used, so memcpy can be inlined.
Note #2: The code is heavily #ifdef'ed to run on older operating
systems. We probably want to clean that cruft out, unless
someone is planning a new release of the code at LBL (doubtful.)
 1.40  30-Apr-1998  thorpej Implement two new BPF ioctls: BIOCGHDRCMPLT and BIOCSHDRCMPLT, to get/set
the "header already complete" flag. This allows BPF writers to spoof
layer 2 source addresses (providing the layer 2 in use supports it) in
applications where this is necessary. From Greg Smith <greg@nas.nasa.gov>.
 1.39  01-Mar-1998  fvdl Merge with Lite2 + local changes
 1.38  12-Oct-1997  mycroft Do *not* free the mbuf chain we just created.
 1.37  09-Oct-1997  christos GC bd_sig
 1.36  09-Oct-1997  christos Sync with bpf-1.2a1
- whitespace
- add rcsid; our sccsid is newer than the one on 1.2a1.
- change prototype to add mtu
- change size_t to u_int for consistency.
- add alignment stuff in bpf_movein
- add more consistency checks bpf_movein
- use one uiomove and then bcopy the data in bpf_movein
- update the comment for the panic when ifpromisc fails.
- separate the case when we have non blocking I/O and
no data and return EWOULDBLOCK
- check for other errors and return them
- pass the mtu to bpf_movein
- Add the BPF_KERN_FILTER junk, just so that we keep up with the code
- remove BIOCSRSIG, BIOCGRSIG; SIGIO does this well.
- don't add the SIOCGIFADDR stuff (it is bogus)
- Check for malloc return for consistency.
- comment should say poll
- change formatting to match the current code.
- save and restore the pcount and flags in case we fail to set the
interface into promiscuous mode.
- fix spelling typo.
 1.35  17-Mar-1997  scottr branches: 1.35.4;
if_arc.h is in net, not netinet.
 1.34  15-Mar-1997  is New ARP system, supports IPv4 over any hardware link.

Some of the stuff (e.g., rarpd, bootpd, dhcpd etc., libsa) still will
only support Ethernet. Tcpdump itself should be ok, but libpcap needs
a lot of work.

For the detailed change history, look at the commit log entries for
the is-newarp branch.
 1.33  21-Feb-1997  thorpej Don't let the read timeout get inadvertently rounded down to 0.
From John Hawkinson <jhawk@mit.edu>, PR #2531.
 1.32  13-Oct-1996  christos branches: 1.32.4;
backout previous kprintf change
 1.31  10-Oct-1996  christos - printf -> kprintf, sprintf -> ksprintf
 1.30  07-Sep-1996  mycroft Implement poll(2).
 1.29  14-Jun-1996  cgd avoid unnecessary checks of m_get/MGET/etc.'s return values. When
they're called with M_WAIT, they are defined to never return NULL.
 1.28  22-May-1996  mycroft Remove duplicate definition of bpf_setif().
 1.27  07-May-1996  thorpej Kill a couple of unnecessary calls to strlen().
 1.26  07-May-1996  thorpej Changed struct ifnet to have a pointer to the softc of the underlying
device and a printable "external name" (name + unit number), thus eliminating
if_name and if_unit. Updated interface to (*if_watchdog)() and (*if_reset)()
to take a struct ifnet *, rather than a unit number.
 1.25  30-Mar-1996  christos Eliminate need for and remove net_conf.h
 1.24  13-Feb-1996  christos Net prototypes
 1.23  27-Sep-1995  thorpej Enhancements to the bpf from Stu Grossman <grossman@cygnus.com>:
* grok FIONBIO, FIOASYNC, and TIOC{G,S}PGRP
* add BIOC{G,S}RSIG; get/set the signal to be delivered
to the process or process group upon packet reception.
Defaults to SIGIO.
 1.22  13-Aug-1995  mycroft Don't pass through SIOCGIFADDR, per Steve McCanne.
 1.21  12-Aug-1995  mycroft splnet --> splsoftnet
 1.20  23-Jul-1995  mycroft For outgoing packets, always allocate a header mbuf and fill it in.
 1.19  22-Apr-1995  cgd copy routines should take size_t lengths for prototype consistency.
don't assume that tick is >= 1000; loses badly on alpha (div. by zero)
only try unaligned copies if NetBSD's UNALIGNED_ACCESS symbol is defined.
various misc type size cleanups, mostly short -> int16_t.
 1.18  22-Mar-1995  mycroft Fix panic when an interface in promiscuous mode goes down and the BPF user
tries to turn off promiscuous mode. From Lon Willett.
 1.17  23-Feb-1995  glass preliminary arcnet support. uses lame but RFC address resolution
 1.16  30-Oct-1994  cgd be more careful with types, also pull in headers where necessary.
 1.15  15-Jul-1994  cgd don't use inline, use __inline, like cdefs intends (so it can kill it if nongcc
 1.14  29-Jun-1994  cgd branches: 1.14.2;
this is what cdefs.h is for
 1.13  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.12  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.11  25-Jan-1994  deraadt new from mccanne. be afraid.
 1.10  12-Jan-1994  mycroft Get the pkthdr.len calculation right.
 1.9  12-Jan-1994  deraadt writing out of bpf; use a hdr mbuf and set the pkthdr.len as well.
(rarpd now works with if_ep.c!)
 1.8  18-Dec-1993  mycroft Canonicalize all #includes.
 1.7  23-Nov-1993  cgd defines change
 1.6  15-Nov-1993  deraadt add bpfilterattach(), as in magnum
 1.5  18-May-1993  cgd branches: 1.5.4;
make kernel select interface be one-stop shopping & clean it all up.
 1.4  09-Apr-1993  glass fixes stupid piece of bpf code that duplicates cdefs.h's handling of
'inline' in such a way as to cause stupid warnings.
 1.3  05-Apr-1993  deraadt selwakeup() takes a "pid_t" rather than "struct proc *" now.
 1.2  25-Mar-1993  cgd added BPF support, as provided by David Greenman (davidg@implode.rain.com)
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  01-Mar-1998  fvdl Import 4.4BSD-Lite2
 1.1.1.2  01-Mar-1998  fvdl Import 4.4BSD-Lite for reference
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.5.4.5  03-Dec-1993  mycroft Patch from Andrew Moore <alm@netcom.com> to make sure the ether type field is
correct when sending raw packets.
 1.5.4.4  27-Nov-1993  mycroft Remove remaining sleep()s.
 1.5.4.3  23-Nov-1993  cgd defines change
 1.5.4.2  09-Oct-1993  mycroft Add dummy bpfilterattach() to make autoconfig happy.
 1.5.4.1  24-Sep-1993  mycroft Make all files using spl*() #include cpu.h. Changes from trunk.
 1.14.2.1  15-Jul-1994  cgd updates from trunk. basically, C language errors.
 1.32.4.3  12-Mar-1997  is Merge in changes from The Trunk
 1.32.4.2  09-Mar-1997  is netinet/if_ether.h -> netinet/if_inarp.h
 1.32.4.1  07-Feb-1997  is Snapshot of new ARP code.

Our old ARP code was hardwired for 6-byte length medium
addresses, while the protocol is designed for any size.

This snapshot contains a first hack at getting rid of
Ethernet specific data structures. The ep driver is updated
(and tested on the PCI bus), the iy and fpa drivers have been
updated, but not real life tested yet.

If you want to test this with other drivers, you have to update
them first yourself, and probably tag the relevant directories.
Better contact me if you want to do this.
 1.35.4.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.46.6.1  21-Jun-1999  thorpej Sync w/ -current.
 1.46.2.1  11-Dec-1998  kenh The beginnings of interface detach support. Still some bugs, but mostly
works for me.

This work was originally by Bill Studenmund, and cleaned up by me.
 1.47.2.4  21-Apr-2001  bouyer Sync with HEAD
 1.47.2.3  05-Jan-2001  bouyer Sync with HEAD
 1.47.2.2  13-Dec-2000  bouyer Sync with HEAD (for UBC fixes).
 1.47.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.55.2.1  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.57.2.1  25-Jan-2001  jhawk Pull up revision 1.60 (requested by thorpej):
Fix non-blocking BPF reads. Fixes PR kern/11836.
 1.60.2.9  11-Dec-2002  thorpej Sync with HEAD.
 1.60.2.8  11-Nov-2002  nathanw Catch up to -current
 1.60.2.7  18-Oct-2002  nathanw Catch up to -current.
 1.60.2.6  17-Sep-2002  nathanw Catch up to -current.
 1.60.2.5  20-Jun-2002  nathanw Catch up to -current.
 1.60.2.4  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.60.2.3  14-Nov-2001  nathanw Catch up to -current.
 1.60.2.2  21-Sep-2001  nathanw Catch up to -current.
 1.60.2.1  21-Jun-2001  nathanw Catch up to -current.
 1.61.4.3  01-Oct-2001  fvdl Catch up with -current.
 1.61.4.2  26-Sep-2001  fvdl * add a VCLONED vnode flag that indicates a vnode representing a cloned
device.
* rename REVOKEALL to REVOKEALIAS, and add a REVOKECLONE flag, to pass
to VOP_REVOKE
* the revoke system call will revoke all aliases, as before, but not the
clones
* vdevgone is called when detaching a device, so make it use REVOKECLONE
to get rid of all clones as well
* clean up all uses of VOP_OPEN wrt. locking.
* add a few VOPS to spec_vnops that need to do something when it's a
clone vnode (access and getattr)
* add a copy of the vnode vattr structure of the original 'master' vnode
to the specinfo of a cloned vnode. could possibly redirect getattr to
the 'master' vnode, but this has issues with revoke
* add a vdev_reassignvp function that disassociates a vnode from its
original device, and reassociates it with the specified dev_t. to be
used by cloning devices only, in case a new minor is allocated.
* change all direct references in drivers to v_devcookie and v_rdev
to vdev_privdata(vp) and vdev_rdev(vp). for diagnostic purposes
when debugging race conditions that still exist wrt. locking and
revoking vnodes.
* make the locking state of a vnode consistent when passed to
d_open and d_close (unlocked). locked would be better, but has
some deadlock issues
 1.61.4.1  07-Sep-2001  thorpej Commit my "devvp" changes to the thorpej-devvp branch. This
replaces the use of dev_t in most places with a struct vnode *.

This will form the basic infrastructure for real cloning device
support (besides being architecturally cleaner -- it'll be good
to get away from using numbers to represent objects).
 1.61.2.7  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.61.2.6  02-Oct-2002  jdolecek do not need the (void *) cast for kn_hook anymore
 1.61.2.5  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.61.2.4  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.61.2.3  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.61.2.2  13-Sep-2001  thorpej Update the kqueue branch to HEAD.
 1.61.2.1  08-Sep-2001  thorpej Add kqueue support.
 1.64.2.3  29-Aug-2002  gehenna catch up with -current.
 1.64.2.2  20-Jun-2002  gehenna catch up with -current.
 1.64.2.1  16-May-2002  gehenna Add the character device switch.
Replace the direct-access to devsw table with calling devsw APIs.
 1.82.2.10  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.82.2.9  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.82.2.8  15-Feb-2005  skrll Sync with HEAD.
 1.82.2.7  18-Dec-2004  skrll Sync with HEAD.
 1.82.2.6  21-Sep-2004  skrll Fix the sync with head I botched.
 1.82.2.5  18-Sep-2004  skrll Sync with HEAD.
 1.82.2.4  25-Aug-2004  skrll Sync with HEAD.
 1.82.2.3  12-Aug-2004  skrll Sync with HEAD.
 1.82.2.2  03-Aug-2004  skrll Sync with HEAD
 1.82.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.90.2.2  28-May-2004  tron Pull up revision 1.98 (requested by atatat in ticket #391):
Sysctl descriptions under net subtree (net.key not done)
 1.90.2.1  21-Apr-2004  jmc Pullup rev 1.91-1.95 (requested by darrenr in ticket #167)

Reduce bpf buffer to 32k from 1M to reduce kernel memory usage from userland
binaries.
Fix bpf so that select will return for a timeout.
Fix the behaviour of BIOCIMMEDIATE.
In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf.
Various other bpf fixes, including PR#8674, PR#12170
 1.105.6.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.105.4.1  29-Apr-2005  kent sync with -current
 1.109.2.9  24-Mar-2008  yamt sync with head.
 1.109.2.8  17-Mar-2008  yamt sync with head.
 1.109.2.7  27-Feb-2008  yamt sync with head.
 1.109.2.6  21-Jan-2008  yamt sync with head
 1.109.2.5  07-Dec-2007  yamt sync with head
 1.109.2.4  03-Sep-2007  yamt sync with head.
 1.109.2.3  30-Dec-2006  yamt sync with head.
 1.109.2.2  21-Jun-2006  yamt sync with head.
 1.109.2.1  07-Jul-2005  yamt de-constify mbuf.
 1.115.12.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.115.10.4  11-May-2006  elad sync with head
 1.115.10.3  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.115.10.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.115.10.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.115.8.3  03-Sep-2006  yamt sync with head.
 1.115.8.2  11-Aug-2006  yamt sync with head
 1.115.8.1  24-May-2006  yamt sync with head.
 1.115.6.1  01-Jun-2006  kardel Sync with head.
 1.115.4.1  09-Sep-2006  rpaulo sync with head
 1.117.4.1  13-Jul-2006  gdamore Merge from HEAD.
 1.122.4.2  10-Dec-2006  yamt sync with head.
 1.122.4.1  22-Oct-2006  yamt sync with head
 1.122.2.1  18-Nov-2006  ad Sync with head.
 1.125.4.1  12-Mar-2007  rmind Sync with HEAD.
 1.126.4.1  11-Jul-2007  mjf Sync with head.
 1.126.2.5  15-Jul-2007  ad Sync with head.
 1.126.2.4  15-Jul-2007  ad Sync with head.
 1.126.2.3  01-Jul-2007  ad Adapt to callout API change.
 1.126.2.2  09-Jun-2007  ad Sync with head.
 1.126.2.1  10-Apr-2007  ad Changes to select/poll:

- Make them MP safe and decouple from the proc locks.
- selwakeup: don't call p_find, or traverse per-proc LWP lists (ouch).
- selwakeup: don't lock the sleep queue unless we need to.
 1.130.16.2  26-Dec-2007  ad Sync with head.
 1.130.16.1  08-Dec-2007  ad Sync with head.
 1.130.14.2  27-Dec-2007  mjf Sync with HEAD.
 1.130.14.1  08-Dec-2007  mjf Sync with HEAD.
 1.130.8.2  23-Mar-2008  matt sync with HEAD
 1.130.8.1  09-Jan-2008  matt sync with HEAD
 1.130.6.1  09-Dec-2007  jmcneill Sync with HEAD.
 1.131.4.1  02-Jan-2008  bouyer Sync with HEAD
 1.133.6.5  17-Jan-2009  mjf Sync with HEAD.
 1.133.6.4  29-Jun-2008  mjf Sync with HEAD.
 1.133.6.3  02-Jun-2008  mjf Sync with HEAD.
 1.133.6.2  03-Apr-2008  mjf Sync with HEAD.
 1.133.6.1  29-Mar-2008  mjf - etc/devfsd.conf: Add some rules to give nodes like /dev/tty and
/dev/null better default modes, i.e. 0666.

- sbin/init: Run devfsd -s before going to multiuser.

- sys/arch: Provide arm32, i386, sparc with a mem_init() function to request
device nodes for /dev/null, /dev/zero, etc.

- sys/dev: Convert rnd, wd, agp, raid, cd, sd, wsdisplay, wskbd, wsmouse,
wsmux, tty, bpf, swap to devfs New World Order.

- sys/fs/devfs: Make the visibility attribute of device nodes configurable.
Also provide a function to mount a devfs on boot.

- sys/kern: Add a new boot flag, -n. This disables devfs support. Unless
the -n flag is specified the kernel will mount a devfs file
system on boot.
 1.133.2.1  24-Mar-2008  keiichi sync with head.
 1.137.4.3  17-Jun-2008  yamt sync with head.
 1.137.4.2  04-Jun-2008  yamt sync with head
 1.137.4.1  18-May-2008  yamt sync with head.
 1.137.2.3  28-Dec-2008  christos back to usecs now for source compatibility
 1.137.2.2  01-Nov-2008  christos Sync with head.
 1.137.2.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.139.4.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.139.2.3  11-Aug-2010  yamt sync with head.
 1.139.2.2  11-Mar-2010  yamt sync with head
 1.139.2.1  04-May-2009  yamt sync with head.
 1.140.2.1  18-Jun-2008  simonb Sync with head.
 1.141.6.3  11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1874):
sys/net/bpf.c: revision 1.176 via patch
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.141.6.2  05-Apr-2011  riz branches: 1.141.6.2.2;
Pull up following revision(s) (requested by bouyer in ticket #1587):
sys/net/bpf.c: revision 1.163
Allocate buffers with (M_WAITOK | M_CANFAIL) instead of M_NOWAIT.
M_NOWAIT causes dhcpd on a low-memory server with lots of interfaces to
occasionally fail to start with ENOBUFS; (M_WAITOK | M_CANFAIL) seems to
fix this.
Tested on 3 different dhcp servers.
 1.141.6.1  04-Apr-2009  snj branches: 1.141.6.1.6;
Pull up following revision(s) (requested by ad in ticket #661):
sys/arch/xen/xen/xenevt.c: revision 1.32
sys/compat/svr4/svr4_net.c: revision 1.56
sys/compat/svr4_32/svr4_32_net.c: revision 1.19
sys/dev/dmover/dmover_io.c: revision 1.32
sys/dev/putter/putter.c: revision 1.21
sys/kern/kern_descrip.c: revision 1.190
sys/kern/kern_drvctl.c: revision 1.23
sys/kern/kern_event.c: revision 1.64
sys/kern/sys_mqueue.c: revision 1.14
sys/kern/sys_pipe.c: revision 1.109
sys/kern/sys_socket.c: revision 1.59
sys/kern/uipc_syscalls.c: revision 1.136
sys/kern/vfs_vnops.c: revision 1.164
sys/kern/uipc_socket.c: revision 1.188
sys/net/bpf.c: revision 1.144
sys/net/if_tap.c: revision 1.55
sys/opencrypto/cryptodev.c: revision 1.47
sys/sys/file.h: revision 1.67
sys/sys/param.h: patch
sys/sys/socketvar.h: revision 1.119
Add fileops::fo_drain(), to be called from fd_close() when there is more
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
 1.141.6.2.2.1  11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1874):
sys/net/bpf.c: revision 1.176 via patch
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.141.6.1.6.1  11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #1874):
sys/net/bpf.c: revision 1.176 via patch
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.141.4.2  28-Apr-2009  skrll Sync with HEAD.
 1.141.4.1  19-Jan-2009  skrll Sync with HEAD.
 1.142.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.155.2.1  30-Apr-2010  uebayasi Sync with HEAD.
 1.156.2.4  12-Jun-2011  rmind sync with head
 1.156.2.3  21-Apr-2011  rmind sync with head
 1.156.2.2  05-Mar-2011  rmind sync with head
 1.156.2.1  30-May-2010  rmind sync with head
 1.160.4.1  08-Feb-2011  bouyer Sync with HEAD
 1.160.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.164.2.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.166.6.1  18-Feb-2012  mrg merge to -current.
 1.166.2.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.166.2.2  30-Oct-2012  yamt sync with head
 1.166.2.1  17-Apr-2012  yamt sync with head
 1.168.8.1  11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #941):
sys/net/bpf.c: revision 1.176
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.168.6.1  11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #941):
sys/net/bpf.c: revision 1.176
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.168.2.1  11-Sep-2013  msaitoh Pull up following revision(s) (requested by spz in ticket #941):
sys/net/bpf.c: revision 1.176
PR/48198: Peter Bex: Avoid kernel panic caused by setting a very small bpf
buffer size.
 1.171.2.3  03-Dec-2017  jdolecek update from HEAD
 1.171.2.2  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.171.2.1  20-Nov-2012  tls Resync to 2012-11-19 00:00:00 UTC
 1.173.2.1  18-May-2014  rmind sync with head
 1.182.2.1  10-Aug-2014  tls Rebase.
 1.187.2.1  21-Sep-2014  snj Pull up following revision(s) (requested by rmind in ticket #106):
sys/net/bpf.c: revision 1.189
PR/49190: bpf_deliver: set scratch memory store in bpf_args_t.
 1.189.2.8  28-Aug-2017  skrll Sync with HEAD
 1.189.2.7  05-Feb-2017  skrll Sync with HEAD
 1.189.2.6  05-Oct-2016  skrll Sync with HEAD
 1.189.2.5  09-Jul-2016  skrll Sync with HEAD
 1.189.2.4  19-Mar-2016  skrll Sync with HEAD
 1.189.2.3  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.189.2.2  06-Jun-2015  skrll Sync with HEAD
 1.189.2.1  06-Apr-2015  skrll Sync with HEAD
 1.199.2.4  20-Mar-2017  pgoyette Sync with HEAD
 1.199.2.3  26-Jul-2016  pgoyette Rename LOCALCOUNT_INITIALIZER to DEVSW_MODULE_INIT. This better describes
what we're doing, and why.
 1.199.2.2  19-Jul-2016  pgoyette Instead of repeatedly typing the conditional initialization of the
.d_localcount members in the various {b,c}devsw, define an initializer
macro and use it. This also removes the need for defining new symbols
for each 'struct localcount'.

As suggested by riastradh@
 1.199.2.1  17-Jul-2016  pgoyette Adapt some modular drivers to the localcount(9) world. We're still
not actually using the localcount stuff, but we need to differentiate
between built-in vs loaded drivers and allocate a "struct localcount"
only for loaded drivers.
 1.203.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.216.6.9  04-Aug-2023  martin Apply patch, requested by ozaki-r in ticket #1885:

sys/net/bpf.c (apply patch)

bpf: allow to read with no filter (regressed at revision 1.213,
fixed differently in -current)
 1.216.6.8  22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1802):

sys/net/bpf.c: revision 1.247 (manually merged)

bpf(4): Reject bogus timeout values before arithmetic overflows.
 1.216.6.7  04-Aug-2019  martin Pull up following revision(s) (requested by maxv in ticket #1323):

sys/net/bpf.c: revision 1.229

Fix info leak: use kmem_zalloc, because we align the buffers, and the
otherwise uninitialized padding bytes get copied to userland in bpf_read().
 1.216.6.6  15-May-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #826):

sys/net/if_bridge.c: revision 1.155
sys/net/if.c: revision 1.421
sys/net/bpf.c: revision 1.224
sys/net/if.c: revision 1.422
sys/net/if.c: revision 1.423

Use if_is_mpsafe (NFC)

Protect packet input routines with KERNEL_LOCK and splsoftnet
if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.
if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect
non-MP-safe code (e.g., carp_input and agr_input).

Pointed out by mlelstv@

Protect if_deferred_start_softint with KERNEL_LOCK if the interface isn't
MP-safe
 1.216.6.5  05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #526):
sys/net/bpfdesc.h: revision 1.45
sys/net/bpf.c: revision 1.223
Abandon unnecessary softint
The softint was introduced to defer fownsignal that was called in bpf_wakeup to
softint at v1.139, but now bpf_wakeup always runs in softint so we don't need
the softint anymore.
 1.216.6.4  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface has both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
operations take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequent functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoiding it makes life easier.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created an unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE alone gains little performance benefit because
the network stack is still serialized by the big kernel locks by default.
 1.216.6.3  21-Dec-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #454):
sys/net/bpf.c: revision 1.222
Make softint and callout MP-safe
 1.216.6.2  21-Dec-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #446):
sys/net/bpf.c: revision 1.221
Fix panic in callout_halt (fix typo)
Reported by wiz@
 1.216.6.1  25-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #329):
sys/net/bpf.c: revision 1.217
Turn on D_MPSAFE flag of bpf_cdevsw that is already MP-safe
Pointed out by k-goda@IIJ
 1.216.4.2  29-Apr-2017  pgoyette Remove more unnecessary #include for sys/localcount.h
 1.216.4.1  27-Apr-2017  pgoyette Restore all work from the former pgoyette-localcount branch (which is
now abandoned due to cvs merge botch).

The branch now builds, and installs via anita. There are still some
problems (cgd is non-functional and all atf tests time-out) but they
will get resolved soon.
 1.223.2.4  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.223.2.3  28-Jul-2018  pgoyette Sync with HEAD
 1.223.2.2  25-Jun-2018  pgoyette Sync with HEAD
 1.223.2.1  21-May-2018  pgoyette Sync with HEAD
 1.226.2.3  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.226.2.2  08-Apr-2020  martin Merge changes from current as of 20200406
 1.226.2.1  10-Jun-2019  christos Sync with HEAD
 1.229.2.4  13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #1886):

sys/net/bpfdesc.h: revision 1.49
sys/net/bpf.c: revision 1.256
sys/net/bpf.c: revision 1.257
sys/net/bpfdesc.h: revision 1.50

bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.229.2.3  04-Aug-2023  martin Apply patch, requested by ozaki-r in ticket #1708:

sys/net/bpf.c (apply patch)

bpf: allow to read with no filter (regressed at revision 1.213,
fixed differently in -current)
 1.229.2.2  22-Feb-2023  martin Pull up following revision(s) (requested by riastradh in ticket #1605):

sys/net/bpf.c: revision 1.247 (manually merged)

bpf(4): Reject bogus timeout values before arithmetic overflows.
 1.229.2.1  16-Oct-2019  martin Pull up following revision(s) (requested by maxv in ticket #335):

sys/net/bpf.c: revision 1.230
sys/net/bpf.c: revision 1.231

Add KASSERT to catch bugs. Something tells me it could easily fire.

-

As I suspected, the KASSERT I added yesterday can fire if we try to process
zero-sized packets. Skip them to prevent a type confusion that can trigger
random page faults later.
 1.232.2.2  29-Feb-2020  ad Sync with head.
 1.232.2.1  25-Jan-2020  ad Sync with head.
 1.238.2.1  03-Jan-2021  thorpej Sync w/ HEAD.
 1.239.4.2  01-Aug-2021  thorpej Sync with HEAD.
 1.239.4.1  17-Jun-2021  thorpej Sync w/ HEAD.
 1.249.2.3  13-Sep-2024  martin Pull up following revision(s) (requested by ozaki-r in ticket #858):

sys/net/bpfdesc.h: revision 1.49
sys/net/bpf.c: revision 1.256
sys/net/bpf.c: revision 1.257
sys/net/bpfdesc.h: revision 1.50

bpf: restore wakeup softint

This change fixes the issue that fownsignal which can take an
adaptive mutex is called inside a pserialize read section in
bpf_deliver.

Fix issue #4 (only the latter of two) in PR#58596
bpf: protect selnotify and selrecord with bd_buf_mtx

We have to make updates and checks of buffers and calls of
selnotify/selrecord atomic to satisfy constraints of sel* API.

Also, bd_state and bd_cv are protected by bd_buf_mtx now.

Fix issue #3 of PR#58596

Part of the fix is inspired by riastradh's patch.
 1.249.2.2  22-Aug-2024  martin Pull up following revision(s) (requested by rin in ticket #784):

sys/net/bpf.c: revision 1.253

bpf: Mark bpfread_filtops FILTEROP_MPSAFE

Fix deadlock for non-NET_MPSAFE kernel, reported as
PR kern/58531 (thanks manu@ for test).

I've confirmed that there is no new regression for ATF with
any combination of -HEAD/netbsd-10 and default/NET_MPSAFE
rump kernels (aarch64).

Although, some problems have been reported on MP-safety for
bpf(4), PR kern/58596. But, it should take some time to fix.

At the moment, commit this part in advance.
OK ozaki-r@
 1.249.2.1  24-Feb-2023  martin Pull up following revision(s) (requested by gutteridge in ticket #103):

sys/net/bpf.c: revision 1.251

bpf.c: support loopback writes when BIOCSHDRCMPLT is set

Following changes in r. 1.249 "bpf: support sending packets on loopback
interfaces", also allow for this to succeed when the "header complete"
flag is set, which is the practice of some tools, e.g., tcpreplay and
Scapy. With this change, both of those example tools now work, e.g.,
Scapy passes "L3bpfSocket - send and sniff on loopback" in its test
suite.

There are several ways of addressing this issue; this commit is
intended to be the most conservative and consistent with the previous
changes. (E.g., FreeBSD instead has special handling of this condition
in its if_loop.c.)
