History log of /src/sys/netinet/ip_var.h
 Revision  Date  Author  Comments
 1.135  27-Jun-2025  andvar Grammar and spelling fixes, mainly in comments. A few in documentation,
logging, test description, and SCSI ASC/ASCQ assignment descriptions.
 1.134  10-Apr-2022  andvar branches: 1.134.10;
fix various typos in comments and output/log messages.
 1.133  03-Feb-2021  roy CTASSERT -> __CTASSERT to unbreak userland build.

While here move __packed in tcp_debug.h back to where it was and
note removal warrants more investigation.
 1.132  03-Feb-2021  roy Sprinkle CTASSERT to enforce on-wire layout without __packed
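The idea of enforcing an on-wire layout with compile-time assertions instead of a packed attribute can be sketched in standard C11. This is only an illustration: `struct demo_hdr` is a hypothetical header, not one of the structures in `ip_var.h`, and C11 `static_assert` stands in for NetBSD's `__CTASSERT`.

```c
#include <assert.h>   /* static_assert (C11), assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical 8-byte wire header. Every field is naturally aligned,
 * so the compiler inserts no padding even without a packed attribute. */
struct demo_hdr {
	uint8_t  dh_type;
	uint8_t  dh_code;
	uint16_t dh_cksum;
	uint32_t dh_ident;
};

/* Fail the build if the layout ever drifts from the wire format. */
static_assert(sizeof(struct demo_hdr) == 8, "demo_hdr must be 8 bytes");
static_assert(offsetof(struct demo_hdr, dh_cksum) == 2, "dh_cksum offset");
static_assert(offsetof(struct demo_hdr, dh_ident) == 4, "dh_ident offset");
```

Because the checks run at compile time, a layout regression shows up as a build failure rather than as corrupted packets.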
 1.131  03-Feb-2021  roy Remove __packed from various network structures

They are already network aligned and adding the __packed attribute
just causes needless compiler warnings about accessing members of packed
objects.
 1.130  28-Aug-2020  ozaki-r branches: 1.130.2;
inet: reduce silent packet discards
 1.129  28-Aug-2020  ozaki-r inet, inet6: count packets dropped by IPsec

The counters count packets dropped due to security policy checks.
 1.128  13-May-2019  ozaki-r Count packets dropped by pfil
 1.127  14-Sep-2018  maxv Use non-variadic function pointer in protosw::pr_input.
 1.126  10-Jul-2018  maxv Remove the second argument from ip_reass_packet(). We want the IP header
on the mbuf, not elsewhere. Simplifies the NPF reassembly code a little.
No real functional change.
 1.125  08-Apr-2018  maxv branches: 1.125.2;
Remove the ipre_mlast field and the TRAVERSE macro.

The goal was to store in ipre_mlast the last mbuf of the chain, so that
m_cat could be called on it. But it's not needed, since m_cat already
does the equivalent of TRAVERSE itself.

If it were needed, there would be a bug, since we don't call TRAVERSE on
ipre_mlast when creating a new reassembly entry.
 1.124  08-Apr-2018  maxv Remove unused field, and sync comment with reality.
 1.123  03-Apr-2018  maxv Remove unused fields and outdated comment.
 1.122  10-Jan-2018  knakahara branches: 1.122.2;
add ipsec(4) interface, which is used for route-based VPN.

man and ATF are added later, please see man for details.

reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
 1.121  11-Dec-2017  ryo As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.120  10-Aug-2017  ryo Add IP_PKTINFO support for sendmsg(2).

The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.

Reviewed by ozaki-r@ and christos@. thanks.
 1.119  31-Mar-2017  ozaki-r branches: 1.119.6;
Don't use a single global variable to store source route information for multiple incoming packets

It's not MP-safe. So use a m_tag to store the information instead.

Pointed out by knakahara@
The fix is from OpenBSD (originally fixed in FreeBSD)
 1.118  03-Mar-2017  ozaki-r Pass inpcb/in6pcb instead of socket to ip_output/ip6_output

- Passing a socket to Layer 3 is a layer violation and even unnecessary
- The change makes the code of callers and IPsec a bit simpler
 1.117  16-Feb-2017  knakahara add l2tp(4) L2TPv3 interface.

originally implemented by IIJ SEIL team.
 1.116  08-Dec-2016  ozaki-r branches: 1.116.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.115  01-Aug-2016  knakahara improve fast-forward performance when the number of flows exceeds IPFLOW_MAX.

In the fast-forward case, when the number of flows exceeds IPFLOW_MAX, the
performance degraded to about 50% of that with fewer than IPFLOW_MAX
flows. This modification suppresses the degradation to 65%. Furthermore,
the modified kernel is about the same performance as the original kernel
when the number of flows is less than IPFLOW_MAX.

The original patch is implemented by ryo@n.o. Thanks.
 1.114  21-Jun-2016  ozaki-r branches: 1.114.2;
Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime is different from the ifnet's, so an ifnet object can
disappear at any time when we get it via them. Thus we need to look up an ifnet
object by if_index every time for safety.
 1.113  13-Jun-2016  knakahara make ipflow_reap() static function.
 1.112  28-Apr-2016  ozaki-r Constify rtentry of if_output

We no longer need to change rtentry below if_output.

The change makes it clear where rtentries are changed (or not)
and helps forthcoming locking (or psref-ing) of rtentries.
 1.111  26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue in doing this is keeping the RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. The mistakes were:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it appeared to
work (unexpectedly), and removing rt_gwroute checks unveils the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care about is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.110  20-Jan-2016  riastradh Give proper prototype to ip_output.
 1.109  20-Jan-2016  riastradh Give proper prototype to rip_output.
 1.108  04-Jun-2015  ozaki-r Pull out route lookups from L2 output routines

Route lookups for routes of RTF_GATEWAY were done in L2 output
routines such as ether_output, but they should be done in L3
i.e., before L2 output routines. This change places the lookups
between L3 output routines (say ip_output) and the L2 output
routines.

The change is based on dyoung's patch submitted in the thread:
https://mail-index.netbsd.org/tech-net/2013/02/01/msg003847.html
You can find detailed investigations by dyoung of the
issue there.

Note that the change introduces a workaround for MPLS. ether_output
knew that it needs to fill the ethertype of a frame as MPLS,
based on a tag of an original route (rtentry), but now we don't
pass it to ether_output. So we have to tell that in another way.
We use mtag to do so for now, which introduces some overhead.
We should fix it somehow in the future.

Discussed on tech-kern and tech-net.
 1.107  11-Oct-2014  christos branches: 1.107.2;
expose multicast option functions which are used by the v6 code now.
 1.106  05-Jun-2014  rmind - Implement pktqueue interface for lockless IP input queue.
- Replace ipintrq and ip6intrq with the pktqueue mechanism.
- Eliminate kernel-lock from ipintr() and ip6intr().
- Some preparation work to push softnet_lock out of ipintr().

Discussed on tech-net.
 1.105  30-May-2014  rmind Use __CTASSERT() in the header.
 1.104  29-May-2014  rmind Make IGMP and multicast group management code MP-safe. Use a read-write
lock to protect the hash table of multicast address records; also, make it
private and eliminate some macros. In the long term, the lookup path ought
to be optimised.
 1.103  23-May-2014  rmind Make ip_forward() static, there is no need to expose it.
 1.102  22-May-2014  rmind - Make ip_setmoptions(), ip_getmoptions() and ip_pcbopts() static.
- ip_output: eliminate 7th variadic argument; the IP_RETURNMTU flag is
always used to store MTU size into struct inpcb::inp_errormtu.
- Clean up these routines: reduce #ifdefs, variable scopes, etc.
 1.101  22-May-2014  rmind - Add in_init() and move some functions, variables and sysctls into in.c
where they belong. Make some functions and variables static.
- ip_input.c: reduce some #ifdefs, cleanup a little.
- Move some sysctls into ip_flow.c as they belong there.

No functional change.
 1.100  18-May-2014  rmind Add struct pr_usrreqs with a pr_generic function and prepare for the
dismantling of pr_usrreq in the protocols; no functional change intended.
PRU_ATTACH/PRU_DETACH changes will follow soon.

Bump for struct protosw. Welcome to 6.99.62!
 1.99  19-Mar-2014  liamjfoy branches: 1.99.2;
Move ipflow into ip_var.h and fix conflict
 1.98  19-Mar-2014  liamjfoy Remove ipflow_prune and replace with ipflow_reap. ok rmind@
 1.97  03-May-2011  dyoung branches: 1.97.4; 1.97.14; 1.97.18;
*_drain() routines may be called with locks held, so instead of doing
any work in *_drain(), set a drain-needed flag. Do the work in the
fasttimo handler.

Contributed by Coyote Point Systems, Inc.
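The deferral pattern described in revision 1.97 (record a request in `*_drain()`, do the work later from the timer) can be sketched as below. All names are hypothetical, and the sketch ignores the locking or atomic access a real kernel flag would need.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_drain_needed;   /* hypothetical "drain requested" flag */
static int  demo_queue_len = 5;  /* stands in for queued fragments */

/* May be called with locks held, so it only records the request. */
static void
demo_drain(void)
{
	demo_drain_needed = true;
}

/* Periodic timer handler: runs in a safe context and does the real work. */
static void
demo_fasttimo(void)
{
	if (demo_drain_needed) {
		demo_drain_needed = false;
		demo_queue_len = 0;   /* pretend every queued entry was freed */
	}
}
```

The caller of `demo_drain()` never blocks or takes further locks; the cost is that memory is reclaimed only at the next timer tick.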
 1.96  05-Nov-2010  rmind branches: 1.96.2;
ip_reass_packet: finish abstraction; some clean-up.
Discussed some time ago with matt@.
 1.95  25-Aug-2010  rmind Use own IPv4 reassembly queue entry structure and leave struct ipqent only
for TCP. Now both struct ipfr_qent, struct ipfr_queue and hashed fragment
queue are abstracted and no longer public.
 1.94  19-Jul-2010  rmind Revert previous change of making struct ipqent invisible to userland.
 1.93  19-Jul-2010  rmind Abstract IP reassembly into single generic routine - ip_reass_packet().
Make struct ipq private and struct ipqent not visible to userland.
Push ip_len adjustment into reassembly layer.

OK matt@
 1.92  13-Jul-2010  rmind Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@
 1.91  01-Feb-2009  pooka branches: 1.91.4; 1.91.6;
Init ipflow pool dynamically instead of using a linkset.
 1.90  12-Oct-2008  plunky branches: 1.90.2;
update ip_pcbopts() to use sockopt(9) API.

cleans up function and one small fix is that we now stop copying user
options to the mbuf when the _EOL is given, previously this function
would continue to copy options.
 1.89  16-Aug-2008  plunky constify sockopt in the PRCO_SETOPT path
 1.88  06-Aug-2008  plunky Convert socket options code to use a sockopt structure
instead of laying everything into an mbuf.

approved by core
 1.87  12-Apr-2008  thorpej branches: 1.87.4; 1.87.6; 1.87.10;
Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated
when the user requests them via sysctl.
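A minimal sketch of per-CPU counters that are summed only on request follows. The CPU count, counter count, and function names are assumptions made for illustration; the kernel's own per-CPU statistics API is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NCPU   4   /* hypothetical CPU count */
#define DEMO_NSTATS 3   /* hypothetical number of counters */

/* One private counter row per CPU: the hot path touches only its own row. */
static uint64_t demo_stats[DEMO_NCPU][DEMO_NSTATS];

/* Hot path: called by the CPU that owns row `cpu`. */
static void
demo_stat_inc(unsigned cpu, unsigned idx)
{
	assert(cpu < DEMO_NCPU && idx < DEMO_NSTATS);
	demo_stats[cpu][idx]++;
}

/* Slow path (for example a sysctl read): sum every CPU's row. */
static void
demo_stat_collate(uint64_t out[DEMO_NSTATS])
{
	for (unsigned i = 0; i < DEMO_NSTATS; i++) {
		out[i] = 0;
		for (unsigned c = 0; c < DEMO_NCPU; c++)
			out[i] += demo_stats[c][i];
	}
}
```

The design trades a slightly more expensive read for an update path that needs no shared cache line.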
 1.86  09-Apr-2008  thorpej - ipflow is not used outside ip_flow.c; move its definition there.
- Make ipflow_reap() private to ip_flow.c, and introduce ipflow_prune()
for external callers to use (avoids returning an ipflow * that is never
actually used anyway).
 1.85  07-Apr-2008  thorpej Change IP stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ipstat structure; old netstat
binaries will continue to work properly.
 1.84  06-Feb-2008  matt branches: 1.84.6;
Add a new ip_id generation scheme based on a Fisher-Yates shuffle over a
sliding window. XXX replace use of arc4random RSN.
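For readers unfamiliar with the shuffle named in revision 1.84, here is a plain Fisher-Yates shuffle over the IDs 1..65535. It is a simplified sketch: it omits the sliding window mentioned in the log message, and the toy `demo_rand()` generator is an assumption used only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NIDS 65535u   /* IDs 1..65535; 0 is never handed out */

static uint16_t demo_ids[DEMO_NIDS];
static uint32_t demo_rng_state = 0x12345678u;

/* Toy xorshift32 generator, for illustration only (not cryptographic). */
static uint32_t
demo_rand(void)
{
	uint32_t x = demo_rng_state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return demo_rng_state = x;
}

/* Fisher-Yates: fill with 1..N, then swap slot i with a random slot j <= i. */
static void
demo_ids_shuffle(void)
{
	for (uint32_t i = 0; i < DEMO_NIDS; i++)
		demo_ids[i] = (uint16_t)(i + 1);
	for (uint32_t i = DEMO_NIDS - 1; i > 0; i--) {
		uint32_t j = demo_rand() % (i + 1);   /* slight modulo bias */
		uint16_t t = demo_ids[i];
		demo_ids[i] = demo_ids[j];
		demo_ids[j] = t;
	}
}
```

Because the array is a permutation, every ID is used exactly once per pass and the value 0 never appears.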
 1.83  25-Dec-2007  perry Convert many of the uses of __attribute__ to equivalent
__packed, __unused and __dead macros from cdefs.h
 1.82  22-Dec-2007  matt Make sure ip_newid et al. don't return an ip_id of 0.
 1.81  22-Dec-2007  matt Add ipq_tos to struct ipqe. (Doesn't increase size since the last member
was a u_int16_t).
 1.80  02-Oct-2007  dyoung branches: 1.80.4; 1.80.6; 1.80.10;
Delete the unused second argument to ip_stripoptions(), move it
closer to its single caller in if_eon.c, try to move fewer bytes
by moving the IP header forward instead of moving the tail of the
mbuf backward, and use m_adj(9) instead of fiddling directly with
mbuf data members.
 1.79  25-Mar-2007  liamjfoy branches: 1.79.8; 1.79.10; 1.79.12;
Add net.inet.ip.hashsize to control the IPv4 fast forward hash table size.
 1.78  17-Feb-2007  dyoung branches: 1.78.4; 1.78.6; 1.78.8;
KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.77  16-Feb-2006  perry branches: 1.77.20;
Change "inline" back to "__inline" in .h files -- C99 is still too
new, and some apps compile things in C89 mode. C89 keywords stay.

As per core@.
 1.76  24-Dec-2005  perry branches: 1.76.2; 1.76.4; 1.76.6;
Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
 1.75  11-Dec-2005  christos merge ktrace-lwp.
 1.74  10-Dec-2005  elad Multiple inclusion protection, as suggested by christos@ on tech-kern@
a few days ago.
 1.73  22-Nov-2005  yamt revert rev.1.72 as it isn't necessary.
 1.72  06-May-2005  matt branches: 1.72.2; 1.72.8;
Add #include <sys/protosw.h> when _KERNEL
 1.71  29-Apr-2005  yamt move decl of inetsw to its own header to avoid array of incomplete type.
found by gcc4. reported by Adam Ciarcinski.
 1.70  07-Apr-2005  yamt when doing TSO, avoid to use duplicated ip_id heavily.
XXX ip_randomid
 1.69  15-Dec-2004  thorpej branches: 1.69.2; 1.69.8;
Don't perform checksums on loopback interfaces. They can be reenabled with
the net.inet.*.do_loopback_cksum sysctl.

Approved by: groo
 1.68  22-Apr-2004  matt Constify protosw arrays. This can reduce the kernel .data section by
over 4K (if all the network protocols are loaded).
 1.67  21-Apr-2004  itojun no space between function name and paren: foo (blah) -> foo(blah)
 1.66  18-Apr-2004  matt De __P()
 1.65  12-Dec-2003  scw Make fast-ipsec and ipflow (Fast Forwarding) interoperate.

The idea is that we only clear M_CANFASTFWD if an SPD exists
for the packet. Otherwise, it's safe to add a fast-forward
cache entry for the route.

To make this work properly, we invalidate the entire ipflow
cache if a fast-ipsec key is added or changed.
 1.64  08-Dec-2003  jonathan Add new field ipq_nfrags to struct ipq. Maintain count of fragments
(fragments, not fragmented packets) in each queue entry.
Use ipq_nfrags to maintain a count of total fragments in reassembly queue.
 1.63  06-Dec-2003  jonathan Replace the single global IP reassembly list/listhead, with a
hashtable of list-heads. Independently re-invented, then reworked to
match similar code in FreeBSD.
 1.62  26-Nov-2003  itojun define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.
 1.61  25-Nov-2003  itojun knf
 1.60  17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_randomid() if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.59  06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.58  19-Aug-2003  itojun make ip_fragment public (it is for coming PF integration)
 1.57  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.56  29-Jun-2003  fvdl branches: 1.56.2;
Back out the lwp/ktrace changes. They contained a lot of collateral damage,
and need to be examined and discussed more.
 1.55  28-Jun-2003  darrenr Pass lwp pointers throughout the kernel, as required, so that the lwpid can
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.

Bump the kernel rev up to 1.6V
 1.54  23-Jun-2003  martin Protect opt_*.h includes by _KERNEL_OPT
 1.53  23-Jun-2003  martin Make sure to include opt_foo.h if a defflag option FOO is used.
 1.52  15-Jun-2003  matt Change the way multicasts are kept. They now use a hash table in the same
manner as the ifaddr hash table. By doing this, the mkludge code can go
away. At the same time, keep track of what pcbs are using what ifaddr and
when an address is deleted from an interface, notify/abort all sockets
that have that address as a source. Switch IGMP and multicasts to use pools
for allocation. Fix a number of potential problems in the igmp code where
allocation failures could cause a trap/panic.
 1.51  26-Feb-2003  matt Add MBUFTRACE kernel option.
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
 1.50  28-Jan-2003  wiz success, not sucess. Noted by mjl.
 1.49  11-Sep-2002  itojun correct signedness mixup in pointer passing. sync w/kame
 1.48  30-Jun-2002  thorpej Changes to allow the IPv4 and IPv6 layers to align headers themselves,
as necessary:
* Implement a new mbuf utility routine, m_copyup(), which is like
m_pullup(), except that it always prepends and copies, rather
than only doing so if the desired length is larger than m->m_len.
m_copyup() also allows an offset into the destination mbuf, which
allows space for packet headers, in the forwarding case.
* Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These
macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that
architectures which do not have strict alignment constraints don't
pay for the test or visit the new align-if-needed path.
* Use the new macros to check if a header needs to be aligned, or to
assert that it already is, as appropriate.

Note: This code is still somewhat experimental. However, the new
code path won't be visited if individual device drivers continue
to guarantee that packets are delivered to layer 3 already properly
aligned (which are rules that are already in use).
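The shape of an `*_HDR_ALIGNED_P()` style predicate can be sketched as follows. `DEMO_HDR_ALIGNED_P` and `DEMO_NO_STRICT_ALIGNMENT` are hypothetical stand-ins for the kernel macros and `__NO_STRICT_ALIGNMENT`; the 4-byte boundary is an assumption suited to a header with 32-bit fields.

```c
#include <assert.h>
#include <stdint.h>

/* On platforms without strict alignment the check compiles away to 1,
 * so callers never take the align-if-needed path. */
#ifdef DEMO_NO_STRICT_ALIGNMENT
#define DEMO_HDR_ALIGNED_P(p)	1
#else
/* Otherwise test whether the pointer sits on a 4-byte boundary. */
#define DEMO_HDR_ALIGNED_P(p)	((((uintptr_t)(p)) & 3) == 0)
#endif
```

A caller would test the header pointer with this macro and fall back to a copying routine only when the test fails.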
 1.47  07-May-2002  matt branches: 1.47.2;
Change struct ipqe to use TAILQ's instead of LIST's (primarily for TCP's
benefit currently). Rework tcp_reass code to optimize the 4 most likely causes
of out-of-order packets: first OoO pkt, next OoO pkt in seq, OoO pkt is part
of new chunk of OoO packets, and the OoO pkt fills the first hole. Add evcnts
to instrument tcp_reass (enabled by the options TCP_REASS_COUNTERS). This is
part 1/2 of tcp_reass changes.
 1.46  21-Dec-2001  itojun have rip_ctlinput to notify routing changes to raw sockets
(protosw change to be done). sync with kame
 1.45  02-Mar-2001  itojun branches: 1.45.2; 1.45.4;
increase ipstat.ips_badaddr if the packet fails to pass address checks.
 1.44  13-Jan-2001  itojun allow IP_MULTICAST_IF and IP_ADD/DROP_MEMBERSHIP to specify interface
by interface index. if the interface address specified is in 0.0.0.0/8
it will be considered as interface index in network byteorder.

getsockopt(IP_MULTICAST_IF) preserves old behavior if
setsockopt(IP_MULTICAST_IF) was done with interface address, and
returns interface index if setsockopt(IP_MULTICAST_IF) was done with
interface index (again using the form in 0.0.0.0/8).

Suggested by Dave Thaler, based on RIPv2 MIB spec (RFC1724 section 3.3).

http://mail-index.netbsd.org/tech-net/2001/01/13/0003.html
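The convention in revision 1.44 (an address inside 0.0.0.0/8 carries an interface index in network byte order) can be illustrated with a few helpers. The function names are hypothetical, and the sketch does not cover how the kernel treats the all-zero address itself.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl, ntohl */
#include <netinet/in.h>  /* struct in_addr */

/* Encode an interface index as an address inside 0.0.0.0/8.
 * The index must fit in the low 24 bits. */
static struct in_addr
demo_ifindex_to_addr(uint32_t ifindex)
{
	struct in_addr a;

	assert(ifindex <= 0x00ffffffu);
	a.s_addr = htonl(ifindex);
	return a;
}

/* Return 1 if the address lies in 0.0.0.0/8 (top byte is zero). */
static int
demo_addr_is_ifindex(struct in_addr a)
{
	return (ntohl(a.s_addr) >> 24) == 0;
}

/* Recover the interface index from such an address. */
static uint32_t
demo_addr_to_ifindex(struct in_addr a)
{
	return ntohl(a.s_addr) & 0x00ffffffu;
}
```

For example, index 3 is passed as the address 0.0.0.3, which no real interface can own.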
 1.43  17-Oct-2000  thorpej Add an IP_MTUDISC flag to the flags that can be passed to
ip_output(). This flag, if set, causes ip_output() to set
DF in the IP header if the MTU in the route is not locked.

This allows a bunch of redundant code, which I was never
really all that happy about adding in the first place, to
be eliminated.

Inspired by a similar change made by provos@openbsd.org when
he integrated NetBSD's Path MTU Discovery code into OpenBSD.
 1.42  25-Aug-2000  tron Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.
 1.41  30-Mar-2000  simonb branches: 1.41.4;
Delete redundant decl of ip_gif_ttl - it's in <netinet/in_gif.h>.
Delete redundant decl of ip_mforward() - it's in <netinet/ip_mroute.h>.
 1.40  20-Nov-1999  thorpej Add the `packed' attribute to structures which describe wire protocol data.
 1.39  19-Nov-1999  bouyer Update protocol and interface stats counters to 64bit.
RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14)
struct with 32bit counters (binary compat, conditioned on COMPAT_14).
Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4.
Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic()
when the message is larger than MHLEN.
 1.38  06-Jul-1999  itojun branches: 1.38.2; 1.38.8;
sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.37  01-Jul-1999  itojun IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
files to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.36  08-Oct-1998  thorpej branches: 1.36.8; 1.36.10;
Use the pool allocator for ipflow entries.
 1.35  08-Oct-1998  thorpej Use the pool allocator for ipqent structures.
 1.34  02-Jun-1998  thorpej In addition to the IP flow hash table, put the flows on a list. The table
is used for fast lookup, the list for traversal of all flows. Also, use
PRT timers.
 1.33  11-May-1998  thorpej Back out previous. This problem was already fixed in a different way.
 1.32  11-May-1998  matt Let usr.sbin/tcpdump build again.
 1.31  04-May-1998  matt Default IP flow to being enabled. Add a sysctl to control the maximum
number of flows (net.inet.ip.maxflows). If set to 0, will disable fast
path forwarding.
 1.30  30-Apr-1998  thorpej Need <net/route.h>
 1.29  29-Apr-1998  matt Add support for "fast" forwarding. Add hooks in if_ethersubr.c and
if_fddisubr.c to fastpath IP forwarding. If ip_forward successfully
forwards a packet, it will create a cache (ipflow) entry. ether_input
and fddi_input will first call ipflow_fastforward with the received
packet and if the packet passes enough tests, it will be forwarded (the
ttl is decremented and the cksum is adjusted incrementally).
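The incremental checksum adjustment mentioned in revision 1.29 can be sketched with the general update formula from RFC 1624, HC' = ~(~HC + ~m + m'). This is not claimed to be the code in `ip_flow.c`; the function names are hypothetical and the header is handled as raw bytes to stay independent of host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full Internet checksum (RFC 1071) over `len` bytes; `len` must be even. */
static uint16_t
demo_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(p[i] << 8 | p[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* RFC 1624 update for one 16-bit word changing from m_old to m_new. */
static uint16_t
demo_cksum_update(uint16_t hc, uint16_t m_old, uint16_t m_new)
{
	uint32_t sum = (uint16_t)~hc;

	sum += (uint16_t)~m_old;
	sum += m_new;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) of a 20-byte IPv4 header and patch the
 * checksum (bytes 10-11) without re-summing the whole header. */
static void
demo_dec_ttl(uint8_t hdr[20])
{
	uint16_t old_w = (uint16_t)(hdr[8] << 8 | hdr[9]);   /* TTL, protocol */
	uint16_t hc = (uint16_t)(hdr[10] << 8 | hdr[11]);
	uint16_t new_w;

	hdr[8]--;
	new_w = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hc = demo_cksum_update(hc, old_w, new_w);
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)(hc & 0xff);
}
```

Only one 16-bit word changes when the TTL is decremented, so the update touches three words instead of ten.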
 1.28  29-Apr-1998  matt New TCP reassembly code. The new code reduces the memory needed by
out-of-order packets and builds the infrastructure needed for sending
SACK blocks (to be added shortly).
 1.27  29-Apr-1998  kml Add support for deletion of routes added by path MTU discovery;
uses new generic route timeout code. Add sysctl for timeout period.
 1.26  24-Mar-1998  kml Ensure that we take the IP option length into account when we calculate
the effective maximum send size for TCP. ip_optlen() and tcp_optlen()
should probably be inlined for efficiency.
 1.25  10-Feb-1998  perry add/cleanup multiple inclusion protection.
 1.24  05-Jan-1998  thorpej Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes
left were SCCS IDs and Copyright dates.
 1.23  05-Jan-1998  lukem enhance ephemeral port allocation code:
* support sysctl net.inet.ip.anonportmin (lowest ephemeral port)
and net.inet.ip.anonportmax (highest ephemeral port).
these can't be set to >65535, < IPPORT_RESERVED (unless IPNOPRIVPORTS
is defined), and anonportmin has to be < anonportmax.
* use a cleaner way of only cycling through the available set once;
this will be useful for when a random allocation scheme is used
* define IPPORT_ANON{MIN,MAX} instead of IPPORT_USER{LOW,HIGH}
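The range constraints listed for `anonportmin` and `anonportmax` can be written as a small validation routine. This is a sketch of the rules as stated in the log message, not the sysctl handler itself; the function name is hypothetical and the `allow_low` parameter models a kernel built with IPNOPRIVPORTS.

```c
#include <assert.h>

#define DEMO_IPPORT_RESERVED 1024   /* value of IPPORT_RESERVED */
#define DEMO_PORT_MAX        65535

/* Return 1 if [lo, hi] is an acceptable ephemeral port range. */
static int
demo_anonport_range_ok(int lo, int hi, int allow_low)
{
	if (lo < 0 || hi < 0)
		return 0;
	if (lo > DEMO_PORT_MAX || hi > DEMO_PORT_MAX)
		return 0;
	/* Reserved ports are off limits unless explicitly allowed. */
	if (!allow_low &&
	    (lo < DEMO_IPPORT_RESERVED || hi < DEMO_IPPORT_RESERVED))
		return 0;
	/* The minimum must be strictly below the maximum. */
	return lo < hi;
}
```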
 1.22  18-Oct-1997  kml branches: 1.22.2;
change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
 1.21  14-Oct-1997  thorpej Define IP_RETURNMTU. (Matt missed this part of his diff, I guess :-)
 1.20  24-Jun-1997  thorpej branches: 1.20.4;
Eliminate use of dtom() from the network code, allowing more flexible
use of mbuf external storage and increasing performance (by eliminating
an m_pullup() for clusters in the IP reassembly code).

Changes from Koji Imada <koji@math.human.nagoya-u.ac.jp>, in PR #3628
and #3480, with ever-so-slight integration changes by me.
 1.19  11-Jan-1997  thorpej Implement the IP_RECVIF socket option: supply a datagram packet's incoming
interface using a sockaddr_dl in a control mbuf.

Implement SO_TIMESTAMP for IP datagrams.

Move packet information option processing into a generic function
so that they work with multicast UDP and raw IP as well as unicast UDP.

Contributed by Bill Fenner <fenner@parc.xerox.com>.
 1.18  25-Oct-1996  thorpej Make length and offset fields unsigned. From Kevin M. Lahey <kml@nas.nasa.gov>
Add a counter to IP stats, to count packets which are discarded on the
grounds that they are too large.
 1.17  22-May-1996  mycroft Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and
PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait.
Remove SS_PRIV completely.
 1.16  13-Feb-1996  christos branches: 1.16.4;
netinet prototypes
 1.15  21-Nov-1995  cgd make netinet work on systems where pointers and longs are 64 bits
(like the alpha). Biggest problem: IP headers were overlaid with a
structure which included pointers, and which therefore didn't overlay
properly on 64-bit machines. Solution: instead of threading pointers
through IP header overlays, add a "queue element" structure to do
the threading, and point it at the ip headers.
 1.14  12-Jun-1995  mycroft Various cleanup, including:
* Convert several data structures to use queue.h.
* Split in_pcbnotify() into two parts; one for notifying a specific PCB, and
one for notifying all PCBs for a particular foreign address.
 1.13  14-May-1995  cgd drop (and record) malformed IP fragments. Fixes pr 1030 (differently).
 1.12  13-Apr-1995  cgd be a bit more careful and explicit with types. (basically a large no-op.)
 1.11  26-Mar-1995  jtc KERNEL -> _KERNEL
 1.10  29-Jun-1994  cgd New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
 1.9  13-May-1994  mycroft Update to 4.4-Lite networking code, with a few local changes.
 1.8  10-Jan-1994  mycroft Change the counters to be all the same type -- u_long.
 1.7  10-Jan-1994  mycroft Should compile now with or without `options MULTICAST'.
 1.6  09-Jan-1994  mycroft Prototype the rest.
 1.5  08-Jan-1994  mycroft Fix some inconsistent spacing; spaces at the end of lines, etc.
 1.4  06-Dec-1993  hpeyerl multicast support.
From Chris Maeda, cmaeda@cs.washington.edu
These patches are derived from the IP Multicast patches for BSDI.
 1.3  20-May-1993  cgd more rcsid additions and file header cleanups
 1.2  19-Apr-1993  mycroft Add consistent multiple-inclusion protection.
 1.1  21-Mar-1993  cgd branches: 1.1.1;
Initial revision
 1.1.1.3  05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
 1.1.1.2  05-Jan-1998  thorpej Import sys/netinet from 4.4BSD-Lite for reference purposes.
 1.1.1.1  21-Mar-1993  cgd initial import of 386bsd-0.1 sources
 1.16.4.2  11-Dec-1996  mycroft From trunk:
Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods
that need it.
Fix numerous memory leaks and bogus return values.
 1.16.4.1  10-Nov-1996  thorpej Update from trunk:
- Make ip_len and ip_off unsigned.
- Make sure we don't accept or transmit packets larger than the
maximum IP packet size.
This fixes the so-called `death ping' bug.

Sum of work from Bill Fenner <fenner@parc.xerox.com>,
Kevin Lahey <kml@nas.nasa.gov>, and myself.

Thanks to Curt Sampson, Jukka Marin, and Kevin Lahey for testing
this under NetBSD 1.2
 1.20.4.1  14-Oct-1997  thorpej Update marc-pcmcia branch from trunk.
 1.22.2.1  09-May-1998  mycroft Pull up patch from kml.
 1.36.10.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
reference purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.36.10.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.36.10.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for reference. this may not compile
due to multiple reasons.
 1.36.8.2  02-Aug-1999  thorpej Update from trunk.
 1.36.8.1  01-Jul-1999  thorpej Sync w/ -current.
 1.38.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.38.2.3  12-Mar-2001  bouyer Sync with HEAD.
 1.38.2.2  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.38.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.41.4.1  26-Aug-2000  tron Pull up from current (approved by thorpej):

Add new sysctl variables "net.inet.ip.lowportmin" and
"net.inet.ip.lowportmax" which can be used to set the minimum
and maximum port number assigned to sockets using
IP_PORTRANGE_LOW.

syssrc/sys/netinet/in.h 1.49 -> 1.50
syssrc/sys/netinet/in_pcb.c 1.66 -> 1.67
syssrc/sys/netinet/ip_input.c 1.116 -> 1.117
syssrc/sys/netinet/ip_var.h 1.41 -> 1.42
 1.45.4.4  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.45.4.3  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.45.4.2  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.45.4.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.45.2.4  17-Sep-2002  nathanw Catch up to -current.
 1.45.2.3  01-Aug-2002  nathanw Catch up to -current.
 1.45.2.2  20-Jun-2002  nathanw Catch up to -current.
 1.45.2.1  08-Jan-2002  nathanw Catch up to -current.
 1.47.2.1  15-Jul-2002  gehenna catch up with -current.
 1.56.2.7  11-Dec-2005  christos Sync with head.
 1.56.2.6  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.56.2.5  18-Dec-2004  skrll Sync with HEAD.
 1.56.2.4  21-Sep-2004  skrll Fix the sync with head I botched.
 1.56.2.3  18-Sep-2004  skrll Sync with HEAD.
 1.56.2.2  03-Aug-2004  skrll Sync with HEAD
 1.56.2.1  02-Jul-2003  darrenr Apply the aborted ktrace-lwp changes to a specific branch. This is just for
others to review, I'm concerned that patch fuzziness may have resulted in some
errant code being generated but I'll look at that later by comparing the diff
from the base to the branch with the file I attempt to apply to it. This will,
at the very least, put the changes in a better context for others to review
them and attempt to tinker with removing passing of 'struct lwp' through
the kernel.
 1.69.8.1  13-Apr-2005  tron Pull up revision 1.70 (requested by yamt in ticket #145):
when doing TSO, avoid heavy use of duplicated ip_id.
XXX ip_randomid
 1.69.2.1  29-Apr-2005  kent sync with -current
 1.72.8.1  29-Nov-2005  yamt sync with head.
 1.72.2.6  11-Feb-2008  yamt sync with head.
 1.72.2.5  21-Jan-2008  yamt sync with head
 1.72.2.4  27-Oct-2007  yamt sync with head.
 1.72.2.3  03-Sep-2007  yamt sync with head.
 1.72.2.2  26-Feb-2007  yamt sync with head.
 1.72.2.1  21-Jun-2006  yamt sync with head.
 1.76.6.1  22-Apr-2006  simonb Sync with head.
 1.76.4.1  09-Sep-2006  rpaulo sync with head
 1.76.2.1  18-Feb-2006  yamt sync with head.
 1.77.20.2  15-Apr-2007  yamt sync with head.
 1.77.20.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.78.8.1  29-Mar-2007  reinoud Pullup to -current
 1.78.6.1  11-Jul-2007  mjf Sync with head.
 1.78.4.2  09-Oct-2007  ad Sync with head.
 1.78.4.1  10-Apr-2007  ad Sync with head.
 1.79.12.1  06-Oct-2007  yamt sync with head.
 1.79.10.3  23-Mar-2008  matt sync with HEAD
 1.79.10.2  09-Jan-2008  matt sync with HEAD
 1.79.10.1  06-Nov-2007  matt sync with HEAD
 1.79.8.1  04-Oct-2007  joerg Sync with HEAD.
 1.80.10.1  02-Jan-2008  bouyer Sync with HEAD
 1.80.6.1  26-Dec-2007  ad Sync with head.
 1.80.4.1  18-Feb-2008  mjf Sync with HEAD.
 1.84.6.3  17-Jan-2009  mjf Sync with HEAD.
 1.84.6.2  28-Sep-2008  mjf Sync with HEAD.
 1.84.6.1  02-Jun-2008  mjf Sync with HEAD.
 1.87.10.1  19-Oct-2008  haad Sync with HEAD.
 1.87.6.1  18-Sep-2008  wrstuden Sync with wrstuden-revivesa-base-2.
 1.87.4.3  09-Oct-2010  yamt sync with head
 1.87.4.2  11-Aug-2010  yamt sync with head.
 1.87.4.1  04-May-2009  yamt sync with head.
 1.90.2.1  03-Mar-2009  skrll Sync with HEAD.
 1.91.6.2  31-May-2011  rmind sync with head
 1.91.6.1  05-Mar-2011  rmind sync with head
 1.91.4.3  06-Nov-2010  uebayasi Sync with HEAD.
 1.91.4.2  22-Oct-2010  uebayasi Sync with HEAD (-D20101022).
 1.91.4.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.96.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.97.18.3  18-May-2014  rmind sync with head
 1.97.18.2  28-Aug-2013  rmind Checkpoint work in progress:
- Initial split of the protocol user-request method into the following
methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the
socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix a few bugs spotted on the way.
 1.97.18.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.97.14.2  03-Dec-2017  jdolecek update from HEAD
 1.97.14.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.97.4.1  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was split into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.99.2.1  10-Aug-2014  tls Rebase.
 1.107.2.7  28-Aug-2017  skrll Sync with HEAD
 1.107.2.6  05-Feb-2017  skrll Sync with HEAD
 1.107.2.5  05-Oct-2016  skrll Sync with HEAD
 1.107.2.4  09-Jul-2016  skrll Sync with HEAD
 1.107.2.3  29-May-2016  skrll Sync with HEAD
 1.107.2.2  19-Mar-2016  skrll Sync with HEAD
 1.107.2.1  06-Jun-2015  skrll Sync with HEAD
 1.114.2.4  26-Apr-2017  pgoyette Sync with HEAD
 1.114.2.3  20-Mar-2017  pgoyette Sync with HEAD
 1.114.2.2  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.114.2.1  06-Aug-2016  pgoyette Sync with HEAD
 1.116.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.119.6.2  11-Feb-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #536):
distrib/sets/lists/base/shl.mi: 1.825
distrib/sets/lists/comp/mi: 1.2168-1.2169
distrib/sets/lists/comp/shl.mi: 1.310
distrib/sets/lists/debug/mi: 1.234
distrib/sets/lists/debug/shl.mi: 1.188
distrib/sets/lists/man/mi: 1.1570
distrib/sets/lists/tests/mi: 1.772
etc/mtree/NetBSD.dist.tests: 1.150
share/man/man4/Makefile: 1.650
share/man/man4/ipsec.4: 1.42-1.43
share/man/man4/ipsecif.4: 1.1-1.5
sys/arch/amd64/conf/ALL: 1.77
sys/arch/amd64/conf/GENERIC: 1.480
sys/conf/files: 1.1191
sys/net/Makefile: 1.34
sys/net/files.net: 1.14
sys/net/if.c: 1.404
sys/net/if.h: 1.248
sys/net/if_gif.c: 1.135
sys/net/if_ipsec.c: 1.1-1.3
sys/net/if_ipsec.h: 1.1
sys/net/if_l2tp.c: 1.16
sys/net/if_types.h: 1.28
sys/netinet/in.c: 1.214
sys/netinet/in.h: 1.103
sys/netinet/in_gif.c: 1.92
sys/netinet/ip_var.h: 1.122
sys/netinet6/in6.c: 1.257
sys/netinet6/in6.h: 1.88
sys/netinet6/in6_gif.c: 1.90
sys/netinet6/ip6_var.h: 1.75
sys/netipsec/Makefile: 1.6
sys/netipsec/files.netipsec: 1.13
sys/netipsec/ipsec.h: 1.62
sys/netipsec/ipsecif.c: 1.1
sys/netipsec/ipsecif.h: 1.1
sys/netipsec/key.c: 1.246-1.247
sys/netipsec/key.h: 1.34
sys/rump/net/Makefile.rumpnetcomp: 1.20
sys/rump/net/lib/libipsec/IPSEC.ioconf: 1.1
sys/rump/net/lib/libipsec/Makefile: 1.1
sys/rump/net/lib/libipsec/ipsec_component.c: 1.1
tests/net/Makefile: 1.34
tests/net/if_ipsec/Makefile: 1.1
tests/net/if_ipsec/t_ipsec.sh: 1.1-1.2
Don't touch an SP without a reference to it
unify processing to check nesting count for some tunnel protocols.
add ipsec(4) interface, which is used for route-based VPN.
man and ATF are added later, please see man for details.
reviewed by christos@n.o, joerg@n.o and ozaki-r@n.o, thanks.
https://mail-index.netbsd.org/tech-net/2017/12/18/msg006557.html
ipsec(4) interface supports rump now.
add ipsec(4) interface ATF.
add ipsec(4) interface man as ipsecif.4.
add ipsec(4) interface to amd64/GENERIC and amd64/ALL configs.
apply in{,6}_tunnel_validate() to gif(4).
Spell IPsec that way. Simplify macro usage. Sort SEE ALSO. Bump
date for previous.
Improve wording and macro use.
Some parts are not clear to me, so someone with knowledge of ipsecif(4)
should improve this some more.
Improve ipsecif.4. Default port ipsec(4) NAT-T is tested now.
pointed out by wiz@n.o and suggested by ozaki-r@n.o, thanks.
Change the prefix of test names to ipsecif_ to distinguish from tests for ipsec(4)
New sentence, new line. Remove empty macro.
Fix PR kern/52920. Pointed out by David Binderman, thanks.
Improve wording, and put a new drawing, from me and Kengo Nakahara.
apply a little more #ifdef INET/INET6. fixes !INET6 builds.
 1.119.6.1  21-Dec-2017  snj Pull up following revision(s) (requested by ryo in ticket #445):
distrib/sets/lists/debug/mi: revision 1.222
distrib/sets/lists/tests/mi: revision 1.760
share/man/man4/ip.4: revision 1.38
sys/netinet/in.c: revision 1.207
sys/netinet/in.h: revision 1.101
sys/netinet/in_pcb.c: revision 1.179
sys/netinet/in_pcb.h: revision 1.64
sys/netinet/ip_output.c: revision 1.284, 1.286
sys/netinet/ip_var.h: revision 1.120-1.121
sys/netinet/raw_ip.c: revision 1.166-1.167
sys/netinet/udp_usrreq.c: revision 1.235-1.236
sys/netinet/udp_var.h: revision 1.42
tests/net/net/Makefile: revision 1.21
tests/net/net/t_pktinfo_send.c: revision 1.1-1.2
Add IP_PKTINFO support for sendmsg(2).
The source address or output interface can be specified by adding IP_PKTINFO
to the control part of the message on a SOCK_DGRAM or SOCK_RAW socket.
Reviewed by ozaki-r@ and christos@. thanks.
--
As is the case with IPV6_PKTINFO, IP_PKTINFO can be sent without EADDRINUSE
even if the UDP address:port in use is specified.
 1.122.2.4  30-Sep-2018  pgoyette Sync with HEAD
 1.122.2.3  28-Jul-2018  pgoyette Sync with HEAD
 1.122.2.2  16-Apr-2018  pgoyette Sync with HEAD, resolve some conflicts
 1.122.2.1  07-Apr-2018  pgoyette Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
 1.125.2.1  10-Jun-2019  christos Sync with HEAD
 1.130.2.1  03-Apr-2021  thorpej Sync with HEAD.
 1.134.10.1  02-Aug-2025  perseant Sync with HEAD
