Home | History | Annotate | Download | only in netinet
History log of /src/sys/netinet/files.netinet
RevisionDateAuthorComments
 1.30  20-Sep-2022  ozaki-r tcp: separate syn cache stuffs into tcp_syncache.[ch] files

No functional change.
 1.29  08-Mar-2021  christos remove now unused pseudo-random ip id code.
 1.28  29-Jul-2017  maxv branches: 1.28.16;
Remove TCP_COMPAT_42.
 1.27  13-Oct-2015  rjs Add core networking support for SCTP.
 1.26  10-Feb-2015  rjs Add DCCP protocol support from KAME.
 1.25  02-Dec-2014  christos add routines to print in_addr and sockaddr_in (in_print and sin_print)
 1.24  25-Jun-2012  christos branches: 1.24.2; 1.24.16;
rename rfc6056 -> portalgo, requested by yamt
 1.23  24-Sep-2011  christos branches: 1.23.2;
Add inet4 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.22  03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.21  13-Jul-2010  rmind branches: 1.21.2;
Split-off IPv4 re-assembly mechanism into a separate module. Abstract
into ip_reass_init(), ip_reass_lookup(), etc (note: abstraction is not
yet complete). No functional changes to the actual mechanism.

OK matt@
 1.20  25-Jan-2008  joerg branches: 1.20.10; 1.20.30; 1.20.32;
Refactor in_cksum/in4_cksum/in6_cksum implementations:
- All three functions are included in the kernel by default.
They call a backend function cpu_in_cksum after possibly
computing the checksum of the pseudo header.
- cpu_in_cksum is the core to implement the one-complement sum.
The default implementation is moderate fast on most platforms
and provides a 32bit accumulator with 16bit addends for L32 platforms
and a 64bit accumulator with 32bit addends for L64 platforms.
It handles edge cases like very large mbuf chains (could happen with
native IPv6 in the future) and provides a good base for new native
implementations.
- Modify i386 and amd64 assembly to use the new interface.

This disables the MD implementations on !x86 until the conversion is
done. For Alpha, the portable version is faster.
 1.19  02-May-2007  dyoung branches: 1.19.8; 1.19.14;
Remove obsolete files netinet/in_route.[ch].
 1.18  02-May-2007  dyoung Remove unused option.
 1.17  09-Dec-2006  dyoung branches: 1.17.2; 1.17.6; 1.17.8;
Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.16  25-Nov-2006  yamt move tso-by-software code to their own files. no functional changes.
 1.15  23-Nov-2006  tron Backout accidental commit which broke kernel builds.
 1.14  23-Nov-2006  rpaulo New EtherIP driver based on tap(4) and gif(4) by Hans Rosenfeld.
Notable changes:
* Fixes PR 34268.
* Separates the code from gif(4) (which is more cleaner).
* Allows the usage of STP (Spanning Tree Protocol).
* Removed EtherIP implementation from gif(4)/tap(4).

Some input from Christos.
 1.13  13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.12  09-Oct-2006  rpaulo Modular (I tried ;-) TCP congestion control API. Whenever certain conditions
happen in the TCP stack, this interface calls the specified callback to
handle the situation according to the currently selected congestion
control algorithm.
A new sysctl node was created: net.inet.tcp.congctl.{available,selected}
with obvious meanings.
The old net.inet.tcp.newreno MIB was removed.
The API is discussed in tcp_congctl(9).

In the near future, it will be possible to selected a congestion control
algorithm on a per-socket basis.

Discussed on tech-net and reviewed by <yamt>.
 1.11  11-Dec-2005  christos branches: 1.11.20; 1.11.22;
merge ktrace-lwp.
 1.10  28-Feb-2005  jonathan branches: 1.10.4;
Commit TCP SACK patches from Kentaro A. Karahone's patch at:
http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz

Fixes in that patch for pre-existing TCP pcb initializations were already
committed to NetBSD-current, so are not included in this commit.

The SACK patch has been observed to correctly negotiate and respond,
to SACKs in wide-area traffic.

There are two indepenently-observed, as-yet-unresolved anomalies:
First, seeing unexplained delays between in fast retransmission
(potentially explainable by an 0.2sec RTT between adjacent
ethernet/wifi NICs); and second, peculiar and unepxlained TCP
retransmits observed over an ath0 card.

After discussion with several interested developers, I'm committing
this now, as-is, for more eyes to use and look over. Current hypothesis
is that the anomalies above may in fact be due to link/level (hardware,
driver, HAL, firmware) abberations in the test setup, affecting both
Kentaro's wired-Ethernet NIC and in my two (different) WiFi NICs.
 1.9  13-Jan-2005  drochner branches: 1.9.2; 1.9.4;
compile tcp_debug.c only if the TCP_DEBUG option is set,
and remove the "#ifdef TCP_DEBUG" around everything
 1.8  04-Sep-2004  manu IPv4 PIM support, based on a submission from Pavlin Radoslavov posted on
tech-net@
 1.7  01-May-2004  matt defflag TCP_OUTPUT_COUNTERS and TCP_REASS_COUNTERS
 1.6  25-Apr-2004  jonathan Initial commit of a port of the FreeBSD implementation of RFC 2385
(MD5 signatures for TCP, as used with BGP). Credit for original
FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship
credited to sentex.net. Shortening of the setsockopt() name
attributed to Vincent Jardin.

This commit is a minimal, working version of the FreeBSD code, as
MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp
modified to set the TCP-MD5 option; BMS's additions to tcpdump-current
(tcpdump -M) confirm that the MD5 signatures are correct. Committed
as-is for further testing between a NetBSD BGP speaker (e.g., quagga)
and industry-standard BGP speakers (e.g., Cisco, Juniper).


NOTE: This version has two potential flaws. First, I do see any code
that verifies recieved TCP-MD5 signatures. Second, the TCP-MD5
options are internally padded and assumed to be 32-bit aligned. A more
space-efficient scheme is to pack all TCP options densely (and
possibly unaligned) into the TCP header ; then do one final padding to
a 4-byte boundary. Pre-existing comments note that accounting for
TCP-option space when we add SACK is yet to be done. For now, I'm
punting on that; we can solve it properly, in a way that will handle
SACK blocks, as a separate exercise.

In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c
,and modifies:

sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15

Note that the preceding two revisions to tcp.4 will be
required to cleanly apply this diff.
 1.5  26-Nov-2003  itojun always compile ip_id.c
 1.4  26-Nov-2003  itojun define RANDOM_IP_ID by default (unifdef -DRANDOM_IP_ID).
one use remains in sys/netipsec, which is kept for freebsd source code compat.
 1.3  17-Nov-2003  jonathan Revert the (default) ip_id algorithm to the pre-randomid algorithm,
due to demonstrated low-period repeated IDs from the randomized IP_id
code. Consensus is that the low-period repetition (much less than
2^15) is not suitable for general-purpose use.

Allocators of new IPv4 IDs should now call the function ip_newid().
Randomized IP_ids is now a config-time option, "options RANDOM_IP_ID".
ip_newid() can use ip_random-id()_IP_ID if and only if configured
with RANDOM_IP_ID. A sysctl knob should be provided.

This API may be reworked in the near future to support linear ip_id
counters per (src,dst) IP-address pair.
 1.2  06-Sep-2003  itojun randomize IPv4/v6 fragment ID and IPv6 flowlabel. avoids predictability
of these fields. ip_id.c is from openbsd. ip6_id.c is adapted by kame.
 1.1  10-Oct-2002  thorpej branches: 1.1.2; 1.1.8;
Move netinet, netinet6, ipsec, and ipfilter config defns to
netinet/files.ipfilter, etinet/files.netinet, netinet6/files.netinet6,
and netinet6/files.netipsec.

XXX There are still a few stragglers in conf/files, which are entangled
with other network protocols.
 1.1.8.5  04-Mar-2005  skrll Sync with HEAD.

Hi Perry!
 1.1.8.4  17-Jan-2005  skrll Sync with HEAD.
 1.1.8.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.1.8.2  18-Sep-2004  skrll Sync with HEAD.
 1.1.8.1  03-Aug-2004  skrll Sync with HEAD
 1.1.2.2  18-Oct-2002  nathanw Catch up to -current.
 1.1.2.1  10-Oct-2002  nathanw file files.netinet was added on branch nathanw_sa on 2002-10-18 02:45:16 +0000
 1.9.4.1  19-Mar-2005  yamt sync with head. xen and whitespace. xen part is not finished.
 1.9.2.1  29-Apr-2005  kent sync with -current
 1.10.4.3  04-Feb-2008  yamt sync with head.
 1.10.4.2  03-Sep-2007  yamt sync with head.
 1.10.4.1  30-Dec-2006  yamt sync with head.
 1.11.22.2  10-Dec-2006  yamt sync with head.
 1.11.22.1  22-Oct-2006  yamt sync with head
 1.11.20.2  12-Jan-2007  ad Sync with head.
 1.11.20.1  18-Nov-2006  ad Sync with head.
 1.17.8.1  11-Jul-2007  mjf Sync with head.
 1.17.6.1  08-Jun-2007  ad Sync with head.
 1.17.2.1  07-May-2007  yamt sync with head.
 1.19.14.1  18-Feb-2008  mjf Sync with HEAD.
 1.19.8.1  23-Mar-2008  matt sync with HEAD
 1.20.32.2  31-May-2011  rmind sync with head
 1.20.32.1  05-Mar-2011  rmind sync with head
 1.20.30.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.20.10.1  11-Aug-2010  yamt sync with head.
 1.21.2.1  06-Jun-2011  jruoho Sync with HEAD.
 1.23.2.1  30-Oct-2012  yamt sync with head
 1.24.16.3  28-Aug-2017  skrll Sync with HEAD
 1.24.16.2  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.24.16.1  06-Apr-2015  skrll Sync with HEAD
 1.24.2.1  03-Dec-2017  jdolecek update from HEAD
 1.28.16.1  03-Apr-2021  thorpej Sync with HEAD.

RSS XML Feed