Home | History | Annotate | Download | only in netinet
History log of /src/sys/netinet/tcp_vtw.c
RevisionDateAuthorComments
 1.25  07-Oct-2024  jakllsch Allow CACHE_LINE_SIZE 256 with uint64_t fatp_word_t
 1.24  04-Nov-2022  ozaki-r branches: 1.24.8;
inpcb: rename functions to inpcb_*

Inspired by rmind-smpnet patches.
 1.23  28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
 1.22  28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.21  13-Aug-2021  andvar fix typos in words "pointer" and s/fram /frame/
 1.20  01-Oct-2019  chs in many device attach paths, allocate memory with KM_SLEEP instead of KM_NOSLEEP
and remove code to handle failures that can no longer happen.
 1.19  03-May-2018  maxv branches: 1.19.2;
Remove now unused tcpip.h includes. Some were already unused before.
 1.18  01-Jun-2017  chs branches: 1.18.8;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.17  13-Dec-2016  ozaki-r Remove unnecessary inclusions of nd6.h
 1.16  28-Jul-2016  martin PR kern/51371: avoid shifting negative values
 1.15  26-Apr-2016  ozaki-r branches: 1.15.2;
Sweep unnecessary route.h inclusions
 1.14  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.13  31-Mar-2015  ozaki-r Remove unnecessary opt_ipsec.h inclusions
 1.12  10-Nov-2014  maxv branches: 1.12.2;
Do not uselessly include <sys/malloc.h>.
 1.11  05-Sep-2014  matt Don't use C++ keywords (class, template) as variables
 1.10  15-Sep-2013  martin branches: 1.10.4;
ifdef a variable like its use
 1.9  13-Apr-2012  yamt branches: 1.9.2; 1.9.4;
add a big comment
(copy and paste from cvs log rev.1.1)
 1.8  17-Jul-2011  joerg branches: 1.8.2; 1.8.6;
Retire varargs.h support. Move machine/stdarg.h logic into MI
sys/stdarg.h and expect compiler to provide proper builtins, defaulting
to the GCC interface. lint still has a special fallback.
Reduce abuse of _BSD_VA_LIST_ by defining __va_list by default and
derive va_list as required by standards.
 1.7  06-Jun-2011  dyoung Don't allocate resources for vtw until/unless it is enabled. This will
further help those machines where memory is in short supply.

TBD: release resources after vtw is disabled and all entries have
expired.
 1.6  03-Jun-2011  dyoung branches: 1.6.2;
Don't sleep until memory becomes available.

Use kmem_zalloc() instead of kmem_alloc() + bzero().

During initialization, try to get all of the memory we need for the
vestigial time-wait structures before we set any of the structures up,
and if any single allocation fails, release all of the memory.

This should help low-memory hosts. A much better fix postpones
allocating any memory until vtw is enabled through the sysctl.
 1.5  03-Jun-2011  dyoung Defer scheduling vtw_tick() and setting the vtw hooks until
vtw_control() is called. In this way, vtw_tick() will be re-scheduled
repeatedly while vtw is in use.

Pay tcp_vtw_was_enabled no attention in vtw_earlyinit(), since it's
always going to be 0 during initialization.
 1.4  17-May-2011  dholland branches: 1.4.2; 1.4.4;
typo in comment
 1.3  11-May-2011  drochner use getmicrouptime(9) rather than microtime(9) for TIME_WAIT duration
calculation, because this doesn't get confused by system time changes,
and uses less CPU cycles
reviewed by dyoung
 1.2  06-May-2011  drochner remove an empty function
 1.1  03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.4.4.1  23-Jun-2011  cherry Catchup with rmind-uvmplock merge.
 1.4.2.3  12-Jun-2011  rmind sync with head
 1.4.2.2  31-May-2011  rmind sync with head
 1.4.2.1  17-May-2011  rmind file tcp_vtw.c was added on branch rmind-uvmplock on 2011-05-31 03:05:08 +0000
 1.6.2.2  06-Jun-2011  jruoho Sync with HEAD.
 1.6.2.1  03-Jun-2011  jruoho file tcp_vtw.c was added on branch jruoho-x86intr on 2011-06-06 09:09:57 +0000
 1.8.6.1  29-Apr-2012  mrg sync to latest -current.
 1.8.2.2  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.8.2.1  17-Apr-2012  yamt sync with head
 1.9.4.3  18-May-2014  rmind sync with head
 1.9.4.2  23-Sep-2013  rmind - Add some initial locking to the IPv4 PCB.
- Rename inpcb_lookup_*() routines to be more accurate and add comments.
- Add some comments about connection life-cycle WRT socket layer.
 1.9.4.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.9.2.2  03-Dec-2017  jdolecek update from HEAD
 1.9.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.10.4.1  17-Jan-2015  martin Pull up following revision(s) (requested by maxv in ticket #427):
sys/compat/svr4/svr4_schedctl.c: revision 1.8
sys/netinet/tcp_timer.c: revision 1.88
sys/miscfs/genfs/layer_vfsops.c: revision 1.45
sys/compat/svr4/svr4_ioctl.c: revision 1.37
sys/ufs/chfs/chfs_vfsops.c: revision 1.14
sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91
sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30
sys/compat/common/kern_time_50.c: revision 1.28
sys/netinet6/ip6_forward.c: revision 1.74
sys/miscfs/umapfs/umap_vnops.c: revision 1.57
sys/compat/svr4/svr4_fcntl.c: revision 1.74
distrib/sets/lists/comp/mi: revision 1.1931
sys/netinet6/udp6_output.c: revision 1.46
sys/fs/puffs/puffs_compat.c: revision 1.3
sys/fs/udf/udf_rename.c: revision 1.11
sys/compat/svr4/svr4_filio.c: revision 1.24
sys/fs/udf/udf_rename.c: revision 1.12
sys/netinet/tcp_usrreq.c: revision 1.202
sys/miscfs/umapfs/umap_subr.c: revision 1.29
sys/compat/linux/common/linux_fadvise64.c: revision 1.3
sys/netinet/if_atm.c: revision 1.34
sys/miscfs/procfs/procfs_subr.c: revision 1.106
sys/miscfs/genfs/layer_subr.c: revision 1.37
sys/netinet/tcp_sack.c: revision 1.30
sys/compat/freebsd/freebsd_misc.c: revision 1.33
sys/compat/freebsd/freebsd_file.c: revision 1.33
sys/ufs/chfs/chfs_vnode.c: revision 1.12
sys/compat/svr4/svr4_ttold.c: revision 1.34
sys/compat/linux/common/linux_file.c: revision 1.114
sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43
sys/compat/linux/common/linux_signal.c: revision 1.76
sys/compat/common/compat_util.c: revision 1.46
sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18
sys/compat/svr4/svr4_sockio.c: revision 1.36
sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32
sys/compat/svr4/svr4_signal.c: revision 1.66
sys/kern/kern_exec.c: revision 1.410
sys/fs/puffs/puffs_vfsops.c: revision 1.115
sys/compat/svr4/svr4_exec_elf64.c: revision 1.15
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159
sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50
sys/compat/linux32/common/linux32_misc.c: revision 1.24
sys/netinet/in_pcb.c: revision 1.153
sys/sys/malloc.h: revision 1.116
sys/compat/common/if_43.c: revision 1.9
share/man/man9/Makefile: revision 1.380
sys/netinet/tcp_vtw.c: revision 1.12
sys/miscfs/umapfs/umap_vfsops.c: revision 1.95
sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186
sys/compat/common/uipc_syscalls_43.c: revision 1.46
sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115
sys/fs/puffs/puffs_msgif.c: revision 1.97
sys/compat/svr4/svr4_ipc.c: revision 1.27
sys/compat/linux/common/linux_exec.c: revision 1.117
sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66
sys/netinet/tcp_output.c: revision 1.179
sys/compat/svr4/svr4_termios.c: revision 1.28
sys/fs/udf/udf_strat_bootstrap.c: revision 1.4
sys/fs/puffs/puffs_subr.c: revision 1.67
sys/fs/puffs/puffs_node.c: revision 1.36
sys/miscfs/overlay/overlay_vnops.c: revision 1.21
sys/fs/cd9660/cd9660_node.c: revision 1.34
sys/netinet/raw_ip.c: revision 1.146
sys/sys/mallocvar.h: revision 1.13
sys/miscfs/overlay/overlay_vfsops.c: revision 1.63
share/man/man9/malloc.9: revision 1.50
sys/netinet6/dest6.c: revision 1.18
sys/compat/linux/common/linux_uselib.c: revision 1.33
sys/compat/linux/common/linux_socket.c: revision 1.120
share/man/man9/malloc.9: revision 1.51
sys/netinet/tcp_subr.c: revision 1.257
sys/compat/linux/common/linux_socketcall.c: revision 1.45
sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3
sys/compat/freebsd/freebsd_ipc.c: revision 1.17
sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109
sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17
sys/netinet6/in6_pcb.c: revision 1.132
sys/netinet6/in6_ifattach.c: revision 1.94
sys/compat/svr4/svr4_exec_elf32.c: revision 1.15
sys/miscfs/nullfs/null_vfsops.c: revision 1.90
sys/fs/cd9660/cd9660_util.c: revision 1.12
sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48
sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20
sys/miscfs/procfs/procfs_vfsops.c: revision 1.94
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28
sys/compat/linux/common/linux_sched.c: revision 1.67
sys/compat/linux/common/linux_exec_aout.c: revision 1.67
sys/compat/linux/common/linux_pipe.c: revision 1.67
sys/compat/linux/common/linux_llseek.c: revision 1.34
sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10
Do not uselessly include <sys/malloc.h>.
Cleanup:
- remove struct kmembuckets (dead)
- correctly deadify MALLOC_XX
- remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead)
- remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT()
and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc
New sentence, new line. Bump date for previous.
Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9)
man pages.
 1.12.2.6  28-Aug-2017  skrll Sync with HEAD
 1.12.2.5  05-Feb-2017  skrll Sync with HEAD
 1.12.2.4  05-Oct-2016  skrll Sync with HEAD
 1.12.2.3  29-May-2016  skrll Sync with HEAD
 1.12.2.2  22-Sep-2015  skrll Sync with HEAD
 1.12.2.1  06-Apr-2015  skrll Sync with HEAD
 1.15.2.2  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.15.2.1  06-Aug-2016  pgoyette Sync with HEAD
 1.18.8.1  21-May-2018  pgoyette Sync with HEAD
 1.19.2.1  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.24.8.1  02-Aug-2025  perseant Sync with HEAD

RSS XML Feed