History log of /src/sys/netinet/tcp_var.h |
Revision | | Date | Author | Comments |
1.199 |
| 03-Dec-2024 |
andvar | s/packlets/packets/ in comment.
|
1.198 |
| 28-Oct-2022 |
ozaki-r | inpcb: integrate data structures of PCB into one
Data structures of network protocol control blocks (PCBs), i.e., struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of the data structures have to handle them separately and thus the code is cluttered and duplicated.
The commit integrates the data structures into one, struct inpcb. As a result, users of PCBs only have to handle just one data structure, so the code becomes simple.
One drawback is that the data size of PCB for IPv4 increases by 40 bytes (from 248 bytes to 288 bytes).
|
1.197 |
| 20-Sep-2022 |
ozaki-r | tcp: separate syn cache stuffs into tcp_syncache.[ch] files
No functional change.
|
1.196 |
| 31-Jul-2021 |
andvar | s/threshhold/threshold
|
1.195 |
| 08-Mar-2021 |
christos | branches: 1.195.4; Remove the unused "addin" argument (it was always 0) and go back to using a random iss by default (instead of rfc1948)
|
1.194 |
| 03-Feb-2021 |
roy | Sprinkle CTASSERT to enforce on-wire layout without __packed
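As an illustration of the idea (not code from this commit), a compile-time assertion can pin the size and member offsets of a header structure, so the layout is checked without __packed. The sketch below uses the C11 static_assert macro as a stand-in for the kernel's CTASSERT; struct sketch_hdr and its members are hypothetical and are not NetBSD definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header for illustration: naturally aligned, no padding. */
struct sketch_hdr {
	uint16_t sh_sport;	/* source port */
	uint16_t sh_dport;	/* destination port */
	uint32_t sh_seq;	/* sequence number */
};

/* Compilation fails if the compiler lays the struct out differently. */
static_assert(sizeof(struct sketch_hdr) == 8, "sketch_hdr must be 8 bytes");
static_assert(offsetof(struct sketch_hdr, sh_dport) == 2, "sh_dport offset");
static_assert(offsetof(struct sketch_hdr, sh_seq) == 4, "sh_seq offset");

/* Run-time view of the same fact, convenient for a unit test. */
size_t
sketch_hdr_size(void)
{
	return sizeof(struct sketch_hdr);
}
```

The assertions cost nothing at run time; they only turn a silent layout change into a build failure.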
|
1.193 |
| 03-Feb-2021 |
roy | Remove __packed from various network structures
They are already network aligned and adding the __packed attribute just causes needless compiler warnings about accessing members of packed objects.
|
1.192 |
| 05-Mar-2020 |
riastradh | branches: 1.192.4; Revert "Include opt_diagnostic.h for DIAGNOSTIC."
This did not do what I thought it did. opt_diagnostic.h is only for the unused _DIAGNOSTIC, which seems like an abortive attempt to incrementally convert DIAGNOSTIC to an opt_*.h option rather than a command-line option.
|
1.191 |
| 05-Mar-2020 |
riastradh | Include opt_diagnostic.h for DIAGNOSTIC.
...at least, in header files, which may not have already included libkern.h.
|
1.190 |
| 27-Dec-2018 |
maxv | Remove unused arguments.
|
1.189 |
| 14-Sep-2018 |
maxv | Use non-variadic function pointer in protosw::pr_input.
|
1.188 |
| 03-Sep-2018 |
riastradh | Rename min/max -> uimin/uimax for better honesty.
These functions are defined on unsigned int. The generic name min/max should not silently truncate to 32 bits on 64-bit systems. This is purely a name change -- no functional change intended.
HOWEVER! Some subsystems have
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
even though our standard name for that is MIN/MAX. Although these may invite multiple evaluation bugs, these do _not_ cause integer truncation.
To avoid `fixing' these cases, I first changed the name in libkern, and then compile-tested every file where min/max occurred in order to confirm that it failed -- and thus confirm that nothing shadowed min/max -- before changing it.
I have left a handful of bootloaders that are too annoying to compile-test, and some dead code:
cobalt
ews4800mips
hp300
hppa
ia64
luna68k
vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))
It should be easy to fix the fallout once identified -- this way of doing things fails safe, and the goal here, after all, is to _avoid_ silent integer truncations, not introduce them.
Maybe one day we can reintroduce min/max as type-generic things that never silently truncate. But we should avoid doing that for a while, so that existing code has a chance to be detected by the compiler for conversion to uimin/uimax without changing the semantics until we can properly audit it all. (Who knows, maybe in some cases integer truncation is actually intended!)
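The truncation hazard described in this commit message can be sketched in a few lines. This is an illustration only: sketch_uimin is a stand-in written for this example (assuming a 32-bit unsigned type, here spelled uint32_t), not the libkern function, and SKETCH_MIN mirrors the macro form quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a min() that is defined on 32-bit unsigned values. */
static uint32_t
sketch_uimin(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Type-preserving macro form; it may evaluate an argument twice. */
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/* The 64-bit arguments are silently narrowed to 32 bits at the call. */
uint64_t
sketch_min_truncating(uint64_t a, uint64_t b)
{
	return sketch_uimin(a, b);
}

/* The macro compares the full 64-bit values. */
uint64_t
sketch_min_macro(uint64_t a, uint64_t b)
{
	return SKETCH_MIN(a, b);
}

void
sketch_min_demo(void)
{
	/* 0x100000001 narrows to 1, so the "minimum" of it and 2 is 1. */
	assert(sketch_min_truncating(0x100000001ULL, 2) == 1);
	/* Without narrowing, the minimum is 2, as expected. */
	assert(sketch_min_macro(0x100000001ULL, 2) == 2);
}
```

Giving the function a name that states its operand type makes the narrowing visible at the call site, which is the point of the rename.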
|
1.187 |
| 22-Aug-2018 |
msaitoh | - Cleanup for dynamic sysctl:
  - Remove unused *_NAMES macros for sysctl.
  - Remove unused *_MAXID for sysctls.
- Move CTL_MACHDEP sysctl definitions for m68k into m68k/include/cpu.h and use them on all m68k machines.
|
1.186 |
| 29-Apr-2018 |
maxv | branches: 1.186.2; Move struct tcpiphdr from tcpip.h to tcp_var.h, to match UDP (udpiphdr in udp_var.h).
tcpip.h is now empty, and can be removed.
|
1.185 |
| 28-Mar-2018 |
maxv | Remove two unused args from syn_cache_get().
|
1.184 |
| 12-Feb-2018 |
maxv | branches: 1.184.2; Remove unused argument from tcp_signature_getsav.
|
1.183 |
| 12-Feb-2018 |
maxv | Remove the 'm' argument from syn_cache_respond(); all it does with it is free it, so free it in the caller instead.
|
1.182 |
| 19-Jan-2018 |
ozaki-r | Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as much as possible to prevent any softint handlers including callout handlers such as tcp_slowtimo from sticking on softnet_lock because it results in undesired delays of executing subsequent softint handlers.
NFCI for !NET_MPSAFE
|
1.181 |
| 15-Nov-2017 |
ozaki-r | Make syn_cache_timer static
|
1.180 |
| 31-Jul-2017 |
maxv | Fix TCPCTL_NAMES, and remove TCPCTL_VARIABLES.
|
1.179 |
| 28-Jul-2017 |
maxv | Remove TCP_COMPAT_42. This feature is a workaround for a bug in the TCP stack of BSD4.2. Having such features just does not make any sense, and looking at the code, I'm not sure it actually works.
|
1.178 |
| 07-Jul-2017 |
ozaki-r | Rename key_alloc* functions (NFC)
We shouldn't use the term "alloc" for functions that just look up data and actually don't allocate memory.
|
1.177 |
| 14-Feb-2015 |
he | branches: 1.177.10; Change the new counter variables in struct tcpcb to uint32_t, as per christos' comments.
|
1.176 |
| 14-Feb-2015 |
he | Port over the TCP_INFO socket option from FreeBSD, originally from the Linux 2.6 TCP API. This permits the caller to query certain information about a TCP connection, and is used by pkgsrc's net/iperf3 test program if available.
This extends struct tcpcb with three fields to count retransmits, out-of-sequence receives and zero window announcements, and will therefore warrant a kernel revision bump (done separately).
|
1.175 |
| 31-Jul-2014 |
rtr | branches: 1.175.2; 1.175.4; split PRU_DISCONNECT, PRU_SHUTDOWN and PRU_ABORT function out of pr_generic() usrreq switches and put into separate functions
xxx_disconnect(struct socket *)
xxx_shutdown(struct socket *)
xxx_abort(struct socket *)
- always KASSERT(solocked(so)) even if not implemented
- replace calls to pr_generic() with req = PRU_{DISCONNECT,SHUTDOWN,ABORT} with calls to pr_{disconnect,shutdown,abort}() respectively
rename existing internal functions used to implement above functionality to permit use of the names for xxx_{disconnect,shutdown,abort}().
- {l2cap,sco,rfcomm}_disconnect() -> {l2cap,sco,rfcomm}_disconnect_pcb()
- {unp,rip,tcp}_disconnect() -> {unp,rip,tcp}_disconnect1()
- unp_shutdown() -> unp_shutdown1()
patch reviewed by rmind
|
1.174 |
| 19-May-2014 |
rmind | - Split off PRU_ATTACH and PRU_DETACH logic into separate functions.
- Replace malloc with kmem and eliminate M_PCB while here.
- Sprinkle more asserts.
|
1.173 |
| 18-May-2014 |
rmind | Add struct pr_usrreqs with a pr_generic function and prepare for the dismantling of pr_usrreq in the protocols; no functional change intended. PRU_ATTACH/PRU_DETACH changes will follow soon.
Bump for struct protosw. Welcome to 6.99.62!
|
1.172 |
| 02-Jan-2014 |
pooka | branches: 1.172.2; Allow kernels compiled with INET+INET6 to be booted as IPv4-only or IPv6-only.
|
1.171 |
| 12-Nov-2013 |
kefren | * implement TCP CUBIC congestion control algorithm
* move tcp_sack_newack bits inside reno and newreno_fast_retransmit_newack
* notify ECN peer about cwnd shrink in [new]reno_slow_retransmit
Based on the patch proposed on tech-net@ on Nov 7 with minor improvements:
* adapt wmax for no-fast convergence case
* correct cbrt calculation for big window sizes (>750KB)
|
1.170 |
| 10-Apr-2013 |
christos | branches: 1.170.4; Limit the tcp initial window setting to 10, leaving it by default to 4 and simplifying the code in the process. Per draft-ietf-initcwnd-08.txt.
|
1.169 |
| 02-Feb-2012 |
tls | branches: 1.169.6; Entropy-pool implementation move and cleanup.
1) Move core entropy-pool code and source/sink/sample management code to sys/kern from sys/dev.
2) Remove use of NRND as test for presence of entropy-pool code throughout source tree.
3) Remove use of RND_ENABLED in device drivers as microoptimization to avoid expensive operations on disabled entropy sources; make the rnd_add calls do this directly so all callers benefit.
4) Fix bug in recent rnd_add_data()/rnd_add_uint32() changes that might have led to slight entropy overestimation for some sources.
5) Add new source types for environmental sensors, power sensors, VM system events, and skew between clocks, with a sample implementation for each.
ok releng to go in before the branch due to the difficulty of later pullup (widespread #ifdef removal and moved files). Tested with release builds on amd64 and evbarm and live testing on amd64.
|
1.168 |
| 31-Oct-2011 |
yamt | branches: 1.168.2; 1.168.6; tcp_reass_unlock: assertion
|
1.167 |
| 25-May-2011 |
gdt | Add comment urging a separation of TCP_RTT_SHIFT into separate defines describing the EWMA calculation and the storage representation. (No code change.)
|
1.166 |
| 03-May-2011 |
dyoung | Reduces the resources demanded by TCP sessions in TIME_WAIT-state using methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime Truncation (MSLT).
MSLT and VTW were contributed by Coyote Point Systems, Inc.
Even after a TCP session enters the TIME_WAIT state, its corresponding socket and protocol control blocks (PCBs) stick around until the TCP Maximum Segment Lifetime (MSL) expires. On a host whose workload necessarily creates and closes down many TCP sockets, the sockets & PCBs for TCP sessions in TIME_WAIT state amount to many megabytes of dead weight in RAM.
Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to a class based on the nearness of the peer. Corresponding to each class is an MSL, and a session uses the MSL of its class. The classes are loopback (local host equals remote host), local (local host and remote host are on the same link/subnet), and remote (local host and remote host communicate via one or more gateways). Classes corresponding to nearer peers have lower MSLs by default: 2 seconds for loopback, 10 seconds for local, 60 seconds for remote. Loopback and local sessions expire more quickly when MSLT is used.
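The class selection described above can be sketched as follows. This is a simplified illustration, not the committed code: it assumes IPv4 addresses in host byte order and a single netmask for the local interface, and every name starting with sketch_ or SKETCH_ is hypothetical. Only the default MSL values (2, 10 and 60 seconds) come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_msl_class {
	SKETCH_MSL_LOOPBACK,	/* local host equals remote host */
	SKETCH_MSL_LOCAL,	/* same link/subnet */
	SKETCH_MSL_REMOTE,	/* reached via one or more gateways */
	SKETCH_MSL_NCLASSES
};

/* Default MSL per class, in seconds, as quoted in the commit message. */
static const unsigned sketch_msl_seconds[SKETCH_MSL_NCLASSES] = {
	[SKETCH_MSL_LOOPBACK] = 2,
	[SKETCH_MSL_LOCAL] = 10,
	[SKETCH_MSL_REMOTE] = 60,
};

/* Classify a session by how near the peer is (simplified test). */
enum sketch_msl_class
sketch_msl_classify(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	if (laddr == faddr)
		return SKETCH_MSL_LOOPBACK;
	if ((laddr & netmask) == (faddr & netmask))
		return SKETCH_MSL_LOCAL;
	return SKETCH_MSL_REMOTE;
}

/* Return the MSL, in seconds, that a session between the two uses. */
unsigned
sketch_msl_for(uint32_t laddr, uint32_t faddr, uint32_t netmask)
{
	enum sketch_msl_class c = sketch_msl_classify(laddr, faddr, netmask);

	assert(c < SKETCH_MSL_NCLASSES);
	return sketch_msl_seconds[c];
}
```

For example, with a /24 netmask, 192.168.0.1 talking to 192.168.0.2 falls in the local class and would use the 10-second default.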
Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket dead weight with a compact representation of the session, called a "vestigial PCB". VTW data structures are designed to be very fast and memory-efficient: for fast insertion and lookup of vestigial PCBs, the PCBs are stored in a hash table that is designed to minimize the number of cacheline visits per lookup/insertion. The memory both for vestigial PCBs and for elements of the PCB hashtable comes from fixed-size pools, and linked data structures exploit this to conserve memory by representing references with a narrow index/offset from the start of a pool instead of a pointer. When space for new vestigial PCBs runs out, VTW makes room by discarding old vestigial PCBs, oldest first. VTW cooperates with MSLT.
It may help to think of VTW as a "FIN cache" by analogy to the SYN cache.
A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT sessions as fast as it can is approximately 17% idle when VTW is active versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM when VTW is active (approximately 64k vestigial PCBs are created) than when it is inactive.
|
1.165 |
| 03-May-2011 |
dyoung | *_drain() routines may be called with locks held, so instead of doing any work in *_drain(), set a drain-needed flag. Do the work in the fasttimo handler.
Contributed by Coyote Point Systems, Inc.
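The deferred-drain pattern this commit describes can be sketched as below. It is an illustration under stated assumptions: struct sketch_reass and both functions are hypothetical, and a real kernel would also need appropriate locking or atomic operations around the flag, which the sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-protocol state: a queue length and a deferred flag. */
struct sketch_reass {
	bool sr_drain_needed;	/* set by drain, consumed by the timer */
	int sr_queued;		/* entries that a drain would discard */
};

/* May be called with locks held, so it only records the request. */
void
sketch_drain(struct sketch_reass *sr)
{
	assert(sr != NULL);
	sr->sr_drain_needed = true;
}

/* Timer handler: runs in a safe context and does the actual work. */
void
sketch_fasttimo(struct sketch_reass *sr)
{
	assert(sr != NULL);
	if (!sr->sr_drain_needed)
		return;
	sr->sr_drain_needed = false;
	sr->sr_queued = 0;	/* stand-in for freeing queued entries */
}
```

The trade-off is latency: memory is released at the next timer tick rather than immediately.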
|
1.164 |
| 20-Apr-2011 |
gdt | Rewrite comments about TCP RTO calculations.
Long ago, the storage representations of srtt and rttvar were changed from the 4.4BSD scheme, and the comments are out of sync with the code. This commit rewrites most of the comments that explain the RTO calculations, and points out some issues in the code.
Joint work with Bev Schwartz of BBN (original analysis and comments), but I have rewritten and extended them, so errors are mine.
This material is based upon work supported by the Defense Advanced Research Projects Agency and Space and Naval Warfare Systems Center, Pacific, under Contract No. N66001-09-C-2073. Approved for Public Release, Distribution Unlimited
|
1.163 |
| 14-Apr-2011 |
yamt | comments
|
1.162 |
| 16-Sep-2009 |
pooka | branches: 1.162.4; 1.162.6; Replace a large number of link set based sysctl node creations with calls from subsystem constructors. Benefits both future kernel modules and rump.
no change to sysctl nodes on i386/MONOLITHIC & build tested i386/ALL
|
1.161 |
| 09-Sep-2009 |
darran | Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl. Okayed by tls@.
|
1.160 |
| 27-May-2009 |
pooka | POOL_INIT -> pool_init
|
1.159 |
| 29-Jan-2009 |
pooka | branches: 1.159.2; stinkset purge: POOL_INIT -> pool_init also, make the syncache pool static in scope
|
1.158 |
| 06-Aug-2008 |
plunky | branches: 1.158.2; 1.158.4; 1.158.10; Convert socket options code to use a sockopt structure instead of laying everything into an mbuf.
approved by core
|
1.157 |
| 28-Apr-2008 |
martin | branches: 1.157.2; 1.157.6; Remove clause 3 and 4 from TNF licenses
|
1.156 |
| 24-Apr-2008 |
ad | branches: 1.156.2; Merge the socket locking patch:
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.
With much feedback from matt@ and plunky@.
|
1.155 |
| 12-Apr-2008 |
thorpej | branches: 1.155.2; Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated when the user requests them via sysctl.
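A minimal sketch of per-CPU counters that are summed only on request, assuming a fixed CPU count and a small array of uint64_t counters per CPU. The names, the CPU count and the counter indices are all hypothetical; the kernel's actual mechanism is not shown here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCPU	4	/* assumed CPU count for the example */

enum {
	SKETCH_STAT_SNDPACK,	/* hypothetical counter indices */
	SKETCH_STAT_RCVPACK,
	SKETCH_STAT_DROPS,
	SKETCH_NSTATS
};

/* One private row of counters per CPU; updates touch only that row. */
static uint64_t sketch_stats[SKETCH_NCPU][SKETCH_NSTATS];

/* Fast path: bump a counter on the row of the CPU we are running on. */
void
sketch_stat_inc(unsigned cpu, unsigned stat)
{
	assert(cpu < SKETCH_NCPU && stat < SKETCH_NSTATS);
	sketch_stats[cpu][stat]++;
}

/* Slow path: sum every CPU's row into one array for the requester. */
void
sketch_stat_collate(uint64_t out[SKETCH_NSTATS])
{
	for (unsigned s = 0; s < SKETCH_NSTATS; s++) {
		out[s] = 0;
		for (unsigned c = 0; c < SKETCH_NCPU; c++)
			out[s] += sketch_stats[c][s];
	}
}
```

The design moves the cost from the frequent update to the rare read: increments need no shared cache line, and only the reader walks all rows.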
|
1.154 |
| 08-Apr-2008 |
thorpej | Change TCP stats from a structure to an array of uint64_t's.
Note: This is ABI-compatible with the old tcpstat structure; old netstat binaries will continue to work properly.
|
1.153 |
| 29-Feb-2008 |
matt | Rework tcp congctl selection code so that the congctl entries can be const. Don't access tcp_congctl stuff outside of tcp_congctl.c, use routines to update t_congctl. This code is now slightly more complicated.
|
1.152 |
| 27-Feb-2008 |
matt | Convert stragglers to ansi definitions from old-style definitions. Remember that func() is not ansi, func(void) is.
|
1.151 |
| 25-Dec-2007 |
perry | branches: 1.151.2; 1.151.6; Convert many of the uses of __attribute__ to equivalent __packed, __unused and __dead macros from cdefs.h
|
1.150 |
| 02-Aug-2007 |
rmind | branches: 1.150.4; 1.150.10; 1.150.12; 1.150.16; 1.150.20; TCP socket buffers automatic sizing - ported from FreeBSD. http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html
! Disabled by default, marked as experimental. Testers are much needed.
! Someone should thoroughly test this, and improve if possible.
Discussed on <tech-net>: http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html Thanks Greg Troxel for comments.
OK by the long silence on <tech-net>.
|
1.149 |
| 09-Jul-2007 |
ad | branches: 1.149.2; Merge some of the less invasive changes from the vmlocking branch:
- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
|
1.148 |
| 25-Jun-2007 |
christos | tcpdrop kernel bits (from anon ymous)
|
1.147 |
| 20-Jun-2007 |
christos | - per socket keepalive settings
- settable connection establishment timeout
|
1.146 |
| 02-May-2007 |
dyoung | Eliminate address family-specific route caches (struct route, struct route_in6, struct route_iso), replacing all caches with a struct route.
The principal benefit of this change is that all of the protocol families can benefit from route cache-invalidation, which is necessary for correct routing. Route-cache invalidation fixes an ancient PR, kern/3508, at long last; it fixes various other PRs, also.
Discussions with and ideas from Joerg Sonnenberger influenced this work tremendously. Of course, all design oversights and bugs are mine.
DETAILS
1 I added to each address family a pool of sockaddrs. I have introduced routines for allocating, copying, duplicating, and freeing sockaddrs:
struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst, const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);
sockaddr_alloc() returns either a sockaddr from the pool belonging to the specified family, or NULL if the pool is exhausted. The returned sockaddr has the right size for that family; sa_family and sa_len fields are initialized to the family and sockaddr length---e.g., sa_family = AF_INET and sa_len = sizeof(struct sockaddr_in). sockaddr_free() puts the given sockaddr back into its family's pool.
sockaddr_dup() and sockaddr_copy() work analogously to strdup() and strcpy(), respectively. sockaddr_copy() KASSERTs that the family of the destination and source sockaddrs are alike.
The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is passed directly to pool_get(9).
2 I added routines for initializing sockaddrs in each address family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(), etc. They are fairly self-explanatory.
3 structs route_in6 and route_iso are no more. All protocol families use struct route. I have changed the route cache, 'struct route', so that it does not contain storage space for a sockaddr. Instead, struct route points to a sockaddr coming from the pool the sockaddr belongs to. I added a new method to struct route, rtcache_setdst(), for setting the cache destination:
int rtcache_setdst(struct route *, const struct sockaddr *);
rtcache_setdst() returns 0 on success, or ENOMEM if no memory is available to create the sockaddr storage.
It is now possible for rtcache_getdst() to return NULL if, say, rtcache_setdst() failed. I check the return value for NULL everywhere in the kernel.
4 Each routing domain (struct domain) has a list of live route caches, dom_rtcache. rtflushall(sa_family_t af) looks up the domain indicated by 'af', walks the domain's list of route caches and invalidates each one.
|
1.145 |
| 04-Mar-2007 |
christos | branches: 1.145.2; 1.145.4; Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
|
1.144 |
| 17-Feb-2007 |
dyoung | KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous parentheses in return statements.
Cosmetic: don't open-code TAILQ_FOREACH().
Cosmetic: change types of variables to avoid oodles of casts: in in6_src.c, avoid casts by changing several route_in6 pointers to struct route pointers. Remove unnecessary casts to caddr_t elsewhere.
Pave the way for eliminating address family-specific route caches: soon, struct route will not embed a sockaddr, but it will hold a reference to an external sockaddr, instead. We will set the destination sockaddr using rtcache_setdst(). (I created a stub for it, but it isn't used anywhere, yet.) rtcache_free() will free the sockaddr. I have extracted from rtcache_free() a helper subroutine, rtcache_clear(). rtcache_clear() will "forget" a cached route, but it will not forget the destination by releasing the sockaddr. I use rtcache_clear() instead of rtcache_free() in rtcache_update(), because rtcache_update() is not supposed to forget the destination.
Constify:
1 Introduce const accessor for route->ro_dst, rtcache_getdst().
2 Constify the 'dst' argument to ifnet->if_output(). This led me to constify a lot of code called by output routines.
3 Constify the sockaddr argument to protosw->pr_ctlinput. This led me to constify a lot of code called by ctlinput routines.
4 Introduce const macros for converting from a generic sockaddr to family-specific sockaddrs, e.g., sockaddr_in: satocsin6, satocsin, et cetera.
|
1.143 |
| 06-Dec-2006 |
yamt | branches: 1.143.2; add some more tcp mowners.
|
1.142 |
| 06-Dec-2006 |
yamt | - make tcp_reass static.
- constify.
|
1.141 |
| 21-Oct-2006 |
yamt | branches: 1.141.2; 1.141.4;
- constify.
- make tcp_dooptions and tcpipqent_pool static.
|
1.140 |
| 19-Oct-2006 |
yamt | implement RFC3465 appropriate byte counting. from Kentaro A. Kurahone, with minor adjustments by me. the ack prediction part of the original patch was omitted because it's a separate change. reviewed by Rui Paulo.
|
1.139 |
| 16-Oct-2006 |
rpaulo | Export the tcp_do_rfc1948 variable to userland via sysctl. The code to generate an ISS via an MD5 hash has been present in the NetBSD kernel since 2001, but it wasn't even exported to userland at that time. It was agreed on tech-net with the original author <thorpej> that we should let the user decide if he wants to enable it or not. Not enabled by default.
|
1.138 |
| 09-Oct-2006 |
rpaulo | Modular (I tried ;-) TCP congestion control API. Whenever certain conditions happen in the TCP stack, this interface calls the specified callback to handle the situation according to the currently selected congestion control algorithm. A new sysctl node was created: net.inet.tcp.congctl.{available,selected} with obvious meanings. The old net.inet.tcp.newreno MIB was removed. The API is discussed in tcp_congctl(9).
In the near future, it will be possible to select a congestion control algorithm on a per-socket basis.
Discussed on tech-net and reviewed by <yamt>.
|
1.137 |
| 05-Sep-2006 |
rpaulo | branches: 1.137.2; 1.137.4; Import of TCP ECN algorithm for congestion control. Available for both IPv4 and IPv6. Basic implementation test results are available at http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.
Work sponsored by the Google Summer of Code project 2006. Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their help, comments and support during the project.
|
1.136 |
| 22-Jul-2006 |
rpaulo | revert stuff that shouldn't have gone in.
|
1.135 |
| 22-Jul-2006 |
rpaulo | TCP RFC is 793, not 783.
|
1.134 |
| 16-Feb-2006 |
perry | branches: 1.134.2; Change "inline" back to "__inline" in .h files -- C99 is still too new, and some apps compile things in C89 mode. C89 keywords stay.
As per core@.
|
1.133 |
| 24-Dec-2005 |
perry | branches: 1.133.2; 1.133.4; 1.133.6; Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
|
1.132 |
| 11-Dec-2005 |
christos | merge ktrace-lwp.
|
1.131 |
| 10-Dec-2005 |
elad | Multiple inclusion protection, as suggested by christos@ on tech-kern@ a few days ago.
|
1.130 |
| 06-Sep-2005 |
rpaulo | Implement tcp.inet{,6}.tcp{,6}.(debug|debx) when TCP_DEBUG is set. They can be used to ``transliterate protocol trace'' like trpt(8) does.
|
1.129 |
| 10-Aug-2005 |
yamt | move {tcp,udp}_do_loopback_cksum back to tcp/udp so that they can be referenced by ipv6.
|
1.128 |
| 05-Aug-2005 |
elad | Add sysctls for IP, ICMP, TCP, and UDP statistics.
|
1.127 |
| 19-Jul-2005 |
christos | Implement PMTU checks from:
http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html
1. Don't act on ICMP-need-frag immediately if adhoc checks on the advertised MTU fail. The MTU update is delayed until a TCP retransmit happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.
From OpenBSD.
|
1.126 |
| 29-May-2005 |
christos | branches: 1.126.2;
- add const
- remove bogus casts
- avoid nested variables
|
1.125 |
| 05-Apr-2005 |
kurahone | Added sysctl tunable limits for the number of maximum SACK holes per connection and per system.
Idea taken from FreeBSD.
|
1.124 |
| 29-Mar-2005 |
yamt | protect tcpipqent with splvm.
|
1.123 |
| 16-Mar-2005 |
yamt | branches: 1.123.2; simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq. the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]), build it directly from the segment list when needed.
|
1.122 |
| 16-Mar-2005 |
yamt | - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.
discussed on tech-net@.
|
1.121 |
| 09-Mar-2005 |
atatat | gc the tcp_sysctl() prototype since it's completely vestigial
|
1.120 |
| 02-Mar-2005 |
mycroft | Copyright maintenance.
|
1.119 |
| 28-Feb-2005 |
jonathan | Commit TCP SACK patches from Kentaro A. Kurahone's patch at: http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz
Fixes in that patch for pre-existing TCP pcb initializations were already committed to NetBSD-current, so are not included in this commit.
The SACK patch has been observed to correctly negotiate and respond to SACKs in wide-area traffic.
There are two independently-observed, as-yet-unresolved anomalies: First, seeing unexplained delays between in fast retransmission (potentially explainable by an 0.2sec RTT between adjacent ethernet/wifi NICs); and second, peculiar and unexplained TCP retransmits observed over an ath0 card.
After discussion with several interested developers, I'm committing this now, as-is, for more eyes to use and look over. Current hypothesis is that the anomalies above may in fact be due to link-level (hardware, driver, HAL, firmware) aberrations in the test setup, affecting both Kentaro's wired-Ethernet NIC and my two (different) WiFi NICs.
|
1.118 |
| 06-Feb-2005 |
pk | Update tcp_trace() prototype to match implementation.
|
1.117 |
| 27-Jan-2005 |
mycroft | Introduce a new state variable, t_partialacks. It has 3 states:
* t_partialacks<0 means we are not in fast recovery.
* t_partialacks==0 means we are in fast recovery, but we have not received any partial acks yet.
* t_partialacks>0 means we are in fast recovery, and we have received partial acks.
This is used to implement 2 changes in RFC 3782:
* We keep the notion that we are in fast recovery separate from t_dupacks, so it is not reset due to out-of-order acks. (This affects both the Reno and NewReno cases.)
* We only reset the retransmit timer on the first partial ack -- preventing us from possibly taking one RTO per segment once fast recovery is initiated.
As before, it is hard to measure any difference between Reno and NewReno in the real-world cases that I've tested.
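The three-valued encoding of t_partialacks described in this commit can be sketched as follows. The enum and helper names are hypothetical and written for this example; only the sign convention and the "reset the retransmit timer on the first partial ack only" rule come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_fr_state {
	SKETCH_FR_NONE,		/* t_partialacks < 0: not in fast recovery */
	SKETCH_FR_NO_PARTIAL,	/* == 0: in recovery, no partial ack yet */
	SKETCH_FR_PARTIAL	/* > 0: in recovery, partial acks seen */
};

/* Decode the counter into the three states listed in the message. */
enum sketch_fr_state
sketch_fr_classify(int t_partialacks)
{
	if (t_partialacks < 0)
		return SKETCH_FR_NONE;
	return t_partialacks == 0 ? SKETCH_FR_NO_PARTIAL : SKETCH_FR_PARTIAL;
}

/*
 * Record a partial ack while in fast recovery.  Returns true only for
 * the first one, which is when the retransmit timer would be reset.
 */
bool
sketch_on_partial_ack(int *t_partialacks)
{
	assert(t_partialacks != NULL && *t_partialacks >= 0);
	return (*t_partialacks)++ == 0;
}
```

Because the recovery state lives in its own variable, resetting a duplicate-ack counter on an out-of-order ack would not change what sketch_fr_classify() reports.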
|
1.116 |
| 26-Jan-2005 |
mycroft | Fix two problems in our TCP stack:
1) If an echoed RFC 1323 time stamp appears to be later than the current time, ignore it and fall back to old-style RTT calculation. This prevents ending up with a negative RTT and panicking later.
2) Fix NewReno. This involves a few changes:
a) Implement the send_high variable in RFC 2582. Our implementation is subtly different; it is one *past* the last sequence number transmitted rather than being equal to it. This simplifies some logic and makes the code smaller. Additional logic was required to prevent sequence number wraparound problems; this is not mentioned in RFC 2582.
b) Make sure we reset t_dupacks on new acks, but *not* on a partial ack. All of the new ack code is pushed out into tcp_newreno(). (Later this will probably be a pluggable function.) Thus t_dupacks keeps track of whether we're in fast recovery all the time, with Reno or NewReno, which keeps some logic simpler.
c) We do not need to update snd_recover when we're not in fast recovery. See tech-net for an explanation of this.
d) In the gratuitous fast retransmit prevention case, do not send a packet. RFC 2582 specifically says that we should "do nothing".
e) Do not inflate the congestion window on a partial ack. (This is done by testing t_dupacks to see whether we're still in fast recovery.)
This brings the performance of NewReno back up to the same as Reno in a few random test cases (e.g. transferring peer-to-peer over my wireless network). I have not concocted a good test case for the behavior specific to NewReno.
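Item 1 above (ignoring an echoed timestamp that appears to lie in the future) can be sketched with a wraparound-safe comparison. This is an illustration, not the committed code: the function name is hypothetical, timestamps are assumed to be 32-bit tick counters, and the signed-difference test assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Derive an RTT sample from an echoed timestamp.  Returns false when
 * the echo appears to be later than "now", in which case the caller
 * would fall back to the old-style RTT calculation.
 */
bool
sketch_rtt_from_echo(uint32_t now, uint32_t ts_ecr, uint32_t *rtt)
{
	assert(rtt != NULL);
	/* A signed difference stays correct across counter wraparound. */
	if ((int32_t)(now - ts_ecr) < 0)
		return false;
	*rtt = now - ts_ecr;
	return true;
}
```

Rejecting the sample up front means a negative RTT can never reach the smoothing code.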
|
1.115 |
| 21-Dec-2004 |
yamt | branches: 1.115.2; 1.115.4; factor out receive side tcp/udp checksum handling code so that they can be used by eg. packet filters.
reviewed by Christos Zoulas on tech-net@. (slightly tweaked since then to make tcp and udp similar.)
|
1.114 |
| 15-Dec-2004 |
thorpej | Don't perform checksums on loopback interfaces. They can be reenabled with the net.inet.*.do_loopback_cksum sysctl.
Approved by: groo
|
1.113 |
| 15-Sep-2004 |
yamt | fix ipqent pool corruption problems. make tcp reass code use its own pool of ipqent rather than sharing it with ip reass code. PR/24782.
|
1.112 |
| 18-May-2004 |
itojun | fix MD5 signature support to actually validate inbound signature, and drop packet if fails.
|
1.111 |
| 26-Apr-2004 |
itojun | make TCP MD5 signature work with KAME IPSEC (#define IPSEC).
support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the right thing).
XXX current TCP MD5 signature code has giant flaw: it does not validate signature on input (can't believe it! what is the point?)
|
1.110 |
| 25-Apr-2004 |
jonathan | Initial commit of a port of the FreeBSD implementation of RFC 2385 (MD5 signatures for TCP, as used with BGP). Credit for original FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship credited to sentex.net. Shortening of the setsockopt() name attributed to Vincent Jardin.
This commit is a minimal, working version of the FreeBSD code, as MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp modified to set the TCP-MD5 option; BMS's additions to tcpdump-current (tcpdump -M) confirm that the MD5 signatures are correct. Committed as-is for further testing between a NetBSD BGP speaker (e.g., quagga) and industry-standard BGP speakers (e.g., Cisco, Juniper).
NOTE: This version has two potential flaws. First, I do not see any code that verifies received TCP-MD5 signatures. Second, the TCP-MD5 options are internally padded and assumed to be 32-bit aligned. A more space-efficient scheme is to pack all TCP options densely (and possibly unaligned) into the TCP header; then do one final padding to a 4-byte boundary. Pre-existing comments note that accounting for TCP-option space when we add SACK is yet to be done. For now, I'm punting on that; we can solve it properly, in a way that will handle SACK blocks, as a separate exercise.
In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c, and modifies:
sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15
Note that the preceding two revisions to tcp.4 will be required to cleanly apply this diff.
|
1.109 |
| 21-Apr-2004 |
itojun | no space between function name and paren: foo (blah) -> foo(blah)
|
1.108 |
| 20-Apr-2004 |
itojun | - respond to RST by ACK, as suggested in NISCC recommendation
- rate-limit ACKs against RSTs and SYNs
|
1.107 |
| 18-Apr-2004 |
matt | De __P()
|
1.106 |
| 22-Oct-2003 |
thorpej | branches: 1.106.2; Rather than zeroing a tcpcb structure and filling in all the fields individually, create a tcpcb template pre-initialized (and pre-zero'd) with the static and mostly-static tcpcb parameters. The template is now copied into the new tcpcb, which zeros and initializes most of the tcpcb in one pass. The template is kept up-to-date as TCP sysctl variables are changed.
Combined with the previous sb_max change, TCP socket creation is now 25% faster.
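The template approach can be sketched as below. Everything here is hypothetical and reduced for illustration: struct sketch_tcpcb has only a few fields, the default values are placeholders, and sketch_set_default_mss() stands in for whatever code keeps the template in step with sysctl changes.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for a control block; not the real struct tcpcb. */
struct sketch_tcpcb {
	int t_state;		/* starts zeroed */
	unsigned t_maxseg;	/* mostly-static default */
	unsigned t_rttmin;	/* mostly-static default */
	void *t_inpcb;		/* per-connection, filled in after the copy */
};

/* Pre-zeroed template carrying the current defaults (placeholders). */
static struct sketch_tcpcb sketch_template = {
	.t_maxseg = 536,
	.t_rttmin = 1,
};

/* Called when a tunable changes, so later copies see the new default. */
void
sketch_set_default_mss(unsigned mss)
{
	sketch_template.t_maxseg = mss;
}

/* One structure copy zeroes and initializes most fields in one pass. */
void
sketch_newtcpcb(struct sketch_tcpcb *tp, void *inp)
{
	assert(tp != NULL);
	*tp = sketch_template;
	tp->t_inpcb = inp;	/* only per-connection fields remain */
}
```

The copy replaces a zeroing pass followed by many individual stores, which is where the saving on the socket-creation path would come from.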
|
1.105 |
| 04-Sep-2003 |
itojun | revamp inpcb/in6pcb so that they are more aligned with each other. in6pcb lookup now uses hash(9).
|
1.104 |
| 07-Aug-2003 |
agc | Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
|
1.103 |
| 20-Jul-2003 |
he | As a temporary workaround, apply the fix from PR#20390, thereby cooperating with the callout code in working around the race condition caused by the TCP code's use of the callout facility.
Instead of unconditionally releasing memory in tcp_close() and SYN_CACHE_PUT(), check whether any of the related callout handlers are about to be invoked (but have not yet done callout_ack()), and if so, just mark the associated data structure (tcpcb or syn cache entry) as "dead", and test for this (and release storage) in the callout handler functions.
|
1.102 |
| 29-Jun-2003 |
fvdl | branches: 1.102.2; Back out the lwp/ktrace changes. They contained a lot of collateral damage, and need to be examined and discussed more.
|
1.101 |
| 29-Jun-2003 |
ragge | Add code to remember where in the send queue of mbufs the last packet was sent from. This change avoids a linear search through all mbufs when using large TCP windows, and therefore permits high-speed connections over long distances.
Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance of about 15000km. With TCP windows of just over 20 Mbytes it could keep up with 950Mbit/s.
After discussions with Matt Thomas and Jason Thorpe.
|
1.100 |
| 28-Jun-2003 |
darrenr | Pass lwp pointers throughout the kernel, as required, so that the lwpid can be inserted into ktrace records. The general change has been to replace "struct proc *" with "struct lwp *" in various function prototypes, pass the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
|
1.99 |
| 26-Jun-2003 |
christos | abuse the mib instead of abusing the new pointer. Idea from simon burge. It allows the tcp_sysctl_ident to run by non-super-users. No backwards compatibility provided.
|
1.98 |
| 23-Jun-2003 |
martin | Make sure to include opt_foo.h if a defflag option FOO is used.
|
1.97 |
| 19-Apr-2003 |
christos | PR/2352: Tor Egge: Add sysctl to get uid of connected socket.
|
1.96 |
| 01-Mar-2003 |
thorpej | Allow TCP connections to hosts on a local network to use a larger slow start initial window. Default this larger initial window to 4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
|
1.95 |
| 26-Feb-2003 |
matt | Add MBUFTRACE kernel option. Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *) to m_get*(M_WAIT, *). These are not performance critical and making them call m_get saves considerable space. Add m_clget analogue of MCLGET and make corresponding change for M_WAIT uses. Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE. Begin to change netstat to use sysctl.
|
1.94 |
| 02-Nov-2002 |
perry | /*CONTCOND*/ while (0)'ed macros
|
1.93 |
| 30-Jun-2002 |
thorpej | Changes to allow the IPv4 and IPv6 layers to align headers themselves, as necessary: * Implement a new mbuf utility routine, m_copyup(), which is like m_pullup(), except that it always prepends and copies, rather than only doing so if the desired length is larger than m->m_len. m_copyup() also allows an offset into the destination mbuf, which allows space for packet headers, in the forwarding case. * Add *_HDR_ALIGNED_P() macros for IP, IPv6, ICMP, and IGMP. These macros expand to 1 if __NO_STRICT_ALIGNMENT is defined, so that architectures which do not have strict alignment constraints don't pay for the test or visit the new align-if-needed path. * Use the new macros to check if a header needs to be aligned, or to assert that it already is, as appropriate.
Note: This code is still somewhat experimental. However, the new code path won't be visited if individual device drivers continue to guarantee that packets are delivered to layer 3 already properly aligned (which are rules that are already in use).
|
1.92 |
| 09-Jun-2002 |
itojun | whitespace
|
1.91 |
| 26-May-2002 |
itojun | path MTU discovery blackhole detection. PR 12790 (sorry for not committing it for a long time)
|
1.90 |
| 12-May-2002 |
matt | branches: 1.90.2; 1.90.4; Eliminate commons.
|
1.89 |
| 15-Mar-2002 |
itojun | have tcp6_drain
|
1.88 |
| 24-Jan-2002 |
itojun | place NRL copyright notice itself, not a reference to it.
|
1.87 |
| 11-Sep-2001 |
thorpej | Use callouts for SYN cache timers, rather than traversing time queues in tcp_slowtimo().
|
1.86 |
| 10-Sep-2001 |
thorpej | Use callouts for TCP timers, rather than traversing the list of all open TCP connections in tcp_slowtimo() (which is called 2x per second). It's fairly rare for TCP timers to actually fire, so saving this list traversal is good, especially if you want to scale to thousands of open connections.
|
1.85 |
| 10-Sep-2001 |
thorpej | Split tcp_timers() into multiple functions, one for each timer, and call it directly from tcp_slowtimo() (via a table) rather than going through tcp_usrreq().
This will allow us to call TCP timers directly from callouts, in a future revision.
|
1.84 |
| 10-Sep-2001 |
thorpej | Change the way receive idle time and round trip time are measured. Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now take a timestamp (via tcp_now) and use subtraction to compute the delta when we actually need it (using unsigned arithmetic so that tcp_now wrapping is handled correctly).
Based on similar changes in FreeBSD.
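The wrap-safe delta described above can be sketched as follows. This is a minimal illustration rather than the kernel code: `ticks_since` is a hypothetical helper, and a 32-bit tick counter is assumed.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a NetBSD function): ticks elapsed between a
 * saved timestamp and the current value of a free-running counter such
 * as tcp_now. Assumes the counter is 32 bits wide.
 */
static uint32_t
ticks_since(uint32_t now, uint32_t then)
{
	/*
	 * Unsigned subtraction is performed modulo 2^32, so the delta is
	 * still correct when the counter has wrapped past zero.
	 */
	return now - then;
}
```

The result is only meaningful while fewer than 2^32 ticks separate the two samples.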
|
1.83 |
| 10-Sep-2001 |
thorpej | Use a callout for the delayed ACK timer, and delete tcp_fasttimo(). Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
|
1.82 |
| 31-Jul-2001 |
thorpej | branches: 1.82.2; Count the number of times we "self-quench" (ip_output() returns ENOBUFS), and don't inline tcp_segsize() if profiling.
|
1.81 |
| 30-May-2001 |
mrg | branches: 1.81.2; use _KERNEL_OPT
|
1.80 |
| 26-May-2001 |
matt | Make t_flags a u_int instead of u_short. It's followed by a mbuf pointer so there's padding around it already. And it increases the amount of bits available for TF_* flags.
|
1.79 |
| 13-Apr-2001 |
thorpej | Remove the use of splimp() from the NetBSD kernel. splnet() and only splnet() is allowed for the protection of data structures used by network devices.
|
1.78 |
| 20-Mar-2001 |
thorpej | Two changes, designed to make us even more resilient against TCP ISS attacks (which we already fend off quite well).
1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic hash method of generating TCP ISS values. Note, this code is experimental and disabled by default (experimental enough that I don't export the variable via sysctl yet, either). There are a couple of issues I'd like to discuss with Steve, so this code should only be used by people who really know what they're doing.
2. Per a recent thread on Bugtraq, it's possible to determine a system's uptime by snooping the RFC1323 TCP timestamp options sent by a host; in 4.4BSD, timestamps are created by incrementing the tcp_now variable at 2 Hz; there's even a company out there that uses this to determine web server uptime. According to Newsham's paper "The Problem With Random Increments", while NetBSD's TCP ISS generation method is much better than the "random increment" method used by FreeBSD and OpenBSD, it is still theoretically possible to mount an attack against NetBSD's method if the attacker knows how many times the tcp_iss_seq variable has been incremented. By not leaking uptime information, we can make that much harder to determine. So, we avoid the leak by giving each TCP connection a timebase of 0.
|
1.77 |
| 19-Oct-2000 |
itojun | branches: 1.77.2; remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c (separate TCP/IPv6 stack) into netbsd-current.
|
1.76 |
| 18-Oct-2000 |
thorpej | Restructure the Path MTU Discovery code somewhat to avoid entering rtentry's for hosts we're not actually communicating with.
Do this by invoking the ctlinput for the protocol, which is responsible for validating the ICMP message: * TCP -- Lookup the connection based on the address/port pairs in the ICMP message. * AH/ESP -- Lookup the SA based on the SPI in the ICMP message.
If validation succeeds, ctlinput is responsible for calling icmp_mtudisc(). icmp_mtudisc() then invokes callbacks registered by protocols (such as TCP) which want to take some sort of special action when a path's MTU changes. For TCP, this is where we now refresh cached routes and re-enter slow-start.
As a side-effect, this fixes the problem where TCP would not be notified when a path's MTU changed if AH/ESP were being used.
XXX Note, this is only a fix for the IPv4 case. For the IPv6 XXX case, we need to wait for the KAME folks.
Reviewed by sommerfeld@netbsd.org and itojun@netbsd.org.
|
1.75 |
| 15-Aug-2000 |
itojun | net.inet.tcp.rstratelimit is deprecated. make it invalid and return ENOPROTOOPT.
|
1.74 |
| 28-Jul-2000 |
itojun | nuke the following sysctl variables. "ppsratelimit" should work better. need to recompile sbin/sysctl after updating /usr/include. net.inet.tcp.rstratelimit net.inet.icmp.errratelimit net.inet6.icmp6.errratelimit
|
1.73 |
| 27-Jul-2000 |
itojun | implement net.inet.tcp.rstppslimit to limit TCP RSTs by packet-per-second basis. default: 100pps
set default value for net.inet.tcp.rstratelimit to 0 (disabled), NOTE: it does not work right for smaller-than-1/hz interval. maybe we should nuke it, or make it impossible to set smaller-than-1/hz value.
|
1.72 |
| 15-Feb-2000 |
thorpej | branches: 1.72.4; Add support for rate-limiting RSTs sent in response to no socket for an incoming packet. Default minimum interval is 10ms. The interval is changeable via the "net.inet.tcp.rstratelimit" sysctl variable.
|
1.71 |
| 13-Dec-1999 |
itojun | sync IPv6 part with latest KAME tree. IPsec part is left unmodified due to massive changes in KAME side. - IPv6 output goes through nd6_output - faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator using heavily modified DNS servers - per-interface statistics (required for IPv6 MIB) - interface autoconfig is revisited - udp input handling has a big change for mapped address support. - introduce in4_cksum() for non-overwriting checksumming - introduce m_pulldown() - neighbor discovery cleanups/improvements - netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland) - IFA_STATS is fixed a bit (not tested) - and more more more.
TODO: - cleanup os-independency #ifdef - avoid rcvif dual use (for IPsec) to help ifdetach
(sorry for jumbo commit, I can't separate this any more...)
|
1.70 |
| 08-Dec-1999 |
itojun | do not drop from IP header to tcp option until sbappend(), to reduce requirement to mbuf chain. part of KAME sync, committed separately for its (possible) impact.
|
1.69 |
| 19-Nov-1999 |
bouyer | Update protocol and interface stats counters to 64bit. RTM_IFINFO is now 0xf, 0xe is RTM_OIFINFO which returns the old (if_msghdr14) struct with 32bit counters (binary compat, conditioned on COMPAT_14). Same for sysctl: node 3 is renamed NET_RT_OIFLIST, NET_RT_IFLIST is now node 4. Change rt_msg1() to add an mbuf to the mbuf chain instead of just panic() when the message is larger than MHLEN.
|
1.68 |
| 23-Sep-1999 |
itojun | branches: 1.68.2; 1.68.8; cleanup and correct TCP MSS consideration with IPsec headers.
MSS advertisement must always be: max(if mtu) - ip hdr siz - tcp hdr siz We violated this in the previous code so it was fixed.
tcp_mss_to_advertise() now takes af (af on wire) as its argument, to compute right ip hdr siz.
tcp_segsize() will take care of IPsec header size. One thing I'm not really sure about is how to handle IPsec header size in *rxsegsizep (inbound segment size estimation). The current code subtracts possible *outbound* IPsec size from *rxsegsizep, hoping that the peer is using the same IPsec policy as me. It may not be applicable, could a TCP guru please comment...
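The advertisement rule stated above (max interface MTU minus IP header size minus TCP header size) can be sketched as below. This is an illustration under stated assumptions: `mss_to_advertise` is a hypothetical helper, a boolean stands in for the address-family argument, and headers are assumed to carry no options or extension headers. IPsec overhead is deliberately not subtracted here, since the message assigns that to tcp_segsize().

```c
#include <stdint.h>

#define TCP_HDR_LEN	20u	/* fixed TCP header, no options */
#define IPV4_HDR_LEN	20u	/* IPv4 header, no options */
#define IPV6_HDR_LEN	40u	/* fixed IPv6 header, no extension headers */

/*
 * Hypothetical helper: MSS to advertise for a given maximum interface
 * MTU and on-wire address family.
 */
static uint32_t
mss_to_advertise(uint32_t max_if_mtu, int is_ipv6)
{
	uint32_t ip_hdr_len = is_ipv6 ? IPV6_HDR_LEN : IPV4_HDR_LEN;
	uint32_t overhead = ip_hdr_len + TCP_HDR_LEN;

	/* Guard against an MTU too small to hold the headers. */
	if (max_if_mtu <= overhead)
		return 0;
	return max_if_mtu - overhead;
}
```

For a 1500-byte MTU this gives 1460 bytes over IPv4 and 1440 bytes over IPv6.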
|
1.67 |
| 25-Aug-1999 |
itojun | When listening socket goes away, remove associated syn cache entries. Stale syn cache entries are useless because none of them will be used if there is no listening socket, as tcp_input looks up listening socket by in_pcblookup*() before looking into syn cache.
This fixes race condition due to dangling socket pointer from syn cache entries to listening socket (this was introduced when ipsec is merged in).
This should preserve currently implemented behavior (but not 4.4BSD behavior prior to syn cache).
Tested in KAME repository before commit, but we'd better run some regression tests.
|
1.66 |
| 12-Aug-1999 |
itojun | fix sototcpcb(). this sometimes caused panic on OOB data reception.
the macro may need to be expanded into dedicated function, rather than a macro, to capture unsupported values.
|
1.65 |
| 31-Jul-1999 |
itojun | sync with recent KAME. - loosen ipsec restriction on packet direction. - revise icmp6 redirect handling on IsRouter bit. - tcp/udp notification processing (link-local address case) - cosmetic fixes (better code share across *BSD).
|
1.64 |
| 22-Jul-1999 |
itojun | - implement IPv6 pmtud, which is necessary for TCP6. - fix memory leak on SO_DEBUG over TCP.
|
1.63 |
| 14-Jul-1999 |
itojun | Use proper ip protocol # field and tcp hdr on sending RST against SYN, when ip header and tcp header are not adjacent to each other (i.e. when ip6 options are attached).
To test this, try telnet @::1@::1 port toward a port without responding server. Prior to the fix, the kernel will generate broken RST packet.
|
1.62 |
| 09-Jul-1999 |
thorpej | defopt INET6, and put it in opt_inet.h (most places already include this file, which is why the file list is so short).
|
1.61 |
| 01-Jul-1999 |
itojun | IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628. (Sorry for a big commit, I can't separate this into several pieces...) Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.
- sys/kern: do not assume single mbuf, accept chained mbuf on passing data from userland to kernel (or other way round). - "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ package (ftp://ftp.csl.sony.co.jp/pub/kjc/). - sys/netinet/tcp*: IPv4/v6 dual stack tcp support. - sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those file to be there so we patch it up. - sys/netinet: IPsec additions are here and there. - sys/netinet6/*: most of IPv6 code sits here. - sys/netkey: IPsec key management code - dev/pci/pcidevs: regen
In my understanding no code here is subject to export control so it should be safe.
|
1.60 |
| 23-May-1999 |
ad | Add new sysctl (net.inet.tcp.log_refused) that when set, causes refused TCP connections to be logged.
|
1.59 |
| 29-Apr-1999 |
thorpej | Implement retransmit logic for the SYN cache engine. Fixes a rare condition where one side can think a connection exists, where the other side thinks the connection was never established.
The original problem was first reported by Ty Sarna in PR #5909. The original fix I made to the code didn't cover all cases. The problem this fix addresses was reported by Christoph Badura via private e-mail.
Many thanks to Bill Sommerfeld for helping me to test this code, and for finding a subtle bug.
|
1.58 |
| 24-Jan-1999 |
thorpej | branches: 1.58.2; Oops, forgot to update copyright notice in previous.
|
1.57 |
| 24-Jan-1999 |
thorpej | * Completely rewrite syn_cache_respond(). - Don't use tcp_respond(), instead create the tcp/ip header from scratch, and send it ourself. - Reuse the mbuf that carried the SYN, or allocate one if that is not available. - Cache the route we look up to do the Path MTU Discovery check, and transfer the reference to that route to the inpcb when the connection completes. * Macro'ize a small, but often repeated code fragment.
|
1.56 |
| 18-Dec-1998 |
thorpej | Add a lock around the TCPCB's sequence queue, to prevent tcp_drain() from corrupting the queue if called from a device's interrupt context.
Similar in nature to the problem reported in PR #5684.
|
1.55 |
| 06-Oct-1998 |
matt | Add a sysctl for newreno (default to off).
|
1.54 |
| 04-Oct-1998 |
matt | Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP to NetBSD. Ignore the SACK & FACK stuff for now.
|
1.53 |
| 10-Sep-1998 |
mouse | Create tcp.keepidle, tcp.keepintvl, tcp.keepcnt, tcp.slowhz sysctls.
|
1.52 |
| 09-Sep-1998 |
thorpej | Use an algorithm similar to that in tcp_notify() to determine if syn_cache_unreach() should remove the entry, or just continue on.
Algorithm is to only remove the entry if we've had more than one unreach error and have retransmitted 3 or more times. This prevents the following scenario, as noted in PR #5909 (PR from Ty Sarna, scenario from Charles Hannum):
* Host A sends a SYN. * Host A retransmits the SYN. * Host B gets the first SYN and sends a SYN-ACK. * Host B gets the second SYN and sends a SYN-ACK. * One of the SYN-ACK bounces with an ICMP unreachable, causing the `SYN cache' entry to be removed with no notification. * Host A receives the other SYN-ACK, sends an ACK, and goes to ESTABLISHED state.
Should fix PR #5909.
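The removal rule described above (drop the entry only after more than one unreachable error and at least 3 retransmissions) can be sketched as follows. The structure, field names, and function are hypothetical simplifications, not the actual syn cache definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a SYN cache entry. */
struct sc_entry {
	bool seen_unreach;	/* an earlier unreachable error was recorded */
	int  rexmits;		/* SYN,ACK retransmissions so far */
};

/*
 * Called when an ICMP unreachable arrives for the entry. Returns true
 * if the entry should be removed, false if it should be kept.
 */
static bool
sc_unreach_should_remove(struct sc_entry *sc)
{
	if (!sc->seen_unreach || sc->rexmits < 3) {
		/* First error, or too few retransmissions: remember and continue. */
		sc->seen_unreach = true;
		return false;
	}
	return true;
}
```

In the scenario listed above, the single bounced SYN-ACK only sets the flag, so the entry is still present when Host A's ACK arrives.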
|
1.51 |
| 21-Jul-1998 |
mycroft | Implement a better fix for the `gratuitous FIN' problem, as mentioned on tcp-impl but with a bit more commentary.
|
1.50 |
| 11-May-1998 |
thorpej | Nuke TUBA per my note to tech-net; there's no reason to keep it around.
|
1.49 |
| 07-May-1998 |
thorpej | Rework the syn cache code somewhat: - Don't use home-grown queue manipulation. Use <sys/queue.h> instead. The data structures are a little larger, but we are otherwise wasting the memory chunk anyway (we're already a 64-byte malloc bucket). - Fix a bug in the cache-is-full case: if the oldest element removed from the first non-empty bucket was the only element in the bucket, the bucket wouldn't be removed from the bucket cache, causing queue corruption later. - Optimize the syn cache timers by using PRT timers rather than home-grown decrement-and-propagate timers.
This code is now a fair bit smaller, and significantly easier to read and understand.
|
1.48 |
| 06-May-1998 |
thorpej | Use the monotonically increasing slow timer timestamp provided by the protocol dispatch layer for TCP timers. This saves having to modify a potentially large number of timer values (which were shorts, and expanded to ... a lot of code on the Alpha).
|
1.47 |
| 02-May-1998 |
thorpej | Reintroduce the immediate ACK-on-PUSH behavior removed in revision 1.47, but make the decision to do this dependent on the sysctl variable net.inet.tcp.ack_on_push, which is disabled by default.
|
1.46 |
| 01-May-1998 |
thorpej | Garbage-collect.
|
1.45 |
| 30-Apr-1998 |
thorpej | In the CWM code, don't use the Floyd initial window computation as the burst size allowed, but rather a fixed number of packets, as described in the Internet Draft. Default allowed burst is 4 packets, per the Draft.
Make the use of CWM and the allowed burst size tunable via sysctl.
|
1.44 |
| 30-Apr-1998 |
thorpej | Make tcp_compat_42 a sysctl option.
|
1.43 |
| 29-Apr-1998 |
matt | New TCP reassembly code. The new code reduces the memory needed by out-of-order packets and builds the infrastructure needed for sending SACK blocks (to be added shortly).
|
1.42 |
| 29-Apr-1998 |
thorpej | Make use of the work-arounds for ancient broken TCP peers run-time conditional (tcp_compat_42). The kernel config option TCP_COMPAT_42 will still enable this by default, or disable this by default if the option is not included (i.e. current behavior). This will be made a sysctl soon.
|
1.41 |
| 13-Apr-1998 |
kml | Fix to ensure that the correct MSS is advertised for loopback TCP connections by using the MTU of the interface. Also added a knob, mss_ifmtu, to force all connections to use the MTU of the interface to calculate the advertised MSS.
|
1.40 |
| 07-Apr-1998 |
thorpej | Remember any source routes that may have accompanied a SYN.
|
1.39 |
| 03-Apr-1998 |
thorpej | Now that we have a flags word in the syn cache entry, use a flag to indicate "peer will do timestamps" rather than a bitfield, and give the now-unused bit to the hash, making it now 32 bits.
|
1.38 |
| 03-Apr-1998 |
thorpej | Clean up some comments wrt. the syn cache code.
|
1.37 |
| 31-Mar-1998 |
thorpej | Fix a potential-congestion case in the larger initial congestion window code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN (active open) or SYN,ACK (passive open) was retransmitted, the initial congestion window for the first slow start of that connection must be one segment.
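The rule in this entry can be sketched as below. The function and parameter names are hypothetical, and the larger-window policy is reduced to a plain segment count for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: initial congestion window in bytes.
 * segsz          - segment size in bytes
 * init_win_segs  - configured larger initial window, in segments
 * syn_rexmitted  - true if the SYN (active open) or SYN,ACK
 *                  (passive open) had to be retransmitted
 */
static uint32_t
initial_cwnd(uint32_t segsz, uint32_t init_win_segs, bool syn_rexmitted)
{
	/* A retransmitted handshake forces the first slow start to begin at one segment. */
	if (syn_rexmitted)
		return segsz;
	return init_win_segs * segsz;
}
```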
|
1.36 |
| 17-Mar-1998 |
kml | Ensure that the TCP segment size reflects the size of TCP options in the packet. This fixes a bug that was resulting in extra packets in retransmissions (the second packet would be 12 bytes long, reflecting the RFC1323 timestamp option size).
|
1.35 |
| 19-Feb-1998 |
thorpej | Update copyright (sigh, should have done this long ago).
|
1.34 |
| 10-Feb-1998 |
perry | add/cleanup multiple inclusion protection.
|
1.33 |
| 05-Jan-1998 |
thorpej | Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes left were SCCS IDs and Copyright dates.
|
1.32 |
| 31-Dec-1997 |
thorpej | Implement a queue for delayed ACK processing. This queue is used in tcp_fasttimo() in lieu of scanning all open TCP connections.
|
1.31 |
| 17-Dec-1997 |
thorpej | Keep stats on connections dropped due to excessive persist timeout.
|
1.30 |
| 13-Dec-1997 |
thorpej | After further examination of traces of bulk transfers (with help from Kevin Lahey), undo the "defer window update until next delayed ACK".
|
1.29 |
| 11-Dec-1997 |
thorpej | Implement an infrastructure to allow larger initial congestion windows. The sysctl'able variable "tcp_init_win", when set to 0, selects an auto-tuning algorithm for selecting the initial window, based on transmit segment size, per discussion in the IETF tcpimpl working group.
Default initial window is still 1 segment, but will soon become 2 segments, per discussion in tcpimpl.
|
1.28 |
| 11-Dec-1997 |
thorpej | In the PRU_RCVD entry point, if TF_DELACK is set, don't send the window update now, since it will be sent within 200ms when the delayed ACK is sent. Instrument how many hits we get on this optimization.
|
1.27 |
| 10-Dec-1997 |
thorpej | Implement tcp_drain().
|
1.26 |
| 08-Nov-1997 |
kml | TCP MSS fixes to provide cleaner slow-start and recovery.
|
1.25 |
| 17-Oct-1997 |
kml | branches: 1.25.2; Path MTU Discovery support. This is turned off by default. Use sysctl -w net.inet.icmp.mtudisc=1 to turn on. Still to come: path removal after some period, black hole detection
|
1.24 |
| 10-Oct-1997 |
explorer | Add hooks to use the kernel random system to generate TCP sequence numbers.
|
1.23 |
| 22-Sep-1997 |
thorpej | Fix several annoyances related to MSS handling in BSD TCP: - Don't overload t_maxseg. Previous behavior was to set it to the min of the peer's advertised MSS, our advertised MSS, and tcp_mssdflt (for non-local networks). This breaks PMTU discovery running on either host. Instead, remember the MSS we advertise, and use it as appropriate (in silly window avoidance). - Per last bullet, split tcp_mss() into several functions for handling MSS (ours and peer's), and performing various tasks when a connection becomes ESTABLISHED. - Introduce a new function, tcp_segsize(), which computes the max size for every segment transmitted in tcp_output(). This will eventually be used to hook in PMTU discovery.
|
1.22 |
| 29-Aug-1997 |
gwr | Tweaks to allow operation with an interface address of 0.0.0.0 (needed for NFS mountroot using BOOTP to get boot parameters)
|
1.21 |
| 28-Jul-1997 |
thorpej | branches: 1.21.2; Make the following tunable via sysctl, inspired by BSD/OS: - tcp_sendspace - tcp_recvspace - tcp_mssdflt - tcp_syn_cache_limit - tcp_syn_bucket_limit - tcp_syn_cache_timer
|
1.20 |
| 23-Jul-1997 |
thorpej | Pull SYN_cache_branch down into the main line.
|
1.19 |
| 10-Dec-1996 |
mycroft | branches: 1.19.8; Fix RTT scaling problems introduced with Brakmo and Peterson changes.
|
1.18 |
| 22-May-1996 |
mycroft | Pass a proc pointer down to the usrreq and pcbbind functions for PRU_ATTACH, PRU_BIND and PRU_CONTROL. The usrreq interface really needs to be split up, but this will have to wait. Remove SS_PRIV completely.
|
1.17 |
| 13-Feb-1996 |
christos | branches: 1.17.4; netinet prototypes
|
1.16 |
| 31-Jan-1996 |
mycroft | Build a hash table of PCBs. Hash function needs tweaking.
|
1.15 |
| 21-Nov-1995 |
cgd | make netinet work on systems where pointers and longs are 64 bits (like the alpha). Biggest problem: IP headers were overlayed with structure which included pointers, and which therefore didn't overlay properly on 64-bit machines. Solution: instead of threading pointers through IP header overlays, add a "queue element" structure to do the threading, and point it at the ip headers.
|
1.14 |
| 30-Sep-1995 |
thorpej | branches: 1.14.2; Implement tcp_sysctl(). Add a sysctl option to enable/disable RFC1323 extensions to TCP. From John Kohl <jtk@kolvir.blrc.ma.us>.
|
1.13 |
| 12-Jun-1995 |
mycroft | Various cleanup, including: * Convert several data structures to use queue.h. * Split in_pcbnotify() into two parts; one for notifying a specific PCB, and one for notifying all PCBs for a particular foreign address.
|
1.12 |
| 11-Jun-1995 |
mycroft | As suggested by Brakmo and Peterson: * Don't add the extra 1/8 of the mss when ramping up the congestion window. * Scale the RTT values slightly to adjust for rounding errors. * Set the lower bound of the RTO to RTT+2.
|
1.11 |
| 13-Apr-1995 |
cgd | be a bit more careful and explicit with types. (basically a large no-op.)
|
1.10 |
| 26-Mar-1995 |
jtc | KERNEL -> _KERNEL
|
1.9 |
| 29-Jun-1994 |
cgd | New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
|
1.8 |
| 13-May-1994 |
mycroft | Update to 4.4-Lite networking code, with a few local changes.
|
1.7 |
| 10-Jan-1994 |
mycroft | Change the counters to be all the same type -- u_long.
|
1.6 |
| 10-Jan-1994 |
mycroft | Don't prototype this until it's safe.
|
1.5 |
| 08-Jan-1994 |
mycroft | Prototypes.
|
1.4 |
| 08-Jan-1994 |
mycroft | Fix some inconsistent spacing; spaces at the end of lines, etc.
|
1.3 |
| 20-May-1993 |
cgd | more rcsid additions and file header cleanups
|
1.2 |
| 19-Apr-1993 |
mycroft | Add consistent multiple-inclusion protection.
|
1.1 |
| 21-Mar-1993 |
cgd | branches: 1.1.1; Initial revision
|
1.1.1.3 |
| 05-Jan-1998 |
thorpej | Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
|
1.1.1.2 |
| 05-Jan-1998 |
thorpej | Import sys/netinet from 4.4BSD-Lite for reference purposes.
|
1.1.1.1 |
| 21-Mar-1993 |
cgd | initial import of 386bsd-0.1 sources
|
1.14.2.1 |
| 02-Feb-1996 |
mycroft | Bring in changes for mondo patch 2.
|
1.17.4.2 |
| 11-Dec-1996 |
mycroft | From trunk: Eliminate SS_PRIV; instead, pass down a proc pointer to the usrreq methods that need it. Fix numerous memory leaks and bogus return values.
|
1.17.4.1 |
| 10-Dec-1996 |
mycroft | From trunk: Fix RTT scaling problems introduced with Brakmo and Peterson changes.
|
1.19.8.6 |
| 16-Jul-1997 |
thorpej | Declare struct tcp_opt_info here; it's needed by tuba_tcpinput().
|
1.19.8.5 |
| 29-Jun-1997 |
thorpej | Instrument syn cache hash collisions.
|
1.19.8.4 |
| 28-Jun-1997 |
thorpej | KNF.
|
1.19.8.3 |
| 28-Jun-1997 |
thorpej | Use explicit type sizes in struct syn_cache, and add a comment about this structure being larger than intended on the Alpha.
|
1.19.8.2 |
| 26-Jun-1997 |
thorpej | tcp_mss() needs to take a u_int, not a u_int16_t.
|
1.19.8.1 |
| 14-May-1997 |
mellon | More of David Borman's SYN cache patches for Lite2:
- Define syn_cache entry and syn_cache_head structures. - Add syn_cache statistics to tcpstat structure. - Declare externs for syn cache variables. - Update prototypes: tcp_dooptions, tcp_mss, tcp_respond. - Add prototypes for syn_cache_* functions.
|
1.21.2.3 |
| 14-Oct-1997 |
thorpej | Update marc-pcmcia branch from trunk.
|
1.21.2.2 |
| 29-Sep-1997 |
thorpej | Update marc-pcmcia branch from trunk.
|
1.21.2.1 |
| 01-Sep-1997 |
thorpej | Update marc-pcmcia branch from trunk.
|
1.25.2.4 |
| 09-May-1998 |
mycroft | Pull up patch from kml.
|
1.25.2.3 |
| 05-May-1998 |
mycroft | Pull up 1.36, per request of kml.
|
1.25.2.2 |
| 29-Jan-1998 |
mellon | Pull up 1.27-1.33 (thorpej)
|
1.25.2.1 |
| 08-Nov-1997 |
thorpej | Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery. (kml)
|
1.58.2.1 |
| 29-Apr-1999 |
perry | branches: 1.58.2.1.2; 1.58.2.1.4; pullup 1.58->1.59 (thorpej)
|
1.58.2.1.4.3 |
| 30-Nov-1999 |
itojun | bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch just for reference purposes. This commit includes 1.4 -> 1.4.1 sync for kame branch.
The branch does not compile at all (due to the lack of ALTQ and some other source code). Please do not try to modify the branch, this is just for reference purposes.
synchronization to latest KAME will take place on HEAD branch soon.
|
1.58.2.1.4.2 |
| 06-Jul-1999 |
itojun | KAME/NetBSD 1.4, SNAP kit 1999/07/05. NOTE: this branch is just for reference purposes (i.e. for taking cvs diff). do not touch anything on the branch. actual work must be done on HEAD branch.
|
1.58.2.1.4.1 |
| 28-Jun-1999 |
itojun | KAME/NetBSD 1.4 SNAP kit, dated 19990628.
NOTE: this branch (kame) is used just for reference. this may not compile due to multiple reasons.
|
1.58.2.1.2.3 |
| 02-Aug-1999 |
thorpej | Update from trunk.
|
1.58.2.1.2.2 |
| 01-Jul-1999 |
thorpej | Sync w/ -current.
|
1.58.2.1.2.1 |
| 21-Jun-1999 |
thorpej | Sync w/ -current.
|
1.68.8.1 |
| 27-Dec-1999 |
wrstuden | Pull up to last week's -current.
|
1.68.2.3 |
| 21-Apr-2001 |
bouyer | Sync with HEAD
|
1.68.2.2 |
| 27-Mar-2001 |
bouyer | Sync with HEAD.
|
1.68.2.1 |
| 20-Nov-2000 |
bouyer | Update thorpej_scsipi to -current as of a month ago
|
1.72.4.3 |
| 20-Apr-2004 |
jmc | Pullup patch (requested by itojun in ticket #143)
If a segment is received with RST set and the segment is completely to the left of the receive window, ignore it. Add some additional comments to the code that deals with received segments that are completely to the right of the receive window. If an invalid SYN is received, force an ACK and drop it; if the other side really sent the SYN, it'll respond with a reset. Respond to RST by ACK, as suggested in NISCC recommendation. Rate-limit ACKs against RSTs and SYNs. If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
|
1.72.4.2 |
| 24-Jan-2002 |
he | Pull up revision 1.88 (requested by itojun): Clean up the NRL copyright.
|
1.72.4.1 |
| 16-Aug-2000 |
itojun | pullup (approved by releng-1-5)
switch from net.inet*.*.*ratelimit to net.inet*.*.ppslimit.
(tags are a rough estimate - we had some trial and error in the main trunk) sys/netinet/icmp6.h 1.9 -> 1.11 sys/netinet/icmp_var.h 1.15 -> 1.17 sys/netinet/in_proto.c 1.39 -> 1.42 sys/netinet/ip_icmp.c 1.50 -> 1.51, 1.52 -> 1.54 sys/netinet/tcp_input.c 1.111 -> 1.112, 1.115 -> 1.117 sys/netinet/tcp_usrreq.c 1.52 -> 1.53 sys/netinet/tcp_var.h 1.72 -> 1.75 sys/netinet6/icmp6.c 1.34 -> 1.35, 1.36 -> 1.38 sys/netinet6/in6_proto.c 1.17 -> 1.19
|
1.77.2.9 |
| 11-Nov-2002 |
nathanw | Catch up to -current
|
1.77.2.8 |
| 01-Aug-2002 |
nathanw | Catch up to -current.
|
1.77.2.7 |
| 20-Jun-2002 |
nathanw | Catch up to -current.
|
1.77.2.6 |
| 01-Apr-2002 |
nathanw | Catch up to -current. (CVS: It's not just a program. It's an adventure!)
|
1.77.2.5 |
| 28-Feb-2002 |
nathanw | Catch up to -current.
|
1.77.2.4 |
| 21-Sep-2001 |
nathanw | Catch up to -current.
|
1.77.2.3 |
| 24-Aug-2001 |
nathanw | Catch up with -current.
|
1.77.2.2 |
| 21-Jun-2001 |
nathanw | Catch up to -current.
|
1.77.2.1 |
| 09-Apr-2001 |
nathanw | Catch up with -current.
|
1.81.2.5 |
| 06-Sep-2002 |
jdolecek | sync kqueue branch with HEAD
|
1.81.2.4 |
| 23-Jun-2002 |
jdolecek | catch up with -current on kqueue branch
|
1.81.2.3 |
| 11-Feb-2002 |
jdolecek | Sync w/ -current.
|
1.81.2.2 |
| 13-Sep-2001 |
thorpej | Update the kqueue branch to HEAD.
|
1.81.2.1 |
| 03-Aug-2001 |
lukem | update to -current
|
1.82.2.1 |
| 01-Oct-2001 |
fvdl | Catch up with -current.
|
1.90.4.3 |
| 20-Apr-2004 |
jmc | Pullup patch (requested by itojun in ticket #1680)
If a segment is received with RST set and the segment is completely to the left of the receive window, ignore it. Add some additional comments to the code that deals with received segments that are completely to the right of the receive window. If an invalid SYN is received, force an ACK and drop it; if the other side really sent the SYN, it'll respond with a reset. Respond to RST by ACK, as suggested in NISCC recommendation. Rate-limit ACKs against RSTs and SYNs. If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
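As an illustration of the first rule in this pull-up (ignore an RST that lies completely to the left of the receive window), the check might be sketched as below. This is a minimal sketch, not the code from tcp_input.c: seq_lt(), seq_leq() and seg_left_of_window() are hypothetical helpers, modeled on the modular sequence-number comparisons that BSD TCP stacks use.

```c
#include <assert.h>
#include <stdint.h>

/* Modular comparisons so that sequence numbers compare correctly
 * across the 2^32 wrap (hypothetical helpers). */
static int seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static int seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

/*
 * Return 1 if the segment lies entirely to the left of the receive
 * window, i.e. everything it carries precedes rcv_nxt.
 * A zero-length segment exactly at rcv_nxt is NOT to the left.
 */
int seg_left_of_window(uint32_t seg_seq, uint32_t seg_len, uint32_t rcv_nxt)
{
    if (seg_len == 0)
        return seq_lt(seg_seq, rcv_nxt);
    return seq_leq(seg_seq + seg_len, rcv_nxt);
}

/* Small usage check: a stale RST is ignored, one at rcv_nxt is not. */
void seg_left_of_window_example(void)
{
    assert(seg_left_of_window(100, 0, 200) == 1);
    assert(seg_left_of_window(200, 0, 200) == 0);
}
```

A caller would drop an RST silently when seg_left_of_window() returns 1, rather than tearing down the connection.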
|
1.90.4.2 |
| 22-Oct-2003 |
jmc | Pullup rev 1.03 (requested by he in ticket #1530)
Introduce a new INVOKING status for callouts, and use it to close a race condition in the TCP code. Fixes PR#20390.
|
1.90.4.1 |
| 05-Sep-2003 |
tron | Pull up revision 1.91 (requested by tls in ticket #1445): path MTU discovery blackhole detection. PR 12790 (sorry for not committing it for a long time)
|
1.90.2.3 |
| 15-Jul-2002 |
gehenna | catch up with -current.
|
1.90.2.2 |
| 20-Jun-2002 |
gehenna | catch up with -current.
|
1.90.2.1 |
| 30-May-2002 |
gehenna | Catch up with -current.
|
1.102.2.12 |
| 11-Dec-2005 |
christos | Sync with head.
|
1.102.2.11 |
| 10-Nov-2005 |
skrll | Sync with HEAD. Here we go again...
|
1.102.2.10 |
| 01-Apr-2005 |
skrll | Sync with HEAD.
|
1.102.2.9 |
| 04-Mar-2005 |
skrll | Sync with HEAD.
Hi Perry!
|
1.102.2.8 |
| 07-Feb-2005 |
skrll | Sync with HEAD.
|
1.102.2.7 |
| 04-Feb-2005 |
skrll | Sync with HEAD.
|
1.102.2.6 |
| 17-Jan-2005 |
skrll | Sync with HEAD.
|
1.102.2.5 |
| 18-Dec-2004 |
skrll | Sync with HEAD.
|
1.102.2.4 |
| 21-Sep-2004 |
skrll | Fix the sync with head I botched.
|
1.102.2.3 |
| 18-Sep-2004 |
skrll | Sync with HEAD.
|
1.102.2.2 |
| 03-Aug-2004 |
skrll | Sync with HEAD
|
1.102.2.1 |
| 02-Jul-2003 |
darrenr | Apply the aborted ktrace-lwp changes to a specific branch. This is just for others to review, I'm concerned that patch fuzziness may have resulted in some errant code being generated but I'll look at that later by comparing the diff from the base to the branch with the file I attempt to apply to it. This will, at the very least, put the changes in a better context for others to review them and attempt to tinker with removing passing of 'struct lwp' through the kernel.
|
1.106.2.2 |
| 18-Sep-2004 |
he | Pull up revision 1.113 (requested by yamt in ticket #861): Fix ipqent pool corruption problems. Make the TCP reassembly code use its own pool of ipqent rather than sharing it with the IP reassembly code. Fixes PR#24782.
|
1.106.2.1 |
| 20-Apr-2004 |
jmc | Pullup patch (requested by itojun in ticket #169)
If a segment is received with RST set and the segment is completely to the left of the receive window, ignore it. Add some additional comments to the code that deals with received segments that are completely to the right of the receive window. If an invalid SYN is received, force an ACK and drop it; if the other side really sent the SYN, it'll respond with a reset. Respond to RST by ACK, as suggested in NISCC recommendation. Rate-limit ACKs against RSTs and SYNs. If SYN is coming and RCV.NXT == SEG.SEQ, then ACK with value - 1.
|
1.115.4.2 |
| 19-Mar-2005 |
yamt | sync with head. xen and whitespace. xen part is not finished.
|
1.115.4.1 |
| 12-Feb-2005 |
yamt | sync with head.
|
1.115.2.1 |
| 29-Apr-2005 |
kent | sync with -current
|
1.123.2.2 |
| 06-May-2005 |
tron | Pull up revision 1.125 (requested by kurahone in ticket #199): Added sysctl tunable limits for the number of maximum SACK holes per connection and per system. Idea taken from FreeBSD.
|
1.123.2.1 |
| 04-Apr-2005 |
tron | Pull up revision 1.124 (requested by yamt in ticket #90): protect tcpipqent with splvm.
|
1.126.2.6 |
| 17-Mar-2008 |
yamt | sync with head.
|
1.126.2.5 |
| 21-Jan-2008 |
yamt | sync with head
|
1.126.2.4 |
| 03-Sep-2007 |
yamt | sync with head.
|
1.126.2.3 |
| 26-Feb-2007 |
yamt | sync with head.
|
1.126.2.2 |
| 30-Dec-2006 |
yamt | sync with head.
|
1.126.2.1 |
| 21-Jun-2006 |
yamt | sync with head.
|
1.133.6.1 |
| 22-Apr-2006 |
simonb | Sync with head.
|
1.133.4.3 |
| 09-Sep-2006 |
rpaulo | sync with head
|
1.133.4.2 |
| 14-Mar-2006 |
rpaulo | Remove in6pcb in parameter list.
|
1.133.4.1 |
| 14-Mar-2006 |
rpaulo | Remove back pointer to in6pcb.
|
1.133.2.1 |
| 18-Feb-2006 |
yamt | sync with head.
|
1.134.2.2 |
| 14-Sep-2006 |
yamt | sync with head.
|
1.134.2.1 |
| 11-Aug-2006 |
yamt | sync with head
|
1.137.4.2 |
| 10-Dec-2006 |
yamt | sync with head.
|
1.137.4.1 |
| 22-Oct-2006 |
yamt | sync with head
|
1.137.2.2 |
| 12-Jan-2007 |
ad | Sync with head.
|
1.137.2.1 |
| 18-Nov-2006 |
ad | Sync with head.
|
1.141.4.1 |
| 03-Jun-2008 |
skrll | Sync with netbsd-4.
|
1.141.2.1 |
| 21-Jan-2008 |
bouyer | Pull up following revision(s) (requested by ghen in ticket #1039):
sys/netinet/tcp_var.h: revision 1.148
distrib/sets/lists/comp/mi: revision 1.1035
distrib/sets/lists/man/mi: revision 1.1010
usr.sbin/tcpdrop/Makefile: revision 1.1
usr.sbin/tcpdrop/tcpdrop.c: revision 1.1 - 1.3
usr.sbin/tcpdrop/tcpdrop.8: revision 1.1
usr.sbin/Makefile: revision 1.228 via patch
sys/netinet/tcp_usrreq.c: revision 1.133
distrib/sets/lists/base/mi: revision 1.712
Import tcpdrop(8) from OpenBSD
|
1.143.2.3 |
| 07-May-2007 |
yamt | sync with head.
|
1.143.2.2 |
| 12-Mar-2007 |
rmind | Sync with HEAD.
|
1.143.2.1 |
| 27-Feb-2007 |
yamt | - sync with head. - move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
|
1.145.4.1 |
| 11-Jul-2007 |
mjf | Sync with head.
|
1.145.2.4 |
| 20-Aug-2007 |
ad | Sync with HEAD.
|
1.145.2.3 |
| 15-Jul-2007 |
ad | Sync with head.
|
1.145.2.2 |
| 01-Jul-2007 |
ad | Adapt to callout API change.
|
1.145.2.1 |
| 08-Jun-2007 |
ad | Sync with head.
|
1.149.2.1 |
| 15-Aug-2007 |
skrll | Sync with HEAD.
|
1.150.20.2 |
| 02-Aug-2007 |
rmind | TCP socket buffers automatic sizing - ported from FreeBSD. http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html
! Disabled by default, marked as experimental. Testers are very needed. ! Someone should thoroughly test this, and improve if possible.
Discussed on <tech-net>: http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html Thanks Greg Troxel for comments.
OK by the long silence on <tech-net>.
|
1.150.20.1 |
| 02-Aug-2007 |
rmind | file tcp_var.h was added on branch matt-mips64 on 2007-08-02 02:42:43 +0000
|
1.150.16.1 |
| 02-Jan-2008 |
bouyer | Sync with HEAD
|
1.150.12.1 |
| 26-Dec-2007 |
ad | Sync with head.
|
1.150.10.1 |
| 18-Feb-2008 |
mjf | Sync with HEAD.
|
1.150.4.2 |
| 23-Mar-2008 |
matt | sync with HEAD
|
1.150.4.1 |
| 09-Jan-2008 |
matt | sync with HEAD
|
1.151.6.3 |
| 28-Sep-2008 |
mjf | Sync with HEAD.
|
1.151.6.2 |
| 02-Jun-2008 |
mjf | Sync with HEAD.
|
1.151.6.1 |
| 03-Apr-2008 |
mjf | Sync with HEAD.
|
1.151.2.1 |
| 24-Mar-2008 |
keiichi | sync with head.
|
1.155.2.1 |
| 18-May-2008 |
yamt | sync with head.
|
1.156.2.5 |
| 11-Mar-2010 |
yamt | sync with head
|
1.156.2.4 |
| 16-Sep-2009 |
yamt | sync with head
|
1.156.2.3 |
| 20-Jun-2009 |
yamt | sync with head
|
1.156.2.2 |
| 04-May-2009 |
yamt | sync with head.
|
1.156.2.1 |
| 16-May-2008 |
yamt | sync with head.
|
1.157.6.1 |
| 19-Oct-2008 |
haad | Sync with HEAD.
|
1.157.2.1 |
| 18-Sep-2008 |
wrstuden | Sync with wrstuden-revivesa-base-2.
|
1.158.10.1 |
| 21-Apr-2010 |
matt | sync to netbsd-5
|
1.158.4.1 |
| 26-Sep-2009 |
snj | Pull up following revision(s) (requested by darran in ticket #950):
sys/netinet/tcp_input.c: revision 1.299
sys/netinet/tcp_usrreq.c: revision 1.156
sys/netinet/tcp_var.h: revision 1.161
Make tcp msl (max segment life) tunable via sysctl net.inet.tcp.msl. Okayed by tls@.
|
1.158.2.1 |
| 03-Mar-2009 |
skrll | Sync with HEAD.
|
1.159.2.1 |
| 23-Jul-2009 |
jym | Sync with HEAD.
|
1.162.6.1 |
| 06-Jun-2011 |
jruoho | Sync with HEAD.
|
1.162.4.2 |
| 31-May-2011 |
rmind | sync with head
|
1.162.4.1 |
| 21-Apr-2011 |
rmind | sync with head
|
1.168.6.1 |
| 18-Feb-2012 |
mrg | merge to -current.
|
1.168.2.2 |
| 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.168.2.1 |
| 17-Apr-2012 |
yamt | sync with head
|
1.169.6.3 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.169.6.2 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.169.6.1 |
| 23-Jun-2013 |
tls | resync from head
|
1.170.4.3 |
| 18-May-2014 |
rmind | sync with head
|
1.170.4.2 |
| 28-Aug-2013 |
rmind | Checkpoint work in progress:
- Initial split of the protocol user-request method into the following methods: pr_attach, pr_detach and pr_generic for the old pr_usrreq.
- Adjust socreate(9) and sonewconn(9) to call pr_attach without the socket lock held (as a preparation for the locking scheme adjustment).
- Adjust all pr_attach routines to assert that PCB is not set.
- Sprinkle various comments, document some routines and their locking.
- Remove M_PCB, replace with kmem(9).
- Fix few bugs spotted on the way.
|
1.170.4.1 |
| 17-Jul-2013 |
rmind | Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers and thus make IPv4 PCB structures mostly opaque. Any volunteers for merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
|
1.172.2.1 |
| 10-Aug-2014 |
tls | Rebase.
|
1.175.4.2 |
| 28-Aug-2017 |
skrll | Sync with HEAD
|
1.175.4.1 |
| 06-Apr-2015 |
skrll | Sync with HEAD
|
1.175.2.1 |
| 21-Feb-2015 |
martin | Pull up following revision(s) (requested by he in ticket #530):
sys/netinet/tcp_output.c: revision 1.180
sys/netinet/tcp_input.c: revision 1.336
sys/netinet/tcp_usrreq.c: revision 1.203
share/man/man4/tcp.4: revision 1.30
sys/netinet/tcp.h: revision 1.31
sys/netinet/tcp_subr.c: revision 1.258
sys/netinet/tcp_var.h: revision 1.176
sys/netinet/tcp_var.h: revision 1.177
sys/sys/param.h: bump revision
Port over the TCP_INFO socket option from FreeBSD, originally from the Linux 2.6 TCP API. This permits the caller to query certain information about a TCP connection, and is used by pkgsrc's net/iperf3 test program if available.
This extends struct tcpcb with three fields to count retransmits, out-of-sequence receives and zero window announcements, and will therefore warrant a kernel revision bump (done separately).
Change the new counter variables in struct tcpcb to uint32_t, as per christos' comments.
|
1.177.10.2 |
| 03-Feb-2018 |
snj | Pull up following revision(s) (requested by ozaki-r in ticket #514):
sys/net/route.c: 1.205
sys/net/rtsock.c: 1.237-1.238
sys/netinet/in.c: 1.215
sys/netinet/tcp_subr.c: 1.272
sys/netinet/tcp_timer.c: 1.93
sys/netinet/tcp_timer.h: 1.29
sys/netinet/tcp_var.h: 1.182
sys/netinet6/in6.c: 1.258
Remove extra pserialize_perform from in_purgeaddr
It's already performed in ifa_remove. Note so there (in in6_unlink_ifa too).
Release rt_so_mtx on updating a rtentry to avoid a deadlock with route_intr
The deadlock happened only if NET_MPSAFE on.
Run tcp_slowtimo in workqueue if NET_MPSAFE
If NET_MPSAFE is enabled, we have to avoid taking softnet_lock in softint as much as possible to prevent any softint handlers including callout handlers such as tcp_slowtimo from sticking on softnet_lock because it results in undesired delays of executing subsequent softint handlers. NFCI for !NET_MPSAFE
Fix a return value of rt_update_prepare
Callers expect it to be an errno.
Fix another deadlock
When waiting for a route update to finish, a waiter has to release its reference to the route to avoid a deadlock, because an updater tries to wait for references to a target route (except for a reference by the updater itself) to be released.
|
1.177.10.1 |
| 21-Oct-2017 |
snj | Pull up following revision(s) (requested by ozaki-r in ticket #300): crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19 crypto/dist/ipsec-tools/src/setkey/token.l: 1.20 distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759 doc/TODO.smpnet: 1.12-1.13 sys/net/pfkeyv2.h: 1.32 sys/net/raw_cb.c: 1.23-1.24, 1.28 sys/net/raw_cb.h: 1.28 sys/net/raw_usrreq.c: 1.57-1.58 sys/net/rtsock.c: 1.228-1.229 sys/netinet/in_proto.c: 1.125 sys/netinet/ip_input.c: 1.359-1.361 sys/netinet/tcp_input.c: 1.359-1.360 sys/netinet/tcp_output.c: 1.197 sys/netinet/tcp_var.h: 1.178 sys/netinet6/icmp6.c: 1.213 sys/netinet6/in6_proto.c: 1.119 sys/netinet6/ip6_forward.c: 1.88 sys/netinet6/ip6_input.c: 1.181-1.182 sys/netinet6/ip6_output.c: 1.193 sys/netinet6/ip6protosw.h: 1.26 sys/netipsec/ipsec.c: 1.100-1.122 sys/netipsec/ipsec.h: 1.51-1.61 sys/netipsec/ipsec6.h: 1.18-1.20 sys/netipsec/ipsec_input.c: 1.44-1.51 sys/netipsec/ipsec_netbsd.c: 1.41-1.45 sys/netipsec/ipsec_output.c: 1.49-1.64 sys/netipsec/ipsec_private.h: 1.5 sys/netipsec/key.c: 1.164-1.234 sys/netipsec/key.h: 1.20-1.32 sys/netipsec/key_debug.c: 1.18-1.21 sys/netipsec/key_debug.h: 1.9 sys/netipsec/keydb.h: 1.16-1.20 sys/netipsec/keysock.c: 1.59-1.62 sys/netipsec/keysock.h: 1.10 sys/netipsec/xform.h: 1.9-1.12 sys/netipsec/xform_ah.c: 1.55-1.74 sys/netipsec/xform_esp.c: 1.56-1.72 sys/netipsec/xform_ipcomp.c: 1.39-1.53 sys/netipsec/xform_ipip.c: 1.50-1.54 sys/netipsec/xform_tcp.c: 1.12-1.16 sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170 sys/rump/librump/rumpnet/net_stub.c: 1.27 sys/sys/protosw.h: 1.67-1.68 tests/net/carp/t_basic.sh: 1.7 tests/net/if_gif/t_gif.sh: 1.11 tests/net/if_l2tp/t_l2tp.sh: 1.3 tests/net/ipsec/Makefile: 1.7-1.9 tests/net/ipsec/algorithms.sh: 1.5 tests/net/ipsec/common.sh: 1.4-1.6 tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2 tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2 tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18 tests/net/ipsec/t_ipsec_sockopt.sh: 
1.1-1.2 tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6 tests/net/ipsec/t_ipsec_tunnel.sh: 1.9 tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3 tests/net/mcast/t_mcast.sh: 1.6 tests/net/net/t_ipaddress.sh: 1.11 tests/net/net_common.sh: 1.20 tests/net/npf/t_npf.sh: 1.3 tests/net/route/t_flags.sh: 1.20 tests/net/route/t_flags6.sh: 1.16 usr.bin/netstat/fast_ipsec.c: 1.22 Do m_pullup before mtod
It may fix panics of some tests on anita/sparc and anita/GuruPlug. --- KNF --- Enable DEBUG for babylon5 --- Apply C99-style struct initialization to xformsw --- Tweak outputs of netstat -s for IPsec
- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols
Original outputs were organized like this:
(Fast) IPsec: IPsec ah: IPsec esp: IPsec ipip: IPsec ipcomp: (Fast) IPsec: IPsec ah: IPsec esp: IPsec ipip: IPsec ipcomp:
New outputs are organized like this:
ipsec: ah: esp: ipip: ipcomp: ipsec6: ah: esp: ipip: ipcomp: --- Add test cases for IPComp --- Simplify IPSEC_OSTAT macro (NFC) --- KNF; replace leading whitespaces with hard tabs --- Introduce and use SADB_SASTATE_USABLE_P --- KNF --- Add update command for testing
Updating an SA (SADB_UPDATE) requires that a process issuing SADB_UPDATE is the same as a process issued SADB_ADD (or SADB_GETSPI). This means that update command must be used with add command in a configuration of setkey. This usage is normally meaningless but useful for testing (and debugging) purposes. --- Add test cases for updating SA/SP
The tests require the newly-added update command of setkey. --- PR/52346: Frank Kardel: Fix checksumming for NAT-T See XXX for improvements. --- Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE
It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters that have IPsec accelerators; a driver sets the mtag to a packet when its device has already encrypted the packet.
Unfortunately no driver has implemented such offload features for many years and none seems likely to implement them soon. (Note that neither FreeBSD nor Linux has such drivers.) Let's remove related (unused) code and simplify the IPsec code. --- Fix usages of sadb_msg_errno --- Avoid updating sav directly
On SADB_UPDATE a target sav was updated directly, which was unsafe. Instead allocate another sav, copy variables of the old sav to the new one and replace the old one with the new one. --- Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid --- Rename key_alloc* functions (NFC)
We shouldn't use the term "alloc" for functions that just look up data and actually don't allocate memory. --- Use explicit_memset to surely zero-clear key_auth and key_enc --- Make sure to clear keys on error paths of key_setsaval --- Add missing KEY_FREESAV --- Make sure a sav is inserted to a sah list after its initialization completes --- Remove unnecessary zero-clearing codes from key_setsaval
key_setsaval is now used only for a newly-allocated sav. (It was used to reset variables of an existing sav.) --- Correct wrong assumption of sav->refcnt in key_delsah
A sav in a list is basically not to be sav->refcnt == 0. And also KEY_FREESAV assumes sav->refcnt > 0. --- Let key_getsavbyspi take a reference of a returning sav --- Use time_mono_to_wall (NFC) --- Separate sending message routine (NFC) --- Simplify; remove unnecessary zero-clears
key_freesaval is used only when a target sav is being destroyed. --- Omit NULL checks for sav->lft_c
sav->lft_c can be NULL only when initializing or destroying sav. --- Omit unnecessary NULL checks for sav->sah --- Omit unnecessary check of sav->state
key_allocsa_policy picks a sav of either MATURE or DYING so we don't need to check its state again. --- Simplify; omit unnecessary saidx passing
- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten by another argument (isr)
--- Fix splx isn't called on some error paths --- Fix header size calculation of esp where sav is NULL --- Fix header size calculation of ah in the case sav is NULL
This fix was also needed for esp. --- Pass sav directly to opencrypto callback
In a callback, use a passed sav as-is by default and look up a sav only if the passed sav is dead. --- Avoid examining freshness of sav on packet processing
If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance, we don't need to examine each sav and also don't need to delete one on the fly and send up a message. Fortunately every sav list is sorted as we need.
Added key_validate_savlist validates that each sav list is surely sorted (run only if DEBUG because it's not cheap). --- Add test cases for SAs with different SPIs --- Prepare to stop using isr->sav
isr is a shared resource and using isr->sav as a temporal storage for each packet processing is racy. And also having a reference from isr to sav makes the lifetime of sav non-deterministic; such a reference is removed when a packet is processed and isr->sav is overwritten by new one. Let's have a sav locally for each packet processing instead of using shared isr->sav.
However this change doesn't stop using isr->sav yet because there are some users of isr->sav. isr->sav will be removed after the users find a way to not use isr->sav. --- Fix wrong argument handling --- fix printf format. --- Don't validate sav lists of LARVAL or DEAD states
We don't sort the lists so the validation will always fail.
Fix PR kern/52405 --- Make sure to sort the list when changing the state by key_sa_chgstate --- Rename key_allocsa_policy to key_lookup_sa_bysaidx --- Separate test files --- Calculate ah_max_authsize on initialization as well as esp_max_ivlen --- Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag --- Restore a comment removed in previous
The comment is valid for the below code. --- Make tests more stable
sleep command seems to wait longer than expected on anita so use polling to wait for a state change. --- Add tests that explicitly delete SAs instead of waiting for expirations --- Remove invalid M_AUTHIPDGM check on ESP isr->sav
M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can have AH authentication as sav->tdb_authalgxform. However, in that case esp_input and esp_input_cb are used to do ESP decryption and AH authentication and M_AUTHIPDGM is never set to a mbuf. So checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless. --- Look up sav instead of relying on unstable sp->req->sav
This code is executed only in an error path so an additional lookup doesn't matter. --- Correct a comment --- Don't release sav if calling crypto_dispatch again --- Remove extra KEY_FREESAV from ipsec_process_done
It should be done by the caller. --- Don't bother the case of crp->crp_buf == NULL in callbacks --- Hold a reference to an SP during opencrypto processing
An SP has a list of isr (ipsecrequest) that represents a sequence of IPsec encryption/authentication processing. One isr corresponds to one opencrypto processing. The lifetime of an isr follows its SP.
We pass an isr to a callback function of opencrypto to continue to a next encryption/authentication processing. However nobody guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.
In order to avoid such unexpected destruction of isr, hold a reference to its SP during opencrypto processing. --- Don't make SAs expired on tests that delete SAs explicitly --- Fix a debug message --- Dedup error paths (NFC) --- Use pool to allocate tdb_crypto
For ESP and AH, we need to allocate an extra variable space in addition to struct tdb_crypto. The fixed size of pool items may be larger than an actual requisite size of a buffer, but still the performance improvement by replacing malloc with pool wins. --- Don't use unstable isr->sav for header size calculations
We may need to optimize to not look up sav here for users that don't need to know an exact size of headers (e.g., TCP segment size calculation). --- Don't use sp->req->sav when handling NAT-T ESP fragmentation
In order to do this we need to look up a sav however an additional look-up degrades performance. A sav is later looked up in ipsec4_process_packet so delay the fragmentation check until then to avoid an extra look-up. --- Don't use key_lookup_sp that depends on unstable sp->req->sav
It provided a fast look-up of SP. We will provide an alternative method in the future (after basic MP-ification finishes). --- Stop setting isr->sav on looking up sav in key_checkrequest --- Remove ipsecrequest#sav --- Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore --- Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu
Probably due to PR 43997 --- Add localcount to rump kernels --- Remove unused macro --- Fix key_getcomb_setlifetime
The fix adjusts a soft limit to be 80% of a corresponding hard limit.
I'm not sure the fix is really correct though, at least the original code is wrong. A passed comb is zero-cleared before calling key_getcomb_setlifetime, so comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100; is meaningless. --- Provide and apply key_sp_refcnt (NFC)
It simplifies further changes. --- Fix indentation
Pointed out by knakahara@ --- Use pslist(9) for sptree --- Don't acquire global locks for IPsec if NET_MPSAFE
Note that the change is just to make testing easy and IPsec isn't MP-safe yet. --- Let PF_KEY socks hold their own lock instead of softnet_lock
Operations on SAD and SPD are executed via PF_KEY socks. The operations include deletions of SAs and SPs that will use synchronization mechanisms such as pserialize_perform to wait for references to SAs and SPs to be released. It is known that using such mechanisms with holding softnet_lock causes a dead lock. We should avoid the situation. --- Make IPsec SPD MP-safe
We use localcount(9), not psref(9), to make the sptree and secpolicy (SP) entries MP-safe because SPs need to be referenced over opencrypto processing that executes a callback in a different context.
SPs on sockets aren't managed by the sptree and can be destroyed in softint. localcount_drain cannot be used in softint so we delay the destruction of such SPs to a thread context. To do so, a list to manage such SPs is added (key_socksplist) and key_timehandler_spd deletes dead SPs in the list.
For more details please read the locking notes in key.c.
Proposed on tech-kern@ and tech-net@ --- Fix updating ipsec_used
- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush
- key_update_used wasn't called if an SP had been added/deleted but a reply to userland failed
--- Fix updating ipsec_used; turn on when SPs on sockets are added --- Add missing IPsec policy checks to icmp6_rip6_input
icmp6_rip6_input is quite similar to rip6_input and the same checks exist in rip6_input. --- Add test cases for setsockopt(IP_IPSEC_POLICY) --- Don't use KEY_NEWSP for dummy SP entries
By the change KEY_NEWSP is now not called from softint anymore and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP. --- Comment out unused functions --- Add test cases that there are SPs but no relevant SAs --- Don't allow sav->lft_c to be NULL
lft_c of an sav that was created by SADB_GETSPI could be NULL. --- Clean up clunky eval strings
- Remove unnecessary \ at EOL - This allows to omit ; too - Remove unnecessary quotes for arguments of atf_set - Don't expand $DEBUG in eval - We expect it's expanded on execution
Suggested by kre@ --- Remove unnecessary KEY_FREESAV in an error path
sav should be freed (unreferenced) by the caller. --- Use pslist(9) for sahtree --- Use pslist(9) for sah->savtree --- Rename local variable newsah to sah
It may not be new. --- MP-ify SAD slightly
- Introduce key_sa_mtx and use it for some list operations
- Use pserialize for some list iterations
--- Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future
KEY_SA_UNREF is still key_freesav so no functional change for now.
This change reduces diff of further changes. --- Remove out-of-date log output
Pointed out by riastradh@ --- Use KDASSERT instead of KASSERT for mutex_ownable
Because mutex_ownable is too heavy to run in a fast path even for DIAGNOSTIC + LOCKDEBUG.
Suggested by riastradh@ --- Assemble global lists and related locks into cache lines (NFCI)
Also rename variable names from *tree to *list because they are just lists, not trees.
Suggested by riastradh@ --- Move locking notes --- Update the locking notes
- Add locking order
- Add locking notes for misc lists such as reglist
- Mention pserialize, key_sp_ref and key_sp_unref on SP operations
Requested by riastradh@ --- Describe constraints of key_sp_ref and key_sp_unref
Requested by riastradh@ --- Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL --- Add __read_mostly to key_psz
Suggested by riastradh@ --- Tweak wording (pserialize critical section => pserialize read section)
Suggested by riastradh@ --- Add missing mutex_exit --- Fix setkey -D -P outputs
The outputs were tweaked (by me), but I forgot updating libipsec in my local ATF environment... --- MP-ify SAD (key_sad.sahlist and sah entries)
localcount(9) is used to protect key_sad.sahlist and sah entries as well as SPD (and will be used for SAD sav).
Please read the locking notes of SAD for more details. --- Introduce key_sa_refcnt and replace sav->refcnt with it (NFC) --- Destroy sav only in the loop for DEAD sav --- Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf
If key_sendup_mbuf isn't passed a socket, the assertion fails. Originally in this case sb->sb_so was softnet_lock and callers held softnet_lock so the assertion was magically satisfied. Now sb->sb_so is key_so_mtx and also softnet_lock isn't always held by callers so the assertion can fail.
Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.
Reported by knakahara@ Tested by knakahara@ and ozaki-r@ --- Fix locking notes of SAD --- Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain
If we call key_sendup_mbuf from key_acquire that is called on packet processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain to the SP with holding key_so_mtx in say key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf that has stuck on key_so_mtx never releases a reference to the SP
Fix the deadlock by deferring key_sendup_mbuf to the timer (key_timehandler). --- Fix that prev isn't cleared on retry --- Limit the number of mbufs queued for deferred key_sendup_mbuf
It's easy to be queued hundreds of mbufs on the list under heavy network load. --- MP-ify SAD (savlist)
localcount(9) is used to protect savlist of sah. The basic design is similar to MP-ifications of SPD and SAD sahlist. Please read the locking notes of SAD for more details. --- Simplify ipsec_reinject_ipstack (NFC) --- Add per-CPU rtcache to ipsec_reinject_ipstack
It reduces route lookups and also reduces rtcache lock contentions when NET_MPSAFE is enabled. --- Use pool_cache(9) instead of pool(9) for tdb_crypto objects
The change improves network throughput especially on multi-core systems. --- Update
ipsec(4), opencrypto(9) and vlan(4) are now MP-safe. --- Write known issues on scalability --- Share a global dummy SP between PCBs
It is never changed so it can be pre-allocated and shared safely between PCBs. --- Fix race condition on the rawcb list shared by rtsock and keysock
keysock now protects itself by its own mutex, which means that the rawcb list is protected by two different mutexes (keysock's one and softnet_lock for rtsock), of course it's useless.
Fix the situation by having a discrete rawcb list for each. --- Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE --- fix localcount leak in sav. fixed by ozaki-r@n.o.
I commit on behalf of him. --- remove unnecessary comment. --- Fix deadlock between pserialize_perform and localcount_drain
A typical usage of localcount_drain looks like this:
	mutex_enter(&mtx);
	item = remove_from_list();
	pserialize_perform(psz);
	localcount_drain(&item->localcount, &cv, &mtx);
	mutex_exit(&mtx);
This sequence can cause a deadlock which happens for example on the following situation:
- Thread A calls localcount_drain which calls xc_broadcast after releasing a specified mutex
- Thread B enters the sequence and calls pserialize_perform with holding the mutex while pserialize_perform also calls xc_broadcast
- Thread C (xc_thread) that calls an xcall callback of localcount_drain tries to hold the mutex
xc_broadcast of thread B doesn't start until xc_broadcast of thread A finishes, which is a feature of xcall(9). This means that pserialize_perform never completes until xc_broadcast of thread A finishes. On the other hand, thread C that is a callee of xc_broadcast of thread A sticks on the mutex. Finally the threads block each other (A blocks B, B blocks C and C blocks A).
A possible fix is to serialize executions of the above sequence by another mutex, but adding another mutex makes the code complex, so fix the deadlock by another way; the fix is to release the mutex before pserialize_perform and instead use a condvar to prevent pserialize_perform from being called simultaneously.
Note that the deadlock has happened only if NET_MPSAFE is enabled. --- Add missing ifdef NET_MPSAFE --- Take softnet_lock on pr_input properly if NET_MPSAFE
Currently softnet_lock is taken unnecessarily in some cases, e.g., icmp_input and encap4_input from ip_input, or not taken even if needed, e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.
NFC if NET_MPSAFE is disabled (default). --- - sanitize key debugging so that we don't print extra newlines or unassociated debugging messages. - remove unused functions and make internal ones static - print information in one line per message --- humanize printing of ip addresses --- cast reduction, NFC. --- Fix typo in comment --- Pull out ipsec_fill_saidx_bymbuf (NFC) --- Don't abuse key_checkrequest just for looking up sav
It does more than expected for example key_acquire. --- Fix SP is broken on transport mode
isr->saidx was modified accidentally in ipsec_nextisr.
Reported by christos@ Helped investigations by christos@ and knakahara@ --- Constify isr at many places (NFC) --- Include socketvar.h for softnet_lock --- Fix buffer length for ipsec_logsastr
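The key_getcomb_setlifetime fix listed in the pull-up above (soft limit set to 80% of the corresponding hard limit) can be illustrated with a small sketch. The struct and function names below are hypothetical stand-ins, not the key.c code; only the 80 / 100 arithmetic and the observation that the structure is zero-cleared beforehand come from the log message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the add-time lifetime fields of a comb. */
struct comb_lifetime {
    uint64_t hard_addtime;
    uint64_t soft_addtime;
};

/* Original behaviour as described in the log: the soft field is scaled
 * from itself, but it is still zero here, so the result stays zero. */
void set_lifetime_buggy(struct comb_lifetime *c, uint64_t hard)
{
    assert(c != NULL);
    c->hard_addtime = hard;
    c->soft_addtime = c->soft_addtime * 80 / 100;
}

/* Fixed behaviour: derive the soft limit from the hard limit. */
void set_lifetime_fixed(struct comb_lifetime *c, uint64_t hard)
{
    assert(c != NULL);
    c->hard_addtime = hard;
    c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

With a zero-cleared structure and a hard limit of 86400 seconds, the buggy variant leaves the soft limit at 0, while the fixed variant yields 69120.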
|
1.184.2.5 |
| 18-Jan-2019 |
pgoyette | Synch with HEAD
|
1.184.2.4 |
| 30-Sep-2018 |
pgoyette | Sync with HEAD
|
1.184.2.3 |
| 06-Sep-2018 |
pgoyette | Sync with HEAD
Resolve a couple of conflicts (result of the uimin/uimax changes)
|
1.184.2.2 |
| 02-May-2018 |
pgoyette | Synch with HEAD
|
1.184.2.1 |
| 30-Mar-2018 |
pgoyette | Resolve conflicts between branch and HEAD
|
1.186.2.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.192.4.1 |
| 03-Apr-2021 |
thorpej | Sync with HEAD.
|
1.195.4.1 |
| 01-Aug-2021 |
thorpej | Sync with HEAD.
|