History log of /src/sys/netinet/tcp_output.c |
Revision | | Date | Author | Comments |
1.222 |
| 08-Sep-2024 |
rillig | fix a/an grammar in obvious cases
|
1.221 |
| 05-Jul-2024 |
rin | sys: Drop redundant NULL check before m_freem(9)
m_freem(9) has safely accepted a NULL argument since at least 4.2BSD: https://www.tuhs.org/cgi-bin/utree.pl?file=4.2BSD/usr/src/sys/sys/uipc_mbuf.c
Compile-tested on amd64/ALL.
Suggested by knakahara@
|
1.220 |
| 29-Jun-2024 |
riastradh | branches: 1.220.2; netinet: Use _NET_STAT* API instead of direct array access.
PR kern/58380
|
1.219 |
| 13-Sep-2023 |
bouyer | Handle EHOSTDOWN the same way as EHOSTUNREACH and ENETDOWN for established connections. Avoid premature end of tcp connection with "Host is down" error in case of transient link-layer failure. Discussed and patch proposed in http://mail-index.netbsd.org/tech-net/2023/09/11/msg008610.html and followups.
|
1.218 |
| 04-Nov-2022 |
ozaki-r | branches: 1.218.2; inpcb: rename functions to in6pcb_*
|
1.217 |
| 04-Nov-2022 |
ozaki-r | inpcb: rename functions to inpcb_*
Inspired by rmind-smpnet patches.
|
1.216 |
| 28-Oct-2022 |
ozaki-r | inpcb: separate inpcb again to reduce the size of PCB for IPv4
The data size of PCB for IPv4 increased because of the merge of struct in6pcb. The change decreases the size to the original size by separating struct inpcb (again). struct in4pcb and in6pcb that embed struct inpcb are introduced.
Even after the separation, users don't need to be aware of it; they only have to use some macros to access dedicated data. For example, inp->inp_laddr is now accessed through in4p_laddr(inp).
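The embed-and-access pattern described in this commit message can be sketched in userland C as follows. Every name ending in _sketch, the in4p_laddr_ptr helper, and the struct layouts are hypothetical illustrations, not the kernel's actual definitions; only the idea of a common struct embedded first in per-family structs and reached through an accessor comes from the message.

```c
#include <assert.h>
#include <stdint.h>

enum { AF_V4_SKETCH = 4, AF_V6_SKETCH = 6 };	/* hypothetical family tags */

/* Common part shared by both address families. */
struct inpcb_sketch {
    int inp_af;
};

/* Family-specific blocks embed the common part as their first member. */
struct in4pcb_sketch {
    struct inpcb_sketch in4p_pcb;
    uint32_t in4p_ip_laddr;
};

struct in6pcb_sketch {
    struct inpcb_sketch in6p_pcb;
    uint8_t in6p_ip_laddr[16];
};

/* A pointer to the first member may be converted back to the enclosing struct. */
static uint32_t *
in4p_laddr_ptr(struct inpcb_sketch *inp)
{
    assert(inp->inp_af == AF_V4_SKETCH);
    return &((struct in4pcb_sketch *)inp)->in4p_ip_laddr;
}

/* Accessor macro in the spirit of in4p_laddr(inp). */
#define in4p_laddr_sketch(inp) (*in4p_laddr_ptr(inp))
```

Callers keep passing the common pointer type around, while the IPv4 block no longer pays for the IPv6 address storage.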
|
1.215 |
| 28-Oct-2022 |
ozaki-r | inpcb: integrate data structures of PCB into one
Data structures of network protocol control blocks (PCBs), i.e., struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of the data structures have to handle them separately and thus the code is cluttered and duplicated.
The commit integrates the data structures into one, struct inpcb. As a result, users of PCBs only have to handle just one data structure, so the code becomes simple.
One drawback is that the data size of PCB for IPv4 increases by 40 bytes (from 248 bytes to 288 bytes).
|
1.214 |
| 30-Dec-2021 |
andvar | s/bandwith/bandwidth/
|
1.213 |
| 12-Jun-2020 |
roy | Remove in-kernel handling of Router Advertisements
This is much better handled by a user-land tool. Proposed on tech-net here: https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html
Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.
Compat is fully provided where it makes sense, but trying to turn on RA handling will obviously throw an error as it no longer exists.
Note that if you use IPv6 temporary addresses, this now needs to be turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
|
1.212 |
| 17-Nov-2019 |
mlelstv | Don't allow zero sized segments that will panic the stack. Reported-by: syzbot+5542516fa4afe7a101e6@syzkaller.appspotmail.com
|
1.211 |
| 25-Feb-2019 |
maxv | Improve panic messages.
|
1.210 |
| 27-Dec-2018 |
maxv | Remove unused arguments.
|
1.209 |
| 03-Sep-2018 |
riastradh | Rename min/max -> uimin/uimax for better honesty.
These functions are defined on unsigned int. The generic name min/max should not silently truncate to 32 bits on 64-bit systems. This is purely a name change -- no functional change intended.
HOWEVER! Some subsystems have
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))
even though our standard name for that is MIN/MAX. Although these may invite multiple evaluation bugs, these do _not_ cause integer truncation.
To avoid `fixing' these cases, I first changed the name in libkern, and then compile-tested every file where min/max occurred in order to confirm that it failed -- and thus confirm that nothing shadowed min/max -- before changing it.
I have left a handful of bootloaders that are too annoying to compile-test, and some dead code:
cobalt
ews4800mips
hp300
hppa
ia64
luna68k
vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))
It should be easy to fix the fallout once identified -- this way of doing things fails safe, and the goal here, after all, is to _avoid_ silent integer truncations, not introduce them.
Maybe one day we can reintroduce min/max as type-generic things that never silently truncate. But we should avoid doing that for a while, so that existing code has a chance to be detected by the compiler for conversion to uimin/uimax without changing the semantics until we can properly audit it all. (Who knows, maybe in some cases integer truncation is actually intended!)
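The truncation hazard this message describes can be reproduced in a small userland sketch. uimin_sketch and MIN_SKETCH are hypothetical stand-ins written for illustration (they are not copied from libkern), and the example assumes a platform where unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a min() defined on unsigned int. */
static unsigned int
uimin_sketch(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Macro form: keeps the operand type, but evaluates each argument twice. */
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))

/* 64-bit arguments are narrowed to unsigned int at the call. */
static uint64_t
min_via_function(uint64_t a, uint64_t b)
{
    assert(sizeof(unsigned int) == 4);	/* the demonstration assumes this */
    return uimin_sketch((unsigned int)a, (unsigned int)b);
}

/* No narrowing: the comparison happens in 64 bits. */
static uint64_t
min_via_macro(uint64_t a, uint64_t b)
{
    return MIN_SKETCH(a, b);
}
```

With a = 2^32 + 1 and b = 5, the function form returns 1 because the high bits of a are discarded before the comparison, while the macro form returns 5.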
|
1.208 |
| 17-May-2018 |
maxv | branches: 1.208.2; Remove reference to tcpiphdr in comment.
|
1.207 |
| 07-May-2018 |
uwe | Fix unsigned wraparound on window size calculations.
This is another instance where tp->rcv_adv - tp->rcv_nxt can wrap around after successful zero-window probe from the peer. The first one was fixed by chs@ in revision 1.112 on 2004-05-08.
While here, CSE and de-obfuscate the code a bit.
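As a rough illustration of the wraparound described here, the sketch below compares a naive and a guarded way of honouring the already-advertised window. The function names are hypothetical and the logic is a simplified userland model, not the code in tcp_output.c; int64_t is used for the window so the result does not depend on the width of long.

```c
#include <assert.h>
#include <stdint.h>

/* Naive: if rcv_nxt is ahead of rcv_adv, the unsigned difference wraps. */
static int64_t
window_naive(int64_t win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
    assert(win >= 0);
    uint32_t already = rcv_adv - rcv_nxt;	/* may wrap to a huge value */
    if (win < (int64_t)already)
        win = (int64_t)already;
    return win;
}

/* Guarded: interpret the difference as signed and ignore it if not positive. */
static int64_t
window_guarded(int64_t win, uint32_t rcv_adv, uint32_t rcv_nxt)
{
    assert(win >= 0);
    int32_t already = (int32_t)(rcv_adv - rcv_nxt);
    if (already > 0 && win < already)
        win = already;
    return win;
}
```

When rcv_nxt is one byte past rcv_adv, the naive version reports a window of 4294967295 bytes, while the guarded version leaves the window unchanged.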
|
1.206 |
| 03-May-2018 |
maxv | Remove now unused tcpip.h includes. Some were already unused before.
|
1.205 |
| 03-Apr-2018 |
maxv | bcopy -> memcpy, it's obvious the areas don't overlap.
|
1.204 |
| 01-Apr-2018 |
maxv | Change the check to be <= instead of <. This fixes one occurrence of an apparently widespread division-by-zero bug in our TCP code: if a user adds huge IPv6 options with setsockopt, and if the total size of the options happens to be equal to the available space calculated for the TCP payload, t_segsz gets set to zero, and given that we then divide several things by it, the kernel crashes.
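A minimal sketch of why the boundary case matters, assuming hypothetical helper names and an error return of -1 (the log does not say what the kernel does on that path): with a strict < comparison, options that exactly fill the available space would produce a segment size of zero, which is later used as a divisor.

```c
#include <assert.h>

/* Returns payload bytes per segment, or -1 when the options leave no room. */
static int
payload_per_segment(int space, int optlen)
{
    assert(space >= 0 && optlen >= 0);
    /* With '<' here, space == optlen would slip through and return 0. */
    if (space <= optlen)
        return -1;
    return space - optlen;
}

/* Number of segments needed for len bytes; divides by the segment size. */
static int
segments_needed(int len, int segsz)
{
    assert(segsz > 0);	/* a zero segsz would be a division by zero */
    return (len + segsz - 1) / segsz;
}
```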
|
1.203 |
| 01-Apr-2018 |
maxv | Reorder and style, for clarity.
|
1.202 |
| 30-Mar-2018 |
maxv | Remove dead code. It was introduced in rev1 (25 years ago), and is irrelevant today.
|
1.201 |
| 30-Mar-2018 |
maxv | Style, use NULL for pointers, use KASSERT, and don't inline huge functions, we want to debug them with DDB (and not just with GPROF).
|
1.200 |
| 29-Mar-2018 |
maxv | Remove #ifdef INET. Same as tcp_input.c. Makes the code easier to understand.
Also make tcp6_mtudisc() static in tcp_subr.c.
|
1.199 |
| 10-Mar-2018 |
khorben | Fix spello in a comment
|
1.198 |
| 12-Feb-2018 |
maxv | branches: 1.198.2; Remove unused argument from tcp_signature_getsav.
|
1.197 |
| 03-Aug-2017 |
ozaki-r | Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future
KEY_SA_UNREF is still key_freesav so no functional change for now.
This change reduces diff of further changes.
|
1.196 |
| 02-Jun-2017 |
ozaki-r | branches: 1.196.2; Assert inph_locked on ipsec_pcb_skip_ipsec (was IPSEC_PCB_SKIP_IPSEC)
The assertion confirms SP caches are accessed under inph lock (solock).
|
1.195 |
| 03-Mar-2017 |
ozaki-r | Pass inpcb/in6pcb instead of socket to ip_output/ip6_output
- Passing a socket to Layer 3 is a layer violation and is unnecessary anyway
- The change makes the code of callers and IPsec a bit simpler
|
1.194 |
| 04-Jan-2017 |
martin | branches: 1.194.2; Fix optlen calculation for the SACK block - 2 bytes too few were calculated, causing corruption in PR kern/51767.
|
1.193 |
| 04-Jan-2017 |
kre | Remove redundant tests: if optlen == 0, then optlen % 4 != 2 (it is 0) so there is no need to test both.
|
1.192 |
| 03-Jan-2017 |
christos | use symbolic constants; no functional change.
|
1.191 |
| 03-Jan-2017 |
christos | put it the way we had it before; since we check for the resulting size after we added the extra space we can be equal to the size of the buffer.
|
1.190 |
| 03-Jan-2017 |
christos | fix off-by-one
|
1.189 |
| 02-Jan-2017 |
christos | make sure that the reset label is defined without TCP_SIGNATURE.
|
1.188 |
| 02-Jan-2017 |
christos | Fix TCP signature code:
1. pack options more tightly instead of being generous with no/op
2. put TCP_SIGNATURE option before SACK
3. fix computation of options length, by deferring it
XXX: Really we should move the options setting code in one place instead of having two copies one for input and one for output.
XXX: tcp_optlen/tcp_hdrsiz need to be fixed; they were wrong before too.
|
1.187 |
| 08-Dec-2016 |
ozaki-r | Add rtcache_unref to release points of rtentry stemming from rtcache
In the MP-safe world, a rtentry stemming from a rtcache can be freed at any point. So we need to protect rtentries somehow, say by reference counting or passive references. Regardless of the method, we need to call some release function of a rtentry after using it.
The change adds a new function rtcache_unref to release a rtentry. At this point, this function does nothing because for now we don't add a reference to a rtentry when we get one from a rtcache. We will add something useful in a further commit.
This change is a part of changes for MP-safe routing table. It is separated to avoid one big change that is difficult to debug by bisecting.
|
1.186 |
| 10-Jun-2016 |
ozaki-r | branches: 1.186.2; Introduce m_set_rcvif and m_reset_rcvif
The API is used to set (or reset) the receiving interface of an mbuf. The functions are the counterparts of m_get_rcvif, which will come in another commit; they hide the internals of rcvif operations and reduce the diff of the upcoming change.
No functional change.
|
1.185 |
| 24-Aug-2015 |
pooka | sprinkle _KERNEL_OPT
|
1.184 |
| 24-Jul-2015 |
matt | If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running.
|
1.183 |
| 16-May-2015 |
kefren | Don't put segment on the wire if security request can't be fulfilled
|
1.182 |
| 27-Apr-2015 |
christos | Apply Revision 220794 from FreeBSD to avoid dup ACKs:
When checking to see if a window update should be sent to the remote peer, don't force a window update if the window would not actually grow due to window scaling. Specifically, if the window scaling factor is larger than 2 * MSS, then after the local reader has drained 2 * MSS bytes from the socket, a window update can end up advertising the same window. If this happens, the supposed window update actually ends up being a duplicate ACK. This can result in an excessive number of duplicate ACKs when using a higher maximum socket buffer size.
Pointed out by Ricky Charlet, in tech-net.
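The condition described above can be sketched as a small predicate: a window update is only worth forcing if the value the peer will actually see, after right-shifting by the window scale, differs from the old one. The function name is hypothetical and this is a simplified model of the check, not the imported FreeBSD code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when moving from oldwin to newwin changes the 16-bit
 * window field that is put on the wire for the given scale shift.
 */
static bool
window_update_is_visible(uint32_t oldwin, uint32_t newwin, unsigned rcv_scale)
{
    assert(rcv_scale <= 14);	/* the window scale shift is capped at 14 */
    return (oldwin >> rcv_scale) != (newwin >> rcv_scale);
}
```

With a shift of 14 the advertised window moves in steps of 16384 bytes, so draining 2 * 1460 bytes from a 32768-byte window does not change the advertised value, and an update sent at that point would look like a duplicate ACK.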
|
1.181 |
| 27-Apr-2015 |
ozaki-r | Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp
It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
|
1.180 |
| 14-Feb-2015 |
he | Port over the TCP_INFO socket option from FreeBSD, originally from the Linux 2.6 TCP API. This permits the caller to query certain information about a TCP connection, and is used by pkgsrc's net/iperf3 test program if available.
This extends struct tcpcb with three fields to count retransmits, out-of-sequence receives and zero window announcements, and will therefore warrant a kernel revision bump (done separately).
|
1.179 |
| 10-Nov-2014 |
maxv | branches: 1.179.2; Do not uselessly include <sys/malloc.h>.
|
1.178 |
| 25-Oct-2014 |
christos | Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks to Jonathan Looney for pointing this out.
|
1.177 |
| 21-Oct-2014 |
hikaru | Fix wrong condition checking TSO capability. ipsec_used is not necessary condition. IPsec outbound policy will not be checked when ipsec_used is false.
|
1.176 |
| 30-May-2014 |
christos | branches: 1.176.2; Introduce 2 new variables: ipsec_enabled and ipsec_used. ipsec_enabled is controlled by sysctl and determines whether IPsec is allowed. ipsec_used is set automatically based on ipsec being enabled, and rules existing.
|
1.175 |
| 05-Jun-2013 |
christos | branches: 1.175.2; 1.175.6; IPSEC has not come in two speeds for a long time now (IPSEC == kame, FAST_IPSEC). Make everything refer to IPSEC to avoid confusion.
|
1.174 |
| 22-Mar-2012 |
drochner | branches: 1.174.2; remove KAME IPSEC, replaced by FAST_IPSEC
|
1.173 |
| 31-Dec-2011 |
christos | branches: 1.173.2; 1.173.6; 1.173.8; - fix offsetof usage, and redundant defines - kill pointer casts to 0
|
1.172 |
| 19-Dec-2011 |
drochner | rename the IPSEC in-kernel CPP variable and config(8) option to KAME_IPSEC, and make IPSEC define it so that existing kernel config files work as before. Now the default can easily be changed to FAST_IPSEC just by setting the IPSEC alias to FAST_IPSEC.
|
1.171 |
| 14-Apr-2011 |
yamt | branches: 1.171.4; 1.171.8; simplify a compile-time assertion
|
1.170 |
| 21-Mar-2011 |
matt | Clean up setting ECN bit in TOS. Fixes PR 44742
|
1.169 |
| 26-Jan-2010 |
pooka | branches: 1.169.4; 1.169.6; tcp sockbuf autoscaling was initially added turned off because it was experimental. People (including myself) have been running with it turned on for eons now, so flip the default to enabled.
|
1.168 |
| 18-Mar-2009 |
cegger | bzero -> memset
|
1.167 |
| 28-Apr-2008 |
martin | branches: 1.167.8; 1.167.10; 1.167.14; 1.167.16; 1.167.20; Remove clause 3 and 4 from TNF licenses
|
1.166 |
| 12-Apr-2008 |
thorpej | branches: 1.166.2; 1.166.4; Make IP, TCP, UDP, and ICMP statistics per-CPU. The stats are collated when the user requests them via sysctl.
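The per-CPU scheme can be pictured with a simplified userland sketch: each CPU increments only its own row of counters, and the rows are summed when a reader asks for the totals. The array, the index names, and the fixed CPU count are hypothetical; the kernel's real implementation uses its own per-CPU facilities.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SKETCH	4	/* hypothetical fixed CPU count */
#define STAT_SNDPACK	0	/* hypothetical counter indices */
#define STAT_SNDBYTE	1
#define NSTATS_SKETCH	2

/* One row of uint64_t counters per CPU; zero-initialized at load time. */
static uint64_t percpu_stats[NCPU_SKETCH][NSTATS_SKETCH];

/* Fast path: touch only the calling CPU's row. */
static void
stat_add(unsigned cpu, unsigned idx, uint64_t n)
{
    assert(cpu < NCPU_SKETCH && idx < NSTATS_SKETCH);
    percpu_stats[cpu][idx] += n;
}

/* Slow path: collate all rows into one array on request. */
static void
stats_collate(uint64_t out[NSTATS_SKETCH])
{
    for (unsigned i = 0; i < NSTATS_SKETCH; i++) {
        out[i] = 0;
        for (unsigned cpu = 0; cpu < NCPU_SKETCH; cpu++)
            out[i] += percpu_stats[cpu][i];
    }
}
```

The design trades a slightly more expensive read for an update path that never contends with other CPUs.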
|
1.165 |
| 08-Apr-2008 |
thorpej | Change TCP stats from a structure to an array of uint64_t's.
Note: This is ABI-compatible with the old tcpstat structure; old netstat binaries will continue to work properly.
|
1.164 |
| 14-Jan-2008 |
dyoung | branches: 1.164.6; Use rtcache_validate() instead of rtcache_getrt(). Shorten staircase in in_losing().
|
1.163 |
| 20-Dec-2007 |
dyoung | Poison struct route->ro_rt uses in the kernel by changing the name to _ro_rt. Use rtcache_getrt() to access a route cache's struct rtentry *.
Introduce struct ifnet->if_dl that always points at the interface identifier/link-layer address. Make code that treated the first ifaddr on struct ifnet->if_addrlist as the interface address use if_dl, instead.
Remove stale debugging code from net/route.c. Move the rtflush() code into rtcache_clear() and delete rtflush(). Delete rtalloc(), because nothing uses it any more.
Make ND6_HINT an inline, lowercase subroutine, nd6_hint.
I've done my best to convert IP Filter, the ISO stack, and the AppleTalk stack to rtcache_getrt(). They compile, but I have not tested them. I have given the changes to PF, GRE, IPv4 and IPv6 stacks a lot of exercise.
|
1.162 |
| 02-Sep-2007 |
dyoung | branches: 1.162.6; 1.162.8; 1.162.12; m_copy() was deprecated, apparently, long ago. m_copy(...) -> m_copym(..., M_DONTWAIT).
|
1.161 |
| 02-Aug-2007 |
yamt | branches: 1.161.2; 1.161.4; 1.161.6; make rfbuf_ts a tcp timestamp so that calculations in tcp_input make sense.
|
1.160 |
| 02-Aug-2007 |
rmind | TCP socket buffers automatic sizing - ported from FreeBSD. http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html
! Disabled by default, marked as experimental. Testers are very needed. ! Someone should thoroughly test this, and improve if possible.
Discussed on <tech-net>: http://mail-index.netbsd.org/tech-net/2007/07/12/0002.html Thanks Greg Troxel for comments.
OK by the long silence on <tech-net>.
|
1.159 |
| 18-May-2007 |
riz | branches: 1.159.2; Fix compilation in the TCP_SIGNATURE case:
- don't use void * for pointer arithmetic
- don't try to modify const parameters
A kernel with 'options TCP_SIGNATURE' works as well as it ever did, now. (ie, clunky, but passable)
|
1.158 |
| 02-May-2007 |
dyoung | Eliminate address family-specific route caches (struct route, struct route_in6, struct route_iso), replacing all caches with a struct route.
The principal benefit of this change is that all of the protocol families can benefit from route cache-invalidation, which is necessary for correct routing. Route-cache invalidation fixes an ancient PR, kern/3508, at long last; it fixes various other PRs, also.
Discussions with and ideas from Joerg Sonnenberger influenced this work tremendously. Of course, all design oversights and bugs are mine.
DETAILS
1 I added to each address family a pool of sockaddrs. I have introduced routines for allocating, copying, duplicating, and freeing sockaddrs:
struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst, const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);
sockaddr_alloc() returns either a sockaddr from the pool belonging to the specified family, or NULL if the pool is exhausted. The returned sockaddr has the right size for that family; sa_family and sa_len fields are initialized to the family and sockaddr length---e.g., sa_family = AF_INET and sa_len = sizeof(struct sockaddr_in). sockaddr_free() puts the given sockaddr back into its family's pool.
sockaddr_dup() and sockaddr_copy() work analogously to strdup() and strcpy(), respectively. sockaddr_copy() KASSERTs that the family of the destination and source sockaddrs are alike.
The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is passed directly to pool_get(9).
2 I added routines for initializing sockaddrs in each address family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(), etc. They are fairly self-explanatory.
3 structs route_in6 and route_iso are no more. All protocol families use struct route. I have changed the route cache, 'struct route', so that it does not contain storage space for a sockaddr. Instead, struct route points to a sockaddr coming from the pool the sockaddr belongs to. I added a new method to struct route, rtcache_setdst(), for setting the cache destination:
int rtcache_setdst(struct route *, const struct sockaddr *);
rtcache_setdst() returns 0 on success, or ENOMEM if no memory is available to create the sockaddr storage.
It is now possible for rtcache_getdst() to return NULL if, say, rtcache_setdst() failed. I check the return value for NULL everywhere in the kernel.
4 Each routing domain (struct domain) has a list of live route caches, dom_rtcache. rtflushall(sa_family_t af) looks up the domain indicated by 'af', walks the domain's list of route caches and invalidates each one.
|
1.157 |
| 04-Mar-2007 |
christos | branches: 1.157.2; 1.157.4; Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
|
1.156 |
| 22-Feb-2007 |
thorpej | TRUE -> true, FALSE -> false
|
1.155 |
| 21-Feb-2007 |
thorpej | Replace the Mach-derived boolean_t type with the C99 bool type. A future commit will replace use of TRUE and FALSE with true and false.
|
1.154 |
| 10-Feb-2007 |
degroote | branches: 1.154.2; Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly supported
Change the ipcomp logic
|
1.153 |
| 25-Nov-2006 |
yamt | branches: 1.153.2; 1.153.4; move tso-by-software code to their own files. no functional changes.
|
1.152 |
| 23-Nov-2006 |
martin | Make it compile on IPv4-only kernels
|
1.151 |
| 23-Nov-2006 |
yamt | implement ipv6 TSO. partly from Matthias Scheler. tested by him.
|
1.150 |
| 17-Oct-2006 |
yamt | tcp_output: as a comment in tcp_sack_newack says, actually send one or two segments on partial acks. even if sack_bytes_rxmt==0, if we are in fast recovery with sack, snd_cwnd has somewhat special meaning here. PR/34749.
|
1.149 |
| 09-Oct-2006 |
rpaulo | Modular (I tried ;-) TCP congestion control API. Whenever certain conditions happen in the TCP stack, this interface calls the specified callback to handle the situation according to the currently selected congestion control algorithm. A new sysctl node was created: net.inet.tcp.congctl.{available,selected} with obvious meanings. The old net.inet.tcp.newreno MIB was removed. The API is discussed in tcp_congctl(9).
In the near future, it will be possible to select a congestion control algorithm on a per-socket basis.
Discussed on tech-net and reviewed by <yamt>.
|
1.148 |
| 08-Oct-2006 |
yamt | tcp_output: don't make TSO duplicate CWR/ECE.
|
1.147 |
| 08-Oct-2006 |
yamt | tcp_output: don't try to send SACK option larger than txsegsize. fix a panic like "panic: m_copydata: off 0, len -7".
|
1.146 |
| 07-Oct-2006 |
yamt | tcp_output: remove duplicated code and tweak indent. no functional changes.
|
1.145 |
| 01-Oct-2006 |
dbj | back out revision 1.144 calculating txsegsizep since it unmasks other bugs. See PR kern/34674
|
1.144 |
| 28-Sep-2006 |
dbj | consider sb_lowat when limiting the transmit length to keep acks on the wire
|
1.143 |
| 05-Sep-2006 |
rpaulo | branches: 1.143.2; 1.143.4; Import of TCP ECN algorithm for congestion control. Both available for IPv4 and IPv6. Basic implementation test results are available at http://netbsd-soc.sourceforge.net/projects/ecn/testresults.html.
Work sponsored by the Google Summer of Code project 2006. Special thanks to Kentaro Kurahone, Allen Briggs and Matt Thomas for their help, comments and support during the project.
|
1.142 |
| 25-Mar-2006 |
seanb | Slight simplification of hdr len calculation in tcp_segsize(). No functional change.
|
1.141 |
| 24-Dec-2005 |
perry | branches: 1.141.4; 1.141.6; 1.141.8; 1.141.10; 1.141.12; Remove leading __ from __(const|inline|signed|volatile) -- it is obsolete.
|
1.140 |
| 11-Dec-2005 |
christos | merge ktrace-lwp.
|
1.139 |
| 10-Aug-2005 |
yamt | wrap INET-only code by #if defined(INET).
|
1.138 |
| 10-Aug-2005 |
yamt | ipv6 tx checksum offloading. reviewed by Jason Thorpe.
|
1.137 |
| 19-Jul-2005 |
christos | Implement PMTU checks from:
http://www.gont.com.ar/drafts/icmp-attacks-against-tcp.html
1. Don't act on ICMP-need-frag immediately if adhoc checks on the advertised MTU fail. The MTU update is delayed until a TCP retransmit happens.
2. Ignore ICMP Source Quench messages meant for TCP connections.
From OpenBSD.
|
1.136 |
| 28-Jun-2005 |
drochner | branches: 1.136.2; typo in comment
|
1.135 |
| 29-May-2005 |
christos | - add const
- remove bogus casts
- avoid nested variables
|
1.134 |
| 08-May-2005 |
yamt | tcp_output: account FIN when building sack option.
|
1.133 |
| 08-May-2005 |
yamt | tcp_output: don't try to send more data than we have. PR/30160.
|
1.132 |
| 08-May-2005 |
yamt | tcp_output: clear TH_FIN where appropriate. related to PR/30160.
|
1.131 |
| 18-Apr-2005 |
yamt | add a function to handle M_CSUM_TSOv4 by software.
|
1.130 |
| 18-Apr-2005 |
yamt | fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does so that it can check the actual interface the packet is going to.
- for ipv6, disable it. (maybe will be revisited when it implements h/w checksum offloading.)
ok'ed by Jason Thorpe.
|
1.129 |
| 29-Mar-2005 |
yamt | tcp_output: lock reass queue when building sack.
|
1.128 |
| 16-Mar-2005 |
yamt | branches: 1.128.2; simplify data receiver side sack processing.
- introduce t_segqlen, the number of segments in segq/timeq. the name is from freebsd.
- rather than maintaining a copy of sack blocks (rcv_sack_block[]), build it directly from the segment list when needed.
|
1.127 |
| 16-Mar-2005 |
yamt | - use full sized segments unless we actually have SACKs to send.
- avoid TSO duplicate D-SACK.
- send SACKs regardless of TF_ACKNOW.
- don't clear rcv_sack_num when transmitting.
discussed on tech-net@.
|
1.126 |
| 12-Mar-2005 |
yamt | don't try to use TSO to transmit a single segment.
- there's no benefit.
- rtl8169 seems to be stuck with it.
|
1.125 |
| 09-Mar-2005 |
matt | For AF_INET, always set m->m_pkthdr.csum_data. Don't or TSOv4, just set it.
|
1.124 |
| 07-Mar-2005 |
yamt | tcp_sack_option: the max number of sack blocks in a packet is 4, not 3.
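The limit of four follows from the TCP header layout: at most 40 bytes of options fit in a header, and a SACK option costs 2 bytes of kind and length plus 8 bytes per block. The helper below is a hypothetical illustration of that arithmetic, not the kernel's tcp_sack_option().

```c
#include <assert.h>

enum {
    TCP_MAX_OPTLEN_SKETCH = 40,	/* 60-byte maximum header minus 20-byte base */
    SACK_HDR_LEN = 2,		/* option kind + option length */
    SACK_BLOCK_LEN = 8		/* left edge + right edge, 4 bytes each */
};

/* How many SACK blocks fit once other_optlen bytes of options are in use. */
static int
max_sack_blocks(int other_optlen)
{
    assert(other_optlen >= 0 && other_optlen <= TCP_MAX_OPTLEN_SKETCH);
    int room = TCP_MAX_OPTLEN_SKETCH - other_optlen - SACK_HDR_LEN;
    return room < 0 ? 0 : room / SACK_BLOCK_LEN;
}
```

With no other options the result is 4; with a timestamp option padded to 12 bytes it drops to 3, which is where the smaller figure tends to come from.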
|
1.123 |
| 06-Mar-2005 |
thorpej | Add a /*CONSTCOND*/ to last.
|
1.122 |
| 06-Mar-2005 |
matt | Fix typo. Opposite of >= is <, not ==.
|
1.121 |
| 06-Mar-2005 |
matt | Replace some gotos with a do while (0) and breaks. No functional change.
|
1.120 |
| 06-Mar-2005 |
matt | Add IPv4/TCP hooks for TCP Segment Offload on transmit.
|
1.119 |
| 02-Mar-2005 |
mycroft | Copyright maintenance.
|
1.118 |
| 28-Feb-2005 |
jonathan | Commit TCP SACK patches from Kentaro A. Kurahone's patch at: http://www.sigusr1.org/~kurahone/tcp-sack-netbsd-02152005.diff.gz
Fixes in that patch for pre-existing TCP pcb initializations were already committed to NetBSD-current, so are not included in this commit.
The SACK patch has been observed to correctly negotiate and respond to SACKs in wide-area traffic.
There are two independently-observed, as-yet-unresolved anomalies: first, unexplained delays in fast retransmission (potentially explainable by a 0.2sec RTT between adjacent ethernet/wifi NICs); and second, peculiar and unexplained TCP retransmits observed over an ath0 card.
After discussion with several interested developers, I'm committing this now, as-is, for more eyes to use and look over. Current hypothesis is that the anomalies above may in fact be due to link-level (hardware, driver, HAL, firmware) aberrations in the test setup, affecting both Kentaro's wired-Ethernet NIC and my two (different) WiFi NICs.
|
1.117 |
| 26-Feb-2005 |
perry | nuke trailing whitespace
|
1.116 |
| 03-Feb-2005 |
perry | ANSIfy function declarations
|
1.115 |
| 15-Dec-2004 |
thorpej | branches: 1.115.2; 1.115.4; Don't perform checksums on loopback interfaces. They can be reenabled with the net.inet.*.do_loopback_cksum sysctl.
Approved by: groo
|
1.114 |
| 20-May-2004 |
jonathan | With FAST_IPSEC, include <netipsec/key.h>, as Itojun's recent changes now require KEY_FREESAV() to be in scope.
|
1.113 |
| 18-May-2004 |
itojun | fix MD5 signature support to actually validate inbound signature, and drop packet if fails.
|
1.112 |
| 08-May-2004 |
chs | work around an LP64 problem where we report an excessively large window due to incorrect mixing of types.
|
1.111 |
| 26-Apr-2004 |
itojun | make TCP MD5 signature work with KAME IPSEC (#define IPSEC).
support IPv6 if KAME IPSEC (RFC is not explicit about how we make data stream for checksum with IPv6, but i'm pretty sure using normal pseudo-header is the right thing).
XXX current TCP MD5 signature code has giant flaw: it does not validate signature on input (can't believe it! what is the point?)
|
1.110 |
| 25-Apr-2004 |
jonathan | Initial commit of a port of the FreeBSD implementation of RFC 2385 (MD5 signatures for TCP, as used with BGP). Credit for original FreeBSD code goes to Bruce M. Simpson, with FreeBSD sponsorship credited to sentex.net. Shortening of the setsockopt() name attributed to Vincent Jardin.
This commit is a minimal, working version of the FreeBSD code, as MFC'ed to FreeBSD-4. It has received minimal testing with a ttcp modified to set the TCP-MD5 option; BMS's additions to tcpdump-current (tcpdump -M) confirm that the MD5 signatures are correct. Committed as-is for further testing between a NetBSD BGP speaker (e.g., quagga) and industry-standard BGP speakers (e.g., Cisco, Juniper).
NOTE: This version has two potential flaws. First, I do not see any code that verifies received TCP-MD5 signatures. Second, the TCP-MD5 options are internally padded and assumed to be 32-bit aligned. A more space-efficient scheme is to pack all TCP options densely (and possibly unaligned) into the TCP header; then do one final padding to a 4-byte boundary. Pre-existing comments note that accounting for TCP-option space when we add SACK is yet to be done. For now, I'm punting on that; we can solve it properly, in a way that will handle SACK blocks, as a separate exercise.
In case a pullup to NetBSD-2 is requested, this adds sys/netipsec/xform_tcp.c, and modifies:
sys/net/pfkeyv2.h,v 1.15
sys/netinet/files.netinet,v 1.5
sys/netinet/ip.h,v 1.25
sys/netinet/tcp.h,v 1.15
sys/netinet/tcp_input.c,v 1.200
sys/netinet/tcp_output.c,v 1.109
sys/netinet/tcp_subr.c,v 1.165
sys/netinet/tcp_usrreq.c,v 1.89
sys/netinet/tcp_var.h,v 1.109
sys/netipsec/files.netipsec,v 1.3
sys/netipsec/ipsec.c,v 1.11
sys/netipsec/ipsec.h,v 1.7
sys/netipsec/key.c,v 1.11
share/man/man4/tcp.4,v 1.16
lib/libipsec/pfkey.c,v 1.20
lib/libipsec/pfkey_dump.c,v 1.17
lib/libipsec/policy_token.l,v 1.8
sbin/setkey/parse.y,v 1.14
sbin/setkey/setkey.8,v 1.27
sbin/setkey/token.l,v 1.15
Note that the preceding two revisions to tcp.4 will be required to cleanly apply this diff.
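The space trade-off mentioned in the NOTE (padding each option to a 32-bit boundary versus packing options densely and padding once) can be checked with a short sketch. The helper names are hypothetical; the option lengths used in the example are the standard on-the-wire sizes of the TCP-MD5 option (18 bytes) and the timestamp option (10 bytes).

```c
#include <assert.h>

/* Round a length up to the next multiple of 4. */
static int
pad4(int len)
{
    assert(len >= 0);
    return (len + 3) & ~3;
}

/* Total option bytes when every option is padded individually. */
static int
optlen_padded_each(const int *lens, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += pad4(lens[i]);
    return total;
}

/* Total option bytes when options are packed densely and padded once. */
static int
optlen_packed(const int *lens, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += lens[i];
    return pad4(total);
}
```

For an 18-byte signature option plus a 10-byte timestamp option, individual padding costs 32 bytes while dense packing costs 28, leaving 4 more bytes of the 40-byte option space for something like SACK blocks.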
|
1.109 |
| 30-Mar-2004 |
christos | Make sure we disarm the persist timer before we arm the rexmit timer, otherwise there is a tiny window where both timers are active, and this is not correct according to the comments in the code. I believe that this is the cause of the to_ticks <= 0 assertion failure in callout_schedule() that I've been getting.
|
1.108 |
| 03-Mar-2004 |
thorpej | branches: 1.108.2; Use IPSEC_PCB_SKIP_IPSEC() to short-circuit calls to ipsec{4,6}_hdrsiz_tcp().
|
1.107 |
| 04-Feb-2004 |
itojun | deal with IPv6 path MTU < 1280 (RFC2460 section 5 last paragraph). check if there really is room for TCP data.
|
1.106 |
| 12-Nov-2003 |
ragge | Remove the FAST_MBSEARCH ifdef, send packet prediction is now default.
|
1.105 |
| 24-Oct-2003 |
ragge | Fix the bug in the tcp transmit prediction code. During testing the prediction counters show a hit-rate of about 85% for packets sent on a local LAN, and better than 99% for intercontinental high-speed bulk traffic (!).
|
1.104 |
| 24-Oct-2003 |
enami | Make this file compile again when TCP_OUTPUT_COUNTERS defined.
|
1.103 |
| 23-Oct-2003 |
thorpej | Oops, FAST_MBSEARCH counters were swapped; fix it. Pointed out by yamt@.
|
1.102 |
| 21-Oct-2003 |
thorpej | Add event counters that measure FAST_MBSEARCH.
|
1.101 |
| 22-Aug-2003 |
itojun | remove ipsec_set/getsocket. now we explicitly pass socket * to ip{,6}_output.
|
1.100 |
| 22-Aug-2003 |
itojun | change the additional arg to be passed to ip{,6}_output to struct socket *.
this fixes KAME policy lookup which was broken by the previous commit.
|
1.99 |
| 22-Aug-2003 |
jonathan | Replace the set_socket() method of passing an extra struct socket* argument to ip6_output() with a new explicit struct in6pcb* argument. (The underlying socket can be obtained via in6pcb->inp6_socket.)
In preparation for fast-ipsec. Reviewed by itojun.
|
1.98 |
| 15-Aug-2003 |
jonathan | (fast-ipsec): Add hooks to pass IPv4 IPsec traffic into fast-ipsec, if configured with ``options FAST_IPSEC''. Kernels with KAME IPsec or with no IPsec should work as before.
All calls to ip_output() now always pass an additional compulsory argument: the inpcb associated with the packet being sent, or 0 if no inpcb is available.
Fast-ipsec tested with ICMP or UDP over ESP. TCP doesn't work, yet.
|
1.97 |
| 07-Aug-2003 |
agc | Move UCB-licensed code from 4-clause to 3-clause licence.
Patches provided by Joel Baker in PR 22364, verified by myself.
|
1.96 |
| 02-Jul-2003 |
ragge | Make the fast-search stuff an option. There are still reports of problems with it.
|
1.95 |
| 02-Jul-2003 |
ragge | Fix previous bug. Thanks to Enami for spotting the (obvious) error, and to other people with much help with bug reports etc. While fixing, change some of the code I added last time to make it cleaner and simpler.
|
1.94 |
| 30-Jun-2003 |
ragge | branches: 1.94.2; Disable the code I checked in yesterday; reports that samba (!) are crashing machines with it. Will do some more tests.
|
1.93 |
| 29-Jun-2003 |
fvdl | Back out the lwp/ktrace changes. They contained a lot of collateral damage, and need to be examined and discussed more.
|
1.92 |
| 29-Jun-2003 |
ragge | Add code to remember where in the send queue of mbufs the last packet was sent from. This change avoids a linear search through all mbufs when using large TCP windows, and therefore permits high-speed connections over long distances.
Tested on a 1 Gigabit connection between Luleå and San Francisco, a distance of about 15000km. With TCP windows of just over 20 Mbytes it could keep up with 950Mbit/s.
After discussions with Matt Thomas and Jason Thorpe.
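The idea of remembering the last position can be sketched with a plain linked list standing in for the mbuf chain. All types and names here are hypothetical, and the sketch deliberately ignores what real code must also handle, such as adjusting or invalidating the cached position when acknowledged data is dropped from the front of the queue.

```c
#include <assert.h>
#include <stddef.h>

struct buf_sketch {
    struct buf_sketch *next;
    size_t len;
};

struct sendq_sketch {
    struct buf_sketch *head;
    struct buf_sketch *last;	/* buffer found by the previous lookup */
    size_t last_off;		/* byte offset at which 'last' starts */
};

/*
 * Find the buffer holding byte offset 'off'. Starts from the cached
 * position when 'off' is not behind it, otherwise from the head.
 * '*steps' reports how many links were followed.
 */
static struct buf_sketch *
sendq_find(struct sendq_sketch *q, size_t off, size_t *inner, unsigned *steps)
{
    assert(q != NULL && inner != NULL && steps != NULL);
    struct buf_sketch *b = q->head;
    size_t start = 0;

    if (q->last != NULL && off >= q->last_off) {
        b = q->last;
        start = q->last_off;
    }
    *steps = 0;
    while (b != NULL && off >= start + b->len) {
        start += b->len;
        b = b->next;
        (*steps)++;
    }
    if (b != NULL) {
        q->last = b;		/* remember for the next lookup */
        q->last_off = start;
        *inner = off - start;
    }
    return b;
}
```

Because successive transmissions usually continue where the previous one stopped, most lookups follow few or no links instead of walking the whole chain.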
|
1.91 |
| 17-May-2003 |
itojun | no need for ip_v recovery in output path too (tcp_template includes ip_v setting)
|
1.90 |
| 01-Mar-2003 |
thorpej | Allow TCP connections to hosts on a local network to use a larger slow start initial window. Default this larger initial window to 4 packets, allowing it to be adjusted with net.inet.tcp.init_win_local.
|
1.89 |
| 26-Feb-2003 |
matt | Add MBUFTRACE kernel option. Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *) to m_get*(M_WAIT, *). These are not performance critical and making them call m_get saves considerable space. Add m_clget analogue of MCLGET and make corresponding change for M_WAIT uses. Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE. Begin to change netstat to use sysctl.
|
1.88 |
| 24-Nov-2002 |
scw | Fix a genuine uninitialised variable warning.
|
1.87 |
| 02-Nov-2002 |
itojun | cleanup ipsec.h dependency. commented by perry, sync w/kame
|
1.86 |
| 13-Sep-2002 |
mycroft | In the txsegsize bounding code, it is not necessary to adjust for the options length.
|
1.85 |
| 20-Aug-2002 |
thorpej | Never send more than half a socket buffer of data. This ensures that we can always keep 2 packets on the wire, no matter what SO_SNDBUF is, and therefore ACKs will never be delayed unless we run out of data to transmit. The problem is quite easy to tickle when the MTU of the outgoing interface is larger than the socket buffer size (e.g. loopback).
Fix from Charles Hannum.
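A minimal sketch of the bound described in this entry, not the actual tcp_output() code: the helper name and its parameters are hypothetical, with sb_hiwat standing in for the send buffer size configured through SO_SNDBUF.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the NetBSD sources): cap the transmit
 * segment size at half of the send buffer, so that two segments can
 * always be outstanding regardless of the buffer size.
 */
static uint32_t
bound_txsegsize(uint32_t txsegsize, uint32_t sb_hiwat)
{
	uint32_t half = sb_hiwat / 2;

	return (txsegsize > half) ? half : txsegsize;
}
```

With a large-MTU interface such as loopback, a segment size larger than the socket buffer is cut down to half the buffer; an ordinary Ethernet-sized segment passes through unchanged.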
|
1.84 |
| 14-Aug-2002 |
itojun | avoid swapping endian of ip_len and ip_off on mbuf, to meet with M_LEADINGSPACE optimization made last year. should solve PR 17867 and 10195.
IP_HDRINCL behavior of raw ip socket is kept unchanged. we may want to provide IP_HDRINCL variant that does not swap endian.
|
1.83 |
| 13-Jun-2002 |
thorpej | Disable TCP Congestion Window Monitoring by default; there are performance problems in the face of tinygrams.
|
1.82 |
| 09-Jun-2002 |
itojun | whitespace
|
1.81 |
| 29-May-2002 |
itojun | attach nd_ifinfo structure into if_afdata. split IPv6 link MTU (advertised by RA) from real link MTU. sync with kame
|
1.80 |
| 26-May-2002 |
itojun | path MTU discovery blackhole detection. PR 12790 (sorry for not committing it for a long time)
|
1.79 |
| 27-Apr-2002 |
thorpej | branches: 1.79.2; 1.79.4;
* Instrument tcp_build_datapkt().
* Remove the code that allocates a cluster if the packet would fit in one; it totally defeats doing references to M_EXT mbufs in the socket buffer. This drastically reduces the number of data copies in the tcp_output() path for applications which use large writes.
Kudos to Matt Thomas for pointing me in the right direction.
|
1.78 |
| 01-Mar-2002 |
thorpej | In tcp_segsize(), move a label so that option length is considered when using the default TCP MSS as well. From Matt Thomas.
|
1.77 |
| 24-Jan-2002 |
itojun | place NRL copyright notice itself, not a reference to it.
|
1.76 |
| 03-Dec-2001 |
jmcneill | Fix TCP segment size computation. From Rick Byersm, PR kern/14799.
|
1.75 |
| 13-Nov-2001 |
lukem | add RCSIDs
|
1.74 |
| 10-Sep-2001 |
thorpej | Use callouts for TCP timers, rather than traversing the list of all open TCP connections in tcp_slowtimo() (which is called 2x per second). It's fairly rare for TCP timers to actually fire, so saving this list traversal is good, especially if you want to scale to thousands of open connections.
|
1.73 |
| 10-Sep-2001 |
thorpej | Change the way receive idle time and round trip time are measured. Instead of incrementing t_idle and t_rtt in tcp_slowtimo(), we now take a timestamp (via tcp_now) and use subtraction to compute the delta when we actually need it (using unsigned arithmetic so that tcp_now wrapping is handled correctly).
Based on similar changes in FreeBSD.
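The wrap-safe subtraction mentioned in this entry can be sketched as follows. This is an illustration only: the helper name is hypothetical, and a 32-bit tick counter is assumed rather than taken from the source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of ticks elapsed between a saved
 * timestamp `then` and the current counter value `now`. Unsigned
 * subtraction is performed modulo 2^32, so the result stays correct
 * even if the counter wrapped once between the two samples.
 */
static uint32_t
ticks_elapsed(uint32_t now, uint32_t then)
{
	return now - then;
}
```

For example, a timestamp taken two ticks before the counter wraps and read back five ticks after the wrap yields a delta of 7, the same answer a non-wrapping counter would give.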
|
1.72 |
| 10-Sep-2001 |
thorpej | Enable Congestion Window Monitoring by default.
|
1.71 |
| 10-Sep-2001 |
thorpej | Use a callout for the delayed ACK timer, and delete tcp_fasttimo(). Expose the delayed ACK timer as net.inet.tcp.delack_ticks.
|
1.70 |
| 31-Jul-2001 |
thorpej | branches: 1.70.2; Carve off the code that builds a TCP data packet into its own function, and inline it, except when profiling... so we can profile it.
|
1.69 |
| 31-Jul-2001 |
thorpej | Count the number of times we "self-quench" (ip_output() returns ENOBUFS), and don't inline tcp_segsize() if profiling.
|
1.68 |
| 26-Jul-2001 |
thorpej | Slight cosmetic change.
|
1.67 |
| 08-Jul-2001 |
abs | branches: 1.67.2; Rename TCPDEBUG to TCP_DEBUG, defopt TCP_DEBUG and TCP_NDEBUG, and make all usage of tcp_trace dependent on TCP_DEBUG - resulting in a 31K saving on an INET enabled i386 kernel.
|
1.66 |
| 02-Jun-2001 |
thorpej | Implement support for IP/TCP/UDP checksum offloading provided by network interfaces. This works by pre-computing the pseudo-header checksum and caching it, delaying the actual checksum to ip_output() if the hardware cannot perform the sum for us. In-bound checksums can either be fully-checked by hardware, or summed up for final verification by software. This method was modeled after how this is done in FreeBSD, although the code is significantly different in most places.
We don't delay checksums for IPv6/TCP, but we do take advantage of the cached pseudo-header checksum.
Note: hardware-assisted checksumming defaults to "off". It is enabled with ifconfig(8). See the manual page for details.
Implement hardware-assisted checksumming on the DP83820 Gigabit Ethernet, 3c90xB/3c90xC 10/100 Ethernet, and Alteon Tigon/Tigon2 Gigabit Ethernet.
|
1.65 |
| 03-Apr-2001 |
itojun | check ip_mtudisc only for TCP over IPv4. PMTUD is mandatory for TCP over IPv6 (if packets > 1280).
|
1.64 |
| 20-Mar-2001 |
thorpej | Two changes, designed to make us even more resilient against TCP ISS attacks (which we already fend off quite well).
1. First-cut implementation of RFC1948, Steve Bellovin's cryptographic hash method of generating TCP ISS values. Note, this code is experimental and disabled by default (experimental enough that I don't export the variable via sysctl yet, either). There are a couple of issues I'd like to discuss with Steve, so this code should only be used by people who really know what they're doing.
2. Per a recent thread on Bugtraq, it's possible to determine a system's uptime by snooping the RFC1323 TCP timestamp options sent by a host; in 4.4BSD, timestamps are created by incrementing the tcp_now variable at 2 Hz; there's even a company out there that uses this to determine web server uptime. According to Newsham's paper "The Problem With Random Increments", while NetBSD's TCP ISS generation method is much better than the "random increment" method used by FreeBSD and OpenBSD, it is still theoretically possible to mount an attack against NetBSD's method if the attacker knows how many times the tcp_iss_seq variable has been incremented. By not leaking uptime information, we can make that much harder to determine. So, we avoid the leak by giving each TCP connection a timebase of 0.
|
1.63 |
| 24-Jan-2001 |
itojun | branches: 1.63.2;
- record IPsec packet history into m_aux structure.
- let ipfilter look at wire-format packet only (not the decapsulated ones), so that VPN setting can work with NAT/ipfilter settings.
sync with kame.
TODO: use header history for stricter inbound validation
|
1.62 |
| 06-Nov-2000 |
itojun | fix IPv4 TTL selection with AF_INET6 API. sync with kame. From: jdc
|
1.61 |
| 19-Oct-2000 |
itojun | remove #ifdef TCP6. it is not likely for us to bring in sys/netinet6/tcp6*.c (separate TCP/IPv6 stack) into netbsd-current.
|
1.60 |
| 17-Oct-2000 |
itojun | be more friendly with INET-less build. XXX we need to do more to do a working INET-less build
|
1.59 |
| 17-Oct-2000 |
thorpej | Add an IP_MTUDISC flag to the flags that can be passed to ip_output(). This flag, if set, causes ip_output() to set DF in the IP header if the MTU in the route is not locked.
This allows a bunch of redundant code, which I was never really all that happy about adding in the first place, to be eliminated.
Inspired by a similar change made by provos@openbsd.org when he integrated NetBSD's Path MTU Discovery code into OpenBSD.
|
1.58 |
| 28-Jul-2000 |
itojun | forgot to call tcp6_quench(). sync with kame.
|
1.57 |
| 30-Jun-2000 |
itojun | remove old mbuf assumption (ip header and tcp header are on the same mbuf). this is for m_pulldown use. (sync with kame)
|
1.56 |
| 30-Mar-2000 |
augustss | branches: 1.56.4; Remove register declarations.
|
1.55 |
| 01-Mar-2000 |
itojun | introduce m->m_pkthdr.aux to hold random data which needs to be passed between protocol handlers.
ipsec socket pointers, ipsec decryption/auth information, tunnel decapsulation information are in my mind - there can be several other usage. at this moment, we use this for ipsec socket pointer passing. this will avoid reuse of m->m_pkthdr.rcvif in ipsec code.
due to the change, MHLEN will be decreased by sizeof(void *) - for example, for i386, MHLEN was 100 bytes, but is now 96 bytes. we may want to increase MSIZE from 128 to 256 for some of our architectures.
take caution if you use it for keeping some data item for long period of time - use extra caution on M_PREPEND() or m_adj(), as they may result in loss of m->m_pkthdr.aux pointer (and mbuf leak).
this will bump kernel version.
(as discussed in tech-net, tested in kame tree)
|
1.54 |
| 09-Feb-2000 |
itojun | optimize mbuf allocation for ip/tcp/tcpopt part.
|
1.53 |
| 13-Dec-1999 |
itojun | sync IPv6 part with latest KAME tree. IPsec part is left unmodified due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well
- you can run IPv4-to-IPv6 translator using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.
TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach
(sorry for jumbo commit, I can't separate this any more...)
|
1.52 |
| 23-Sep-1999 |
itojun | branches: 1.52.2; 1.52.8; cleanup and correct TCP MSS consideration with IPsec headers.
MSS advertisement must always be: max(if mtu) - ip hdr siz - tcp hdr siz. We violated this in the previous code so it was fixed.
tcp_mss_to_advertise() now takes af (af on wire) as its argument, to compute right ip hdr siz.
tcp_segsize() will take care of IPsec header size. One thing I'm not really sure about is how to handle IPsec header size in *rxsegsizep (inbound segment size estimation). The current code subtracts possible *outbound* IPsec size from *rxsegsizep, hoping that the peer is using the same IPsec policy as me. It may not be applicable, could a TCP guru please comment...
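The MSS advertisement rule in this entry can be illustrated with a small sketch. It is not the real tcp_mss_to_advertise(): the function name, the boolean address-family parameter, and the assumption of option-free headers (20-byte IPv4, 40-byte IPv6, 20-byte TCP) are all choices made for the example.

```c
#include <stdint.h>

#define SKETCH_IPV4_HDR_LEN	20u	/* IPv4 header without options */
#define SKETCH_IPV6_HDR_LEN	40u	/* fixed IPv6 header */
#define SKETCH_TCP_HDR_LEN	20u	/* TCP header without options */

/*
 * Hypothetical helper: MSS to advertise for a given interface MTU,
 * computed as MTU minus IP header size minus TCP header size.
 * The address family on the wire selects the IP header size.
 */
static uint32_t
sketch_mss_to_advertise(uint32_t if_mtu, int is_ipv6)
{
	uint32_t hdrs = (is_ipv6 ? SKETCH_IPV6_HDR_LEN : SKETCH_IPV4_HDR_LEN)
	    + SKETCH_TCP_HDR_LEN;

	/* Guard against unsigned underflow on implausibly small MTUs. */
	return (if_mtu > hdrs) ? if_mtu - hdrs : 0;
}
```

For a 1500-byte Ethernet MTU this gives 1460 over IPv4 and 1440 over IPv6. IPsec overhead is deliberately left out, matching the entry's statement that tcp_segsize() handles it separately.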
|
1.51 |
| 09-Jul-1999 |
thorpej | defopt IPSEC and IPSEC_ESP (both into opt_ipsec.h).
|
1.50 |
| 02-Jul-1999 |
fvdl | Fix for -Wuninitialized warnings broke compiles without INET6, refix.
|
1.49 |
| 02-Jul-1999 |
itojun | avoid "variable not initialized" warnings on some of the platforms.
|
1.48 |
| 01-Jul-1999 |
itojun | IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628. (Sorry for a big commit, I can't separate this into several pieces...) Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.
- sys/kern: do not assume single mbuf, accept chained mbuf on passing data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen
In my understanding no code here is subject to export control so it should be safe.
|
1.47 |
| 20-Jan-1999 |
thorpej | branches: 1.47.4; 1.47.6; Fix a problem pointed out by Charles Hannum; DF wasn't being set in SYN,ACK packets during Path MTU Discovery. Fix tcp_respond() to do the appropriate route lookup and set DF as appropriate.
Also, fixup similar code in tcp_output() to relookup the route if it is down.
|
1.46 |
| 16-Dec-1998 |
thorpej | Delay sending if SS_MORETOCOME is set in so_state. This avoids the case where the user issued a write with a length greater than MLEN but less than MINCLSIZE, thus causing two mbufs to be used. The loop in sosend() would then call PRU_SEND twice, causing TCP to transmit 2 packets when it could have transmitted one.
Suggested by Justin Walker <justin@apple.com> on the freebsd-net mailing list.
|
1.45 |
| 06-Oct-1998 |
matt | Add a sysctl for newreno (default to off).
|
1.44 |
| 04-Oct-1998 |
matt | Adapt the NEWRENO changes from the UCSB diffs of BSDI 3.0's TCP to NetBSD. Ignore the SACK & FACK stuff for now.
|
1.43 |
| 21-Jul-1998 |
mycroft | Implement a better fix for the `gratuitous FIN' problem, as mentioned on tcp-impl but with a bit more commentary.
|
1.42 |
| 17-Jul-1998 |
thorpej | Add a comment wrt. a current issue w/ CWM.
|
1.41 |
| 17-Jul-1998 |
thorpej | Comment where the Restart Window is computed, and in the non-CWM case, make sure it never _increases_ cwnd.
|
1.40 |
| 07-Jul-1998 |
sommerfe | Delete bogus (void) cast of m_freem (which is already a void function..)
|
1.39 |
| 11-May-1998 |
thorpej | Nuke TUBA per my note to tech-net; there's no reason to keep it around.
|
1.38 |
| 06-May-1998 |
thorpej | Use macros from tcp_timer.h to manipulate TCP timers, so that their implementation can be changed easily.
|
1.37 |
| 02-May-1998 |
thorpej | Correct a comment related to Congestion Window Monitoring.
|
1.36 |
| 30-Apr-1998 |
thorpej | In the CWM code, don't use the Floyd initial window computation as the burst size allowed, but rather a fixed number of packets, as described in the Internet Draft. Default allowed burst is 4 packets, per the Draft.
Make the use of CWM and the allowed burst size tunable via sysctl.
|
1.35 |
| 29-Apr-1998 |
kml | Add support for deletion of routes added by path MTU discovery; uses new generic route timeout code. Add sysctl for timeout period.
|
1.34 |
| 13-Apr-1998 |
kml | Fix to ensure that the correct MSS is advertised for loopback TCP connections by using the MTU of the interface. Also added a knob, mss_ifmtu, to force all connections to use the MTU of the interface to calculate the advertised MSS.
|
1.33 |
| 01-Apr-1998 |
thorpej | Implement Congestion Window Monitoring as described in the TCPIMPL meeting of IETF #41 by Amy Hughes <ahughes@isi.edu>, and in an upcoming internet draft from Hughes, Touch, and Heidemann.
CWM eliminates line-rate bursts after idle periods by counting pending (unacknowledged) packets and limiting the congestion window to the initial congestion window plus the pending packet count. This has the effect of allowing us to use the window as long as we continue to transmit, but as soon as we stop transmitting, we go back to a slow-start (also known as `use it or lose it').
This is not enabled by default. You can enable this behavior by patching the "tcp_cwm" global (set it to non-zero) or by building a kernel with the TCP_CWM option.
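A rough sketch of the Congestion Window Monitoring limit described above, under stated assumptions: the helper name is hypothetical, and all quantities are expressed in bytes for simplicity, whereas the entry speaks of a pending packet count.

```c
#include <stdint.h>

/*
 * Hypothetical helper: clamp the congestion window to the initial
 * window plus the amount of data still unacknowledged. While the
 * sender keeps transmitting, `pending` stays large and the window is
 * usable; once the connection goes idle, `pending` drops to zero and
 * the window falls back to the initial window.
 */
static uint32_t
cwm_limit_cwnd(uint32_t cwnd, uint32_t init_cwnd, uint32_t pending)
{
	uint32_t limit = init_cwnd + pending;

	return (cwnd > limit) ? limit : cwnd;
}
```

An idle connection with a 65535-byte window and a 5840-byte initial window is cut back to 5840 bytes, which is the "use it or lose it" behavior the entry refers to.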
|
1.32 |
| 31-Mar-1998 |
thorpej | Fix a potential-congestion case in the larger initial congestion window code, as clarified in the TCPIMPL WG meeting at IETF #41: If the SYN (active open) or SYN,ACK (passive open) was retransmitted, the initial congestion window for the first slow start of that connection must be one segment.
|
1.31 |
| 24-Mar-1998 |
kml | Ensure that we take the IP option length into account when we calculate the effective maximum send size for TCP. ip_optlen() and tcp_optlen() should probably be inlined for efficiency.
|
1.30 |
| 19-Mar-1998 |
kml | Fix a retransmission bug introduced by the Brakmo and Peterson RTO estimation changes. Under some circumstances it would return a value of 0, while the old Van Jacobson RTO code would return a minimum of 3. This would result in 12 retransmissions, each 1 second apart. This takes care of those instances, and ensures that t_rttmin is used everywhere as a lower bound.
|
1.29 |
| 17-Mar-1998 |
kml | Ensure that the TCP segment size reflects the size of TCP options in the packet. This fixes a bug that was resulting in extra packets in retransmissions (the second packet would be 12 bytes long, reflecting the RFC1323 timestamp option size).
|
1.28 |
| 19-Feb-1998 |
thorpej | Update copyright (sigh, should have done this long ago).
|
1.27 |
| 05-Jan-1998 |
thorpej | Finishing merging 4.4BSD-Lite2 netinet. At this point, the only changes left were SCCS IDs and Copyright dates.
|
1.26 |
| 31-Dec-1997 |
thorpej | Implement a queue for delayed ACK processing. This queue is used in tcp_fasttimo() in lieu of scanning all open TCP connections.
|
1.25 |
| 17-Dec-1997 |
thorpej | From 4.4BSD-Lite2: - If we fail to allocate mbufs for the outgoing segment, free the header and abort.
From Stevens: - Ensure the persist timer is running if the send window reaches zero. Part of the fix for kern/2335 (pete@daemon.net).
|
1.24 |
| 11-Dec-1997 |
thorpej | Implement an infrastructure to allow larger initial congestion windows. The sysctl'able variable "tcp_init_win", when set to 0, selects an auto-tuning algorithm for selecting the initial window, based on transmit segment size, per discussion in the IETF tcpimpl working group.
Default initial window is still 1 segment, but will soon become 2 segments, per discussion in tcpimpl.
|
1.23 |
| 11-Dec-1997 |
thorpej | Count delayed ACKs after they have been successfully transmitted.
|
1.22 |
| 20-Nov-1997 |
thorpej | Add missing (implied) int to a variable declaration.
|
1.21 |
| 08-Nov-1997 |
kml | TCP MSS fixes to provide cleaner slow-start and recovery.
|
1.20 |
| 18-Oct-1997 |
kml | branches: 1.20.2; change sysctl net.inet.icmp.mtudisc to net.inet.ip.mtudisc
|
1.19 |
| 17-Oct-1997 |
kml | Path MTU Discovery support. This is turned off by default. Use sysctl -w net.inet.icmp.mtudisc=1 to turn on. Still to come: path removal after some period, black hole detection
|
1.18 |
| 08-Oct-1997 |
thorpej | Fix an oversight in my previous MSS-related changes:
Basically, in silly window avoidance, don't use the raw MSS we advertised to the peer. What we really want here is the _expected_ size of received segments, so we need to account for the path MTU (eventually; right now, the interface MTU for "local" addresses and loopback or tcp_mssdflt for non-local addresses). Without this, silly window avoidance would never kick in if we advertised a very large (e.g. ~64k) MSS to the peer.
|
1.17 |
| 22-Sep-1997 |
thorpej | Fix several annoyances related to MSS handling in BSD TCP:
- Don't overload t_maxseg. Previous behavior was to set it to the min of the peer's advertised MSS, our advertised MSS, and tcp_mssdflt (for non-local networks). This breaks PMTU discovery running on either host. Instead, remember the MSS we advertise, and use it as appropriate (in silly window avoidance).
- Per last bullet, split tcp_mss() into several functions for handling MSS (ours and peer's), and performing various tasks when a connection becomes ESTABLISHED.
- Introduce a new function, tcp_segsize(), which computes the max size for every segment transmitted in tcp_output(). This will eventually be used to hook in PMTU discovery.
|
1.16 |
| 03-Jun-1997 |
kml | branches: 1.16.4; Fix urgent pointer overflow problems when used with large windows
|
1.15 |
| 10-Dec-1996 |
mycroft | Fix RTT scaling problems introduced with Brakmo and Peterson changes.
|
1.14 |
| 13-Feb-1996 |
christos | branches: 1.14.4; netinet prototypes
|
1.13 |
| 13-Apr-1995 |
cgd | oops; missed the chance to fix a cast, that then became a compiler warning.
|
1.12 |
| 13-Apr-1995 |
cgd | be a bit more careful and explicit with types. (basically a large no-op.)
|
1.11 |
| 23-Jan-1995 |
mycroft | Fix a condition where we sometimes sent a FIN too early. Also, a small optimization.
|
1.10 |
| 29-Jun-1994 |
cgd | New RCS ID's, take two. they're more aesthetically pleasant, and use 'NetBSD'
|
1.9 |
| 13-May-1994 |
mycroft | Update to 4.4-Lite networking code, with a few local changes.
|
1.8 |
| 12-Apr-1994 |
mycroft | Acks with no data should have the highest sequence number sent.
|
1.7 |
| 10-Jan-1994 |
mycroft | Should compile now with or without `options MULTICAST'.
|
1.6 |
| 08-Jan-1994 |
mycroft | Prototypes.
|
1.5 |
| 08-Jan-1994 |
mycroft | Fix some inconsistent spacing; spaces at the end of lines, etc.
|
1.4 |
| 18-Dec-1993 |
mycroft | Canonicalize all #includes.
|
1.3 |
| 22-May-1993 |
cgd | add include of select.h if necessary for protos, or delete if extraneous
|
1.2 |
| 18-May-1993 |
cgd | make kernel select interface be one-stop shopping & clean it all up.
|
1.1 |
| 21-Mar-1993 |
cgd | branches: 1.1.1; Initial revision
|
1.1.1.3 |
| 05-Jan-1998 |
thorpej | Import sys/netinet from 4.4BSD-Lite2 for reference purposes.
|
1.1.1.2 |
| 05-Jan-1998 |
thorpej | Import sys/netinet from 4.4BSD-Lite for reference purposes.
|
1.1.1.1 |
| 21-Mar-1993 |
cgd | initial import of 386bsd-0.1 sources
|
1.14.4.1 |
| 10-Dec-1996 |
mycroft | From trunk: Fix RTT scaling problems introduced with Brakmo and Peterson changes.
|
1.16.4.2 |
| 14-Oct-1997 |
thorpej | Update marc-pcmcia branch from trunk.
|
1.16.4.1 |
| 29-Sep-1997 |
thorpej | Update marc-pcmcia branch from trunk.
|
1.20.2.6 |
| 09-May-1998 |
mycroft | Pull up patch from kml.
|
1.20.2.5 |
| 05-May-1998 |
mycroft | Pull up 1.29, per request of kml.
|
1.20.2.4 |
| 05-May-1998 |
mycroft | Pull up 1.30, per request of kml.
|
1.20.2.3 |
| 29-Jan-1998 |
mellon | Pull up 1.24-1.27 (thorpej)
|
1.20.2.2 |
| 21-Nov-1997 |
thorpej | Sync w/ trunk: add a missing (previously implied) int.
|
1.20.2.1 |
| 08-Nov-1997 |
thorpej | Pull up from trunk: TCP MSS fixes to provide cleaner slow-start and recovery. (kml)
|
1.47.6.3 |
| 30-Nov-1999 |
itojun | bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch just for reference purposes. This commit includes 1.4 -> 1.4.1 sync for kame branch.
The branch does not compile at all (due to the lack of ALTQ and some other source code). Please do not try to modify the branch, this is just for reference purposes.
synchronization to latest KAME will take place on HEAD branch soon.
|
1.47.6.2 |
| 06-Jul-1999 |
itojun | KAME/NetBSD 1.4, SNAP kit 1999/07/05. NOTE: this branch is just for reference purposes (i.e. for taking cvs diff). do not touch anything on the branch. actual work must be done on HEAD branch.
|
1.47.6.1 |
| 28-Jun-1999 |
itojun | KAME/NetBSD 1.4 SNAP kit, dated 19990628.
NOTE: this branch (kame) is used just for reference. this may not compile due to multiple reasons.
|
1.47.4.2 |
| 02-Aug-1999 |
thorpej | Update from trunk.
|
1.47.4.1 |
| 01-Jul-1999 |
thorpej | Sync w/ -current.
|
1.52.8.1 |
| 27-Dec-1999 |
wrstuden | Pull up to last week's -current.
|
1.52.2.5 |
| 21-Apr-2001 |
bouyer | Sync with HEAD
|
1.52.2.4 |
| 27-Mar-2001 |
bouyer | Sync with HEAD.
|
1.52.2.3 |
| 11-Feb-2001 |
bouyer | Sync with HEAD.
|
1.52.2.2 |
| 22-Nov-2000 |
bouyer | Sync with HEAD.
|
1.52.2.1 |
| 20-Nov-2000 |
bouyer | Update thorpej_scsipi to -current as of a month ago
|
1.56.4.5 |
| 24-Jan-2002 |
he | Pull up revision 1.77 (requested by itojun): Clean up the NRL copyright.
|
1.56.4.4 |
| 06-Apr-2001 |
he | Pull up revision 1.63 (requested by itojun): Record IPsec packet history in m_aux structure. Let ipfilter look at wire-format packet only (not the decapsulated ones), so that VPN setting can work with NAT/ipfilter settings.
|
1.56.4.3 |
| 10-Nov-2000 |
tv | Pullup 1.62 [itojun]: fix IPv4 TTL selection with AF_INET6 API. sync with kame. From: jdc
|
1.56.4.2 |
| 15-Aug-2000 |
itojun | pullup 1.57 -> 1.58 (approved by releng-1-5)
> forgot to call tcp6_quench(). sync with kame.
|
1.56.4.1 |
| 23-Jul-2000 |
itojun | pullup from main trunk (approved by releng-1-5)
remove old mbuf assumption (ip header and tcp header are on the same mbuf). this is for m_pulldown use. (sync with kame)
1.108 -> 1.109 syssrc/sys/netinet/tcp_input.c
1.56 -> 1.57 syssrc/sys/netinet/tcp_output.c
1.91 -> 1.92 syssrc/sys/netinet/tcp_subr.c
|
1.63.2.14 |
| 11-Dec-2002 |
thorpej | Sync with HEAD.
|
1.63.2.13 |
| 11-Nov-2002 |
nathanw | Catch up to -current
|
1.63.2.12 |
| 17-Sep-2002 |
nathanw | Catch up to -current.
|
1.63.2.11 |
| 27-Aug-2002 |
nathanw | Catch up to -current.
|
1.63.2.10 |
| 20-Jun-2002 |
nathanw | Catch up to -current.
|
1.63.2.9 |
| 04-May-2002 |
thorpej | Update from trunk.
|
1.63.2.8 |
| 01-Apr-2002 |
nathanw | Catch up to -current. (CVS: It's not just a program. It's an adventure!)
|
1.63.2.7 |
| 28-Feb-2002 |
nathanw | Catch up to -current.
|
1.63.2.6 |
| 08-Jan-2002 |
nathanw | Catch up to -current.
|
1.63.2.5 |
| 14-Nov-2001 |
nathanw | Catch up to -current.
|
1.63.2.4 |
| 21-Sep-2001 |
nathanw | Catch up to -current.
|
1.63.2.3 |
| 24-Aug-2001 |
nathanw | Catch up with -current.
|
1.63.2.2 |
| 21-Jun-2001 |
nathanw | Catch up to -current.
|
1.63.2.1 |
| 09-Apr-2001 |
nathanw | Catch up with -current.
|
1.67.2.8 |
| 10-Oct-2002 |
jdolecek | sync kqueue with -current; this includes merge of gehenna-devsw branch, merge of i386 MP branch, and part of autoconf rototil work
|
1.67.2.7 |
| 06-Sep-2002 |
jdolecek | sync kqueue branch with HEAD
|
1.67.2.6 |
| 23-Jun-2002 |
jdolecek | catch up with -current on kqueue branch
|
1.67.2.5 |
| 16-Mar-2002 |
jdolecek | Catch up with -current.
|
1.67.2.4 |
| 11-Feb-2002 |
jdolecek | Sync w/ -current.
|
1.67.2.3 |
| 10-Jan-2002 |
thorpej | Sync kqueue branch with -current.
|
1.67.2.2 |
| 13-Sep-2001 |
thorpej | Update the kqueue branch to HEAD.
|
1.67.2.1 |
| 03-Aug-2001 |
lukem | update to -current
|
1.70.2.1 |
| 01-Oct-2001 |
fvdl | Catch up with -current.
|
1.79.4.5 |
| 07-Feb-2004 |
jmc | Pullup rev 1.107 (requested by itojun in ticket #1605)
Deal with IPv6 path MTU < 1280 (RFC2460 section 5 last paragraph) Check if there really is room for TCP data.
|
1.79.4.4 |
| 05-Sep-2003 |
tron | Pull up revision 1.80 (requested by tls in ticket #1445): path MTU discovery blackhole detection. PR 12790 (sorry for not committing it for a long time)
|
1.79.4.3 |
| 30-Nov-2002 |
he | Pull up revision 1.86 (requested by thorpej in ticket #795): In the txsegsize bounding code, it is not necessary to adjust for the options length.
|
1.79.4.2 |
| 21-Nov-2002 |
he | Pull up revision 1.85 (requested by thorpej in ticket #707): Never send more than half a socket buffer of data in a segment. This ensures that we can always keep 2 packets on the wire, and we will therefore not cause any delayed ACKs. Otherwise, this causes performance problems when using large-MTU interfaces, such as the loopback interface.
|
1.79.4.1 |
| 14-Jun-2002 |
lukem | Pull up revision 1.83 (requested by thorpej in ticket #267): Disable TCP Congestion Window Monitoring by default; there are performance problems in the face of tinygrams.
|
1.79.2.3 |
| 29-Aug-2002 |
gehenna | catch up with -current.
|
1.79.2.2 |
| 20-Jun-2002 |
gehenna | catch up with -current.
|
1.79.2.1 |
| 30-May-2002 |
gehenna | Catch up with -current.
|
1.94.2.9 |
| 10-Nov-2005 |
skrll | Sync with HEAD. Here we go again...
|
1.94.2.8 |
| 01-Apr-2005 |
skrll | Sync with HEAD.
|
1.94.2.7 |
| 08-Mar-2005 |
skrll | Sync with HEAD.
|
1.94.2.6 |
| 04-Mar-2005 |
skrll | Sync with HEAD.
Hi Perry!
|
1.94.2.5 |
| 04-Feb-2005 |
skrll | Sync with HEAD.
|
1.94.2.4 |
| 18-Dec-2004 |
skrll | Sync with HEAD.
|
1.94.2.3 |
| 21-Sep-2004 |
skrll | Fix the sync with head I botched.
|
1.94.2.2 |
| 18-Sep-2004 |
skrll | Sync with HEAD.
|
1.94.2.1 |
| 03-Aug-2004 |
skrll | Sync with HEAD
|
1.108.2.1 |
| 11-May-2004 |
tron | Pull up revision 1.112 (requested by chs in ticket #292): work around an LP64 problem where we report an excessively large window due to incorrect mixing of types.
|
1.115.4.2 |
| 19-Mar-2005 |
yamt | sync with head. xen and whitespace. xen part is not finished.
|
1.115.4.1 |
| 12-Feb-2005 |
yamt | sync with head.
|
1.115.2.1 |
| 29-Apr-2005 |
kent | sync with -current
|
1.128.2.5 |
| 11-May-2005 |
tron | Pull up revision 1.134 (requested by yamt in ticket #294): tcp_output: account FIN when building sack option.
|
1.128.2.4 |
| 11-May-2005 |
tron | Pull up revision 1.133 (requested by yamt in ticket #293): tcp_output: don't try to send more data than we have. PR/30160.
|
1.128.2.3 |
| 11-May-2005 |
tron | Pull up revision 1.132 (requested by yamt in ticket #293): tcp_output: clear TH_FIN where appropriate. related to PR/30160.
|
1.128.2.2 |
| 06-May-2005 |
tron | Pull up revision 1.130 (requested by yamt in ticket #251): fix problems related to loopback interface checksum omission. PR/29971.
- for ipv4, defer decision to ip layer as h/w checksum offloading does so that it can check the actual interface the packet is going to.
- for ipv6, disable it. (maybe will be revisited when it implements h/w checksum offloading.)
ok'ed by Jason Thorpe.
|
1.128.2.1 |
| 04-Apr-2005 |
tron | Pull up revision 1.129 (requested by yamt in ticket #89): tcp_output: lock reass queue when building sack.
|
1.136.2.5 |
| 21-Jan-2008 |
yamt | sync with head
|
1.136.2.4 |
| 03-Sep-2007 |
yamt | sync with head.
|
1.136.2.3 |
| 26-Feb-2007 |
yamt | sync with head.
|
1.136.2.2 |
| 30-Dec-2006 |
yamt | sync with head.
|
1.136.2.1 |
| 21-Jun-2006 |
yamt | sync with head.
|
1.141.12.1 |
| 28-Mar-2006 |
tron | Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
|
1.141.10.1 |
| 19-Apr-2006 |
elad | sync with head.
|
1.141.8.2 |
| 14-Sep-2006 |
yamt | sync with head.
|
1.141.8.1 |
| 01-Apr-2006 |
yamt | sync with head.
|
1.141.6.1 |
| 22-Apr-2006 |
simonb | Sync with head.
|
1.141.4.2 |
| 09-Sep-2006 |
rpaulo | sync with head
|
1.141.4.1 |
| 05-Feb-2006 |
rpaulo | <netinet6/in6_pcb.h> went away. Bye!
|
1.143.4.2 |
| 10-Dec-2006 |
yamt | sync with head.
|
1.143.4.1 |
| 22-Oct-2006 |
yamt | sync with head
|
1.143.2.2 |
| 12-Jan-2007 |
ad | Sync with head.
|
1.143.2.1 |
| 18-Nov-2006 |
ad | Sync with head.
|
1.153.4.1 |
| 04-Jun-2007 |
wrstuden | Update to today's netbsd-4.
|
1.153.2.2 |
| 03-Apr-2011 |
riz | Pull up following revision(s) (requested by spz in ticket #1424): sys/netinet/tcp_output.c: revision 1.170 Clean up setting ECN bit in TOS. Fixes PR 44742
|
1.153.2.1 |
| 24-May-2007 |
pavel | branches: 1.153.2.1.4; Pull up following revision(s) (requested by degroote in ticket #667):
sys/netinet/tcp_input.c: revision 1.260
sys/netinet/tcp_output.c: revision 1.154
sys/netinet/tcp_subr.c: revision 1.210
sys/netinet6/icmp6.c: revision 1.129
sys/netinet6/in6_proto.c: revision 1.70
sys/netinet6/ip6_forward.c: revision 1.54
sys/netinet6/ip6_input.c: revision 1.94
sys/netinet6/ip6_output.c: revision 1.114
sys/netinet6/raw_ip6.c: revision 1.81
sys/netipsec/ipcomp_var.h: revision 1.4
sys/netipsec/ipsec.c: revision 1.26 via patch,1.31-1.32
sys/netipsec/ipsec6.h: revision 1.5
sys/netipsec/ipsec_input.c: revision 1.14
sys/netipsec/ipsec_netbsd.c: revision 1.18,1.26
sys/netipsec/ipsec_output.c: revision 1.21 via patch
sys/netipsec/key.c: revision 1.33,1.44
sys/netipsec/xform_ipcomp.c: revision 1.9
sys/netipsec/xform_ipip.c: revision 1.15
sys/opencrypto/deflate.c: revision 1.8
Commit my SoC work
Add ipv6 support for fast_ipsec
Note that currently, packets with extension headers are not correctly supported
Change the ipcomp logic
Add sysctl tree to modify the fast_ipsec options related to ipv6. Similar to the sysctl kame interface.
Choose the right default policy, depending on the address family of the desired policy
Increase the refcount for the default ipv6 policy so nobody can reclaim it
Always compute the sp index even if we don't have any sp in spd. It will let us choose the right default policy (based on the address family requested). While here, fix an error message
Use a dynamic array instead of a static array to decompress. It lets us decompress any data, whatever the ratio of decompressed data to compressed data is. It fixes the last issues with fast_ipsec and ipcomp. While here, bzero -> memset, bcopy -> memcpy, FREE -> free
Reviewed a long time ago by sam@
|
1.153.2.1.4.1 |
| 03-Apr-2011 |
riz | Pull up following revision(s) (requested by spz in ticket #1424): sys/netinet/tcp_output.c: revision 1.170 Clean up setting ECN bit in TOS. Fixes PR 44742
|
1.154.2.3 |
| 07-May-2007 |
yamt | sync with head.
|
1.154.2.2 |
| 12-Mar-2007 |
rmind | Sync with HEAD.
|
1.154.2.1 |
| 27-Feb-2007 |
yamt | - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
|
1.157.4.1 |
| 11-Jul-2007 |
mjf | Sync with head.
|
1.157.2.3 |
| 09-Oct-2007 |
ad | Sync with head.
|
1.157.2.2 |
| 20-Aug-2007 |
ad | Sync with HEAD.
|
1.157.2.1 |
| 08-Jun-2007 |
ad | Sync with head.
|
1.159.2.2 |
| 03-Sep-2007 |
skrll | Sync with HEAD.
|
1.159.2.1 |
| 15-Aug-2007 |
skrll | Sync with HEAD.
|
1.161.6.2 |
| 02-Aug-2007 |
yamt | make rfbuf_ts a tcp timestamp so that calculations in tcp_input make sense.
|
1.161.6.1 |
| 02-Aug-2007 |
yamt | file tcp_output.c was added on branch matt-mips64 on 2007-08-02 13:12:36 +0000
|
1.161.4.3 |
| 23-Mar-2008 |
matt | sync with HEAD
|
1.161.4.2 |
| 09-Jan-2008 |
matt | sync with HEAD
|
1.161.4.1 |
| 06-Nov-2007 |
matt | sync with HEAD
|
1.161.2.1 |
| 03-Sep-2007 |
jmcneill | Sync with HEAD.
|
1.162.12.2 |
| 19-Jan-2008 |
bouyer | Sync with HEAD
|
1.162.12.1 |
| 02-Jan-2008 |
bouyer | Sync with HEAD
|
1.162.8.1 |
| 26-Dec-2007 |
ad | Sync with head.
|
1.162.6.1 |
| 18-Feb-2008 |
mjf | Sync with HEAD.
|
1.164.6.1 |
| 02-Jun-2008 |
mjf | Sync with HEAD.
|
1.166.4.3 |
| 11-Mar-2010 |
yamt | sync with head
|
1.166.4.2 |
| 04-May-2009 |
yamt | sync with head.
|
1.166.4.1 |
| 16-May-2008 |
yamt | sync with head.
|
1.166.2.1 |
| 18-May-2008 |
yamt | sync with head.
|
1.167.20.2 |
| 24-Jul-2015 |
martin | Pull up following revision(s) (requested by matt in ticket #1973): sys/netinet/tcp_output.c: revision 1.184 sys/netinet/tcp_input.c: revision 1.343
If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running. Make sure that snd_win doesn't go negative.
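One possible reading of the two rules in this entry, as a minimal sketch. struct conn and both helpers are hypothetical and heavily simplified; they are not the real struct tcpcb or the code changed in revision 1.184, and the timer flags are plain booleans rather than kernel timers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified connection state (not struct tcpcb). */
struct conn {
	int32_t snd_wnd;	/* remaining send window, in bytes */
	uint32_t unacked;	/* bytes in the socket not yet acknowledged */
	bool rexmt_armed;	/* retransmit timer running? */
	bool persist_armed;	/* persist timer running? */
};

/* Shrink the send window by n bytes, clamping at zero. */
static void
window_consume(struct conn *c, int32_t n)
{
	assert(n >= 0);
	c->snd_wnd = (n >= c->snd_wnd) ? 0 : c->snd_wnd - n;
}

/*
 * Called after a window probe is sent: if data is still unacked and
 * no timer would ever fire again, arm the persist timer.
 */
static void
window_probe_sent(struct conn *c)
{
	if (c->unacked > 0 && !c->rexmt_armed && !c->persist_armed)
		c->persist_armed = true;
}
```

The clamp keeps later window arithmetic from seeing a negative value, and the timer check guarantees that a connection with outstanding data always has something scheduled to wake it up.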
|
1.167.20.1 |
| 29-Mar-2011 |
riz | Pull up following revision(s) (requested by spz in ticket #1586): sys/netinet/tcp_output.c: revision 1.170 Clean up setting ECN bit in TOS. Fixes PR 44742
|
1.167.16.1 |
| 29-Mar-2011 |
riz | Pull up following revision(s) (requested by spz in ticket #1586): sys/netinet/tcp_output.c: revision 1.170 Clean up setting ECN bit in TOS. Fixes PR 44742
|
1.167.14.1 |
| 13-May-2009 |
jym | Sync with HEAD.
Commit is split, to avoid a "too many arguments" protocol error.
|
1.167.10.2 |
| 24-Jul-2015 |
martin | Pull up following revision(s) (requested by matt in ticket #1973): sys/netinet/tcp_output.c: revision 1.184 sys/netinet/tcp_input.c: revision 1.343
If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running. Make sure that snd_win doesn't go negative.
|
1.167.10.1 |
| 29-Mar-2011 |
riz | branches: 1.167.10.1.2; Pull up following revision(s) (requested by spz in ticket #1586): sys/netinet/tcp_output.c: revision 1.170 Clean up setting ECN bit in TOS. Fixes PR 44742
|
1.167.10.1.2.1 |
| 24-Jul-2015 |
martin | Pull up following revision(s) (requested by matt in ticket #1973): sys/netinet/tcp_output.c: revision 1.184 sys/netinet/tcp_input.c: revision 1.343
If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running. Make sure that snd_win doesn't go negative.
|
1.167.8.1 |
| 28-Apr-2009 |
skrll | Sync with HEAD.
|
1.169.6.1 |
| 06-Jun-2011 |
jruoho | Sync with HEAD.
|
1.169.4.1 |
| 21-Apr-2011 |
rmind | sync with head
|
1.171.8.2 |
| 05-Apr-2012 |
mrg | sync to latest -current.
|
1.171.8.1 |
| 18-Feb-2012 |
mrg | merge to -current.
|
1.171.4.2 |
| 22-May-2014 |
yamt | sync with head.
for a reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.171.4.1 |
| 17-Apr-2012 |
yamt | sync with head
|
1.173.8.2 |
| 24-Jul-2015 |
martin | Pull up following revision(s) (requested by matt in ticket #1315): sys/netinet/tcp_output.c: revision 1.184 sys/netinet/tcp_input.c: revision 1.343
If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running. Make sure that snd_win doesn't go negative.
|
1.173.8.1 |
| 03-Nov-2014 |
msaitoh | Pull up following revision(s) (requested by christos in ticket #1174): sys/netinet/tcp_output.c: revision 1.178 Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks to Jonathan Looney for pointing this out.
|
1.173.6.2 |
| 24-Jul-2015 |
martin | Pull up following revision(s) (requested by matt in ticket #1315): sys/netinet/tcp_output.c: revision 1.184 sys/netinet/tcp_input.c: revision 1.343
If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running. Make sure that snd_win doesn't go negative.
|
1.173.6.1 |
| 03-Nov-2014 |
msaitoh | Pull up following revision(s) (requested by christos in ticket #1174): sys/netinet/tcp_output.c: revision 1.178 Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks to Jonathan Looney for pointing this out.
|
1.173.2.2 |
| 24-Jul-2015 |
martin | Pull up following revision(s) (requested by matt in ticket #1315): sys/netinet/tcp_output.c: revision 1.184 sys/netinet/tcp_input.c: revision 1.343
If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running. Make sure that snd_win doesn't go negative.
|
1.173.2.1 |
| 03-Nov-2014 |
msaitoh | Pull up following revision(s) (requested by christos in ticket #1174): sys/netinet/tcp_output.c: revision 1.178 Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks to Jonathan Looney for pointing this out.
|
1.174.2.3 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.174.2.2 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.174.2.1 |
| 23-Jun-2013 |
tls | resync from head
|
1.175.6.1 |
| 10-Aug-2014 |
tls | Rebase.
|
1.175.2.1 |
| 17-Jul-2013 |
rmind | Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers and thus make IPv4 PCB structures mostly opaque. Any volunteers for merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
|
1.176.2.5 |
| 24-Jul-2015 |
martin | Pull up following revision(s) (requested by matt in ticket #886): sys/netinet/tcp_output.c: revision 1.184 sys/netinet/tcp_input.c: revision 1.343
If we are sending a window probe and there's unacked data in the socket, make sure at least the persist timer is running. Make sure that snd_win doesn't go negative.
|
1.176.2.4 |
| 21-Feb-2015 |
martin | Pull up following revision(s) (requested by he in ticket #530): sys/netinet/tcp_output.c: revision 1.180 sys/netinet/tcp_input.c: revision 1.336 sys/netinet/tcp_usrreq.c: revision 1.203 share/man/man4/tcp.4: revision 1.30 sys/netinet/tcp.h: revision 1.31 sys/netinet/tcp_subr.c: revision 1.258 sys/netinet/tcp_var.h: revision 1.176 sys/netinet/tcp_var.h: revision 1.177 sys/sys/param.h: bump revision
Port over the TCP_INFO socket option from FreeBSD, originally from the Linux 2.6 TCP API. This permits the caller to query certain information about a TCP connection, and is used by pkgsrc's net/iperf3 test program if available.
This extends struct tcpcb with three fields to count retransmits, out-of-sequence receives and zero window announcements, and will therefore warrant a kernel revision bump (done separately).
Change the new counter variables in struct tcpcb to uint32_t, as per christos' comments.
|
1.176.2.3 |
| 17-Jan-2015 |
martin | Pull up following revision(s) (requested by maxv in ticket #427): sys/compat/svr4/svr4_schedctl.c: revision 1.8 sys/netinet/tcp_timer.c: revision 1.88 sys/miscfs/genfs/layer_vfsops.c: revision 1.45 sys/compat/svr4/svr4_ioctl.c: revision 1.37 sys/ufs/chfs/chfs_vfsops.c: revision 1.14 sys/miscfs/fdesc/fdesc_vfsops.c: revision 1.91 sys/compat/linux/arch/i386/linux_ptrace.c: revision 1.30 sys/compat/common/kern_time_50.c: revision 1.28 sys/netinet6/ip6_forward.c: revision 1.74 sys/miscfs/umapfs/umap_vnops.c: revision 1.57 sys/compat/svr4/svr4_fcntl.c: revision 1.74 distrib/sets/lists/comp/mi: revision 1.1931 sys/netinet6/udp6_output.c: revision 1.46 sys/fs/puffs/puffs_compat.c: revision 1.3 sys/fs/udf/udf_rename.c: revision 1.11 sys/compat/svr4/svr4_filio.c: revision 1.24 sys/fs/udf/udf_rename.c: revision 1.12 sys/netinet/tcp_usrreq.c: revision 1.202 sys/miscfs/umapfs/umap_subr.c: revision 1.29 sys/compat/linux/common/linux_fadvise64.c: revision 1.3 sys/netinet/if_atm.c: revision 1.34 sys/miscfs/procfs/procfs_subr.c: revision 1.106 sys/miscfs/genfs/layer_subr.c: revision 1.37 sys/netinet/tcp_sack.c: revision 1.30 sys/compat/freebsd/freebsd_misc.c: revision 1.33 sys/compat/freebsd/freebsd_file.c: revision 1.33 sys/ufs/chfs/chfs_vnode.c: revision 1.12 sys/compat/svr4/svr4_ttold.c: revision 1.34 sys/compat/linux/common/linux_file.c: revision 1.114 sys/compat/linux/arch/mips/linux_machdep.c: revision 1.43 sys/compat/linux/common/linux_signal.c: revision 1.76 sys/compat/common/compat_util.c: revision 1.46 sys/compat/linux/arch/arm/linux_ptrace.c: revision 1.18 sys/compat/svr4/svr4_sockio.c: revision 1.36 sys/compat/linux/arch/arm/linux_machdep.c: revision 1.32 sys/compat/svr4/svr4_signal.c: revision 1.66 sys/kern/kern_exec.c: revision 1.410 sys/fs/puffs/puffs_vfsops.c: revision 1.115 sys/compat/svr4/svr4_exec_elf64.c: revision 1.15 sys/compat/linux/arch/i386/linux_machdep.c: revision 1.159 sys/compat/linux/arch/alpha/linux_machdep.c: revision 1.50 
sys/compat/linux32/common/linux32_misc.c: revision 1.24 sys/netinet/in_pcb.c: revision 1.153 sys/sys/malloc.h: revision 1.116 sys/compat/common/if_43.c: revision 1.9 share/man/man9/Makefile: revision 1.380 sys/netinet/tcp_vtw.c: revision 1.12 sys/miscfs/umapfs/umap_vfsops.c: revision 1.95 sys/ufs/ext2fs/ext2fs_vfsops.c: revision 1.186 sys/compat/common/uipc_syscalls_43.c: revision 1.46 sys/ufs/ext2fs/ext2fs_vnops.c: revision 1.115 sys/fs/puffs/puffs_msgif.c: revision 1.97 sys/compat/svr4/svr4_ipc.c: revision 1.27 sys/compat/linux/common/linux_exec.c: revision 1.117 sys/ufs/ext2fs/ext2fs_readwrite.c: revision 1.66 sys/netinet/tcp_output.c: revision 1.179 sys/compat/svr4/svr4_termios.c: revision 1.28 sys/fs/udf/udf_strat_bootstrap.c: revision 1.4 sys/fs/puffs/puffs_subr.c: revision 1.67 sys/fs/puffs/puffs_node.c: revision 1.36 sys/miscfs/overlay/overlay_vnops.c: revision 1.21 sys/fs/cd9660/cd9660_node.c: revision 1.34 sys/netinet/raw_ip.c: revision 1.146 sys/sys/mallocvar.h: revision 1.13 sys/miscfs/overlay/overlay_vfsops.c: revision 1.63 share/man/man9/malloc.9: revision 1.50 sys/netinet6/dest6.c: revision 1.18 sys/compat/linux/common/linux_uselib.c: revision 1.33 sys/compat/linux/common/linux_socket.c: revision 1.120 share/man/man9/malloc.9: revision 1.51 sys/netinet/tcp_subr.c: revision 1.257 sys/compat/linux/common/linux_socketcall.c: revision 1.45 sys/compat/linux/common/linux_fadvise64_64.c: revision 1.3 sys/compat/freebsd/freebsd_ipc.c: revision 1.17 sys/compat/linux/common/linux_misc_notalpha.c: revision 1.109 sys/compat/linux/arch/alpha/linux_pipe.c: revision 1.17 sys/netinet6/in6_pcb.c: revision 1.132 sys/netinet6/in6_ifattach.c: revision 1.94 sys/compat/svr4/svr4_exec_elf32.c: revision 1.15 sys/miscfs/nullfs/null_vfsops.c: revision 1.90 sys/fs/cd9660/cd9660_util.c: revision 1.12 sys/compat/linux/arch/powerpc/linux_machdep.c: revision 1.48 sys/compat/freebsd/freebsd_exec_elf32.c: revision 1.20 sys/miscfs/procfs/procfs_vfsops.c: revision 1.94 
sys/compat/linux/arch/powerpc/linux_ptrace.c: revision 1.28 sys/compat/linux/common/linux_sched.c: revision 1.67 sys/compat/linux/common/linux_exec_aout.c: revision 1.67 sys/compat/linux/common/linux_pipe.c: revision 1.67 sys/compat/linux/common/linux_llseek.c: revision 1.34 sys/compat/linux/arch/mips/linux_ptrace.c: revision 1.10 Do not uselessly include <sys/malloc.h>. Cleanup: - remove struct kmembuckets (dead) - correctly deadify MALLOC_XX - remove MALLOC_DEFINE_LIMIT and MALLOC_JUSTDEFINE_LIMIT (dead) - remove malloc_roundup(), malloc_type_setlimit(), MALLOC_DEFINE_LIMIT() and MALLOC_JUSTDEFINE_LIMIT() from man 9 malloc New sentence, new line. Bump date for previous. Obsolete malloc_roundup(9), malloc_type_setlimit(9) and MALLOC_DEFINE_LIMIT(9) man pages.
|
1.176.2.2 |
| 26-Oct-2014 |
martin | Pull up following revision(s) (requested by christos in ticket #157): sys/netinet/tcp_output.c: revision 1.178 Avoid stack overflow when SACK and TCP_SIGNATURE are both present. Thanks to Jonathan Looney for pointing this out.
|
1.176.2.1 |
| 24-Oct-2014 |
martin | Pull up following revision(s) (requested by hikaru in ticket #154): sys/netinet/tcp_output.c: revision 1.177 Fix wrong condition checking TSO capability. ipsec_used is not necessary condition. IPsec outbound policy will not be checked when ipsec_used is false.
|
1.179.2.6 |
| 28-Aug-2017 |
skrll | Sync with HEAD
|
1.179.2.5 |
| 05-Feb-2017 |
skrll | Sync with HEAD
|
1.179.2.4 |
| 09-Jul-2016 |
skrll | Sync with HEAD
|
1.179.2.3 |
| 22-Sep-2015 |
skrll | Sync with HEAD
|
1.179.2.2 |
| 06-Jun-2015 |
skrll | Sync with HEAD
|
1.179.2.1 |
| 06-Apr-2015 |
skrll | Sync with HEAD
|
1.186.2.2 |
| 20-Mar-2017 |
pgoyette | Sync with HEAD
|
1.186.2.1 |
| 07-Jan-2017 |
pgoyette | Sync with HEAD. (Note that most of these changes are simply $NetBSD$ tag issues.)
|
1.194.2.1 |
| 21-Apr-2017 |
bouyer | Sync with HEAD
|
1.196.2.1 |
| 21-Oct-2017 |
snj | Pull up following revision(s) (requested by ozaki-r in ticket #300): crypto/dist/ipsec-tools/src/setkey/parse.y: 1.19 crypto/dist/ipsec-tools/src/setkey/token.l: 1.20 distrib/sets/lists/tests/mi: 1.754, 1.757, 1.759 doc/TODO.smpnet: 1.12-1.13 sys/net/pfkeyv2.h: 1.32 sys/net/raw_cb.c: 1.23-1.24, 1.28 sys/net/raw_cb.h: 1.28 sys/net/raw_usrreq.c: 1.57-1.58 sys/net/rtsock.c: 1.228-1.229 sys/netinet/in_proto.c: 1.125 sys/netinet/ip_input.c: 1.359-1.361 sys/netinet/tcp_input.c: 1.359-1.360 sys/netinet/tcp_output.c: 1.197 sys/netinet/tcp_var.h: 1.178 sys/netinet6/icmp6.c: 1.213 sys/netinet6/in6_proto.c: 1.119 sys/netinet6/ip6_forward.c: 1.88 sys/netinet6/ip6_input.c: 1.181-1.182 sys/netinet6/ip6_output.c: 1.193 sys/netinet6/ip6protosw.h: 1.26 sys/netipsec/ipsec.c: 1.100-1.122 sys/netipsec/ipsec.h: 1.51-1.61 sys/netipsec/ipsec6.h: 1.18-1.20 sys/netipsec/ipsec_input.c: 1.44-1.51 sys/netipsec/ipsec_netbsd.c: 1.41-1.45 sys/netipsec/ipsec_output.c: 1.49-1.64 sys/netipsec/ipsec_private.h: 1.5 sys/netipsec/key.c: 1.164-1.234 sys/netipsec/key.h: 1.20-1.32 sys/netipsec/key_debug.c: 1.18-1.21 sys/netipsec/key_debug.h: 1.9 sys/netipsec/keydb.h: 1.16-1.20 sys/netipsec/keysock.c: 1.59-1.62 sys/netipsec/keysock.h: 1.10 sys/netipsec/xform.h: 1.9-1.12 sys/netipsec/xform_ah.c: 1.55-1.74 sys/netipsec/xform_esp.c: 1.56-1.72 sys/netipsec/xform_ipcomp.c: 1.39-1.53 sys/netipsec/xform_ipip.c: 1.50-1.54 sys/netipsec/xform_tcp.c: 1.12-1.16 sys/rump/librump/rumpkern/Makefile.rumpkern: 1.170 sys/rump/librump/rumpnet/net_stub.c: 1.27 sys/sys/protosw.h: 1.67-1.68 tests/net/carp/t_basic.sh: 1.7 tests/net/if_gif/t_gif.sh: 1.11 tests/net/if_l2tp/t_l2tp.sh: 1.3 tests/net/ipsec/Makefile: 1.7-1.9 tests/net/ipsec/algorithms.sh: 1.5 tests/net/ipsec/common.sh: 1.4-1.6 tests/net/ipsec/t_ipsec_ah_keys.sh: 1.2 tests/net/ipsec/t_ipsec_esp_keys.sh: 1.2 tests/net/ipsec/t_ipsec_gif.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_l2tp.sh: 1.6-1.7 tests/net/ipsec/t_ipsec_misc.sh: 1.8-1.18 tests/net/ipsec/t_ipsec_sockopt.sh: 
1.1-1.2 tests/net/ipsec/t_ipsec_tcp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_transport.sh: 1.5-1.6 tests/net/ipsec/t_ipsec_tunnel.sh: 1.9 tests/net/ipsec/t_ipsec_tunnel_ipcomp.sh: 1.1-1.2 tests/net/ipsec/t_ipsec_tunnel_odd.sh: 1.3 tests/net/mcast/t_mcast.sh: 1.6 tests/net/net/t_ipaddress.sh: 1.11 tests/net/net_common.sh: 1.20 tests/net/npf/t_npf.sh: 1.3 tests/net/route/t_flags.sh: 1.20 tests/net/route/t_flags6.sh: 1.16 usr.bin/netstat/fast_ipsec.c: 1.22 Do m_pullup before mtod
It may fix panics of some tests on anita/sparc and anita/GuruPlug. --- KNF --- Enable DEBUG for babylon5 --- Apply C99-style struct initialization to xformsw --- Tweak outputs of netstat -s for IPsec
- Get rid of "Fast"
- Use ipsec and ipsec6 for titles to clarify protocol
- Indent outputs of sub protocols
Original outputs were organized like this:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
(Fast) IPsec:
IPsec ah:
IPsec esp:
IPsec ipip:
IPsec ipcomp:
New outputs are organized like this:
ipsec:
    ah:
    esp:
    ipip:
    ipcomp:
ipsec6:
    ah:
    esp:
    ipip:
    ipcomp:
--- Add test cases for IPComp --- Simplify IPSEC_OSTAT macro (NFC) --- KNF; replace leading whitespaces with hard tabs --- Introduce and use SADB_SASTATE_USABLE_P --- KNF --- Add update command for testing
Updating an SA (SADB_UPDATE) requires that the process issuing SADB_UPDATE is the same as the process that issued SADB_ADD (or SADB_GETSPI). This means that the update command must be used with the add command in a configuration of setkey. This usage is normally meaningless but useful for testing (and debugging) purposes. --- Add test cases for updating SA/SP
The tests require the newly-added update command of setkey. --- PR/52346: Frank Kardel: Fix checksumming for NAT-T See XXX for improvements. --- Remove codes for PACKET_TAG_IPSEC_IN_CRYPTO_DONE
It seems that PACKET_TAG_IPSEC_IN_CRYPTO_DONE is for network adapters that have IPsec accelerators; a driver sets the mtag to a packet when its device has already encrypted the packet.
Unfortunately no driver has implemented such offload features for many years and none seems likely to implement them soon. (Note that neither FreeBSD nor Linux has such drivers.) Let's remove the related (unused) code and simplify the IPsec code. --- Fix usages of sadb_msg_errno --- Avoid updating sav directly
On SADB_UPDATE a target sav was updated directly, which was unsafe. Instead allocate another sav, copy variables of the old sav to the new one and replace the old one with the new one. --- Simplify; we can assume sav->tdb_xform cannot be NULL while it's valid --- Rename key_alloc* functions (NFC)
We shouldn't use the term "alloc" for functions that just look up data and actually don't allocate memory. --- Use explicit_memset to surely zero-clear key_auth and key_enc --- Make sure to clear keys on error paths of key_setsaval --- Add missing KEY_FREESAV --- Make sure a sav is inserted to a sah list after its initialization completes --- Remove unnecessary zero-clearing codes from key_setsaval
key_setsaval is now used only for a newly-allocated sav. (It was used to reset variables of an existing sav.) --- Correct wrong assumption of sav->refcnt in key_delsah
A sav in a list is basically not to be sav->refcnt == 0. And also KEY_FREESAV assumes sav->refcnt > 0. --- Let key_getsavbyspi take a reference of a returning sav --- Use time_mono_to_wall (NFC) --- Separate sending message routine (NFC) --- Simplify; remove unnecessary zero-clears
key_freesaval is used only when a target sav is being destroyed. --- Omit NULL checks for sav->lft_c
sav->lft_c can be NULL only when initializing or destroying sav. --- Omit unnecessary NULL checks for sav->sah --- Omit unnecessary check of sav->state
key_allocsa_policy picks a sav of either MATURE or DYING so we don't need to check its state again. --- Simplify; omit unnecessary saidx passing
- ipsec_nextisr returns a saidx but no caller uses it
- key_checkrequest is passed a saidx but it can be gotten from another argument (isr)
--- Fix splx isn't called on some error paths --- Fix header size calculation of esp where sav is NULL --- Fix header size calculation of ah in the case sav is NULL
This fix was also needed for esp. --- Pass sav directly to opencrypto callback
In a callback, use a passed sav as-is by default and look up a sav only if the passed sav is dead. --- Avoid examining freshness of sav on packet processing
If a sav list is sorted (by lft_c->sadb_lifetime_addtime) in advance, we don't need to examine each sav and also don't need to delete one on the fly and send up a message. Fortunately every sav list is sorted as we need.
Added key_validate_savlist validates that each sav list is surely sorted (run only if DEBUG because it's not cheap). --- Add test cases for SAs with different SPIs --- Prepare to stop using isr->sav
isr is a shared resource and using isr->sav as temporary storage for each packet processing is racy. Also, having a reference from isr to sav makes the lifetime of sav non-deterministic; such a reference is removed when a packet is processed and isr->sav is overwritten by a new one. Let's have a sav locally for each packet processing instead of using shared isr->sav.
However this change doesn't stop using isr->sav yet because there are some users of isr->sav. isr->sav will be removed after the users find a way to not use isr->sav. --- Fix wrong argument handling --- fix printf format. --- Don't validate sav lists of LARVAL or DEAD states
We don't sort the lists so the validation will always fail.
Fix PR kern/52405 --- Make sure to sort the list when changing the state by key_sa_chgstate --- Rename key_allocsa_policy to key_lookup_sa_bysaidx --- Separate test files --- Calculate ah_max_authsize on initialization as well as esp_max_ivlen --- Remove m_tag_find(PACKET_TAG_IPSEC_PENDING_TDB) because nobody sets the tag --- Restore a comment removed in previous
The comment is valid for the below code. --- Make tests more stable
sleep command seems to wait longer than expected on anita so use polling to wait for a state change. --- Add tests that explicitly delete SAs instead of waiting for expirations --- Remove invalid M_AUTHIPDGM check on ESP isr->sav
M_AUTHIPDGM flag is set to a mbuf in ah_input_cb. An sav of ESP can have AH authentication as sav->tdb_authalgxform. However, in that case esp_input and esp_input_cb are used to do ESP decryption and AH authentication and M_AUTHIPDGM never be set to a mbuf. So checking M_AUTHIPDGM of a mbuf on isr->sav of ESP is meaningless. --- Look up sav instead of relying on unstable sp->req->sav
This code is executed only in an error path so an additional lookup doesn't matter. --- Correct a comment --- Don't release sav if calling crypto_dispatch again --- Remove extra KEY_FREESAV from ipsec_process_done
It should be done by the caller. --- Don't bother the case of crp->crp_buf == NULL in callbacks --- Hold a reference to an SP during opencrypto processing
An SP has a list of isr (ipsecrequest) that represents a sequence of IPsec encryption/authentication processing. One isr corresponds to one opencrypto processing. The lifetime of an isr follows its SP.
We pass an isr to a callback function of opencrypto to continue to a next encryption/authentication processing. However nobody guaranteed that the isr wasn't freed, i.e., its SP wasn't destroyed.
In order to avoid such unexpected destruction of isr, hold a reference to its SP during opencrypto processing. --- Don't make SAs expired on tests that delete SAs explicitly --- Fix a debug message --- Dedup error paths (NFC) --- Use pool to allocate tdb_crypto
For ESP and AH, we need to allocate an extra variable space in addition to struct tdb_crypto. The fixed size of pool items may be larger than an actual requisite size of a buffer, but still the performance improvement by replacing malloc with pool wins. --- Don't use unstable isr->sav for header size calculations
We may need to optimize to not look up sav here for users that don't need to know an exact size of headers (e.g., TCP segment size calculation). --- Don't use sp->req->sav when handling NAT-T ESP fragmentation
In order to do this we need to look up a sav however an additional look-up degrades performance. A sav is later looked up in ipsec4_process_packet so delay the fragmentation check until then to avoid an extra look-up. --- Don't use key_lookup_sp that depends on unstable sp->req->sav
It provided a fast look-up of SP. We will provide an alternative method in the future (after basic MP-ification finishes). --- Stop setting isr->sav on looking up sav in key_checkrequest --- Remove ipsecrequest#sav --- Stop setting mtag of PACKET_TAG_IPSEC_IN_DONE because there are no users anymore --- Skip ipsec_spi_*_*_preferred_new_timeout when running on qemu
Probably due to PR 43997 --- Add localcount to rump kernels --- Remove unused macro --- Fix key_getcomb_setlifetime
The fix adjusts a soft limit to be 80% of a corresponding hard limit.
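The 80% rule stated above can be sketched as follows. struct lifetime_comb and set_addtime are hypothetical stand-ins for struct sadb_comb and the real key.c code, and the one-day hard limit used in the example is an assumed value, not one taken from the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the lifetime fields of struct sadb_comb. */
struct lifetime_comb {
	uint64_t hard_addtime;	/* hard limit, in seconds */
	uint64_t soft_addtime;	/* soft limit, in seconds */
};

/*
 * Set the hard limit first, then derive the soft limit as 80% of it.
 * Assumes hard_seconds is small enough that multiplying by 80 does
 * not overflow a uint64_t.
 */
static void
set_addtime(struct lifetime_comb *c, uint64_t hard_seconds)
{
	c->hard_addtime = hard_seconds;
	c->soft_addtime = c->hard_addtime * 80 / 100;
}
```

Scaling a field of a zero-cleared structure by 80 / 100 leaves it at zero, which is why the soft limit has to be computed from the hard limit after the hard limit has been assigned.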
I'm not sure the fix is really correct though, at least the original code is wrong. A passed comb is zero-cleared before calling key_getcomb_setlifetime, so comb->sadb_comb_soft_addtime = comb->sadb_comb_soft_addtime * 80 / 100; is meaningless. --- Provide and apply key_sp_refcnt (NFC)
It simplifies further changes. --- Fix indentation
Pointed out by knakahara@ --- Use pslist(9) for sptree --- Don't acquire global locks for IPsec if NET_MPSAFE
Note that the change is just to make testing easy and IPsec isn't MP-safe yet. --- Let PF_KEY socks hold their own lock instead of softnet_lock
Operations on SAD and SPD are executed via PF_KEY socks. The operations include deletions of SAs and SPs that will use synchronization mechanisms such as pserialize_perform to wait for references to SAs and SPs to be released. It is known that using such mechanisms with holding softnet_lock causes a dead lock. We should avoid the situation. --- Make IPsec SPD MP-safe
We use localcount(9), not psref(9), to make the sptree and secpolicy (SP) entries MP-safe because SPs need to be referenced over opencrypto processing that executes a callback in a different context.
SPs on sockets aren't managed by the sptree and can be destroyed in softint. localcount_drain cannot be used in softint so we delay the destruction of such SPs to a thread context. To do so, a list to manage such SPs is added (key_socksplist) and key_timehandler_spd deletes dead SPs in the list.
For more details please read the locking notes in key.c.
Proposed on tech-kern@ and tech-net@ --- Fix updating ipsec_used
- key_update_used wasn't called in key_api_spddelete2 and key_api_spdflush - key_update_used wasn't called if an SP had been added/deleted but a reply to userland failed --- Fix updating ipsec_used; turn on when SPs on sockets are added --- Add missing IPsec policy checks to icmp6_rip6_input
icmp6_rip6_input is quite similar to rip6_input and the same checks exist in rip6_input. --- Add test cases for setsockopt(IP_IPSEC_POLICY) --- Don't use KEY_NEWSP for dummy SP entries
By the change KEY_NEWSP is now not called from softint anymore and we can use kmem_zalloc with KM_SLEEP for KEY_NEWSP. --- Comment out unused functions --- Add test cases that there are SPs but no relevant SAs --- Don't allow sav->lft_c to be NULL
lft_c of an sav that was created by SADB_GETSPI could be NULL. --- Clean up clunky eval strings
- Remove unnecessary \ at EOL
  - This allows to omit ; too
- Remove unnecessary quotes for arguments of atf_set
- Don't expand $DEBUG in eval
  - We expect it's expanded on execution
Suggested by kre@ --- Remove unnecessary KEY_FREESAV in an error path
sav should be freed (unreferenced) by the caller. --- Use pslist(9) for sahtree --- Use pslist(9) for sah->savtree --- Rename local variable newsah to sah
It may not be new. --- MP-ify SAD slightly
- Introduce key_sa_mtx and use it for some list operations - Use pserialize for some list iterations --- Introduce KEY_SA_UNREF and replace KEY_FREESAV with it where sav will never be actually freed in the future
KEY_SA_UNREF is still key_freesav so no functional change for now.
This change reduces diff of further changes. --- Remove out-of-date log output
Pointed out by riastradh@ --- Use KDASSERT instead of KASSERT for mutex_ownable
Because mutex_ownable is too heavy to run in a fast path even for DIAGNOSTIC + LOCKDEBUG.
Suggested by riastradh@ --- Assemble global lists and related locks into cache lines (NFCI)
Also rename variable names from *tree to *list because they are just lists, not trees.
Suggested by riastradh@ --- Move locking notes --- Update the locking notes
- Add locking order - Add locking notes for misc lists such as reglist - Mention pserialize, key_sp_ref and key_sp_unref on SP operations
Requested by riastradh@ --- Describe constraints of key_sp_ref and key_sp_unref
Requested by riastradh@ --- Hold key_sad.lock on SAVLIST_WRITER_INSERT_TAIL --- Add __read_mostly to key_psz
Suggested by riastradh@ --- Tweak wording (pserialize critical section => pserialize read section)
Suggested by riastradh@ --- Add missing mutex_exit --- Fix setkey -D -P outputs
The outputs were tweaked (by me), but I forgot updating libipsec in my local ATF environment... --- MP-ify SAD (key_sad.sahlist and sah entries)
localcount(9) is used to protect key_sad.sahlist and sah entries as well as SPD (and will be used for SAD sav).
Please read the locking notes of SAD for more details. --- Introduce key_sa_refcnt and replace sav->refcnt with it (NFC) --- Destroy sav only in the loop for DEAD sav --- Fix KASSERT(solocked(sb->sb_so)) failure in sbappendaddr that is called eventually from key_sendup_mbuf
If key_sendup_mbuf isn't passed a socket, the assertion fails. Originally in this case sb->sb_so was softnet_lock and callers held softnet_lock so the assertion was magically satisfied. Now sb->sb_so is key_so_mtx and also softnet_lock isn't always held by callers so the assertion can fail.
Fix it by holding key_so_mtx if key_sendup_mbuf isn't passed a socket.
Reported by knakahara@ Tested by knakahara@ and ozaki-r@ --- Fix locking notes of SAD --- Fix deadlock between key_sendup_mbuf called from key_acquire and localcount_drain
If we call key_sendup_mbuf from key_acquire that is called on packet processing, a deadlock can happen like this:
- At key_acquire, a reference to an SP (and an SA) is held
- key_sendup_mbuf will try to take key_so_mtx
- Some other thread may try to localcount_drain the SP while holding key_so_mtx in, say, key_api_spdflush
- In this case localcount_drain never returns because key_sendup_mbuf, which is stuck on key_so_mtx, never releases its reference to the SP
Fix the deadlock by deferring key_sendup_mbuf to the timer (key_timehandler). --- Fix that prev isn't cleared on retry --- Limit the number of mbufs queued for deferred key_sendup_mbuf
Hundreds of mbufs can easily be queued on the list under heavy network load. --- MP-ify SAD (savlist)
localcount(9) is used to protect savlist of sah. The basic design is similar to MP-ifications of SPD and SAD sahlist. Please read the locking notes of SAD for more details. --- Simplify ipsec_reinject_ipstack (NFC) --- Add per-CPU rtcache to ipsec_reinject_ipstack
It reduces route lookups and also reduces rtcache lock contentions when NET_MPSAFE is enabled. --- Use pool_cache(9) instead of pool(9) for tdb_crypto objects
The change improves network throughput especially on multi-core systems. --- Update
ipsec(4), opencrypto(9) and vlan(4) are now MP-safe. --- Write known issues on scalability --- Share a global dummy SP between PCBs
It is never changed so it can be pre-allocated and shared safely between PCBs. --- Fix race condition on the rawcb list shared by rtsock and keysock
keysock now protects itself by its own mutex, which means that the rawcb list is protected by two different mutexes (keysock's one and softnet_lock for rtsock), of course it's useless.
Fix the situation by having a discrete rawcb list for each. --- Use a dedicated mutex for rt_rawcb instead of softnet_lock if NET_MPSAFE --- fix localcount leak in sav. fixed by ozaki-r@n.o.
I commit on behalf of him. --- remove unnecessary comment. --- Fix deadlock between pserialize_perform and localcount_drain
A typical usage of localcount_drain looks like this:
    mutex_enter(&mtx);
    item = remove_from_list();
    pserialize_perform(psz);
    localcount_drain(&item->localcount, &cv, &mtx);
    mutex_exit(&mtx);
This sequence can cause a deadlock, for example in the following situation:
- Thread A calls localcount_drain, which calls xc_broadcast after releasing the specified mutex
- Thread B enters the sequence and calls pserialize_perform while holding the mutex; pserialize_perform also calls xc_broadcast
- Thread C (xc_thread), which runs an xcall callback of localcount_drain, tries to take the mutex
xc_broadcast of thread B doesn't start until xc_broadcast of thread A finishes, which is a feature of xcall(9). This means that pserialize_perform never completes until xc_broadcast of thread A finishes. On the other hand, thread C, which is a callee of xc_broadcast of thread A, is stuck on the mutex. As a result the threads block each other (A blocks B, B blocks C and C blocks A).
A possible fix is to serialize executions of the above sequence with another mutex, but adding another mutex makes the code more complex. Instead, fix the deadlock in another way: release the mutex before pserialize_perform and use a condvar to prevent pserialize_perform from being called simultaneously.
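The fix described above can be sketched in user space with POSIX threads. This is only an illustration under stated assumptions, not the NetBSD code: `barrier_perform` is a hypothetical stand-in for pserialize_perform, `perform_busy`, `remove_item` and `run_demo` are invented names, and pthread mutexes and condition variables stand in for the kernel primitives. It shows the mutex being dropped before the barrier while a flag plus a condvar keeps the barrier from being entered by two threads at once.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

#define MAX_THREADS 16
#define ROUNDS      1000

/* All names below are hypothetical; this is not the kernel implementation. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;      /* list lock */
static pthread_cond_t  cv  = PTHREAD_COND_INITIALIZER;
static bool perform_busy = false;   /* true while one thread is in the barrier */
static int  removed = 0;            /* counts simulated list removals */

/* Bookkeeping used only to observe how many threads overlap in the barrier. */
static pthread_mutex_t stat_mtx = PTHREAD_MUTEX_INITIALIZER;
static int in_barrier = 0;
static int max_in_barrier = 0;

/* Stand-in for pserialize_perform(); called WITHOUT mtx held. */
static void barrier_perform(void)
{
    pthread_mutex_lock(&stat_mtx);
    if (++in_barrier > max_in_barrier)
        max_in_barrier = in_barrier;
    pthread_mutex_unlock(&stat_mtx);

    sched_yield();                  /* give other threads a chance to overlap */

    pthread_mutex_lock(&stat_mtx);
    in_barrier--;
    pthread_mutex_unlock(&stat_mtx);
}

static void remove_item(void)
{
    pthread_mutex_lock(&mtx);
    removed++;                      /* stand-in for remove_from_list() */

    while (perform_busy)            /* wait until no other barrier is running */
        pthread_cond_wait(&cv, &mtx);
    perform_busy = true;
    pthread_mutex_unlock(&mtx);     /* release the mutex before the barrier */

    barrier_perform();

    pthread_mutex_lock(&mtx);
    perform_busy = false;
    pthread_cond_broadcast(&cv);    /* let the next waiter proceed */
    /* A drain step would follow here, with mtx held again. */
    pthread_mutex_unlock(&mtx);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++)
        remove_item();
    return NULL;
}

/* Runs nthreads workers and returns the largest overlap seen in the barrier. */
int run_demo(int nthreads)
{
    pthread_t tids[MAX_THREADS];

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return max_in_barrier;
}
```

Because the flag is only tested and set with the mutex held, at most one thread is ever inside the barrier, yet no thread holds the mutex while the barrier runs, so a callback that needs the mutex is not blocked by it.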
Note that the deadlock happens only if NET_MPSAFE is enabled. --- Add missing ifdef NET_MPSAFE --- Take softnet_lock on pr_input properly if NET_MPSAFE
Currently softnet_lock is taken unnecessarily in some cases, e.g., icmp_input and encap4_input from ip_input, or not taken even if needed, e.g., udp_input and tcp_input from ipsec4_common_input_cb. Fix them.
NFC if NET_MPSAFE is disabled (default). --- - sanitize key debugging so that we don't print extra newlines or unassociated debugging messages. - remove unused functions and make internal ones static - print information in one line per message --- humanize printing of ip addresses --- cast reduction, NFC. --- Fix typo in comment --- Pull out ipsec_fill_saidx_bymbuf (NFC) --- Don't abuse key_checkrequest just for looking up sav
It does more than expected, for example calling key_acquire. --- Fix SP is broken on transport mode
isr->saidx was modified accidentally in ipsec_nextisr.
Reported by christos@. Investigation helped by christos@ and knakahara@ --- Constify isr at many places (NFC) --- Include socketvar.h for softnet_lock --- Fix buffer length for ipsec_logsastr
|
1.198.2.6 |
| 18-Jan-2019 |
pgoyette | Synch with HEAD
|
1.198.2.5 |
| 06-Sep-2018 |
pgoyette | Sync with HEAD
Resolve a couple of conflicts (result of the uimin/uimax changes)
|
1.198.2.4 |
| 21-May-2018 |
pgoyette | Sync with HEAD
|
1.198.2.3 |
| 07-Apr-2018 |
pgoyette | Sync with HEAD. 77 conflicts resolved - all of them $NetBSD$
|
1.198.2.2 |
| 30-Mar-2018 |
pgoyette | Resolve conflicts between branch and HEAD
|
1.198.2.1 |
| 15-Mar-2018 |
pgoyette | Synch with HEAD
|
1.208.2.2 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
1.208.2.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.218.2.1 |
| 21-Sep-2023 |
martin | Pull up following revision(s) (requested by bouyer in ticket #377):
sys/netinet/tcp_output.c: revision 1.219
Handle EHOSTDOWN the same way as EHOSTUNREACH and ENETDOWN for established connections. Avoid premature end of tcp connection with "Host is down" error in case of transient link-layer failure.
Discussed and patch proposed in http://mail-index.netbsd.org/tech-net/2023/09/11/msg008610.html and followups.
|
1.220.2.1 |
| 02-Aug-2025 |
perseant | Sync with HEAD
|