History log of /src/sys/netinet6/in6_src.c
Revision  Date  Author  Comments
 1.92  03-Aug-2023  ozaki-r in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
 1.91  04-Nov-2022  ozaki-r branches: 1.91.2;
inpcb: rename functions to in6pcb_*
 1.90  28-Oct-2022  ozaki-r inpcb: separate inpcb again to reduce the size of PCB for IPv4

The data size of PCB for IPv4 increased because of the merge of
struct in6pcb. The change decreases the size to the original size by
separating struct inpcb (again). struct in4pcb and in6pcb that embed
struct inpcb are introduced.

Even after the separation, users don't need to realize the separation
and only have to use some macros to access dedicated data. For example,
inp->inp_laddr is now accessed through in4p_laddr(inp).
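
The accessor-macro approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual NetBSD definitions: the field names inp_af, in4p_pcb, in6p_pcb, in4p_laddr_storage and in6p_laddr_storage and the helper get_v4_laddr() are assumptions made up for the example. Only struct inpcb, struct in4pcb, struct in6pcb and in4p_laddr() are named in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Common part shared by both address families (hypothetical fields). */
struct inpcb {
	int inp_af;		/* 4 or 6 in this sketch */
	uint16_t inp_lport;
};

/* IPv4 PCB: embeds the common part first, then IPv4-only data. */
struct in4pcb {
	struct inpcb in4p_pcb;
	uint32_t in4p_laddr_storage;	/* assumed field name */
};

/* IPv6 PCB: same idea, but it carries a larger address. */
struct in6pcb {
	struct inpcb in6p_pcb;
	uint8_t in6p_laddr_storage[16];	/* assumed field name */
};

/*
 * Accessor in the spirit of in4p_laddr(inp): callers hold a struct inpcb
 * pointer and reach family-specific data through a macro.
 */
#define in4p_laddr(inp) (((struct in4pcb *)(inp))->in4p_laddr_storage)

/* Hypothetical helper showing how a caller would use the macro. */
static uint32_t
get_v4_laddr(struct inpcb *inp)
{
	assert(inp->inp_af == 4);	/* the cast is only valid for IPv4 PCBs */
	return in4p_laddr(inp);
}
```

Because the common struct is the first member, a pointer to it can be converted back to the enclosing family-specific struct, and the IPv4 variant no longer pays for IPv6-sized fields.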
 1.89  28-Oct-2022  ozaki-r inpcb: integrate data structures of PCB into one

Data structures of network protocol control blocks (PCBs), i.e.,
struct inpcb, in6pcb and inpcb_hdr, are not organized well. Users of
the data structures have to handle them separately and thus the code
is cluttered and duplicated.

The commit integrates the data structures into one, struct inpcb. As a
result, users of PCBs only have to handle just one data structure, so
the code becomes simple.

One drawback is that the data size of PCB for IPv4 increases by 40 bytes
(from 248 bytes to 288 bytes).
 1.88  10-Aug-2021  kardel PR kern/56348
MTU discovery fails with IPv6 sockets bound to IPv4 mapped address

pick up the IPv4 route for IPv4 mapped IPv6 address to get the correct
MTU and not any unrelated/inappropriate MTU from IPv6 routes. IPv4 mapped
IPv6 addresses are always handled by the IPv4 stack and MTU discovery
is solely handled with the IPv4 routing table.
 1.87  28-Aug-2020  ozaki-r inet6: reduce silent packet discards
 1.86  13-Nov-2019  ozaki-r Get rid of unnecessary NULL checks for rt_ifa and ifa_ifp

They are always non-NULL nowadays.
 1.85  01-May-2018  maxv branches: 1.85.2; 1.85.6;
Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.84  06-Dec-2017  roy branches: 1.84.2;
Treat unvalidated addresses as deprecated in rule 3.
 1.83  24-Nov-2017  roy Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
 1.82  20-Nov-2017  ozaki-r Mention IPv6 address selection policy isn't MP-safe yet

Though it's not a problem until a policy is set.
 1.81  17-Sep-2017  christos Skip the scope test for loopback addresses in non-loopback interfaces.
While this test is also done in in6_setscope, testing here allows us
to log an error for other callers.
 1.80  27-Aug-2017  christos PR/52382: BERTRAND Joel: Fix mapped IPv4 source selection; this got broken
in the last code refactoring. in6_selectif failing is not fatal.
XXX: pullup-8
 1.79  17-Feb-2017  ozaki-r branches: 1.79.6;
Rename if_acquire_NOMPSAFE to if_acquire

It can be used in MP-safe ways. So let's remove the confusing postfix.
If it's used in an unsafe way, warn NOMPSAFE in a comment.
 1.78  16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.77  16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.76  08-Dec-2016  ozaki-r branches: 1.76.2;
Add rtcache_unref to release points of rtentry stemming from rtcache

In the MP-safe world, a rtentry stemming from a rtcache can be freed at any
point. So we need to protect rtentries somehow, say by reference counting or
passive references. Regardless of the method, we need to call some release
function of a rtentry after using it.

The change adds a new function rtcache_unref to release a rtentry. At this
point, this function does nothing because for now we don't add a reference
to a rtentry when we get one from a rtcache. We will add something useful
in a further commit.

This change is a part of changes for MP-safe routing table. It is separated
to avoid one big change that is difficult to debug by bisecting.
 1.75  02-Dec-2016  ozaki-r CID 1396598, CID 1396634: Fix null pointer dereferences
 1.74  10-Nov-2016  ozaki-r Tidy up in6_select*

This change tidies up in6_select* functions, especially
selectroute.

selectroute is annoying because:
- It returns both/either of a rtentry and/or an ifp
  - Yes, it may return only an ifp!
  - It is valid but selectroute shouldn't handle the case
  - Such conditional behavior makes it difficult
    to apply locking/psref thingy
- It may return a rtentry even if error
- It may use opt->ip6po_nextroute rtcache implicitly
  - The caller can know if it is used
    by rtcache_validate(&opt->ip6po_nextroute)
    but it's racy in MP-safe world
- Even if it uses opt->ip6po_nextroute, it may
  return a rtentry that isn't derived from the rtcache

The change includes:
- Rename selectroute to in6_selectroute
  - Let a remaining caller of selectroute, in6_selectif,
    use in6_selectroute instead
- Let in6_selectroute return only an rtentry
  - If error, it doesn't return an rtentry
  - A caller gets an ifp from a returned rtentry
- Allow in6_selectroute to modify a passed rtcache
  and a caller can know if opt->ip6po_nextroute is
  used via the rtcache
- Let callers (ip6_output and in6_selectif) handle
  the case that only an ifp is required

Inspired by OpenBSD
Proposed on tech-kern and tech-net
LGTM by roy@
 1.73  31-Oct-2016  ozaki-r Pull best address selection code out of in6_selectsrc

No functional change.
 1.72  31-Oct-2016  ozaki-r Fix race condition of in6_selectsrc

in6_selectsrc returned a pointer to in6_addr that wasn't guaranteed to be
safe by pserialize (or psref), which was racy. Let callers pass a pointer
to in6_addr and in6_selectsrc copy a result to it inside pserialize
critical sections.
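
The copy-out pattern this fix describes might look like the sketch below. It is an illustration only: crit_enter() and crit_exit() are hypothetical stand-ins for a pserialize read section, and shared_addr and selectsrc_sketch() are invented names, not the kernel's.

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Hypothetical stand-ins for entering/leaving a read-side critical section. */
static int crit_depth;
static void crit_enter(void) { crit_depth++; }
static void crit_exit(void)  { crit_depth--; }

/* Shared address that could be freed by others once we leave the section. */
static struct in6_addr shared_addr;

/*
 * Copy-out style: the caller supplies storage, and the copy happens while
 * the shared object is still protected. Returns 0 on success.
 */
static int
selectsrc_sketch(struct in6_addr *ret)
{
	assert(ret != NULL);
	crit_enter();
	memcpy(ret, &shared_addr, sizeof(*ret));
	crit_exit();
	return 0;
}
```

Returning a pointer to shared_addr instead would let the caller dereference it after the critical section has ended, which is the race the commit removes.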
 1.71  31-Oct-2016  ozaki-r Remove unnecessary NULL checks
 1.70  26-Aug-2016  roy Simplify.
 1.69  26-Aug-2016  roy Allow explicit binding to detached addresses.
Fixes PR kern/51435.
 1.68  23-Aug-2016  roy White space police.
 1.67  23-Aug-2016  roy Sync denied flags.
 1.66  01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.65  15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.64  15-Jul-2016  ozaki-r Use ifatoia6 macro

No functional change.
 1.63  04-Jul-2016  ozaki-r branches: 1.63.2;
Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.62  21-Jun-2016  ozaki-r Make sure returning ifp from in6_select* functions psref-ed

To this end, callers need to pass struct psref to the functions
and the functions acquire a reference of ifp with it. In some cases,
we can simply use if_get_byindex, however, in other cases
(say rt->rt_ifp and ia->ifa_ifp), we have no MP-safe way for now.
In order to take a reference anyway we use non MP-safe function
if_acquire_NOMPSAFE for the latter cases. They should be fixed in
the future somehow.
 1.61  21-Jun-2016  ozaki-r Replace ifp of ip_moptions and ip6_moptions with if_index

The motivation is the same as the mbuf's rcvif case; avoid having a pointer
of an ifnet object in ip_moptions and ip6_moptions, which is not MP-safe.

ip_moptions and ip6_moptions can be stored in a PCB for inet or inet6
whose lifetime differs from that of the ifnet, so an ifnet object can
disappear at any time after we get it via them. Thus we need to look up an
ifnet object by if_index every time to be safe.
 1.60  18-May-2016  ozaki-r Get rid of unnecessary NULL check

It's already checked just some lines above.
 1.59  12-Dec-2015  christos Hook up the addrctl stuff that's already there.
 1.58  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.57  27-Apr-2015  ozaki-r Introduce in6_selecthlim_rt to consolidate an idiom for rt->rt_ifp

It consolidates a scattered routine:
(rt = rtcache_validate(&in6p->in6p_route)) != NULL ? rt->rt_ifp : NULL
 1.56  20-Jan-2015  roy Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.

Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.55  05-Sep-2014  matt branches: 1.55.2;
Don't use C++ keyword as variable.
Use different prefix for nd6_prefixctl members than for nd6_prefix members.
 1.54  17-May-2014  rmind branches: 1.54.2;
Replace open-coded access (and boundary checking) of ifindex2ifnet with
if_byindex() function.
 1.53  25-Jun-2012  christos branches: 1.53.2; 1.53.4; 1.53.12;
rename rfc6056 -> portalgo, requested by yamt
 1.52  24-Sep-2011  christos branches: 1.52.2;
Add inet6 part of the rfc6056 code contributed by Vlad Balan as part of
Google SoC-2011
 1.51  17-May-2011  dholland Add missing $NetBSD$ header.
 1.50  03-May-2011  dyoung Reduces the resources demanded by TCP sessions in TIME_WAIT-state using
methods called Vestigial Time-Wait (VTW) and Maximum Segment Lifetime
Truncation (MSLT).

MSLT and VTW were contributed by Coyote Point Systems, Inc.

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

Maximum Segment Lifetimes Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
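
As a rough sketch of the classification just described (not the contributed implementation), the class selection and default MSLs could be expressed like this. The enum and function names are hypothetical, and same_host and same_link are assumed to be computed elsewhere. The 2/10/60 second defaults are the ones quoted above.

```c
#include <assert.h>

/* Hypothetical names for the three MSLT classes. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Default MSLs in seconds, as quoted in the commit message. */
static int
mslt_default_msl(enum mslt_class c)
{
	switch (c) {
	case MSLT_LOOPBACK:
		return 2;
	case MSLT_LOCAL:
		return 10;
	case MSLT_REMOTE:
		return 60;
	}
	assert(!"unknown MSLT class");
	return 60;
}

/*
 * Classify a session by nearness of the peer. same_host and same_link are
 * assumed inputs (e.g. derived from address comparison and link prefixes).
 */
static enum mslt_class
mslt_classify(int same_host, int same_link)
{
	if (same_host)
		return MSLT_LOOPBACK;
	if (same_link)
		return MSLT_LOCAL;
	return MSLT_REMOTE;
}
```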

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hashtable come from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
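
The idea of linking pool elements by a narrow index rather than a pointer can be illustrated with the toy hash table below. All names and sizes here are hypothetical and chosen for the example. Eviction of the oldest entries and the cacheline-aware layout of the real design are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define VPOOL_SIZE 8		/* tiny fixed-size pool for illustration */
#define VHASH_SIZE 4
#define VIDX_NONE  UINT16_MAX	/* plays the role of a NULL pointer */

/* A vestigial entry: the chain link is a 16-bit pool index. */
struct ventry {
	uint32_t key;		/* stands in for the connection identity */
	uint16_t next;		/* index of next entry in the bucket */
};

static struct ventry vpool[VPOOL_SIZE];
static uint16_t vhash[VHASH_SIZE];	/* bucket heads, as indices */
static uint16_t vused;

static void
vtw_init(void)
{
	for (int i = 0; i < VHASH_SIZE; i++)
		vhash[i] = VIDX_NONE;
	vused = 0;
}

/* Take the next free pool slot and push it onto its bucket's chain. */
static uint16_t
vtw_insert(uint32_t key)
{
	if (vused == VPOOL_SIZE)
		return VIDX_NONE;	/* a real design would evict here */
	uint16_t idx = vused++;
	uint16_t b = (uint16_t)(key % VHASH_SIZE);
	vpool[idx].key = key;
	vpool[idx].next = vhash[b];
	vhash[b] = idx;
	return idx;
}

/* Walk the bucket chain by index; returns VIDX_NONE if absent. */
static uint16_t
vtw_lookup(uint32_t key)
{
	for (uint16_t i = vhash[key % VHASH_SIZE]; i != VIDX_NONE;
	    i = vpool[i].next) {
		assert(i < VPOOL_SIZE);
		if (vpool[i].key == key)
			return i;
	}
	return VIDX_NONE;
}
```

Each link costs two bytes instead of a full pointer, which is where the memory saving mentioned above comes from.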

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.
 1.49  25-May-2009  pooka branches: 1.49.4; 1.49.6;
Remove declaration of unused extern struct ifnet loif[NLOOP], which
was already removed once, but brought back in a wholesale import.
While here, mop up the #ifdef __SomeotherOS__ noise.
 1.48  12-May-2009  elad Implicit EPERM -> explicit EACCES.

Requested by ad@ and yamt@.
 1.47  30-Apr-2009  elad Commit changes to netinet6/in6_src.c, forgot in previous commit:

http://mail-index.netbsd.org/source-changes/2009/04/30/msg220547.html

Make in_pcbsetport() set the port number selected before passing "sin" to
kauth(9).
 1.46  18-Mar-2009  cegger bzero -> memset
 1.45  11-Jan-2009  christos branches: 1.45.2;
merge christos-time_t
 1.44  17-Dec-2008  cegger kill MALLOC and FREE macros.
 1.43  15-Apr-2008  thorpej branches: 1.43.4; 1.43.12;
Make ip6 and icmp6 stats per-cpu.
 1.42  08-Apr-2008  thorpej Change IPv6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old ip6stat structure; old netstat
binaries will continue to work properly.
 1.41  27-Feb-2008  matt branches: 1.41.2;
Convert to ansi definitions from old-style definitions.
Remember that func() is not ansi, func(void) is.
 1.40  26-Nov-2007  yamt branches: 1.40.10; 1.40.14;
in6_pcbsetport: add missing htons. (fixes ephemeral port allocation.)
 1.39  24-Oct-2007  dyoung branches: 1.39.2;
Replace rote sockaddr_in6 initializations (memset(), set sa6_family,
sa6_len, and sa6_add) with sockaddr_in6_init() calls.

De-__P(). Constify. KNF. Shorten a staircase. Change bcmp() to
memcmp().

Extract subroutine in6_setzoneid() from in6_setscope(), for re-use
soon.
 1.38  23-May-2007  christos branches: 1.38.6; 1.38.8; 1.38.12;
Ansify + add a few comments, from Karl Sjödahl
 1.37  02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principal benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argument for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
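
Item 4 can be pictured with the small sketch below. It is a simplified model under stated assumptions: every type and function with a _sketch suffix and rtcache_register() are hypothetical, the list is a plain singly linked list, and the function takes the domain directly rather than looking it up by address family as rtflushall(sa_family_t af) does.

```c
#include <assert.h>
#include <stddef.h>

struct rtentry_sketch { int rt_dummy; };

/* A route cache that sits on its domain's list of live caches. */
struct route_sketch {
	struct rtentry_sketch *ro_rt;	/* cached route, or NULL */
	struct route_sketch *ro_next;	/* next cache in the domain */
};

struct domain_sketch {
	int dom_family;
	struct route_sketch *dom_rtcache;	/* head of the cache list */
};

/* Put a cache on its domain's list (hypothetical helper). */
static void
rtcache_register(struct domain_sketch *dom, struct route_sketch *ro)
{
	assert(dom != NULL && ro != NULL);
	ro->ro_next = dom->dom_rtcache;
	dom->dom_rtcache = ro;
}

/* Invalidate every cache in the domain; returns how many were cleared. */
static int
rtflushall_sketch(struct domain_sketch *dom)
{
	int n = 0;

	for (struct route_sketch *ro = dom->dom_rtcache; ro != NULL;
	    ro = ro->ro_next) {
		if (ro->ro_rt != NULL) {
			ro->ro_rt = NULL;
			n++;
		}
	}
	return n;
}
```

After a flush, each user of a cache sees ro_rt == NULL and has to look the route up again, which is how stale routes stop being used.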
 1.36  04-Mar-2007  christos branches: 1.36.2; 1.36.4;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.35  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.34  04-Jan-2007  elad branches: 1.34.2;
Consistent usage of KAUTH_GENERIC_ISSUSER.
 1.33  15-Dec-2006  joerg Introduce new helper functions to abstract the route caching.
rtcache_init and rtcache_init_noclone lookup ro_dst and store
the result in ro_rt, taking care of the reference counting and
calling the domain specific route cache.
rtcache_free checks if a route was cached and frees the reference.
rtcache_copy copies ro_dst of the given struct route, checking that
enough space is available and incrementing the reference count of the
cached rtentry if necessary.
rtcache_check validates that the cached route is still up. If it isn't,
it tries to look it up again. Afterwards ro_rt is either valid again
or NULL.
rtcache_copy is used internally.

Adjust to callers of rtalloc/rtflush in the tree to check the sanity of
ro_dst first (if necessary). If it doesn't fit the expectations, free
the cache, otherwise check if the cached route is still valid. After
that combination, a single check for ro_rt == NULL is enough to decide
whether a new lookup needs to be done with a different ro_dst.
Make the route checking in gre stricter by repeating the loop check
after revalidation.
Remove some unused RADIX_MPATH code in in6_src.c. The logic is slightly
changed here to first validate the route and check RTF_GATEWAY
afterwards. This is semantically equivalent though.
etherip doesn't need sc_route_expire similar to the gif changes from
dyoung@ earlier.

Based on the earlier patch from dyoung@, reviewed and discussed with
him.
 1.32  09-Dec-2006  dyoung Here are various changes designed to protect against bad IPv4
routing caused by stale route caches (struct route). Route caches
are sprinkled throughout PCBs, the IP fast-forwarding table, and
IP tunnel interfaces (gre, gif, stf).

Stale IPv6 and ISO route caches will be treated by separate patches.

Thank you to Christoph Badura for suggesting the general approach
to invalidating route caches that I take here.

Here are the details:

Add hooks to struct domain for tracking and for invalidating each
domain's route caches: dom_rtcache, dom_rtflush, and dom_rtflushall.

Introduce helper subroutines, rtflush(ro) for invalidating a route
cache, rtflushall(family) for invalidating all route caches in a
routing domain, and rtcache(ro) for notifying the domain of a new
cached route.

Chain together all IPv4 route caches where ro_rt != NULL. Provide
in_rtcache() for adding a route to the chain. Provide in_rtflush()
and in_rtflushall() for invalidating IPv4 route caches. In
in_rtflush(), set ro_rt to NULL, and remove the route from the
chain. In in_rtflushall(), walk the chain and remove every route
cache.

In rtrequest1(), call rtflushall() to invalidate route caches when
a route is added.

In gif(4), discard the workaround for stale caches that involves
expiring them every so often.

Replace the pattern 'RTFREE(ro->ro_rt); ro->ro_rt = NULL;' with a
call to rtflush(ro).

Update ipflow_fastforward() and all other users of route caches so
that they expect a cached route, ro->ro_rt, to turn to NULL.

Take care when moving a 'struct route' to rtflush() the source and
to rtcache() the destination.

In domain initializers, use .dom_xxx tags.

KNF here and there.
 1.31  02-Dec-2006  dyoung Use the queue(3) macros instead of open-coding them. Shorten
staircases. Remove unnecessary casts. Where appropriate, s/8/NBBY/.
De-__P(). KNF.

No functional changes intended.
 1.30  16-Nov-2006  christos branches: 1.30.2; 1.30.4;
__unused removal on arguments; approved by core.
 1.29  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.28  01-Sep-2006  dyoung branches: 1.28.2; 1.28.4;
Restore historical kernel behavior: let an application bind(2) an
IPv6 interface address (e.g., sin6_addr fe80::200:24ff:fec3:4bac
sin6_scope_id 1), set a multicast interface with
setsockopt(,IPPROTO_IPV6,IPV6_MULTICAST_IF,), and sendto(2) multicast
destinations with "wildcard" scope ID, 0, without error EHOSTUNREACH.

Prior to this patch, sendto(2) would exit with EHOSTUNREACH, even
though the scope ID was unambiguously specified both by bind(2)
and setsockopt(2). This was a bug because it broke old applications.

Thanks JINMEI Tatuya for the patch!
 1.27  23-Jul-2006  ad branches: 1.27.2;
Use the LWP cached credentials where sane.
 1.26  14-May-2006  elad integrate kauth.
 1.25  05-May-2006  rpaulo Add support for RFC 3542 Adv. Socket API for IPv6 (which obsoletes 2292).
* RFC 3542 isn't binary compatible with RFC 2292.
* RFC 2292 support is on by default but can be disabled.
* update ping6, telnet and traceroute6 to the new API.

From the KAME project (www.kame.net).
Reviewed by core.
 1.24  15-Apr-2006  christos Coverity CID 607: Remove bogus test.
 1.23  21-Jan-2006  rpaulo branches: 1.23.2; 1.23.4; 1.23.6; 1.23.8; 1.23.10;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.22  11-Dec-2005  christos branches: 1.22.2;
merge ktrace-lwp.
 1.21  29-May-2005  christos branches: 1.21.2;
- avoid shadowed variables
- sprinkle const.
 1.20  01-Feb-2005  drochner branches: 1.20.4;
sin6_scope_id maps to interface indices for link local addresses only!
(unlikely to be used with other scopes for now, but we should be
correct anyway)
 1.19  04-Dec-2004  peter branches: 1.19.4; 1.19.6;
Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.18  10-Dec-2003  itojun use if_indexlim (instead of if_index) and ifindex2ifnet[x] != NULL
to check if interface exists, as (1) if_index has different meaning
(2) ifindex2ifnet could become NULL when interface gets destroyed,
since when we have introduced dynamically-created interfaces. from kame
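
A minimal sketch of the existence check described here, with hypothetical names (struct ifnet_sketch, IF_INDEXLIM, ifindex2ifnet_sketch, if_lookup_sketch) standing in for the kernel's own: the index is bounded by the table limit, and the slot itself must be non-NULL because an interface may have been destroyed.

```c
#include <assert.h>
#include <stddef.h>

struct ifnet_sketch { const char *if_xname; };

#define IF_INDEXLIM 4	/* size of the index table in this sketch */

/* Slots become NULL again when a dynamically created interface goes away. */
static struct ifnet_sketch *ifindex2ifnet_sketch[IF_INDEXLIM];

/* Return the interface for idx, or NULL if out of range or destroyed. */
static struct ifnet_sketch *
if_lookup_sketch(unsigned int idx)
{
	if (idx >= IF_INDEXLIM)			/* bound by the table size */
		return NULL;
	return ifindex2ifnet_sketch[idx];	/* may be NULL: destroyed */
}
```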
 1.17  04-Sep-2003  itojun revamp inpcb/in6pcb so that they are more aligned with each other.
in6pcb lookup now uses hash(9).
 1.16  07-Aug-2003  agc Move UCB-licensed code from 4-clause to 3-clause licence.

Patches provided by Joel Baker in PR 22364, verified by myself.
 1.15  11-Sep-2002  itojun branches: 1.15.6;
KNF - return is not a function. sync w/kame.
 1.14  26-Aug-2002  itojun pass proc * to in6_pcbsetport. PR 18073
 1.13  08-Jun-2002  itojun whitespace cleanup
 1.12  29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.11  29-May-2002  itojun rm obsolete comment
 1.10  22-Jan-2002  itojun branches: 1.10.8; 1.10.10;
make sure to check address family on route cache. with IPv4 mapped
address we can see both AF_INET/INET6.
 1.9  13-Nov-2001  lukem add RCSIDs
 1.8  16-Oct-2001  itojun more whitespace/comment sync with kame
 1.7  06-Jun-2001  mrg branches: 1.7.2;
fix a IPNOPRIVPORTS unused variable botch. noted by proff.
 1.6  30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.5  08-Feb-2001  itojun branches: 1.5.2;
move in6_{embed,recover}scope prototypes to in6_var.h (kernel only).
add in6_clearscope. sync with kame
 1.4  26-Aug-2000  itojun branches: 1.4.2;
implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.
 1.3  26-Aug-2000  itojun add missing IPNOPRIVPORTS case
 1.2  07-Jul-2000  itojun sync with kame.
introduce in6_{recover,embed}scope, for in-kernel scoped-address manipulation.
improve in6_pcbnotify.
 1.1  03-Jun-2000  itojun branches: 1.1.2; 1.1.4;
sync with kame.
- use latest source address selection code - in6_src.c.
- correct frag header insertion.
- deep copy ip6 header portion in ip6_mloopback to avoid overwrite.
- do not bark when we forward packet to loopback.
- some cosmetics.
 1.1.4.2  22-Jun-2000  minoura Sync w/ netbsd-1-5-base.
 1.1.4.1  03-Jun-2000  minoura file in6_src.c was added on branch minoura-xpg4dl on 2000-06-22 17:09:58 +0000
 1.1.2.2  27-Aug-2000  itojun pullup (approved by releng-1-5)

> implement net.inet6.ip6.{anon,low}port{min,max} sysctl variable.

> cvs rdiff -r1.67 -r1.68 basesrc/lib/libc/gen/sysctl.3
> cvs rdiff -r1.53 -r1.54 basesrc/sbin/sysctl/sysctl.8
> cvs rdiff -r1.18 -r1.19 syssrc/sys/netinet6/in6.h
> cvs rdiff -r1.29 -r1.30 syssrc/sys/netinet6/in6_pcb.c
> cvs rdiff -r1.3 -r1.4 syssrc/sys/netinet6/in6_src.c
> cvs rdiff -r1.25 -r1.26 syssrc/sys/netinet6/ip6_input.c
> cvs rdiff -r1.14 -r1.15 syssrc/sys/netinet6/ip6_var.h
 1.1.2.1  27-Aug-2000  itojun pullup 1.2 -> 1.3 (approved by releng-1-5)

> add missing IPNOPRIVPORTS case
 1.4.2.4  21-Apr-2001  bouyer Sync with HEAD
 1.4.2.3  11-Feb-2001  bouyer Sync with HEAD.
 1.4.2.2  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.4.2.1  26-Aug-2000  bouyer file in6_src.c was added on branch thorpej_scsipi on 2000-11-20 18:10:50 +0000
 1.5.2.13  17-Sep-2002  nathanw Catch up to -current.
 1.5.2.12  27-Aug-2002  nathanw Catch up to -current.
 1.5.2.11  15-Jul-2002  nathanw Whitespace.
 1.5.2.10  12-Jul-2002  nathanw No longer need to pull in lwp.h; proc.h pulls it in for us.
 1.5.2.9  24-Jun-2002  nathanw Curproc->curlwp renaming.

Change uses of "curproc->l_proc" back to "curproc", which is more like the
original use. Bare uses of "curproc" are now "curlwp".

"curproc" is now #defined in proc.h as ((curlwp) ? (curlwp)->l_proc : NULL)
so that it is always safe to reference curproc (*de*referencing curproc
is another story, but that's always been true).
 1.5.2.8  20-Jun-2002  nathanw Catch up to -current.
 1.5.2.7  28-Feb-2002  nathanw Catch up to -current.
 1.5.2.6  14-Nov-2001  nathanw Catch up to -current.
 1.5.2.5  22-Oct-2001  nathanw Catch up to -current.
 1.5.2.4  21-Jun-2001  nathanw Catch up to -current.
 1.5.2.3  09-Apr-2001  nathanw Catch up with -current.
 1.5.2.2  13-Mar-2001  nathanw Be more careful not to dereference curproc when there might not be
a process context.
 1.5.2.1  05-Mar-2001  nathanw Initial commit of scheduler activations and lightweight process support.
 1.7.2.5  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.7.2.4  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.7.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.7.2.2  11-Feb-2002  jdolecek Sync w/ -current.
 1.7.2.1  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.10.10.1  27-Aug-2002  lukem Pull up revision 1.14 (requested by itojun in ticket #731):
pass proc * to in6_pcbsetport. PR 18073
 1.10.8.3  29-Aug-2002  gehenna catch up with -current.
 1.10.8.2  20-Jun-2002  gehenna catch up with -current.
 1.10.8.1  30-May-2002  gehenna Catch up with -current.
 1.15.6.6  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.15.6.5  04-Feb-2005  skrll Sync with HEAD.
 1.15.6.4  18-Dec-2004  skrll Sync with HEAD.
 1.15.6.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.15.6.2  18-Sep-2004  skrll Sync with HEAD.
 1.15.6.1  03-Aug-2004  skrll Sync with HEAD
 1.19.6.1  12-Feb-2005  yamt sync with head.
 1.19.4.1  29-Apr-2005  kent sync with -current
 1.20.4.1  02-Dec-2007  bouyer Pull up following revision(s) (requested by yamt in ticket #1881):
sys/netinet6/in6_src.c: revision 1.40 via patch
in6_pcbsetport: add missing htons. (fixes ephemeral port allocation.)
 1.21.2.7  17-Mar-2008  yamt sync with head.
 1.21.2.6  07-Dec-2007  yamt sync with head
 1.21.2.5  27-Oct-2007  yamt sync with head.
 1.21.2.4  03-Sep-2007  yamt sync with head.
 1.21.2.3  26-Feb-2007  yamt sync with head.
 1.21.2.2  30-Dec-2006  yamt sync with head.
 1.21.2.1  21-Jun-2006  yamt sync with head.
 1.22.2.1  01-Feb-2006  yamt sync with head.
 1.23.10.1  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.23.8.5  11-May-2006  elad sync with head
 1.23.8.4  06-May-2006  christos - Move kauth_cred_t declaration to <sys/types.h>
- Cleanup struct ucred; forward declarations that are unused.
- Don't include <sys/kauth.h> in any header, but include it in the c files
that need it.

Approved by core.
 1.23.8.3  19-Apr-2006  elad sync with head.
 1.23.8.2  10-Mar-2006  elad generic_authorize() -> kauth_authorize_generic().
 1.23.8.1  08-Mar-2006  elad Adapt to kernel authorization KPI.
 1.23.6.3  03-Sep-2006  yamt sync with head.
 1.23.6.2  11-Aug-2006  yamt sync with head
 1.23.6.1  24-May-2006  yamt sync with head.
 1.23.4.2  01-Jun-2006  kardel Sync with head.
 1.23.4.1  22-Apr-2006  simonb Sync with head.
 1.23.2.4  09-Sep-2006  rpaulo sync with head
 1.23.2.3  23-Feb-2006  rpaulo Last in6pcb in in6_selecthlim().
 1.23.2.2  14-Feb-2006  rpaulo Replace in6pcb with inpcb and IN6P_BOUND with INP_BOUND.
 1.23.2.1  07-Feb-2006  rpaulo remove in6_pcb.h and include in_pcb.h.
 1.27.2.1  03-Sep-2006  riz Pull up following revision(s) (requested by rpaulo in ticket #106):
sys/netinet6/in6_src.c: revision 1.28
Restore historical kernel behavior: let an application bind(2) an
IPv6 interface address (e.g., sin6_addr fe80::200:24ff:fec3:4bac
sin6_scope_id 1), set a multicast interface with
setsockopt(,IPPROTO_IPV6,IPV6_MULTICAST_IF,), and sendto(2) multicast
destinations with "wildcard" scope ID, 0, without error EHOSTUNREACH.
Prior to this patch, sendto(2) would exit with EHOSTUNREACH, even
though the scope ID was unambiguously specified both by bind(2)
and setsockopt(2). This was a bug because it broke old applications.
Thanks JINMEI Tatuya for the patch!
 1.28.4.3  18-Dec-2006  yamt sync with head.
 1.28.4.2  10-Dec-2006  yamt sync with head.
 1.28.4.1  22-Oct-2006  yamt sync with head
 1.28.2.2  12-Jan-2007  ad Sync with head.
 1.28.2.1  18-Nov-2006  ad Sync with head.
 1.30.4.1  03-Jun-2008  skrll Sync with netbsd-4.
 1.30.2.1  01-Feb-2008  riz Pull up following revision(s) (requested by yamt in ticket #1006):
sys/netinet6/in6_src.c: revision 1.40
in6_pcbsetport: add missing htons. (fixes ephemeral port allocation.)
 1.34.2.3  07-May-2007  yamt sync with head.
 1.34.2.2  12-Mar-2007  rmind Sync with HEAD.
 1.34.2.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.36.4.1  11-Jul-2007  mjf Sync with head.
 1.36.2.1  08-Jun-2007  ad Sync with head.
 1.38.12.1  13-Nov-2007  bouyer Sync with HEAD
 1.38.8.3  23-Mar-2008  matt sync with HEAD
 1.38.8.2  09-Jan-2008  matt sync with HEAD
 1.38.8.1  06-Nov-2007  matt sync with HEAD
 1.38.6.2  27-Nov-2007  joerg Sync with HEAD. amd64 Xen support needs testing.
 1.38.6.1  26-Oct-2007  joerg Sync with HEAD.

Follow the merge of pmap.c on i386 and amd64 and move
pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup
code to restore CR4 before jumping back into kernel space as the large
page option might cover that.
 1.39.2.1  08-Dec-2007  mjf Sync with HEAD.
 1.40.14.3  17-Jan-2009  mjf Sync with HEAD.
 1.40.14.2  02-Jun-2008  mjf Sync with HEAD.
 1.40.14.1  03-Apr-2008  mjf Sync with HEAD.
 1.40.10.2  24-Mar-2008  keiichi sync with head.
 1.40.10.1  22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.41.2.3  27-Dec-2008  christos merge with head.
 1.41.2.2  01-Nov-2008  christos Sync with head.
 1.41.2.1  29-Mar-2008  christos Welcome to the time_t=long long dev_t=uint64_t branch.
 1.43.12.2  28-Apr-2009  skrll Sync with HEAD.
 1.43.12.1  19-Jan-2009  skrll Sync with HEAD.
 1.43.4.3  20-Jun-2009  yamt sync with head
 1.43.4.2  16-May-2009  yamt sync with head
 1.43.4.1  04-May-2009  yamt sync with head.
 1.45.2.2  23-Jul-2009  jym Sync with HEAD.
 1.45.2.1  13-May-2009  jym Sync with HEAD.

Commit is split, to avoid a "too many arguments" protocol error.
 1.49.6.1  06-Jun-2011  jruoho Sync with HEAD.
 1.49.4.1  31-May-2011  rmind sync with head
 1.52.2.1  30-Oct-2012  yamt sync with head
 1.53.12.1  10-Aug-2014  tls Rebase.
 1.53.4.2  18-May-2014  rmind sync with head
 1.53.4.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.53.2.2  03-Dec-2017  jdolecek update from HEAD
 1.53.2.1  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.54.2.1  23-Jan-2015  martin Pull up following revision(s) (requested by pettai in ticket #441):
sys/netinet6/ip6_var.h: revision 1.64
sys/netinet6/in6.h: revision 1.82
sys/netinet6/in6_src.c: revision 1.56
sys/netinet6/mld6.c: revision 1.62
sys/netinet6/ip6_input.c: revision 1.150
sys/netinet6/ip6_output.c: revision 1.161
Add net.inet6.ip6.prefer_tempaddr sysctl knob so that we can prefer
IPv6 temporary addresses as the source address.
Fixes PR kern/47100 based on a patch by Dieter Roelants.
 1.55.2.10  28-Aug-2017  skrll Sync with HEAD
 1.55.2.9  05-Feb-2017  skrll Sync with HEAD
 1.55.2.8  05-Dec-2016  skrll Sync with HEAD
 1.55.2.7  05-Oct-2016  skrll Sync with HEAD
 1.55.2.6  09-Jul-2016  skrll Sync with HEAD
 1.55.2.5  29-May-2016  skrll Sync with HEAD
 1.55.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.55.2.3  22-Sep-2015  skrll Sync with HEAD
 1.55.2.2  06-Jun-2015  skrll Sync with HEAD
 1.55.2.1  06-Apr-2015  skrll Sync with HEAD
 1.63.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.63.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.63.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.63.2.2  06-Aug-2016  pgoyette Sync with HEAD
 1.63.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.76.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.79.6.4  04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1883):

sys/netinet6/in6_src.c: revision 1.92

in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
 1.79.6.3  11-Aug-2021  martin Pull up following revision(s) (requested by kardel in ticket #1690):

sys/netinet6/in6_src.c: revision 1.88

PR kern/56348

MTU discovery fails with IPv6 sockets bound to IPv4 mapped address
pick up the IPv4 route for IPv4 mapped IPv6 address to get the correct
MTU and not any unrelated/inappropriate MTU from IPv6 routes. IPv4 mapped
IPv6 addresses are always handled by the IPv4 stack and MTU discovery
is solely handled with the IPv4 routing table.
 1.79.6.2  10-Dec-2017  snj Pull up following revision(s) (requested by roy in ticket #390):
sys/netinet/ip_input.c: 1.363
sys/netinet6/ip6_input.c: 1.184-1.185
sys/netinet6/ip6_output.c: 1.194-1.195
sys/netinet6/in6_src.c: 1.83-1.84
Allow local communication over DETACHED addresses.
Allow binding to DETACHED or TENTATIVE addresses as we deny
sending upstream from them anyway.
Prefer non DETACHED or TENTATIVE addresses.
--
Attempt to restore v6 networking. Not 100% certain that these
changes are all that is needed, but they're certainly a big part of it
(especially the ip6_input.c change.)
--
Treat unvalidated addresses as deprecated in rule 3.
 1.79.6.1  31-Aug-2017  martin Pull up following revision(s) (requested by christos in ticket #243):
sys/netinet6/in6_src.c: revision 1.80
PR/52382: BERTRAND Joel: Fix mapped IPv4 source selection; this got broken
in the last code refactoring. in6_selectif failing is not fatal.
XXX: pullup-8
 1.84.2.1  02-May-2018  pgoyette Synch with HEAD
 1.85.6.2  04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #1706):

sys/netinet6/in6_src.c: revision 1.92

in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
 1.85.6.1  11-Aug-2021  martin Pull up following revision(s) (requested by kardel in ticket #1332):

sys/netinet6/in6_src.c: revision 1.88

PR kern/56348

MTU discovery fails with IPv6 sockets bound to IPv4 mapped address
pick up the IPv4 route for IPv4 mapped IPv6 address to get the correct
MTU and not any unrelated/inappropriate MTU from IPv6 routes. IPv4 mapped
IPv6 addresses are always handled by the IPv4 stack and MTU discovery
is solely handled with the IPv4 routing table.
 1.85.2.1  13-Apr-2020  martin Mostly merge changes from HEAD up to 20200411
 1.91.2.1  04-Aug-2023  martin Pull up following revision(s) (requested by ozaki-r in ticket #309):

sys/netinet6/in6_src.c: revision 1.92

in6: add missing rtcache_unref to in6_selectroute

By default, this issue is harmless. However, if NET_MPSAFE
is enabled, it could eventually lead to a kernel panic.
