Home | History | Annotate | Download | only in netinet6
History log of /src/sys/netinet6/nd6.c
RevisionDateAuthorComments
 1.284  05-Jun-2025  ozaki-r Apply in6ifa_first_lladdr() and in6ifa_first_lladdr_psref()
 1.283  31-Mar-2025  ozaki-r nd6: send packets through the fast path even if DELAY and PROBE

If there is a valid ND cache, we can send packets for the destination
of the cache. If the state of the cache is STALE, we need to go
through the slow path to change its state. In the other cases
including the DELAY and PROBE states, we can send packets through
the fast path.
 1.282  11-Apr-2024  knakahara Fix invalid IPv6 route when ipsecif(4) is deleted tunnel. Pointed out by ohishi@IIJ.

The pointed bug is fixed by modification in nd6_need_cache().
Others are similar bugs.

XXX pullup-9, 10
 1.281  09-Dec-2023  pgoyette Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

XXX pullup-10 - hopefully before RC2
 1.280  11-Oct-2023  msaitoh s/Neighour/Neighbor/ in comment. No functional change.
 1.279  01-Sep-2022  riastradh branches: 1.279.4;
nd6: Take ifnet psref around cprng_fast in nd6_slowtimo.

This may sleep on an adpative mutex, the global entropy lock, so
pserialize is forbidden.
 1.278  31-Dec-2021  andvar s/quetion/question/
 1.277  17-Aug-2021  ozaki-r nd6: prevent ln from being freed while releasing held packets
 1.276  28-Dec-2020  nia Add more guards against NULL deref, since KUBSAN still complains.
 1.275  26-Dec-2020  nia Avoid NULL pointer dereference, noticed by KUBSAN.

"Looks fine" roy@
 1.274  15-Sep-2020  roy branches: 1.274.2;
Implement RFC 7048, making Neighbor Unreachability Detection less impatient

RFC 7048 Section 3 says in the UNREACHABLE state packets continue to be
sent to the link-layer address and then backoff exponentially.
We adjust this slightly and move to the INCOMPLETE state after
`nd_mmaxtries` probes and then start backing off.

This results in simpler code whilst providing a more robust model which
doubles the time to failure over what we did before.
We don't want to be back to the old ARP model where no unreachability
errors are returned because very few applications would look at
unreachability hints provided such as ND_LLINFO_UNREACHABLE or RTM_MISS.
 1.273  14-Sep-2020  roy nd: Name l3addr union of llentry and use in-place of nd_addr.

Probably makes more sense and makes nd.h less messy.
 1.272  11-Sep-2020  roy inet6: Use generic Neighor Detection rather than IPv6 specific

No functional change intended.
 1.271  12-Jun-2020  roy Remove in-kernel handling of Router Advertisements

This is much better handled by a user-land tool.
Proposed on tech-net here:
https://mail-index.netbsd.org/tech-net/2020/04/22/msg007766.html

Note that the ioctl SIOCGIFINFO_IN6 no longer sets flags. That now
needs to be done using the pre-existing SIOCSIFINFO_FLAGS ioctl.

Compat is fully provided where it makes sense, but trying to turn on
RA handling will obviously throw an error as it no longer exists.

Note that if you use IPv6 temporary addresses, this now needs to be
turned on in dhcpcd.conf(5) rather than in sysctl.conf(5).
 1.270  28-Apr-2020  roy inet6: Ensure that route MTU is guarded by ARC_PHDS_MAXMTU

This mirrors the ARP behavior for ARCnet interfaces based on current
kernel RA handling.
 1.269  12-Apr-2020  roy nd6: RTM_MISS reports RTA_AUTHOR once more

Just moves the logic to send RTM_MISS after the ICMP6 report as we
rely on that function to extract the requesting address.

Fixes PR kern/55164.
 1.268  03-Apr-2020  christos branches: 1.268.2;
PR/55030: Avoid locking against myself panic by moving the icmp error outside
the lock. Thanks ozaki-r!
 1.267  09-Mar-2020  roy route: RTM_MISS now puts the message source address in RTA_AUTHOR

route(8) also reports this.
A userland app could use this to blacklist nodes who probe for machines
that doesn't exist on a subnet / prefix.
 1.266  20-Jan-2020  thorpej Remove FDDI support.
 1.265  25-Sep-2019  ozaki-r branches: 1.265.2;
Make panic messages more informative
 1.264  25-Sep-2019  ozaki-r Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.263  01-Sep-2019  roy inet6: Re-introduce ND6_LLINFO_WAITDELETE so we can return EHOSTDOWN

Once we've sent nd6_mmaxtries NS messages, send RTM_MISS and move to the
ND6_LLINFO_WAITDELETE state rather than freeing the llentry right away.
Wait for a probe cycle and then free the llentry.

If a connection attempts to re-use the llentry during ND6_LLINFO_WAITDELETE,
return EHOSTDOWN (or EHOSTUNREACH if a gateway) to match inet behaviour.
Continue to ND6_LLINFO_INCOMPLETE and send another NS probe in hope of a
reply. Rinse and repeat.

This reverts part of nd6.c r1.14 - an 18 year old commit!
 1.262  01-Sep-2019  roy inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.
This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in less messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted (RTM_DELETED)
or has failed to been resolved (RTM_MISS). The latter case can be
interpreted as unreachable.
 1.261  31-Aug-2019  roy inet6: don't set an invalid lladdr in nd6_free()

We don't want to announce that we've deleted a hwaddr of all zeros.
 1.260  27-Aug-2019  roy inet6: nd6_free assumes all routers are processed by kernel RA

This hasn't been the case for a long time if you're a dhcpcd
user with a default config. As such, it's possible for the default
IPv6 router as set by dhcpcd could be erroneously gc'ed by nd6_free.

This reduces the scope of the ND6_WLOCK taken as well as fixing an
issue where we write to ln->ln_state without a lock being held.
 1.259  22-Aug-2019  roy nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.258  22-Aug-2019  roy rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9
 1.257  14-Aug-2019  ozaki-r Add missing IFNET_LOCK for regen_tmpaddr

Reported by ryo@
 1.256  26-Jul-2019  christos branches: 1.256.2;
Decrease the reference count before freeing, so that the entries actually
get free'd. (Ryota Ozaki)
 1.255  28-Jun-2019  ozaki-r nd6: restore a missing reachability confirmation

On sending a packet over a STALE cache, the cache should be tried a reachability
confirmation, which is described in RFC 2461/4861 7.3.3. On the fast path in
nd6_resolve, however, the treatment for STALE caches has been skipped
accidentally. So STALE caches never be back to the REACHABLE state.

To fix the issue, branch to the fast path only when the cache entry is the
REACHABLE state and leave other caches to the slow path that includes the
treatment. To this end we need to allow to return a link-layer address if a
valid address is available on the slow path too, which is the same behavior as
FreeBSD and OpenBSD.
 1.254  13-May-2019  christos print the name of the interface that was disabled.
 1.253  29-Apr-2019  roy rtsock: Route address message simplification

Rename rt_newaddrmsg to rt_addrmsg_rt.
Add rt_addrmsg which drops the error and route arguments which are only
needed by one caller.
 1.252  16-Dec-2018  roy netinet6: only flush prefixes and routers for the given interface.

Unless it's lo0, where we then flush the lot.
The maintains the status-quo with ndp(8) and allows dhcpcd(8) to at least
try and work with kernel RA on one interface and dhcpcd on another.
 1.251  30-Oct-2018  ozaki-r Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.
 1.250  03-Sep-2018  riastradh Rename min/max -> uimin/uimax for better honesty.

These functions are defined on unsigned int. The generic name
min/max should not silently truncate to 32 bits on 64-bit systems.
This is purely a name change -- no functional change intended.

HOWEVER! Some subsystems have

#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

even though our standard name for that is MIN/MAX. Although these
may invite multiple evaluation bugs, these do _not_ cause integer
truncation.

To avoid `fixing' these cases, I first changed the name in libkern,
and then compile-tested every file where min/max occurred in order to
confirm that it failed -- and thus confirm that nothing shadowed
min/max -- before changing it.

I have left a handful of bootloaders that are too annoying to
compile-test, and some dead code:

cobalt ews4800mips hp300 hppa ia64 luna68k vax
acorn32/if_ie.c (not included in any kernels)
macppc/if_gm.c (superseded by gem(4))

It should be easy to fix the fallout once identified -- this way of
doing things fails safe, and the goal here, after all, is to _avoid_
silent integer truncations, not introduce them.

Maybe one day we can reintroduce min/max as type-generic things that
never silently truncate. But we should avoid doing that for a while,
so that existing code has a chance to be detected by the compiler for
conversion to uimin/uimax without changing the semantics until we can
properly audit it all. (Who knows, maybe in some cases integer
truncation is actually intended!)
 1.249  29-May-2018  ozaki-r branches: 1.249.2;
Make a deletion of in6m in nd6_rtrequest atomic
 1.248  01-May-2018  maxv Remove now unused net_osdep.h includes, the other BSDs did the same.
 1.247  06-Mar-2018  roy nd6: add a nonce to DaD probes in-case they are looped back to us

This implements RFC 7527, based a similar change in FreeBSD.
 1.246  06-Mar-2018  ozaki-r Fix reference leaks of llentry

callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricy though, we can detect it).

While here, we can remove remaining abuses of mutex_owned for softnet_lock.
 1.245  29-Jan-2018  christos branches: 1.245.2;
more cleanup (don't allow oldlenp == NULL)
 1.244  29-Jan-2018  pgoyette One more from christos@

No need to initialize fill_func
 1.243  29-Jan-2018  pgoyette More simplification, this time from ozaki-r@

No need to break after return.
 1.242  29-Jan-2018  pgoyette Simplify, from christos@
 1.241  29-Jan-2018  pgoyette Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.

Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.

Garbage-collect now-unused third parrameter in the fill_[pd]rlist()
functions.

As discussed on IRC.
OK kamil@ and christos@

XXX Needs pull-up to netbsd-8 branch.
 1.240  15-Dec-2017  ozaki-r Ensure to call if_mcast_op with holding IFNET_LOCK

Note that CARP doesn't deal with IFNET_LOCK yet.
 1.239  17-Nov-2017  ozaki-r Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch

It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.

No functional change
 1.238  10-Nov-2017  ozaki-r Use psref instead of pserialize because that code is sleepable
 1.237  10-Nov-2017  ozaki-r Fix a deadlock between a route update and lltable

It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.

If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.

A resource allocation graph of the senario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update

Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.

Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.236  05-Oct-2017  ozaki-r Add missing NULL check

PR kern/52554
 1.235  22-Jun-2017  ozaki-r Remove unused function (nd6_rem_ifa_lle)
 1.234  21-Jun-2017  ozaki-r Don't create a permanent L2 cache entry on adding an address to an interface

It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
 1.233  16-Jun-2017  ozaki-r Sending a routing message (RTM_ADD) on adding an llentry

A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.

Requested by ryo@
 1.232  01-Jun-2017  chs branches: 1.232.2;
remove checks for failure after memory allocation calls that cannot fail:

kmem_alloc() with KM_SLEEP
kmem_zalloc() with KM_SLEEP
percpu_alloc()
pserialize_create()
psref_class_create()

all of these paths include an assertion that the allocation has not failed,
so callers should not assert that again.
 1.231  01-Mar-2017  ozaki-r Restore/add some softnet_lock for nd6_rt_flush and defrouter_addreq

May help PR kern/52015
 1.230  22-Feb-2017  ozaki-r Stop using useless IN6_*_MULTI macros
 1.229  22-Feb-2017  ozaki-r Use kmem istead of malloc
 1.228  22-Feb-2017  ozaki-r Fix prefix invalidation via nd6_timer

We cannot remove a prefix there. Instead just invalidate it; the prefix
will be removed when purging an associated address. This is the same as
the original behavior.
 1.227  14-Feb-2017  ozaki-r Do ND in L2_output in the same manner as arpresolve

The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
 1.226  16-Jan-2017  christos ip6_sprintf -> IN6_PRINT so that we pass the size.
 1.225  16-Jan-2017  ryo Make ip6_sprintf(), in_fmtaddr(), lla_snprintf() and icmp6_redirect_diag() mpsafe.

Reviewed by ozaki-r@
 1.224  11-Jan-2017  ozaki-r branches: 1.224.2;
Get rid of unnecessary header inclusions
 1.223  22-Dec-2016  ozaki-r Remove assertion that the lock isn't held

It's useless in this case, because without it we can know that
the lock is held or not on a next lock acquisition and even more
if LOCKDEBUG is enabled a failure on the acquisition will provide
useful information for debugging while an assertion failure will
provide just the fact that the assertion failed.
 1.222  21-Dec-2016  ozaki-r Fix deadlock between llentry timers and destruction of llentry

llentry timer (of nd6) holds both llentry's lock and softnet_lock.
A caller also holds them and calls callout_halt to wait for the
timer to quit. However we can pass only one lock to callout_halt,
so passing either of them can cause a deadlock. Fix it by avoid
calling callout_halt without holding llentry's lock.

BTW in the first place we cannot pass llentry's lock to callout_halt
because it's a rwlock...
 1.221  21-Dec-2016  ozaki-r Hold the big locks only where they are needed
 1.220  19-Dec-2016  ozaki-r Protect IPv6 default router and prefix lists with coarse-grained rwlock

in6_purgeaddr (in6_unlink_ifa) itself unrefernces a prefix entry and calls
nd6_prelist_remove if the counter becomes 0, so callers doesn't need to
handle the reference counting.

Performance-sensitive paths (sending/forwarding packets) call just one
reader lock. This is a trade-off between performance impact vs. the amount
of efforts; if we want to remove the reader lock, we need huge amount of
works including destroying objects with psz/psref in softint, for example.
 1.219  19-Dec-2016  ozaki-r Kill pr->ndpr_refcnt = 0

The reference counter represents the numuber of references from IPv6
addresses to a prefix entry. If all IPv6 addresses assigned to an
interface are purged, all references to a prefix for the interface are
also released. For now nd6_purge is always called after purging all IPv6
addresses, so we can get rid of clearing pr->ndpr_refcnt from nd6_purge
and instead we can assert it's 0 there.

Note that nd6_ifdetach is only called via dom_ifdetach when processing
if_detach where dom_ifdetach is called after pr_purgeif that eventually
calls in6_ifdetach. So in the call path nd6_purge in nd6_ifdetach does
nothing. That said, we should explicitly make it sure to purge all
IPv6 addresses before nd6_purge for future changes (or the case I missed
something). So if_purgeaddrs is added to nd6_ifdetach.
 1.218  19-Dec-2016  ozaki-r Get rid of extra nd6_purge from in6_ifdetach

There were two nd6_purge in in6_ifdetach for some reason, but at least now
We don't need extra nd6_purge. Remove it and instead add assertions that
check if surely purged.
 1.217  14-Dec-2016  ozaki-r Make functions static
 1.216  12-Dec-2016  ozaki-r Make the routing table and rtcaches MP-safe

See the following descriptions for details.

Proposed on tech-kern and tech-net


Overview
 1.215  12-Dec-2016  ozaki-r Introduce macros for the prefix list

No functional change.
 1.214  12-Dec-2016  ozaki-r Introduce macros for the default router list

No functional change.
 1.213  11-Dec-2016  ozaki-r Add nd6_ prefix to exported functions
 1.212  11-Dec-2016  ozaki-r Move default interface things from nd6_rtr.c to nd6.c
 1.211  14-Nov-2016  ozaki-r Add missing rtfree
 1.210  02-Nov-2016  ozaki-r Add missing pserialize_read_exit
 1.209  18-Oct-2016  ozaki-r Don't hold global locks if NET_MPSAFE is enabled

If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.

By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.

Reviewed by knakahara@
 1.208  18-Oct-2016  ozaki-r Fix indentation
 1.207  02-Sep-2016  ozaki-r Don't GC an NDP cache that is added just before GC

This fixes unstable test results of ndp_neighborgcthresh.
 1.206  06-Aug-2016  roy Set RTF_CONNECTED instead of setting only RTF_CONNECTED.
 1.205  01-Aug-2016  ozaki-r Apply pserialize and psref to struct ifaddr and its variants

This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.
 1.204  15-Jul-2016  ozaki-r Use sin6tosa and sin6tocsa macros

No functional change.
 1.203  11-Jul-2016  ozaki-r branches: 1.203.2;
Run timers in workqueue

Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).

Doing workqueue_enqueue a work twice (i.e., call workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.

Proposed on tech-net and tech-kern.
 1.202  07-Jul-2016  ozaki-r Switch the address list of intefaces to pslist(9)

As usual, we leave the old list to avoid breaking kvm(3) users.
 1.201  05-Jul-2016  ozaki-r Constify an argument of regen_tmpaddr
 1.200  05-Jul-2016  ozaki-r KNF
 1.199  04-Jul-2016  ozaki-r Use pslist(9) for the global in6_ifaddr list

psz and psref will be applied in another commit.

No functional change intended.
 1.198  30-Jun-2016  ozaki-r Make sure that ifaddr is published after its initialization finished

Basically we should insert an item to a collection (say a list) after
item's initialization has been completed to avoid accessing an item
that is initialized halfway. ifaddr (in{,6}_ifaddr) isn't processed
like so and needs to be fixed.

In order to do so, we need to tweak {arp,nd6}_rtrequest that depend
on that an ifaddr is inserted during its initialization; they explore
interface's address list to determine that rt_getkey(rt) of a given
rtentry is in the list to know whether the route's interface should
be a loopback, which doesn't work after the change. To make it work,
first check RTF_LOCAL flag that is set in rt_ifa_addlocal that calls
{arp,nd6}_rtrequest eventually. Note that we still need the original
code for the case to remove and re-add a local interface route.
 1.197  21-Jun-2016  ozaki-r Fix nd6_output (if_output_lock conversion mistake)
 1.196  20-Jun-2016  knakahara apply if_output_lock() to L3 callers which call ifp->if_output() of L2(or L3 tunneling).
 1.195  18-May-2016  ozaki-r Get rid of unnecessary assignment
 1.194  12-May-2016  ozaki-r Protect ifnet list with psz and psref

The change ensures that ifnet objects in the ifnet list aren't freed during
list iterations by using pserialize(9) and psref(9).

Note that the change adds a pslist(9) for ifnet but doesn't remove the
original ifnet list (ifnet_list) to avoid breaking kvm(3) users. We
shouldn't use the original list in the kernel anymore.
 1.193  26-Apr-2016  ozaki-r Stop using rt_gwroute on packet sending paths

rt_gwroute of rtentry is a reference to a rtentry of the gateway
for a rtentry with RTF_GATEWAY. That was used by L2 (arp and ndp)
to look up L2 addresses. By separating L2 nexthop caches, we don't
need a route for the purpose and we can stop using rt_gwroute.
By doing so, we can reduce referencing and modifying rtentries,
which makes it easy to apply a lock (and/or psref) to the
routing table and rtentries.

One issue to do this is to keep RTF_REJECT behavior. It seems it
was broken when we moved rtalloc1 things from L2 output routines
(e.g., ether_output) to ip_hresolv_output, but (fortunately?)
it works unexpectedly. What we mistook are:
- RTF_REJECT was checked for any routes in L2 output routines,
but in ip_hresolv_output it is checked only when the route
is RTF_GATEWAY
- The RTF_REJECT check wasn't copied to IPv6 (nd6_output)

It seems that rt_gwroute checks hid the mistakes and it looked
work (unexpectedly) and removing rt_gwroute checks unveil the
issue. So we need to fix RTF_REJECT checks in ip_hresolv_output
and also add them to nd6_output.

One more point we have to care is returning an errno; we need
to mimic looutput behavior. Originally RTF_REJECT check was
done either in L2 output routines or in looutput. The latter is
applied when a reject route directs to a loopback interface.
However, now RTF_REJECT check is done before looutput so to keep
the original behavior we need to return an errno which looutput
chooses. Added rt_check_reject_route does such tweaks.
 1.192  25-Apr-2016  ozaki-r Check error of rt_setgate and rt_settag
 1.191  21-Apr-2016  ozaki-r Fix RTF_{REJECT,BLACKHOLE} behavior for IPv6 routes

We still need a nexthop route to reflect RTF_{REJECT,BLACKHOLE}.
In the future, we would do it w/o looking up a route.
 1.190  10-Apr-2016  ozaki-r Don't call pfxlist_onlink_check with holding llentry lock

Sync nd6_free with FreeBSD (as of 2016-04-10).

Should fix PR kern/51056.
 1.189  04-Apr-2016  roy all1_sa is no longer used.
 1.188  04-Apr-2016  ozaki-r Separate nexthop caches from the routing table

By this change, nexthop caches (IP-MAC address pair) are not stored
in the routing table anymore. Instead nexthop caches are stored in
each network interface; we already have lltable/llentry data structure
for this purpose. This change also obsoletes the concept of cloning/cloned
routes. Cloned routes no longer exist while cloning routes still exist
with renamed to connected routes.

Noticeable changes are:
- Nexthop caches aren't listed in route show/netstat -r
- sysctl(NET_RT_DUMP) doesn't return them
- If RTF_LLDATA is specified, it returns nexthop caches
- Several definitions of routing flags and messages are removed
- RTF_CLONING, RTF_XRESOLVE, RTF_LLINFO, RTF_CLONED and RTM_RESOLVE
- RTF_CONNECTED is added
- It has the same value of RTF_CLONING for backward compatibility
- route's -xresolve, -[no]cloned and -llinfo options are removed
- -[no]cloning remains because it seems there are users
- -[no]connected is introduced and recommended
to be used instead of -[no]cloning
- route show/netstat -r drops some flags
- 'L' and 'c' are not seen anymore
- 'C' now indicates a connected route
- Gateway value of a route of an interface address is now not
a L2 address but "link#N" like a connected (cloning) route
- Proxy ARP: "arp -s ... pub" doesn't create a route

You can know details of behavior changes by seeing diffs under tests/.

Proposed on tech-net and tech-kern:
http://mail-index.netbsd.org/tech-net/2016/03/11/msg005701.html
 1.187  01-Apr-2016  ozaki-r Refine nd6log

Add __func__ to nd6log itself instead of adding it to callers.
 1.186  01-Apr-2016  ozaki-r Tidy up nd6_timer initialization
 1.185  04-Feb-2016  riastradh Declare in6_tmpaddrtimer_ch in in6_var.h.

Do not declare extern variables in .c files!
 1.184  08-Jan-2016  ozaki-r Add missing RTF_LOCAL; sync with arp_setgate
 1.183  18-Dec-2015  ozaki-r Add missing LLE_WUNLOCK to nd6_free
 1.182  07-Dec-2015  ozaki-r CID 1341546: Fix integer handling issue (CONSTANT_EXPRESSION_RESULT)

n > INT_MAX where n is a long integer variable never be true on 32bit
architectures. Use time_t(int64_t) instead of long for the variable.
 1.181  25-Nov-2015  ozaki-r Use lltable/llentry for NDP

lltable and llentry were introduced to replace ARP cache data structure
for further restructuring of the routing table: L2 nexthop cache
separation. This change replaces the NDP cache data structure
(llinfo_nd6) with them as well as ARP.

One noticeable change is for neighbor cache GC mechanism that was
introduced to prevent IPv6 DoS attacks. net.inet6.ip6.neighborgcthresh
was the max number of caches that we store in the system. After
introducing lltable/llentry, the value is changed to be per-interface
basis because lltable/llentry stores neighbor caches in each interface
separately. And the change brings one degradation; the old GC mechanism
dropped exceeded packets based on LRU while the new implementation drops
packets in order from the beginning of lltable (a hash table + linked
lists). It would be improved in the future.

Added functions in in6.c come from FreeBSD (as of r286629) and are
tweaked for NetBSD.

Proposed on tech-kern and tech-net.
 1.180  19-Nov-2015  ozaki-r Call icmp6_error2 after releasing ln

This is a restructuring for coming changes.

From FreeBSD
 1.179  18-Nov-2015  ozaki-r Stop passing llinfo_nd6 to nd6_ns_output

This is a restructuring for coming changes to nd6 (replacing
llinfo_nd6 with llentry). Once we have a lock of llinfo_nd6,
we need to pass it to nd6_ns_output with holding the lock.
However, in a function subsequent to nd6_ns_output, the llinfo_nd6
may be looked up, i.e., its lock would be acquired again.
To avoid such a situation, pass only required data (in6_addr) to
nd6_ns_output instead of passing whole llinfo_nd6.

Inspired by FreeBSD
 1.178  18-Nov-2015  ozaki-r Unify nd6_ns_output calls in nd6_llinfo_timer

Inspired by FreeBSD
 1.177  11-Sep-2015  roy If, for whatever reason, a local interface route is removed and then
re-added, mark it as a local route.

While here, if changing the route to go via the loopback interface
remove any inherited MTU value.
 1.176  04-Sep-2015  ozaki-r Pull nexthop determination routine from nd6_output

It simplifies nd6_output and the nexthop determination routine slightly.
 1.175  03-Sep-2015  ozaki-r Fix rtfree in nd6_output

We have to check and avoid to rtfree the original rtentry passed to
nd6_output even when manipulating gateway routes.

This fixes panic on assertion "ro->_ro_rt ==NULL || ro->_ro_rt->rt_refcnt > 0"
failure and probably PR kern/50161.
 1.174  02-Sep-2015  ozaki-r Do rt_refcnt++ when set a rtentry to another rtentry's rt_gwroute

And also do rtfree when deref a rtentry from rt_gwroute.
 1.173  02-Sep-2015  ozaki-r Use KASSERT to check programming errors
 1.172  01-Sep-2015  ozaki-r Move a rtentry definition to reduce its scope

No functional change.
 1.171  01-Sep-2015  ozaki-r Cleanup nd6_nud_hint

The deleted rtfree was never called.
 1.170  31-Aug-2015  ozaki-r Remove leading whitespaces
 1.169  24-Aug-2015  pooka sprinkle _KERNEL_OPT
 1.168  11-Aug-2015  ozaki-r Fix double rtfree
 1.167  11-Aug-2015  ozaki-r Free rtentry when we successfully obtain it but return NULL
 1.166  07-Aug-2015  ozaki-r Use time_uptime instead of time_second to avoid time leaps

Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.

This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.

Note that we shouldn't leak time_uptime to other hosts over the
netowrk. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html

Discussed on tech-kern and tech-net.
 1.165  17-Jul-2015  ozaki-r Reform use of rt_refcnt

rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.

These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.

This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)

Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)
 1.164  15-Jul-2015  ozaki-r Remove unused arguments and the associated code from nd6_nud_hint()

from OpenBSD
 1.163  30-Jun-2015  ozaki-r Use KASSERT for argument NULL checks
 1.162  30-Apr-2015  ozaki-r Don't take KERNEL_LOCK for if_output when NET_MPSAFE
 1.161  30-Mar-2015  ozaki-r Tidy up opt_ipsec.h inclusions
 1.160  25-Feb-2015  roy Rename nd6_rtmsg() to rt_newmsg() and move into the generic routing code
as it's not IPv6 specific and will be used elsewhere.
 1.159  25-Feb-2015  roy Retire nd6_newaddrmsg and use rt_newaddrmsg directly instead so that
we don't spam route changes when the route hasn't changed.
 1.158  23-Feb-2015  martin Rearange interface detachement slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.157  17-Feb-2015  christos "something odd happens" is not a useful error message.
 1.156  16-Dec-2014  roy Report route additions/changes/deletions for cached neighbours to userland.
 1.155  03-Dec-2014  christos more debugging info...
 1.154  18-Oct-2014  snj branches: 1.154.2;
src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
 1.153  14-Oct-2014  roy Tests for neighbour now work correctly on bridge(4) and carp(4) interfaces.
 1.152  06-Jun-2014  rmind branches: 1.152.2;
- Eliminate RTFREE() macro in favour of rtfree() function.
- Make rtcache() function static.
 1.151  05-Jun-2014  roy Add IPV6CTL_AUTO_LINKLOCAL and ND6_IFF_AUTO_LINKLOCAL toggles which
control the automatic creation of IPv6 link-local addresses when an
interface is brought up.

Taken from FreeBSD.
 1.150  20-May-2014  bouyer Sync with the ipv4 code and call ifp->if_output() with KERNEL_LOCK
held.
Problem reported and fix tested by njoly@ on current-users@
 1.149  17-May-2014  rmind - Move IFNET_*() macros under #ifdef _KERNEL.
- Replace TAILQ_FOREACH on ifnet with IFNET_FOREACH().
 1.148  20-Mar-2014  roy branches: 1.148.2;
If IPv6 is disabled for an interface, mark all addresses as tentative.
If enabled, check for a duplicated link-local address and abort enabling
as per RFC 4862, section 5.4.5. If allowed to enable, perform DAD
on the tentative addresses.

Taken from FreeBSD.
 1.147  15-Jan-2014  roy If the address matches a cloning route, it is also a neighbor.
This allows us to use prefixes which userland may have added.
 1.146  17-Dec-2013  martin Instead of voodo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
 1.145  21-May-2013  roy branches: 1.145.2;
For IPv6, emit RTM_NEWADDR once DAD completes and also when address flag
changes. Tentative addresses are not emitted.

Version bumped so userland can detect this behaviour change.
 1.144  24-Jan-2013  joerg Use rt_getkey.
 1.143  23-Jun-2012  christos branches: 1.143.2;
4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.142  22-Mar-2012  drochner remove KAME IPSEC, replaced by FAST_IPSEC
 1.141  03-Feb-2012  christos branches: 1.141.2; 1.141.6; 1.141.8;
PR/45764, PR/45914
Part 1:
nd6_purge can be called after dom_ifdetach, and if_afdata[AF_INET6] is
going to be freed and point to garbage. Make sure we check for NULL, before
taking the pointer offset.
While I am here, add an M_ZERO.
 1.140  02-Feb-2012  christos use FOREACH_SAFE.
 1.139  19-Dec-2011  drochner rename the IPSEC in-kernel CPP variable and config(8) option to
KAME_IPSEC, and make IPSEC define it so that existing kernel
config files work as before
Now the default can be easily be changed to FAST_IPSEC just by
setting the IPSEC alias to FAST_IPSEC.
 1.138  19-Nov-2011  tls branches: 1.138.2;
First step of random number subsystem rework described in
<20111022023242.BA26F14A158@mail.netbsd.org>. This change includes
the following:

An initial cleanup and minor reorganization of the entropy pool
code in sys/dev/rnd.c and sys/dev/rndpool.c. Several bugs are
fixed. Some effort is made to accumulate entropy more quickly at
boot time.

A generic interface, "rndsink", is added, for stream generators to
request that they be re-keyed with good quality entropy from the pool
as soon as it is available.

The arc4random()/arc4randbytes() implementation in libkern is
adjusted to use the rndsink interface for rekeying, which helps
address the problem of low-quality keys at boot time.

An implementation of the FIPS 140-2 statistical tests for random
number generator quality is provided (libkern/rngtest.c). This
is based on Greg Rose's implementation from Qualcomm.

A new random stream generator, nist_ctr_drbg, is provided. It is
based on an implementation of the NIST SP800-90 CTR_DRBG by
Henric Jungheim. This generator users AES in a modified counter
mode to generate a backtracking-resistant random stream.

An abstraction layer, "cprng", is provided for in-kernel consumers
of randomness. The arc4random/arc4randbytes API is deprecated for
in-kernel use. It is replaced by "cprng_strong". The current
cprng_fast implementation wraps the existing arc4random
implementation. The current cprng_strong implementation wraps the
new CTR_DRBG implementation. Both interfaces are rekeyed from
the entropy pool automatically at intervals justifiable from best
current cryptographic practice.

In some quick tests, cprng_fast() is about the same speed as
the old arc4randbytes(), and cprng_strong() is about 20% faster
than rnd_extract_data(). Performance is expected to improve.

The AES code in src/crypto/rijndael is no longer an optional
kernel component, as it is required by cprng_strong, which is
not an optional kernel component.

The entropy pool output is subjected to the rngtest tests at
startup time; if it fails, the system will reboot. There is
approximately a 3/10000 chance of a false positive from these
tests. Entropy pool _input_ from hardware random numbers is
subjected to the rngtest tests at attach time, as well as the
FIPS continuous-output test, to detect bad or stuck hardware
RNGs; if any are detected, they are detached, but the system
continues to run.

A problem with rndctl(8) is fixed -- datastructures with
pointers in arrays are no longer passed to userspace (this
was not a security problem, but rather a major issue for
compat32). A new kernel will require a new rndctl.

The sysctl kern.arandom() and kern.urandom() nodes are hooked
up to the new generators, but the /dev/*random pseudodevices
are not, yet.

Manual pages for the new kernel interfaces are forthcoming.
 1.137  10-Nov-2011  seanb - Remove unused variable from nd6_timer().
 1.136  15-Jul-2010  dyoung branches: 1.136.8;
To help find the cause of kernel complaints such as "/netbsd:
nd6_storelladdr: sdl_alen == 0, dst=... if=wm1", add printfs for some
"impossible" conditions, and make the nd6_storelladdr() printf more
informative by printing the value of sdl_alen.
 1.135  06-Nov-2009  dyoung branches: 1.135.2; 1.135.4;
Fix net.inet6.ip6.accept_rtadv and 'ndp -i <interface> accept_rtadv':

Add a flag ND6_IFF_OVERRIDE_RTADV that tells the kernel to override
ip6_accept_rtadv (net.inet6.ip6.accept_rtadv) on an interface.

Add a routine nd6_accepts_rtadv(ndi) that evaluates both the flags
on the interface represented by ndi and ip6_accept_rtadv, and
returns 'true' if the given interface should accept Router
Advertisements, and 'false' if not.

Now, ND6_IFF_ACCEPT_RTADV works as it was historically documented:
if it is set, then accept router advertisements iff ip6_accept_rtadv
!= 0. Otherwise, do not accept router advertisements.

If ND6_IFF_OVERRIDE_RTADV is set, then the flag ND6_IFF_ACCEPT_RTADV
overrides ip6_accept_rtadv: if ND6_IFF_ACCEPT_RTADV is set, accept;
otherwise reject. Ignore ip6_accept_rtadv.

If neither ND6_IFF_ACCEPT_RTADV nor ND6_IFF_OVERRIDE_RTADV is set,
reject Router Advertisements.
 1.134  31-Aug-2009  yamt nd6_ifattach: fix a missing parens bug in rev.1.132.
 1.133  06-Aug-2009  cegger Check if ndi is valid before use.
ok tonnerre@
 1.132  25-Jul-2009  tonnerre Instead of using the net.inet6.ip6.accept_rtadv sysctl for all devices,
make net.inet6.ip6.accept_rtadv the default for individual per-device
settings so people can use the ndp(8) utility to set per-device whether
or not to accept route advertisements.

rtadvd changes to follow.

(Debated on tech-net@ before but almost two weeks passed by without any
comment on the patch.)
 1.131  07-Nov-2008  dyoung *** Summary ***

When a link-layer address changes (e.g., ifconfig ex0 link
02:de:ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.

Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)

Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.

Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.

Return consistent, appropriate error codes from network drivers.

Improve readability. KNF.

*** Details ***

In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.

In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.

Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.

In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.

Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:

switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}

Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,

switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}

unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).

In ipw(4), remove an if_set_sadl() call that is out of place.

In nfe(4), reuse the jumbo MTU logic in ether_ioctl().

Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.

Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.

Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.

Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.

In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.

Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.

In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.

Let ifioctl_common() handle SIOCGIFADDR.

Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.

In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.

bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
 1.130  24-Oct-2008  dyoung branches: 1.130.2; 1.130.4; 1.130.10; 1.130.14;
Constify the rt_addrinfo argument to the ifa_rtrequest member
function of struct ifaddr.
 1.129  24-Oct-2008  dyoung bzero -> memset. Do not "test truth" of pointers, but compare with
NULL, instead. Do not gratuitously cast to void *. Use NULL
instead of (type *)0.

No functional changes intended.
 1.128  15-May-2008  dyoung branches: 1.128.4;
Simplify RT_DPRINTF() calls.
 1.127  11-May-2008  dyoung Compare route with NULL instead of testing truth. Where applicable,
s/0/NULL/. s/u_char/uint8_t/. Remove superfluous curly braces.
 1.126  24-Apr-2008  ad branches: 1.126.2; 1.126.4;
Merge the socket locking patch:

- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.

With much feedback from matt@ and plunky@.
 1.125  15-Apr-2008  thorpej branches: 1.125.2;
Make ip6 and icmp6 stats per-cpu.
 1.124  08-Apr-2008  thorpej Change ICMP6 stats from a structure to an array of uint64_t's.

Note: This is ABI-compatible with the old icmp6stat structure; old netstat
binaries will continue to work properly.
 1.123  04-Dec-2007  dyoung branches: 1.123.8; 1.123.12;
Use IFNET_FOREACH() and IFADDR_FOREACH().
 1.122  10-Nov-2007  dyoung branches: 1.122.2;
Use sockaddr_in6_init(). Use a static initializer for all1_sa.
Constify a cast (may as well). No functional change intended.
 1.121  01-Nov-2007  dyoung branches: 1.121.2;
De-__P().
 1.120  02-Sep-2007  dyoung branches: 1.120.4;
We cannot sleep in a software interrupt, so do not sockaddr_dl_alloc(...,
M_WAITOK). Instead, sockaddr_dl_init() a sockaddr_dl on the stack.
 1.119  30-Aug-2007  dyoung Use malloc(9) for sockaddrs instead of pool(9), and remove dom_sa_pool
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.

Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.

Avoid using sizeof(struct sockaddr_dl) in the kernel.

Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.

Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().

Constify: LLADDR() -> CLLADDR().

Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
 1.118  07-Aug-2007  dyoung branches: 1.118.2; 1.118.4;
Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
 1.117  19-Jul-2007  dyoung branches: 1.117.4;
Take steps to hide the radix_node implementation of the forwarding table
from the forwarding table's users:

Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.

Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.

Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).

Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).

Cosmetic:

Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.

Stop using variadic arguments for rip6_output(), it is
unnecessary.

Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.

Make rt_maskedcopy() easier to read by using meaningful variable
names.

Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.

Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.

One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.

I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.

Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
 1.116  09-Jul-2007  ad branches: 1.116.2;
Merge some of the less invasive changes from the vmlocking branch:

- kthread, callout, devsw API changes
- select()/poll() improvements
- miscellaneous MT safety improvements
 1.115  17-May-2007  dyoung Fix the memory leak reported in kern/36337. Thanks Matthias Scheler
for the heads-up. My fix is based on the following patches from
FreeBSD, however, I extracted the code into a subroutine,
nd6_llinfo_release_pkts():

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6.c.diff?r1=1.48.2.18;r2=1.48.2.19
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/netinet6/nd6_nbr.c.diff?r1=1.29.2.8;r2=1.29.2.9
 1.114  02-May-2007  dyoung Eliminate address family-specific route caches (struct route, struct
route_in6, struct route_iso), replacing all caches with a struct
route.

The principle benefit of this change is that all of the protocol
families can benefit from route cache-invalidation, which is
necessary for correct routing. Route-cache invalidation fixes an
ancient PR, kern/3508, at long last; it fixes various other PRs,
also.

Discussions with and ideas from Joerg Sonnenberger influenced this
work tremendously. Of course, all design oversights and bugs are
mine.

DETAILS

1 I added to each address family a pool of sockaddrs. I have
introduced routines for allocating, copying, and duplicating,
and freeing sockaddrs:

struct sockaddr *sockaddr_alloc(sa_family_t af, int flags);
struct sockaddr *sockaddr_copy(struct sockaddr *dst,
const struct sockaddr *src);
struct sockaddr *sockaddr_dup(const struct sockaddr *src, int flags);
void sockaddr_free(struct sockaddr *sa);

sockaddr_alloc() returns either a sockaddr from the pool belonging
to the specified family, or NULL if the pool is exhausted. The
returned sockaddr has the right size for that family; sa_family
and sa_len fields are initialized to the family and sockaddr
length---e.g., sa_family = AF_INET and sa_len = sizeof(struct
sockaddr_in). sockaddr_free() puts the given sockaddr back into
its family's pool.

sockaddr_dup() and sockaddr_copy() work analogously to strdup()
and strcpy(), respectively. sockaddr_copy() KASSERTs that the
family of the destination and source sockaddrs are alike.

The 'flags' argumet for sockaddr_alloc() and sockaddr_dup() is
passed directly to pool_get(9).

2 I added routines for initializing sockaddrs in each address
family, sockaddr_in_init(), sockaddr_in6_init(), sockaddr_iso_init(),
etc. They are fairly self-explanatory.

3 structs route_in6 and route_iso are no more. All protocol families
use struct route. I have changed the route cache, 'struct route',
so that it does not contain storage space for a sockaddr. Instead,
struct route points to a sockaddr coming from the pool the sockaddr
belongs to. I added a new method to struct route, rtcache_setdst(),
for setting the cache destination:

int rtcache_setdst(struct route *, const struct sockaddr *);

rtcache_setdst() returns 0 on success, or ENOMEM if no memory is
available to create the sockaddr storage.

It is now possible for rtcache_getdst() to return NULL if, say,
rtcache_setdst() failed. I check the return value for NULL
everywhere in the kernel.

4 Each routing domain (struct domain) has a list of live route
caches, dom_rtcache. rtflushall(sa_family_t af) looks up the
domain indicated by 'af', walks the domain's list of route caches
and invalidates each one.
 1.113  17-Mar-2007  dyoung In nd6_rtrequest(), when we lookup/create a route whose destination
is equal to one of the host's IPv6 addresses, do not stop at setting
the route's interface to lo0, but also clear the route's RTF_CLONED
flag, if it is present, so that ip6_input() will accept packets
sent to that destination. This is necessary because ip6_input()
will not accept a packet if it looks up the packet's destination
and finds a route with RTF_CLONED set.

I believe this will help IPv6 networking survive '/etc/rc.d/network
restart'. See the problem report, kern/33279.
 1.112  15-Mar-2007  dyoung In nd6_lookup, shorten a staircase. KNF: change return (expr); to
return expr; throughout. Fix K&R prototypes and parameter type
declarations.
 1.111  04-Mar-2007  christos branches: 1.111.2; 1.111.4; 1.111.6;
Kill caddr_t; there will be some MI fallout, but it will be fixed shortly.
 1.110  17-Feb-2007  dyoung KNF: de-__P, bzero -> memset, bcmp -> memcmp. Remove extraneous
parentheses in return statements.

Cosmetic: don't open-code TAILQ_FOREACH().

Cosmetic: change types of variables to avoid oodles of casts: in
in6_src.c, avoid casts by changing several route_in6 pointers
to struct route pointers. Remove unnecessary casts to caddr_t
elsewhere.

Pave the way for eliminating address family-specific route caches:
soon, struct route will not embed a sockaddr, but it will hold
a reference to an external sockaddr, instead. We will set the
destination sockaddr using rtcache_setdst(). (I created a stub
for it, but it isn't used anywhere, yet.) rtcache_free() will
free the sockaddr. I have extracted from rtcache_free() a helper
subroutine, rtcache_clear(). rtcache_clear() will "forget" a
cached route, but it will not forget the destination by releasing
the sockaddr. I use rtcache_clear() instead of rtcache_free()
in rtcache_update(), because rtcache_update() is not supposed
to forget the destination.

Constify:

1 Introduce const accessor for route->ro_dst, rtcache_getdst().

2 Constify the 'dst' argument to ifnet->if_output(). This
led me to constify a lot of code called by output routines.

3 Constify the sockaddr argument to protosw->pr_ctlinput. This
led me to constify a lot of code called by ctlinput routines.

4 Introduce const macros for converting from a generic sockaddr
to family-specific sockaddrs, e.g., sockaddr_in: satocsin6,
satocsin, et cetera.
 1.109  24-Nov-2006  christos branches: 1.109.4;
fix spelling of accommodate; from Zapher.
 1.108  20-Nov-2006  dyoung Use LIST_/TAILQ_ macros, esp. LIST_FOREACH() and TAILQ_FOREACH().
Use the usual idiom for iterating over a list where we might
_REMOVE() entries,

for (x = TAILQ_FIRST(...); x != NULL; x = nx) {
nx = TAILQ_NEXT(x, ...);
...
}
 1.107  16-Nov-2006  christos __unused removal on arguments; approved by core.
 1.106  13-Nov-2006  dyoung Add a source-address selection policy mechanism to the kernel.

Also, add ioctls SIOCGIFADDRPREF/SIOCSIFADDRPREF to get/set preference
numbers for addresses. Make ifconfig(8) set/display preference
numbers.

To activate source-address selection policies in your kernel, add
'options IPSELSRC' to your kernel configuration.

Miscellaneous changes in support of source-address selection:

1 Factor out some common code, producing rt_replace_ifa().

2 Abbreviate a for-loop with TAILQ_FOREACH().

3 Add the predicates on IPv4 addresses IN_LINKLOCAL() and
IN_PRIVATE(), that are true for link-local unicast
(169.254/16) and RFC1918 private addresses, respectively.
Add the predicate IN_ANY_LOCAL() that is true for link-local
unicast and multicast.

4 Add IPv4-specific interface attach/detach routines,
in_domifattach and in_domifdetach, which build #ifdef
IPSELSRC.

See in_getifa(9) for a more thorough description of source-address
selection policy.
 1.105  12-Oct-2006  christos - sprinkle __unused on function decls.
- fix a couple of unused bugs
- no more -Wno-unused for i386
 1.104  02-Sep-2006  christos branches: 1.104.2; 1.104.4;
- fix initializers
- add const
- remove dead code
 1.103  07-Jun-2006  kardel merge FreeBSD timecounters from branch simonb-timecounters
- struct timeval time is gone
time.tv_sec -> time_second
- struct timeval mono_time is gone
mono_time.tv_sec -> time_uptime
- access to time via
{get,}{micro,nano,bin}time()
get* versions are fast but less precise
- support NTP nanokernel implementation (NTP API 4)
- further reading:
Timecounter Paper: http://phk.freebsd.dk/pubs/timecounter.pdf
NTP Nanokernel: http://www.eecis.udel.edu/~mills/ntp/html/kern.html
 1.102  18-May-2006  liamjfoy branches: 1.102.2;
Integrate Common Address Redundancy Procotol (CARP) from OpenBSD

'pseudo-device carp'

Thanks to: joerg@ christos@ riz@ and others who tested
Ok: core@
 1.101  15-Apr-2006  christos Coverity CID 857: Prevent NULL deref.
 1.100  24-Mar-2006  rpaulo From KAME via SUZUKI Shinsuke:
fixed a memory leak when net.inet6.icmp6.nd6_maxqueuelen is
greater than 1.
 1.99  05-Mar-2006  rpaulo branches: 1.99.2; 1.99.4;
NDP-related improvements:
RFC4191
- supports host-side router-preference

RFC3542
- if DAD fails on a interface, disables IPv6 operation on the
interface
- don't advertise MLD report before DAD finishes

Others
- fixes integer overflow for valid and preferred lifetimes
- improves timer granularity for MLD, using callout-timer.
- reflects rtadvd's IPv6 host variable information into kernel
(router only)
- adds a sysctl option to enable/disable pMTUd for multicast
packets
- performs NUD on PPP/GRE interface by default
- Redirect works regardless of ip6_accept_rtadv
- removes RFC1885-related code

From the KAME project via SUZUKI Shinsuke.
Reviewed by core.
 1.98  03-Mar-2006  rpaulo branches: 1.98.2;
Fix typos in comments.

From: the KAME project via SUZUKI Shinsuke.
 1.97  02-Mar-2006  dyoung In nd6_llinfo_timer, don't duplicate part of nd6_llinfo_settimer's
logic, and then call nd6_llinfo_settimer. Instead, call
nd6_llinfo_settimer immediately.

This should cause no functional change. I've been running this
patch for months.
 1.96  21-Jan-2006  rpaulo branches: 1.96.2; 1.96.4;
Better support of IPv6 scoped addresses.

- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application do:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.
- and, while there, don't try to remove the ff02::/32 interface route
entry in in6_ifdetach() as it's already gone.

This also includes some level of support for the standard source
address selection algorithm defined in RFC3484, which will be
completed on in the future.

From the KAME project via JINMEI Tatuya.
Approved by core@.
 1.95  11-Dec-2005  christos branches: 1.95.2;
merge ktrace-lwp.
 1.94  29-May-2005  christos branches: 1.94.2;
- avoid shadowed variables
- sprinkle const.
 1.93  27-May-2005  seanb - Arithmetic error when calculating ticks to nd6_llinfo_settimer().
- Reviewed by christos.
 1.92  03-Apr-2005  tron Make sure that prefixes get purged. This fixes PR kern/21189,
PR kern/25968 and PR kern/27873.
 1.91  04-Dec-2004  peter branches: 1.91.4; 1.91.10;
Convert lo(4) to a clonable device.

This also removes the loif array and changes all code to use the new
lo0ifp pointer which points to the lo0 ifnet structure.

Approved by christos.
 1.90  19-May-2004  itojun do not loop on nd6_output() when transmission fails. from kame
 1.89  11-Feb-2004  itojun branches: 1.89.2; 1.89.4;
avoid ugly typecast
 1.88  30-Oct-2003  simonb Remove some assigned-to but otherwise unused variables.
 1.87  22-Aug-2003  itojun correct missing inclusion of opt_ipsec.h
 1.86  27-Jun-2003  itojun branches: 1.86.2;
split ND6 cache timer management to per-entry. increased accuracy,
no O(N) loop. sync w/ kame
 1.85  24-Jun-2003  itojun remove unneeded checks of accept_rtadv. from kame
 1.84  24-Jun-2003  itojun * kame/sys/netinet6/nd6.c (nd6_rtrequest): changed a condition to
decide whether to create an empty llinfo stricter so that a user
can manually change the link-layer address of an existing neighbor
cache.
Pointed out by: KIU Shueng Chuan

from kame
 1.83  24-Jun-2003  itojun use time.tv_sec directly
 1.82  24-Jun-2003  itojun clear ln_hold earlier. from kame
 1.81  04-May-2003  christos print how big the mtu needs to be for ipv6 ppp.
 1.80  25-Feb-2003  he Make sure to initialize callout structs.
 1.79  01-Feb-2003  thorpej Add extensible malloc types, adapted from FreeBSD. This turns
malloc types into a structure, a pointer to which is passed around,
instead of an int constant. Allow the limit to be adjusted when the
malloc type is defined, or with a function call, as suggested by
Jonathan Stone.
 1.78  17-Jan-2003  itojun switch from kame-based m_aux mbuf auxiliary data, to openbsd m_tag
implementation. it will simplify porting across *bsd (such as kame/altq),
and make us more synchronized. from Joel Wilsson
 1.77  09-Oct-2002  itojun suppress too noisy log by default (can be re-enabled by sysctl). sync w/kame
 1.76  27-Sep-2002  provos remove trailing \n in panic(). approved perry.
 1.75  23-Sep-2002  itojun better fix to PR 18163 ("deprecated" flag manipulation). sync w/kame
 1.74  23-Sep-2002  simonb Remove breaks after returns, unreachable returns and returns after
returns(!).
 1.73  11-Sep-2002  itojun KNF - return is not a function. sync w/kame.
 1.72  04-Sep-2002  itojun allow "deprecated" bit to be manually set. PR 18163
 1.71  19-Aug-2002  itojun check error from copyout
 1.70  19-Aug-2002  itojun typo in comment
 1.69  19-Aug-2002  itojun fix copyout() logic. more proper fix to be done on kame tree.
 1.68  19-Aug-2002  itojun copyout only if oldp is non-null
 1.67  19-Aug-2002  itojun need explicit copyout(), apparently
 1.66  09-Jun-2002  itojun whitespace cleanup
 1.65  08-Jun-2002  itojun sync with latest KAME in6_ifaddr/prefix/default router manipulation.
behavior changes:
- two iocts used by ndp(8) are now obsolete (backward compat provided).
use sysctl path instead.
- lo0 does not get ::1 automatically. it will get ::1 when lo0 comes up.
 1.64  07-Jun-2002  itojun If there has been no NS for the neighbor after entering the
INCOMPLETE state, send the first solicitation in nd6_output(), regardless
of the timer value.
revised comments about rate-limiting accordingly.

sync w/kame
 1.63  03-Jun-2002  itojun whitespace at EOL
 1.62  03-Jun-2002  itojun do not hardcode if_mtu values in here, except for IFT_{ARC,FDDI} -
they need special handling. makes it possible to take advantage of 9k ether
frames.
 1.61  30-May-2002  itojun improve nd6_setmtu(), to warn too-small MTU on SIOCSIFMTU. sync w/kame
 1.60  29-May-2002  itojun missing bzero
 1.59  29-May-2002  itojun receivedra field is gone
 1.58  29-May-2002  itojun attach nd_ifinfo structure into if_afdata.
split IPv6 link MTU (advertised by RA) from real link MTU.
sync with kame
 1.57  20-Mar-2002  itojun branches: 1.57.4; 1.57.6;
remove obsolete comment
 1.56  18-Dec-2001  itojun reduce white space/cosmetic diffs w/kame.
 1.55  13-Nov-2001  lukem add RCSIDs
 1.54  17-Oct-2001  itojun do not change neighbor cache state on entry timeout,
if the cache entry is for outgoing router.

perform on-linkness check before default router (re-)seletion.

do not play with interface direct route on nd6_rtrequest.

sync a lot of cosmetic changes. sync with kame
 1.53  17-Oct-2001  itojun unifdef OLDIP6OUTPUT
 1.52  16-Oct-2001  itojun more whitespace/comment sync with kame
 1.51  25-Jul-2001  itojun ifidex2ifnet could contain NULL after if_detach(). sync with kame
 1.50  20-Jul-2001  itojun sync rt_ifp check with IPv4 counterpart (see sys/net/if_ethersubr.c 1.27).
sync with kame
 1.49  29-Jun-2001  itojun branches: 1.49.2;
call defrouter_select() only if it is autoconfigured host.
 1.48  27-Jun-2001  itojun refresh default router list on nd6_detach(), only if we are an
autoconfigured host. bug was that, we will lose default route on
"ifconfig gif0 destroy" even if default is not pointing to gif0.
reported by ume@mahoroba.org. sync with kame
 1.47  22-Jun-2001  itojun select default router again, when L2 address of the router changes
 1.46  24-May-2001  itojun print more diag message on in6_addmulti() failures.
 1.45  30-Mar-2001  itojun enable FAKE_LOOPBACK_IF case by default.
now traffic on loopback interface will be presented to bpf as normal wire
format packet (without KAME scopeid in s6_addr16[1]).

fix KAME PR 250 (host mistakenly accepts packets to fe80::x%lo0).

sync with kame.
 1.44  21-Mar-2001  itojun in nd6_cache_lladdr(), set nd6_gctimer to ln_expire just after the state
transition to STALE. fixes tahi test breakage. sync with kame.
 1.43  08-Mar-2001  itojun nd6_storelladdr() was not consistent about m_freem() policy.
do not touch RTF_STATIC entries (static ND entries) on ND cache update.
couple of costmetic sync. sync with kame
 1.42  23-Feb-2001  itojun branches: 1.42.2;
garbage-collect stale ND entries (default: 1 day).
RFC 2461 5.3. sync with kame.
 1.41  23-Feb-2001  itojun remove unnecessary state, ND6_LLINFO_WAITDELETE, from neighbor cache
state machine.
no need for RTF_REJECT on neighbor cache entires, they are leftover from
ARP code.
sync with kame.
 1.40  21-Feb-2001  itojun make validation code more strict for ND6/dest6 variable length headers.
check duplicated nd6_ifinfo table initialization in a better way.
sync with kame
 1.39  21-Feb-2001  itojun style, to make kame sync easier
 1.38  10-Feb-2001  itojun to sync with kame better, (1) remove register declaration for variables,
(2) sync whitespaces, (3) update comments. (4) bring in some of portability
and logging enhancements. no functional changes here.
 1.37  08-Feb-2001  itojun when chasing nd6_llinfo chain, make sure we do not touch dangling
pointer (due to RTM_DELETE during default router list management).
from kame
 1.36  07-Feb-2001  itojun during ip6/icmp6 inbound packet processing, do not call log() nor printf() in
normal operation (/var can get filled up by flodding bogus packets).
sysctl net.inet6.icmp6.nd6_debug will turn on diagnostic messages.
(#define ND6_DEBUG will turn it on by default)

improve stats in ND6 code.

lots of synchronziation with kame (including comments and cometic ones).
 1.35  05-Feb-2001  chs expose the definitions of MIN() and MAX() in sys/param.h to the kernel
and use those in favor of a dozen copies scattered around the source tree.
 1.34  17-Jan-2001  itojun pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2 (UCB copyrighted).

have sys/net/route.c:rtrequest1(), which takes rt_addrinfo * as the argument.
pass rt_addrinfo all the way down to rtrequest, and ifa->ifa_rtrequest.
3rd arg of ifa->ifa_rtrequest is now rt_addrinfo * instead of sockaddr *
(almost noone is using it anyways).

benefit: the follwoing command now works. previously we need two route(8)
invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

remove unsafe typecast in rtrequest(), from rtentry * to sockaddr *. it was
introduced by 4.3BSD-reno and never corrected.

XXX is eon_rtrequest() change correct regarding to 3rd arg?
eon_rtrequest() and rtrequest() were incorrect since 4.3BSD-reno,
so i do not have correct answer in the source code.
someone with more clue about netiso-over-ip, please help.
 1.33  05-Nov-2000  onoe First Prototype implementation of network interface part for IEEE1394 (if_fw).

Current status:
Only OHCI chip is supported (fwohci).
ping (IPv4) works with Sony's implementation (SmartConnect) on Win98.
sometimes works but not stable.
Not implemented yet:
IRM (Isochronous Resource Manager) functionality.
Link layer fragmentation.
Topology map.
More to do:
clean ups
MCAP
charactor device part
dhcp

There is no entry in GENERIC config file yet.
Follow sys/dev/ieee1394/IMPLEMENTATION to enable if_fw.
 1.32  15-Oct-2000  itojun suppress warning on nd6_storelladdr failure. the failure could happen
easily when we have routing table with too many entries. sync with kame.
 1.31  06-Jul-2000  itojun - do not use bitfield for router renumbering header.
- add protection mechanism against ND cache corruption due to bad NUD hints.
- more stats
- icmp6 pps limitation. TOOD: should implement ppsratecheck(9).
 1.30  19-May-2000  itojun branches: 1.30.4;
do not mistakingly forward link-local scoped packet (the bug was added
with "beyondscope" icmp6 support).
"options FAKE_LOOPBACK_IF" will honor scope on loopback outputs. rcvif will
be real interface, not the loopback, just like when multicast loopback.

(sync with kame)
 1.29  09-May-2000  itojun do not try NUD unless the gateway is a real neighbor.
real fix to KAME PR 245 (workaround has been implemented).
 1.28  27-Apr-2000  itojun temporary workaround against GIF NUD issue (when you configure globals
onto GIF, NUD prevents packet from going out)
KAME PR 245. From: Andreas Wrede <andreas@planix.com>
 1.27  19-Apr-2000  itojun add boundary check for nd6_ifinfo (otherwise ndp -i can make out-of-bound
accesses).
 1.26  16-Apr-2000  itojun perform neighbor unreachability detection on p2p links (spec requires
it for bidir p2p links).
improve -i in ndp(8) to allow tweaking per-interface ND flag on.
fix ndp(8) infinite loop on certain routing table setup.
 1.25  16-Apr-2000  itojun better sync with latest kame (cosmetic only).
 1.24  13-Apr-2000  itojun add comment on sdl_alen check (sync with kame)
 1.23  13-Apr-2000  itojun bark if sdl_alen == 0. test code for KAME PR 235.
 1.22  13-Apr-2000  itojun even if nd6_nud_hint is called, do not change a neighbor's status
unless the old status is probably reachable (i.e. the link-layer address
has already been resolved).
KAME PR 235.
 1.21  12-Apr-2000  itojun revisit in6_ifattach().
- be persistent on initializing interfaces, even if there's manually-
assigned linklocal, multicast/whatever initialization is necessary.
- do not cache mac addr in the kernel. grab mac addr from existing cards
(this is important when you swap ethernet cards back and forth)
now ppp6 works just fine!

call in6_ifattach() on ATM PVC interface to assign link-local, using
hardware MAC address as seed.

(the change is in sync with kame tree).
 1.20  23-Mar-2000  thorpej New callout mechanism with two major improvements over the old
timeout()/untimeout() API:
- Clients supply callout handle storage, thus eliminating problems of
resource allocation.
- Insertion and removal of callouts is constant time, important as
this facility is used quite a lot in the kernel.

The old timeout()/untimeout() API has been removed from the kernel.
 1.19  28-Feb-2000  itojun remove some of cross-BSD portability #ifdef.
remove xxCTL_VARS, which is BSDI specific.
 1.18  26-Feb-2000  itojun bring in recent KAME changes (only important and stable ones, as usual).
- remove net.inet6.ip6.nd6_proxyall. introduce proxy NDP code works
just like "arp -s".
- revise source address selection.
be more careful about use of yet-to-be-valid addresses as source.
- as router, transmit ICMP6_DST_UNREACH_BEYONDSCOPE against out-of-scope
packet forwarding attempt.
- path MTU discovery takes care of routing header properly.
- be more strict about mbuf chain parsing.
 1.17  06-Feb-2000  itojun fix include pathname for better rfc2292 compliance.
 1.16  04-Feb-2000  itojun avoid calling in6_control(SIOCDIFADDR_IN6) from interrupt context.
it is not supposed to work.
logging fix: add "\n" to some of log() in in6_prefix.c.

improve in6_ifdetach(). now almost all structure depend on ifnet
will be cleared up.
possible loose ends:
- cached route_in6 in static varaiables needs to be cleared as well
- there are ifaddr manipulation without reference counting,
which should be fixed
we still see panics after card removal, though... not sure what is left.

(sync with kame)
 1.15  03-Feb-2000  itojun remove #if 0'ed code
 1.14  01-Feb-2000  thorpej First-draft if_detach() implementation, originally from Bill Studnemund,
although this version has been changed somewhat:
- reference counting on ifaddrs isn't as complete as Bill's original
work was. This is hard to get right, and we should attack one
protocol at a time.
- This doesn't do reference counting or dynamic allocation of ifnets yet.
- This version introduces a new PRU -- PRU_PURGEADDR, which is used to
purge an ifaddr from a protocol. The old method Bill used didn't work
on all protocols, and it only worked on some because it was Very Lucky.

This mostly works ... i.e. works for my USB Ethernet, except for a dangling
ifaddr reference left by the IPv6 code; have not yet tracked this down.
 1.13  06-Jan-2000  itojun remove extra portability #ifdef (like #ifdef __FreeBSD__) in KAME IPv6/IPsec
code, from netbsd-current repository.
#ifdef'ed version is always available from ftp.kame.net.

XXX please do not make too many diff-unfriendly changes, we'll need to take
bunch of diffs on upgrade...
 1.12  13-Dec-1999  itojun sync IPv6 part with latest KAME tree. IPsec part is left unmodified
due to massive changes in KAME side.
- IPv6 output goes through nd6_output
- faith can capture IPv4 packets as well - you can run IPv4-to-IPv6 translator
using heavily modified DNS servers
- per-interface statistics (required for IPv6 MIB)
- interface autoconfig is revisited
- udp input handling has a big change for mapped address support.
- introduce in4_cksum() for non-overwriting checksumming
- introduce m_pulldown()
- neighbor discovery cleanups/improvements
- netinet/in.h strictly conforms to RFC2553 (no extra defs visible to userland)
- IFA_STATS is fixed a bit (not tested)
- and more more more.

TODO:
- cleanup os-independency #ifdef
- avoid rcvif dual use (for IPsec) to help ifdetach

(sorry for jumbo commit, I can't separate this any more...)
 1.11  10-Dec-1999  itojun add missing splx(). a critical bug fix from kame.
 1.10  20-Sep-1999  itojun branches: 1.10.2; 1.10.8;
tiny fix to ARCnet IPv6 support.
- in in6_ifattach_getifid(), we can grab interface id source iff the source
is universally (worldwide) unique. ARCnet hardware address is of 8bit and
does not satisfy the condition.
(in6_ifattach_getifid() is for getting interface id usable for pseudo
interfaces like gif*)
- xx_to_eui64() should return EUI64 format, not IPv6 interface id format.
this may seem awkward so I wish to clean these things up.
- in nd6.c, change if clause into case clause to allow future addition
of IFT_xxx easier.
 1.9  19-Sep-1999  is Zeroth version of IPv6 support for ARCnet. Correct MTU handling still needs
to be done.
 1.8  31-Jul-1999  itojun sync with recent KAME.
- loosen ipsec restriction on packet diredtion.
- revise icmp6 redirect handling on IsRouter bit.
- tcp/udp notification processing (link-local address case)
- cosmetic fixes (better code share across *BSD).
 1.7  30-Jul-1999  itojun remove reference to in6_systm.h (file itself will be removed afterwords)
 1.6  06-Jul-1999  itojun sync with KAME/NetBSD 1.4, SNAP kit 19990705.
key changes are:
- icmp6 redirect fix (dst check)
- revised ip6 multicast check for loopback i/f
- several RCS ID cleanups
 1.5  04-Jul-1999  itojun s/splnet/splsoftnet/ in IPv6/IPsec part.
hope I made no mistake (the kernel works fine but I need a regress test)

Suggested by: thorpej
 1.4  03-Jul-1999  thorpej RCS ID police.
 1.3  02-Jul-1999  itojun expand insque/remque (quick hack). fundamental fix should be done
while clarifying relationship between inpcb and in6pcb.

PR: 7891
 1.2  01-Jul-1999  itojun branches: 1.2.2;
IPv6 kernel code, based on KAME/NetBSD 1.4, SNAP kit 19990628.
(Sorry for a big commit, I can't separate this into several pieces...)
Pls check sys/netinet6/TODO and sys/netinet6/IMPLEMENTATION for details.

- sys/kern: do not assume single mbuf, accept chained mbuf on passing
data from userland to kernel (or other way round).
- "midway" ATM card: ATM PVC pseudo device support, like those done in ALTQ
package (ftp://ftp.csl.sony.co.jp/pub/kjc/).
- sys/netinet/tcp*: IPv4/v6 dual stack tcp support.
- sys/netinet/{ip6,icmp6}.h, sys/net/pfkeyv2.h: IETF document assumes those
file to be there so we patch it up.
- sys/netinet: IPsec additions are here and there.
- sys/netinet6/*: most of IPv6 code sits here.
- sys/netkey: IPsec key management code
- dev/pci/pcidevs: regen

In my understanding no code here is subject to export control so it
should be safe.
 1.1  28-Jun-1999  itojun branches: 1.1.2;
file nd6.c was initially added on branch kame.
 1.1.2.3  30-Nov-1999  itojun bring in latest KAME (as of 19991130, KAME/NetBSD141) into kame branch
just for reference purposes.
This commit includes 1.4 -> 1.4.1 sync for kame branch.

The branch does not compile at all (due to the lack of ALTQ and some other
source code). Please do not try to modify the branch, this is just for
referenre purposes.

synchronization to latest KAME will take place on HEAD branch soon.
 1.1.2.2  06-Jul-1999  itojun KAME/NetBSD 1.4, SNAP kit 1999/07/05.
NOTE: this branch is just for reference purposes (i.e. for taking cvs diff).
do not touch anything on the branch. actual work must be done on HEAD branch.
 1.1.2.1  28-Jun-1999  itojun KAME/NetBSD 1.4 SNAP kit, dated 19990628.

NOTE: this branch (kame) is used just for refernce. this may not compile
due to multiple reasons.
 1.2.2.3  02-Aug-1999  thorpej Update from trunk.
 1.2.2.2  01-Jul-1999  thorpej Sync w/ -current.
 1.2.2.1  01-Jul-1999  thorpej file nd6.c was added on branch chs-ubc2 on 1999-07-01 23:48:29 +0000
 1.10.8.1  27-Dec-1999  wrstuden Pull up to last week's -current.
 1.10.2.7  21-Apr-2001  bouyer Sync with HEAD
 1.10.2.6  27-Mar-2001  bouyer Sync with HEAD.
 1.10.2.5  12-Mar-2001  bouyer Sync with HEAD.
 1.10.2.4  11-Feb-2001  bouyer Sync with HEAD.
 1.10.2.3  18-Jan-2001  bouyer Sync with head (for UBC+NFS fixes, mostly).
 1.10.2.2  22-Nov-2000  bouyer Sync with HEAD.
 1.10.2.1  20-Nov-2000  bouyer Update thorpej_scsipi to -current as of a month ago
 1.30.4.3  09-May-2001  he Pull up revision 1.36 (requested by itojun):
Suppress ND6 logs that are too noisy for normal use. Can be
re-enabled by net.inet6.icmp6.nd6_debug.
 1.30.4.2  26-Feb-2001  he Pull up revision 1.40 (via patch, requested by itojun):
Tighten IPv6 ND6/dest6 option chasing bounds check.
 1.30.4.1  20-Jul-2000  itojun pullup from main trunc (approved by releng-1-5)
- add protection mechanism against ND cache corruption due to bad NUD hints.

this is part of:
sys/netinet/icmp6.h 1.9 -> 1.10
sys/netinet/tcp_input.c 1.111 -> 1.112
sys/netinet6/icmp6.c 1.34 -> 1.35
sys/netinet6/nd6.c 1.30 -> 1.31
sys/netinet6/nd6.h 1.14 -> 1.15
 1.42.2.12  17-Jan-2003  thorpej Sync with HEAD.
 1.42.2.11  18-Oct-2002  nathanw Catch up to -current.
 1.42.2.10  17-Sep-2002  nathanw Catch up to -current.
 1.42.2.9  27-Aug-2002  nathanw Catch up to -current.
 1.42.2.8  20-Jun-2002  nathanw Catch up to -current.
 1.42.2.7  01-Apr-2002  nathanw Catch up to -current.
(CVS: It's not just a program. It's an adventure!)
 1.42.2.6  08-Jan-2002  nathanw Catch up to -current.
 1.42.2.5  14-Nov-2001  nathanw Catch up to -current.
 1.42.2.4  22-Oct-2001  nathanw Catch up to -current.
 1.42.2.3  24-Aug-2001  nathanw Catch up with -current.
 1.42.2.2  21-Jun-2001  nathanw Catch up to -current.
 1.42.2.1  09-Apr-2001  nathanw Catch up with -current.
 1.49.2.5  10-Oct-2002  jdolecek sync kqueue with -current; this includes merge of gehenna-devsw branch,
merge of i386 MP branch, and part of autoconf rototil work
 1.49.2.4  06-Sep-2002  jdolecek sync kqueue branch with HEAD
 1.49.2.3  23-Jun-2002  jdolecek catch up with -current on kqueue branch
 1.49.2.2  10-Jan-2002  thorpej Sync kqueue branch with -current.
 1.49.2.1  03-Aug-2001  lukem update to -current
 1.57.6.1  04-Jun-2002  lukem Pull up revision 1.62 (via manual patch) (requested by itojun in ticket #145):
do not hardcode if_mtu values in here, except for IFT_{ARC,FDDI} -
they need special handling. makes it possible to take advantage of 9k ether
frames.
 1.57.4.3  29-Aug-2002  gehenna catch up with -current.
 1.57.4.2  20-Jun-2002  gehenna catch up with -current.
 1.57.4.1  30-May-2002  gehenna Catch up with -current.
 1.86.2.5  10-Nov-2005  skrll Sync with HEAD. Here we go again...
 1.86.2.4  18-Dec-2004  skrll Sync with HEAD.
 1.86.2.3  21-Sep-2004  skrll Fix the sync with head I botched.
 1.86.2.2  18-Sep-2004  skrll Sync with HEAD.
 1.86.2.1  03-Aug-2004  skrll Sync with HEAD
 1.89.4.1  07-Apr-2005  he Pull up revision 1.92 (requested by tron in ticket #1394):
Make sure that prefixes get purged. Fixes PR#21189,
PR#25968, and PR#37873.
 1.89.2.1  07-Apr-2005  he Pull up revision 1.92 (requested by tron in ticket #1394):
Make sure that prefixes get purged. Fixes PR#21189,
PR#25968, and PR#37873.
 1.91.10.1  07-Apr-2005  jmc Pullup rev 1.92 (requested by tron in ticket #105)

Make sure that prefixes get purged. PR#21189, PR#25968, PR#27873
 1.91.4.1  29-Apr-2005  kent sync with -current
 1.94.2.6  07-Dec-2007  yamt sync with head
 1.94.2.5  15-Nov-2007  yamt sync with head.
 1.94.2.4  03-Sep-2007  yamt sync with head.
 1.94.2.3  26-Feb-2007  yamt sync with head.
 1.94.2.2  30-Dec-2006  yamt sync with head.
 1.94.2.1  21-Jun-2006  yamt sync with head.
 1.95.2.1  01-Feb-2006  yamt sync with head.
 1.96.4.3  01-Jun-2006  kardel Sync with head.
 1.96.4.2  22-Apr-2006  simonb Sync with head.
 1.96.4.1  04-Feb-2006  simonb Adapt for timecounters: mostly use get*time(), use bintime's for timeout
calculations and use "time_second" instead of "time.tv_sec".
 1.96.2.1  09-Sep-2006  rpaulo sync with head
 1.98.2.5  03-Sep-2006  yamt sync with head.
 1.98.2.4  26-Jun-2006  yamt sync with head.
 1.98.2.3  24-May-2006  yamt sync with head.
 1.98.2.2  01-Apr-2006  yamt sync with head.
 1.98.2.1  13-Mar-2006  yamt sync with head.
 1.99.4.2  24-May-2006  tron Merge 2006-05-24 NetBSD-current into the "peter-altq" branch.
 1.99.4.1  28-Mar-2006  tron Merge 2006-03-28 NetBSD-current into the "peter-altq" branch.
 1.99.2.1  19-Apr-2006  elad sync with head.
 1.102.2.1  19-Jun-2006  chap Sync with head.
 1.104.4.2  10-Dec-2006  yamt sync with head.
 1.104.4.1  22-Oct-2006  yamt sync with head
 1.104.2.2  12-Jan-2007  ad Sync with head.
 1.104.2.1  18-Nov-2006  ad Sync with head.
 1.109.4.5  17-May-2007  yamt sync with head.
 1.109.4.4  07-May-2007  yamt sync with head.
 1.109.4.3  24-Mar-2007  yamt sync with head.
 1.109.4.2  12-Mar-2007  rmind Sync with HEAD.
 1.109.4.1  27-Feb-2007  yamt - sync with head.
- move sched_changepri back to kern_synch.c as it doesn't know PPQ anymore.
 1.111.6.1  18-Mar-2007  reinoud First attempt to bring branch in sync with HEAD
 1.111.4.1  11-Jul-2007  mjf Sync with head.
 1.111.2.5  09-Oct-2007  ad Sync with head.
 1.111.2.4  20-Aug-2007  ad Sync with HEAD.
 1.111.2.3  01-Jul-2007  ad Adapt to callout API change.
 1.111.2.2  08-Jun-2007  ad Sync with head.
 1.111.2.1  10-Apr-2007  ad Sync with head.
 1.116.2.2  03-Sep-2007  skrll Sync with HEAD.
 1.116.2.1  15-Aug-2007  skrll Sync with HEAD.
 1.117.4.5  09-Dec-2007  jmcneill Sync with HEAD.
 1.117.4.4  11-Nov-2007  joerg Sync with HEAD.
 1.117.4.3  04-Nov-2007  jmcneill Sync with HEAD.
 1.117.4.2  03-Sep-2007  jmcneill Sync with HEAD.
 1.117.4.1  09-Aug-2007  jmcneill Sync with HEAD.
 1.118.4.2  07-Aug-2007  dyoung Avoid writing past the end of the buffer [lldst, lldst + dstsize)
in nd6_storelladdr().

Use sockaddr_dl_setaddr(). Constify some sockaddr_dl's. Constify
a sockaddr argument to nd6_na_output(). Change SDL() to "standard"
satocsdl() or satosdl(). Change SIN6() to satocsin6() or satosin6().

bcmp -> memcmp, bcopy -> memcpy.
 1.118.4.1  07-Aug-2007  dyoung file nd6.c was added on branch matt-mips64 on 2007-08-07 04:35:43 +0000
 1.118.2.2  09-Jan-2008  matt sync with HEAD
 1.118.2.1  06-Nov-2007  matt sync with HEAD
 1.120.4.1  13-Nov-2007  bouyer Sync with HEAD
 1.121.2.2  08-Dec-2007  mjf Sync with HEAD.
 1.121.2.1  19-Nov-2007  mjf Sync with HEAD.
 1.122.2.1  08-Dec-2007  ad Sync with head.
 1.123.12.2  17-Jan-2009  mjf Sync with HEAD.
 1.123.12.1  02-Jun-2008  mjf Sync with HEAD.
 1.123.8.1  22-Feb-2008  keiichi imported Mobile IPv6 code developed by the SHISA project
(http://www.mobileip.jp/).
 1.125.2.1  18-May-2008  yamt sync with head.
 1.126.4.1  23-Jun-2008  wrstuden Sync w/ -current. 34 merge conflicts to follow.
 1.126.2.6  11-Aug-2010  yamt sync with head.
 1.126.2.5  11-Mar-2010  yamt sync with head
 1.126.2.4  16-Sep-2009  yamt sync with head
 1.126.2.3  19-Aug-2009  yamt sync with head.
 1.126.2.2  04-May-2009  yamt sync with head.
 1.126.2.1  16-May-2008  yamt sync with head.
 1.128.4.1  13-Dec-2008  haad Update haad-dm branch to haad-dm-base2.
 1.130.14.1  17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #1892):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignement requirements.
 1.130.10.1  17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #1892):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignement requirements.
 1.130.4.1  17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #1892):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignement requirements.
 1.130.2.1  19-Jan-2009  skrll Sync with HEAD.
 1.135.4.1  05-Mar-2011  rmind sync with head
 1.135.2.1  17-Aug-2010  uebayasi Sync with HEAD.
 1.136.8.3  22-May-2014  yamt sync with head.

for a reference, the tree before this commit was tagged
as yamt-pagecache-tag8.

this commit was splitted into small chunks to avoid
a limitation of cvs. ("Protocol error: too many arguments")
 1.136.8.2  30-Oct-2012  yamt sync with head
 1.136.8.1  17-Apr-2012  yamt sync with head
 1.138.2.2  05-Apr-2012  mrg sync to latest -current.
 1.138.2.1  18-Feb-2012  mrg merge to -current.
 1.141.8.3  18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.141.8.2  17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #998):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignement requirements.
 1.141.8.1  08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.141.6.3  18-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.141.6.2  17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #998):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignement requirements.
 1.141.6.1  08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.141.2.3  03-Jun-2014  msaitoh Pull up following revision(s) (requested by bouyer in ticket #1067):
sys/dist/ipf/netinet/ip_fil_netbsd.c 1.9 via patch
sys/net/if_ethersubr.c 1.197 via patch
sys/net/if_loop.c 1.77 via patch
sys/net/if_vlan.c 1.70 via patch
sys/netinet/if_arp.c 1.158
sys/netinet/ip_carp.c 1.54 via patch
sys/netinet6/ip6_flow.c 1.23 via patch
sys/netinet6/nd6.c 1.150 via patch
sys/rump/librump/rumpkern/klock.c 1.4

Make sure *(if_output)() is called with KERNEL_LOCK held to avoid mbuf leak.
See http://mail-index.netbsd.org/tech-net/2014/04/09/msg004511.html
for details. For netinet6, the problem report, fix and test were done
by njoly@ on current-users@
 1.141.2.2  17-Dec-2013  bouyer Pull up following revision(s) (requested by martin in ticket #998):
usr.sbin/ndp/ndp.c: revision 1.42
sys/netinet6/nd6.c: revision 1.146
Instead of voodo casts use simple byte pointer arithmetic and memcpy to
create the "packed" binary format we pass out to userland when querying
the router/prefix list.
Simplify code to print the router/prefix list: use memcpy and local structs
properly aligned on the stack to decode the binary format passed by the
kernel - instead of (bogusly) assuming the format will obey all local
alignement requirements.
 1.141.2.1  08-Jul-2013  jdc Pull up revisions:
src/share/man/man7/sysctl.7 revision 1.73 via patch
src/sys/netinet6/icmp6.c revision 1.161 via patch
src/sys/netinet6/in6.c revision 1.161 via patch
src/sys/netinet6/in6_proto.c revision 1.97 via patch
src/sys/netinet6/in6_var.h revision 1.65 via patch
src/sys/netinet6/ip6_input.c revision 1.139 via patch
src/sys/netinet6/ip6_var.h revision 1.59 via patch
src/sys/netinet6/nd6.c revision 1.143 via patch
src/sys/netinet6/nd6.h revision 1.57 via patch
src/sys/netinet6/nd6_rtr.c revision 1.83 via patch
(requested by christos in ticket #905).
Patch by Loganaden Velvindron.

4 new sysctls to avoid ipv6 DoS attacks from OpenBSD
 1.143.2.4  03-Dec-2017  jdolecek update from HEAD
 1.143.2.3  20-Aug-2014  tls Rebase to HEAD as of a few days ago.
 1.143.2.2  23-Jun-2013  tls resync from head
 1.143.2.1  25-Feb-2013  tls resync with head
 1.145.2.2  18-May-2014  rmind sync with head
 1.145.2.1  17-Jul-2013  rmind Checkpoint work in progress:
- Move PCB structures under __INPCB_PRIVATE, adjust most of the callers
and thus make IPv4 PCB structures mostly opaque. Any volunteers for
merging in6pcb with inpcb (see rpaulo-netinet-merge-pcb branch)?
- Move various global vars to the modules where they belong, make them static.
- Some preliminary work for IPv4 PCB locking scheme.
- Make raw IP code mostly MP-safe. Simplify some of it.
- Rework "fast" IP forwarding (ipflow) code to be mostly MP-safe. It should
run from a software interrupt, rather than hard.
- Rework tun(4) pseudo interface to be MP-safe.
- Work towards making some other interfaces more strict.
 1.148.2.1  10-Aug-2014  tls Rebase.
 1.152.2.4  12-May-2017  snj Pull up following revision(s) (requested by skrll/ozaki-r in ticket #1402):
sys/net/route.c: revision 1.170 via patch
sys/netinet/ip_flow.c: revision 1.73 via patch
sys/netinet6/ip6_flow.c: revision 1.28 via patch
sys/netinet6/nd6.c: revision 1.203 via patch
Run timers in workqueue
Timers (such as nd6_timer) typically free/destroy some data in callout
(softint). If we apply psz/psref for such data, we cannot do free/destroy
process in there because synchronization of psz/psref cannot be used in
softint. So run timer callbacks in workqueue works (normal LWP context).
Doing workqueue_enqueue a work twice (i.e., call workqueue_enqueue before
a previous task is scheduled) isn't allowed. For nd6_timer and
rt_timer_timer, this doesn't happen because callout_reset is called only
from workqueue's work. OTOH, ip{,6}flow_slowtimo's callout can be called
before its work starts and completes because the callout is periodically
called regardless of completion of the work. To avoid such a situation,
add a flag for each protocol; the flag is set true when a work is
enqueued and set false after the work finished. workqueue_enqueue is
called only if the flag is false.
Proposed on tech-net and tech-kern.
 1.152.2.3  06-Apr-2015  snj Pull up following revision(s) (requested by martin in ticket #655):
sys/netinet6/in6.c: revision 1.182 via patch
sys/netinet6/in6_ifattach.c: revision 1.95 via patch
sys/netinet6/nd6.c: revision 1.158 via patch
sys/netinet6/nd6.h: revision 1.62 via patch
sys/netinet6/nd6_nbr.c: revision 1.104 via patch
sys/netinet6/nd6_rtr.c: revision 1.96 via patch
Rearange interface detachement slightly: before we free the INET6 specific
per-interface data, make sure to call nd6_purge() with it to remove
routing entries pointing to the going interface.
When we should happen to call this function again later, with the data
already gone, just return.
Fixes PR kern/49682, ok: christos.
 1.152.2.2  17-Dec-2014  martin Pull up following revision(s) (requested by roy in ticket #332):
sys/netinet6/nd6_nbr.c: revision 1.103
sys/netinet6/nd6_rtr.c: revision 1.95
sys/netinet6/nd6.h: revision 1.61
sys/netinet6/nd6.c: revision 1.156
Report route additions/changes/deletions for cached neighbours to userland.
 1.152.2.1  27-Oct-2014  martin Pull up following revision(s) (requested by roy in ticket #159):
sys/netinet6/nd6.c: revision 1.153
Tests for neighbour now work correctly on bridge(4) and carp(4) interfaces.
 1.154.2.12  28-Aug-2017  skrll Sync with HEAD
 1.154.2.11  05-Feb-2017  skrll Sync with HEAD
 1.154.2.10  05-Dec-2016  skrll Sync with HEAD
 1.154.2.9  05-Oct-2016  skrll Sync with HEAD
 1.154.2.8  09-Jul-2016  skrll Sync with HEAD
 1.154.2.7  29-May-2016  skrll Sync with HEAD
 1.154.2.6  22-Apr-2016  skrll Sync with HEAD
 1.154.2.5  19-Mar-2016  skrll Sync with HEAD
 1.154.2.4  27-Dec-2015  skrll Sync with HEAD (as of 26th Dec)
 1.154.2.3  22-Sep-2015  skrll Sync with HEAD
 1.154.2.2  06-Jun-2015  skrll Sync with HEAD
 1.154.2.1  06-Apr-2015  skrll Sync with HEAD
 1.203.2.5  20-Mar-2017  pgoyette Sync with HEAD
 1.203.2.4  07-Jan-2017  pgoyette Sync with HEAD. (Note that most of these changes are simply $NetBSD$
tag issues.)
 1.203.2.3  04-Nov-2016  pgoyette Sync with HEAD
 1.203.2.2  06-Aug-2016  pgoyette Sync with HEAD
 1.203.2.1  26-Jul-2016  pgoyette Sync with HEAD
 1.224.2.1  21-Apr-2017  bouyer Sync with HEAD
 1.232.2.14  20-Aug-2021  martin Pull up following revision(s) (requested by ozaki-r in ticket #1692):

sys/netinet6/nd6.c: revision 1.277

nd6: prevent ln from being freed while releasing held packets
 1.232.2.13  30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1396):

sys/netinet6/nd6.h: revision 1.88
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.288 (patch)

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.
 1.232.2.12  19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1340):

sys/netinet6/nd6.c: revision 1.257

Add missing IFNET_LOCK for regen_tmpaddr
Reported by ryo@
 1.232.2.11  26-Jul-2019  martin Pull up following revision(s) (requested by christos in ticket #1307):

sys/netinet6/nd6.c: revision 1.256

Decrease the reference count before freeing, so that the entries actually
get free'd. (Ryota Ozaki)
 1.232.2.10  08-Jul-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #1285):

sys/netinet6/nd6.c: revision 1.255
tests/net/ndp/t_ndp.sh: revision 1.32

nd6: restore a missing reachability confirmation

On sending a packet over a STALE cache, the cache should be tried a reachability
confirmation, which is described in RFC 2461/4861 7.3.3. On the fast path in
nd6_resolve, however, the treatment for STALE caches has been skipped
accidentally. So STALE caches never be back to the REACHABLE state.

To fix the issue, branch to the fast path only when the cache entry is the
REACHABLE state and leave other caches to the slow path that includes the
treatment. To this end we need to allow to return a link-layer address if a
valid address is available on the slow path too, which is the same behavior as
FreeBSD and OpenBSD.

tests: test state transitions of neighbor caches
 1.232.2.9  06-Nov-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #1080):

sys/netinet6/nd6.c: revision 1.251
sys/netinet/if_arp.c: revision 1.276
sys/net/if.c: revision 1.438
sys/net/if.c: revision 1.439
sys/net/route.c: revision 1.214
sys/net/route.c: revision 1.215
sys/net/route.c: revision 1.216
sys/netinet6/in6.c: revision 1.270
sys/net/route.h: revision 1.120
sys/net/if.c: revision 1.440

Remove a wrong assertion in ifaref

-

Doing ifref on an ifa with IFA_DESTROYING is not a problem; the reference should
be dropped during the destruction of the ifa.

-

Use atomic operations for ifa_refcnt

-

Avoid a dangling pointer during rt_replace_ifa

-

Avoid double rt_replace_ifa on rtrequest1(RTM_ADD)

Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.

-

Use rt_update framework on updating a rtentry
 1.232.2.8  07-Jun-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #842):

sys/netinet6/mld6.c: revision 1.93-1.99
sys/netinet6/in6_var.h: revision 1.99,1.100
sys/netinet6/in6.c: revision 1.267,1.268
sys/netinet6/nd6.c: revision 1.249

Don't hold softnet_lock in mld_timeo
Then we can get rid of remaining abuses of mutex_owned(softnet_lock).

Release in6_multilock on callout_halt of mld_timeo to avoid a deadlock
Improve atomicity of in6_leavegroup and in6_delmulti

Avoid NULL pointer dereference on imm->i6mm_maddr

Make a refcount decrement and a removal from a list of an item atomic
in6m_refcount of an in6m can be incremented if the in6m is on the list
(if_multiaddrs) in in6_addmulti or mld_input. So we must avoid such an
increment when we try to destroy an in6m. To this end we must make
an in6m_refcount decrement and a removal of an in6m from if_multiaddrs
atomic.

Make a deletion of in6m in nd6_rtrequest atomic

Move LIST_REMOVE
mld_stoptimer releases in6_multilock temporarily, so we must LIST_REMOVE first.

Avoid double LIST_REMOVE which corrupts lists
Mark in6m as used for non-DIAGNOSTIC builds.
 1.232.2.7  13-Mar-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #622):
sys/netinet/if_arp.c: revision 1.270
sys/net/if_llatbl.c: revision 1.24 (patch)
sys/net/if_llatbl.c: revision 1.25
sys/net/if_llatbl.c: revision 1.26
sys/net/route.c: revision 1.204
sys/netinet6/in6.c: revision 1.261
sys/netinet6/in6.c: revision 1.262 (patch)
sys/netinet6/in6.c: revision 1.263
sys/netinet/in.c: revision 1.216
sys/netinet6/in6.c: revision 1.264
sys/netinet6/nd6.c: revision 1.246 (patch)
sys/netinet/if_arp.c: revision 1.269
sys/net/if_llatbl.h: revision 1.14
sys/netinet6/in6.c: revision 1.259
sys/netinet/in.c: revision 1.220
sys/netinet/in.c: revision 1.221 (patch)
sys/netinet/in.c: revision 1.222
sys/netinet/in.c: revision 1.223

Suppress noisy debugging outputs
Even if DEBUG they are too noisy under load.

Tweak sanity checks

Scheduling a timer of static entries is wrong.

Add assertions

We must not destroy llentries holding mbufs.

Fix reference leaks of llentry
callout_reset and callout_halt can cancel a pending callout without telling us.
Detect a cancel and remove a reference by using callout_pending and
callout_stop (it's a bit tricy though, we can detect it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.

Fix memory leaks on arp -d and ndp -d for static entries
We have to delete entries on in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR because
there is no such entries now.

Use pool(9) for llentry allocations
llentry is easy to be leaked and pool suits for it because pool is usable to
detect leaks.

Also sweep unnecessary wrappers for llentry, in_llentry and in6_llentry.
 1.232.2.6  05-Feb-2018  martin Pull up following revision(s) (requested by ozaki-r in ticket #528):
sys/net/agr/if_agr.c: revision 1.42
sys/netinet6/nd6_rtr.c: revision 1.137
sys/netinet6/nd6_rtr.c: revision 1.138
sys/net/agr/if_agr.c: revision 1.46
sys/net/route.c: revision 1.206
sys/net/if.c: revision 1.419
sys/net/agr/if_agrether.c: revision 1.10
sys/netinet6/nd6.c: revision 1.241
sys/netinet6/nd6.c: revision 1.242
sys/netinet6/nd6.c: revision 1.243
sys/netinet6/nd6.c: revision 1.244
sys/netinet6/nd6.c: revision 1.245
sys/netipsec/ipsec_input.c: revision 1.52
sys/netipsec/ipsec_input.c: revision 1.53
sys/net/agr/if_agrsubr.h: revision 1.5
sys/kern/subr_workqueue.c: revision 1.35
sys/netipsec/ipsec.c: revision 1.124
sys/net/agr/if_agrsubr.c: revision 1.11
sys/net/agr/if_agrsubr.c: revision 1.12
Simplify; share agr_vlan_add and agr_vlan_del (NFCI)
Fix late NULL-checking (CID 1427782: Null pointer dereferences (REVERSE_INULL))
KNF: replace soft tabs with hard tabs
Add missing NULL-checking for m_pullup (CID 1427770: Null pointer dereferences (NULL_RETURNS))
Add locking.
Revert "Get rid of unnecessary splsoftnet" (v1.133)
It's not always true that softnet_lock is held these places.
See PR kern/52947.
Get rid of unnecessary splsoftnet (redo)
Unless NET_MPSAFE, splsoftnet is still needed for rt_* functions.
Use existing fill_[pd]rlist() functions to calculate size of buffer to
allocate, rather than relying on an arbitrary length passed in from
userland.
Allow copyout() of partial results if the user buffer is too small, to
be consistent with the way sysctl(3) is documented.
Garbage-collect now-unused third parrameter in the fill_[pd]rlist()
functions.
As discussed on IRC.
OK kamil@ and christos@
XXX Needs pull-up to netbsd-8 branch.
Simplify, from christos@
More simplification, this time from ozaki-r@
No need to break after return.
One more from christos@
No need to initialize fill_func
more cleanup (don't allow oldlenp == NULL)
Destroy ifq_lock at the end of if_detach
It still can be used in if_detach.
Prevent rt_free_global.wk from being enqueued to workqueue doubly
Check if a queued work is tried to be enqueued again, which is not allowed
 1.232.2.5  02-Jan-2018  snj Pull up following revision(s) (requested by ozaki-r in ticket #456):
sys/arch/arm/sunxi/sunxi_emac.c: 1.9
sys/dev/ic/dwc_gmac.c: 1.43-1.44
sys/dev/pci/if_iwm.c: 1.75
sys/dev/pci/if_wm.c: 1.543
sys/dev/pci/ixgbe/ixgbe.c: 1.112
sys/dev/pci/ixgbe/ixv.c: 1.74
sys/kern/sys_socket.c: 1.75
sys/net/agr/if_agr.c: 1.43
sys/net/bpf.c: 1.219
sys/net/if.c: 1.397, 1.399, 1.401-1.403, 1.406-1.410, 1.412-1.416
sys/net/if.h: 1.242-1.247, 1.250, 1.252-1.257
sys/net/if_bridge.c: 1.140 via patch, 1.142-1.146
sys/net/if_etherip.c: 1.40
sys/net/if_ethersubr.c: 1.243, 1.246
sys/net/if_faith.c: 1.57
sys/net/if_gif.c: 1.132
sys/net/if_l2tp.c: 1.15, 1.17
sys/net/if_loop.c: 1.98-1.101
sys/net/if_media.c: 1.35
sys/net/if_pppoe.c: 1.131-1.132
sys/net/if_spppsubr.c: 1.176-1.177
sys/net/if_tun.c: 1.142
sys/net/if_vlan.c: 1.107, 1.109, 1.114-1.121
sys/net/npf/npf_ifaddr.c: 1.3
sys/net/npf/npf_os.c: 1.8-1.9
sys/net/rtsock.c: 1.230
sys/netcan/if_canloop.c: 1.3-1.5
sys/netinet/if_arp.c: 1.255
sys/netinet/igmp.c: 1.65
sys/netinet/in.c: 1.210-1.211
sys/netinet/in_pcb.c: 1.180
sys/netinet/ip_carp.c: 1.92, 1.94
sys/netinet/ip_flow.c: 1.81
sys/netinet/ip_input.c: 1.362
sys/netinet/ip_mroute.c: 1.147
sys/netinet/ip_output.c: 1.283, 1.285, 1.287
sys/netinet6/frag6.c: 1.61
sys/netinet6/in6.c: 1.251, 1.255
sys/netinet6/in6_pcb.c: 1.162
sys/netinet6/ip6_flow.c: 1.35
sys/netinet6/ip6_input.c: 1.183
sys/netinet6/ip6_output.c: 1.196
sys/netinet6/mld6.c: 1.90
sys/netinet6/nd6.c: 1.239-1.240
sys/netinet6/nd6_nbr.c: 1.139
sys/netinet6/nd6_rtr.c: 1.136
sys/netipsec/ipsec_output.c: 1.65
sys/rump/net/lib/libnetinet/netinet_component.c: 1.9-1.10
kmem_intr_free kmem_intr_[z]alloced memory
the underlying pools are the same but api-wise those should match
Unify IFEF_*_MPSAFE into IFEF_MPSAFE
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface have both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
opeartions take the kernel lock.
Proposed on tech-kern@ and tech-net@
Provide macros for softnet_lock and KERNEL_LOCK hiding NET_MPSAFE switch
It reduces C&P codes such as "#ifndef NET_MPSAFE KERNEL_LOCK(1, NULL); ..."
scattered all over the source code and makes it easy to identify remaining
KERNEL_LOCK and/or softnet_lock that are held even if NET_MPSAFE.
No functional change
Hold KERNEL_LOCK on if_ioctl selectively based on IFEF_MPSAFE
If IFEF_MPSAFE is set, hold the lock and otherwise don't hold.
This change requires additions of KERNEL_LOCK to subsequence functions from
if_ioctl such as ifmedia_ioctl and ifioctl_common to protect non-MP-safe
components.
Proposed on tech-kern@ and tech-net@
Ensure to hold if_ioctl_lock when calling if_flags_set
Fix locking against myself on ifpromisc
vlan_unconfig_locked could be called with holding if_ioctl_lock.
Ensure to not turn on IFF_RUNNING of an interface until its initialization completes
And ensure to turn off it before destruction as per IFF_RUNNING's description
"resource allocated". (The description is a bit doubtful though, I believe the
change is still proper.)
Ensure to hold if_ioctl_lock on if_up and if_down
One exception for if_down is if_detach; in the case the lock isn't needed
because it's guaranteed that no other one can access ifp at that point.
Make if_link_queue MP-safe if IFEF_MPSAFE
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).
Use IFADDR_WRITER_FOREACH instead of IFADDR_READER_FOREACH
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
Ensure to call if_addr_init with holding if_ioctl_lock
Get rid of outdated comments
Fix build of kernels without ether
By throwing out if_enable_vlan_mtu and if_disable_vlan_mtu that
created a unnecessary dependency from if.c to if_ethersubr.c.
PR kern/52790
Rename IFNET_LOCK to IFNET_GLOBAL_LOCK
IFNET_LOCK will be used in another lock, if_ioctl_lock (might be renamed then).
Wrap if_ioctl_lock with IFNET_* macros (NFC)
Also if_ioctl_lock perhaps needs to be renamed to something because it's now
not just for ioctl...
Reorder some destruction routines in if_detach
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Ensure to call if_mcast_op with holding IFNET_LOCK
Note that CARP doesn't deal with IFNET_LOCK yet.
Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
Describe which lock is used to protect each member variable of struct ifnet
Requested by skrll@
Write a guideline for converting an interface to IFEF_MPSAFE
Requested by skrll@
Note that IFNET_LOCK must not be held in softint
Don't set IFEF_MPSAFE unless NET_MPSAFE at this point
Because recent investigations show that interfaces with IFEF_MPSAFE need to
follow additional restrictions to work with the flag safely. We should enable it
on an interface by default only if the interface surely satisfies the
restrictions, which are described in if.h.
Note that enabling IFEF_MPSAFE solely gains a few benefit on performance because
the network stack is still serialized by the big kernel locks by default.
 1.232.2.4  17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #354):
sys/netinet6/in6_ifattach.c: revision 1.113
sys/netinet6/nd6.c: revision 1.238
Use psref instead of pserialize because that code is sleepable
--
Use psref instead of pserialize because that code is sleepable
 1.232.2.3  17-Nov-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #353):
sys/net/if_llatbl.c: 1.22
sys/net/if_llatbl.h: 1.13
sys/netinet/if_arp.c: 1.254
sys/netinet/in.c: 1.208-1.209
sys/netinet6/in6.c: 1.249-1.250
sys/netinet6/nd6.c: 1.237
Remove redundant KASSERTMSG
The function is static, has just one caller and the caller does the same check.
--
Fix a deadlock between a route update and lltable
It happens because rtalloc1 is called from lltable with holding
IF_AFDATA_WLOCK.
If a route update is in action, rtalloc1 would wait for its completion with
holding IF_AFDATA_WLOCK. At the same moment, a softint (e.g., arpintr) may try
to take IF_AFDATA_WLOCK and get stuck on it. Unfortunately the stuck softint
prevents the route update from progressing because the route update calls
psref_target_destroy that needs the softint to complete.
A resource allocation graph of the senario looks like this:
route update =(psref_target_destroy)=> softint => IF_AFDATA_WLOCK
=(rt_update_wait)=> route update
Fix the deadlock by pulling rtalloc1 out of the lltable codes inside
IF_AFDATA_WLOCK.
Note that the deadlock happens only if NET_MPSAFE is enabled.
 1.232.2.2  24-Oct-2017  snj Pull up following revision(s) (requested by ozaki-r in ticket #307):
sys/netinet6/nd6.c: revision 1.236
Add missing NULL check
PR kern/52554
 1.232.2.1  07-Jul-2017  martin Pull up following revision(s) (requested by ozaki-r in ticket #107):
usr.sbin/arp/arp.c: revision 1.56
sys/net/rtsock.c: revision 1.218
sys/net/if_llatbl.c: revision 1.20
usr.sbin/arp/arp.c: revision 1.57
sys/net/rtsock.c: revision 1.219
sys/net/if_llatbl.c: revision 1.21
usr.sbin/arp/arp.c: revision 1.58
tests/net/net_common.sh: revision 1.19
sys/netinet6/nd6.h: revision 1.84
sys/netinet6/nd6.h: revision 1.85
tests/net/arp/t_arp.sh: revision 1.23
sys/netinet6/in6.c: revision 1.246
tests/net/arp/t_arp.sh: revision 1.24
sys/netinet6/in6.c: revision 1.247
tests/net/arp/t_arp.sh: revision 1.25
sys/netinet6/in6.c: revision 1.248
tests/net/arp/t_arp.sh: revision 1.26
usr.sbin/ndp/ndp.c: revision 1.49
tests/net/arp/t_arp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.20
tests/net/arp/t_arp.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.21
tests/net/arp/t_arp.sh: revision 1.29
tests/net/ndp/t_ndp.sh: revision 1.22
tests/net/ndp/t_ndp.sh: revision 1.23
tests/net/route/t_flags6.sh: revision 1.13
tests/net/ndp/t_ndp.sh: revision 1.24
tests/net/route/t_flags6.sh: revision 1.14
tests/net/ndp/t_ndp.sh: revision 1.25
tests/net/route/t_flags6.sh: revision 1.15
tests/net/ndp/t_ndp.sh: revision 1.26
sbin/route/rtutil.c: revision 1.9
tests/net/ndp/t_ndp.sh: revision 1.27
tests/net/ndp/t_ndp.sh: revision 1.28
tests/net/net/t_ipv6address.sh: revision 1.14
tests/net/ndp/t_ra.sh: revision 1.28
tests/net/ndp/t_ndp.sh: revision 1.29
sys/net/route.h: revision 1.113
tests/net/ndp/t_ra.sh: revision 1.29
sys/net/rtsock.c: revision 1.220
sys/net/rtsock.c: revision 1.221
sys/net/rtsock.c: revision 1.222
sys/net/rtsock.c: revision 1.223
tests/net/route/t_route.sh: revision 1.13
sys/net/rtsock.c: revision 1.224
sys/net/route.c: revision 1.196
sys/net/if_llatbl.c: revision 1.19
sys/net/route.c: revision 1.197
sbin/route/route.c: revision 1.156
tests/net/route/t_flags.sh: revision 1.16
tests/net/route/t_flags.sh: revision 1.17
usr.sbin/ndp/ndp.c: revision 1.50
tests/net/route/t_flags.sh: revision 1.18
sys/netinet/in.c: revision 1.204
tests/net/route/t_flags.sh: revision 1.19
sys/netinet/in.c: revision 1.205
tests/net/arp/t_arp.sh: revision 1.30
tests/net/arp/t_arp.sh: revision 1.31
sys/net/if_llatbl.h: revision 1.11
tests/net/arp/t_arp.sh: revision 1.32
sys/net/if_llatbl.h: revision 1.12
tests/net/arp/t_arp.sh: revision 1.33
sys/netinet6/nd6.c: revision 1.233
sys/netinet6/nd6.c: revision 1.234
sys/netinet/if_arp.c: revision 1.251
sys/netinet6/nd6.c: revision 1.235
sys/netinet/if_arp.c: revision 1.252
sbin/route/route.8: revision 1.57
sys/net/rtsock.c: revision 1.214
sys/net/rtsock.c: revision 1.215
sys/net/rtsock.c: revision 1.216
sys/net/rtsock.c: revision 1.217
whitespace police
Simplify
We can assume that rt_ifp is always non-NULL.
Sending a routing message (RTM_ADD) on adding an llentry
A message used to be sent on adding a cloned route. Restore the
behavior for backward compatibility.
Requested by ryo@
Drop RTF_CONNECTED from a result of RTM_GET for ARP/NDP entries
ARP/NDP entries aren't connected routes.
Reported by ryo@
Support -c <count> option for route monitor
route command exits if it receives <count> routing messages where
<count> is a value specified by -c.
The option is useful to get only particular message(s) in a test script.
Test routing messages emitted on operations of ARP/NDP entries
Do netstat -a for an appropriate protocol
Add missing declarations for cleanup
Set net.inet.arp.keep only if it's required
Don't create a permanent L2 cache entry on adding an address to an interface
It was created to copy FreeBSD, however actually the cache isn't
necessary. Remove it to simplify the code and reduce the cost to
maintain it (e.g., keep a consistency with a corresponding local
route).
Fix typo
Fix in_lltable_match_prefix
The function has not been used but will be used soon.
Remove unused function (nd6_rem_ifa_lle)
Allow in6_lltable_free_entry to be called without holding the afdata lock of ifp as well as in_lltable_free_entry
This behavior is a bit odd and should be fixed in the future...
Purge ARP/NDP entries on an interface when the interface is down
Fix PR kern/51179
Purge all related L2 caches on removing a route
The change addresses situations similar to PR 51179.
Purge L2 caches on changing an interface of a route
The change addresses situations similar to PR 51179.
Test implicit removals of ARP/NDP entries
One test case reproudces PR 51179.
Fix build of kernels without both INET and INET6
Tweak lltable_sysctl_dumparp
- Rename lltable_sysctl_dumparp to lltable_sysctl_dump
because it's not only for ARP
- Enable it not only for INET but also for INET6
Fix usage of routing messages on arp -d and ndp -d
It didn't work as we expected; we should set RTA_GATEWAY not
RTA_IFP on RTM_GET to return an if_index and the kernel should
use it on RTM_DELETE.
Improve backward compatibility of (fake) routing messages on adding an ARP/NDP entry
A message originally included only DST and GATEWAY. Restore it.
Fix ifdef; care about a case w/ INET6 and w/o INET
Drop RTF_UP from a routing message of a deleted ARP/NDP entry
Check existence of ARP/NDP entries
Checking ARP/NDP entries is valid rather than checking routes.
Fix wrong comment
Drop RTF_LLINFO flag (now it's RTF_LLDATA) from local routes
They don't have llinfo anymore. And also the change fixes unexpected
behavior of ARP proxy.
Restore ARP/NDP entries to route show and netstat -r
Requested by dyoung@ some time ago
Enable to remove multiple ARP/NDP entries for one destination
The kernel can have multiple ARP/NDP entries which have an indentical
destination on different interfaces. This is normal and can be
reproduce easily by ping -I or ping6 -S. We should be able to remove
such entries.
arp -d <ip> and ndp -d <ip> are changed to fetch all ARP/NDP entries
and remove matched entries. So we can remove multiple entries
described above. This fetch all and selective removal behavior is
the same as arp <ip> and ndp <ip>; they also do fetch all entries
and show only matched entries.
Related to PR 51179
Check if ARP/NDP entries are purged when a related route is deleted
 1.245.2.6  26-Dec-2018  pgoyette Sync with HEAD, resolve a few conflicts
 1.245.2.5  26-Nov-2018  pgoyette Sync with HEAD, resolve a couple of conflicts
 1.245.2.4  06-Sep-2018  pgoyette Sync with HEAD

Resolve a couple of conflicts (result of the uimin/uimax changes)
 1.245.2.3  25-Jun-2018  pgoyette Sync with HEAD
 1.245.2.2  02-May-2018  pgoyette Synch with HEAD
 1.245.2.1  15-Mar-2018  pgoyette Synch with HEAD
 1.249.2.3  21-Apr-2020  martin Sync with HEAD
 1.249.2.2  13-Apr-2020  martin Mostly merge changes from HEAD upto 20200411
 1.249.2.1  10-Jun-2019  christos Sync with HEAD
 1.256.2.9  08-Aug-2022  martin Apply patch, requested by kim in ticket #1497:

sys/netinet6/nd6.c (apply patch)

PR 55680: avoid duplicate free of link layer entries (code in HEAD is
different)
 1.256.2.8  20-Aug-2021  martin Pull up following revision(s) (requested by ozaki-r in ticket #1338):

sys/netinet6/nd6.c: revision 1.277

nd6: prevent ln from being freed while releasing held packets
 1.256.2.7  30-Sep-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #269):

sys/netinet6/nd6.h: revision 1.88
sys/net/rtsock_shared.c: revision 1.10
sys/netinet6/nd6_nbr.c: revision 1.174
sys/netinet6/nd6.c: revision 1.264
sys/netinet/if_arp.c: revision 1.283
sys/netinet/if_arp.c: revision 1.288

Initialize DAD components properly

The original code initialized each component in non-init functions such as
arp_dad_start and nd6_dad_find, conditionally based on a global flag for each.
However, it was racy because the flag and the code around it were not
protected by a lock and could cause a kernel panic at worst.

Fix the issue by initializing the components in bootup as usual.

-

Initialize dom_mowner for MBUFTRACE
 1.256.2.6  05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #169):

sys/netinet6/nd6.h: revision 1.87
sys/netinet6/nd6.c: revision 1.263

inet6: Re-introduce ND6_LLINFO_WAITDELETE so we can return EHOSTDOWN

Once we've sent nd6_mmaxtries NS messages, send RTM_MISS and move to the
ND6_LLINFO_WAITDELETE state rather than freeing the llentry right away.
Wait for a probe cycle and then free the llentry.

If a connection attempts to re-use the llentry during ND6_LLINFO_WAITDELETE,
return EHOSTDOWN (or EHOSTUNREACH if a gateway) to match inet behaviour.

Continue to ND6_LLINFO_INCOMPLETE and send another NS probe in hope of a
reply. Rinse and repeat.

This reverts part of nd6.c r1.14 - an 18 year old commit!
 1.256.2.5  05-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #168):

sys/net/rtsock.c: revision 1.252
sys/netinet6/nd6_nbr.c: revision 1.168 - 1.172
sys/netinet6/nd6.c: revision 1.262

inet6: Send RTM_MISS when we fail to resolve an address.

Takes the same approach as when adding a new address - we no longer
announce the new lladdr right away but we announce the result.

This will either be RTM_ADD or RTM_MISS.
RTM_DELETE is only sent if we have a lladdr assigned OR gc'ed.

This results in less messages via route(4) and tells us when a new
lladdr has been added (RTM_ADD), changed (RTM_CHANGE), deleted
(RTM_DELETED) or has failed to been resolved (RTM_MISS).

The latter case can be interpreted as unreachable.

inet6: change rt_announce and llchange to bool in nd6_na_input()
more bool
 1.256.2.4  01-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #148):

sys/netinet6/nd6.c: revision 1.261

inet6: don't set an invalid lladdr in nd6_free()

We don't want to announce that we've deleted a hwaddr of all zeros.
 1.256.2.3  01-Sep-2019  martin Pull up following revision(s) (requested by roy in ticket #131):

sys/netinet6/nd6.c: revision 1.260

inet6: nd6_free assumes all routers are processed by kernel RA

This hasn't been the case for a long time if you're a dhcpcd
user with a default config. As such, it's possible for the default
IPv6 router as set by dhcpcd could be erroneously gc'ed by nd6_free.

This reduces the scope of the ND6_WLOCK taken as well as fixing an
issue where we write to ln->ln_state without a lock being held.
 1.256.2.2  26-Aug-2019  martin Pull up following revision(s) (requested by roy in ticket #109):

sys/net/route.h: revision 1.124
sys/netinet6/nd6.c: revision 1.258
sys/netinet6/nd6.c: revision 1.259
sys/net/rtsock.c: revision 1.251
sys/netinet/if_arp.c: revision 1.284
sys/netinet6/nd6_nbr.c: revision 1.167

rtsock: rework rt_clonedmsg to take a message type and lladdr

We will use this in a future patch to notify userland of lladdr
changes.

XXX pullup -8 -9

-

nd6: notify userland of neighbour lla updates once more

XXX pullup -8 -9
 1.256.2.1  19-Aug-2019  martin Pull up following revision(s) (requested by ozaki-r in ticket #97):

sys/netinet6/nd6.c: revision 1.257

Add missing IFNET_LOCK for regen_tmpaddr
Reported by ryo@
 1.265.2.1  25-Jan-2020  ad Sync with head.
 1.268.2.1  20-Apr-2020  bouyer Sync with HEAD
 1.274.2.1  03-Jan-2021  thorpej Sync w/ HEAD.
 1.279.4.3  12-Apr-2025  martin Pull up following revision(s) (requested by ozaki-r in ticket #1089):

sys/netinet6/nd6.c: revision 1.283

nd6: send packets through the fast path even if DELAY and PROBE

If there is a valid ND cache, we can send packets for the destination
of the cache. If the state of the cache is STALE, we need to go
through the slow path to change its state. In the other cases
including the DELAY and PROBE states, we can send packets through
the fast path.
 1.279.4.2  18-Apr-2024  martin Pull up following revision(s) (requested by knakahara in ticket #659):

sys/netinet6/in6_ifattach.c: revision 1.122
sys/netinet/sctp_asconf.c: revision 1.14
sys/netinet6/nd6.c: revision 1.282

Fix invalid IPv6 route when ipsecif(4) is deleted tunnel. Pointed out by ohishi@IIJ.
The pointed bug is fixed by modification in nd6_need_cache().
Others are similar bugs.
 1.279.4.1  10-Dec-2023  martin Pull up following revision(s) (requested by pgoyette in ticket #487):

sys/compat/common/compat_90_mod.c: revision 1.5
sys/compat/common/compat_90_mod.c: revision 1.6
sys/netinet6/in6.c: revision 1.290
sys/netinet6/in6.c: revision 1.291
sys/compat/common/files.common: revision 1.11
sys/netinet6/icmp6.c: revision 1.255
sys/compat/common/net_inet6_nd_90.c: revision 1.1
sys/compat/common/net_inet6_nd_90.c: revision 1.2
sys/modules/compat_90/Makefile: revision 1.2
sys/modules/compat_90/Makefile: revision 1.3
sys/netinet6/nd6.c: revision 1.281
sys/compat/common/compat_mod.h: revision 1.10
sys/kern/compat_stub.c: revision 1.23
sys/sys/compat_stub.h: revision 1.27

Identify the need to rework the COMPAT_* code to be more
module-aware.
This is an XXX comment block only, NFCI.

Modularize the COMPAT_90 code that resulted from the removal of
netinet6/nd6 from the kernel. Now, the minimal compat code can
be successfully loaded and unloaded along with the rest of the
COMPAT_90 code.

Allow kernels builds which don't define INET6 to compile compat bits
too.

Default the build of compat_90 module to include IPv6, as is done
for other INET6-sensitive modules (see if_lagg).

RSS XML Feed