History log of /src/sys/arch/x86/include/pmap.h
Revision | Date | Author | Comments
1.134 |
| 20-Aug-2022 |
riastradh | x86: Move definition of struct pmap to pmap_private.h.
This makes pmap_resident_count and pmap_wired_count out-of-line functions instead of inline. No functional change intended otherwise.
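A minimal sketch of the shape of this change (assuming the conventional pmap_statistics layout; illustrative, not the committed code):

    /* pmap.h keeps only the MI-visible prototypes ... */
    long	pmap_resident_count(struct pmap *);
    long	pmap_wired_count(struct pmap *);

    /* ... and the bodies move out of line, next to the now-private
     * struct pmap definition from pmap_private.h: */
    long
    pmap_resident_count(struct pmap *pmap)
    {
    	return pmap->pm_stats.resident_count;
    }

    long
    pmap_wired_count(struct pmap *pmap)
    {
    	return pmap->pm_stats.wired_count;
    }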
|
1.133 |
| 20-Aug-2022 |
riastradh | x86: Split most of pmap.h into pmap_private.h or vmparam.h.
This way pmap.h only contains the MD definition of the MI pmap(9) API, which loads of things in the kernel rely on, so changing x86 pmap internals no longer requires recompiling the entire kernel every time.
Callers needing these internals must now use machine/pmap_private.h. Note: This is not x86/pmap_private.h because it contains three parts:
1. CPU-specific (different for i386/amd64) definitions used by...
2. common definitions, including Xenisms like xpmap_ptetomach, further used by...
3. more CPU-specific inlines for pmap_pte_* operations
So {amd64,i386}/pmap_private.h defines 1, includes x86/pmap_private.h for 2, and then defines 3. Maybe we should split that out into a new pmap_pte.h to reduce this trouble.
No functional change intended, other than that some .c files must include machine/pmap_private.h when previously uvm/uvm_pmap.h polluted the namespace with pmap internals.
Note: This migrates part of i386/pmap.h into i386/vmparam.h -- specifically the parts that are needed for several constants defined in vmparam.h:
VM_MAXUSER_ADDRESS VM_MAX_ADDRESS VM_MAX_KERNEL_ADDRESS VM_MIN_KERNEL_ADDRESS
Since i386 needs PDP_SIZE in vmparam.h, I added it there on amd64 too, just to keep things parallel.
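A skeletal view of the three-part layering described above (contents elided, purely illustrative):

    /* {amd64,i386}/pmap_private.h */

    /* 1. CPU-specific definitions used by the common code ... */

    #include <x86/pmap_private.h>	/* 2. common definitions,
    					 * incl. Xenisms like
    					 * xpmap_ptetomach */

    /* 3. CPU-specific inlines for the pmap_pte_* operations ... */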
|
1.132 |
| 20-Aug-2022 |
riastradh | x86: Move pl*_i, pl_i_roundup, and ptp_va2o out of x86/pmap.h.
- pl[1-4]_i -> x86/pte.h
- pl_i, pl_i_roundup, ptp_va2o -> x86/pmap.c
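For reference, the moved index macros have roughly this shape (simplified sketch; the real macros also canonicalize the sign-extended VA first). pl<N>_i yields the index into the level-N page table for a given VA:

    #define pl1_i(va)	(((va) & L1_FRAME) >> L1_SHIFT)
    #define pl2_i(va)	(((va) & L2_FRAME) >> L2_SHIFT)
    #define pl3_i(va)	(((va) & L3_FRAME) >> L3_SHIFT)
    #define pl4_i(va)	(((va) & L4_FRAME) >> L4_SHIFT)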
|
1.131 |
| 20-Aug-2022 |
riastradh | x86: Move struct vm_page_md to common x86/pmap.h.
|
1.130 |
| 20-Aug-2022 |
riastradh | x86: Split bootspace out of x86/pmap.h into new x86/bootspace.h.
|
1.129 |
| 20-Aug-2022 |
riastradh | x86: Move page attribute table bits to x86/pat.h.
|
1.128 |
| 18-Jun-2022 |
andvar | fix typos in word "functions" in comments, mainly s/fuctions/functions/.
|
1.127 |
| 30-Apr-2021 |
christos | Merge the x86 gdt function and constant definitions
|
1.126 |
| 30-Apr-2021 |
christos | Bump MAX_USERLDT_SIZE to the max size (wastes some memory). wine needs more than PAGE_SIZE and fails spuriously. XXX: Note the duplicate definition hacks. Should really create <x86/gdt.h>, put just the constants there, and unify them. This would also avoid the hack in: src/tests/lib/libi386/t_user_ldt.c#46
|
1.125 |
| 19-Jul-2020 |
maxv | branches: 1.125.6; Revert most of ad's movs/stos change. Instead do something a lot simpler: declare svs_quad_copy(), used by SVS only, with no need for instrumentation, because SVS is disabled when sanitizers are on.
|
1.124 |
| 14-Jul-2020 |
yamaguchi | Introduce per-cpu IDTs
This is realized by the following modifications:
- Add IDT pages and their allocation maps for each cpu in "struct cpu_info"
- Load per-cpu IDTs at cpu_init_idt(struct cpu_info*)
- Copy the IDT entries for cpu0 to other CPUs at attach - these are, for example, exceptions, db, system calls, etc.
Also added a kernel option named PCPU_IDT to enable the feature.
|
1.123 |
| 24-Jun-2020 |
maxv | remove unused x86_stos
|
1.122 |
| 27-May-2020 |
ad | - Add a couple of wrapper functions around STOS and MOVS and use them to zero and copy PTEs in preference to memset()/memcpy().
- Remove related SSE / pageidlezero stuff.
|
1.121 |
| 26-May-2020 |
bouyer | Adjust pmap_enter_ma() for the upcoming new Xen privcmd ioctl: pass flags to xpq_update_foreign(). Introduce a pmap MD flag, PMAP_MD_XEN_NOTR, which causes xpq_update_foreign() to use the MMU_PT_UPDATE_NO_TRANSLATE flag. Make xpq_update_foreign() return the raw Xen error. This will cause pmap_enter_ma() to return a negative error number in this case, but the only user of this code path is privcmd.c and it can deal with it.
Add pmap_enter_gnt(), which maps a set of Xen grant entries at the specified va in the specified pmap. Use the hooks implemented for EPT to keep track of mapped grant entries in the pmap, and unmap them when pmap_remove() is called. This requires pmap_remove() to be split into a pmap_remove_locked(), to be called from pmap_remove_gnt().
|
1.120 |
| 08-May-2020 |
riastradh | Factor randomization out of slotspace_rand.
slotspace_rand becomes deterministic; the randomization moves into the callers instead. Why?
There are two callers of slotspace_rand:
- x86/pmap.c pmap_bootstrap
- amd64/amd64.c init_slotspace
When the randomization was introduced, it used an x86-only `cpu_earlyrng' abstraction that would hash rdseed/rdrand and rdtsc output together. Except init_slotspace ran before cpu_probe, so cpu_feature was not yet filled out, so during init_slotspace, the only randomization was rdtsc.
In the course of the recent entropy overhaul, I replaced cpu_earlyrng by entropy_extract, and moved cpu_init_rng much earlier -- but still after cpu_probe -- in order to reduce the number of abstractions lying around and the number of copies of rdrand/rdseed logic. In so doing I added some annoying complication (see curcpu_available) to kern_entropy.c to make it work early enough for init_slotspace, and dropped the rdtsc.
For pmap_bootstrap that didn't substantively change anything. But for init_slotspace, it removed the only randomization. To mitigate this, this commit pulls the randomization out of slotspace_rand into pmap_bootstrap and init_slotspace, so that
(a) init_slotspace can use rdtsc and a little private entropy pool in order to restore the prior (weak) randomization it had, and
(b) pmap_bootstrap, which runs a little bit later, can continue to use entropy_extract normally and get rdrand/rdseed too.
A subsequent commit will move cpu_init_rng just a wee bit later, after cpu_init_msrs, so the kern_entropy.c complications can go away. Perhaps someone else more wizardly with x86 can find a way to make init_slotspace run a little later too, after cpu_probe and after cpu_init_msrs and after cpu_rng_init, but I am not that wizardly.
|
1.119 |
| 25-Apr-2020 |
bouyer | Merge the bouyer-xenpvh branch, bringing in Xen PV drivers support under HVM guests in GENERIC. Xen support can be disabled at runtime with boot -c disable hypervisor
|
1.118 |
| 24-Apr-2020 |
maxv | Give the ldt a fixed size of one page (512 slots), and drop the variable-sized mechanism that was too complex.
This fixes a race between USER_LDT and SVS: during context switches, the way SVS installs the new ldt relies on the ldt pointer AND the ldt size, but both cannot be accessed atomically at the same time.
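A hedged illustration of why the fixed size helps (names hypothetical): the switch path previously had to read a base and a size that cannot be loaded as one atomic unit; with a one-page LDT only the base varies.

    /* Racy with a variable-sized ldt: a concurrent USER_LDT update
     * can be observed half-way, pairing a new base with an old size. */
    struct ldt_info {			/* hypothetical */
    	union descriptor *ldt_base;	/* written by USER_LDT */
    	size_t            ldt_size;	/* written by USER_LDT */
    };

    /* With a fixed size, a single atomic pointer load suffices:
     * 512 slots * 8 bytes per descriptor == 4096 == one page. */
    #define LDT_FIXED_SIZE	PAGE_SIZE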
|
1.117 |
| 05-Apr-2020 |
ad | branches: 1.117.2; Allocate PV entries in PAGE_SIZE chunks, and cache partially allocated PV pages with the pmap. Worth about 2-3% sys time on build.sh for me.
|
1.116 |
| 22-Mar-2020 |
ad | x86 pmap:
- Give pmap_remove_all() its own version of pmap_remove_ptes() that on native x86 does the bare minimum needed to clear out PTPs. Cuts ~4% sys time on 'build.sh release' for me.
- pmap_sync_pv(): there's no need to issue a redundant TLB shootdown. The caller waits for the competing operation to finish.
- Bring 'options TLBSTATS' up to date.
|
1.115 |
| 17-Mar-2020 |
ad | Hallelujah, the bug has been found. Resurrect prior changes, to be fixed with following commit.
|
1.114 |
| 17-Mar-2020 |
ad | Back out the recent pmap changes until I can figure out what is going on with pmap_page_remove() (to pmap.c rev 1.365).
|
1.113 |
| 14-Mar-2020 |
ad | PR kern/55071 (Panic shortly after running X11 due to kernel diagnostic assertion "mutex_owned(&pp->pp_lock)")
- Fix a locking bug in pmap_pp_clear_attrs() and in pmap_pp_remove() do the TLB shootdown while still holding the target pmap's lock.
Also:
- Finish PV list locking for x86 & update comments around same.
- Keep track of the min/max index of PTEs inserted into each PTP, and use that to clip ranges of VAs passed to pmap_remove_ptes().
- Based on the above, implement a pmap_remove_all() for x86 that clears out the pmap in a single pass. Makes exit() / fork() much cheaper.
|
1.112 |
| 14-Mar-2020 |
ad | pmap_remove_all(): Return a boolean value to indicate the behaviour. If true, all mappings have been removed, the pmap is totally cleared out, and UVM can then avoid doing the work to call pmap_remove() for each map entry. If false, either nothing has been done, or some helpful arch-specific voodoo has taken place.
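A hypothetical caller-side sketch of the new contract (the loop shape is illustrative, not the actual UVM code):

    if (!pmap_remove_all(map->pmap)) {
    	/*
    	 * The pmap was not fully cleared out, so unmap each
    	 * entry's VA range individually, as before.
    	 */
    	for (entry = map->header.next; entry != &map->header;
    	    entry = entry->next)
    		pmap_remove(map->pmap, entry->start, entry->end);
    }
    pmap_update(map->pmap);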
|
1.111 |
| 10-Mar-2020 |
ad | - pmap_check_inuse() is expensive so make it DEBUG not DIAGNOSTIC.
- Put PV locking back in place with only a minor performance impact. pmap_enter() still needs more work - it's not easy to satisfy all the competing requirements so I'll do that with another change.
- Use pmap_find_ptp() (lookup only) in preference to pmap_get_ptp() (alloc). Make pm_ptphint indexed by VA not PA. Replace the per-pmap radixtree for dynamic PV entries with a per-PTP rbtree. Cuts system time during kernel build by ~10% for me.
|
1.110 |
| 23-Feb-2020 |
ad | UVM locking changes, proposed on tech-kern:
- Change the lock on uvm_object, vm_amap and vm_anon to be a RW lock.
- Break v_interlock and vmobjlock apart. v_interlock remains a mutex.
- Do partial PV list locking in the x86 pmap. Others to follow later.
|
1.109 |
| 12-Jan-2020 |
ad | x86 pmap:
- It turns out that every page the pmap frees is necessarily zeroed. Tell the VM system about this and use the pmap as a source of pre-zeroed pages.
- Redo deferred freeing of PTPs more elegantly, including the integration with pmap_remove_all(). This fixes problems with nvmm, and possibly also a crash discovered during fuzzing.
Reported-by: syzbot+a97186518c84f1d85c0c@syzkaller.appspotmail.com
|
1.108 |
| 04-Jan-2020 |
ad | branches: 1.108.2; x86 pmap improvements, reducing system time during a build by about 15% on my test machine:
- Replace the global pv_hash with a per-pmap record of dynamically allocated pv entries. The data structure used for this can be changed easily, and has no special concurrency requirements. For now go with radixtree.
- Change pmap_pdp_cache back into a pool; cache the page directory with the pmap, and avoid contention on pmaps_lock by adjusting the global list in the pool_cache ctor & dtor. Align struct pmap and its lock, and update some comments.
- Simplify pv_entry lists slightly. Allow both PP_EMBEDDED and dynamically allocated entries to co-exist on a single page. This adds a pointer to struct vm_page on x86, but shrinks pv_entry to 32 bytes (which also gets it nicely aligned).
- More elegantly solve the chicken-and-egg problem introduced into the pmap with radixtree lookup for pages, where we need PTEs mapped and page allocations to happen under a single hold of the pmap's lock. While here undo some cut-n-paste.
- Don't adjust pmap_kernel's stats with atomics, because its mutex is now held in the places the stats are changed.
|
1.107 |
| 15-Dec-2019 |
ad | uvm_pagerealloc() can now block because of radixtree manipulation, so defer freeing PTPs until pmap_unmap_ptes(), where we still have the pmap locked but can finally tolerate context switches again.
To be revisited soon: pmap_map_ptes() seems broken WRT other pmap load.
Reported-by: syzbot+689fb7dab41abff8e75a@syzkaller.appspotmail.com
Reported-by: syzbot+3e7bbf37d37d451b25d7@syzkaller.appspotmail.com
|
1.106 |
| 08-Dec-2019 |
ad | Merge x86 pmap changes from yamt-pagecache:
- Deal better with the multi-level pmap object locking kludge.
- Handle uvm_pagealloc() being able to block.
|
1.105 |
| 14-Nov-2019 |
maxv | Add support for Kernel Memory Sanitizer (kMSan). It detects uninitialized memory used by the kernel at run time, and just like kASan and kCSan, it is an excellent feature. It has already detected 38 uninitialized variables in the kernel during my testing, which I have since discreetly fixed.
We use two shadows:
- "shad", to track uninitialized memory with a bit granularity (1:1). Each bit set to 1 in the shad corresponds to one uninitialized bit of real kernel memory.
- "orig", to track the origin of the memory with a 4-byte granularity (1:1). Each uint32_t cell in the orig indicates the origin of the associated uint32_t of real kernel memory.
The memory consumption of these shadows is significant, so at least 4GB of RAM is recommended to run kMSan.
The compiler inserts calls to specific __msan_* functions on each memory access, to manage both the shad and the orig and detect uninitialized memory accesses that change the execution flow (like an "if" on an uninitialized variable).
We mark as uninit several types of memory buffers (stack, pools, kmem, malloc, uvm_km), and check each buffer passed to copyout, copyoutstr, bwrite, if_transmit_lock and DMA operations, to detect uninitialized memory that leaves the system. This allows us to detect kernel info leaks in a way that is more efficient and also more user-friendly than KLEAK.
Contrary to kASan, kMSan requires comprehensive coverage, ie we cannot tolerate having one non-instrumented function, because this could cause false positives. kMSan cannot instrument ASM functions, so I converted most of them to __asm__ inlines, which kMSan is able to instrument. Those that remain receive special treatment.
Contrary to kASan again, kMSan uses a TLS, so we must context-switch this TLS during interrupts. We use different contexts depending on the interrupt level.
The orig tracks precisely the origin of a buffer. We use a special encoding for the orig values, and pack together in each uint32_t cell of the orig:
- a code designating the type of memory (Stack, Pool, etc), and
- a compressed pointer, which points either (1) to a string containing the name of the variable associated with the cell, or (2) to an area in the kernel .text section which we resolve to a symbol name + offset.
This encoding allows us not to consume extra memory for associating information with each cell, and produces a precise output, that can tell for example the name of an uninitialized variable on the stack, the function in which it was pushed on the stack, and the function where we accessed this uninitialized variable.
kMSan is available with LLVM, but not with GCC.
The code is organized in a way that is similar to kASan and kCSan, so it means that other architectures than amd64 can be supported.
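A loose sketch of the 4-byte orig-cell packing described above; the field width and compression scheme are illustrative, not kMSan's actual layout:

    #define ORIG_TYPE_BITS	4	/* hypothetical width */

    static inline uint32_t
    msan_orig_encode(uint32_t type, uintptr_t ptr)
    {
    	/* compress the pointer, e.g. as a kernel-relative offset */
    	uint32_t off = (uint32_t)(ptr - KERNBASE);
    	return (off << ORIG_TYPE_BITS) |
    	    (type & ((1u << ORIG_TYPE_BITS) - 1));
    }

    static inline uint32_t
    msan_orig_type(uint32_t cell)
    {
    	return cell & ((1u << ORIG_TYPE_BITS) - 1);
    }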
|
1.104 |
| 13-Nov-2019 |
maxv | Rename for consistency:
PP_ATTRS_M -> PP_ATTRS_D
PP_ATTRS_U -> PP_ATTRS_A
|
1.103 |
| 05-Oct-2019 |
maxv | Switch to the new PTE naming. No binary diff (tested with MKREPRO).
|
1.102 |
| 07-Aug-2019 |
maxv | Add support for USER_LDT in SVS. This allows us to have both enabled at the same time.
We allocate an LDT for each CPU in the GDT and map an area for it, in addition to the default LDT already present. In context switches between different processes, we choose between the default or the per-cpu LDT selector: if the user set specific LDT entries, we memcpy them to the per-cpu LDT and load the per-cpu selector.
Tested by Naveen Narayanan (with Wine on amd64).
|
1.101 |
| 29-May-2019 |
maxv | branches: 1.101.2; Add PCID support in SVS. This avoids TLB flushes during kernel<->user transitions, which greatly reduces the performance penalty introduced by SVS.
We use two ASIDs, 0 (kern) and 1 (user), and use invpcid to flush pages in both ASIDs.
The read-only machdep.svs.pcid={0,1} sysctl is added, and indicates whether SVS+PCID is in use.
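A sketch of the mechanism (the CR3/PCID bit layout is architectural; the symbol names here are illustrative): each address-space tag keeps its own TLB entries, so switching page tables no longer has to flush.

    #define PCID_KERN	0ULL		/* ASID 0: kernel */
    #define PCID_USER	1ULL		/* ASID 1: user */
    #define CR3_NOFLUSH	(1ULL << 63)	/* keep this PCID's TLB entries */

    static inline void
    svs_switch_cr3(paddr_t pdirpa, uint64_t pcid)
    {
    	/* bits 11:0 of %cr3 carry the PCID when CR4.PCIDE is set */
    	lcr3(pdirpa | pcid | CR3_NOFLUSH);
    }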
|
1.100 |
| 10-Mar-2019 |
maxv | Two changes:
* Allow large pages to be passed in pmap_pdes_valid, this happens under DDB when it reads RIP (.text), called via pmap_extract.
* Invert a branch in pmap_extract, so that 'l_cpu' is not touched if we're dealing with the kernel pmap.
This fixes 'boot -d'.
|
1.99 |
| 09-Mar-2019 |
maxv | Start replacing the x86 PTE bits.
|
1.98 |
| 23-Feb-2019 |
maxv | Move PATENTRY into pmap.h, will be used outside.
|
1.97 |
| 13-Feb-2019 |
maxv | Add the EPT pmap code, used by Intel-VMX.
The idea is that under NVMM, we don't want to implement the hypervisor page tables manually in NVMM directly, because we want pageable guests; that is, we want to allow UVM to unmap guest pages when the host comes under pressure.
Contrary to AMD-SVM, Intel-VMX uses a different set of PTE bits from native, and this has three important consequences:
- We can't use the native PTE bits, so each time we want to modify the page tables, we need to know whether we're dealing with a native pmap or an EPT pmap. This is accomplished with callbacks, that handle everything PTE-related.
- There is no recursive slot possible, so we can't use pmap_map_ptes(). Rather, we walk down the EPT trees via the direct map, and that's actually a lot simpler (and probably faster too...).
- The kernel is never mapped in an EPT pmap. An EPT pmap cannot be loaded on the host. This has two sub-consequences: at creation time we must zero out all of the top-level PTEs, and at destruction time we force the page out of the pool cache and into the pool, to ensure that a next allocation will invoke pmap_pdp_ctor() to create a native pmap and not recycle some stale EPT entries.
To create an EPT pmap, the caller must invoke pmap_ept_transform() on a newly-allocated native pmap. And that's about it, from then on the EPT callbacks will be invoked, and the pmap can be destroyed via the usual pmap_destroy(). The TLB shootdown callback is not initialized however, it is the responsibility of the hypervisor (NVMM) to set it.
There are some twisted cases that we need to handle. For example if pmap_is_referenced() is called on a physical page that is entered both by a native pmap and by an EPT pmap, we take the Accessed bits from the two pmaps using different PTE sets in each case, and combine them into a generic PP_ATTRS_U flag (that does not depend on the pmap type).
Given that the EPT layout is a 4-Level tree with the same address space as native x86_64, we allow ourselves to use a few native macros in EPT, such as pmap_pa2pte(), rather than re-defining them with "ept" in the name.
Even though this EPT code is rather complex, it is not too intrusive: just a few callbacks in a few pmap functions, predicted-false to give priority to native. So this comes with no messy #ifdef or performance cost.
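A hedged sketch of the transform step named above; the callback fields and helper names are hypothetical, and the real code wires up more operations than shown:

    static pt_entry_t ept_pa2pte(paddr_t);	/* hypothetical */
    static paddr_t    ept_pte2pa(pt_entry_t);

    void
    pmap_ept_transform(struct pmap *pmap)
    {
    	/* 1. no recursive slot, no kernel: zero the top level */
    	memset(pmap->pm_pdir, 0, PAGE_SIZE);
    	/* 2. install EPT-flavored PTE callbacks (fields hypothetical) */
    	pmap->pm_pa2pte = ept_pa2pte;
    	pmap->pm_pte2pa = ept_pte2pa;
    	/* 3. the TLB shootdown callback is left for the
    	 * hypervisor (NVMM) to set */
    }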
|
1.96 |
| 11-Feb-2019 |
cherry | We reorganise definitions for XEN source support as follows:
XEN - common sources required for baseline XEN support.
XENPV - sources required for support of XEN in PV mode.
XENPVHVM - sources required for support of XEN in HVM mode.
XENPVH - sources required for support of XEN in PVH mode.
|
1.95 |
| 01-Feb-2019 |
maxv | Add the remaining pmap callbacks, will be used by NVMM-VMX.
|
1.94 |
| 01-Feb-2019 |
maxv | Change the format of the pp_attrs field: instead of using PTE bits directly, use abstracted bits that are converted from/to PTE bits when needed (in pmap_sync_pv).
This allows us to use the same pp_attrs for pmaps that have PTE bits at different locations.
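A sketch of the conversion direction for the native flavor (bit values illustrative; PG_M/PG_U follow the old PTE naming in use at the time):

    #define PP_ATTRS_M	0x01	/* modified */
    #define PP_ATTRS_U	0x02	/* used (accessed) */

    static inline uint8_t
    pmap_pte_to_pp_attrs(pt_entry_t pte)
    {
    	uint8_t attrs = 0;
    	if (pte & PG_M)
    		attrs |= PP_ATTRS_M;
    	if (pte & PG_U)
    		attrs |= PP_ATTRS_U;
    	return attrs;
    }

An EPT pmap supplies the equivalent routine for its own PTE bit positions, and pmap_sync_pv() works only on the abstracted bits.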
|
1.93 |
| 17-Dec-2018 |
maxv | Add two pmap fields, will be used by NVMM-VMX. Also apply a few cosmetic changes.
|
1.92 |
| 06-Dec-2018 |
maxv | Fix inconsistency, these are indexes and not types, no real functional change.
|
1.91 |
| 19-Nov-2018 |
maxv | Introduce pl_pi, will be used soon.
|
1.90 |
| 19-Nov-2018 |
maxv | Rename 'mask' -> 'frame', we will use the real 'mask' soon.
|
1.89 |
| 07-Nov-2018 |
maxv | Add two pmap fields, will be used by NVMM.
|
1.88 |
| 29-Aug-2018 |
maxv | clean up a little
|
1.87 |
| 29-Aug-2018 |
maxv | Remove the constants of the DMAP, they are unused, and move NL4_SLOT_DIRECT into amd64/.
|
1.86 |
| 29-Aug-2018 |
maxv | Simplify the ASLR stuff, we don't care about resizable areas now, and it makes the code more complicated for no good reason.
|
1.85 |
| 20-Aug-2018 |
maxv | Add support for kASan on amd64. Written by me, with some parts inspired from Siddharth Muralee's initial work. This feature can detect several kinds of memory bugs, and it's an excellent feature.
It can be enabled by uncommenting these three lines in GENERIC:
#makeoptions	KASAN=1		# Kernel Address Sanitizer
#options	KASAN
#no options	SVS
The kernel is compiled without SVS, without DMAP and without PCPU area. A shadow area is created at boot time, and it can cover the upper 128TB of the address space. This area is populated gradually as we allocate memory. With this design the memory consumption is kept at its lowest level.
The compiler calls the __asan_* functions each time a memory access is done. We verify whether this access is legal by looking at the shadow area.
We declare our own special memcpy/memset/etc functions, because the compiler's builtins don't add the __asan_* instrumentation.
Initially all the mappings are marked as valid. During dynamic allocations, we add a redzone, which we mark as invalid. Any access on it will trigger a kASan error message. Additionally, the compiler adds a redzone on global variables, and we mark these redzones as invalid too. The illegal-access detection works with a 1-byte granularity.
For now, we cover three areas:
- global variables
- kmem_alloc-ated areas
- malloc-ated areas
More will come, but that's a good start.
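The conventional ASan shadow scheme, sketched here with hypothetical constants (one shadow byte describes 8 bytes of memory, which is consistent with the 1-byte detection granularity noted above):

    #define KASAN_SHADOW_SCALE_SHIFT	3	/* 8 bytes per shadow byte */

    static inline int8_t *
    kasan_va_to_shad(vaddr_t va)
    {
    	/* KASAN_SHADOW_START / KASAN_MIN_ADDRESS are hypothetical */
    	return (int8_t *)(KASAN_SHADOW_START +
    	    ((va - KASAN_MIN_ADDRESS) >> KASAN_SHADOW_SCALE_SHIFT));
    }

    static inline bool
    kasan_valid_1(vaddr_t va)	/* check a 1-byte access */
    {
    	int8_t s = *kasan_va_to_shad(va);
    	/* 0 = whole 8-byte granule valid; k > 0 = first k bytes valid */
    	return s == 0 || (int8_t)(va & 7) < s;
    }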
|
1.84 |
| 12-Aug-2018 |
maxv | Move the PCPU area from slot 384 to slot 510, to avoid creating too much fragmentation in the slot space (384 is in the middle of the kernel half of the VA).
|
1.83 |
| 12-Aug-2018 |
maxv | Randomize the main memory on Xen, same as native. Tested on amd64-dom0.
|
1.82 |
| 12-Aug-2018 |
maxv | Add a new area, SLAREA_HYPV, which indicates the slots used by the hypervisor, in our case Xen.
|
1.81 |
| 21-Jul-2018 |
maxv | More ASLR. Randomize the location of the direct map at boot time on amd64. This doesn't need "options KASLR" and works on GENERIC. Will soon be enabled by default.
The location of the areas is abstracted in a slotspace structure. Ideally we should always use this structure when touching the L4 slots, instead of the current cocktail of global variables and constants.
machdep initializes the structure with the default values, and we then randomize its dmap entry. Ideally machdep should randomize everything at once, but in the case of the direct map its size is determined a little later in the boot procedure, so we're forced to randomize its location later too.
|
1.80 |
| 20-Jun-2018 |
maxv | branches: 1.80.2; Add and use bootspace.smodule. Initialize it in locore/prekern to better hide the specifics from the "upper" layers. This allows for greater flexibility.
|
1.79 |
| 19-May-2018 |
jakllsch | remove some remaining uvm_emap(9)-related function prototypes
|
1.78 |
| 19-May-2018 |
jdolecek | Remove emap support. Unfortunately it never got to state where it would be used and usable, due to reliability and limited & complicated MD support.
Going forward, we need to concentrate on interfaces which do not map anything into the kernel in the first place (such as direct map or KVA-less I/O), rather than making those mappings cheaper to do.
|
1.77 |
| 08-May-2018 |
maxv | Mitigation for the SS bug, CVE-2018-8897. We disabled dbregs a month ago in -current and -8 so we are not particularly affected anymore.
The #DB handler runs on ist3, if we decide to process the exception we copy the iret frame on the correct non-ist stack and continue as usual.
|
1.76 |
| 04-Mar-2018 |
jdolecek | branches: 1.76.2; drop pmap_update_2pg(), just call pmap_update_pg() separately for each
|
1.75 |
| 18-Jan-2018 |
maxv | Unmap the kernel heap from the user page tables (SVS).
This implementation is optimized and organized in such a way that we don't need to copy the kernel stack to a safe place during user<->kernel transitions. We create two VAs that point to the same physical page; one will be mapped in userland and is offset in order to contain only the trapframe, the other is mapped in the kernel and maps the entire stack.
Sent on tech-kern@ a week ago.
|
1.74 |
| 11-Jan-2018 |
maxv | Add ist0 to pcpu_entry.
|
1.73 |
| 05-Jan-2018 |
maxv | Add a __HAVE_PCPU_AREA option, enabled by default on native amd64 but not Xen.
With this option, the CPU structures that must always be present in the CPU's page tables are moved to L4 slot 384, which means address 0xffffc00000000000.
A new pcpu_area structure is defined. It contains shared structures (IDT, LDT), and then an array of pcpu_entry structures, indexed by cpu_index(ci). Theoretically the LDT should be in the array, but this will be done later.
During the boot procedure, cpu0 calls pmap_init_pcpu, which creates a page tree that is able to map the pcpu_area structure entirely. cpu0 then immediately maps the shared structures. Later, every CPU goes through cpu_pcpuarea_init, which allocates physical pages and kenters the relevant pcpu_entry to them. Finally, each pointer is replaced to point to pcpuarea.
The point of this change is to make sure that the structures that must always be present in the page tables have their own L4 slot. Until now their L4 slot was that of pmap_kernel, and making a distinction between what must be mapped and what does not need to be was complicated.
Even in the non-speculative-bug case this change makes some sense: there are several x86 instructions that leak the addresses of the CPU structures, and putting these structures inside pmap_kernel actually offered a way to compute the address of the kernel heap - which would have made ASLR on it plainly useless, had we implemented that.
Note that, for now, pcpuarea does not contain rsp0.
Unfortunately this change adds many #ifdefs, and makes the code harder to understand. There is also some duplication, but that will be solved later.
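A sketch of the layout described above (sizes and member names illustrative, apart from those named in the commit message):

    struct pcpu_entry {
    	uint8_t pcpu_gdt[PAGE_SIZE];
    	uint8_t pcpu_tss[PAGE_SIZE];
    	/* note: rsp0 is not included yet, per the message above */
    };

    struct pcpu_area {
    	uint8_t		pcpu_idt[PAGE_SIZE];	/* shared */
    	uint8_t		pcpu_ldt[PAGE_SIZE];	/* shared, to move
    						 * into the array later */
    	struct pcpu_entry pcpu_entry[MAXCPUS];	/* indexed by
    						 * cpu_index(ci) */
    };

    /* mapped at L4 slot 384, i.e. VA 0xffffc00000000000 */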
|
1.72 |
| 28-Dec-2017 |
maxv | Use variables in PMAP_DIRECT_*, so that the location of the direct map can change.
|
1.71 |
| 11-Nov-2017 |
maxv | Modify the layout of the bootspace structure, in such a way that it can contain several kernel segments of the same type (e.g. several .text segments). Some parts are still a bit messy but will be cleaned up soon.
I cannot compile-test this change on i386, but it seems fine enough.
NOTE: you need to rebuild and reinstall a new prekern after this change.
|
1.70 |
| 29-Oct-2017 |
maxv | Add a fifth region, called "head". On kaslr kernels it contains the ELF Header and the ELF Section Headers. On normal kernels it is empty (the headers are in the "boot" region).
Note: if you're using GENERIC_KASLR, you also need to rebuild the prekern.
|
1.69 |
| 30-Sep-2017 |
maxv | Add a bootspace structure. It describes the physical and virtual space layout created by the early kernel bootstrap code. Start using it, and eliminate several references to KERNBASE and other global symbols. While here clean up xen-i386, it's really tiring.
|
1.68 |
| 29-Sep-2017 |
ozaki-r | Fix build
sys/arch/x86/x86/cpu.c:920:20: error: 'pmap_largepages' undeclared (first use in this function) smp_data.large = (pmap_largepages != 0); ^
|
1.67 |
| 17-Jun-2017 |
maxv | Actually, use slot 456 instead, so that it fits a cache line.
|
1.66 |
| 14-Jun-2017 |
maxv | Give the direct map 32 slots (16TB of va). This matches MAXPHYSMEM, in such a way that the direct map is no longer the limiting factor for high memory systems.
|
1.65 |
| 14-Jun-2017 |
maxv | Move the direct map from slot 509 to slot 460. We will increase its size dynamically.
|
1.64 |
| 23-Mar-2017 |
maxv | branches: 1.64.6; Remove PG_k completely.
|
1.63 |
| 05-Mar-2017 |
maxv | Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege separation between the kernel and userland.
On Xen-amd64, the kernel runs in ring3 just like userland, and the separation is guaranteed by the hypervisor - each syscall/trap is intercepted by Xen and sent manually to the kernel. Before that, the hypervisor modifies the page tables so that the kernel becomes accessible. Later, when returning to userland, the hypervisor removes the kernel pages and flushes the TLB.
However, TLB flushes are costly, and in order to reduce the number of pages flushed Xen marks the userland pages as global, while keeping the kernel ones as local. This way, when returning to userland, only the kernel pages get flushed - which makes sense since they are the only ones that got removed from the mapping.
Xen differentiates the userland pages by looking at their PG_u bit in the PTE; if a page has this bit then Xen tags it as global, otherwise Xen manually adds the bit but keeps the page as local. The thing is, since we set PG_u in the kernel pages, Xen believes our kernel pages are in fact userland pages, so it marks them as global. Therefore, when returning to userland, the kernel pages indeed get removed from the page tree, but are not flushed from the TLB. Which means that they are still accessible.
With this - and depending on the DTLB size - userland has a small window where it can read/write to the last kernel pages accessed, which is enough to completely escalate privileges: the sysent structure systematically gets read when performing a syscall, and chances are that it will still be cached in the TLB. Userland can then use this to patch a chosen syscall, make it point to a userland function, retrieve %gs and compute the address of its credentials, and finally grant itself root privileges.
|
1.62 |
| 11-Feb-2017 |
maxv | Instead of using a global array with per-cpu indexes, embed the tmp VAs into cpu_info directly. This concerns only {i386, Xen-i386, Xen-amd64}, because amd64 already has a direct map that is way faster than that.
There are two major issues with the global array: maxcpus entries are allocated while it is unlikely that common i386 machines have so many cpus, and the base VA of these entries is not cache-line-aligned, which mostly guarantees cache-line-thrashing each time the VAs are entered.
Now the number of tmp VAs allocated is proportionate to the number of CPUs attached (which therefore reduces memory consumption), and the base is properly aligned.
On my 3-core AMD, the number of DC_refills_L2 events triggered when performing 5x10^6 calls to pmap_zero_page on two dedicated cores is on average divided by two with this patch.
Discussed on tech-kern a little.
|
1.61 |
| 08-Nov-2016 |
christos | branches: 1.61.2; PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present. XXX: pullup-7
|
1.60 |
| 19-Sep-2016 |
maya | move function prototype to x86, so it is available to amd64 too
|
1.59 |
| 25-Jul-2016 |
maxv | The L1 entry of the first page of the data segment is overwritten for the LAPIC page, and set as RWX+PG_N. The LAPIC pa is fixed, and its va resides in the data segment. Because of this error-prone design, the kernel image map is not linear, and I first thought it was a bug (as I vaguely said in PR/51148). Using large pages for the data segment is therefore wrong, since the first page does not actually belong to the data segment (even if its va is in the range). This bug is not triggered currently, since local_apic is not large-page-aligned.
We will certainly have to allocate a va dynamically instead of using the first page of data; but for now, disable large pages on the data segment, and map the LAPIC as RW.
This is the last x86-specific RWX page.
|
1.58 |
| 01-Jul-2016 |
maxv | branches: 1.58.2; Define pmap_pg_nx globally. Will be used soon.
|
1.57 |
| 11-Nov-2015 |
skrll | Split out the pmap_pv_track stuff for use by others.
Discussed with riastradh@
|
1.56 |
| 03-Apr-2015 |
riastradh | Implement pmap_pv(9) for x86 for P->V tracking of unmanaged pages.
Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
|
1.55 |
| 17-Oct-2013 |
christos | branches: 1.55.4; 1.55.6; __USE() unused variables
|
1.54 |
| 23-Jun-2013 |
uebayasi | branches: 1.54.2; Remove obsolete comment. OK'ed by rmind@.
|
1.53 |
| 13-Nov-2012 |
chs | add a pmap_kremove_local() that doesn't do TLB invalidations on other CPUs. this is only intended for use while writing kernel crash dumps. remove unused pmap_map().
|
1.52 |
| 20-Apr-2012 |
rmind | branches: 1.52.2; - Convert x86 MD code, mainly pmap(9) e.g. TLB shootdown code, to use kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the limitation of maximum CPUs.
- Support up to 256 CPUs on amd64 architecture by default.
Bug fixes, improvements, completion of Xen part and testing on 64-core AMD Opteron(tm) Processor 6282 SE (also, as Xen HVM domU with 128 CPUs) by Manuel Bouyer.
|
1.51 |
| 11-Mar-2012 |
jym | Alternate PTEs got killed a few weeks ago. Clean up unused prototypes.
|
1.50 |
| 17-Feb-2012 |
bouyer | Apply patch proposed in PR port-xen/45975 (this does not solve the exact problem reported here but is part of the solution): xen_kpm_sync() is not working as expected, leading to races between CPUs.
1. The check (xpq_cpu != &x86_curcpu) is always false because we have different x86_curcpu symbols with different addresses in the kernel. Fortunately, all addresses disassemble to the same code. Because of this we always use the code intended for bootstrap, which doesn't use cross-calls or locks.
2. Once 1 above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs, which causes it to sleep, and pmap.c doesn't like that. It triggers this KASSERT() in pmap_unmap_ptes(): KASSERT(pmap->pm_ncsw == curlwp->l_ncsw);
3. pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which needs to know on which CPU a pmap is loaded *now*: pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch to a new pmap, leaving a window where a pmap is still in a CPU's ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted by the hypervisor at any time, this window can be large enough to let another CPU free the PTP and reuse it as a normal page.
To fix 2), avoid cross-calls and IPIs completely, and instead use a mutex to update all CPUs' ci_kpm_pdir from the local CPU. It's safe because we just need to update the table page; a tlbflush IPI will happen later. As a side effect, we don't need different code for bootstrap, fixing 1). The mutex added to struct cpu needs a small headers reorganisation.
To fix 3), introduce a pm_xen_ptp_cpus which is updated from cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
While there I removed the unused pmap_is_active() function, and added some more details to DIAGNOSTIC panics.
|
1.49 |
| 04-Dec-2011 |
chs | branches: 1.49.2; map all of physical memory using large pages. ported from openbsd years ago by Murray Armfield, updated for changes since then by me.
|
1.48 |
| 23-Nov-2011 |
jym | branches: 1.48.2; No more users of xpmap_update(). Use pmap_pte_*() functions now.
|
1.47 |
| 23-Nov-2011 |
jym | Move Xen-specific functions to Xen pmap. Requested by cherry@.
Un'ifdef XEN in xen_pmap.c, it is always defined there.
|
1.46 |
| 20-Nov-2011 |
jym | Expose pmap_pdp_cache publicly to x86/xen pmap. Provide suspend/resume callbacks for Xen pmap.
Turn the internal callbacks of pmap_pdp_cache static.
XXX the implementation of pool_cache_invalidate(9) is still wrong, and IMHO this needs fixing before -6. See http://mail-index.netbsd.org/tech-kern/2011/11/18/msg011924.html
|
1.45 |
| 08-Nov-2011 |
cherry | Expose the PG_k #define pt/pd bit to both xen and "baremetal" x86. This is required, since kernel pages are mapped with user permissions in XEN/amd64 since the VM kernel runs in ring3. Since XEN/i386 (including PAE) runs in ring1, supervisor mode is appropriate for these ports. We need to share this since the pmap implementation is still shared. Once the xen implementation is sufficiently independent of the x86 one, this can be made private to xen/include/xenpmap.h
|
1.44 |
| 06-Nov-2011 |
cherry | [merging from cherry-xenmp] Make the xen MMU op queue locking api private. Implement per-cpu queues.
|
1.43 |
| 18-Oct-2011 |
jym | branches: 1.43.2; Make "pmaps" (list of non-kernel pmaps) and "pmaps_lock" externally visible. Required by pmap MD code that could reside in other files, notably Xen's pmap.
|
1.42 |
| 20-Sep-2011 |
jym | Merge jym-xensuspend branch in -current. ok bouyer@.
Goal: save/restore support in NetBSD domUs, for i386, i386 PAE and amd64.
Executive summary:
- split all Xen drivers (xenbus(4), grant tables, xbd(4), xennet(4)) in two parts, suspend and resume, and hook them to pmf(9).
- modify pmap so that the Xen hypervisor does not cry out loud in case it finds "unexpected" recursive memory mappings.
- provide a sysctl(7), machdep.xen.suspend, to command suspend from userland via powerd(8). Note: a suspend can only be handled correctly when dom0 requested it, so provide a mechanism that will prevent the kernel from blindly validating user's commands.
The code is still in experimental state, use at your own risk: restore can corrupt backend communications rings; this can completely thrash dom0 as it will loop at a high interrupt level trying to honor all domU requests.
XXX PAE suspend does not work in amd64 currently, due to (yet again!) page validation issues with hypervisor. Will fix.
XXX secondary CPUs are not suspended, I will write the handlers in sync with cherry's Xen MP work.
Tested under i386 and amd64, bear in mind ring corruption though.
No build break expected, GENERICs and XEN* kernels should be fine. ./build.sh distribution still running. In any case: sorry if it does break for you, contact me directly for reports.
|
1.41 |
| 13-Aug-2011 |
cherry | Add locking around ops to the hypervisor MMU "queue".
|
1.40 |
| 13-Jun-2011 |
tls | Fix Xen kernel builds (pmap_is_curpmap can't be static)
|
1.39 |
| 12-Jun-2011 |
rmind | Welcome to 5.99.53! Merge rmind-uvmplock branch:
- Reorganize locking in UVM and provide extra serialisation for pmap(9). New lock order: [vmpage-owner-lock] -> pmap-lock.
- Simplify locking in some pmap(9) modules by removing P->V locking.
- Use lock object on vmobjlock (and thus vnode_t::v_interlock) to share the locks amongst UVM objects where necessary (tmpfs, layerfs, unionfs).
- Rewrite and optimise x86 TLB shootdown code, make it simpler and cleaner. Add TLBSTATS option for x86 to collect statistics about TLB shootdowns.
- Unify /dev/mem et al in MI code and provide required locking (removes kernel-lock on some ports). Also, avoid cache-aliasing issues.
Thanks to Andrew Doran and Joerg Sonnenberger, as their initial patches formed the core changes of this branch.
|
1.38 |
| 07-May-2011 |
jym | branches: 1.38.2; Do as the comment says, use ilog2(). This gets optimized directly at compile time, no call to fls() is needed.
|
1.37 |
| 25-Apr-2011 |
yamt | comment
|
1.36 |
| 25-Apr-2011 |
yamt | remove unused ptei
|
1.35 |
| 11-Feb-2011 |
jmcneill | add bus_space_mmap support for BUS_SPACE_MAP_PREFETCHABLE, ok matt@
|
1.34 |
| 01-Feb-2011 |
chuck | update license clauses on my code to match the new-style BSD licenses. remove no-longer-valid wustl email address for me. based on diff that rmind@ sent me.
no functional change with this commit.
|
1.33 |
| 24-Jul-2010 |
jym | branches: 1.33.2; 1.33.4; Welcome PAE inside i386 current.
This patch is inspired by work previously done by Jeremy Morse, ported by me to -current, merged with the work previously done for port-xen, together with additionals fixes and improvements.
PAE option is disabled by default in GENERIC (but will be enabled in ALL in the next few days).
In short, PAE switches the CPU to a mode where physical addresses become 36 bits (64 GiB). Virtual address space remains at 32 bits (4 GiB). To cope with the increased size of the physical address, they are manipulated as 64-bit variables by kernel and MMU.
When supported by the CPU, it also allows the use of the NX/XD bit that provides no-execution right enforcement on a per physical page basis.
Notes:
- reworked locore.S
- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Due to the different handling of pmap mappings with PAE vs !PAE, Xen vs native, details are hidden within this function. This helps calling it from assembly, as some features, like BIOS calls, switch to pmap_kernel before mapping trampoline code in low memory.
- some changes in bioscall and kvm86_call, to reflect the above.
- the L3 is "pinned" per-CPU, and is only manipulated by a reduced set of functions within pmap. To track the L3, I added two elements to struct cpu_info, namely ci_l3_pdirpa (PA of the L3), and ci_l3_pdir (the L3 VA). Rest of the code considers that it runs "just like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is still 2).
- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd becomes an element of cpu_info (slowly paving the way for MP world).
- bootinfo_source struct declaration is modified, to cope with paddr_t size change with PAE (it is not correct to assume that bs_addr is a paddr_t when compiled with PAE - it should remain 32 bits). bs_addrs is now a void * array (in bootloader's code under i386/stand/, the bs_addrs is a physaddr_t, which is an unsigned long).
- fixes in multiboot code (same reason as bootinfo): paddr_t size change. I used Elf32_* types, use RELOC() where necessary, and move the memcpy() functions out of the if/else if (I do not expect sym and str tables to overlap with ELF).
- 64 bits atomic functions for pmap
- all pmap_pdirpa access are now done through the pmap_pdirpa macro. It hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in struct pmap (it now becomes a PDP_SIZE array, with or without PAE).
- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via loops on PDP_SIZE.
See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html
No objection raised on port-i386@ and port-xen@ for about a week.
XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE kernel dumps (VA => PA macros are slightly different, and need proper 64 bits PA support in kvm_i386).
XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This cannot be solved easily, and needs lots of thinking before being declared safe (paddr_t/bus_addr_t size handling, PD/PT macros abstractions).
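A sketch of the pmap_pdirpa abstraction mentioned above (the PAE index-splitting helpers are elided; shape per the commit message, not the committed code):

    #ifdef PAE
    #define PDP_SIZE	4
    /* pm_pdirpa is an array of 4 paddr_t; the L2 index is split into
     * which of the four L2 pages to use and the slot within it. */
    #else
    #define PDP_SIZE	1
    #define pmap_pdirpa(pmap, index) \
    	((pmap)->pm_pdirpa[0] + (index) * sizeof(pd_entry_t))
    #endif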
|
1.32 |
| 15-Jul-2010 |
jym | Make the comment about PDPpaddr more thorough.
|
1.31 |
| 06-Jul-2010 |
cegger | Turn PMAP_NOCACHE into MI flag. Add MI flags PMAP_WRITE_COMBINE, PMAP_WRITE_BACK, PMAP_NOCACHE_OVR. Update pmap(9) manpage.
hppa: Remove MD PMAP_NOCACHE flag as it exists as MI flag.
mips: Rename MD PMAP_NOCACHE to PGC_NOCACHE.
x86: Implement new MI flags using Page-Attribute Tables.
x86: Implement BUS_SPACE_MAP_PREFETCHABLE.
Patch presented on tech-kern@: http://mail-index.netbsd.org/tech-kern/2010/06/30/msg008458.html
No comments on this last version.
|
1.30 |
| 10-May-2010 |
dyoung | Provide pmap_enter_ma(), pmap_extract_ma(), pmap_kenter_ma() in all x86 kernels, and use them in the bus_space(9) implementation instead of ugly Xen #ifdef-age. In a non-Xen kernel, the _ma() functions either call or alias the equivalent _pa() functions.
Reviewed on port-xen@netbsd.org and port-i386@netbsd.org. Passes rmind@'s and bouyer@'s inspection. Tested on i386 and on Xen DOMU / DOM0.
|
1.29 |
| 09-Feb-2010 |
jym | branches: 1.29.2; Fix typos in comments.
|
1.28 |
| 11-Nov-2009 |
cegger | branches: 1.28.2; update comment: we use PMAP_NOCACHE for both pmap_enter and pmap_kenter_pa
|
1.27 |
| 07-Nov-2009 |
cegger | Add a flags argument to pmap_kenter_pa(9). Patch showed on tech-kern@ http://mail-index.netbsd.org/tech-kern/2009/11/04/msg006434.html No objections.
|
1.26 |
| 19-Jul-2009 |
rmind | pmap_emap_sync: add an argument, and do not perform pmap_load() during context switch (pmap_destroy() path seems to be unsafe), instead just perform tlbflush(). Slightly inefficient, but good enough for now.
|
1.25 |
| 28-Jun-2009 |
rmind | Ephemeral mapping (emap) implementation. Concept is based on the idea that activity of other threads will perform the TLB flush for the processes using emap as a side effect. To track that, global and per-CPU generation numbers are used. This idea was suggested by Andrew Doran; various improvements to it by me. Notes:
- For now, zero-copy on pipe is not yet enabled.
- TCP socket code would likely need more work.
- Additional UVM loaning improvements are needed.
Proposed on <tech-kern>, silence there. Quickly reviewed by <ad>.
|
1.24 |
| 22-Apr-2009 |
cegger | change pmap flags argument from int to u_int. forgot to commit this.
|
1.23 |
| 18-Apr-2009 |
cegger | Introduce PMAP_NOCACHE as first PMAP MD bit in x86. Make use of it in pmap_enter(). This saves one extra TLB flush when mapping dma-safe memory. Presented on tech-kern@, port-i386@ and port-amd64@. ok ad@
|
1.22 |
| 21-Mar-2009 |
ad | PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash
Fix numerous problems:
1. LDT updates are not atomic.
2. Number of processes running with private LDTs and/or I/O bitmaps is not capped. System with high maxprocs can be paniced.
3. LDTR can be leaked over context switch.
4. GDT slot allocations can race, giving the same LDT slot to two procs.
5. Incomplete interrupt/trap frames can be stacked.
6. In some rare cases segment faults are not handled correctly.
|
1.21 |
| 09-Dec-2008 |
pooka | branches: 1.21.2; Make pmap_kernel() a MI macro for struct pmap *kernel_pmap_ptr, which is now the "API" provided by the pmap module. pmap_kernel() remains as the syntactic sugar.
Bonus cosmetics round: move all the pmap_t pointer typedefs into uvm_pmap.h.
Thanks to Greg Oster for providing cpu muscle for doing test builds.
|
1.20 |
| 16-Sep-2008 |
bouyer | branches: 1.20.2; 1.20.4; Implement the arch-dependent p2m frame lists list. This adds support for 'xm dump-core' for NetBSD domUs. From Jean-Yves Migeon (jean-yves dot migeon at espci dot fr)
|
1.19 |
| 24-Jun-2008 |
jmcneill | branches: 1.19.2; Define PMAP_FORK -- this was lost in the vmlocking merge, and is required by options USER_LDT.
|
1.18 |
| 05-Jun-2008 |
ad | branches: 1.18.2; pmap_remove_all() for x86. Also, always defer freeing ptps to pmap_update(). There may be a better way to do this, but for now this is simple and avoids potential bugs.
Proposed on tech-kern and discussed with chs@.
|
1.17 |
| 04-Jun-2008 |
ad | Revert unintentional change.
|
1.16 |
| 04-Jun-2008 |
ad | vm_page: put TAILQ_ENTRY into a union with LIST_ENTRY, so we can use both.
|
1.15 |
| 02-Jun-2008 |
ad | - Don't bother using sse to copy/zero pages on demand. It turns out not to be worth it.
- If the machine has sse, re-enable zeroing pages in the idle loop and use the sse instructions so that we don't blow out the cache.
|
1.14 |
| 03-May-2008 |
ad | branches: 1.14.2; Back out previous which was not thought through properly.
|
1.13 |
| 03-May-2008 |
ad | Implement pmap_remove_all().
|
1.12 |
| 23-Jan-2008 |
bouyer | branches: 1.12.6; 1.12.8; 1.12.10; Merge the bouyer-xeni386 branch. This brings in PAE support to NetBSD xeni386 (domU only). PAE support is enabled by 'options PAE', see the new XEN3PAE_DOMU and INSTALL_XEN3PAE_DOMU kernel config files.
See the comments in arch/i386/include/{pte.h,pmap.h} to see how it works. In short, we still handle it as a 2-level MMU, with the second level page directory being 4 pages in size. pmap switching is done by switching the L2 pages in the L3 entries, instead of loading %cr3. This is almost required by Xen, which handles the last L2 page (the one mapping 0xc0000000 - 0xffffffff) in a very special way. But this approach should also work for native PAE support if ever supported (in fact, the pmap should almost support native PAE; what's missing is bootstrap code in locore.S).
|
1.11 |
| 20-Jan-2008 |
yamt | - rewrite P->V tracking.
- use a hash rather than SPLAY trees. SPLAY tree is a wrong algorithm to use here. will be revisited if it slows down anything other than micro-benchmarks.
- optimize the single mapping case (it's a common case) by embedding an entry into mdpage.
- don't keep a pmap pointer as it can be obtained from ptp. (discussed on port-i386 some years ago.) ideally, a single paddr_t should be enough to describe a pte. but it needs some more thoughts as it can increase computational costs.
- pmap_enter: simplify and fix races with pmap_sync_pv.
- don't bother to lock pm_obj[i] where i > 0, unless DIAGNOSTIC.
- kill mp_link to save space.
- add many KASSERTs.
|
1.10 |
| 11-Jan-2008 |
bouyer | Merge the bouyer-xeni386 branch to head, at tag bouyer-xeni386-merge1 (the branch is still active and will see i386PAE support development). Summary of changes:
- switch xeni386 to the x86/x86/pmap.c, and the xen/x86/x86_xpmap.c pmap bootstrap.
- merge back most of xen/i386/ to i386/i386
- change the build to reduce diffs between i386 and amd64 in file locations
- remove include files that were identical to the i386/amd64 counterparts, the build will find them via the xen-ma/machine link.
|
1.9 |
| 08-Jan-2008 |
yamt | kill unused PMF_USER_RELOAD.
|
1.8 |
| 02-Jan-2008 |
yamt | g/c pv_page stuffs.
|
1.7 |
| 25-Dec-2007 |
perry | Convert many of the uses of __attribute__ to equivalent __packed, __unused and __dead macros from cdefs.h
|
1.6 |
| 09-Dec-2007 |
jmcneill | branches: 1.6.2; Merge jmcneill-pm branch.
|
1.5 |
| 22-Nov-2007 |
bouyer | branches: 1.5.2; 1.5.4; Pull up the bouyer-xenamd64 branch to HEAD. This brings in amd64 support to NetBSD/Xen, both Dom0 and DomU.
|
1.4 |
| 15-Nov-2007 |
ad | Remove support for 80386 level CPUs. PR port-i386/36163.
|
1.3 |
| 07-Nov-2007 |
ad | Merge from vmlocking:
- pool_cache changes.
- Debugger/procfs locking fixes.
- Other minor changes.
|
1.2 |
| 18-Oct-2007 |
yamt | branches: 1.2.2; 1.2.4; 1.2.6; 1.2.8; 1.2.10; merge yamt-x86pmap branch.
- reduce differences between amd64 and i386. notably, share pmap.c between them. it makes several i386 pmap improvements available to amd64, including tlb shootdown reduction and bug fixes from Stephan Uphoff.
- implement deferred pmap switching for amd64.
- remove LARGEPAGES option. always use large pages if available. also, make it work on amd64.
|
1.1 |
| 08-Oct-2007 |
yamt | branches: 1.1.2; 1.1.4; file pmap.h was initially added on branch yamt-x86pmap.
|
1.1.4.4 |
| 18-Nov-2007 |
bouyer | Sync with HEAD
|
1.1.4.3 |
| 13-Nov-2007 |
bouyer | Sync with HEAD
|
1.1.4.2 |
| 25-Oct-2007 |
bouyer | Finish sync with HEAD. Especially use the new x86 pmap for xenamd64. For this:
- rename pmap_pte_set() to pmap_pte_testset()
- make pmap_pte_set() a function or macro for non-atomic PTE write
- define and use pmap_pa2pte()/pmap_pte2pa() to read/write PTE entries
- define pmap_pte_flush(), which is a nop in the x86 case, and flushes the MMUops queue in the Xen case
|
1.1.4.1 |
| 25-Oct-2007 |
bouyer | Sync with HEAD.
|
1.1.2.3 |
| 18-Oct-2007 |
yamt | #ifdef out an unused member for x86_64.
|
1.1.2.2 |
| 14-Oct-2007 |
yamt | move pl_i_roundup to a header.
|
1.1.2.1 |
| 08-Oct-2007 |
yamt | merge some parts of x86 pmap.h.
|
1.2.10.5 |
| 23-Mar-2008 |
matt | sync with HEAD
|
1.2.10.4 |
| 09-Jan-2008 |
matt | sync with HEAD
|
1.2.10.3 |
| 08-Nov-2007 |
matt | sync with -HEAD
|
1.2.10.2 |
| 06-Nov-2007 |
matt | sync with HEAD
|
1.2.10.1 |
| 18-Oct-2007 |
matt | file pmap.h was added on branch matt-armv6 on 2007-11-06 23:23:38 +0000
|
1.2.8.4 |
| 18-Feb-2008 |
mjf | Sync with HEAD.
|
1.2.8.3 |
| 27-Dec-2007 |
mjf | Sync with HEAD.
|
1.2.8.2 |
| 08-Dec-2007 |
mjf | Sync with HEAD.
|
1.2.8.1 |
| 19-Nov-2007 |
mjf | Sync with HEAD.
|
1.2.6.6 |
| 04-Feb-2008 |
yamt | sync with head.
|
1.2.6.5 |
| 21-Jan-2008 |
yamt | sync with head
|
1.2.6.4 |
| 07-Dec-2007 |
yamt | sync with head
|
1.2.6.3 |
| 15-Nov-2007 |
yamt | sync with head.
|
1.2.6.2 |
| 27-Oct-2007 |
yamt | sync with head.
|
1.2.6.1 |
| 18-Oct-2007 |
yamt | file pmap.h was added on branch yamt-lazymbuf on 2007-10-27 11:28:56 +0000
|
1.2.4.6 |
| 27-Nov-2007 |
joerg | Sync with HEAD. amd64 Xen support needs testing.
|
1.2.4.5 |
| 21-Nov-2007 |
joerg | Sync with HEAD.
|
1.2.4.4 |
| 11-Nov-2007 |
joerg | Sync with HEAD.
|
1.2.4.3 |
| 28-Oct-2007 |
joerg | Cosmetic: reduce diff to HEAD.
|
1.2.4.2 |
| 26-Oct-2007 |
joerg | Sync with HEAD.
Follow the merge of pmap.c on i386 and amd64 and move pmap_init_tmp_pgtbl into arch/x86/x86/pmap.c. Modify the ACPI wakeup code to restore CR4 before jumping back into kernel space as the large page option might cover that.
|
1.2.4.1 |
| 18-Oct-2007 |
joerg | file pmap.h was added on branch jmcneill-pm on 2007-10-26 15:43:44 +0000
|
1.2.2.4 |
| 03-Dec-2007 |
ad | Sync with HEAD.
|
1.2.2.3 |
| 24-Oct-2007 |
ad | Use a pool_cache to allocate pv entries. PR port-i386/37193.
|
1.2.2.2 |
| 23-Oct-2007 |
ad | Sync with head.
|
1.2.2.1 |
| 18-Oct-2007 |
ad | file pmap.h was added on branch vmlocking on 2007-10-23 20:36:40 +0000
|
1.5.4.1 |
| 11-Dec-2007 |
yamt | sync with head.
|
1.5.2.1 |
| 26-Dec-2007 |
ad | Sync with head.
|
1.6.2.7 |
| 20-Jan-2008 |
bouyer | Sync with HEAD
|
1.6.2.6 |
| 17-Jan-2008 |
bouyer | - Fix L2_SLOT_APTE value (not sure how I got this value but it was definitively wrong)
- Use a global variable for the PAE L3 page addresses, so that pmap.c can get it from the bootstrap code
- Extend the size of our virtual PDP from 3 to 4 pages, so that pmap->pm_pdir[] is contiguous for the whole VA range. The last page is a shadow of the kernel's real PDP (L3[3]).
- make pm_pdirpa an array of 4 paddr_t if using PAE. introduce a pmap_pdirpa macro to get the physical address of a given PD entry.
- fix pmap_map_pte
The kernel now boots single-user. fsck will cause a kernel fault in pmap_pdes_invalid() on exit.
|
1.6.2.5 |
| 13-Jan-2008 |
bouyer | Work in progress on xeni386 PAE support: Make xeni386 build with a 64-bit paddr_t. For this, vaddr_t vs. paddr_t vs. pointer usage had to be clarified. If 'options PAE' is present in a Xen3 kernel, switch paddr_t, pd_entry_t and pt_entry_t to 64 bits, and add the PAE entry in the __xen_guest ELF section.
|
1.6.2.4 |
| 10-Jan-2008 |
bouyer | Sync with HEAD
|
1.6.2.3 |
| 02-Jan-2008 |
bouyer | Sync with HEAD
|
1.6.2.2 |
| 13-Dec-2007 |
bouyer | - make amd64 XEN3 kernels build again
- pin the pdp pages in the PDP cache constructor, and unpin them in the destructor. garbage-collect PMF_USER_XPIN.
|
1.6.2.1 |
| 11-Dec-2007 |
bouyer | Switch i386 to x86/x86/pmap.c
|
1.12.10.5 |
| 11-Aug-2010 |
yamt | sync with head.
|
1.12.10.4 |
| 11-Mar-2010 |
yamt | sync with head
|
1.12.10.3 |
| 19-Aug-2009 |
yamt | sync with head.
|
1.12.10.2 |
| 18-Jul-2009 |
yamt | sync with head.
|
1.12.10.1 |
| 04-May-2009 |
yamt | sync with head.
|
1.12.8.2 |
| 17-Jun-2008 |
yamt | sync with head.
|
1.12.8.1 |
| 04-Jun-2008 |
yamt | sync with head
|
1.12.6.4 |
| 17-Jan-2009 |
mjf | Sync with HEAD.
|
1.12.6.3 |
| 28-Sep-2008 |
mjf | Sync with HEAD.
|
1.12.6.2 |
| 29-Jun-2008 |
mjf | Sync with HEAD.
|
1.12.6.1 |
| 05-Jun-2008 |
mjf | Sync with HEAD.
Also fix build.
|
1.14.2.3 |
| 24-Sep-2008 |
wrstuden | Merge in changes between wrstuden-revivesa-base-2 and wrstuden-revivesa-base-3.
|
1.14.2.2 |
| 18-Sep-2008 |
wrstuden | Sync with wrstuden-revivesa-base-2.
|
1.14.2.1 |
| 23-Jun-2008 |
wrstuden | Sync w/ -current. 34 merge conflicts to follow.
|
1.18.2.1 |
| 27-Jun-2008 |
simonb | Sync with head.
|
1.19.2.2 |
| 13-Dec-2008 |
haad | Update haad-dm branch to haad-dm-base2.
|
1.19.2.1 |
| 19-Oct-2008 |
haad | Sync with HEAD.
|
1.20.4.1 |
| 04-Apr-2009 |
snj | Pull up following revision(s) (requested by ad in ticket #656):
sys/arch/amd64/amd64/gdt.c: revision 1.21 via patch
sys/arch/amd64/amd64/machdep.c: revision 1.129 via patch
sys/arch/i386/i386/gdt.c: revision 1.47 via patch
sys/arch/i386/i386/kvm86.c: revision 1.17 via patch
sys/arch/i386/i386/locore.S: revision 1.85 via patch
sys/arch/i386/i386/machdep.c: revision 1.666 via patch
sys/arch/i386/i386/vector.S: revision 1.45 via patch
sys/arch/i386/include/pcb.h: revision 1.47 via patch
sys/arch/x86/include/pmap.h: revision 1.22 via patch
sys/arch/x86/include/sysarch.h: revision 1.8 via patch
sys/arch/x86/x86/pmap.c: revision 1.80 via patch
sys/arch/x86/x86/sys_machdep.c: revision 1.17 via patch
sys/compat/linux/arch/i386/linux_machdep.c: revision 1.143 via patch
sys/kern/init_main.c: revision 1.384 via patch
PR port-i386/40143 Viewing an mpeg transport stream with mplayer causes crash
Fix numerous problems:
1. LDT updates are not atomic.
2. Number of processes running with private LDTs and/or I/O bitmaps is not capped. System with high maxprocs can be paniced.
3. LDTR can be leaked over context switch.
4. GDT slot allocations can race, giving the same LDT slot to two procs.
5. Incomplete interrupt/trap frames can be stacked.
6. In some rare cases segment faults are not handled correctly.
|
1.20.2.2 |
| 28-Apr-2009 |
skrll | Sync with HEAD.
|
1.20.2.1 |
| 19-Jan-2009 |
skrll | Sync with HEAD.
|
1.21.2.11 |
| 27-Aug-2011 |
jym | Sync with HEAD. Most notably: uvm/pmap work done by rmind@, and MP Xen work of cherry@.
No regression observed on suspend/restore.
|
1.21.2.10 |
| 26-May-2011 |
jym | Pull-up some modifications from -current to my branch.
|
1.21.2.9 |
| 02-May-2011 |
jym | Sync with head.
|
1.21.2.8 |
| 28-Mar-2011 |
jym | Sync with HEAD. TODO before merge: - shortcut for suspend code in sysmon, when powerd(8) is not running. Borrow ``xs_watch'' thread context? - bug hunting in xbd + xennet resume. Rings are currently thrashed upon resume, so the current implementation force-flushes them on suspend. It's not really needed.
|
1.21.2.7 |
| 24-Oct-2010 |
jym | Sync with HEAD
|
1.21.2.6 |
| 01-Nov-2009 |
jym | - Upgrade suspend/resume code to comply with Xen2 removal. - Add support for PAE domUs suspend/resume. - Fix an issue regarding initialization of the xbd ring I/O that could end badly during resume, with invalid block operations submitted to dom0 backend.
NetBSD supports PAE under x86_32 by considering the L2 page as being 4 pages long instead of 1.
Xen validates the page types during resume. Sadly, the hypervisor handles alternative recursive mappings (i.e. PG/PD entries pointing to pages other than themselves) inadequately, which could lead to incorrect page pinning.
As a result, the important change in this patch is to clear these alternative mappings during suspend and reset them back to their former values upon resume (see the sketch at the end of this entry). For PAE, all 4 PDIR_SLOT_PTE entries can be considered alternative recursive mappings.
See comments in pmap.c for further details.
Now, let the testing and bug hunting begin.
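A sketch of the suspend-side clearing described above; PDP_SIZE and PDIR_SLOT_PTE are taken from the entry, while the function shape is an assumption. A real Xen kernel would queue these writes through the hypervisor's MMU-update interface rather than store directly:

    static void
    pmap_clear_shadow_recursive_ptes(struct pmap *pm)
    {
        int i;

        /* PDP_SIZE == 4 under PAE: one recursive slot per L2 page. */
        for (i = 0; i < PDP_SIZE; i++)
            pm->pm_pdir[PDIR_SLOT_PTE + i] = 0;
    }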
|
1.21.2.5 |
| 01-Nov-2009 |
jym | Sync with HEAD.
|
1.21.2.4 |
| 24-Jul-2009 |
jym | - rework the page pinning API, so that a function is now provided for each level of indirection encountered during virtual memory translation (see the sketch at the end of this entry). Update pmap accordingly. Pinning looks cleaner that way, and it offers the possibility of pinning lower-level pages if necessary (NetBSD does not do it currently).
- some fixes and comments to explain how page validation/invalidation takes place during save/restore/migrate under Xen. L2 shadow entries from PAE are now handled, so basically, suspend/resume works with PAE.
- fix an issue reported by Christoph (cegger@) for xencons suspend/resume in dom0.
TODO:
- PAE save/restore is currently limited to single-user only; multi-user support requires modifications in the PAE pmap that should be discussed first. See the comments about the L2 shadow pages cached in pmap_pdp_cache in this commit.
- grant table bug is still there; do not use the kernels of this branch to test suspend/resume, unless you want to experience bad crashes in dom0, and push the big red button.
Now there is light at the end of the tunnel :)
Note: XEN2 kernels will neither build nor work with this branch.
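The per-level API can be pictured as thin wrappers over one generic pin operation (a sketch; MMUEXT_PIN_L*_TABLE are the Xen MMU extended-operation commands, the wrapper names are assumptions):

    #define xpq_queue_pin_l1_table(pa) \
        xpq_queue_pin_table((pa), MMUEXT_PIN_L1_TABLE)
    #define xpq_queue_pin_l2_table(pa) \
        xpq_queue_pin_table((pa), MMUEXT_PIN_L2_TABLE)
    #define xpq_queue_pin_l3_table(pa) \
        xpq_queue_pin_table((pa), MMUEXT_PIN_L3_TABLE)
    #define xpq_queue_pin_l4_table(pa) \
        xpq_queue_pin_table((pa), MMUEXT_PIN_L4_TABLE)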
|
1.21.2.3 |
| 23-Jul-2009 |
jym | Sync with HEAD.
|
1.21.2.2 |
| 31-May-2009 |
jym | Modifications for the Xen suspend/migrate/resume branch:
- introduce xenbus_device_{suspend,resume}() functions. These are routines used to suspend/resume MI parts of the Xenbus device interfaces, like updating frontend/backend devices' paths found in XenStore.
- introduce HYPERVISOR_sysctl(), a hypercall used only by the Xen tools to obtain information from the hypervisor (listing VMs, printing console, etc.). I use it to query xenconsole from ddb(), as a last resort in case of a panic() in dom0 (xm not being available). Currently unused in the branch; could be, if requested.
- disable the rwlock(9) used to protect code that could use transient MFNs. It could trigger nasty context switches in places where it should not.
- fix some bugs in the xennet/xbd suspend/resume pmf(9) handlers.
- following XenSource's design, talk_to_otherend() is now called watch_otherend(), and free_otherend_details() is used by Xenbus device suspend/resume routines.
- some slight modifications in pmap regarding APDP. Introduce an inline function (pmap_unmap_apdp_pde()) that clears the APDP entry for the current pmap.
- similarly, implement pmap_unmap_all_apdp_pdes(), which iterates through all pmaps and tears down their APDP entries, as Xen does not handle them properly (see the sketch at the end of this entry).
TODO/XXX:
- pmap_unmap_apdp_pde() does not handle APDP shadow entry of PAE. It will, once I figure out how PAE uses it.
- revisit the pmap locking issue regarding transient MFNs. As NetBSD does not use kernel preemption and MP for Xen, this can be skipped for the moment. See http://mail-index.netbsd.org/port-xen/2009/04/27/msg004903.html for details.
- fix a bug regarding grant tables which could technically DoS a dom0 if ridiculously high consumer/producer indexes are passed down in the ring during a resume.
All in all, once the grant table index issue and APDP PAE are fixed, next step is to torture test this branch.
Tested under i386 PAE and non-PAE, Xen3 dom0 and domU. amd64 is only compile tested.
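A sketch of the pmap_unmap_all_apdp_pdes() walk described above (the pmaps list and its lock are assumptions):

    static void
    pmap_unmap_all_apdp_pdes(void)
    {
        struct pmap *pm;

        mutex_enter(&pmaps_lock);
        LIST_FOREACH(pm, &pmaps, pm_list)
            pmap_unmap_apdp_pde(pm);    /* drop this pmap's APDP entry */
        mutex_exit(&pmaps_lock);
    }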
|
1.21.2.1 |
| 13-May-2009 |
jym | Sync with HEAD.
Commit is split, to avoid a "too many arguments" protocol error.
|
1.28.2.2 |
| 17-Aug-2010 |
uebayasi | Sync with HEAD.
|
1.28.2.1 |
| 30-Apr-2010 |
uebayasi | Sync with HEAD.
|
1.29.2.11 |
| 31-May-2011 |
rmind | sync with head
|
1.29.2.10 |
| 19-May-2011 |
rmind | Implement sharing of vnode_t::v_interlock amongst vnodes: - Lock is shared amongst UVM objects using uvm_obj_setlock() or getnewvnode(). - Adjust vnode cache to handle unsharing, add VI_LOCKSHARE flag for that. - Use sharing in tmpfs and layerfs for underlying object. - Simplify locking in ubc_fault(). - Sprinkle some asserts.
Discussed with ad@.
|
1.29.2.9 |
| 17-Mar-2011 |
rmind | - Fix tlbflushg() to behave like tlbflush() if the page global extension (PGE) is not (yet) enabled. This fixes the issue of a stale TLB entry experienced early on boot, when PGE is not yet set on the primary CPU. - Rewrite the i386/amd64 TLB interrupt handlers in C (only the stubs are in assembly), which simplifies and unifies the code under x86, plus fixes a few bugs. - cpu_attach: remove the assignment to cpus_running, as the primary CPU might not be attached first, which causes a reset (and thus missed secondary CPUs).
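A sketch of the fixed tlbflushg() behavior from the first item (the CR4.PGE handling is standard x86; the exact function shape is an assumption):

    static inline void
    tlbflushg(void)
    {
        u_long cr4 = rcr4();

        if ((cr4 & CR4_PGE) == 0) {
            /* PGE not (yet) enabled: a plain CR3 reload flushes all. */
            tlbflush();
            return;
        }
        /* Toggling CR4.PGE also flushes global TLB entries. */
        lcr4(cr4 & ~CR4_PGE);
        lcr4(cr4);
    }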
|
1.29.2.8 |
| 08-Mar-2011 |
rmind | struct pmap_tlb_mailbox: make tm_pending and tm_gen volatile.
|
1.29.2.7 |
| 05-Mar-2011 |
rmind | sync with head
|
1.29.2.6 |
| 31-May-2010 |
rmind | - Split off Xen versions of pmap_map_ptes/pmap_unmap_ptes into Xen pmap, also move pmap_apte_flush() with pmap_unmap_apdp() there. - Make Xen buildable.
|
1.29.2.5 |
| 30-May-2010 |
rmind | sync with head
|
1.29.2.4 |
| 26-May-2010 |
rmind | Split x86 TLB shootdown code into a separate file. Code part is under TNF license, as per pmap.c 1.105.2.4 revision.
|
1.29.2.3 |
| 26-Apr-2010 |
rmind | Partly rewrite the amd64 TLB shootdown handler for the changes in x86 pmap. At this point, the branch seems to pass preliminary stress tests on amd64.
|
1.29.2.2 |
| 26-Apr-2010 |
rmind | Apply the renovated patch to significantly reduce TLB shootdowns in the x86 pmap; also provide a TLBSTATS option to measure and track TLB shootdowns. Details:
http://mail-index.netbsd.org/port-i386/2009/01/11/msg001018.html
Patch from Andrew Doran, proposed on tech-x86 [sic], in January 2009.
XXX: amd64 and xen are not yet converted; work in progress.
|
1.29.2.1 |
| 16-Mar-2010 |
rmind | Change struct uvm_object::vmobjlock to be dynamically allocated with mutex_obj_alloc(). It allows us to share the locks among UVM objects.
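Combined with uvm_obj_setlock() (see the 1.29.2.10 entry above), the pattern is roughly (a sketch; object names are illustrative):

    kmutex_t *lock;

    /* One reference-counted lock object... */
    lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);

    /* ...installed into each UVM object that should share it. */
    uvm_obj_setlock(&uobj_a, lock);
    uvm_obj_setlock(&uobj_b, lock);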
|
1.33.4.2 |
| 17-Feb-2011 |
bouyer | Sync with HEAD
|
1.33.4.1 |
| 08-Feb-2011 |
bouyer | Sync with HEAD
|
1.33.2.1 |
| 06-Jun-2011 |
jruoho | Sync with HEAD.
|
1.38.2.3 |
| 20-Sep-2011 |
cherry | Remove the "xpq lock", since we have per-cpu mmu queues now. This may need further testing. Also add some preliminary locking around queue-ops in the network backend driver
|
1.38.2.2 |
| 23-Jun-2011 |
cherry | Catchup with rmind-uvmplock merge.
|
1.38.2.1 |
| 03-Jun-2011 |
cherry | Initial import of xen MP sources, with kernel and userspace tests. - this is a source preview. - boots to single user. - spurious interrupt and pmap-related panics are normal
|
1.43.2.6 |
| 22-May-2014 |
yamt | sync with head.
for reference, the tree before this commit was tagged as yamt-pagecache-tag8.
this commit was split into small chunks to avoid a limitation of cvs. ("Protocol error: too many arguments")
|
1.43.2.5 |
| 16-Jan-2013 |
yamt | sync with (a bit old) head
|
1.43.2.4 |
| 23-May-2012 |
yamt | sync with head.
|
1.43.2.3 |
| 17-Apr-2012 |
yamt | sync with head
|
1.43.2.2 |
| 18-Nov-2011 |
yamt | share a lock among pmap uobjs
|
1.43.2.1 |
| 10-Nov-2011 |
yamt | sync with head
|
1.48.2.3 |
| 29-Apr-2012 |
mrg | sync to latest -current.
|
1.48.2.2 |
| 05-Apr-2012 |
mrg | sync to latest -current.
|
1.48.2.1 |
| 18-Feb-2012 |
mrg | merge to -current.
|
1.49.2.3 |
| 06-Mar-2017 |
snj | Pull up following revision(s) (requested by bouyer in ticket #1441): sys/arch/x86/x86/pmap.c: revision 1.241 via patch sys/arch/x86/include/pmap.h: revision 1.63 via patch Should be PG_k, doesn't change anything. -- Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege separation between the kernel and userland. On Xen-amd64, the kernel runs in ring3 just like userland, and the separation is guaranteed by the hypervisor - each syscall/trap is intercepted by Xen and sent manually to the kernel. Before that, the hypervisor modifies the page tables so that the kernel becomes accessible. Later, when returning to userland, the hypervisor removes the kernel pages and flushes the TLB. However, TLB flushes are costly, and in order to reduce the number of pages flushed Xen marks the userland pages as global, while keeping the kernel ones as local. This way, when returning to userland, only the kernel pages get flushed - which makes sense since they are the only ones that got removed from the mapping. Xen differentiates the userland pages by looking at their PG_u bit in the PTE; if a page has this bit then Xen tags it as global, otherwise Xen manually adds the bit but keeps the page as local. The thing is, since we set PG_u in the kernel pages, Xen believes our kernel pages are in fact userland pages, so it marks them as global. Therefore, when returning to userland, the kernel pages indeed get removed from the page tree, but are not flushed from the TLB. Which means that they are still accessible. With this - and depending on the DTLB size - userland has a small window where it can read/write to the last kernel pages accessed, which is enough to completely escalate privileges: the sysent structure systematically gets read when performing a syscall, and chances are that it will still be cached in the TLB. Userland can then use this to patch a chosen syscall, make it point to a userland function, retrieve %gs and compute the address of its credentials, and finally grant itself root privileges.
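The effect of the fix can be sketched as follows (PG_k and PG_u as in the entry; the exact before/after definitions are an assumption):

    /* Before (problematic): kernel PTEs carried the user bit, so Xen
     * tagged them global and they survived the return-to-user flush. */
    /* #define PG_k PG_u */

    /* After: kernel PTEs carry no user bit; Xen keeps them local, and
     * they are flushed from the TLB when returning to userland. */
    #define PG_k 0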
|
1.49.2.2 |
| 09-May-2012 |
riz | branches: 1.49.2.2.4; 1.49.2.2.6; Pull up following revision(s) (requested by rmind in ticket #202): sys/arch/x86/include/cpuvar.h: revision 1.46 sys/arch/xen/include/xenpmap.h: revision 1.34 sys/arch/i386/include/param.h: revision 1.77 sys/arch/x86/x86/pmap_tlb.c: revision 1.5 sys/arch/x86/x86/pmap_tlb.c: revision 1.6 sys/arch/i386/i386/genassym.cf: revision 1.92 sys/arch/xen/x86/cpu.c: revision 1.91 sys/arch/x86/x86/pmap.c: revision 1.177 sys/arch/xen/x86/xen_pmap.c: revision 1.21 sys/arch/x86/acpi/acpi_wakeup.c: revision 1.31 sys/kern/subr_kcpuset.c: revision 1.5 sys/arch/amd64/include/param.h: revision 1.18 sys/sys/kcpuset.h: revision 1.5 sys/arch/x86/x86/mtrr_i686.c: revision 1.26 sys/arch/x86/x86/mtrr_i686.c: revision 1.27 sys/arch/xen/x86/x86_xpmap.c: revision 1.43 sys/arch/x86/x86/cpu.c: revision 1.98 sys/arch/amd64/amd64/mptramp.S: revision 1.14 sys/kern/sys_sched.c: revision 1.42 sys/arch/amd64/amd64/genassym.cf: revision 1.50 sys/arch/i386/i386/mptramp.S: revision 1.24 sys/arch/x86/include/pmap.h: revision 1.52 sys/arch/x86/include/cpu.h: revision 1.50 - Convert x86 MD code, mainly pmap(9), e.g. the TLB shootdown code, to use kcpuset(9) and thus replace hardcoded CPU bitmasks. This removes the limit on the maximum number of CPUs. - Support up to 256 CPUs on the amd64 architecture by default. Bug fixes, improvements, completion of the Xen part and testing on a 64-core AMD Opteron(tm) Processor 6282 SE (also, as a Xen HVM domU with 128 CPUs) by Manuel Bouyer. - pmap_tlb_shootdown: do not overwrite tp_cpumask with pm_cpus, but merge like pm_kernel_cpus. Remove unnecessary intersection with kcpuset_running. Do not reset tp_userpmap if pmap_kernel(). - Remove pmap_tlb_mailbox_t wrapping, which is pointless after recent changes. - pmap_tlb_invalidate, pmap_tlb_intr: constify for the packet structure. i686_mtrr_init_first: handle the case when there are no variable-size MTRR registers available (i686_mtrr_vcnt == 0).
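A sketch of the kcpuset(9) pattern that replaces the fixed-width masks (the kcpuset calls are per kcpuset(9); the surrounding function is illustrative):

    #include <sys/kcpuset.h>

    static void
    mark_cpu_for_shootdown(struct cpu_info *ci)
    {
        kcpuset_t *cpus;

        kcpuset_create(&cpus, true);        /* dynamically sized, zeroed */
        kcpuset_set(cpus, cpu_index(ci));   /* no 32- or 64-CPU ceiling */
        if (kcpuset_isset(cpus, cpu_index(ci))) {
            /* e.g. send the TLB shootdown IPI to this CPU */
        }
        kcpuset_destroy(cpus);
    }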
|
1.49.2.1 |
| 22-Feb-2012 |
riz | Pull up following revision(s) (requested by bouyer in ticket #29): sys/arch/xen/x86/x86_xpmap.c: revision 1.39 sys/arch/xen/include/hypervisor.h: revision 1.37 sys/arch/xen/include/intr.h: revision 1.34 sys/arch/xen/x86/xen_ipi.c: revision 1.10 sys/arch/x86/x86/cpu.c: revision 1.97 sys/arch/x86/include/cpu.h: revision 1.48 sys/uvm/uvm_map.c: revision 1.315 sys/arch/x86/x86/pmap.c: revision 1.165 sys/arch/xen/x86/cpu.c: revision 1.81 sys/arch/x86/x86/pmap.c: revision 1.167 sys/arch/xen/x86/cpu.c: revision 1.82 sys/arch/x86/x86/pmap.c: revision 1.168 sys/arch/xen/x86/xen_pmap.c: revision 1.17 sys/uvm/uvm_km.c: revision 1.122 sys/uvm/uvm_kmguard.c: revision 1.10 sys/arch/x86/include/pmap.h: revision 1.50 Apply patch proposed in PR port-xen/45975 (this does not solve the exact problem reported here but is part of the solution): xen_kpm_sync() is not working as expected, leading to races between CPUs. 1) the check (xpq_cpu != &x86_curcpu) is always false because we have different x86_curcpu symbols with different addresses in the kernel. Fortunately, all addresses disassemble to the same code. Because of this we always use the code intended for bootstrap, which doesn't use cross-calls or locks. 2) once 1) above is fixed, xen_kpm_sync() will use xcalls to sync other CPUs, which causes it to sleep, and pmap.c doesn't like that. It triggers this KASSERT() in pmap_unmap_ptes(): KASSERT(pmap->pm_ncsw == curlwp->l_ncsw); 3) pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which needs to know on which CPU a pmap is loaded *now*: pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch to a new pmap, leaving a window where a pmap is still in a CPU's ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted by the hypervisor at any time, this window can be large enough to let another CPU free the PTP and reuse it as a normal page. To fix 2), avoid cross-calls and IPIs completely, and instead use a mutex to update all CPUs' ci_kpm_pdir from the local CPU. It's safe because we just need to update the table page; a tlbflush IPI will happen later. As a side effect, we don't need different code for bootstrap, fixing 1). The mutex added to struct cpu needs a small header reorganisation. To fix 3), introduce a pm_xen_ptp_cpus which is updated from cpu_pmap_load(), with the ci_kpm_mtx mutex held. Checking it with ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir. While there I removed the unused pmap_is_active() function and added some more details to DIAGNOSTIC panics. When using uvm_km_pgremove_intrsafe() make sure mappings are removed before returning the pages to the free pool. Otherwise, under Xen, a page which still has a writable mapping could be allocated for a PDP by another CPU and the hypervisor would refuse it (this is PR port-xen/45975). For this, move the pmap_kremove() calls inside uvm_km_pgremove_intrsafe(), and do pmap_kremove()/uvm_pagefree() in batches of (at most) 16 entries (as suggested by Chuck Silvers on tech-kern@, see also http://mail-index.netbsd.org/tech-kern/2012/02/17/msg012727.html and followups), as sketched below. Avoid early use of xen_kpm_sync(); locks are not available at this time. Don't call cpu_init() twice. Makes LOCKDEBUG kernels boot again. Revert pmap_pte_flush() -> xpq_flush_queue() in previous.
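A sketch of the unmap-then-free ordering described above (batches of at most 16 per the entry; names are illustrative):

    static void
    km_pgremove_batch(vaddr_t *va, struct vm_page **pg, int n)
    {
        int i;

        for (i = 0; i < n; i++)
            pmap_kremove(va[i], PAGE_SIZE);     /* unmap first... */
        pmap_update(pmap_kernel());
        for (i = 0; i < n; i++)
            uvm_pagefree(pg[i]);    /* ...then return pages to the pool */
    }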
|
1.49.2.2.6.1 |
| 06-Mar-2017 |
snj | Pull up following revision(s) (requested by bouyer in ticket #1441): sys/arch/x86/x86/pmap.c: revision 1.241 via patch sys/arch/x86/include/pmap.h: revision 1.63 via patch Should be PG_k, doesn't change anything. -- Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege separation between the kernel and userland. On Xen-amd64, the kernel runs in ring3 just like userland, and the separation is guaranteed by the hypervisor - each syscall/trap is intercepted by Xen and sent manually to the kernel. Before that, the hypervisor modifies the page tables so that the kernel becomes accessible. Later, when returning to userland, the hypervisor removes the kernel pages and flushes the TLB. However, TLB flushes are costly, and in order to reduce the number of pages flushed Xen marks the userland pages as global, while keeping the kernel ones as local. This way, when returning to userland, only the kernel pages get flushed - which makes sense since they are the only ones that got removed from the mapping. Xen differentiates the userland pages by looking at their PG_u bit in the PTE; if a page has this bit then Xen tags it as global, otherwise Xen manually adds the bit but keeps the page as local. The thing is, since we set PG_u in the kernel pages, Xen believes our kernel pages are in fact userland pages, so it marks them as global. Therefore, when returning to userland, the kernel pages indeed get removed from the page tree, but are not flushed from the TLB. Which means that they are still accessible. With this - and depending on the DTLB size - userland has a small window where it can read/write to the last kernel pages accessed, which is enough to completely escalate privileges: the sysent structure systematically gets read when performing a syscall, and chances are that it will still be cached in the TLB. Userland can then use this to patch a chosen syscall, make it point to a userland function, retrieve %gs and compute the address of its credentials, and finally grant itself root privileges.
|
1.49.2.2.4.1 |
| 06-Mar-2017 |
snj | Pull up following revision(s) (requested by bouyer in ticket #1441): sys/arch/x86/x86/pmap.c: revision 1.241 via patch sys/arch/x86/include/pmap.h: revision 1.63 via patch Should be PG_k, doesn't change anything. -- Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege separation between the kernel and userland. On Xen-amd64, the kernel runs in ring3 just like userland, and the separation is guaranteed by the hypervisor - each syscall/trap is intercepted by Xen and sent manually to the kernel. Before that, the hypervisor modifies the page tables so that the kernel becomes accessible. Later, when returning to userland, the hypervisor removes the kernel pages and flushes the TLB. However, TLB flushes are costly, and in order to reduce the number of pages flushed Xen marks the userland pages as global, while keeping the kernel ones as local. This way, when returning to userland, only the kernel pages get flushed - which makes sense since they are the only ones that got removed from the mapping. Xen differentiates the userland pages by looking at their PG_u bit in the PTE; if a page has this bit then Xen tags it as global, otherwise Xen manually adds the bit but keeps the page as local. The thing is, since we set PG_u in the kernel pages, Xen believes our kernel pages are in fact userland pages, so it marks them as global. Therefore, when returning to userland, the kernel pages indeed get removed from the page tree, but are not flushed from the TLB. Which means that they are still accessible. With this - and depending on the DTLB size - userland has a small window where it can read/write to the last kernel pages accessed, which is enough to completely escalate privileges: the sysent structure systematically gets read when performing a syscall, and chances are that it will still be cached in the TLB. Userland can then use this to patch a chosen syscall, make it point to a userland function, retrieve %gs and compute the address of its credentials, and finally grant itself root privileges.
|
1.52.2.3 |
| 03-Dec-2017 |
jdolecek | update from HEAD
|
1.52.2.2 |
| 20-Aug-2014 |
tls | Rebase to HEAD as of a few days ago.
|
1.52.2.1 |
| 20-Nov-2012 |
tls | Resync to 2012-11-19 00:00:00 UTC
|
1.54.2.1 |
| 18-May-2014 |
rmind | sync with head
|
1.55.6.6 |
| 28-Aug-2017 |
skrll | Sync with HEAD
|
1.55.6.5 |
| 05-Dec-2016 |
skrll | Sync with HEAD
|
1.55.6.4 |
| 05-Oct-2016 |
skrll | Sync with HEAD
|
1.55.6.3 |
| 09-Jul-2016 |
skrll | Sync with HEAD
|
1.55.6.2 |
| 27-Dec-2015 |
skrll | Sync with HEAD (as of 26th Dec)
|
1.55.6.1 |
| 06-Apr-2015 |
skrll | Sync with HEAD
|
1.55.4.3 |
| 06-Mar-2017 |
snj | Pull up following revision(s) (requested by bouyer in ticket #1388): sys/arch/x86/x86/pmap.c: revision 1.241 Should be PG_k, doesn't change anything. -- Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege separation between the kernel and userland. On Xen-amd64, the kernel runs in ring3 just like userland, and the separation is guaranteed by the hypervisor - each syscall/trap is intercepted by Xen and sent manually to the kernel. Before that, the hypervisor modifies the page tables so that the kernel becomes accessible. Later, when returning to userland, the hypervisor removes the kernel pages and flushes the TLB. However, TLB flushes are costly, and in order to reduce the number of pages flushed Xen marks the userland pages as global, while keeping the kernel ones as local. This way, when returning to userland, only the kernel pages get flushed - which makes sense since they are the only ones that got removed from the mapping. Xen differentiates the userland pages by looking at their PG_u bit in the PTE; if a page has this bit then Xen tags it as global, otherwise Xen manually adds the bit but keeps the page as local. The thing is, since we set PG_u in the kernel pages, Xen believes our kernel pages are in fact userland pages, so it marks them as global. Therefore, when returning to userland, the kernel pages indeed get removed from the page tree, but are not flushed from the TLB. Which means that they are still accessible. With this - and depending on the DTLB size - userland has a small window where it can read/write to the last kernel pages accessed, which is enough to completely escalate privileges: the sysent structure systematically gets read when performing a syscall, and chances are that it will still be cached in the TLB. Userland can then use this to patch a chosen syscall, make it point to a userland function, retrieve %gs and compute the address of its credentials, and finally grant itself root privileges.
|
1.55.4.2 |
| 18-Dec-2016 |
snj | Pull up following revision(s) (requested by riastradh in ticket #1316): sys/arch/x86/x86/pmap.c: revision 1.223 sys/arch/x86/x86/vm_machdep.c: revision 1.26 sys/arch/x86/include/pmap.h: revision 1.61 PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present. XXX: pullup-7
|
1.55.4.1 |
| 23-Apr-2015 |
snj | branches: 1.55.4.1.2; 1.55.4.1.4; Pull up following revision(s) (requested by mrg in ticket #718): sys/arch/x86/include/pmap.h: revision 1.56 sys/arch/x86/x86/pmap.c: revision 1.188 sys/dev/pci/agp_amd64.c: revision 1.8 sys/dev/pci/agp_i810.c: revision 1.118 sys/external/bsd/drm2/dist/drm/i915/i915_dma.c: revision 1.16 sys/external/bsd/drm2/dist/drm/i915/i915_gem.c: revision 1.29 sys/external/bsd/drm2/dist/drm/nouveau/nouveau_agp.c: revision 1.3 sys/external/bsd/drm2/dist/drm/nouveau/nouveau_ttm.c: revision 1.4 sys/external/bsd/drm2/dist/drm/radeon/atombios_crtc.c: revision 1.3 sys/external/bsd/drm2/dist/drm/radeon/radeon_agp.c: revision 1.3 sys/external/bsd/drm2/dist/drm/radeon/radeon_display.c: revision 1.3 sys/external/bsd/drm2/dist/drm/radeon/radeon_legacy_crtc.c: revision 1.2 sys/external/bsd/drm2/dist/drm/radeon/radeon_object.c: revision 1.3 sys/external/bsd/drm2/dist/drm/radeon/radeon_ttm.c: revision 1.7 sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c: revisions 1.7-1.10 sys/external/bsd/drm2/dist/drm/ttm/ttm_bo_util.c: revision 1.5 sys/external/bsd/drm2/i915drm/intelfb.c: revision 1.13 sys/external/bsd/drm2/include/drm/drm_wait_netbsd.h: revisions 1.12, 1.13 sys/external/bsd/drm2/include/linux/mm.h: revision 1.5 sys/external/bsd/drm2/include/linux/pci.h: revisions 1.16, 1.17 sys/external/bsd/drm2/nouveau/nouveaufb.c: revision 1.2 sys/external/bsd/drm2/radeon/radeon_pci.c: revisions 1.8, 1.9 sys/uvm/uvm_init.c: revision 1.46 Hack against the blank console problem: Leave the CLUT alone on ancient cards. At least this leaves us with a semi working console (red and blue are flipped). Leave an example of what seems to be happening but disable it because colors are better than 444 bit greyscale. -- Initialize P->V tracking for unmanaged device pages in uvm_init.
Conditional on __HAVE_PMAP_PV_TRACK until we add it to all pmaps.
MI part of pmap_pv(9) change proposed on tech-kern:
https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html -- Implement pmap_pv(9) for x86 for P->V tracking of unmanaged pages.
Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html -- Use pmap_pv(9) to remove mappings of Intel graphics aperture pages.
Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
Further background at:
https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html -- Use pmap_pv(9) to remove mappings of device pages in TTM.
Adapt nouveau and radeon to do pmap_pv_track for their device pages.
Proposed on tech-kern with no objections:
https://mail-index.netbsd.org/tech-kern/2015/03/26/msg018561.html
Further background at:
https://mail-index.netbsd.org/tech-kern/2014/07/23/msg017392.html -- Fix error branches in agp_amd64.c.
- Call agp_generic_detach always. - Free asc if it was allocated. (Found by Brainy, noted by maxv@.) - Free the GATT if it was allocated. -- pmf_device_register returns false on failure, not true. -- In DRM_SPIN_WAIT_ON, don't stop after waiting only one tick.
Continue the loop to recheck the condition and count the whole duration. -- Don't use the video BIOS memory as an i915 flush page! -- Don't let anyone else allocate the video BIOS either. -- Missed a zero: it's 0x100000, not 0x10000. -- Don't reserve if atomic -- caller must have pre-pinned the buffer. -- Don't reserve if atomic -- caller must have pre-pinned the buffer. -- almost add radeondrmkms suspend/resume support. it unfortunately doesn't work. -- Need the page's uvm object lock to do pmap_page_protect. -- Use KASSERTMSG to show bad base/offset. -- KASSERT about page-alignment on initialization too. -- Don't break when hardclock_ticks wraps around.
Since we now only count time spent in wait, rather than determining the end time and checking whether we've passed it, timeouts might be marginally longer in effect. Unlikely to be an issue. -- Remove broken drm2 vm_mmap stub. Can't possibly have ever worked. -- apply some of the additional changes from Arto Huusko in PR#49645: - call pmf_device_deregister on detach.
i've kept the "resume = true" for radeon_resume_kms() call as it seems to work for me (indeed, code inspection shows it is unused on netbsd :-)
my old nforce4 box that can resume old drm (or could, last i tried several years ago) while X and GL apps were running, can at least survive a resume if X hasn't started. my one attempt so far with X exited, but having run, did not work. -- First attempt to make ttm_buffer_object_transfer less bogus. -- Make sure mem.bus.is_iomem is initialized. PR 49833
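The pmap_pv(9) calls behind the P->V tracking changes above read roughly like this (a sketch; the range and page variables are illustrative):

    /* At attach time: register a device memory range (e.g. the graphics
     * aperture) so the pmap tracks P->V for its unmanaged pages. */
    pmap_pv_track(seg_start, seg_size);

    /* At teardown: remove every mapping of one tracked page. */
    pmap_pv_protect(pg_pa, VM_PROT_NONE);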
|
1.55.4.1.4.2 |
| 13-Mar-2017 |
skrll | Sync with netbsd-7-1-RELEASE
|
1.55.4.1.4.1 |
| 18-Jan-2017 |
skrll | Sync with netbsd-5
|
1.55.4.1.2.2 |
| 06-Mar-2017 |
snj | Pull up following revision(s) (requested by bouyer in ticket #1388): sys/arch/x86/include/pmap.h: revision 1.63 via patch sys/arch/x86/x86/pmap.c: revision 1.241 via patch Should be PG_k, doesn't change anything. -- Remove PG_u from the kernel pages on Xen. Otherwise there is no privilege separation between the kernel and userland. On Xen-amd64, the kernel runs in ring3 just like userland, and the separation is guaranteed by the hypervisor - each syscall/trap is intercepted by Xen and sent manually to the kernel. Before that, the hypervisor modifies the page tables so that the kernel becomes accessible. Later, when returning to userland, the hypervisor removes the kernel pages and flushes the TLB. However, TLB flushes are costly, and in order to reduce the number of pages flushed Xen marks the userland pages as global, while keeping the kernel ones as local. This way, when returning to userland, only the kernel pages get flushed - which makes sense since they are the only ones that got removed from the mapping. Xen differentiates the userland pages by looking at their PG_u bit in the PTE; if a page has this bit then Xen tags it as global, otherwise Xen manually adds the bit but keeps the page as local. The thing is, since we set PG_u in the kernel pages, Xen believes our kernel pages are in fact userland pages, so it marks them as global. Therefore, when returning to userland, the kernel pages indeed get removed from the page tree, but are not flushed from the TLB. Which means that they are still accessible. With this - and depending on the DTLB size - userland has a small window where it can read/write to the last kernel pages accessed, which is enough to completely escalate privileges: the sysent structure systematically gets read when performing a syscall, and chances are that it will still be cached in the TLB. Userland can then use this to patch a chosen syscall, make it point to a userland function, retrieve %gs and compute the address of its credentials, and finally grant itself root privileges.
|
1.55.4.1.2.1 |
| 18-Dec-2016 |
snj | Pull up following revision(s) (requested by riastradh in ticket #1316): sys/arch/x86/x86/pmap.c: revision 1.223 sys/arch/x86/x86/vm_machdep.c: revision 1.26 sys/arch/x86/include/pmap.h: revision 1.61 PR/49691: KAMADA Ken'ichi: free deferred ptp mappings if present. XXX: pullup-7
|
1.58.2.5 |
| 26-Apr-2017 |
pgoyette | Sync with HEAD
|
1.58.2.4 |
| 20-Mar-2017 |
pgoyette | Sync with HEAD
|
1.58.2.3 |
| 07-Jan-2017 |
pgoyette | Sync with HEAD. (Note that most of these changes are simply $NetBSD$ tag issues.)
|
1.58.2.2 |
| 04-Nov-2016 |
pgoyette | Sync with HEAD
|
1.58.2.1 |
| 26-Jul-2016 |
pgoyette | Sync with HEAD
|
1.61.2.1 |
| 21-Apr-2017 |
bouyer | Sync with HEAD
|
1.64.6.2 |
| 22-Mar-2018 |
martin | Pull up the following revisions, requested by maxv in ticket #652:
sys/arch/amd64/amd64/amd64_trap.S upto 1.39 (partial, patch) sys/arch/amd64/amd64/db_machdep.c 1.6 (patch) sys/arch/amd64/amd64/genassym.cf 1.65,1.66,1.67 (patch) sys/arch/amd64/amd64/locore.S upto 1.159 (partial, patch) sys/arch/amd64/amd64/machdep.c 1.299-1.302 (patch) sys/arch/amd64/amd64/trap.c upto 1.113 (partial, patch) sys/arch/amd64/amd64/vector.S upto 1.61 (partial, patch) sys/arch/amd64/conf/GENERIC 1.477,1.478 (patch) sys/arch/amd64/conf/kern.ldscript 1.26 (patch) sys/arch/amd64/include/frameasm.h upto 1.37 (partial, patch) sys/arch/amd64/include/param.h 1.25 (patch) sys/arch/amd64/include/pmap.h 1.41,1.43,1.44 (patch) sys/arch/x86/conf/files.x86 1.91,1.93 (patch) sys/arch/x86/include/cpu.h 1.88,1.89 (patch) sys/arch/x86/include/pmap.h 1.75 (patch) sys/arch/x86/x86/cpu.c 1.144,1.146,1.148,1.149 (patch) sys/arch/x86/x86/pmap.c upto 1.289 (partial, patch) sys/arch/x86/x86/vm_machdep.c 1.31,1.32 (patch) sys/arch/x86/x86/x86_machdep.c 1.104,1.106,1.108 (patch) sys/arch/x86/x86/svs.c 1.1-1.14 sys/arch/xen/conf/files.compat 1.30 (patch)
Backport SVS. Not enabled yet.
|
1.64.6.1 |
| 16-Mar-2018 |
martin | Pull up the following revisions (via patch), requested by maxv in #635:
sys/arch/amd64/amd64/gdt.c 1.39-1.45 (patch) sys/arch/amd64/amd64/machdep.c 1.284,1.287,1.288 (patch) sys/arch/amd64/include/param.h 1.23 (patch) sys/arch/amd64/include/types.h 1.53 (patch) sys/arch/x86/include/cpu.h 1.87 (patch) sys/arch/x86/include/pmap.h 1.73,1.74 (patch) sys/arch/x86/x86/cpu.c 1.142 (patch) sys/arch/x86/x86/intr.c 1.117 (partial),1.120 (patch) sys/arch/x86/x86/pmap.c 1.276 (patch)
Initialize ist0 in cpu_init_tss. Backport __HAVE_PCPU_AREA.
|
1.76.2.6 |
| 26-Dec-2018 |
pgoyette | Sync with HEAD, resolve a few conflicts
|
1.76.2.5 |
| 26-Nov-2018 |
pgoyette | Sync with HEAD, resolve a couple of conflicts
|
1.76.2.4 |
| 06-Sep-2018 |
pgoyette | Sync with HEAD
Resolve a couple of conflicts (result of the uimin/uimax changes)
|
1.76.2.3 |
| 28-Jul-2018 |
pgoyette | Sync with HEAD
|
1.76.2.2 |
| 25-Jun-2018 |
pgoyette | Sync with HEAD
|
1.76.2.1 |
| 21-May-2018 |
pgoyette | Sync with HEAD
|
1.80.2.3 |
| 13-Apr-2020 |
martin | Mostly merge changes from HEAD up to 20200411
|
1.80.2.2 |
| 08-Apr-2020 |
martin | Merge changes from current as of 20200406
|
1.80.2.1 |
| 10-Jun-2019 |
christos | Sync with HEAD
|
1.101.2.1 |
| 31-May-2020 |
martin | Pull up following revision(s) (requested by bouyer in ticket #935):
sys/arch/xen/x86/x86_xpmap.c: revision 1.89 sys/arch/x86/include/pmap.h: revision 1.121 sys/arch/xen/xen/privcmd.c: revision 1.58 sys/external/mit/xen-include-public/dist/xen/include/public/memory.h: revision 1.2 sys/arch/xen/include/xenpmap.h: revision 1.44 sys/arch/xen/include/xenio.h: revision 1.12 sys/arch/x86/x86/pmap.c: revision 1.394 (all via patch)
Adjust pmap_enter_ma() for the upcoming new Xen privcmd ioctl: pass flags to xpq_update_foreign()
Introduce a pmap MD flag, PMAP_MD_XEN_NOTR, which causes xpq_update_foreign() to use the MMU_PT_UPDATE_NO_TRANSLATE flag. Make xpq_update_foreign() return the raw Xen error. This will cause pmap_enter_ma() to return a negative error number in this case, but the only user of this code path is privcmd.c and it can deal with it.
Add pmap_enter_gnt(), which maps a set of Xen grant entries at the specified va in the specified pmap. Use the hooks implemented for EPT to keep track of mapped grant entries in the pmap, and unmap them when pmap_remove() is called. This requires pmap_remove() to be split into a pmap_remove_locked(), to be called from pmap_remove_gnt() (see the sketch at the end of this entry).
Implement new ioctl, needed by Xen 4.13: IOCTL_PRIVCMD_MMAPBATCH_V2 IOCTL_PRIVCMD_MMAP_RESOURCE IOCTL_GNTDEV_MMAP_GRANT_REF IOCTL_GNTDEV_ALLOC_GRANT_REF
Always enable declarations needed by privcmd.c
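A sketch of the pmap_remove() split described above (the lock name is an assumption):

    void
    pmap_remove(struct pmap *pm, vaddr_t sva, vaddr_t eva)
    {
        mutex_enter(&pm->pm_lock);
        pmap_remove_locked(pm, sva, eva);
        mutex_exit(&pm->pm_lock);
    }

    /* pmap_remove_gnt() already holds the pmap lock while it tears down
     * grant mappings, so it calls pmap_remove_locked() directly. */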
|
1.108.2.2 |
| 29-Feb-2020 |
ad | Sync with head.
|
1.108.2.1 |
| 17-Jan-2020 |
ad | Sync with head.
|
1.117.2.1 |
| 25-Apr-2020 |
bouyer | Sync with bouyer-xenpvh-base2 (HEAD)
|
1.125.6.1 |
| 13-May-2021 |
thorpej | Sync with HEAD.
|