Skip to content

Commit

Permalink
mm: ptep_get() conversion
Browse files Browse the repository at this point in the history
Convert all instances of direct pte_t* dereferencing to instead use
ptep_get() helper.  This means that by default, the accesses change from a
C dereference to a READ_ONCE().  This is technically the correct thing to
do since where pgtables are modified by HW (for access/dirty) they are
volatile and therefore we should always ensure READ_ONCE() semantics.

But more importantly, by always using the helper, it can be overridden by
the architecture to fully encapsulate the contents of the pte.  Arch code
is deliberately not converted, as the arch code knows best.  It is
intended that arch code (arm64) will override the default with its own
implementation that can (e.g.) hide certain bits from the core code, or
determine young/dirty status by mixing in state from another source.

Conversion was done using Coccinelle:

----

// $ make coccicheck \
//          COCCI=ptepget.cocci \
//          SPFLAGS="--include-headers" \
//          MODE=patch

virtual patch

@ depends on patch @
pte_t *v;
@@

- *v
+ ptep_get(v)

----

Then reviewed and hand-edited to avoid multiple unnecessary calls to
ptep_get(), instead opting to store the result of a single call in a
variable, where it is correct to do so.  This aims to negate any cost of
READ_ONCE() and will benefit arch-overrides that may be more complex.

Included is a fix for an issue in an earlier version of this patch that
was pointed out by kernel test robot.  The issue arose because config
MMU=n elides definition of the ptep helper functions, including
ptep_get().  HUGETLB_PAGE=n configs still define a simple
huge_ptep_clear_flush() for linking purposes, which dereferences the ptep.
So when both configs are disabled, this caused a build error because
ptep_get() is not defined.  Fix by continuing to do a direct dereference
when MMU=n.  This is safe because for this config the arch code cannot be
trying to virtualize the ptes because none of the ptep helpers are
defined.

Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  • Loading branch information
Ryan Roberts authored and akpm00 committed Jun 19, 2023
1 parent 6c1d2a0 commit c33c794
Show file tree
Hide file tree
Showing 47 changed files with 301 additions and 228 deletions.
8 changes: 6 additions & 2 deletions drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
Original file line number Diff line number Diff line change
Expand Up @@ -1681,7 +1681,9 @@ static int igt_mmap_gpu(void *arg)

static int check_present_pte(pte_t *pte, unsigned long addr, void *data)
{
if (!pte_present(*pte) || pte_none(*pte)) {
pte_t ptent = ptep_get(pte);

if (!pte_present(ptent) || pte_none(ptent)) {
pr_err("missing PTE:%lx\n",
(addr - (unsigned long)data) >> PAGE_SHIFT);
return -EINVAL;
Expand All @@ -1692,7 +1694,9 @@ static int check_present_pte(pte_t *pte, unsigned long addr, void *data)

static int check_absent_pte(pte_t *pte, unsigned long addr, void *data)
{
if (pte_present(*pte) && !pte_none(*pte)) {
pte_t ptent = ptep_get(pte);

if (pte_present(ptent) && !pte_none(ptent)) {
pr_err("present PTE:%lx; expected to be revoked\n",
(addr - (unsigned long)data) >> PAGE_SHIFT);
return -EINVAL;
Expand Down
2 changes: 1 addition & 1 deletion drivers/misc/sgi-gru/grufault.c
Original file line number Diff line number Diff line change
Expand Up @@ -228,7 +228,7 @@ static int atomic_pte_lookup(struct vm_area_struct *vma, unsigned long vaddr,
goto err;
#ifdef CONFIG_X86_64
if (unlikely(pmd_large(*pmdp)))
pte = *(pte_t *) pmdp;
pte = ptep_get((pte_t *)pmdp);
else
#endif
pte = *pte_offset_kernel(pmdp, vaddr);
Expand Down
7 changes: 5 additions & 2 deletions drivers/vfio/vfio_iommu_type1.c
Original file line number Diff line number Diff line change
Expand Up @@ -514,6 +514,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
bool write_fault)
{
pte_t *ptep;
pte_t pte;
spinlock_t *ptl;
int ret;

Expand All @@ -536,10 +537,12 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
return ret;
}

if (write_fault && !pte_write(*ptep))
pte = ptep_get(ptep);

if (write_fault && !pte_write(pte))
ret = -EFAULT;
else
*pfn = pte_pfn(*ptep);
*pfn = pte_pfn(pte);

pte_unmap_unlock(ptep, ptl);
return ret;
Expand Down
2 changes: 1 addition & 1 deletion drivers/xen/privcmd.c
Original file line number Diff line number Diff line change
Expand Up @@ -949,7 +949,7 @@ static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
*/
static int is_mapped_fn(pte_t *pte, unsigned long addr, void *data)
{
return pte_none(*pte) ? 0 : -EBUSY;
return pte_none(ptep_get(pte)) ? 0 : -EBUSY;
}

static int privcmd_vma_range_is_mapped(
Expand Down
33 changes: 18 additions & 15 deletions fs/proc/task_mmu.c
Original file line number Diff line number Diff line change
Expand Up @@ -538,13 +538,14 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
bool locked = !!(vma->vm_flags & VM_LOCKED);
struct page *page = NULL;
bool migration = false, young = false, dirty = false;
pte_t ptent = ptep_get(pte);

if (pte_present(*pte)) {
page = vm_normal_page(vma, addr, *pte);
young = pte_young(*pte);
dirty = pte_dirty(*pte);
} else if (is_swap_pte(*pte)) {
swp_entry_t swpent = pte_to_swp_entry(*pte);
if (pte_present(ptent)) {
page = vm_normal_page(vma, addr, ptent);
young = pte_young(ptent);
dirty = pte_dirty(ptent);
} else if (is_swap_pte(ptent)) {
swp_entry_t swpent = pte_to_swp_entry(ptent);

if (!non_swap_entry(swpent)) {
int mapcount;
Expand Down Expand Up @@ -732,11 +733,12 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
struct mem_size_stats *mss = walk->private;
struct vm_area_struct *vma = walk->vma;
struct page *page = NULL;
pte_t ptent = ptep_get(pte);

if (pte_present(*pte)) {
page = vm_normal_page(vma, addr, *pte);
} else if (is_swap_pte(*pte)) {
swp_entry_t swpent = pte_to_swp_entry(*pte);
if (pte_present(ptent)) {
page = vm_normal_page(vma, addr, ptent);
} else if (is_swap_pte(ptent)) {
swp_entry_t swpent = pte_to_swp_entry(ptent);

if (is_pfn_swap_entry(swpent))
page = pfn_swap_entry_to_page(swpent);
Expand Down Expand Up @@ -1105,7 +1107,7 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
* Documentation/admin-guide/mm/soft-dirty.rst for full description
* of how soft-dirty works.
*/
pte_t ptent = *pte;
pte_t ptent = ptep_get(pte);

if (pte_present(ptent)) {
pte_t old_pte;
Expand Down Expand Up @@ -1194,7 +1196,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
return 0;
}
for (; addr != end; pte++, addr += PAGE_SIZE) {
ptent = *pte;
ptent = ptep_get(pte);

if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
clear_soft_dirty(vma, addr, pte);
Expand Down Expand Up @@ -1550,7 +1552,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
for (; addr < end; pte++, addr += PAGE_SIZE) {
pagemap_entry_t pme;

pme = pte_to_pagemap_entry(pm, vma, addr, *pte);
pme = pte_to_pagemap_entry(pm, vma, addr, ptep_get(pte));
err = add_to_pagemap(addr, &pme, pm);
if (err)
break;
Expand Down Expand Up @@ -1893,10 +1895,11 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
return 0;
}
do {
struct page *page = can_gather_numa_stats(*pte, vma, addr);
pte_t ptent = ptep_get(pte);
struct page *page = can_gather_numa_stats(ptent, vma, addr);
if (!page)
continue;
gather_stats(page, md, pte_dirty(*pte), 1);
gather_stats(page, md, pte_dirty(ptent), 1);

} while (pte++, addr += PAGE_SIZE, addr != end);
pte_unmap_unlock(orig_pte, ptl);
Expand Down
6 changes: 4 additions & 2 deletions fs/userfaultfd.c
Original file line number Diff line number Diff line change
Expand Up @@ -335,6 +335,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
pud_t *pud;
pmd_t *pmd, _pmd;
pte_t *pte;
pte_t ptent;
bool ret = true;

mmap_assert_locked(mm);
Expand Down Expand Up @@ -374,9 +375,10 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx,
* changes under us. PTE markers should be handled the same as none
* ptes here.
*/
if (pte_none_mostly(*pte))
ptent = ptep_get(pte);
if (pte_none_mostly(ptent))
ret = true;
if (!pte_write(*pte) && (reason & VM_UFFD_WP))
if (!pte_write(ptent) && (reason & VM_UFFD_WP))
ret = true;
pte_unmap(pte);

Expand Down
4 changes: 4 additions & 0 deletions include/linux/hugetlb.h
Original file line number Diff line number Diff line change
Expand Up @@ -1185,7 +1185,11 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm)
static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
unsigned long addr, pte_t *ptep)
{
#ifdef CONFIG_MMU
return ptep_get(ptep);
#else
return *ptep;
#endif
}

static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
Expand Down
2 changes: 1 addition & 1 deletion include/linux/mm_inline.h
Original file line number Diff line number Diff line change
Expand Up @@ -555,7 +555,7 @@ pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
bool arm_uffd_pte = false;

/* The current status of the pte should be "cleared" before calling */
WARN_ON_ONCE(!pte_none(*pte));
WARN_ON_ONCE(!pte_none(ptep_get(pte)));

/*
* NOTE: userfaultfd_wp_unpopulated() doesn't need this whole
Expand Down
6 changes: 3 additions & 3 deletions include/linux/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -231,7 +231,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
unsigned long address,
pte_t *ptep)
{
pte_t pte = *ptep;
pte_t pte = ptep_get(ptep);
int r = 1;
if (!pte_young(pte))
r = 0;
Expand Down Expand Up @@ -318,7 +318,7 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
unsigned long address,
pte_t *ptep)
{
pte_t pte = *ptep;
pte_t pte = ptep_get(ptep);
pte_clear(mm, address, ptep);
page_table_check_pte_clear(mm, address, pte);
return pte;
Expand Down Expand Up @@ -519,7 +519,7 @@ extern pud_t pudp_huge_clear_flush(struct vm_area_struct *vma,
struct mm_struct;
static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep)
{
pte_t old_pte = *ptep;
pte_t old_pte = ptep_get(ptep);
set_pte_at(mm, address, ptep, pte_wrprotect(old_pte));
}
#endif
Expand Down
2 changes: 1 addition & 1 deletion kernel/events/uprobes.c
Original file line number Diff line number Diff line change
Expand Up @@ -192,7 +192,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
inc_mm_counter(mm, MM_ANONPAGES);
}

flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
flush_cache_page(vma, addr, pte_pfn(ptep_get(pvmw.pte)));
ptep_clear_flush_notify(vma, addr, pvmw.pte);
if (new_page)
set_pte_at_notify(mm, addr, pvmw.pte,
Expand Down
2 changes: 1 addition & 1 deletion mm/damon/ops-common.c
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ struct folio *damon_get_folio(unsigned long pfn)

void damon_ptep_mkold(pte_t *pte, struct vm_area_struct *vma, unsigned long addr)
{
struct folio *folio = damon_get_folio(pte_pfn(*pte));
struct folio *folio = damon_get_folio(pte_pfn(ptep_get(pte)));

if (!folio)
return;
Expand Down
2 changes: 1 addition & 1 deletion mm/damon/paddr.c
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,7 @@ static bool __damon_pa_young(struct folio *folio, struct vm_area_struct *vma,
while (page_vma_mapped_walk(&pvmw)) {
addr = pvmw.address;
if (pvmw.pte) {
*accessed = pte_young(*pvmw.pte) ||
*accessed = pte_young(ptep_get(pvmw.pte)) ||
!folio_test_idle(folio) ||
mmu_notifier_test_young(vma->vm_mm, addr);
} else {
Expand Down
10 changes: 6 additions & 4 deletions mm/damon/vaddr.c
Original file line number Diff line number Diff line change
Expand Up @@ -323,7 +323,7 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr,
walk->action = ACTION_AGAIN;
return 0;
}
if (!pte_present(*pte))
if (!pte_present(ptep_get(pte)))
goto out;
damon_ptep_mkold(pte, walk->vma, addr);
out:
Expand Down Expand Up @@ -433,6 +433,7 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr,
unsigned long next, struct mm_walk *walk)
{
pte_t *pte;
pte_t ptent;
spinlock_t *ptl;
struct folio *folio;
struct damon_young_walk_private *priv = walk->private;
Expand Down Expand Up @@ -471,12 +472,13 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr,
walk->action = ACTION_AGAIN;
return 0;
}
if (!pte_present(*pte))
ptent = ptep_get(pte);
if (!pte_present(ptent))
goto out;
folio = damon_get_folio(pte_pfn(*pte));
folio = damon_get_folio(pte_pfn(ptent));
if (!folio)
goto out;
if (pte_young(*pte) || !folio_test_idle(folio) ||
if (pte_young(ptent) || !folio_test_idle(folio) ||
mmu_notifier_test_young(walk->mm, addr))
priv->young = true;
*priv->folio_sz = folio_size(folio);
Expand Down
2 changes: 1 addition & 1 deletion mm/filemap.c
Original file line number Diff line number Diff line change
Expand Up @@ -3523,7 +3523,7 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
* handled in the specific fault path, and it'll prohibit the
* fault-around logic.
*/
if (!pte_none(*vmf->pte))
if (!pte_none(ptep_get(vmf->pte)))
goto unlock;

/* We're about to handle the fault */
Expand Down
21 changes: 12 additions & 9 deletions mm/gup.c
Original file line number Diff line number Diff line change
Expand Up @@ -477,13 +477,14 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
pte_t *pte, unsigned int flags)
{
if (flags & FOLL_TOUCH) {
pte_t entry = *pte;
pte_t orig_entry = ptep_get(pte);
pte_t entry = orig_entry;

if (flags & FOLL_WRITE)
entry = pte_mkdirty(entry);
entry = pte_mkyoung(entry);

if (!pte_same(*pte, entry)) {
if (!pte_same(orig_entry, entry)) {
set_pte_at(vma->vm_mm, address, pte, entry);
update_mmu_cache(vma, address, pte);
}
Expand Down Expand Up @@ -549,7 +550,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
if (!ptep)
return no_page_table(vma, flags);
pte = *ptep;
pte = ptep_get(ptep);
if (!pte_present(pte))
goto no_page;
if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
Expand Down Expand Up @@ -821,6 +822,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
pte_t entry;
int ret = -EFAULT;

/* user gate pages are read-only */
Expand All @@ -844,16 +846,17 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
pte = pte_offset_map(pmd, address);
if (!pte)
return -EFAULT;
if (pte_none(*pte))
entry = ptep_get(pte);
if (pte_none(entry))
goto unmap;
*vma = get_gate_vma(mm);
if (!page)
goto out;
*page = vm_normal_page(*vma, address, *pte);
*page = vm_normal_page(*vma, address, entry);
if (!*page) {
if ((gup_flags & FOLL_DUMP) || !is_zero_pfn(pte_pfn(*pte)))
if ((gup_flags & FOLL_DUMP) || !is_zero_pfn(pte_pfn(entry)))
goto unmap;
*page = pte_page(*pte);
*page = pte_page(entry);
}
ret = try_grab_page(*page, gup_flags);
if (unlikely(ret))
Expand Down Expand Up @@ -2496,7 +2499,7 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
}

if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
unlikely(pte_val(pte) != pte_val(*ptep))) {
unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
gup_put_folio(folio, 1, flags);
goto pte_unmap;
}
Expand Down Expand Up @@ -2693,7 +2696,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
if (!folio)
return 0;

if (unlikely(pte_val(pte) != pte_val(*ptep))) {
if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
gup_put_folio(folio, refs, flags);
return 0;
}
Expand Down
Loading

0 comments on commit c33c794

Please sign in to comment.