Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/8980193/

Time: 2020-08-06 04:11:22  Source: igfitidea

Walking page tables of a process in Linux

Tags: linux, linux-kernel, kernel

Asked by MirkoBanchi

I'm trying to walk the page tables of a process in Linux. In a kernel module I wrote the following function:

static struct page *walk_page_table(unsigned long addr)
{
    pgd_t *pgd;
    pte_t *ptep, pte;
    pud_t *pud;
    pmd_t *pmd;

    struct page *page = NULL;
    struct mm_struct *mm = current->mm;

    pgd = pgd_offset(mm, addr);
    if (pgd_none(*pgd) || pgd_bad(*pgd))
        goto out;
    printk(KERN_NOTICE "Valid pgd");

    pud = pud_offset(pgd, addr);
    if (pud_none(*pud) || pud_bad(*pud))
        goto out;
    printk(KERN_NOTICE "Valid pud");

    pmd = pmd_offset(pud, addr);
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        goto out;
    printk(KERN_NOTICE "Valid pmd");

    ptep = pte_offset_map(pmd, addr);
    if (!ptep)
        goto out;
    pte = *ptep;

    page = pte_page(pte);
    if (page)
        printk(KERN_INFO "page frame struct is @ %p", page);

 out:
    return page;
}

This function is called from the ioctl handler, and addr is a virtual address in the process address space:

static int my_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long addr)
{
   struct page *page = walk_page_table(addr);
   ...
   return 0;
}

The strange thing is that when a user-space process calls ioctl, it segfaults. Yet the way I look up the page table entry seems to be correct, because dmesg shows, for example, the following for each ioctl call:

[ 1721.437104] Valid pgd
[ 1721.437108] Valid pud
[ 1721.437108] Valid pmd
[ 1721.437110] page frame struct is @ c17d9b80

So why can't the process complete the ioctl call correctly? Do I have to lock something before walking the page tables?

I'm working with kernel 2.6.35-22 and three-level page tables.

Thank you all!

Accepted answer by Giovanni Cabiddu

pte_unmap(ptep); 

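is missing just before the out label. pte_offset_map() may temporarily map the page-table page (on 32-bit kernels built with CONFIG_HIGHPTE), so each call has to be paired with pte_unmap(). Try changing the code like this: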

    ...
    page = pte_page(pte);
    if (page)
        printk(KERN_INFO "page frame struct is @ %p", page);

    pte_unmap(ptep); 

out:
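
To make the pairing concrete, here is a userspace model of the same walk. It is only a sketch under stated assumptions: the index layout assumes 32-bit x86 with PAE (4 KiB pages, pud folded into pgd), and the table structs, map_depth, model_pte_offset_map(), model_pte_unmap() and model_walk() are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 32-bit x86 with PAE (4 KiB pages, pud folded into pgd). */
#define PAGE_SHIFT   12
#define PMD_SHIFT    21
#define PGDIR_SHIFT  30
#define PTRS_PER_PTE 512
#define PTRS_PER_PMD 512
#define PTRS_PER_PGD 4

/* Hypothetical userspace stand-ins for the kernel's table types. */
struct pte_table { uint32_t pfn[PTRS_PER_PTE]; };   /* 0 means "not present" */
struct pmd_table { struct pte_table *pte[PTRS_PER_PMD]; };
struct pgd_table { struct pmd_table *pmd[PTRS_PER_PGD]; };

/* Counts outstanding mappings; models the state a temporary mapping holds. */
int map_depth;

unsigned pgd_index(uint32_t addr) { return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1); }
unsigned pmd_index(uint32_t addr) { return (addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1); }
unsigned pte_index(uint32_t addr) { return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); }

/* Model of pte_offset_map(): takes a mapping and returns the entry pointer. */
uint32_t *model_pte_offset_map(struct pte_table *t, uint32_t addr)
{
    map_depth++;
    return &t->pfn[pte_index(addr)];
}

/* Model of pte_unmap(): releases the mapping taken above. */
void model_pte_unmap(uint32_t *ptep)
{
    (void)ptep;
    map_depth--;
}

/* Returns the page frame number for addr, or 0 if nothing is mapped. */
uint32_t model_walk(struct pgd_table *pgd, uint32_t addr)
{
    struct pmd_table *pmd;
    struct pte_table *ptes;
    uint32_t *ptep;
    uint32_t pfn = 0;

    assert(pgd != NULL);

    pmd = pgd->pmd[pgd_index(addr)];
    if (!pmd)
        goto out;               /* nothing mapped yet, so nothing to unmap */

    ptes = pmd->pte[pmd_index(addr)];
    if (!ptes)
        goto out;

    ptep = model_pte_offset_map(ptes, addr);
    pfn = *ptep;
    model_pte_unmap(ptep);      /* the call missing from the question's code */
out:
    return pfn;
}
```

Under the assumed layout, the address 0xbfa60000 splits into pgd index 2, pmd index 509 and pte index 96. The unmap sits before the out label because the early exits never took a mapping, so map_depth returns to zero on every path.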

Answer by Peter Teoh

Look at /proc/<pid>/smaps in the proc filesystem, where you can see the userspace memory mappings:

cat smaps 
bfa60000-bfa81000 rw-p 00000000 00:00 0          [stack]
Size:                136 kB
Rss:                  44 kB
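
The Size and Rss fields shown above can also be read from a userspace program. A minimal sketch, assuming the line format shown in the output above; sum_smaps_field() and sum_own_smaps_field() are hypothetical helper names, not kernel or libc functions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Sum one "<field>: <n> kB" entry over every mapping in an smaps-format
 * stream. Returns the total in kB (0 if the field never appears).
 */
long sum_smaps_field(FILE *f, const char *field)
{
    char line[512];
    size_t n;
    long total = 0;

    assert(f != NULL && field != NULL);
    n = strlen(field);

    while (fgets(line, sizeof line, f)) {
        long kb;

        /* Match "Rss:" but not, for example, "SwapPss:" when field is "Pss". */
        if (strncmp(line, field, n) == 0 && line[n] == ':' &&
            sscanf(line + n + 1, "%ld kB", &kb) == 1)
            total += kb;
    }
    return total;
}

/* Convenience wrapper for the calling process; returns -1 if smaps is unreadable. */
long sum_own_smaps_field(const char *field)
{
    FILE *f = fopen("/proc/self/smaps", "r");
    long total;

    if (!f)
        return -1;
    total = sum_smaps_field(f, field);
    fclose(f);
    return total;
}
```

Calling sum_own_smaps_field("Rss") gives the resident size of the current process in kB, summed over all of its mappings.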

This output is produced by fs/proc/task_mmu.c (from the kernel source):

http://lxr.linux.no/linux+v3.0.4/fs/proc/task_mmu.c

   if (vma->vm_mm && !is_vm_hugetlb_page(vma))
               walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
               show_map_vma(m, vma.....);
        seq_printf(m,
                   "Size:           %8lu kB\n"
                   "Rss:            %8lu kB\n"
                   "Pss:            %8lu kB\n"

Your function is somewhat like walk_page_range(). Looking at walk_page_range(), you can see that the smaps_walk structure is not supposed to change while the walk is in progress:

http://lxr.linux.no/linux+v3.0.4/mm/pagewalk.c#L153

For example:

                }
                if (walk->pgd_entry)
                        err = walk->pgd_entry(pgd, addr, next, walk);
                if (!err &&
                    (walk->pud_entry || walk->pmd_entry || walk->pte_entry

If the memory contents changed during the walk, all of the checks above could become inconsistent.

All of this means that you have to hold mmap_sem while walking the page table:

   if (!down_read_trylock(&mm->mmap_sem)) {
            /*
             * Activate page so shrink_inactive_list is unlikely to unmap
             * its ptes while lock is dropped, so swapoff can make progress.
             */
            activate_page(page);
            unlock_page(page);
            down_read(&mm->mmap_sem);
            lock_page(page);
    }

and then release it afterwards:

up_read(&mm->mmap_sem);

And of course, when you call printk() on the page table from inside your kernel module, the module is running in the process context of your insmod process (just printk the "comm" field and you will see "insmod"). That means mmap_sem is locked and the process is not running, so there is no console output until the process completes (all printk() output goes to memory only).

Does that sound logical?