Linux: Understanding /proc/sys/vm/lowmem_reserve_ratio
Original question: http://stackoverflow.com/questions/4984190/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Understanding /proc/sys/vm/lowmem_reserve_ratio
Asked by Ashish
I am not able to understand the meaning of the variable "lowmem_reserve_ratio" from the explanation in Documentation/sysctl/vm.txt. I have also tried to google it, but every explanation I found is essentially a copy of the one in vm.txt.
It would be really helpful if somebody could explain it or point to a link about it. Here is the original explanation:
The lowmem_reserve_ratio is an array. You can see them by reading this file.
-
% cat /proc/sys/vm/lowmem_reserve_ratio
256 256 32
-
Note: # of this elements is one fewer than number of zones. Because the highest
zone's value is not necessary for following calculation.
But, these values are not used directly. The kernel calculates # of protection
pages for each zones from them. These are shown as array of protection pages
in /proc/zoneinfo like followings. (This is an example of x86-64 box).
Each zone has an array of protection pages like this.
-
Node 0, zone      DMA
  pages free     1355
        min      3
        low      3
        high     4
        :
        :
    numa_other   0
        protection: (0, 2004, 2004, 2004)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  pagesets
    cpu: 0 pcp: 0
        :
-
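As a rough illustration, the protection array in an excerpt like the one above could be extracted with a few lines of Python. This is only a sketch: `parse_protection` and `SAMPLE` are hypothetical names, and the regular expressions assume the `Node N, zone NAME` and `protection: (...)` layout shown in the excerpt, which may differ between kernel versions.

```python
import re

# Shortened copy of the /proc/zoneinfo excerpt quoted above.
SAMPLE = """\
Node 0, zone      DMA
  pages free     1355
        min      3
        low      3
        high     4
        protection: (0, 2004, 2004, 2004)
"""

def parse_protection(zoneinfo_text):
    """Return {(node, zone_name): [protection pages, ...]}."""
    result = {}
    zone = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            # Remember which zone the following lines belong to.
            zone = (int(header.group(1)), header.group(2))
            continue
        prot = re.search(r"protection:\s*\(([^)]*)\)", line)
        if prot and zone is not None:
            result[zone] = [int(v) for v in prot.group(1).split(",")]
    return result

print(parse_protection(SAMPLE))  # {(0, 'DMA'): [0, 2004, 2004, 2004]}
```

On a live Linux system the same function could be fed the contents of /proc/zoneinfo instead of `SAMPLE`.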
These protections are added to score to judge whether this zone should be used
for page allocation or should be reclaimed.
In this example, if normal pages (index=2) are required to this DMA zone and
watermark[WMARK_HIGH] is used for watermark, the kernel judges this zone should
not be used because pages_free(1355) is smaller than watermark + protection[2]
(4 + 2004 = 2008). If this protection value is 0, this zone would be used for
normal page requirement. If requirement is DMA zone(index=0), protection[0]
(=0) is used.
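The judgment described in that paragraph can be sketched as a one-line comparison. This is a simplification for illustration only: `zone_usable` is a hypothetical helper, the real allocator check involves more inputs, and treating the exact boundary (free pages equal to watermark plus protection) as "not usable" is an assumption of this sketch.

```python
def zone_usable(free_pages, watermark, protection, request_index):
    """True if the zone may serve a request that targets zone `request_index`.

    `protection` is the zone's protection array from /proc/zoneinfo.
    """
    return free_pages > watermark + protection[request_index]

dma_protection = [0, 2004, 2004, 2004]

# Normal-zone request (index=2) falling back to DMA: 1355 < 4 + 2004, so refused.
print(zone_usable(1355, 4, dma_protection, 2))  # False

# Genuine DMA request (index=0): protection[0] is 0, so only the watermark counts.
print(zone_usable(1355, 4, dma_protection, 0))  # True
```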
zone[i]'s protection[j] is calculated by following expression.
(i < j):
  zone[i]->protection[j]
  = (total sums of present_pages from zone[i+1] to zone[j] on the node)
    / lowmem_reserve_ratio[i];
(i = j):
   (should not be protected. = 0;
(i > j):
   (not necessary, but looks 0)
The default values of lowmem_reserve_ratio[i] are
256 (if zone[i] means DMA or DMA32 zone)
32 (others).
As above expression, they are reciprocal number of ratio.
256 means 1/256. # of protection pages becomes about "0.39%" of total present
pages of higher zones on the node.
If you would like to protect more pages, smaller values are effective.
The minimum value is 1 (1/1 -> 100%).
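To make the expression above concrete, here is a small sketch that builds the whole protection table for one node. The zone sizes are made-up numbers chosen for easy arithmetic, and `protection_table` and `reserve_percent` are hypothetical helpers, not kernel functions.

```python
def protection_table(present_pages, ratios):
    """table[i][j] = sum(present_pages[i+1 .. j]) // ratios[i] for i < j, else 0.

    `ratios` has one entry fewer than `present_pages`, matching the note
    that the highest zone needs no value.
    """
    n = len(present_pages)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            higher = sum(present_pages[i + 1 : j + 1])
            table[i][j] = higher // ratios[i]
    return table

def reserve_percent(ratio):
    """The ratio is a reciprocal: 256 means 1/256 of the higher zones."""
    return 100 / ratio

# Hypothetical node: DMA, DMA32, Normal (sizes in pages).
table = protection_table([4096, 1048576, 2097152], [256, 256])
print(table[0])              # [0, 4096, 12288]
print(table[1])              # [0, 0, 8192]
print(reserve_percent(256))  # 0.390625
```

Lowering a ratio from 256 to 1 in this sketch makes the corresponding row equal to the full size of the higher zones, which is the "1/1 -> 100%" case mentioned above.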
Answered by JoKoT3
Answered by cha5on
I found the wording in that document really confusing too. Looking at the source in mm/page_alloc.c helped to clear it up, so let me try my hand at a more straightforward explanation:
As is said in the page you quoted, these numbers "are reciprocal number of ratio". Worded differently: these numbers are divisors. So when calculating the reserve pages for a given zone in a node, you take the sum of pages in that node in zones higher than that one, divide it by the provided divisor, and that's how many pages you're reserving for that zone.
Example: let's assume a 1 GiB node with 768 MiB in zone Normal and 256 MiB in zone HighMem (assume no zone DMA). Let's assume the default highmem reserve "ratio" (divisor) of 32. And let's assume the typical 4 KiB page size. Now we can calculate the reserve area for zone Normal:
- Sum of zones "higher" than zone Normal (just HighMem): 256 MiB * (1024 KiB / 1 MiB) * (1 page / 4 KiB) = 65536 pages
- Area reserved in zone Normal for this node: 65536 pages / 32 = 2048 pages = 8 MiB.
The concept stays the same when you add more zones and nodes. Just remember that the reserved size is counted in whole pages, so you never reserve a fraction of a page.
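The arithmetic in this example can be checked with a short sketch. The helper names are hypothetical, and the 4 KiB page size is the same assumption the example makes; integer division is used because the reserve is counted in whole pages.

```python
KIB_PER_MIB = 1024
PAGE_SIZE_KIB = 4  # typical page size, as assumed in the example

def mib_to_pages(mib):
    """Convert a size in MiB to a number of 4 KiB pages."""
    return mib * KIB_PER_MIB // PAGE_SIZE_KIB

def reserved_pages(higher_zone_pages, divisor):
    """Pages held back in the lower zone; any fractional page is dropped."""
    return higher_zone_pages // divisor

highmem_pages = mib_to_pages(256)
normal_reserve = reserved_pages(highmem_pages, 32)
reserve_mib = normal_reserve * PAGE_SIZE_KIB // KIB_PER_MIB

print(highmem_pages)   # 65536
print(normal_reserve)  # 2048
print(reserve_mib)     # 8

# A size that does not divide evenly still yields a whole number of pages.
print(reserved_pages(100, 32))  # 3
```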
Answered by Victor Choy
I find that the kernel source code explains this very well and very clearly.
/*
 * setup_per_zone_lowmem_reserve - called whenever
 *	sysctl_lowmem_reserve_ratio changes.  Ensures that each zone
 *	has a correct pages reserved value, so an adequate number of
 *	pages are left in the zone after a successful __alloc_pages().
 */
static void setup_per_zone_lowmem_reserve(void)
{
	struct pglist_data *pgdat;
	enum zone_type j, idx;
	for_each_online_pgdat(pgdat) {
		for (j = 0; j < MAX_NR_ZONES; j++) {
			struct zone *zone = pgdat->node_zones + j;
			unsigned long managed_pages = zone->managed_pages;
			zone->lowmem_reserve[j] = 0;
			idx = j;
			while (idx) {
				struct zone *lower_zone;
				idx--;
				if (sysctl_lowmem_reserve_ratio[idx] < 1)
					sysctl_lowmem_reserve_ratio[idx] = 1;
				lower_zone = pgdat->node_zones + idx;
				lower_zone->lowmem_reserve[j] = managed_pages /
					sysctl_lowmem_reserve_ratio[idx];
				managed_pages += lower_zone->managed_pages;
			}
		}
	}
	/* update totalreserve_pages */
	calculate_totalreserve_pages();
}
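For readers who want to experiment outside the kernel, the nested loop above can be mirrored in user space. This is a sketch for a single node, not the kernel's implementation: `setup_lowmem_reserve` is a hypothetical name, the zone sizes are invented, and unlike the kernel function it clamps a copy of the ratios instead of modifying them in place.

```python
def setup_lowmem_reserve(managed_pages, ratios):
    """Return reserve[idx][j]: pages kept back in zone idx from requests for zone j."""
    ratios = [max(1, r) for r in ratios]  # same clamp as the kernel loop, on a copy
    n = len(managed_pages)
    reserve = [[0] * n for _ in range(n)]
    for j in range(n):
        running = managed_pages[j]  # pages in zone j, growing as we walk down
        idx = j
        while idx:
            idx -= 1
            reserve[idx][j] = running // ratios[idx]
            running += managed_pages[idx]
    return reserve

# Made-up zone sizes in pages, lowest zone first.
reserve = setup_lowmem_reserve([100, 1000, 10000], [256, 32])
print(reserve)  # [[0, 3, 42], [0, 0, 312], [0, 0, 0]]
```

The running total is what makes the lowest zone protect itself against the combined size of every zone between it and the requested one.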
The source even includes a worked example:
/*
 * results with 256, 32 in the lowmem_reserve sysctl:
 *	1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
 *	1G machine -> (16M dma, 784M normal, 224M high)
 *	NORMAL allocation will leave 784M/256 of ram reserved in the ZONE_DMA
 *	HIGHMEM allocation will leave 224M/32 of ram reserved in ZONE_NORMAL
 *	HIGHMEM allocation will leave (224M+784M)/256 of ram reserved in ZONE_DMA
 *
 * TBD: should special case ZONE_DMA32 machines here - in those we normally
 * don't need any ZONE_NORMAL reservation
 */
int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
#ifdef CONFIG_ZONE_DMA
	256,
#endif
#ifdef CONFIG_ZONE_DMA32
	256,
#endif
#ifdef CONFIG_HIGHMEM
	32,
#endif
	32,
};
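The three figures in that comment can be turned into page counts with a few lines. This is a sketch under two assumptions that the comment does not state: "M" is read as MiB, and pages are 4 KiB, so one MiB is 256 pages. The variable names are illustrative only.

```python
PAGES_PER_MIB = 256  # assumes 4 KiB pages: 1 MiB / 4 KiB

# Zone sizes from the comment: 16M dma, 784M normal, 224M high.
dma, normal, high = (mib * PAGES_PER_MIB for mib in (16, 784, 224))
dma_ratio, normal_ratio = 256, 32

# NORMAL allocation: 784M/256 reserved in ZONE_DMA.
normal_alloc_dma_reserve = normal // dma_ratio
# HIGHMEM allocation: 224M/32 reserved in ZONE_NORMAL.
highmem_alloc_normal_reserve = high // normal_ratio
# HIGHMEM allocation: (224M+784M)/256 reserved in ZONE_DMA.
highmem_alloc_dma_reserve = (high + normal) // dma_ratio

print(normal_alloc_dma_reserve)      # 784 pages, about 3.06 MiB
print(highmem_alloc_normal_reserve)  # 1792 pages, 7 MiB
print(highmem_alloc_dma_reserve)     # 1008 pages, about 3.94 MiB
```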
In short, for three zones the expressions look like this:
zone[1]->lowmem_reserve[2] = zone[2]->managed_pages / sysctl_lowmem_reserve_ratio[1]
zone[0]->lowmem_reserve[2] = (zone[1]->managed_pages + zone[2]->managed_pages) / sysctl_lowmem_reserve_ratio[0]