Does the Linux filesystem cache files efficiently?
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/7118543/
Asked by laurent
I'm creating a web application running on a Linux server. The application is constantly accessing a 250K file: it loads it into memory, reads it and sends back some info to the user. Since this file is read all the time, my client suggests using something like memcache to cache it in memory, presumably because that would make read operations faster.
However, I'm thinking that the Linux filesystem is probably already caching the file in memory since it's accessed frequently. Is that right? In your opinion, would memcache provide a real improvement? Or is it going to do the same thing that Linux is already doing?
I'm not really familiar with either Linux or memcache, so I would really appreciate it if someone could clarify this.
Accepted answer by Robert Martin
Yes, if you do not modify the file each time you open it.
Linux will hold the file's contents in memory (in the page cache; private mappings of those pages are copy-on-write), so "loading" the file into memory should be very fast (a page table update at worst).
Edit: As cdhowie points out, though, there is no "Linux filesystem". However, I believe the relevant code is in Linux's memory management, and is therefore independent of the filesystem in question. If you're curious, you can read in the Linux source about the handling of vm_area_struct objects, mainly in linux/mm/mmap.c.
Answer by MarkR
Yes, definitely. It will keep accessed files in memory indefinitely, unless something else needs the memory.
You can control this behaviour (to some extent) with the fadvise system call (posix_fadvise() in C). See its man page for more details.
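As a minimal sketch of such a hint, the Python function below asks the kernel to pull a file into the page cache before reading it. The function name read_with_cache_hint is made up for illustration; os.posix_fadvise and os.POSIX_FADV_WILLNEED are the standard library's wrappers around the call (Unix only). The hint is advice, and the kernel is free to ignore it.

```python
import os


def read_with_cache_hint(path):
    """Read a whole file after hinting that its pages will be needed soon.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and length=0 mean "from the start to the end of the file".
        # POSIX_FADV_WILLNEED asks the kernel to start loading the pages
        # into the page cache; it is a hint, not a guarantee.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        chunks = []
        while True:
            chunk = os.read(fd, 1 << 16)  # read in 64 KiB pieces
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

For a 250K file that is read constantly, the hint probably changes little, since the file is likely to be cached already; it matters more for the first read after a cold start.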
A read/write system call will still normally need to copy the data, so if you see a real bottleneck doing this, consider using mmap(), which can avoid the copy by mapping the cache pages directly into the process.
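A rough illustration of the mmap() approach, using Python's standard mmap module: the file is mapped read-only and searched in place instead of being copied into a buffer with read(). The function name find_in_mapped_file is an assumption for this sketch. Note that slicing the mapping in Python does copy bytes, and that an empty file cannot be mapped.

```python
import mmap


def find_in_mapped_file(path, needle):
    """Return the byte offset of needle in the file, or -1 if absent.

    Hypothetical helper: searches a read-only mapping of the file
    rather than first copying the contents with read().
    """
    with open(path, "rb") as f:
        # length=0 maps the whole file; ACCESS_READ makes it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(needle)
```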
Answer by Bruce ONeel
As people have mentioned, mmap is a good solution here.
But one 250k file is very small. You might want to read it in on startup and put it in some sort of memory structure that matches what you want to send back to the user. I.e., if it is a text file, an array of lines might be a good choice, etc.
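One way this could look, assuming the file is UTF-8 text and that individual lines are what gets sent back: the file is read once when the application starts and kept as a list of lines. The class name LineStore and its get method are invented for this sketch.

```python
from pathlib import Path


class LineStore:
    """Keeps a text file's lines in memory, loaded once at construction.

    Hypothetical class; assumes the file is UTF-8 text.
    """

    def __init__(self, path):
        # Read the file a single time and split it into lines
        # (line terminators are dropped).
        self.lines = Path(path).read_text(encoding="utf-8").splitlines()

    def get(self, index):
        """Return the line at the given zero-based index."""
        return self.lines[index]
```

The application would create one LineStore at startup and call get() per request, so no file access happens on the request path. If the file can change while the server runs, some reload step would be needed, which this sketch leaves out.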
Answer by shr
I guess putting that file on a ramdisk (tmpfs) would give enough of an advantage without big modifications, unless you really care about response times at the microsecond level.
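A small sketch of staging the file on a ramdisk at startup, so the rest of the application only needs a different path. The function name stage_in_ramdisk is made up, and the default directory /dev/shm is an assumption: it is a tmpfs mount on many Linux distributions, but that should be checked on the actual server.

```python
import shutil
from pathlib import Path


def stage_in_ramdisk(src, ramdisk_dir="/dev/shm"):
    """Copy src into ramdisk_dir and return the path of the copy.

    Hypothetical helper; ramdisk_dir is assumed to be a tmpfs mount.
    """
    target = Path(ramdisk_dir) / Path(src).name
    # copy2 copies the contents and also tries to preserve metadata.
    shutil.copy2(src, target)
    return target
```

The application would then open the returned path instead of the original one. Anything stored on tmpfs is lost on reboot, so the copy has to be repeated at each startup.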
Answer by user2152580
The file should be cached, but consider setting the noatime option on the mount; otherwise the kernel may try to save an updated access time to the file's inode on reads, which adds metadata writes (it does not evict the cached file data).
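To see which options a mount currently has, the options column of /proc/mounts can be inspected. The helper below is a sketch with an invented name, mount_options; it takes the text of /proc/mounts as a string so it can be tried on sample data, and it assumes the usual whitespace-separated layout (device, mount point, filesystem type, options, and two numeric fields).

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    Hypothetical helper. Returns an empty list if the mount point is
    not listed. If it appears more than once, the last entry wins,
    since later mounts shadow earlier ones.
    """
    options = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options
```

On a live system it could be called as mount_options(open("/proc/mounts").read(), "/"), then checked for "noatime" or "relatime" in the result.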