C++ 多个线程读取同一个文件
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/823479/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Multiple threads reading from the same file
提问by anru
My platform is Windows Vista 32-bit, with Visual C++ Express 2008.
我的平台是 Windows Vista 32 位，使用 Visual C++ Express 2008。
for example:
例如:
If I have a file containing 4000 bytes, can I have 4 threads reading from the file at the same time, with each thread accessing a different section of the file?
如果我有一个包含 4000 字节的文件，我可以让 4 个线程同时读取该文件，并且每个线程访问文件的不同部分吗？
thread 1 reads 0-999, thread 2 reads 1000-2999, etc.
线程 1 读取 0-999,线程 2 读取 1000 - 2999,等等。
Please give an example in C.
请给出一个 C 语言的例子。
回答by Francis
If you don't write to them, there is no need to take care of synchronization / race conditions.
如果您不对文件进行写入，就无需处理同步/竞争条件。
Just open the file with shared read access as different handles and everything will work (i.e., you must open the file in each thread's context instead of sharing the same file handle).
只需以共享读取方式、用不同的句柄打开文件，一切都会正常工作（即，必须在每个线程的上下文中打开文件，而不是共享同一个文件句柄）。
#include <stdio.h>
#include <windows.h>

DWORD WINAPI mythread(LPVOID param)
{
    int i = (int)(INT_PTR) param;      /* which 1000-byte chunk this thread reads */
    BYTE buf[1000];
    DWORD numread;

    /* Each thread opens its own handle with shared read access. */
    HANDLE h = CreateFile("c:\\test.txt", GENERIC_READ, FILE_SHARE_READ,
                          NULL, OPEN_EXISTING, 0, NULL);
    SetFilePointer(h, i * 1000, NULL, FILE_BEGIN);
    ReadFile(h, buf, sizeof(buf), &numread, NULL);
    printf("buf[%d]: %02X %02X %02X\n", i+1, buf[0], buf[1], buf[2]);
    CloseHandle(h);
    return 0;
}

int main()
{
    int i;
    HANDLE h[4];
    for (i = 0; i < 4; i++)
        h[i] = CreateThread(NULL, 0, mythread, (LPVOID)(INT_PTR)i, 0, NULL);
    // for (i = 0; i < 4; i++) WaitForSingleObject(h[i], INFINITE);
    WaitForMultipleObjects(4, h, TRUE, INFINITE);
    return 0;
}
回答by MSalters
There's not even a big problem writing to the same file, in all honesty.
老实说,写入同一个文件甚至没有大问题。
By far the easiest way is to just memory-map the file. The OS will then give you a void* where the file is mapped into memory. Cast that to a char[], and make sure that each thread uses non-overlapping subarrays.
目前最简单的方法就是对文件做内存映射。操作系统会返回一个 void*，指向文件被映射到内存中的位置。把它转换成 char[]，并确保每个线程使用互不重叠的子数组。
void foo(char* begin, char*end) { /* .... */ }
void* base_address = myOS_memory_map("example.binary");
myOS_start_thread(&foo, (char*)base_address, (char*)base_address + 1000);
myOS_start_thread(&foo, (char*)base_address+1000, (char*)base_address + 2000);
myOS_start_thread(&foo, (char*)base_address+2000, (char*)base_address + 3000);
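A minimal Win32 sketch of this idea, assuming a hypothetical c:\\test.txt split into fixed 1000-byte chunks; error handling is omitted:
下面是这一思路的最小 Win32 示例草稿，假设有一个按 1000 字节分块的 c:\\test.txt（仅为示意，省略了错误处理）：
#include <stdio.h>
#include <windows.h>

struct Range { const char* begin; const char* end; };

DWORD WINAPI worker(LPVOID param)
{
    Range* r = (Range*)param;
    /* Each thread only touches its own non-overlapping sub-range. */
    printf("first byte: %02X, length: %u\n",
           (unsigned char)r->begin[0], (unsigned)(r->end - r->begin));
    return 0;
}

int main()
{
    HANDLE file = CreateFile("c:\\test.txt", GENERIC_READ, FILE_SHARE_READ,
                             NULL, OPEN_EXISTING, 0, NULL);
    HANDLE mapping = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 0, NULL);
    const char* base = (const char*)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);

    Range ranges[4];
    HANDLE threads[4];
    for (int i = 0; i < 4; i++) {
        ranges[i].begin = base + i * 1000;
        ranges[i].end   = base + (i + 1) * 1000;
        threads[i] = CreateThread(NULL, 0, worker, &ranges[i], 0, NULL);
    }
    WaitForMultipleObjects(4, threads, TRUE, INFINITE);

    UnmapViewOfFile(base);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}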
回答by bk1e
Windows supports overlapped I/O, which allows a single thread to asynchronously queue multiple I/O requests for better performance. This could conceivably be used by multiple threads simultaneously as long as the file you are accessing supports seeking (i.e. this is not a pipe).
Windows 支持重叠 I/O,它允许单个线程将多个 I/O 请求异步排队以获得更好的性能。只要您正在访问的文件支持查找(即这不是管道),就可以想象这可以被多个线程同时使用。
Passing FILE_FLAG_OVERLAPPED to CreateFile() allows simultaneous reads and writes on the same file handle; otherwise, Windows serializes them. Specify the file offset using the Offset and OffsetHigh members of the OVERLAPPED structure.
将 FILE_FLAG_OVERLAPPED 传给 CreateFile() 可以在同一个文件句柄上同时进行读写；否则，Windows 会将它们串行化。文件偏移量通过 OVERLAPPED 结构的 Offset 和 OffsetHigh 成员指定。
For more information see Synchronization and Overlapped Input and Output.
有关更多信息,请参阅同步和重叠输入和输出。
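A minimal sketch of overlapped reads at different offsets on one handle, again assuming a hypothetical c:\\test.txt and 1000-byte chunks, with error handling omitted:
下面是在同一个句柄上按不同偏移发起重叠读取的最小示例草稿，同样假设一个按 1000 字节分块的 c:\\test.txt，并省略了错误处理：
#include <stdio.h>
#include <windows.h>

int main()
{
    /* FILE_FLAG_OVERLAPPED lets several reads be outstanding on one handle. */
    HANDLE h = CreateFile("c:\\test.txt", GENERIC_READ, FILE_SHARE_READ,
                          NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);

    BYTE buf[4][1000];
    OVERLAPPED ov[4] = {};
    HANDLE events[4];

    for (int i = 0; i < 4; i++) {
        ov[i].Offset     = i * 1000;   /* low 32 bits of the file offset */
        ov[i].OffsetHigh = 0;          /* high 32 bits of the file offset */
        ov[i].hEvent = events[i] = CreateEvent(NULL, TRUE, FALSE, NULL);
        /* Queues the read; may return FALSE with ERROR_IO_PENDING. */
        ReadFile(h, buf[i], 1000, NULL, &ov[i]);
    }

    WaitForMultipleObjects(4, events, TRUE, INFINITE);

    for (int i = 0; i < 4; i++) {
        DWORD bytes = 0;
        GetOverlappedResult(h, &ov[i], &bytes, FALSE);
        printf("chunk %d: read %lu bytes, first byte %02X\n", i, bytes, buf[i][0]);
        CloseHandle(events[i]);
    }
    CloseHandle(h);
    return 0;
}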
回答by Stinomus
You can certainly have multiple threads reading from a data structure; race conditions can potentially occur if any writing is taking place.
您当然可以让多个线程从数据结构中读取数据,如果发生任何写入操作,可能会发生竞争条件。
To avoid such race conditions you need to define the boundaries that threads can read; if you have an explicit number of data segments and an explicit number of threads to match them, then that is easy.
为了避免这种竞争条件,您需要定义线程可以读取的边界,如果您有明确数量的数据段和明确数量的线程来匹配这些,那么这很容易。
As for an example in C you would need to provide some more information, like the threading library you are using. Attempt it first, then we can help you fix any issues.
对于 C 中的示例,您需要提供更多信息,例如您正在使用的线程库。请先尝试,然后我们可以帮助您解决任何问题。
回答by Martin York
I don't see any real advantage to doing this.
You may have multiple threads reading from the device but your bottleneck will not be CPU but rather disk IO speed.
我看不出这样做有什么真正的好处。
您可能有多个线程从设备读取,但瓶颈不是 CPU,而是磁盘 IO 速度。
If you are not careful you may even slow the processes down (but you will need to measure it to know for certain).
如果您不小心,您甚至可能会减慢进程的速度(但您需要对其进行测量才能确定)。
回答by user665049
The easiest way is to open the file within each parallel instance, but just open it as readonly.
最简单的方法是在每个并行实例中打开文件,但只需以只读方式打开它。
The people who say there may be an IO bottleneck are probably wrong. Any modern operating system caches file reads. Which means the first time you read a file will be the slowest, and any subsequent reads will be lightning fast. A 4000 byte file can even rest inside the processor's cache.
说可能存在IO瓶颈的人可能是错的。任何现代操作系统都会缓存文件读取。这意味着您第一次读取文件将是最慢的,任何后续读取都将是闪电般的快。一个 4000 字节的文件甚至可以放在处理器的缓存中。
回答by CodeBeginner
#include <algorithm>
#include <fstream>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

using namespace std;

std::mutex mtx;

void worker(int n)
{
    mtx.lock();
    char * memblock;
    ifstream file ("D:\\test.txt", ios::in);
    if (file.is_open())
    {
        memblock = new char [1000];
        file.seekg (n * 999, ios::beg);   // each thread seeks to its own chunk
        file.read (memblock, 999);
        memblock[999] = '\0';             // NUL-terminate so it can be printed
        cout << memblock << endl;
        file.close();
        delete[] memblock;
    }
    else
        cout << "Unable to open file";
    mtx.unlock();
}

int main()
{
    vector<std::thread> vec;
    for(int i = 0; i < 3; i++)
    {
        vec.push_back(std::thread(&worker, i));
    }
    std::for_each(vec.begin(), vec.end(), [](std::thread& th)
    {
        th.join();
    });
    return 0;
}
回答by Peter
You shouldn't need to do anything particularly clever if all they're doing is reading. Obviously you can read it as many times in parallel as you like, as long as you don't exclusively lock it. Writing is clearly another matter of course...
如果他们所做的只是阅读,您就不需要做任何特别聪明的事情。显然,只要您不专门锁定它,您就可以根据需要并行读取它多次。写作显然是另一回事……
I do have to wonder why you'd want to though - it will likely perform badly since your HDD will waste a lot of time seeking back and forth rather than reading it all in one (relatively) uninterrupted sweep. For small files (like your 4000 line example) where that might not be such a problem, it doesn't seem worth the trouble.
不过我确实想知道您为什么要这样做：这样做很可能性能不佳，因为硬盘会浪费大量时间来回寻道，而不是在一次（相对）不间断的顺序扫描中读完全部内容。对于这可能不成问题的小文件（比如您 4000 行的例子），似乎不值得费这个劲。
回答by jon-hanson
It is possible though i'm not sure it will be worth the effort. Have you considered reading the entire file into memory within a single thread and then allow multiple threads to access that data?
虽然我不确定这是否值得付出努力,但这是可能的。您是否考虑过在单个线程中将整个文件读入内存,然后允许多个线程访问该数据?
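A minimal C++11 sketch of that idea, assuming a hypothetical test.txt and fixed 1000-byte slices:
下面是这一思路的最小 C++11 示例草稿，假设有一个 test.txt，并按固定的 1000 字节切片：
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <thread>
#include <vector>

int main()
{
    // One thread reads the whole file into memory...
    std::ifstream file("test.txt", std::ios::binary);
    std::vector<char> data((std::istreambuf_iterator<char>(file)),
                           std::istreambuf_iterator<char>());

    // ...then several threads read non-overlapping slices; no locking is
    // needed because nobody writes to the buffer (console output may interleave).
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < 4 && i * 1000 < data.size(); i++) {
        threads.emplace_back([&data, i] {
            std::size_t begin = i * 1000;
            std::size_t end = std::min(begin + 1000, data.size());
            std::cout << "chunk " << i << ": " << (end - begin) << " bytes, first byte "
                      << (int)(unsigned char)data[begin] << "\n";
        });
    }
    for (auto& t : threads) t.join();
    return 0;
}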
回答by Xetius
Reading: No need to lock the file. Just open the file as read only or shared read
阅读:无需锁定文件。只需将文件打开为只读或共享读取
Writing: Use a mutex to ensure the file is only written to by one person.
写入:使用互斥锁确保文件仅由一个人写入。
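For the writing case, a minimal C++11 sketch where a std::mutex serializes access to a shared stream; the file name output.log is a placeholder:
对于写入的情况，下面是用 std::mutex 串行化共享流访问的最小 C++11 示例草稿，文件名 output.log 只是占位：
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex write_mutex;
std::ofstream log_file("output.log", std::ios::app);

void append_line(const std::string& line)
{
    // Only one thread at a time may touch the shared stream.
    std::lock_guard<std::mutex> lock(write_mutex);
    log_file << line << '\n';
}

int main()
{
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; i++)
        threads.emplace_back([i] { append_line("hello from thread " + std::to_string(i)); });
    for (auto& t : threads) t.join();
    return 0;
}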