Linux: multiple threads able to get flock at the same time

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/9462532/


multiple threads able to get flock at the same time

Tags: c, linux, glibc, flock

Asked by user1235176

I was under the impression that flock(2) is thread safe. I recently ran across a case in the code where multiple threads are able to get a lock on the same file, even though they all synchronize by obtaining an exclusive lock with the C API flock. Process 25554 is a multi-threaded app with 20 threads; the number of threads holding the lock on the same file varies when the deadlock happens. The multi-threaded app testEvent is the writer to the file, whereas push is the reader from the file. Unfortunately lsof does not print the LWP value, so I cannot find which threads are holding the lock. When the condition shown below occurs, both the process and the threads are stuck on the flock call, as displayed by pstack or strace on pids 25569 and 25554. Any suggestions on how to overcome this on RHEL 4.x?


One thing I wanted to add is that flock does not misbehave all the time: I only run into this deadlock issue when the tx rate of the messages is more than 2 mbps; below that tx rate everything is fine. I have kept num_threads = 20 and size_of_msg = 1000 bytes constant and only varied the number of messages sent per second, starting from 10 messages up to 100 messages, which is 20 * 1000 * 100 = 2 mbps. When I increase the number of messages to 150, the flock issue happens.


I also wanted to ask what your opinion is of the flockfile C API.

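As a side note on that last question: flockfile operates on stdio FILE * streams and only serializes stdio calls among threads of the same process; it does not lock the underlying file against other processes, so it is a different mechanism from flock(2). A minimal illustration (the helper function and its names are hypothetical, not from the original code):

    #include <stdio.h>

    /* Keep a multi-call stdio write together so other threads of the
       same process cannot interleave their output into it. */
    void log_record(FILE *fp, const char *tag, const char *msg) {
        flockfile(fp);
        fprintf(fp, "[%s] ", tag);
        fprintf(fp, "%s\n", msg);
        fflush(fp);
        funlockfile(fp);
    }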

 sudo lsof filename.txt
    COMMAND       PID     USER     FD       TYPE     DEVICE     SIZE   NODE       NAME
    push         25569    root     11u       REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     27uW      REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     28uW      REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     29uW      REG      253.4      1079   49266853   filename.txt
    testEvent    25554    root     30uW      REG      253.4      1079   49266853   filename.txt

The multithreaded test program below calls the write_data_lib_func library function.


void* sendMessage(void *arg) {
    int* numOfMessagesPerSecond = (int*) arg;
    std::cout << "Executing pthread id " << pthread_self() << std::endl;
    while (!terminateTest) {
        Record *er1 = Record::create();
        er1->setDate("some data");

        for (int i = 0; i <= *numOfMessagesPerSecond; i++) {
            ec = _write_data_lib_func(*er1);
            if (ec != SUCCESS) {
                std::cout << "write was not successful" << std::endl;
            }
        }
        delete er1;
        sleep(1);
    }

    return NULL;
}

The above method is called from the pthreads created in the test's main function.


for (i = 0; i < _numThreads; ++i) {
    rc = pthread_create(&threads[i], NULL, sendMessage, (void *)&_num_msgs);
    assert(0 == rc);
}

Here is the writer/reader source. For proprietary reasons I did not want to just cut and paste it; the writer source is accessed by multiple threads within one process.


int write_data_lib_func(Record * rec) {
    if (fd == -1) {
        fd = open(fn, O_RDWR | O_CREAT | O_APPEND, 0666);
    }
    if (fd >= 0) {
        /* some code */

        if (flock(fd, LOCK_EX) < 0) {
            print "some error message";
        }
        else {
            if (maxfilesize) {
                off_t len = lseek(fd, 0, SEEK_END);
                ...
                ...
                ftruncate(fd, 0);
                ...
                lseek(fd, 0, SEEK_SET);
            } /* end of max spool size */
            if (writev(fd, rec) < 0) {
                print "some error message";
            }

            if (flock(fd, LOCK_UN) < 0) {
                print "some error message";
            }
        }
    }
}

On the reader side is a daemon process with no threads.


int readData() {
    while (true) {
        if (fd == -1) {
            fd = open(filename, O_RDWR);
        }
        if (flock(fd, LOCK_EX) < 0) {
            print "some error message";
            break;
        }
        if ((n = read(fd, readBuf, readBufSize)) < 0) {
            print "some error message";
            break;
        }
        if (off < n) {
            if (off <= 0 && n > 0) {
                corrupt_file = true;
            }
            if (lseek(fd, off - n, SEEK_CUR) < 0) {
                print "some error message";
            }
            if (corrupt_spool) {
                if (ftruncate(fd, 0) < 0) {
                    print "some error message";
                    break;
                }
            }
        }
        if (flock(fd, LOCK_UN) < 0) {
            print "some error message";
        }
    }
}

Answered by Basile Starynkevitch

flock(2) is documented as "blocking if an incompatible lock is held by another process", and "locks created by flock() are associated with an open file table entry", so it should be expected that flock-ed locks taken by several threads of the same process don't interact. (The flock documentation doesn't mention threads.)


Hence, the solution should be simple for you: associate one pthread_mutex_t with every flock-able file descriptor, and protect the call to flock with that mutex. You might also use a pthread_rwlock_t if you want read vs. write locking.

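A minimal sketch of that suggestion, assuming a single shared descriptor guarded by one process-wide mutex (the function and variable names here are illustrative, not taken from the original code):

    #include <pthread.h>
    #include <sys/file.h>
    #include <sys/uio.h>

    /* Guards flock() calls among the threads of this process. */
    static pthread_mutex_t fd_mutex = PTHREAD_MUTEX_INITIALIZER;

    int locked_write(int fd, const struct iovec *iov, int iovcnt) {
        int rc = 0;
        pthread_mutex_lock(&fd_mutex);     /* serialize threads of this process */
        if (flock(fd, LOCK_EX) == 0) {     /* serialize against other processes */
            if (writev(fd, iov, iovcnt) < 0)
                rc = -1;
            flock(fd, LOCK_UN);
        } else {
            rc = -1;
        }
        pthread_mutex_unlock(&fd_mutex);
        return rc;
    }

With such a wrapper, the mutex serializes the threads of one process while flock continues to serialize against the reader process.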

Answered by Chris Dodd

From the Linux man page for flock(2):


Locks created by flock() are associated with an open file table entry. This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these descriptors. Furthermore, the lock is released either by an explicit LOCK_UN operation on any of these duplicate descriptors, or when all such descriptors have been closed.


In addition, flock locks don't 'stack', so if you try to acquire a lock you already hold, the flock call is a noop that returns immediately without blocking and without changing the lock state in any way.


Since threads within a process share file descriptors, you can flock the file multiple times from different threads, and it won't block, as the lock is already held.

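A small standalone program, written here only to illustrate the point (not taken from the question), shows this behaviour: both threads "acquire" LOCK_EX on the same descriptor and neither blocks, because the lock belongs to the shared open file description:

    #include <stdio.h>
    #include <fcntl.h>
    #include <pthread.h>
    #include <sys/file.h>
    #include <unistd.h>

    static int fd;

    static void *grab(void *arg) {
        /* The second thread's flock() is effectively a no-op: the process
           already holds the exclusive lock on this open file description. */
        if (flock(fd, LOCK_EX) == 0)
            printf("thread %s: got LOCK_EX\n", (const char *)arg);
        sleep(1);   /* keep holding the lock; the other thread still succeeds */
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        fd = open("filename.txt", O_RDWR | O_CREAT, 0666);
        pthread_create(&t1, NULL, grab, (void *)"A");
        pthread_create(&t2, NULL, grab, (void *)"B");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        close(fd);
        return 0;
    }

Compile with -pthread; both threads print their message even though the lock is exclusive, because the second flock call finds the lock already held and returns immediately.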

Also from the notes on flock(2):


flock() and fcntl(2) locks have different semantics with respect to forked processes and dup(2). On systems that implement flock() using fcntl(2), the semantics of flock() will be different from those described in this manual page.

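For comparison, an exclusive whole-file record lock via fcntl(2) looks roughly like this (a sketch with an illustrative helper name; note that classic POSIX fcntl record locks are owned by the process, so like flock they do not serialize threads within a single process):

    #include <fcntl.h>
    #include <unistd.h>

    /* Block until an exclusive fcntl(2) record lock covering the whole
       file is acquired; returns 0 on success, -1 on error. */
    int lock_whole_file(int fd) {
        struct flock lk;
        lk.l_type   = F_WRLCK;   /* exclusive (write) lock */
        lk.l_whence = SEEK_SET;
        lk.l_start  = 0;
        lk.l_len    = 0;         /* 0 means "to end of file" */
        return fcntl(fd, F_SETLKW, &lk);
    }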