Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/22051294/

Date: 2020-09-02 10:49:17. Source: igfitidea

Malloc segmentation fault

Tags: c, segmentation-fault, malloc, stack-overflow, buffer-overflow

Asked by guilhermemtr

Here is the piece of code in which the segmentation fault occurs (perror is not called):

job = malloc(sizeof(task_t));
if(job == NULL)
    perror("malloc");

To be more precise, gdb says that the segfault happens inside a __int_malloc call, which is a sub-routine call made by malloc.

Since malloc is called concurrently from several threads, I initially thought that might be the problem. I was using glibc 2.19.

The data structures:

typedef struct rv_thread thread_wrapper_t;

typedef struct future
{
  pthread_cond_t wait;
  pthread_mutex_t mutex;
  long completed;
} future_t;

typedef struct task
{
  future_t * f;
  void * data;
  void *
  (*fun)(thread_wrapper_t *, void *);
} task_t;

typedef struct
{
  queue_t * queue;
} pool_worker_t;

typedef struct
{
  task_t * t;
} sfuture_t;

struct rv_thread
{
  pool_worker_t * pool;
};

Now the future implementation:

future_t *
create_future()
{
  future_t * new_f = malloc(sizeof(future_t));
  if(new_f == NULL)
    perror("malloc");
  new_f->completed = 0;
  pthread_mutex_init(&(new_f->mutex), NULL);
  pthread_cond_init(&(new_f->wait), NULL);
  return new_f;
}

int
wait_future(future_t * f)
{
  pthread_mutex_lock(&(f->mutex));
  while (!f->completed)
    {
      pthread_cond_wait(&(f->wait),&(f->mutex));
    }
  pthread_mutex_unlock(&(f->mutex));
  return 0;
}

void
complete(future_t * f)
{
  pthread_mutex_lock(&(f->mutex));
  f->completed = 1;
  pthread_mutex_unlock(&(f->mutex));
  pthread_cond_broadcast(&(f->wait));
}

The thread pool itself:

pool_worker_t *
create_work_pool(int threads)
{
  pool_worker_t * new_p = malloc(sizeof(pool_worker_t));
  if(new_p == NULL)
    perror("malloc");
  threads = 1;
  new_p->queue = create_queue();
  int i;
  for (i = 0; i < threads; i++){
    thread_wrapper_t * w = malloc(sizeof(thread_wrapper_t));
    if(w == NULL)
      perror("malloc");
    w->pool = new_p;
    pthread_t n;
    pthread_create(&n, NULL, work, w);
  }
  return new_p;
}

task_t *
try_get_new_task(thread_wrapper_t * thr)
{
  task_t * t = NULL;
  try_dequeue(thr->pool->queue, t);
  return t;
}

void
submit_job(pool_worker_t * p, task_t * t)
{
  enqueue(p->queue, t);
}

void *
work(void * data)
{
  thread_wrapper_t * thr = (thread_wrapper_t *) data;
  while (1){
    task_t * t = NULL;
    while ((t = (task_t *) try_get_new_task(thr)) == NULL);
    future_t * f = t->f;
    (*(t->fun))(thr,t->data);
    complete(f);
  }
  pthread_exit(NULL);
}

And finally the task.c:

pool_worker_t *
create_tpool()
{
  return (create_work_pool(8));
}

sfuture_t *
async(pool_worker_t * p, thread_wrapper_t * thr, void *
(*fun)(thread_wrapper_t *, void *), void * data)
{
  task_t * job = NULL;
  job = malloc(sizeof(task_t));
  if(job == NULL)
    perror("malloc");
  job->data = data;
  job->fun = fun;
  job->f = create_future();
  submit_job(p, job);
  sfuture_t * new_t = malloc(sizeof(sfuture_t));
  if(new_t == NULL)
    perror("malloc");
  new_t->t = job;
  return (new_t);
}

void
mywait(thread_wrapper_t * thr, sfuture_t * sf)
{
  if (sf == NULL)
    return;
  if (thr != NULL)
    {
      while (!sf->t->f->completed)
        {
          task_t * t_n = try_get_new_task(thr);
          if (t_n != NULL)
            {
              future_t * f = t_n->f;
              (*(t_n->fun))(thr,t_n->data);
              complete(f);
            }
        }
      return;
    }
  wait_future(sf->t->f);
  return ;
}

The queue is the lfds lock-free queue.

#define enqueue(q,t) {                                 \
    if(!lfds611_queue_enqueue(q->lq, t))             \
      {                                               \
        lfds611_queue_guaranteed_enqueue(q->lq, t);  \
      }                                               \
  }

#define try_dequeue(q,t) {                            \
    lfds611_queue_dequeue(q->lq, &t);               \
  }

The problem happens whenever the number of calls to async is very high.

Valgrind output:

Process terminating with default action of signal 11 (SIGSEGV)
==12022==  Bad permissions for mapped region at address 0x5AF9FF8
==12022==    at 0x4C28737: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)

Answered by Jekyll

A SIGSEGV (segmentation fault) raised inside malloc is usually caused by heap corruption. Heap corruption does not cause a segmentation fault by itself; you only see the fault later, when malloc tries to access the corrupted region. The problem is that the code that corrupts the heap can be anywhere, even far away from where malloc is called. Usually the corruption overwrites the next-block pointer inside malloc's bookkeeping with an invalid address, so that the next call to malloc dereferences an invalid pointer and you get a segmentation fault.
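One way to catch an overrun close to where it happens, instead of at some later malloc, is to put a known byte pattern after each block and check it before freeing. The following is only a minimal sketch of that idea: the names guarded_malloc, guard_intact, GUARD_SIZE and GUARD_BYTE are made up for illustration, and tools such as Valgrind do this far more thoroughly.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8                 /* hypothetical: number of canary bytes */
#define GUARD_BYTE 0xAB              /* hypothetical: canary pattern */

/* Allocate `size` usable bytes followed by GUARD_SIZE canary bytes.
 * Returns NULL on failure, like malloc. Release the block with free(). */
void *guarded_malloc(size_t size)
{
    if (size > SIZE_MAX - GUARD_SIZE)
        return NULL;                 /* size + GUARD_SIZE would wrap around */

    unsigned char *p = malloc(size + GUARD_SIZE);
    if (p == NULL)
        return NULL;

    memset(p + size, GUARD_BYTE, GUARD_SIZE);
    return p;
}

/* Returns 1 if the canary behind the `size` usable bytes is untouched,
 * 0 if something wrote past the end of the block. */
int guard_intact(const void *block, size_t size)
{
    assert(block != NULL);

    const unsigned char *guard = (const unsigned char *)block + size;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (guard[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

Calling guard_intact(ptr, size) right before free(ptr) reports a write past the end of that particular block. It only detects overruns of up to GUARD_SIZE bytes and does nothing about underruns or use after free.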

I think you could try running portions of your code in isolation from the rest of the program, to narrow down where the bug is.

Moreover, I see that you never free the memory here, so there may be a memory leak.

To check for a memory leak, you can run top in batch mode, top -b -n 1, and check the following columns (these column names appear to come from the macOS version of top; Linux top reports VIRT, RES and SHR instead):

RPRVT - resident private address space size
RSHRD - resident shared address space size
RSIZE - resident memory size
VPRVT - private address space size
VSIZE - total memory size
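Besides watching the process from outside with top, a leak can also be checked from inside the program by counting blocks that were allocated but never freed. This is a rough sketch under the assumption that every allocation goes through the wrappers; tracked_malloc, tracked_free and tracked_live_blocks are hypothetical names, not part of any library. The counter uses C11 atomics because the code in the question allocates from several threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of blocks handed out by tracked_malloc and not yet released. */
static atomic_size_t live_blocks;

/* malloc wrapper that counts successful allocations. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        atomic_fetch_add(&live_blocks, 1);
    return p;
}

/* free wrapper that counts releases; NULL is ignored, as with free(). */
void tracked_free(void *p)
{
    if (p == NULL)
        return;
    assert(atomic_load(&live_blocks) > 0);   /* more frees than allocations */
    atomic_fetch_sub(&live_blocks, 1);
    free(p);
}

/* Blocks currently outstanding; nonzero at exit suggests a leak. */
size_t tracked_live_blocks(void)
{
    return atomic_load(&live_blocks);
}
```

If tracked_live_blocks() keeps growing while the program runs, or is nonzero at shutdown, some task_t, future_t or sfuture_t objects are probably never being freed.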

Answered by guilhermemtr

I've figured out what the problem is: a stack overflow.

First, let me explain why the stack overflow occurs inside malloc (which is probably why you are reading this). While my program ran, the stack kept growing each time it started executing another task recursively (because of the way I had programmed it). Each time, I also had to allocate a new task using malloc. malloc itself makes further sub-routine calls, which grow the stack more than a plain call to execute another task does. So I would have hit a stack overflow even without malloc; but because malloc was there, the stack overflowed inside malloc, before it could overflow on the next recursive call. The illustrations below show what was happening:

Initial stack state:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
|        garbage        |
-------------------------
|        garbage        | <- If the stack passes this point, the stack overflows.
-------------------------

Stack during the malloc call:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
|        malloc         |
-------------------------
|     __int_malloc      | <- If the stack passes this point, the stack overflows.
-------------------------

Then the stack shrank again, and my code entered a new recursive call:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
| recursive call n      |
-------------------------
|        garbage        | <- If the stack passes this point, the stack overflows.
-------------------------

Then, it invoked malloc again inside this new recursive call. However, this time it overflowed:

-------------------------
| recursive call n - 3  |
-------------------------
| recursive call n - 2  |
-------------------------
| recursive call n - 1  |
-------------------------
| recursive call n      |
-------------------------
|        malloc         | <- If the stack passes this point, the stack overflows.
-------------------------
|     __int_malloc      | <- This is when the stack overflow occurs.
-------------------------
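The diagrams above can be turned into rough numbers by measuring how far the stack has grown after a given number of nested calls and comparing that with the stack limit. This is a sketch, assuming a POSIX system (getrlimit and RLIMIT_STACK are standard there); stack_limit_bytes and stack_used_at_depth are hypothetical helper names, and the frame size measured here has nothing to do with the real frame sizes of mywait, fib or malloc. For threads created with pthread_create, the limit is whatever the thread attributes specify, which may differ from the process limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Soft stack limit of the process in bytes.
 * Returns 0 if it cannot be read and SIZE_MAX if it is unlimited. */
size_t stack_limit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return 0;
    if (rl.rlim_cur == RLIM_INFINITY)
        return SIZE_MAX;
    return (size_t)rl.rlim_cur;
}

/* Recurse `depth` more levels and return the distance in bytes between a
 * local variable of the outermost call and one of the innermost call.
 * Pass base = 0 on the outermost call. */
size_t stack_used_at_depth(int depth, uintptr_t base)
{
    assert(depth >= 0);

    volatile char frame_marker[64];           /* forces a real stack frame */
    uintptr_t here = (uintptr_t)&frame_marker[0];
    frame_marker[0] = 0;

    if (base == 0)
        base = here;                          /* remember where we started */
    if (depth == 0)
        return here > base ? here - base : base - here;

    size_t used = stack_used_at_depth(depth - 1, base);
    /* Volatile read after the call, so this is not a tail call. Adds 0. */
    return used + (size_t)frame_marker[0];
}
```

Comparing stack_used_at_depth(n, 0) with stack_limit_bytes() for growing n gives an estimate of how many nested calls fit before an extra malloc frame would no longer have room.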

[The rest of the answer focuses on why I had this problem in my code in particular.]

Usually, when computing Fibonacci recursively for a certain number n, the stack size grows linearly with n. In this case, however, I was creating tasks, storing them in a queue, and dequeuing a (fib) task for execution. If you draw this on paper, you'll see that the number of tasks grows exponentially with n rather than linearly. (Note also that if I had used a stack to store the tasks as they were created, both the number of tasks allocated and the stack size would only have grown linearly with n.) So the stack grew exponentially with n, leading to a stack overflow.

As for why the overflow occurred inside the call to malloc: as explained above, that was where the stack was deepest. The stack was already almost full, and since malloc calls further functions internally, the stack grew more there than it did for a call to mywait or fib alone.
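The difference between the two growth rates can be checked with a small counting sketch. It does not model the thread pool or the queue; it only counts how many calls a naive recursive fib(n) makes (one task per call in the scheme described above) and how deep plain recursion nests. The names fib_call_count and fib_max_depth are made up for this illustration.

```c
#include <assert.h>

/* Number of calls a naive recursive fib(n) makes, including the call for
 * n itself. In the task-based version each call corresponds to one task. */
unsigned long fib_call_count(unsigned n)
{
    assert(n <= 40);   /* keep the exponential recursion small enough to finish quickly */
    if (n < 2)
        return 1;
    return 1 + fib_call_count(n - 1) + fib_call_count(n - 2);
}

/* Deepest chain of nested calls reached by plain recursion on fib(n). */
unsigned fib_max_depth(unsigned n)
{
    assert(n <= 40);
    if (n < 2)
        return 1;
    unsigned left = fib_max_depth(n - 1);
    unsigned right = fib_max_depth(n - 2);
    return 1 + (left > right ? left : right);
}
```

For n = 10 this gives 177 calls but a nesting depth of only 10, and for n = 20 it gives 21891 calls at depth 20. If every dequeued task is executed on top of the current stack, as mywait does, the stack depth is bounded by the number of tasks rather than by n.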

Thank you all! If it weren't for your help, I wouldn't have been able to figure it out!