C++ 与 win32 CRITICAL_SECTION 相比的 std::mutex 性能

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9997473/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 13:29:42  来源:igfitidea点击:

std::mutex performance compared to win32 CRITICAL_SECTION

c++stlsynchronizationthread-safetymutex

提问by uray

how does the performance of std::mutexcompared to CRITICAL_SECTION? is it on par?

std::mutex相比之下的表现如何CRITICAL_SECTION?它是一样的吗?

I need lightweight synchronization object (doesn't need to be an interprocess object) is there any STL class that close to CRITICAL_SECTIONother than std::mutex?

我需要轻量的同步对象(不需要是一个进程间对象)没有任何STL类接近CRITICAL_SECTION以外std::mutex

回答by waldez

Please see my updates at the end of the answer, the situation has dramatically changed since Visual Studio 2015. The original answer is below.

请在答案末尾查看我的更新,自 Visual Studio 2015 以来情况发生了巨大变化。原始答案如下。

I made a very simple test and according to my measurements the std::mutexis around 50-70x slower than CRITICAL_SECTION.

我做了一个非常简单的测试,根据我的测量结果,std::mutex它比CRITICAL_SECTION.

std::mutex:       18140574us
CRITICAL_SECTION: 296874us

Edit: After some more tests it turned out it depends on number of threads (congestion) and number of CPU cores. Generally, the std::mutexis slower, but how much, it depends on use. Following are updated test results (tested on MacBook Pro with Core i5-4258U, Windows 10, Bootcamp):

编辑:经过更多测试,结果证明它取决于线程数(拥塞)和 CPU 内核数。一般来说,std::mutex速度较慢,但​​多少,这取决于使用。以下是更新的测试结果(在配备 Core i5-4258U、Windows 10、Bootcamp 的 MacBook Pro 上测试):

Iterations: 1000000
Thread count: 1
std::mutex:       78132us
CRITICAL_SECTION: 31252us
Thread count: 2
std::mutex:       687538us
CRITICAL_SECTION: 140648us
Thread count: 4
std::mutex:       1031277us
CRITICAL_SECTION: 703180us
Thread count: 8
std::mutex:       86779418us
CRITICAL_SECTION: 1634123us
Thread count: 16
std::mutex:       172916124us
CRITICAL_SECTION: 3390895us

Following is the code that produced this output. Compiled with Visual Studio 2012, default project settings, Win32 release configuration. Please note that this test may not be perfectly correct but it made me think twice before switching my code from using CRITICAL_SECTIONto std::mutex.

以下是产生此输出的代码。Visual Studio 2012编译,默认项目设置,Win32发布配置。请注意,此测试可能并不完全正确,但它让我在将代码从 using 切换CRITICAL_SECTIONstd::mutex.

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

void sharedFunc( int i )
{
    if ( i % 2 == 0 )
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

void threadFuncCritSec() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        EnterCriticalSection( &g_critSec );
        sharedFunc(i);
        LeaveCriticalSection( &g_critSec );
    }
}

void threadFuncMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n\r";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncCritSec ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n\r";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection( &g_critSec );

    std::cout << "Iterations: " << g_cRepeatCount << "\n\r";

    for (int i = 1; i <= g_cThreadCount; i = i*2) {
        std::cout << "Thread count: " << i << "\n\r";
        testRound(i);
        Sleep(1000);
    }

    DeleteCriticalSection( &g_critSec );

    // Added 10/27/2017 to try to prevent the compiler to completely
    // optimize out the code around g_shmem if it wouldn't be used anywhere.
    std::cout << "Shared variable value: " << g_shmem << std::endl;
    getchar();
    return 0;
}

Update 10/27/2017 (1): Some answers suggest that this is not a realistic test or does not represent a "real world" scenario. That's true, this test tries to measure the overheadof the std::mutex, it's not trying to prove that the difference is negligible for 99% of applications.

2017 年 10 月 27 日更新 (1):一些答案表明这不是一个现实的测试或不代表“现实世界”的场景。这是真的,这个测试试图衡量开销std::mutex,它没有试图证明两者的差异是可以忽略不计的应用程序99%。

Update 10/27/2017 (2): Seems like the situation has changed in favor for std::mutexsince Visual Studio 2015 (VC140). I used VS2017 IDE, exactly the same code as above, x64 release configuration, optimizations disabled and I simply switched the "Platform Toolset" for each test. The results are very surprising and I am really curious what has hanged in VC140.

std::mutex2017 年 10 月 27 日更新 (2):自 Visual Studio 2015 (VC140) 以来,情况似乎有所改变。我使用了 VS2017 IDE,与上面完全相同的代码,x64 版本配置,禁用优化,我只是为每个测试切换了“平台工具集”。结果非常令人惊讶,我真的很好奇 VC140 中挂了什么。

Performance with 8 threads, by platform

8 个线程的性能,按平台

Update 02/25/2020 (3): Reran the test with Visual Studio 2019 (Toolset v142), and situation is still the same: std::mutexis two to three times faster than CRITICAL_SECTION.

2020 年 2 月25 日更新(3):使用 Visual Studio 2019(工具集 v142)重新运行测试,情况仍然相同:std::mutexCRITICAL_SECTION.

回答by Superfly Jon

The test by waldez here is not realistic, it basically simulates 100% contention. In general this is exactly what you don't want in multi-threaded code. Below is a modified test which does some shared calculations. The results I get with this code are different:

waldez这里的测试不太现实,基本模拟了100%的争用。一般来说,这正是您在多线程代码中不想要的。下面是一个经过修改的测试,它进行了一些共享计算。我用这段代码得到的结果是不同的:

Tasks: 160000
Thread count: 1
std::mutex:       12096ms
CRITICAL_SECTION: 12060ms
Thread count: 2
std::mutex:       5206ms
CRITICAL_SECTION: 5110ms
Thread count: 4
std::mutex:       2643ms
CRITICAL_SECTION: 2625ms
Thread count: 8
std::mutex:       1632ms
CRITICAL_SECTION: 1702ms
Thread count: 12
std::mutex:       1227ms
CRITICAL_SECTION: 1244ms

You can see here that for me (using VS2013) the figures are very close between std::mutex and CRITICAL_SECTION. Note that this code does a fixed number of tasks (160,000) which is why the performance improves generally with more threads. I've got 12 cores here so that's why I stopped at 12.

您可以在这里看到,对我而言(使用 VS2013),std::mutex 和 CRITICAL_SECTION 之间的数字非常接近。请注意,此代码执行固定数量的任务 (160,000),这就是为什么使用更多线程通常会提高性能。我这里有 12 个内核,所以这就是我停在 12 个内核的原因。

I'm not saying this is right or wrong compared to the other test but it does highlight that timing issues are generally domain specific.

我并不是说与其他测试相比这是对还是错,但它确实强调了时间问题通常是特定于领域的。

#include "stdafx.h"
#include <Windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int tastCount = 160000;
int numThreads;
const int MAX_THREADS = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;

void sharedFunc(int i, double &data)
{
    for (int j = 0; j < 100; j++)
    {
        if (j % 2 == 0)
            data = sqrt(data);
        else
            data *= data;
    }
}

void threadFuncCritSec() {
    double lMem = 8;
    int iterations = tastCount / numThreads;
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < 100; j++)
            sharedFunc(j, lMem);
        EnterCriticalSection(&g_critSec);
        sharedFunc(i, g_shmem);
        LeaveCriticalSection(&g_critSec);
    }
    printf("results: %f\n", lMem);
}

void threadFuncMutex() {
    double lMem = 8;
    int iterations = tastCount / numThreads;
    for (int i = 0; i < iterations; ++i) {
        for (int j = 0; j < 100; j++)
            sharedFunc(j, lMem);
        g_mutex.lock();
        sharedFunc(i, g_shmem);
        g_mutex.unlock();
    }
    printf("results: %f\n", lMem);
}

void testRound()
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < numThreads; ++i)
        threads.push_back(std::thread(threadFuncMutex));
    for (std::thread& thd : threads)
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endMutex - startMutex).count();
    std::cout << "ms \n\r";

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < numThreads; ++i)
        threads.push_back(std::thread(threadFuncCritSec));
    for (std::thread& thd : threads)
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(endCritSec - startCritSec).count();
    std::cout << "ms \n\r";
}

int _tmain(int argc, _TCHAR* argv[]) {
    InitializeCriticalSection(&g_critSec);

    std::cout << "Tasks: " << tastCount << "\n\r";

    for (numThreads = 1; numThreads <= MAX_THREADS; numThreads = numThreads * 2) {
        if (numThreads == 16)
            numThreads = 12;
        Sleep(100);
        std::cout << "Thread count: " << numThreads << "\n\r";
        testRound();
    }

    DeleteCriticalSection(&g_critSec);
    return 0;
}

回答by Sergey

I'm using Visual Studio 2013.

我正在使用 Visual Studio 2013。

My results in single threaded usage are looking similar to waldez results:

我在单线程使用中的结果看起来类似于 waldez 结果:

1 million of lock/unlock calls:

100 万次锁定/解锁调用:

CRITICAL_SECTION:       19 ms
std::mutex:             48 ms
std::recursive_mutex:   48 ms

The reason why Microsoft changed implementation is C++11 compatibility. C++11 has 4 kind of mutexes in std namespace:

Microsoft 更改实现的原因是 C++11 兼容性。C++11 在 std 命名空间中有 4 种互斥体:

Microsoft std::mutex and all other mutexes are the wrappers around critical section:

Microsoft std::mutex 和所有其他互斥锁是关键部分的包装器:

struct _Mtx_internal_imp_t
{   /* Win32 mutex */
    int type; // here MS keeps particular mutex type
    Concurrency::critical_section cs;
    long thread_id;
    int count;
};

As for me, std::recursive_mutex should completely match critical section. So Microsoft should optimize its implementation to take less CPU and memory.

至于我, std::recursive_mutex 应该完全匹配临界区。所以微软应该优化它的实现以占用更少的 CPU 和内存。

回答by Hi-Angel

I was searching here for pthread vs critical section benchmarks, however, as my result turned out to be different from the waldez's answer with regard to the topic, I thought it'd be interesting to share.

我在这里搜索 pthread 与关键部分基准测试,但是,由于我的结果与 waldez 就该主题的回答不同,我认为分享会很有趣。

The code is the one used by @waldez, modified to add pthreads to the comparison, compiled with GCC and no optimizations. My CPU is AMD A8-3530MX.

该代码是@waldez 使用的代码,经过修改以将 pthreads 添加到比较中,使用 GCC 编译并且没有优化。我的 CPU 是 AMD A8-3530MX。

Windows 7 Home Edition:

Windows 7 家庭版:

>a.exe
Iterations: 1000000
Thread count: 1
std::mutex:       46800us
CRITICAL_SECTION: 31200us
pthreads:         31200us
Thread count: 2
std::mutex:       171600us
CRITICAL_SECTION: 218400us
pthreads:         124800us
Thread count: 4
std::mutex:       327600us
CRITICAL_SECTION: 374400us
pthreads:         249600us
Thread count: 8
std::mutex:       967201us
CRITICAL_SECTION: 748801us
pthreads:         717601us
Thread count: 16
std::mutex:       2745604us
CRITICAL_SECTION: 1497602us
pthreads:         1903203us

As you can see, the difference varies well within statistical error — sometimes std::mutex is faster, sometimes it's not. What's important, I do not observe such big difference as the original answer.

如您所见,差异在统计误差范围内变化很大——有时 std::mutex 更快,有时则不然。重要的是,我没有观察到与原始答案如此大的差异。

I think, maybe the reason is that when the answer was posted, MSVC compiler wasn't good with newer standards, and note that the original answer have used the version from 2012 year.

我想,也许是因为发布答案时,MSVC 编译器不适合较新的标准,并注意原始答案使用的是 2012 年的版本。

Also, out of curiosity, same binary under Wine on Archlinux:

另外,出于好奇,Archlinux 上 Wine 下的相同二进制文件:

$ wine a.exe
fixme:winediag:start_process Wine Staging 2.19 is a testing version containing experimental patches.
fixme:winediag:start_process Please mention your exact version when filing bug reports on winehq.org.
Iterations: 1000000
Thread count: 1
std::mutex:       53810us 
CRITICAL_SECTION: 95165us 
pthreads:         62316us 
Thread count: 2
std::mutex:       604418us 
CRITICAL_SECTION: 1192601us 
pthreads:         688960us 
Thread count: 4
std::mutex:       779817us 
CRITICAL_SECTION: 2476287us 
pthreads:         818022us 
Thread count: 8
std::mutex:       1806607us 
CRITICAL_SECTION: 7246986us 
pthreads:         809566us 
Thread count: 16
std::mutex:       2987472us 
CRITICAL_SECTION: 14740350us 
pthreads:         1453991us

The waldez's code with my modifications:

经我修改后的 waldez 代码:

#include <math.h>
#include <windows.h>
#include <mutex>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>
#include <pthread.h>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::mutex g_mutex;
CRITICAL_SECTION g_critSec;
pthread_mutex_t pt_mutex;


void sharedFunc( int i )
{
    if ( i % 2 == 0 )
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

void threadFuncCritSec() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        EnterCriticalSection( &g_critSec );
        sharedFunc(i);
        LeaveCriticalSection( &g_critSec );
    }
}

void threadFuncMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void threadFuncPTMutex() {
    for ( int i = 0; i < g_cRepeatCount; ++i ) {
        pthread_mutex_lock(&pt_mutex);
        sharedFunc(i);
        pthread_mutex_unlock(&pt_mutex);
    }
}
void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n";
    g_shmem = 0;

    threads.clear();
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncCritSec ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n";
    g_shmem = 0;

    threads.clear();
    auto startPThread = std::chrono::high_resolution_clock::now();
    for (int i = 0; i<threadCount; ++i)
        threads.push_back(std::thread( threadFuncPTMutex ));
    for ( std::thread& thd : threads )
        thd.join();
    auto endPThread = std::chrono::high_resolution_clock::now();

    std::cout << "pthreads:         ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endPThread - startPThread).count();
    std::cout << "us \n";
    g_shmem = 0;
}

int main() {
    InitializeCriticalSection( &g_critSec );
    pthread_mutex_init(&pt_mutex, 0);

    std::cout << "Iterations: " << g_cRepeatCount << "\n";

    for (int i = 1; i <= g_cThreadCount; i = i*2) {
        std::cout << "Thread count: " << i << "\n";
        testRound(i);
        Sleep(1000);
    }

    getchar();
    DeleteCriticalSection( &g_critSec );
    pthread_mutex_destroy(&pt_mutex);
    return 0;
}

回答by Pavel P

Same test programby Waldez modified to run with pthreads and boost::mutex.

Waldez 的相同测试程序修改为与 pthreads 和 boost::mutex 一起运行。

On win10 pro (with intel i7-7820X 16-core cpu) I get better results from std::mutex on VS2015 update3 (and even better from boost::mutex) that from CRITICAL_SECTION:

在win10 pro(使用intel i7-7820X 16核cpu)上,我从VS2015 update3上的std::mutex得到了比CRITICAL_SECTION更好的结果(从boost::mutex得到了更好的结果):

Iterations: 1000000

Thread count: 1
std::mutex:       23403us
boost::mutex:     12574us
CRITICAL_SECTION: 19454us

Thread count: 2
std::mutex:       55031us
boost::mutex:     45263us
CRITICAL_SECTION: 187597us

Thread count: 4
std::mutex:       113964us
boost::mutex:     83699us
CRITICAL_SECTION: 605765us

Thread count: 8
std::mutex:       266091us
boost::mutex:     155265us
CRITICAL_SECTION: 1908491us

Thread count: 16
std::mutex:       633032us
boost::mutex:     300076us
CRITICAL_SECTION: 4015176us

Results for pthreads are here.

pthread 的结果在这里

#ifdef _WIN32
#include <Windows.h>
#endif
#include <mutex>
#include <boost/thread/mutex.hpp>
#include <thread>
#include <vector>
#include <chrono>
#include <iostream>

const int g_cRepeatCount = 1000000;
const int g_cThreadCount = 16;

double g_shmem = 8;
std::recursive_mutex g_mutex;
boost::mutex g_boostMutex;

void sharedFunc(int i)
{
    if (i % 2 == 0)
        g_shmem = sqrt(g_shmem);
    else
        g_shmem *= g_shmem;
}

#ifdef _WIN32
CRITICAL_SECTION g_critSec;
void threadFuncCritSec()
{
    for (int i = 0; i < g_cRepeatCount; ++i)
    {
        EnterCriticalSection(&g_critSec);
        sharedFunc(i);
        LeaveCriticalSection(&g_critSec);
    }
}
#else
pthread_mutex_t pt_mutex;
void threadFuncPtMutex()
{
    for (int i = 0; i < g_cRepeatCount; ++i) {
        pthread_mutex_lock(&pt_mutex);
        sharedFunc(i);
        pthread_mutex_unlock(&pt_mutex);
    }
}
#endif

void threadFuncMutex()
{
    for (int i = 0; i < g_cRepeatCount; ++i)
    {
        g_mutex.lock();
        sharedFunc(i);
        g_mutex.unlock();
    }
}

void threadFuncBoostMutex()
{
    for (int i = 0; i < g_cRepeatCount; ++i)
    {
        g_boostMutex.lock();
        sharedFunc(i);
        g_boostMutex.unlock();
    }
}

void testRound(int threadCount)
{
    std::vector<std::thread> threads;

    std::cout << "\nThread count: " << threadCount << "\n\r";

    auto startMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncMutex));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endMutex = std::chrono::high_resolution_clock::now();

    std::cout << "std::mutex:       ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endMutex - startMutex).count();
    std::cout << "us \n\r";

    auto startBoostMutex = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncBoostMutex));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endBoostMutex = std::chrono::high_resolution_clock::now();

    std::cout << "boost::mutex:     ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endBoostMutex - startBoostMutex).count();
    std::cout << "us \n\r";

#ifdef _WIN32
    auto startCritSec = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncCritSec));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endCritSec = std::chrono::high_resolution_clock::now();

    std::cout << "CRITICAL_SECTION: ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endCritSec - startCritSec).count();
    std::cout << "us \n\r";
#else
    auto startPThread = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < threadCount; ++i)
        threads.push_back(std::thread(threadFuncPtMutex));
    for (std::thread& thd : threads)
        thd.join();
    threads.clear();
    auto endPThread = std::chrono::high_resolution_clock::now();

    std::cout << "pthreads:         ";
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(endPThread - startPThread).count();
    std::cout << "us \n";
#endif
}

int main()
{
#ifdef _WIN32
    InitializeCriticalSection(&g_critSec);
#else
    pthread_mutex_init(&pt_mutex, 0);
#endif

    std::cout << "Iterations: " << g_cRepeatCount << "\n\r";

    for (int i = 1; i <= g_cThreadCount; i = i * 2)
    {
        testRound(i);
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }

#ifdef _WIN32
    DeleteCriticalSection(&g_critSec);
#else
    pthread_mutex_destroy(&pt_mutex);
#endif
    if (rand() % 10000 == 1)
    {
        // Added 10/27/2017 to try to prevent the compiler to completely
        // optimize out the code around g_shmem if it wouldn't be used anywhere.
        std::cout << "Shared variable value: " << g_shmem << std::endl;
    }
    return 0;
}