C++: writing directly to std::string internal buffers

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1042940/

Time: 2020-08-27 18:32:32 Source: igfitidea

Tags: c++, string

Asked by markh44

I was looking for a way to stuff some data into a string across a DLL boundary. Because we use different compilers, all our dll interfaces are simple char*.

Is there a correct way to pass a pointer into the dll function such that it is able to fill the string buffer directly?

string stringToFillIn(100, '\0');
FunctionInDLL( stringToFillIn.c_str(), stringToFillIn.size() );   // definitely WRONG!
FunctionInDLL( const_cast<char*>(stringToFillIn.data()), stringToFillIn.size() );    // WRONG?
FunctionInDLL( &stringToFillIn[0], stringToFillIn.size() );       // WRONG?
stringToFillIn.resize( strlen( stringToFillIn.c_str() ) );

The one that looks most promising is &stringToFillIn[0] but is that a correct way to do this, given that you'd think that string::data() == &string[0]? It seems inconsistent.

Or is it better to swallow an extra allocation and avoid the question:

std::vector<char> buffer(100);
FunctionInDLL(&buffer[0], buffer.size());
std::string stringToFillIn(&buffer[0]);

Accepted answer by CAdaker

I'm not sure the standard guarantees that the data in a std::string is stored as a char*. The most portable way I can think of is to use a std::vector, which is guaranteed to store its data in a contiguous chunk of memory:

vector<char> vectorToFillIn(100);
FunctionInDLL( &vectorToFillIn[0], vectorToFillIn.size() );
string dllGaveUs( &vectorToFillIn[0] );

This will of course require the data to be copied twice, which is a bit inefficient.

Answer by markh44

After a lot more reading and digging around I've discovered that string::c_str and string::data could legitimately return a pointer to a buffer that has nothing to do with how the string itself is stored. It's possible that the string is stored in segments, for example. Writing to these buffers has an undefined effect on the contents of the string.

Additionally, string::operator[] should not be used to get a pointer to a sequence of characters; it should only be used for single characters. This is because pointer/array equivalence does not hold with string.

What is very dangerous about this is that it can work on some implementations but then suddenly break for no apparent reason at some future date.

Therefore the only safe way to do this, as others have said, is to avoid writing directly into the string buffer: use a vector, pass a pointer to its first element, and then assign the string from the vector once the dll function returns.
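
The vector-based route described above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the body of FunctionInDLL here is a hypothetical stand-in for the real export (which the question does not show), and readViaVector is a helper name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the real DLL export: writes a
// NUL-terminated message into the caller's buffer.
extern "C" void FunctionInDLL(char* buffer, std::size_t size) {
    assert(buffer != nullptr);
    if (size == 0) return;
    std::strncpy(buffer, "hello from dll", size - 1);
    buffer[size - 1] = '\0';  // guarantee termination even if truncated
}

// Hypothetical helper: let the DLL fill a vector, then copy into a string.
std::string readViaVector(std::size_t capacity) {
    std::vector<char> buffer(capacity);       // zero-initialized, contiguous
    if (buffer.empty()) return std::string();
    FunctionInDLL(&buffer[0], buffer.size());
    return std::string(&buffer[0]);           // copies up to the first NUL
}
```

The cost is one extra allocation and copy; in exchange, the code relies only on the contiguity guarantee of std::vector.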

Answer by Andrei Bozantan

In C++98 you should not alter the buffers returned by string::c_str() and string::data(). Also, as explained in the other answers, you should not use string::operator[] to get a pointer to a sequence of characters; it should only be used for single characters.

Starting with C++11, strings use contiguous memory, so you can use &string[0] to access the internal buffer.
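
A minimal sketch of that C++11 approach, assuming the DLL writes a NUL-terminated result. The body of FunctionInDLL is again a hypothetical stand-in, and readDirect is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the real DLL export.
extern "C" void FunctionInDLL(char* buffer, std::size_t size) {
    assert(buffer != nullptr);
    if (size == 0) return;
    std::strncpy(buffer, "hello from dll", size - 1);
    buffer[size - 1] = '\0';
}

// Hypothetical helper: the DLL writes straight into the string's storage.
std::string readDirect(std::size_t capacity) {
    // size(), not just capacity(), must cover everything the DLL writes
    std::string result(capacity, '\0');
    if (result.empty()) return result;
    FunctionInDLL(&result[0], result.size());
    // trim the string to the text that was actually written
    result.resize(std::strlen(result.c_str()));
    return result;
}
```

Constructing the string with the full length matters: reserve() alone would leave size() at zero, and writing past size() is undefined behavior.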

Answer by Brian Haak

Since C++11 guarantees contiguous storage, this 'hacky' method is very popular in production practice:

std::string stringToFillIn(100, 0);
// the non-const data() overload requires C++17; before that, use &stringToFillIn[0]
FunctionInDLL(stringToFillIn.data(), stringToFillIn.size());

Answer by ralphtheninja

I would not construct a std::string and ship a pointer to its internal buffer across dll boundaries. Instead I would use a simple char buffer (statically or dynamically allocated). After the call to the dll returns, I'd let a std::string take over the result. It just feels intuitively wrong to let callees write into a class's internal buffer.

Answer by Roland Puntaier

Considering Patrick's comment, I would say it's OK and convenient/efficient to write directly into a std::string. I would use &s.front() to get a char *, like in this mex example:

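#include "mex.h"
#include <string>
void mexFunction(
    int nlhs,
    mxArray *plhs[],
    int nrhs,
    const mxArray *prhs[]
)
{
    int len = (int)mxGetN(prhs[0]);
    // size the string (reserve alone leaves size() at 0) so it covers the
    // characters plus the terminating NUL that mxGetString writes
    std::string ret(len + 1, '\0');
    mxGetString(prhs[0], &ret.front(), len + 1);
    ret.resize(len);                 // drop the extra NUL
    mexPrintf("%s", ret.c_str());    // do not pass data as the format string
}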

Answer by Simeon Pilgrim

The standard part of std::string is the API and some of the behavior, not the memory layout of the implementation.

Therefore, if you're using different compilers you can't assume their layouts are the same, so you'll need to transport the actual data. As others have said, transport the chars and push them into a new std::string.

Answer by isnullxbh

You can use a char buffer owned by a unique_ptr instead of a vector:

// allocate buffer (std::make_unique needs <memory> and C++14)
auto buf = std::make_unique<char[]>(len);
// read data
FunctionInDLL(buf.get(), len);
// initialize string
std::string res { buf.get() };

You cannot write directly into the string buffer using the mentioned ways, such as &str[0] and str.data():

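#include <iostream>
#include <string>
#include <sstream>

int main()
{
    std::string str;
    std::stringstream ss;
    ss << "test string";
    ss.write(&str[0], 4);       // doesn't work
    ss.write(str.data(), 4);    // doesn't work
    std::cout << str << '\n';
}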
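
One likely reason such an attempt appears not to work is that the target string is empty: writing through &str[0] never changes size(), so nothing shows up when the string is printed. The sketch below assumes the goal is to read characters from a stream into the string; readChars is a hypothetical helper, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads up to `count` characters from `in`
// directly into a string's own buffer.
std::string readChars(std::istream& in, std::size_t count) {
    std::string str(count, '\0');              // size the string first
    if (count == 0) return str;
    in.read(&str[0], static_cast<std::streamsize>(count));
    const std::size_t got = static_cast<std::size_t>(in.gcount());
    assert(got <= count);                      // read never exceeds the request
    str.resize(got);                           // keep only what was read
    return str;
}
```

With a std::stringstream holding "test string", readChars(ss, 4) yields "test". Note that std::ostream::write copies from the given buffer into the stream, while std::istream::read is the call that fills the buffer.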


Answer by Faisal Vali

You all have already addressed the contiguity issue (i.e. it's not guaranteed to be contiguous), so I'll just mention the allocation/deallocation point. I've had issues in the past where I've allocated memory in dlls (i.e. had a dll return a string) that caused errors upon destruction (outside the dll). To fix this you must ensure that your allocator and memory pool are consistent across the dll boundary. It'll save you some debugging time ;)
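
One common way to keep allocation and deallocation on the same side of the boundary is for the dll to export a matching free function. The sketch below is an illustration only: DllCreateMessage, DllFreeMessage, DllDeleter, fetchMessage, and the g_liveBlocks counter are all hypothetical names, and the "dll" functions are defined inline here so the example is self-contained.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

static int g_liveBlocks = 0;  // stand-in bookkeeping so the sketch can be checked

// Hypothetical DLL exports: memory is allocated and freed by the same module.
extern "C" char* DllCreateMessage() {
    const char text[] = "hello from dll";
    char* p = static_cast<char*>(std::malloc(sizeof text));
    if (p) {
        std::memcpy(p, text, sizeof text);  // includes the terminating NUL
        ++g_liveBlocks;
    }
    return p;
}

extern "C" void DllFreeMessage(char* p) {
    if (!p) return;
    assert(g_liveBlocks > 0);
    --g_liveBlocks;
    std::free(p);
}

// Caller side: the deleter hands the block back to the module that made it.
struct DllDeleter {
    void operator()(char* p) const { DllFreeMessage(p); }
};

std::string fetchMessage() {
    std::unique_ptr<char, DllDeleter> raw(DllCreateMessage());
    if (!raw) return std::string();  // allocation failed inside the module
    return std::string(raw.get());   // copy into caller-owned storage
}
```

The caller never calls free or delete on the dll's pointer itself; it copies the characters into its own std::string and lets the deleter return the block to the module that allocated it.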