C++: split a string by a character

Question

Asked by Ali

I know this is quite an easy problem, but I just want to solve it for myself once and for all.

I would simply like to split a string into an array using a character as the split delimiter (much like C#'s famous .Split() function). I can of course apply the brute-force approach, but I wonder if there is anything better than that.

So far I've searched, and probably the closest solution is strtok(). However, because of its inconvenience (converting your string to a char array etc.) I do not like using it. Is there an easier way to implement this?

Note: I wanted to emphasize this because people might ask "How come brute-force doesn't work?". My brute-force solution was to create a loop and use the substr() function inside it. However, since substr() requires the starting point and the length, it fails when I want to split a date: the user might enter it as 7/12/2012 or 07/3/2011, so I can't really tell the length before calculating the location of the next '/' delimiter.

Answer 1

Answered by thelazydeveloper

Using vectors, strings and stringstream. A tad cumbersome but it does the trick.

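#include <sstream>
#include <string>
#include <vector>

int main()
{
    std::stringstream test("this_is_a_test_string");
    std::string segment;
    std::vector<std::string> seglist;

    // Read up to each '_' and store the piece that was read.
    while (std::getline(test, segment, '_'))
    {
        seglist.push_back(segment);
    }
}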

This results in a vector with the same contents as:

std::vector<std::string> seglist{ "this", "is", "a", "test", "string" };
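
The getline loop above can be wrapped in a small reusable function. This is only a sketch: the name split_by_char and its signature are assumptions for illustration, not part of the answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): split `text` on `delim`
// with the same std::getline loop as above.
std::vector<std::string> split_by_char(const std::string& text, char delim)
{
    std::vector<std::string> pieces;
    std::istringstream stream(text);
    std::string piece;

    // getline stops at each delimiter and discards it.
    while (std::getline(stream, piece, delim))
    {
        pieces.push_back(piece);
    }
    return pieces;
}
```

Calling split_by_char("7/12/2012", '/') handles the date case from the question. One property of this approach worth knowing: an empty field between two delimiters is kept, but a delimiter at the very end of the input does not produce a trailing empty field.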

Answer 2

Answered by Ben Cottrell

Another way (C++11/boost) for people who like RegEx. Personally I'm a big fan of RegEx for this kind of data. IMO it's far more powerful than simply splitting strings using a delimiter, since you can choose to be a lot smarter about what constitutes "valid" data if you wish.

#include <string>
#include <algorithm>    // copy
#include <iterator>     // back_inserter
#include <regex>        // regex, sregex_token_iterator
#include <vector>

int main()
{
    std::string str = "08/04/2012";
    std::vector<std::string> tokens;
    std::regex re("\\d+");   // one or more digits; the backslash must be escaped in a C++ string literal

    //start/end points of tokens in str
    std::sregex_token_iterator
        begin(str.begin(), str.end(), re),
        end;

    std::copy(begin, end, std::back_inserter(tokens));
}
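
The example above matches the digit runs themselves. If you would rather describe the delimiter, std::sregex_token_iterator also accepts a submatch index of -1, which yields the text between matches. A minimal sketch, where split_by_regex is a hypothetical name chosen for illustration:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): split `text` wherever
// `delimiter` matches.
std::vector<std::string> split_by_regex(const std::string& text,
                                        const std::regex& delimiter)
{
    // Submatch index -1 selects the stretches of text between matches.
    std::sregex_token_iterator first(text.begin(), text.end(), delimiter, -1);
    std::sregex_token_iterator last;   // default-constructed: end of sequence
    return std::vector<std::string>(first, last);
}
```

For example, split_by_regex("08/04/2012", std::regex("/")) gives the three date fields, and a character class such as "[/-]" accepts either separator. The function takes the string by reference because the iterators point into it; passing a temporary that dies before the iterators are used would be a bug.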

Answer 3

Answered by chrisaycock

Boost has the split() you are seeking in algorithm/string.hpp:

// Requires #include <boost/algorithm/string.hpp>
std::string sample = "07/3/2011";
std::vector<std::string> strs;
boost::split(strs, sample, boost::is_any_of("/"));

Answer 4

Answered by Jerry Coffin

Another possibility is to imbue a stream with a locale that uses a special ctype facet. A stream uses the ctype facet to determine what's "whitespace", which it treats as separators. With a ctype facet that classifies your separator character as whitespace, the reading can be pretty trivial. Here's one way to implement the facet:

struct field_reader: std::ctype<char> {

    field_reader(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> 
            rc(table_size, std::ctype_base::mask());

        // we'll assume dates are either a/b/c or a-b-c:
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

We use that by calling imbue to tell a stream to use a locale that includes it, then read the data from that stream:

std::istringstream in("07/3/2011");
in.imbue(std::locale(std::locale(), new field_reader));

With that in place, the splitting becomes almost trivial -- just initialize a vector using a couple of istream_iterators to read the pieces from the string (that's embedded in the istringstream):

// 'fields' is an illustrative name; the extra parentheses avoid the most vexing parse.
std::vector<std::string> fields((std::istream_iterator<std::string>(in)),
                                std::istream_iterator<std::string>());

Obviously this tends toward overkill if you only use it in one place. If you use it a lot, however, it can go a long way toward keeping the rest of the code quite clean.
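
Putting the pieces of this answer together, a self-contained sketch might look like the following. The facet is the one shown above; the wrapper function split_date_fields is a hypothetical name added only for illustration.

```cpp
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// The facet from the answer: '/' and '-' are the only characters
// classified as whitespace.
struct field_reader : std::ctype<char> {
    field_reader() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask>
            rc(table_size, std::ctype_base::mask());
        rc['/'] = std::ctype_base::space;
        rc['-'] = std::ctype_base::space;
        return &rc[0];
    }
};

// Hypothetical wrapper (not from the answer): read every
// "whitespace"-separated field of `text` into a vector.
std::vector<std::string> split_date_fields(const std::string& text)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet, so no delete is needed.
    in.imbue(std::locale(std::locale(), new field_reader));
    return std::vector<std::string>(
        (std::istream_iterator<std::string>(in)),
        std::istream_iterator<std::string>());
}
```

Because the table starts out empty, ordinary spaces are no longer separators in this stream: only '/' and '-' split the input.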

Answer 5

Answered by CodeMouse92

I inherently dislike stringstream, although I'm not sure why. Today, I wrote this function to split a std::string by any arbitrary character or string into a vector. I know this question is old, but I wanted to share an alternative way of splitting a std::string.

This code omits the part of the string you split by from the results altogether, although it could be easily modified to include them.

#include <string>
#include <vector>

void split(std::string str, std::string splitBy, std::vector<std::string>& tokens)
{
    /* Store the original string in the array, so we can loop the rest
     * of the algorithm. */
    tokens.push_back(str);

    // Store the split index in a 'size_t' (unsigned integer) type.
    size_t splitAt;
    // Store the size of what we're splicing out.
    size_t splitLen = splitBy.size();
    // Create a string for temporarily storing the fragment we're processing.
    std::string frag;
    // Loop infinitely - break is internal.
    while(true)
    {
        /* Store the last string in the vector, which is the only logical
         * candidate for processing. */
        frag = tokens.back();
        /* The index where the split is. */
        splitAt = frag.find(splitBy);
        // If we didn't find a new split point...
        if(splitAt == std::string::npos)
        {
            // Break the loop and (implicitly) return.
            break;
        }
        /* Put everything from the left side of the split where the string
         * being processed used to be. */
        tokens.back() = frag.substr(0, splitAt);
        /* Push everything from the right side of the split to the next empty
         * index in the vector. */
        tokens.push_back(frag.substr(splitAt+splitLen, frag.size()-(splitAt+splitLen)));
    }
}

To use, just call like so...

std::string foo = "This is some string I want to split by spaces.";
std::vector<std::string> results;
split(foo, " ", results);

You can now access all the results in the vector at will. Simple as that - no stringstream, no third party libraries, no dropping back to C!

Answer 6

Answered by Rafał Rawicki

Take a look at boost::tokenizer.

If you'd like to roll your own method, you can use std::string::find() to determine the splitting points.
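
A minimal sketch of that roll-your-own approach, assuming a single-character delimiter (the name split_with_find is hypothetical). It also addresses the substr() concern from the question: the length of each piece is computed from the position find() returns, so it does not need to be known in advance.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): split `text` on `delim`
// using std::string::find() to locate each splitting point.
std::vector<std::string> split_with_find(const std::string& text, char delim)
{
    std::vector<std::string> pieces;
    std::string::size_type start = 0;
    std::string::size_type end = text.find(delim, start);

    while (end != std::string::npos)
    {
        // The piece runs from `start` up to, but not including, the delimiter.
        pieces.push_back(text.substr(start, end - start));
        start = end + 1;
        end = text.find(delim, start);
    }

    // Whatever follows the last delimiter (possibly an empty string).
    pieces.push_back(text.substr(start));
    return pieces;
}
```

Unlike a getline-based loop, this version always returns at least one element and keeps a trailing empty field when the input ends with the delimiter.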

Answer 7

Answered by xikkub

Is there a reason you don't want to convert a string to a character array (char*)? It's rather easy to call .c_str(). You can also use a loop and the .find() function.

string class
string .find()
string .c_str()
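
One caveat if you go the character-array route with strtok(): .c_str() returns a const char*, while strtok() modifies the buffer it scans, so you need a writable copy. A sketch under that assumption (split_with_strtok is a hypothetical name):

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): tokenize a copy of `text`
// with std::strtok, treating every character in `delims` as a separator.
std::vector<std::string> split_with_strtok(const std::string& text,
                                           const std::string& delims)
{
    // strtok writes '\0' into its input, so work on a mutable,
    // null-terminated copy rather than on text.c_str().
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> pieces;
    for (char* token = std::strtok(buffer.data(), delims.c_str());
         token != nullptr;
         token = std::strtok(nullptr, delims.c_str()))
    {
        pieces.push_back(token);
    }
    return pieces;
}
```

Note that strtok() skips runs of consecutive delimiters, so empty fields are never reported, and it keeps internal state between calls, so it is not safe to interleave two tokenizations.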

Answer 8

Answered by Mubin Icyer

What about the erase() function? If you know the exact positions in the string where it should be split, you can "extract" the fields with erase().

std::string date("01/02/2019");
std::string day(date);
std::string month(date);
std::string year(date);

day.erase(2, std::string::npos); // "01"
month.erase(0, 3).erase(2); // "02"
year.erase(0,6); // "2019"
