Parse (split) a string in C++ using string delimiter (standard C++)
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/14265581/
Asked by TheCrazyProgrammer
I am parsing a string in C++ using the following:
using namespace std;
string parsed, input = "text to be parsed";
stringstream input_stringstream(input);
if (getline(input_stringstream, parsed, ' '))
{
    // do some processing.
}
Parsing with a single char delimiter works fine. But what if I want to use a string as the delimiter?
Example: I want to split:
scott>=tiger
with >= as the delimiter, so that I get scott and tiger.
Answer by Vincenzo Pii
You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.
Example:
std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.
The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or running to the end of the string if n is npos or exceeds the remaining length).
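To make the npos case concrete, here is a minimal sketch that combines the two calls. The helper name first_token is hypothetical and not part of the original answer; the explicit npos check is there for readability only, since substr(0, npos) would already return the whole string.

```cpp
#include <string>

// Hypothetical helper (not from the answer): returns the text before the
// first occurrence of `delimiter`, or the whole string if it is absent.
std::string first_token(const std::string& s, const std::string& delimiter) {
    const std::string::size_type pos = s.find(delimiter);
    if (pos == std::string::npos) {
        return s;             // delimiter not found
    }
    return s.substr(0, pos);  // `pos` characters starting at index 0
}
```

Called as first_token("scott>=tiger", ">="), this returns scott; for an input without the delimiter it returns the input unchanged.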
If the delimiter occurs more than once, you can remove each token (delimiter included) after extracting it and then continue with the next extraction. This modifies s, so work on a copy if you need to preserve the original string. Writing s = s.substr(pos + delimiter.length()); has the same effect as the erase() call:
s.erase(0, s.find(delimiter) + delimiter.length());
This way you can easily loop to get each token.
Complete Example
std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";
size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);                // text before the delimiter
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());    // drop the token and the delimiter
}
std::cout << s << std::endl;                 // whatever remains is the last token
Output:
scott
tiger
mushroom
Answer by moswald
This method uses std::string::find without mutating the original string, by remembering the beginning and end of the previous substring token.
#include <iostream>
#include <string>
int main()
{
    std::string s = "scott>=tiger";
    std::string delim = ">=";
    std::size_t start = 0;   // size_t rather than 0U, so positions are never truncated
    auto end = s.find(delim);
    while (end != std::string::npos)
    {
        std::cout << s.substr(start, end - start) << std::endl;
        start = end + delim.length();
        end = s.find(delim, start);
    }
    std::cout << s.substr(start) << std::endl;   // last token: everything after the final delimiter
}
Answer by Sviatoslav
You can use the following function to split a string:
// Assumes <string> and <vector> are included and "using namespace std;" is in effect.
vector<string> split(const string& str, const string& delim)
{
    vector<string> tokens;
    size_t prev = 0, pos = 0;
    do
    {
        pos = str.find(delim, prev);
        if (pos == string::npos) pos = str.length();   // no more delimiters: take the rest
        string token = str.substr(prev, pos-prev);
        if (!token.empty()) tokens.push_back(token);   // empty tokens are skipped
        prev = pos + delim.length();
    }
    while (pos < str.length() && prev < str.length());
    return tokens;
}
Answer by Arafat Hasan
For string delimiter
Split a string based on a string delimiter. For example, splitting the string "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih" on the string delimiter "-+" gives {"adsf", "qwret", "nvfkbdsj", "orthdfjgh", "dfjrleih"}.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
// for string delimiter
vector<string> split (string s, string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    string token;
    vector<string> res;
    while ((pos_end = s.find (delimiter, pos_start)) != string::npos) {
        token = s.substr (pos_start, pos_end - pos_start);
        pos_start = pos_end + delim_len;
        res.push_back (token);
    }
    res.push_back (s.substr (pos_start));   // last token after the final delimiter
    return res;
}
int main() {
    string str = "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih";
    string delimiter = "-+";
    vector<string> v = split (str, delimiter);
    for (const auto& i : v) cout << i << endl;
    return 0;
}
Output
adsf
qwret
nvfkbdsj
orthdfjgh
dfjrleih
For single character delimiter
Split a string based on a character delimiter. For example, splitting the string "adsf+qwer+poui+fdgh" with the delimiter "+" gives {"adsf", "qwer", "poui", "fdgh"}.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;
vector<string> split (const string &s, char delim) {
    vector<string> result;
    stringstream ss (s);
    string item;
    while (getline (ss, item, delim)) {   // reads up to the next delim character
        result.push_back (item);
    }
    return result;
}
int main() {
    string str = "adsf+qwer+poui+fdgh";
    vector<string> v = split (str, '+');
    for (const auto& i : v) cout << i << endl;
    return 0;
}
Output
adsf
qwer
poui
fdgh
Answer by William Cuervo
This code splits a text into lines and adds each one to a vector.
vector<string> split(char *phrase, string delimiter){
    vector<string> list;
    string s = string(phrase);   // work on a copy of the input
    size_t pos = 0;
    string token;
    while ((pos = s.find(delimiter)) != string::npos) {
        token = s.substr(0, pos);
        list.push_back(token);
        s.erase(0, pos + delimiter.length());
    }
    list.push_back(s);   // remainder after the last delimiter
    return list;
}
Called like this:
vector<string> listFilesMax = split(buffer, "\n");
Answer by ryanbwork
strtok allows you to pass in multiple chars as delimiters. I bet if you passed in ">=" your example string would be split correctly (even though the > and = are counted as individual delimiters).
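As a rough illustration of that behaviour, the sketch below wraps std::strtok in a helper. The name strtok_split is an assumption made for this example and does not come from the answer. Because strtok writes into its input, the helper tokenizes a private copy of the string.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes `input` with std::strtok, which treats
// every character of `delims` as a separate delimiter.
std::vector<std::string> strtok_split(const std::string& input, const char* delims) {
    // strtok modifies its argument, so work on a null-terminated copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims);
         tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

With this helper, strtok_split("scott>=tiger", ">=") yields scott and tiger, but strtok_split("a>b=c", ">=") yields a, b and c, because > and = each act as a delimiter on their own. Note also that strtok keeps internal state, so it is not suitable for concurrent or nested tokenizing.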
EDIT: if you don't want to use c_str() to convert from string to char*, you can use substr and find_first_of to tokenize.
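string token, mystring("scott>=tiger");
size_t start = mystring.find_first_not_of(">=");          // skip any leading delimiter characters
while (start != string::npos) {
    size_t end = mystring.find_first_of(">=", start);     // next delimiter character, or npos
    token = mystring.substr(start, end - start);          // runs to the end of the string when end is npos
    printf("%s ", token.c_str());
    start = mystring.find_first_not_of(">=", end);        // skip the whole run of delimiter characters
}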
Answer by Beder Acosta Borges
Here's my take on this. It handles the edge cases and takes an optional parameter to remove empty entries from the results.
bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}
std::vector<std::string> split(const std::string& s, const std::string& delimiter, const bool& removeEmptyEntries = false)
{
    std::vector<std::string> tokens;
    for (size_t start = 0, end; start < s.length(); start = end + delimiter.length())
    {
        size_t position = s.find(delimiter, start);
        end = position != std::string::npos ? position : s.length();
        std::string token = s.substr(start, end - start);
        if (!removeEmptyEntries || !token.empty())
        {
            tokens.push_back(token);
        }
    }
    // An empty input or a trailing delimiter implies one more (empty) entry.
    if (!removeEmptyEntries &&
        (s.empty() || endsWith(s, delimiter)))
    {
        tokens.push_back("");
    }
    return tokens;
}
Examples
split("a-b-c", "-"); // [3]("a","b","c")
split("a--c", "-"); // [3]("a","","c")
split("-b-", "-"); // [3]("","b","")
split("--c--", "-"); // [5]("","","c","","")
split("--c--", "-", true); // [1]("c")
split("a", "-"); // [1]("a")
split("", "-"); // [1]("")
split("", "-", true); // [0]()
Answer by hmofrad
This should work for string (or single character) delimiters, as long as the first character of the delimiter never occurs inside a token. Don't forget to #include <sstream>.
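std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta";
std::string delimiter = "=,+";
std::istringstream ss(input);
std::string token;
std::string::iterator it;
// Read up to the first character of the delimiter.
while (std::getline(ss, token, *(it = delimiter.begin()))) {
    // Skip the remaining characters of the delimiter (compare against end() instead of dereferencing it).
    while (++it != delimiter.end()) ss.get();
    std::cout << token << " " << '\n';
}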
The first while loop extracts a token using the first character of the string delimiter. The second while loop skips the rest of the delimiter and stops at the beginning of the next token.
Answer by Shubham Agrawal
An answer is already there, but the selected answer uses the erase function, which is very costly for very big strings (in MBs). Therefore I use the function below.
vector<string> split(const string& i_str, const string& i_delim)
{
    vector<string> result;
    size_t found = i_str.find(i_delim);
    size_t startIndex = 0;
    while(found != string::npos)
    {
        // Copy the token between the previous delimiter and this one.
        string temp(i_str.begin()+startIndex, i_str.begin()+found);
        result.push_back(temp);
        startIndex = found + i_delim.size();
        found = i_str.find(i_delim, startIndex);
    }
    // Add the remainder, unless the string ended with a delimiter.
    if(startIndex != i_str.size())
        result.push_back(string(i_str.begin()+startIndex, i_str.end()));
    return result;
}
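If C++17 is available, the copies of the tokens themselves can also be avoided. The following is only a sketch under that assumption, not the answer's method: split_views is a hypothetical name, the returned views are valid only while the original character data is alive, and the guard against an empty delimiter is an addition.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Sketch (assumes C++17): same find-based loop as above, but each token is a
// std::string_view into the input instead of a newly allocated std::string.
std::vector<std::string_view> split_views(std::string_view str, std::string_view delim) {
    std::vector<std::string_view> result;
    if (delim.empty()) {          // added guard: an empty delimiter would never advance
        result.push_back(str);
        return result;
    }
    std::size_t start = 0;
    for (std::size_t found = str.find(delim);
         found != std::string_view::npos;
         found = str.find(delim, start)) {
        result.push_back(str.substr(start, found - start));
        start = found + delim.size();
    }
    if (start != str.size()) {    // like the function above, drop a trailing empty token
        result.push_back(str.substr(start));
    }
    return result;
}
```

The views must not outlive the string they point into, so this fits best when the input stays in scope for as long as the tokens are used.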
Answer by Benjamin Lindley
I would use boost::tokenizer. Here's documentation explaining how to make an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm
Here's one that works for your case.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/tokenizer.hpp>
struct my_tokenizer_func
{
    template<typename It>
    bool operator()(It& next, It end, std::string & tok)
    {
        if (next == end)
            return false;
        char const * del = ">=";
        auto pos = std::search(next, end, del, del + 2);   // find the next ">="
        tok.assign(next, pos);
        next = pos;
        if (next != end)
            std::advance(next, 2);                         // step over the delimiter
        return true;
    }
    void reset() {}
};
int main()
{
    std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
    for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
        std::cout << i << '\n';
}