C++: How to use boost split to split a string and ignore empty values?
Original question: http://stackoverflow.com/questions/15690389/
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on StackOverflow.
Asked by PhiloEpisteme
I am using boost::split to parse a data file. The data file contains lines such as the following.
data.txt
1:1~15 ASTKGPSVFPLAPSS SVFPLAPSS -12.6 98.3
The whitespace between the items is tabs. The code I use to split the above line is as follows.
std::string buf;
/*Assign the line from the file to buf*/
std::vector<std::string> dataLine;
boost::split( dataLine, buf , boost::is_any_of("\t "), boost::token_compress_on); //Split data line
cout << dataLine.size() << endl;
For the above code I should get a printout of 5, but I get 6. I have read through the documentation, and this solution seems as though it should do what I want; clearly I am missing something. Thanks!
Edit: Running a for loop over dataLine as follows gives the following output.
cout << "****" << endl;
for(int i = 0 ; i < dataLine.size() ; i ++) cout << dataLine[i] << endl;
cout << "****" << endl;
****
1:1~15
ASTKGPSVFPLAPSS
SVFPLAPSS
-12.6
98.3
****
Accepted answer by Oberon
Even though "adjacent separators are merged together", the trailing delimiters seem to be the problem: even when they are treated as one, that is still one delimiter, so split() produces an empty token after it.
So your problem cannot be solved with split() alone. Luckily, Boost String Algo has trim() and trim_if(), which strip whitespace or delimiters from the beginning and end of a string. So just call trim() on buf, like this:
std::string buf = "1:1~15 ASTKGPSVFPLAPSS SVFPLAPSS -12.6 98.3 ";
std::vector<std::string> dataLine;
boost::trim_if(buf, boost::is_any_of("\t ")); // could also use plain boost::trim
boost::split(dataLine, buf, boost::is_any_of("\t "), boost::token_compress_on);
std::cout << dataLine.size() << std::endl;
This question was already asked: boost::split leaves empty tokens at the beginning and end of string - is this desired behaviour?
Answer by DannyK
I would recommend using C++ String Toolkit Library. This library is much faster than Boost in my opinion. I used to use Boost to split (aka tokenize) a line of text but found this library to be much more in line with what I want.
One of the great things about strtk::parse is that it converts tokens into their final values and checks the number of elements.
You could use it like so:
std::vector<std::string> tokens;
// multiple delimiters should be treated as one
if( !strtk::parse( dataLine, "\t", tokens ) )
{
std::cout << "failed" << std::endl;
}
--- another version
std::string token1;
std::string token2;
std::string token3;
float value1;
float value2;
if( !strtk::parse( dataLine, "\t", token1, token2, token3, value1, value2) )
{
std::cout << "failed" << std::endl;
// fails if the number of elements is not what you want
}
Online documentation for the library: String Tokenizer Documentation
Link to the source code: C++ String Toolkit Library
Answer by Jesse Good
Leading and trailing whitespace is intentionally left alone by boost::split because it does not know whether it is significant or not. The solution is to use boost::trim before calling boost::split.
#include <boost/algorithm/string/trim.hpp>
....
boost::trim(buf);