Source: http://stackoverflow.com/questions/37957080/
Note: this content comes from a Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Can I use 2 or more delimiters in C++ function getline?
Question
I would like to know how I can use two or more delimiters in the getline function. Here is my problem:
The program reads a text file. Each line is going to look like this:
New York, Paris, 100
CityA, CityB, 200
I am using getline(file, line), but that gives me the whole line, whereas I want to get CityA, then CityB, and then the number. If I use ',' as the delimiter, I won't know where the next line starts, so I'm trying to figure out a solution.
So how could I use both a comma and \n as delimiters? By the way, I'm working with the string type, not char arrays, so strtok is not an option :/
A rough sketch:
string line;
ifstream file("text.txt");
if(file.is_open())
    while(!file.eof()){
        getline(file, line);
        // here I need to get each string before comma and \n
    }
Accepted answer by WhiZTiM
You can read a line using std::getline, then pass the line to a std::stringstream and read the comma-separated values off it.
string line;
ifstream file("text.txt");
if(file.is_open()){
    while(getline(file, line)){   // get a whole line
        std::stringstream ss(line);
        string token;
        while(getline(ss, token, ',')){
            // You now have separate entities here, one per token
        }
    }
}
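The two nested loops above can be sketched as a self-contained function. This is a minimal sketch rather than the answer's own code: the helper names `trim_left` and `split_lines` are hypothetical, and the function takes any `std::istream`, on the assumption that a `std::ifstream` opened on `text.txt` behaves the same way as a `std::istringstream`. It also strips the space that follows each comma in the sample input.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: drop leading spaces and tabs from a field.
inline std::string trim_left(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    return first == std::string::npos ? std::string() : s.substr(first);
}

// Hypothetical helper: read every line of `in` and split it on commas.
// Returns one vector of fields per input line.
inline std::vector<std::vector<std::string>> split_lines(std::istream& in) {
    std::vector<std::vector<std::string>> rows;
    std::string line;
    while (std::getline(in, line)) {            // outer delimiter: '\n'
        std::istringstream ss(line);
        std::vector<std::string> fields;
        std::string field;
        while (std::getline(ss, field, ',')) {  // inner delimiter: ','
            fields.push_back(trim_left(field));
        }
        rows.push_back(fields);
    }
    return rows;
}
```

For the question's sample input this yields two rows of three fields each. Checking the stream state in the loop condition, as `while (std::getline(...))` does, avoids processing a stale line after the last read fails.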
Answer by Sam Varshavchik
No, std::getline() only accepts a single character to override the default delimiter. std::getline() does not have an option for multiple alternate delimiters.
The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream from it, and then parse it further into comma-separated values.
However, if you are truly parsing comma-separated values, you should be using a proper CSV parser.
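As an illustration of "parsing it further", the fields of one line can be converted into typed values once the line has been read. The sketch below assumes every line has exactly the form `from, to, distance` seen in the question; the `Route` struct and the `parse_route` function are hypothetical names, and it is not a substitute for a real CSV parser (quoted fields, for example, are not handled).

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record type for one line of the question's input.
struct Route {
    std::string from;
    std::string to;
    int distance = 0;
};

// Parse "from, to, distance" into `out`. Returns false on malformed input.
inline bool parse_route(const std::string& line, Route& out) {
    std::istringstream ss(line);
    std::string number;
    if (!std::getline(ss, out.from, ',')) return false;
    // std::ws skips the whitespace that follows each comma.
    if (!std::getline(ss >> std::ws, out.to, ',')) return false;
    if (!std::getline(ss >> std::ws, number)) return false;
    try {
        out.distance = std::stoi(number);
    } catch (const std::exception&) {  // not a number, or out of range for int
        return false;
    }
    return true;
}
```

A caller would read each line with the default `std::getline(file, line)` and pass it to `parse_route`, skipping or reporting lines for which it returns false.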
Answer by Christopher Oicles
Often, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner, where you start by splitting the string into its major blocks, then go on to process each of the blocks, splitting them up into smaller parts, and so on.
An alternative to this is to tokenize like strtok does: start at the beginning of the input and handle one token at a time until the end of input is encountered. This may be preferred when parsing simple inputs, because it is straightforward to implement. This style can also be used when parsing inputs with nested structure, but this requires maintaining some kind of context information, which might grow too complex to maintain inside a single function or limited region of code.
Someone relying on the C++ standard library usually ends up using a std::stringstream along with std::getline to tokenize string input. But this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So they end up using streams, and with only one delimiter, one is obligated to use a hierarchical parsing style.
But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position of the first character in the string that belongs to the set. And there are other member functions: find_last_of, find_first_not_of, and more, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenizing functions.
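To make this concrete, here is a minimal sketch of how find_first_of can split a string on any of several delimiters while only advancing a position, without modifying the source string. The function name `split_any` is hypothetical; it is not part of the standard library or of this answer's example. Empty tokens between adjacent delimiters are kept.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `text` at every character found in `delimiters`.
// The source string is never modified; only the position `start` advances.
inline std::vector<std::string> split_any(const std::string& text,
                                          const std::string& delimiters) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        const auto pos = text.find_first_of(delimiters, start);
        if (pos == std::string::npos) {
            tokens.push_back(text.substr(start));  // remainder after the last delimiter
            break;
        }
        tokens.push_back(text.substr(start, pos - start));
        start = pos + 1;  // continue just past the delimiter
    }
    return tokens;
}
```

Called with the delimiter set `",\n"`, this treats commas and newlines alike, which is what the question asked for, at the cost of no longer knowing where each line ended.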
Another option is the <regex> library, which can do anything you want, but it is relatively new (part of the standard library since C++11) and you will need to get used to its syntax.
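For instance, std::sregex_token_iterator can split on a character class that covers several delimiters at once. This is only a sketch: the function name `regex_split` is hypothetical, and the pattern is left to the caller, with `"[,\n]"` being one choice that matches the question's two delimiters.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` wherever `pattern` matches.
inline std::vector<std::string> regex_split(const std::string& text,
                                            const std::string& pattern) {
    // The regex object must outlive the iterators built from it.
    const std::regex delimiter(pattern);
    // Submatch index -1 selects the text between matches, i.e. the tokens.
    std::sregex_token_iterator first(text.begin(), text.end(), delimiter, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

A pattern such as `",\\s*|\\n"` would additionally swallow the whitespace after each comma, which is where the regex approach starts to pay for its extra syntax.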
But with very little effort, you can leverage existing functions in std::string to perform tokenizing tasks without resorting to streams. Here is a simple example. get_to() is the tokenizing function and tokenize() demonstrates how it is used.
The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and also copies and returns substrings. This makes the code easy to understand, but it does not mean more efficient tokenizing is impossible. It wouldn't even be much more complicated than this: you would just keep track of your current position, use it as the start argument in std::string member functions, and never alter the source string. And even better techniques exist, no doubt.
To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and dumb comments.
#include <iostream>
#include <string>
#include <utility>

namespace string_parsing {

// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
    auto space_is_it = [] (char c) {
        // A few asks:
        // * Suppress criticism WRT localization concerns
        // * Avoid jumping to conclusions! And seeing monsters everywhere!
        //   Things like...ah! Believing "thoughts" that assumptions were made
        //   regarding character encoding.
        // * If an obvious, portable alternative exists within the C++ Standard Library,
        //   you will see it in 2.0, so no new defect tickets, please.
        // * Go ahead and ignore the rumor that using lambdas just to get
        //   local function definitions is "cheap" or "dumb" or "ignorant."
        //   That's the latest round of FUD from...*mumble*.
        return c > '\0' && c <= ' ';
    };
    for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
        if(!space_is_it(*rit)) {
            if(rit != str.rbegin()) {
                // drop trailing whitespace
                str.erase(&*rit - &*str.begin() + 1);
            }
            for(auto fit=str.begin(); fit != str.end(); ++fit) {
                if(!space_is_it(*fit)) {
                    if(fit != str.begin()) {
                        // drop leading whitespace
                        str.erase(str.begin(), fit);
                    }
                    return;
                }
            }
        }
    }
    // only reached when the string is empty or all whitespace
    str.clear();
}

// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurrence of one
// from a set of delimiters. All characters to the left of, and the delimiter itself
// are deleted in-place, and the substring which was to the left of the delimiter is
// returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
    const auto pos = str.find_first_of(std::forward<D>(delimiters));
    if(pos == std::string::npos) {
        // When none of the delimiters are present,
        // clear the string and return its last value.
        // This effectively makes the end of a string an
        // implied delimiter.
        // This behavior is convenient for parsers which
        // consume chunks of a string, looping until
        // the string is empty.
        // Without this feature, it would be possible to
        // continue looping forever, when an iteration
        // leaves the string unchanged, usually caused by
        // a syntax error in the source string.
        // So the implied end-of-string delimiter takes
        // away the caller's burden of anticipating and
        // handling the range of possible errors.
        found_delimiter = '\0';
        std::string result;
        std::swap(result, str);
        trim(result);
        return result;
    }
    found_delimiter = str[pos];
    auto left = str.substr(0, pos);
    trim(left);
    str.erase(0, pos + 1);
    return left;
}

template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
    char discarded_delimiter;
    return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}

inline std::string pad_right(const std::string& str,
                             std::string::size_type min_length,
                             char pad_char=' ')
{
    if(str.length() >= min_length ) return str;
    return str + std::string(min_length - str.length(), pad_char);
}

inline void tokenize(std::string source) {
    std::cout << source << "\n\n";
    bool quote_opened = false;
    while(!source.empty()) {
        // If we just encountered an open-quote, only include the quote character
        // in the delimiter set, so that a quoted token may contain any of the
        // other delimiters.
        const char* delimiter_set = quote_opened ? "'" : ",'{}";
        char delimiter;
        auto token = get_to(source, delimiter_set, delimiter);
        quote_opened = delimiter == '\'' && !quote_opened;
        std::cout << " " << pad_right('[' + token + ']', 16)
                  << " " << delimiter << '\n';
    }
    std::cout << '\n';
}

}

int main() {
    string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}
This outputs:
{1.5, null, 88, 'hi, {there}!'}

 []               {
 [1.5]            ,
 [null]           ,
 [88]             ,
 []               '
 [hi, {there}!]   '
 []               }
Answer by Scott Hunter
I don't think that's how you should attack the problem (even if you could do it); instead:
1. Use what you have to read in each line.
2. Then split up that line by the commas to get the pieces that you want.
If strtok will do the job for #2, you can always convert your string into a char array.
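A sketch of that conversion, assuming nothing beyond the standard library: the std::string is copied into a writable, null-terminated buffer, because strtok overwrites delimiters in its input and c_str() only gives read access. The function name `strtok_split` is hypothetical. Keep in mind that strtok collapses runs of delimiters into one and keeps hidden static state, so it is not reentrant.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenize a std::string with std::strtok.
inline std::vector<std::string> strtok_split(const std::string& text,
                                             const char* delimiters) {
    // strtok writes '\0' into its input, so work on a mutable copy.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delimiters);
         tok != nullptr;
         tok = std::strtok(nullptr, delimiters)) {
        tokens.emplace_back(tok);  // copy the token out of the buffer
    }
    return tokens;
}
```

With the delimiter set `", "` the space after each comma disappears, but a name containing a space would be split as well; with `","` alone such names stay intact and the leading space remains on the later fields.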