QString::split() and "\r", "\n" and "\r\n" convention
Original question: http://stackoverflow.com/questions/10348292/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors on StackOverflow (not to this page).
Asked by sashoalm
I understand that QString::split should be used to get a QStringList from a multiline QString. But if I have a file and I don't know if it comes from Mac, Windows or Unix, I'm not sure if QString.split("\n") would work well in all the cases. What is the best way to handle this situation?
Answered by Emanuele Bezzi
If it's acceptable to remove blank lines, you can try:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
This splits the string whenever any of the newline characters (either line feed or carriage return) is found. Any consecutive line breaks (e.g. \r\n\r\n or \n\n) will be considered multiple delimiters with empty parts between them, which will be skipped.
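To make the effect of skipping empty parts concrete, here is a minimal sketch in standard C++ rather than Qt, so that it compiles without the Qt libraries. The helper name splitSkippingEmpty is hypothetical, and the function only approximates what the call above is described as doing: it splits on every '\r' or '\n' and drops empty pieces.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Qt): split on every '\r' or '\n'
// character and drop empty pieces, approximating
// split(QRegExp("[\r\n]"), QString::SkipEmptyParts).
std::vector<std::string> splitSkippingEmpty(const std::string& text) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : text) {
        if (c == '\r' || c == '\n') {
            // Each CR or LF is its own delimiter; empty pieces are skipped.
            if (!current.empty()) parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // Keep the last piece if the text does not end with a delimiter.
    if (!current.empty()) parts.push_back(current);
    return parts;
}
```

For the input "a\r\nb\n\nc" this returns the three pieces a, b and c. The "\r\n" pair counts as two delimiters with an empty part between them, and the genuinely blank line between b and c is dropped as well.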
Answered by Keith Thompson
Emanuele Bezzi's answer misses a couple of points.
In most cases, a string read from a text file will have been read using a text stream, which automatically translates the OS's end-of-line representation to a single '\n' character. So if you're dealing with native text files, '\n' should be the only delimiter you need to worry about. For example, if your program is running on a Windows system, reading input in text mode, line endings will be marked in memory with single \n characters; you'll never see the "\r\n" pairs that exist in the file.
But sometimes you do need to deal with "foreign" text files.
Ideally, you should probably translate any such files to the local format before reading them, which avoids the issue. Only the translation utility needs to be aware of variant line endings; everything else just deals with text.
But that's not always possible; sometimes you might want your program to handle Windows text files when running on a POSIX system (Linux, UNIX, etc.), or vice versa.
A Windows-format text file on a POSIX system will appear to have an extra '\r' character at the end of each line.
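As an illustration of the extra '\r', the following sketch (standard C++, not Qt; the function name readLinesStrippingCR is hypothetical) reads lines with std::getline, which splits on '\n' only, and then drops a trailing '\r' if one is present. It assumes the stream performs no end-of-line translation of its own, as is the case for a string stream or a file opened on a POSIX system.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read '\n'-terminated lines and remove the trailing
// '\r' that a Windows-format file leaves at the end of each line.
std::vector<std::string> readLinesStrippingCR(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();  // "text\r" becomes "text"
        }
        lines.push_back(line);  // blank lines are kept as empty strings
    }
    return lines;
}
```

This handles "\n" and "\r\n" input and keeps blank lines, but it does not split files that use a lone '\r' as the line terminator.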
A POSIX-format text file on a Windows system will appear to consist of one very long line with embedded '\n' characters.
The most general approach is to read the file in binary mode and deal with the line endings explicitly.
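One possible shape for that approach, sketched in standard C++ rather than Qt: read the whole file with std::ios::binary so that no translation takes place, then scan the bytes and treat "\r\n", "\n" and "\r" each as one line terminator. The names readFileBinary and splitLines are hypothetical, and the sketch assumes a single-byte or UTF-8 encoding in which the bytes 0x0D and 0x0A only ever mean CR and LF.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read a file byte for byte, with no end-of-line
// translation. Returns an empty string if the file cannot be opened.
std::string readFileBinary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Hypothetical helper: split on "\r\n", "\n" or "\r", keeping blank lines.
std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat the pair "\r\n" as a single terminator.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    // A final line without a terminator is still a line.
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

A call such as splitLines(readFileBinary(path)) then yields the same lines regardless of which of the three conventions the file uses. Note one design choice: a terminator at the very end of the text does not produce an extra empty element.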
I'm not familiar with QString.split, but I suspect that this:
QString.split(QRegExp("[\r\n]"),QString::SkipEmptyParts);
will ignore empty lines, which will appear either as "\n\n" or as "\r\n\r\n", depending on the format. Empty lines are perfectly valid text data; you shouldn't ignore them unless you're certain that it makes sense to do so.
If you need to deal with text input delimited either by "\n", "\r\n", or "\r", then I think something like this:
QString.split(QRegExp("\n|\r\n|\r"));
would do the job. (Thanks to parsley72's comment for helping me with the regular expression syntax.)
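The same idea can be tried outside Qt. The sketch below uses std::regex from the C++ standard library instead of QRegExp, and the function name splitOnLineEndings is hypothetical. It lists "\r\n" first in the alternation so that a Windows line ending is consumed as one delimiter, and it keeps the empty pieces between adjacent delimiters so that blank lines survive.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split on "\r\n", "\n" or "\r" with a regular
// expression, keeping empty pieces so that blank lines are preserved.
std::vector<std::string> splitOnLineEndings(const std::string& text) {
    // "\r\n" comes first so the pair is matched as a single delimiter.
    static const std::regex lineEnding("\r\n|\n|\r");
    // Submatch index -1 selects the text between matches, not the matches.
    std::sregex_token_iterator first(text.begin(), text.end(), lineEnding, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

For the input "a\r\nb\n\nc\rd" this returns a, b, an empty string, c and d, so the blank line between b and c is kept.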
Another point: you're not likely to encounter text files that use just '\r' to delimit lines. That's the format used by Mac OS up to version 9. Mac OS X is based on UNIX, and it uses standard UNIX-style '\n' line endings (though it probably tolerates '\r' line endings as well).