How to replace different newline styles in PHP the smartest way?
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/7836632/
Question by Deckard
I have text that may contain different newline styles. I want to replace all newlines ('\r\n', '\n', '\r') with the same newline (in this case \r\n).
What's the fastest way to do this? My current solution looks like this, and it's pretty clunky:
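// Protect existing CRLF pairs with a temporary placeholder
$sNicetext = str_replace("\r\n", '%%%%something%%%%', $sNicetext);
// Lone CR -> LF, then LF -> CRLF (one str_replace() with arrays would re-process its own output, turning "\r" into "\r\r\n")
$sNicetext = str_replace("\n", "\r\n", str_replace("\r", "\n", $sNicetext));
// Restore the protected pairs as CRLF
$sNicetext = str_replace('%%%%something%%%%', "\r\n", $sNicetext);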
The problem is that you can't do this with a single replace, because every existing \r\n would be turned into \r\n\r\n.
Thank you for your help!
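One option not raised in this thread is strtr() with an array argument. The PHP manual states that in this form the longest keys are tried first and a substring that has been replaced is not searched again, so a single call can map all three styles to CRLF without the doubling described above. The helper name toCrlfStrtr() is hypothetical, and this sketch makes no claim about being the fastest option.

```php
<?php
// Hypothetical helper (not from the thread): normalize every newline style
// to CRLF in one strtr() pass. strtr() with an array tries the longest key
// first and does not re-scan replaced text, so an existing "\r\n" is matched
// as a whole and is not doubled.
function toCrlfStrtr(string $text): string
{
    return strtr($text, [
        "\r\n" => "\r\n", // existing Windows pair: keep as is
        "\r"   => "\r\n", // lone CR (old Mac)
        "\n"   => "\r\n", // lone LF (Unix)
    ]);
}

// Usage
$sNicetext = toCrlfStrtr("one\r\ntwo\rthree\nfour");
```

Here $sNicetext ends up as "one\r\ntwo\r\nthree\r\nfour".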
Answer by NikiC
$string = preg_replace('~\R~u', "\r\n", $string);
If you don't want to replace all Unicode newlines but only CRLF-style ones (CR, LF, or CRLF), use:
$string = preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $string);
\R matches these newlines, and u is a modifier that treats the input string as UTF-8.
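As a minimal sketch, the two patterns can be wrapped in functions (the names toCrlfUnicode and toCrlfAnyCrlf are hypothetical). One reason to keep the u modifier on UTF-8 input: without it, \R works byte by byte and, per the PCRE docs quoted below, also matches the single byte 0x85 (NEL), which occurs inside multi-byte characters such as "Å" (0xC3 0x85).

```php
<?php
// Hypothetical wrappers around the two patterns from this answer.

// \R with the u modifier: any Unicode newline sequence (CRLF, LF, VT, FF, CR,
// NEL, and in UTF-8 mode also LS U+2028 and PS U+2029).
function toCrlfUnicode(string $text): ?string
{
    // preg_replace() returns null on failure, e.g. if $text is not valid UTF-8.
    return preg_replace('~\R~u', "\r\n", $text);
}

// (*BSR_ANYCRLF): \R is restricted to CR, LF, or CRLF.
function toCrlfAnyCrlf(string $text): ?string
{
    return preg_replace('~(*BSR_ANYCRLF)\R~', "\r\n", $text);
}
```

For example, toCrlfUnicode("a\u{2028}b") should return "a\r\nb", whereas toCrlfAnyCrlf() returns that input unchanged (the \u{...} escape needs PHP 7.0 or later).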
From the PCRE docs:
What \R matches
By default, the sequence \R in a pattern matches any Unicode newline sequence, whatever has been selected as the line ending sequence. If you specify --enable-bsr-anycrlf, the default is changed so that \R matches only CR, LF, or CRLF. Whatever is selected when PCRE is built can be overridden when the library functions are called.
and
Newline sequences
Outside a character class, by default, the escape sequence \R matches any Unicode newline sequence. In non-UTF-8 mode \R is equivalent to the following:
(?>\r\n|\n|\x0b|\f|\r|\x85)
This is an example of an "atomic group", details of which are given below. This particular group matches either the two-character sequence CR followed by LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab, U+000B), FF (formfeed, U+000C), CR (carriage return, U+000D), or NEL (next line, U+0085). The two-character sequence is treated as a single unit that cannot be split.
In UTF-8 mode, two additional characters whose codepoints are greater than 255 are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029). Unicode character property support is not needed for these characters to be recognized.
It is possible to restrict \R to match only CR, LF, or CRLF (instead of the complete set of Unicode line endings) by setting the option PCRE_BSR_ANYCRLF either at compile time or when the pattern is matched. (BSR is an abbreviation for "backslash R".) This can be made the default when PCRE is built; if this is the case, the other behaviour can be requested via the PCRE_BSR_UNICODE option. It is also possible to specify these settings by starting a pattern string with one of the following sequences:
(*BSR_ANYCRLF)   CR, LF, or CRLF only
(*BSR_UNICODE)   any Unicode newline sequence
These override the default and the options given to pcre_compile() or pcre_compile2(), but they can be overridden by options given to pcre_exec() or pcre_dfa_exec(). Note that these special settings, which are not Perl-compatible, are recognized only at the very start of a pattern, and that they must be in upper case. If more than one of them is present, the last one is used. They can be combined with a change of newline convention; for example, a pattern can start with:
(*ANY)(*BSR_ANYCRLF)
They can also be combined with the (*UTF8) or (*UCP) special sequences. Inside a character class, \R is treated as an unrecognized escape sequence, and so matches the letter "R" by default, but causes an error if PCRE_EXTRA is set.
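The atomic group and the start-of-pattern settings quoted above can be observed from PHP by counting matches. This is an illustrative sketch; the helper names and the NEWLINE_GROUP constant are assumptions, and the int|false return type needs PHP 8.0 or later.

```php
<?php
// Explicit non-UTF-8 equivalent of \R from the PCRE docs (illustrative).
// The atomic group (?>...) consumes "\r\n" as one unit; "\n\r" is two
// separate newlines (an LF, then a CR).
const NEWLINE_GROUP = '~(?>\r\n|\n|\x0b|\f|\r|\x85)~';

function countNewlines(string $bytes): int|false
{
    return preg_match_all(NEWLINE_GROUP, $bytes);
}

// When several (*BSR_...) settings open a pattern, the last one is used,
// so this behaves like (*BSR_ANYCRLF)\R and ignores U+2028.
function countAnyCrlfNewlines(string $text): int|false
{
    return preg_match_all('~(*BSR_UNICODE)(*BSR_ANYCRLF)\R~u', $text);
}
```

For instance, countNewlines("a\r\nb") should be 1, while countNewlines("a\n\rb") should be 2.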
Answer by Alix Axel
To normalize newlines I always use:
$str = preg_replace('~\r\n?~', "\n", $str);
It replaces the old Mac (\r) and the Windows (\r\n) newlines with the Unix equivalent (\n).
I prefer using \n because it takes only one byte instead of two, but you can easily change it to \r\n.
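A small sketch of this answer (function names hypothetical). One caveat worth spelling out: swapping only the replacement for "\r\n" would leave lone "\n" characters untouched, since ~\r\n?~ always needs a leading CR. The CRLF variant below therefore adds |\n to the pattern; that is an adaptation, not part of the original answer.

```php
<?php
// The answer's pattern: "\r\n" or a lone "\r" becomes "\n"; a lone "\n" is left as is.
function toLf(string $text): ?string
{
    return preg_replace('~\r\n?~', "\n", $text);
}

// Adapted for CRLF (assumption, not from the answer): \r\n? is tried before
// \n, so an existing CRLF is replaced once, and a lone LF is also converted.
function toCrlf(string $text): ?string
{
    return preg_replace('~\r\n?|\n~', "\r\n", $text);
}
```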
Answer by Tomalak
How about
$sNicetext = preg_replace('/\r\n|\r|\n/', "\r\n", $sNicetext);
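Alternation order matters here, and it is exactly the doubling problem from the question: PCRE tries alternatives from left to right, so \r\n has to come before \r. The second function below is a deliberately misordered, hypothetical counter-example; both names are illustrative.

```php
<?php
// Tomalak's pattern: \r\n is tried first, so a CRLF pair is one match.
function toCrlfAlternation(string $text): ?string
{
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}

// Misordered on purpose: the CR of a CRLF pair matches \r on its own,
// and the following LF is then matched separately by \n.
function toCrlfWrongOrder(string $text): ?string
{
    return preg_replace('/\r|\n|\r\n/', "\r\n", $text);
}
```

With the misordered pattern, "a\r\nb" becomes "a\r\n\r\nb".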
Answer by Roey
I think the smartest/simplest way to convert to CRLF is:
$output = str_replace("\n", "\r\n", str_replace("\r", '', $input));
To convert to LF only:
$output = str_replace("\r", '', $input);
It's much easier than using regular expressions.
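A sketch of the two str_replace() variants (function names hypothetical). One caveat: because every "\r" is stripped first, a lone CR (an old Mac line ending) is deleted instead of becoming a line break, so "a\rb" turns into "ab". This approach therefore suits input that mixes only Windows and Unix newlines.

```php
<?php
// Strip all CRs, then expand every LF to CRLF.
function toCrlfNoRegex(string $input): string
{
    return str_replace("\n", "\r\n", str_replace("\r", '', $input));
}

// Strip all CRs, leaving LF-only line endings.
function toLfNoRegex(string $input): string
{
    return str_replace("\r", '', $input);
}
```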