Difference between String trim() and strip() methods in Java 11
Source: http://stackoverflow.com/questions/51266582/ (StackOverflow)
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original StackOverflow authors and keep the same license.
Asked by Mikhail Kholodkov
Among other changes, JDK 11 introduces 6 new methods for java.lang.String class:
repeat(int) - Repeats the String as many times as provided by the int parameter
lines() - Uses a Spliterator to lazily provide lines from the source string
isBlank() - Indicates if the String is empty or contains only white space characters
stripLeading() - Removes the white space from the beginning
stripTrailing() - Removes the white space from the end
strip() - Removes the white space from both the beginning and the end of the string
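As a rough illustration of the six methods listed above, the following sketch calls each one on a small literal. The class name NewStringMethodsDemo and the helper splitLines are made up for this example and are not part of the JDK or the original post; it needs Java 11 or later.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; only the String methods themselves come from JDK 11.
class NewStringMethodsDemo {

    // Collects the lazily produced stream from String.lines() into a list.
    static List<String> splitLines(String text) {
        return text.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("ab".repeat(3));                          // ababab
        System.out.println(splitLines("one\ntwo\r\nthree"));         // [one, two, three]
        System.out.println("  \t ".isBlank());                       // true
        System.out.println("[" + "  abc  ".stripLeading() + "]");    // [abc  ]
        System.out.println("[" + "  abc  ".stripTrailing() + "]");   // [  abc]
        System.out.println("[" + "  abc  ".strip() + "]");           // [abc]
    }
}
```

The brackets around the stripped values are only there to make the remaining spaces visible in the output.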
In particular, strip() looks very similar to trim(). As per this article, strip*() methods are designed to:
The String.strip(), String.stripLeading(), and String.stripTrailing() methods trim white space [as determined by Character.isWhiteSpace()] off either the front, back, or both front and back of the targeted String.
String.trim() JavaDoc states:
/**
* Returns a string whose value is this string, with any leading and trailing
* whitespace removed.
* ...
*/
Which is almost identical to the quote above.
What exactly is the difference between String.trim() and String.strip() since Java 11?
Accepted answer by Mikhail Kholodkov
In short: strip() is the "Unicode-aware" evolution of trim().
Problem
String::trim has existed from early days of Java when Unicode had not fully evolved to the standard we widely use today.
The definition of space used by String::trim is any code point less than or equal to the space code point (\u0020), commonly referred to as ASCII or ISO control characters.
Unicode-aware trimming routines should use Character::isWhitespace(int).
Additionally, developers have not been able to specifically remove indentation white space or to specifically remove trailing white space.
Solution
Introduce trimming methods that are Unicode white space aware and provide additional control of leading only or trailing only.
A common characteristic of these new methods is that they use a different (newer) definition of "whitespace" than did old methods such as String.trim(). See bug JDK-8200373:
The current JavaDoc for String::trim does not make it clear which definition of "space" is being used in the code. With additional trimming methods coming in the near future that use a different definition of space, clarification is imperative. String::trim uses the definition of space as any codepoint that is less than or equal to the space character codepoint (\u0020.) Newer trimming methods will use the definition of (white) space as any codepoint that returns true when passed to the Character::isWhitespace predicate.
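The two definitions of "space" quoted above can be written as two small predicates and compared. This is only a sketch: the class TrimVersusStrip and the helpers isTrimSpace and isStripSpace are hypothetical names for illustration, not JDK APIs. It also shows a consequence that is easy to miss: the difference cuts both ways, because a control character such as U+0001 is removed by trim() but is not whitespace according to Character.isWhitespace, so strip() leaves it in place.

```java
// Hypothetical demo class; helper names are assumptions made for this example.
class TrimVersusStrip {

    // Definition used by String.trim(): any code point up to and including U+0020.
    static boolean isTrimSpace(int codePoint) {
        return codePoint <= ' ';
    }

    // Definition used by strip(), stripLeading() and stripTrailing().
    static boolean isStripSpace(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        String enQuad = "\u2000abc\u2000";   // U+2000 EN QUAD on both sides
        String control = "\u0001abc\u0001";  // U+0001 control character on both sides

        // Lengths are printed because the surrounding characters are invisible.
        System.out.println(enQuad.trim().length());    // 5 (nothing removed)
        System.out.println(enQuad.strip().length());   // 3 (both EN QUADs removed)
        System.out.println(control.trim().length());   // 3 (both control chars removed)
        System.out.println(control.strip().length());  // 5 (nothing removed)

        System.out.println(isTrimSpace(0x2000) + " " + isStripSpace(0x2000)); // false true
        System.out.println(isTrimSpace(0x0001) + " " + isStripSpace(0x0001)); // true false
    }
}
```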
The method isWhitespace(char) was added to Character with JDK 1.1, but the method isWhitespace(int) was not introduced to the Character class until JDK 1.5. The latter method (the one accepting a parameter of type int) was added to support supplementary characters. The Javadoc comments for the Character class define supplementary characters (typically modeled with int-based "code point") versus BMP characters (typically modeled with single character):
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values ... A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. ... The methods that only accept a char value cannot support supplementary characters. ... The methods that accept an int value support all Unicode characters, including supplementary characters.
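To make the char versus int distinction from the Javadoc excerpt concrete, here is a small sketch. The class CodePointDemo and the helper countWhitespace are invented for this example; U+1F600 is picked only as an arbitrary supplementary code point.

```java
// Hypothetical demo class; only the Character and String calls are JDK APIs.
class CodePointDemo {

    // Counts whitespace by iterating over int code points rather than char values,
    // so surrogate pairs are examined as one character.
    static long countWhitespace(String s) {
        return s.codePoints().filter(cp -> Character.isWhitespace(cp)).count();
    }

    public static void main(String[] args) {
        // A supplementary character is stored as a pair of char values in UTF-16.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(supplementary.length());                                   // 2
        System.out.println(supplementary.codePointCount(0, supplementary.length()));  // 1
        System.out.println(Character.isSupplementaryCodePoint(0x1F600));              // true
        System.out.println(Character.isHighSurrogate(supplementary.charAt(0)));       // true

        // One ASCII space plus one U+2000; the supplementary character is not whitespace.
        System.out.println(countWhitespace(" a\u2000" + supplementary));              // 2
    }
}
```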
OpenJDK Changeset.
Benchmark comparison between trim() and strip() - Why is String.strip() 5 times faster than String.trim() for blank string in Java 11
Answer by Michael Easter
Here is a unit-test that illustrates the answer by @MikhailKholodkov, using Java 11.
(Note that \u2000 is above \u0020 and is not considered whitespace by trim().)
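// Assumption: JUnit 4 is on the classpath; the original post does not show its imports.
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class StringTestCase {

    // ASCII whitespace is removed by both methods, so the results match.
    @Test
    public void testSame() {
        String s = "\t abc \n";
        assertEquals("abc", s.trim());
        assertEquals("abc", s.strip());
    }

    // U+2000 is Unicode whitespace above U+0020: trim() keeps it, strip() removes it.
    @Test
    public void testDifferent() {
        Character c = '\u2000';
        String s = c + "abc" + c;
        assertTrue(Character.isWhitespace(c));
        assertEquals(s, s.trim());
        assertEquals("abc", s.strip());
    }
}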