Java: How to remove hard spaces with Jsoup?
Original source: http://stackoverflow.com/questions/21137892/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to remove hard spaces with Jsoup?
Asked by Carlos Goce
I'm trying to remove hard spaces (from &nbsp; entities in the HTML). I can't remove them with .trim() or .replace(" ", ""), etc! I don't get it.
I even found a suggestion on Stack Overflow to try \\u00a0, but that didn't work either.
I tried this (since text() returns actual hard space characters, U+00A0):
System.out.println( "'"+fields.get(6).text().replace("\\u00a0", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().replace(" ", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().trim()+"'"); //'94,00 '
System.out.println( "'"+fields.get(6).html().replace("&nbsp;", "")+"'"); //'94,00' works
But I can't figure out why I can't remove the white space with .text().
Accepted answer by T.J. Crowder
Your first attempt was very nearly it; you're quite right that Jsoup maps &nbsp; to U+00A0. You just don't want the double backslash in your string:
System.out.println( "'"+fields.get(6).text().replace("\u00a0", "")+"'" ); //'94,00'
// Just one ------------------------------------------^
replace doesn't use regular expressions, so you aren't trying to pass a literal backslash through to the regex level. You just want to specify character U+00A0 in the string.
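To make the single-versus-double backslash point concrete, here is a minimal sketch that uses only the standard library. The class name ReplaceVsReplaceAll, its method names, and the sample string are illustrative assumptions and not part of the original answer; the sample string stands in for what fields.get(6).text() returns in the question.

```java
class ReplaceVsReplaceAll {
    // Literal replacement: the argument is the single character U+00A0.
    static String literalSingle(String s) {
        return s.replace("\u00a0", "");
    }

    // Literal replacement of a six-character text (backslash, u, 0, 0, a, 0).
    // That text never occurs in the input, so nothing is removed.
    static String literalDouble(String s) {
        return s.replace("\\u00a0", "");
    }

    // Regex replacement: the doubled backslash reaches the regex engine,
    // which understands the escape itself and matches U+00A0.
    static String regexDouble(String s) {
        return s.replaceAll("\\u00a0", "");
    }

    public static void main(String[] args) {
        // Assumed sample value: digits followed by one hard space.
        String text = "94,00\u00a0";
        System.out.println(literalSingle(text).length()); // 5
        System.out.println(literalDouble(text).length()); // 6
        System.out.println(regexDouble(text).length());   // 5
    }
}
```

The lengths are printed rather than the strings because a trailing hard space is hard to see in console output. The double backslash is only appropriate when the method takes a regular expression, as replaceAll does.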
Answer by Ovokerie Ogbeta
The question has been edited to reflect the true problem.
New answer:
The hard space, i.e. the &nbsp; entity (Unicode character NO-BREAK SPACE, U+00A0), can be represented in Java by the character \u00a0, so the code becomes the following, where str is the string obtained from the text() method:
str = str.replaceAll("\u00a0", ""); // strings are immutable, so keep the returned value
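The question also asks why trim() leaves the hard space in place. trim() only strips characters up to U+0020, and Character.isWhitespace deliberately excludes non-breaking spaces, so U+00A0 survives both. The sketch below is one possible way to trim hard spaces from the ends only, leaving interior ones alone. The class HardSpaceTrim and its methods are hypothetical helpers, not part of Jsoup or of the original answer.

```java
class HardSpaceTrim {
    // Hypothetical helper: strips ordinary whitespace and Unicode space
    // separators (which include U+00A0) from both ends of the string.
    static String trimHardSpaces(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isBlank(s.charAt(start))) {
            start++;
        }
        while (end > start && isBlank(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    private static boolean isBlank(char c) {
        // isWhitespace covers tabs, newlines and U+0020 but not U+00A0;
        // isSpaceChar covers the space separator category, including U+00A0.
        return Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    public static void main(String[] args) {
        // Assumed sample value: digits followed by one hard space.
        String text = "94,00\u00a0";
        System.out.println(text.trim().length());           // 6
        System.out.println(trimHardSpaces(text).length());  // 5
    }
}
```

Unlike the replace-based one-liner, this keeps hard spaces that sit between words, which may matter if the text is displayed later.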
Old answer, using the Jsoup library:
import org.jsoup.parser.Parser;
String str1 = Parser.unescapeEntities("last week, Ovokerie Ogbeta", false);
String str2 = Parser.unescapeEntities("Entered &raquo; Here", false);
System.out.println(str1 + " " + str2);
Prints out:
last week, Ovokerie Ogbeta Entered ? Here