How do I convert special characters using Java?
Original question: http://stackoverflow.com/questions/2287473/
License: this content comes from Stack Overflow and is provided under CC BY-SA 4.0. You are free to use and share it under the same license, but you must attribute it to the original authors and cite the original URL.
Asked by Vladimir
I have strings like:
Avery?? Laser &amp; Inkjet Self-Adhesive
I need to convert them to
Avery Laser & Inkjet Self-Adhesive.
I.e. remove special characters and convert HTML special characters to regular ones.
Accepted answer by BalusC
First use StringEscapeUtils#unescapeHtml4() (or #unescapeXml(), depending on the original format) to unescape the &amp; into a &. Then use String#replaceAll() with [^\x20-\x7e] to get rid of characters which aren't inside the printable ASCII range.
Summarized:
String clean = StringEscapeUtils.unescapeHtml4(dirty).replaceAll("[^\\x20-\\x7e]", "");
...which produces
Avery Laser & Inkjet Self-Adhesive
(without the trailing dot as in your example, but that wasn't present in the original ;) )
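The two steps can be tried without any dependency. The sketch below is only an illustration: unescapeAmpOnly() is a hypothetical stand-in for StringEscapeUtils.unescapeHtml4() that handles nothing but &amp;, and the registered sign (U+00AE) in the sample input is an assumption, since the actual special character in the question is not known.

```java
class AsciiCleaner {
    // Hypothetical stand-in for StringEscapeUtils.unescapeHtml4(): decodes only "&amp;".
    static String unescapeAmpOnly(String s) {
        return s.replace("&amp;", "&");
    }

    // Removes every character outside the printable ASCII range 0x20-0x7e.
    // Note the doubled backslashes: the regex engine must receive \x20 and \x7e.
    static String stripNonPrintableAscii(String s) {
        return s.replaceAll("[^\\x20-\\x7e]", "");
    }

    // Unescape first, then strip, in the same order as the one-liner above.
    static String clean(String dirty) {
        return stripNonPrintableAscii(unescapeAmpOnly(dirty));
    }

    public static void main(String[] args) {
        // Assumed input: U+00AE stands in for the unknown special character.
        String dirty = "Avery\u00AE Laser &amp; Inkjet Self-Adhesive";
        System.out.println(clean(dirty)); // prints "Avery Laser & Inkjet Self-Adhesive"
    }
}
```

Keep in mind that the character class also removes tabs, line breaks and every accented letter, so it only fits data that is expected to be plain ASCII.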
That said, this looks more like a request for a workaround than a request for a solution. If you elaborate on the functional requirement and/or where this string originated, we may be able to provide the right solution. The ?? looks like it was caused by reading the string with the wrong character encoding, and the &amp; looks like it was caused by reading the string with a text-based parser instead of a full-fledged HTML parser.
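To illustrate the encoding point, here is a minimal sketch of how one character can turn into two garbage characters when bytes are decoded with the wrong charset. The sample text and the choice of UTF-8 versus ISO-8859-1 are assumptions for demonstration; the encodings actually involved in the question are unknown.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Decodes raw bytes with the given charset, as a reader with that encoding would.
    static String readAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumed sample: "Avery" followed by the registered sign U+00AE.
        String original = "Avery\u00AE";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // U+00AE is two bytes in UTF-8 (0xC2 0xAE); ISO-8859-1 maps each byte to its own character.
        String wrong = readAs(utf8, StandardCharsets.ISO_8859_1);
        String right = readAs(utf8, StandardCharsets.UTF_8);

        System.out.println(wrong.length());         // prints 7: one character became two
        System.out.println(right.equals(original)); // prints true
    }
}
```

If the data is read with the correct charset in the first place, there is nothing left to strip afterwards.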
Answer by Romain Linsolas
You can use the StringEscapeUtils class from the Apache Commons Text project.
Answer by oropher
Maybe you can use something like:
yourTxt = yourTxt.replaceAll("&amp;", "&");
In one project I did something like:
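public String replaceAcutesHTML(String str) {
    // Replace the HTML entities for acute-accented vowels and n-tilde with the literal characters.
    str = str.replaceAll("&aacute;", "á");
    str = str.replaceAll("&eacute;", "é");
    str = str.replaceAll("&iacute;", "í");
    str = str.replaceAll("&oacute;", "ó");
    str = str.replaceAll("&uacute;", "ú");
    // Upper-case entities map to upper-case letters.
    str = str.replaceAll("&Aacute;", "Á");
    str = str.replaceAll("&Eacute;", "É");
    str = str.replaceAll("&Iacute;", "Í");
    str = str.replaceAll("&Oacute;", "Ó");
    str = str.replaceAll("&Uacute;", "Ú");
    str = str.replaceAll("&ntilde;", "ñ");
    str = str.replaceAll("&Ntilde;", "Ñ");
    return str;
}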
Answer by Bala Dutt
In case you want to mimic what the PHP function htmlspecialchars_decode does, use the PHP function get_html_translation_table() to dump the table and then use Java code like:
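// Requires: import java.util.LinkedHashMap; import java.util.Map;
// Insertion-ordered map so that "&amp;" is decoded last;
// decoding it first would turn "&amp;lt;" into "<" instead of "&lt;".
static final Map<String, String> html_specialchars_table = new LinkedHashMap<>();
static {
    html_specialchars_table.put("&lt;", "<");
    html_specialchars_table.put("&gt;", ">");
    html_specialchars_table.put("&amp;", "&");
}
static String htmlspecialchars_decode_ENT_NOQUOTES(String s) {
    for (Map.Entry<String, String> entry : html_specialchars_table.entrySet()) {
        // replace() matches the key literally; replaceAll() would treat it as a regex.
        s = s.replace(entry.getKey(), entry.getValue());
    }
    return s;
}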
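A table of sequential replacements depends on the order in which the entries are applied: if &amp; is decoded before &lt;, the text &amp;lt; ends up as < rather than &lt;. One way around that is to decode in a single pass, so already-decoded output is never scanned again. The following is a minimal sketch under that assumption, not a full port of the PHP function; the class and method names are made up for illustration and only the three entities from the table above are handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassDecoder {
    // Entity name (without "&" and ";") to the character it stands for.
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("lt", "<");
        TABLE.put("gt", ">");
        TABLE.put("amp", "&");
    }

    private static final Pattern ENTITY = Pattern.compile("&(lt|gt|amp);");

    // Scans the input once; text produced by a replacement is never matched again.
    static String decode(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement() guards against "$" and "\" being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(TABLE.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("a &lt; b &amp;&amp; c &gt; d")); // prints "a < b && c > d"
        System.out.println(decode("&amp;lt;"));                     // prints "&lt;"
    }
}
```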