Replace any non-ASCII characters in a string in Java

Question

Asked by leba-lev

How would one convert `-lrb-300-rrb-┬á922-6590` to `-lrb-300-rrb- 922-6590` in Java?

Have tried the following:

t.lemma = lemma.replaceAll("\\p{C}", " ");
t.lemma = lemma.replaceAll("[\u0000-\u001f]", " ");

I am probably missing something conceptual and would appreciate any pointers to a solution.

Thank you

Answer 1

Accepted answer by Paul Vargas

Try the following:

`str = str.replaceAll("[^\\p{ASCII}]", " ");`

By the way, `\p{ASCII}` matches all ASCII characters: `[\x00-\x7F]`.
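As a rough sketch of why the two attempts in the question miss the character while the negated ASCII class catches it: assuming the stray `┬á` stems from a no-break space (U+00A0), that code point is a space separator (category Zs), so neither `\p{C}` nor the `[\u0000-\u001f]` control range matches it. The class name `NonAsciiDemo` and the helper `replaceNonAscii` are hypothetical names used only for illustration.

```java
public class NonAsciiDemo {
    // Hypothetical helper: replaces every non-ASCII character with one space.
    public static String replaceNonAscii(String s) {
        return s.replaceAll("[^\\p{ASCII}]", " ");
    }

    public static void main(String[] args) {
        // Assumption: the problem character is a no-break space (U+00A0).
        String nbsp = "300\u00a0922";

        // U+00A0 is in category Zs, not in "Other" (C), so nothing changes here.
        System.out.println(nbsp.replaceAll("\\p{C}", " ").equals(nbsp)); // true

        // It also lies outside the ASCII control range tried in the question.
        System.out.println(nbsp.replaceAll("[\u0000-\u001f]", " ").equals(nbsp)); // true

        // The negated ASCII class matches it and swaps in a plain space.
        System.out.println(replaceNonAscii(nbsp)); // 300 922
    }
}
```

If the string really contains the two visible characters `┬` and `á` rather than one no-break space, each of them is non-ASCII and each is replaced by its own space.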

On the other hand, you should use a `Pattern` constant to avoid recompiling the expression every time.

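import java.util.regex.Pattern;

// Compile once and reuse; String.replaceAll recompiles the regex on every call.
private static final Pattern REGEX_PATTERN =
        Pattern.compile("[^\\p{ASCII}]");

public static void main(String[] args) {
    String input = "-lrb-300-rrb-┬á922-6590";
    // Each non-ASCII character in the input is replaced by one space.
    System.out.println(
        REGEX_PATTERN.matcher(input).replaceAll(" ")
    );
}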

Answer by assylias

Assuming you only want to keep `a-zA-Z0-9` and punctuation characters, you could do:

t.lemma = lemma.replaceAll("[^\\p{Punct}\\w]", " ");
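
A minimal sketch of how this stricter pattern behaves, again assuming the problem character is a no-break space (U+00A0). The class `KeepWordAndPunctDemo` and the method `keepWordAndPunct` are hypothetical names, not part of the answer. Note that with Java's default flags `\w` is `[a-zA-Z_0-9]` and `\p{Punct}` covers ASCII punctuation only, so ASCII whitespace and control characters are replaced as well.

```java
import java.util.regex.Pattern;

public class KeepWordAndPunctDemo {
    // Matches anything that is neither ASCII punctuation nor a word character.
    private static final Pattern NOT_WORD_OR_PUNCT =
            Pattern.compile("[^\\p{Punct}\\w]");

    // Hypothetical helper: replaces every such character with one space.
    public static String keepWordAndPunct(String s) {
        return NOT_WORD_OR_PUNCT.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Assumption: the input contains a no-break space (U+00A0).
        System.out.println(keepWordAndPunct("-lrb-300-rrb-\u00a0922-6590"));
        // -lrb-300-rrb- 922-6590

        // A tab is ASCII, but it is neither a word character nor punctuation.
        System.out.println(keepWordAndPunct("a\tb")); // a b
    }
}
```

Compared with `[^\\p{ASCII}]`, this variant also turns tabs and newlines into spaces, which may or may not be what the lemma text needs.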
