Original question: http://stackoverflow.com/questions/32696273/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
java replace German umlauts
Asked by user2841991
I have the following problem. I am trying to replace German umlauts like ä, ö, ü in Java. But it simply does not work. Here is my code:
private static String[][] UMLAUT_REPLACEMENTS = {
    { "Ä", "Ae" }, { "Ü", "Ue" }, { "Ö", "Oe" },
    { "ä", "ae" }, { "ü", "ue" }, { "ö", "oe" }, { "ß", "ss" }
};
public static String replaceUmlaute(String orig) {
    String result = orig;
    for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
        result = result.replaceAll(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
    }
    return result;
}
An Ä remains an Ä and so on. I do not know if this issue has something to do with encoding, but the String contains the exact character I am trying to replace.
Thank you in advance
Accepted answer by user2841991
This finally worked for me:
private static String[][] UMLAUT_REPLACEMENTS = {
    { new String("Ä"), "Ae" }, { new String("Ü"), "Ue" }, { new String("Ö"), "Oe" },
    { new String("ä"), "ae" }, { new String("ü"), "ue" }, { new String("ö"), "oe" },
    { new String("ß"), "ss" }
};
public static String replaceUmlaute(String orig) {
    String result = orig;
    for (int i = 0; i < UMLAUT_REPLACEMENTS.length; i++) {
        result = result.replace(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
    }
    return result;
}
So thanks for all your answers and help. In the end it was a mixture of nafas (with the new String) and Joop Eggen (the correct replace statement). You got my upvote, thanks a lot!
Answer by user1438038
Your code looks fine, replaceAll() should work as expected.
Try this, if you also want to preserve capitalization (e.g. ÜBUNG will become UEBUNG, not UeBUNG):
private static String replaceUmlaut(String input) {
    // replace all lower-case umlauts
    String output = input.replace("ü", "ue")
                         .replace("ö", "oe")
                         .replace("ä", "ae")
                         .replace("ß", "ss");
    // first replace all capital umlauts in a non-capitalized context (e.g. Übung);
    // the lookahead is a regex, so this step needs replaceAll() rather than replace()
    output = output.replaceAll("Ü(?=[a-zäöüß ])", "Ue")
                   .replaceAll("Ö(?=[a-zäöüß ])", "Oe")
                   .replaceAll("Ä(?=[a-zäöüß ])", "Ae");
    // now replace all the other capital umlauts
    output = output.replace("Ü", "UE")
                   .replace("Ö", "OE")
                   .replace("Ä", "AE");
    return output;
}
Answer by Vistari
I've just tried to run it and it runs fine.
If you're not using regular expressions then I'd use string.replace rather than string.replaceAll, as it's slightly quicker than the latter. The main difference between them is that replaceAll treats its first argument as a regex.
EDIT: Just noticed that people in the comments said the same before me, so if you've read theirs you can pretty much ignore what I said. As stated, the problem exists elsewhere in your code, as that snippet works as expected.
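To make the difference between the two methods concrete, here is a minimal sketch. The class and method names are made up for illustration; only String.replace, String.replaceAll and Pattern.quote come from the JDK. It uses "." as the search string because that character means "any character" in a regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
final class ReplaceVsReplaceAll {
    // Literal replacement: the target is matched character for character.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." is a metacharacter that matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    // Regex replacement with the target quoted, so it is treated literally again.
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("."), "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b"));     // a-b
        System.out.println(regex("a.b"));       // ---
        System.out.println(quotedRegex("a.b")); // a-b
    }
}
```

For plain umlaut characters both methods give the same result, since none of them is a regex metacharacter; the distinction only matters once the search string contains regex syntax such as a lookahead.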
Answer by Joop Eggen
First there is a tiny issue in Unicode:
ä might be one code point, SMALL_LETTER_A_WITH_UMLAUT (U+00E4), or two code points: SMALL_LETTER_A followed by COMBINING_DIACRITICAL_MARK_UMLAUT (U+0308).
For this one may normalize the Unicode text.
s = Normalizer.normalize(s, Normalizer.Form.NFKC);
The C means compose, and would yield the compact version.
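As a small illustration of why normalization matters here, the sketch below builds the two-code-point form of ä and shows that it only equals the one-code-point form after composing. The class and method names are hypothetical; Normalizer.normalize and Normalizer.Form.NFKC are the JDK calls used in the answer.

```java
import java.text.Normalizer;

// Hypothetical demo class for illustration only.
final class UmlautNormalization {
    // Compose base letter + combining mark into a single code point where possible.
    static String compose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String decomposed = "a\u0308";  // "a" followed by the combining diaeresis
        String precomposed = "\u00E4";  // single code point for a-umlaut
        System.out.println(decomposed.equals(precomposed));          // false
        System.out.println(compose(decomposed).equals(precomposed)); // true
        System.out.println(compose(decomposed).length());            // 1
    }
}
```

A replace() call that searches for the precomposed character will therefore miss decomposed input unless the input is normalized first.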
The second, more prosaic, problem is that the encoding of the Java source in the editor must be the same as the one given to the compiler with javac -encoding ...
You can test whether the encoding is correct by using (test-wise) the u-escaping:
"\u00E4" // instead of "ä"
My guess is that this might be the problem. The international norm seems to have become UTF-8 for Java sources and compilation.
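One way to see what the compiler actually made of a string literal is to print its code points. The helper below is a sketch with made-up names, not something from the answer. The literal in main is written with a u-escape, so its output is the same whatever the source encoding is; replacing it with a typed ä and comparing the output shows whether editor and compiler agree.

```java
// Hypothetical diagnostic helper for illustration only.
final class CodePointDump {
    // Render each code point of s as U+XXXX so mis-decoded literals become visible.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escaped literal: independent of the encoding the file is saved in.
        System.out.println(dump("\u00E4")); // U+00E4
    }
}
```

If a typed ä saved as UTF-8 is compiled as ISO-8859-1 or windows-1252, the same call would be expected to print U+00C3 U+00A4 instead, which is the usual sign of a source encoding mismatch.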
Furthermore you can use
result = result.replace(UMLAUT_REPLACEMENTS[i][0], UMLAUT_REPLACEMENTS[i][1]);
without regex replace, which is faster.
Answer by Klas Lindbäck
Works fine when I try it, so it must be an encoding issue.
Check your system encoding. You may want to add -encoding UTF-8 to your javac compiler command line.
-encoding encoding
Set the source file encoding name, such as EUC-JP and UTF-8. If -encoding is not specified, the platform default converter is used.
Answer by nafas
ENCODING ENCODING ENCODING....
Different sources of input may result in complications in the String encoding. For example, one may use UTF-8 encoding while the other uses ISO.
Some people suggested that the code works for them; therefore, it's most likely that your Strings have different encodings while being processed (different encodings result in different byte arrays, thus no replacing...).
To solve your problem at its root, you must make sure each of your sources uses exactly the same encoding.
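The effect described above can be reproduced in a few lines. This is only a sketch with hypothetical names: it encodes Ä as UTF-8 and then deliberately decodes those bytes as ISO-8859-1, which is roughly what happens when input arrives in one encoding and is read as another.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
final class MisdecodedUmlaut {
    // Encode as UTF-8, then (wrongly) decode the same bytes as ISO-8859-1.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String umlaut = "\u00C4";           // capital A-umlaut
        String broken = misdecode(umlaut);  // two chars: U+00C3 U+0084
        System.out.println(broken.length());                              // 2
        System.out.println(broken.replace(umlaut, "Ae").equals(broken));  // true: nothing was replaced
    }
}
```

The mis-decoded string no longer contains the character being searched for, so replace() silently leaves it unchanged.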
Try this exercise; hopefully it helps you to solve your problem:
1-try this:
// needs: java.util.Arrays, java.nio.charset.Charset, java.nio.charset.StandardCharsets
System.out.println(Arrays.toString("Ä".getBytes())); // 1: platform default charset
System.out.println(Arrays.toString("Ä".getBytes(StandardCharsets.UTF_8))); // 2: 1 and 2 should match if the default is UTF-8
System.out.println(Arrays.toString("Ä".getBytes(Charset.forName("UTF-32")))); // 3: should differ from 1 and 2
System.out.println(Arrays.toString(orig.getBytes(StandardCharsets.UTF_8))); // look for the byte pattern from 2 (this bit is the hard bit I guess)
System.out.println(Arrays.toString(orig.getBytes(Charset.forName("UTF-32")))); // look for the byte pattern from 3
The next step is to see how the orig string is formed. For example, if you have received it from the web, make sure your POST and GET methods are using your preferred encoding.
EDIT 1:
try this:
{ { new String("Ä".getBytes(),"UTF-8"), "Ae" }, ... };
if this one didn't work try this:
byte[] bytes = {-61,-124}; //byte representation of Ä in utf-8
String Ae = new String(bytes,"UTF-8");
{ { Ae, "Ae" }, ... }; //and do for the rest
Answer by dermoritz
I had to modify the answer of user1438038:
private static String replaceUmlaute(String output) {
    String newString = output.replace("\u00fc", "ue")
            .replace("\u00f6", "oe")
            .replace("\u00e4", "ae")
            .replace("\u00df", "ss")
            .replaceAll("\u00dc(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ue")
            .replaceAll("\u00d6(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Oe")
            .replaceAll("\u00c4(?=[a-z\u00e4\u00f6\u00fc\u00df ])", "Ae")
            .replace("\u00dc", "UE")
            .replace("\u00d6", "OE")
            .replace("\u00c4", "AE");
    return newString;
}
This should work on any target platform (I had problems on a Tomcat on Windows).
Answer by JRA_TLL
If you use Apache Commons or Commons3 in your project, it would be most efficient to use a class like this:
import org.apache.commons.lang3.StringUtils;

public class UmlautCleaner {
    private static final String[] UMLAUTE = new String[] {"Ä", "Ö", "Ü", "ä", "ö", "ü", "ß"};
    private static final String[] UMLAUTE_REPLACEMENT = new String[] {"AE", "OE", "UE", "ae", "oe", "ue", "ss"};

    private UmlautCleaner() {
    }

    // Replace German umlauts first, then strip any remaining accents.
    public static String cleanSonderzeichen(final String s) {
        return StringUtils.stripAccents(StringUtils.replaceEach(s, UMLAUTE, UMLAUTE_REPLACEMENT));
    }
}
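For projects without Apache Commons, the same table-driven idea can be sketched with the JDK alone. Everything below (class name, table, method) is hypothetical and only mirrors the replacement pairs above. It does not strip other accents the way StringUtils.stripAccents does, and Map.of requires Java 9 or later.

```java
import java.util.Map;

// Hypothetical dependency-free variant, for illustration only.
final class UmlautTable {
    // Same pairs as the arrays above, written with u-escapes to avoid
    // depending on the source file encoding.
    private static final Map<Character, String> TABLE = Map.of(
            '\u00C4', "AE", '\u00D6', "OE", '\u00DC', "UE",
            '\u00E4', "ae", '\u00F6', "oe", '\u00FC', "ue",
            '\u00DF', "ss");

    // Single pass over the input: copy each char, or its replacement if it has one.
    static String clean(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String replacement = TABLE.get(c);
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("\u00DCbung"));   // UEbung
        System.out.println(clean("Stra\u00DFe"));  // Strasse
    }
}
```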