JavaScript: Replacing umlauts in JS
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/11652681/
Asked by SamiSalami
I am comparing strings and have to replace umlauts in JS, but it seems JS does not recognize the umlauts in the strings. The text comes from the database and in the browser the umlauts do show fine.
function replaceUmlauts(string)
{
    var value = string.toLowerCase();
    value = value.replace(/ä/g, 'ae');
    value = value.replace(/ö/g, 'oe');
    value = value.replace(/ü/g, 'ue');
    return value;
}
As search patterns I tried:
- "ä", "ö", "ü"
- /ä/, /ö/, /ü/
- "&auml;", "&ouml;", "&uuml;" (well, total despair ;-))
To make sure that the replace function is not the problem, I tried indexOf:
console.log(value.indexOf('ä'));
But the output with all patterns is: -1
So I guess it is some kind of encoding problem, but as I said, the umlauts look fine on the page.
Any ideas? This seems so simple...
EDIT: Even though I found my answer, the problem was not really solved "at the root" (the encoding). This is my page encoding:
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
The database has: utf8_general_ci
Seems totally alright to me.
Answered by Oleg V. Volkov
Either ensure that your script's encoding is correctly specified (in the <script> tag, or in the page's header/meta if the script is embedded), or specify symbols with the \uNNNN syntax, which always unambiguously resolves to a specific Unicode code point.
For example:
str.replace(/\u00e4/g, "ae")
This will always replace ä with ae, no matter what encoding is set for your page/script, even if it is incorrect.
Here are the codes needed for Germanic languages:
// Ü, ü     \u00dc, \u00fc
// Ä, ä     \u00c4, \u00e4
// Ö, ö     \u00d6, \u00f6
// ß        \u00df
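As a minimal sketch, these escapes could be applied to the function from the question as follows. The name replaceUmlautsEscaped is hypothetical, and lower-casing the input first simply follows the question's approach; adapt it if the case has to be preserved.

```javascript
// Hypothetical helper: same idea as the question's function, but written
// with \uNNNN escapes so the encoding of the source file does not matter.
function replaceUmlautsEscaped(input) {
  return input
    .toLowerCase()
    .replace(/\u00e4/g, 'ae')  // a-umlaut
    .replace(/\u00f6/g, 'oe')  // o-umlaut
    .replace(/\u00fc/g, 'ue')  // u-umlaut
    .replace(/\u00df/g, 'ss'); // sharp s
}

// The test string is also written with escapes: "Ärger über Straße".
console.log(replaceUmlautsEscaped('\u00c4rger \u00fcber Stra\u00dfe'));
// prints "aerger ueber strasse"
```

Because both the patterns and the sample input use escapes, the result does not depend on how the script file itself is saved or served.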
Answered by Andreas Richter
If you want to replace German umlauts while respecting the case, use this (open source, happy to share, all by me):
const umlautMap = {
  '\u00dc': 'UE',
  '\u00c4': 'AE',
  '\u00d6': 'OE',
  '\u00fc': 'ue',
  '\u00e4': 'ae',
  '\u00f6': 'oe',
  '\u00df': 'ss',
};

function replaceUmlaute(str) {
  return str
    // Capital umlaut followed by a lower-case letter: keep only the first
    // replacement letter upper case (e.g. "Üb" -> "Ueb").
    .replace(/[\u00dc\u00c4\u00d6][a-z]/g, (a) => {
      const big = umlautMap[a.slice(0, 1)];
      return big.charAt(0) + big.charAt(1).toLowerCase() + a.slice(1);
    })
    // Map all remaining umlauts and ß directly. The keys are joined without
    // '|', because inside a character class '|' would match a literal pipe.
    .replace(new RegExp('[' + Object.keys(umlautMap).join('') + ']', 'g'),
      (a) => umlautMap[a]
    );
}

const test = ['Übung', 'ÜBUNG', 'üben', 'einüben', 'EINÜBEN', 'Öde ätzende scheiß Übung'];
test.forEach((str) => console.log(str + " -> " + replaceUmlaute(str)));
It will convert:
- Übung -> Uebung
- ÜBUNG -> UEBUNG
- üben -> ueben
- einüben -> einueben
- EINÜBEN -> EINUEBEN
- and the same for ä, ö
- and simple ß -> ss
Answered by Fidel Gonzo
Here's a function that replaces the most common characters to produce a Google-friendly SEO URL:
function deUmlaut(value){
    value = value.toLowerCase();
    value = value.replace(/ä/g, 'ae');
    value = value.replace(/ö/g, 'oe');
    value = value.replace(/ü/g, 'ue');
    value = value.replace(/ß/g, 'ss');
    value = value.replace(/ /g, '-');
    value = value.replace(/\./g, '');
    value = value.replace(/,/g, '');
    value = value.replace(/\(/g, '');
    value = value.replace(/\)/g, '');
    return value;
}
Answered by Larry K
You first need to figure out what the character codes are that you're trying to replace. For example, depending on the character encoding, the characters could be in ISO-8859, UTF-8, or something else. They could also be character entities such as "&auml;".
Rather than guessing, print them out.
And beware that your incoming data may not use the same character set/character encoding consistently--you need to check on where the data is coming from.
So look at the incoming data by using string.charCodeAt.
Check the character code before the toLowerCase call to ensure that it is not changing things on you. You'll need to debug step by step.
Finally, check the character set settings in your editor to ensure that the ä you typed is what it should be. You may want to specify it via its Unicode escape value (for example \u00e4) rather than typing ä, ö, etc.
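The inspection step described in this answer can be sketched as follows. The helper name dumpCharCodes is hypothetical; it relies only on the standard charCodeAt method. A precomposed ä appears as the single code unit 00e4, while a decomposed ä (the letter a followed by a combining diaeresis) appears as 0061 0308. Both render identically, so a mismatch like this could explain an indexOf result of -1, though whether that applies to the asker's data is only an assumption.

```javascript
// Hypothetical helper: list each UTF-16 code unit of a string as 4-digit hex.
function dumpCharCodes(text) {
  const codes = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i).toString(16).padStart(4, '0'));
  }
  return codes.join(' ');
}

console.log(dumpCharCodes('\u00e4'));  // precomposed form: 00e4
console.log(dumpCharCodes('a\u0308')); // decomposed form: 0061 0308
console.log(dumpCharCodes('&auml;'));  // an undecoded HTML entity shows up as plain ASCII codes
```

Running the same helper on the value from the database and on the search pattern makes it easy to see whether the two actually contain the same code units.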