JavaScript: replacing umlauts in JS

Question

Asked by SamiSalami

I am comparing strings and have to replace umlauts in JS, but it seems JS does not recognize the umlauts in the strings. The text comes from the database and in the browser the umlauts do show fine.

function replaceUmlauts(string)
{
    value = string.toLowerCase();
    value = value.replace(/ä/g, 'ae');
    value = value.replace(/ö/g, 'oe');
    value = value.replace(/ü/g, 'ue');
    return value;
}

As search patterns I tried:

"ä", "ö", "ü"
/ä/, /ö/, /ü/
"&auml;", "&ouml;", "&uuml;" (well, total despair ;-))

To make sure the problem is not with the replace function, I tried indexOf:

console.log(value.indexOf('ä'));

But the output with all patterns is: -1

So I guess it is some kind of encoding problem, but as I said, the umlauts look fine on the page.

Any ideas? This seems so simple...

EDIT: Even though I found my answer, the problem was not really solved "at the root" (the encoding). This is my page encoding:

<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

The database has: utf8_general_ci

Seems totally alright to me.

Answer 1

Answered by Oleg V. Volkov

Either ensure that your script's encoding is correctly specified (in the <script> tag, or in the page's header/meta if it is embedded), or specify the characters with the \uNNNN syntax, which always resolves unambiguously to a specific Unicode code point.

For example:

str.replace(/\u00e4/g, "ae")

This will always replace ä with ae, no matter what encoding is set for your page/script, even if it is incorrect.

Here are the codes needed for Germanic languages:

// Ü, ü     \u00dc, \u00fc
// Ä, ä     \u00c4, \u00e4
// Ö, ö     \u00d6, \u00f6
// ß        \u00df
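
Putting these escapes together, a minimal sketch of the question's function might look like the following. The names `UMLAUT_ESCAPES` and `replaceUmlautsEscaped` are made up for illustration, and the lower-casing step simply mirrors the code in the question; only the \uNNNN escapes come from this answer.

```javascript
// Hypothetical helper names; only the \uNNNN escapes come from the answer.
const UMLAUT_ESCAPES = {
  '\u00e4': 'ae', // ä
  '\u00f6': 'oe', // ö
  '\u00fc': 'ue', // ü
  '\u00df': 'ss', // ß
};

function replaceUmlautsEscaped(input) {
  // Lower-case first, as the question does, so Ä/Ö/Ü become ä/ö/ü.
  return input
    .toLowerCase()
    .replace(/[\u00e4\u00f6\u00fc\u00df]/g, (ch) => UMLAUT_ESCAPES[ch]);
}

// The input is also written with escapes, so the file encoding does not matter.
console.log(replaceUmlautsEscaped('\u00c4rger \u00fcber Stra\u00dfe'));
// aerger ueber strasse
```

Because neither the pattern nor the lookup table contains a literal non-ASCII character, the source file can be saved or served in any encoding without changing what is matched.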

Answer 2

Answered by Andreas Richter

If you want to replace the German umlauts while respecting case, use this (open source, happy to share, all written by me):

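const umlautMap = {
  '\u00dc': 'UE', // Ü
  '\u00c4': 'AE', // Ä
  '\u00d6': 'OE', // Ö
  '\u00fc': 'ue', // ü
  '\u00e4': 'ae', // ä
  '\u00f6': 'oe', // ö
  '\u00df': 'ss', // ß
};

function replaceUmlaute(str) {
  return str
    // Capital umlaut followed by a lowercase letter: capitalize only the
    // first replacement letter (Ü -> Ue). No "|" inside the character
    // class, otherwise a literal pipe in the input would match too.
    .replace(/[\u00dc\u00c4\u00d6][a-z]/g, (a) => {
      const big = umlautMap[a.slice(0, 1)];
      return big.charAt(0) + big.charAt(1).toLowerCase() + a.slice(1);
    })
    // Every remaining umlaut or ß: plain lookup in the map.
    .replace(new RegExp('[' + Object.keys(umlautMap).join('') + ']', 'g'),
      (a) => umlautMap[a]
    );
}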

const test = ['Übung', 'ÜBUNG', 'üben', 'einüben', 'EINÜBEN', 'Öde ätzende scheiß Übung'];
test.forEach((str) => console.log(str + " -> " + replaceUmlaute(str)));

It will:

Übung -> Uebung
ÜBUNG -> UEBUNG
üben -> ueben
einüben -> einueben
EINÜBEN -> EINUEBEN
and the same for ä, ö
and simply ß -> ss

Answer 3

Answered by Fidel Gonzo

Here's a function that replaces the most common characters to produce a Google-friendly SEO URL:

function deUmlaut(value){
  value = value.toLowerCase();
  value = value.replace(/ä/g, 'ae');
  value = value.replace(/ö/g, 'oe');
  value = value.replace(/ü/g, 'ue');
  value = value.replace(/ß/g, 'ss');
  value = value.replace(/ /g, '-');
  value = value.replace(/\./g, '');
  value = value.replace(/,/g, '');
  value = value.replace(/\(/g, '');
  value = value.replace(/\)/g, '');
  return value;
}

Answer 4

Answered by Larry K

You first need to figure out what the character codes are that you're trying to replace. For example, depending on the character encoding, the characters could be in ISO 8859, UTF-8 or something else. They could also be HTML character entities such as "&auml;".

Rather than guessing, print them out.

And beware that your incoming data may not use the same character set/character encoding consistently--you need to check on where the data is coming from.

So look at the incoming data by using string.charCodeAt.
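
A minimal sketch of that inspection step follows. The helper name `dumpCodes` is hypothetical; `charCodeAt` and `normalize` are standard String methods. One thing such a dump can reveal (a possible cause, not something confirmed in the question) is an umlaut stored as a base letter plus a combining diaeresis (U+0308), which looks identical on the page but never matches a precomposed ä.

```javascript
// Hypothetical helper: list each UTF-16 code unit of a string as 4-digit hex.
function dumpCodes(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    codes.push(str.charCodeAt(i).toString(16).padStart(4, '0'));
  }
  return codes.join(' ');
}

const precomposed = '\u00e4'; // ä as a single code point
const decomposed = 'a\u0308'; // a + combining diaeresis, renders the same

console.log(dumpCodes(precomposed)); // 00e4
console.log(dumpCodes(decomposed));  // 0061 0308

// Searching the decomposed form for a precomposed ä finds nothing.
console.log(decomposed.indexOf('\u00e4')); // -1

// normalize('NFC') composes the pair, after which the search succeeds.
console.log(decomposed.normalize('NFC').indexOf('\u00e4')); // 0
```

If the dump shows something other than 00e4 for a visible ä, the replace patterns need to target whatever codes are actually present, or the input needs to be normalized first.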

Check the character code before the toLowerCase call to ensure that it is not changing things on you. You'll need to debug step by step.

Finally, check the character set settings in your editor to ensure that your typed ä is what it should be. You may want to specify it via its Unicode code value rather than typing ä, ö, etc.
