Original URL: http://stackoverflow.com/questions/4804885/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Remove umlauts or special characters in a JavaScript string
Question by Frank
I've never worked with umlauts or special characters in JavaScript strings before. My problem is: how do I remove them?
For example, I have this in JavaScript:
var oldstr = "Bayern München";
var str = oldstr.split(' ').join('-');
The result is Bayern-München. OK, that was easy, but now I want to remove the umlaut or special character from strings like:
Real Sporting de Gijón.
How can I do this?
Kind regards,
亲切的问候,
Frank
Answer by T.J. Crowder
replace should be able to do it for you, e.g.:
var str = str.replace(/ü/g, 'u');
...of course ü and u are not the same letter. :-)
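One detail worth spelling out: the g (global) flag on the regex is what makes replace change every occurrence; without it, only the first match is replaced. A small sketch, using a made-up sample string:

```javascript
// Sample text containing two ü characters (made up for illustration).
const city = "Münster und München";

// Without the g flag, replace() stops after the first match.
const firstOnly = city.replace(/ü/, "u");

// With the g flag, every match is replaced.
const allOfThem = city.replace(/ü/g, "u");

console.log(firstOnly); // Munster und München
console.log(allOfThem); // Munster und Munchen
```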
If you're trying to replace all characters outside a given range with something (like a -), you can do that by specifying a range:
var str = str.replace(/[^A-Za-z0-9\-_]/g, '-');
That replaces all characters that aren't English letters, digits, -, or _ with -. (The character range is the [...] bit; the ^ at the beginning means "not".)
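Wrapped in a function (the name dashOutsideRange is just for illustration), the range-based replacement can be tried on both of the question's examples, which makes it easy to see what happens to the accented letters:

```javascript
// Hypothetical helper name; the regex is the one from the answer.
// [^...] matches any single character NOT in the listed set.
function dashOutsideRange(s) {
  return s.replace(/[^A-Za-z0-9\-_]/g, "-");
}

console.log(dashOutsideRange("Bayern München"));          // Bayern-M-nchen
console.log(dashOutsideRange("Real Sporting de Gijón.")); // Real-Sporting-de-Gij-n-
```

Note that the space, the accented letter and the trailing period are all treated the same way: each one becomes a single dash.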
But that ("Bayern-M-nchen") may be a bit unpleasant for Mr. München to look at. :-) You could pass a function to replace to try to just drop the diacriticals:
var str = str.replace(/[^A-Za-z0-9\-_]/g, function(ch) {
  // Characters that look a bit like 'a'
  if ("áàâä".indexOf(ch) >= 0) { // There are a lot more than this
    return 'a';
  }
  // Characters that look a bit like 'u'
  if ("úùûü".indexOf(ch) >= 0) { // There are a lot more than this
    return 'u';
  }
  /* ...long list of others...*/
  // Default: anything else becomes a dash
  return '-';
});
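The chain of indexOf checks can also be expressed as a lookup table consulted inside the replace callback. This is a sketch of that variation, not the answer's own code: the names FOLD_MAP and foldChars and the table's contents are assumptions, and the table is deliberately incomplete (uppercase letters such as Ü, for example, are not in it and still become a dash):

```javascript
// Hypothetical, deliberately incomplete table mapping accented
// lowercase letters to a plain ASCII look-alike.
const FOLD_MAP = new Map(Object.entries({
  "á": "a", "à": "a", "â": "a", "ä": "a",
  "é": "e", "è": "e", "ê": "e", "ë": "e",
  "í": "i", "ì": "i", "î": "i", "ï": "i",
  "ó": "o", "ò": "o", "ô": "o", "ö": "o",
  "ú": "u", "ù": "u", "û": "u", "ü": "u",
  "ñ": "n", "ç": "c"
}));

// Same regex as the answer; the callback looks each matched character
// up in the table and falls back to '-' for anything it does not know.
function foldChars(s) {
  return s.replace(/[^A-Za-z0-9\-_]/g, function (ch) {
    return FOLD_MAP.has(ch) ? FOLD_MAP.get(ch) : "-";
  });
}

console.log(foldChars("Bayern München"));          // Bayern-Munchen
console.log(foldChars("Real Sporting de Gijón.")); // Real-Sporting-de-Gijon-
```

A Map keeps all the replacements in one place instead of one if per group; for a table this small, any speed difference is unlikely to matter.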
The above is optimized for long strings. If the string itself is short, you may be better off with repeated regexps:
var str = str.replace(/[áàâä]/g, 'a')
             .replace(/[úùûü]/g, 'u')
             .replace(/[^A-Za-z0-9\-_]/g, '-');
...but that's speculative.
Note that literal characters in JavaScript strings are totally fine, but you can run into fun with the encoding of your files. I tend to stick to Unicode escapes. So for instance, the above would be:
var str = str.replace(/[\u00e4\u00e2\u00e0\u00e1]/g, 'a')
             .replace(/[\u00fc\u00fb\u00f9\u00fa]/g, 'u')
             .replace(/ /g, '-'); // regex with g: a plain ' ' pattern would replace only the first space
...but again, there are a lot more to do...
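For the "lot more" cases, a different technique than the one in this answer is available in engines that support String.prototype.normalize (added in ES2015): convert the string to canonical decomposition form (NFD), which splits a letter like ü into u followed by a combining diacritical mark, then delete the marks in the Combining Diacritical Marks block (U+0300 to U+036F). A sketch, with hypothetical function names stripDiacritics and slugify; letters that have no canonical decomposition, such as ß, ø or æ, pass through stripDiacritics unchanged:

```javascript
// NFD turns "ü" (U+00FC) into "u" + U+0308 COMBINING DIAERESIS,
// so the accent becomes a separate code point that can be deleted.
function stripDiacritics(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical helper combining both steps: strip accents first, then
// collapse any run of remaining disallowed characters into one '-'.
function slugify(s) {
  return stripDiacritics(s).replace(/[^A-Za-z0-9\-_]+/g, "-");
}

console.log(stripDiacritics("Bayern München"));  // Bayern Munchen
console.log(slugify("Real Sporting de Gijón.")); // Real-Sporting-de-Gijon-
console.log(slugify("Straße"));                  // Stra-e
```

Unlike the answer's regex, slugify uses + so that a run of several disallowed characters becomes a single dash; drop the + to keep the one-dash-per-character behavior.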

