Remove zero-width space characters from a JavaScript string

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11305797/



Tags: javascript, unicode

Asked by user1437328

I take user input (JS code) and execute (process) it in real time to show some output.

Sometimes the code contains zero-width spaces, which is really weird. I don't know how users are entering them. Example: "(\u200B$".length === 3, where the character between ( and $ is an invisible zero-width space, written here as the escape \u200B.

I need to be able to remove that character from my code in JS. How do I do so? Or is there some other way to execute that JS code so that the browser doesn't take the zero-width space characters into account?
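To see which invisible character is actually present in a piece of input, one option is to print each character's code point. This is only a diagnostic sketch; describeCodePoints is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper (not from the original post): list each character's
// code point so that invisible characters become visible.
function describeCodePoints(input) {
    return Array.from(input, function (ch) {
        var hex = ch.codePointAt(0).toString(16).toUpperCase();
        // Pad to at least four hex digits, e.g. 28 -> 0028.
        return 'U+' + ('0000' + hex).slice(-Math.max(4, hex.length));
    });
}

console.log(describeCodePoints('(\u200B$').join(' '));
// U+0028 U+200B U+0024
```

The middle entry, U+200B, is the zero-width space that makes the string three characters long.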

Answered by Mathias Bynens

Unicode has the following zero-width characters:


  • U+200B zero width space
  • U+200C zero width non-joiner
  • U+200D zero width joiner
  • U+FEFF zero width no-break space

To remove them from a string in JavaScript, you can use a simple regular expression:


    var userInput = 'a\u200Bb\u200Cc\u200Dd\uFEFFe';
    console.log(userInput.length); // 9
    var result = userInput.replace(/[\u200B-\u200D\uFEFF]/g, '');
    console.log(result.length); // 5

Note that there are many more symbols that may not be visible, such as some of ASCII's control characters.
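If a wider net is wanted, one possible approach (not part of this answer) is to match the whole Unicode "format" category, Cf, which contains all four characters listed above. The sketch below assumes a runtime that supports ES2018 Unicode property escapes; stripFormatChars is a hypothetical name.

```javascript
// Sketch only: strip every character in Unicode general category Cf
// ("format"), which includes U+200B, U+200C, U+200D and U+FEFF.
// Assumes a runtime with ES2018 Unicode property escapes (\p{...} with the u flag).
function stripFormatChars(input) {
    return input.replace(/\p{Cf}/gu, '');
}

// U+2060 (word joiner) is another invisible Cf character.
var sample = 'a\u200Bb\u200Cc\u200Dd\uFEFFe\u2060f';
console.log(stripFormatChars(sample)); // abcdef
```

This is deliberately aggressive: U+200C and U+200D are meaningful in some scripts and in emoji sequences, so removing them can change how legitimate text renders. It also does not touch ASCII control characters, which belong to a different category (Cc).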

Answered by Technotronic

I had a problem where some invisible characters were corrupting my JSON and causing an Unexpected Token ILLEGAL exception, which was crashing my site.

Here is my solution using a RegExp variable:

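    // U+2028 is the line separator and U+2029 the paragraph separator.
    // The "g" flag removes every occurrence, not just the first one.
    var re = new RegExp("\u2028|\u2029", "g");
    var result = text.replace(re, '');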

You can find more about JavaScript and zero-width spaces here: Zero Width Spaces

Answered by Tarek Salah uddin Mahmud

    str.replace(/\u200B/g, '');

200B is the hexadecimal form of 8203, the code point of the zero-width space. Replace it with an empty string to remove it.
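A quick way to check the hex/decimal relationship, and to see the replacement on a sample, is sketched below. The input string is a made-up example, not from the answer.

```javascript
// 0x200B and 8203 are the same number written in two bases.
console.log(parseInt('200B', 16));      // 8203
console.log('\u200B'.charCodeAt(0));    // 8203

// Hypothetical sample input containing two zero-width spaces.
var str = 'foo\u200Bbar\u200B';
// replace returns a new string; the original is left unchanged.
var cleaned = str.replace(/\u200B/g, '');
console.log(str.length, cleaned.length); // 8 6
```

Since strings are immutable, the result of replace has to be assigned somewhere; calling it on its own does not modify str.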

Answered by Florian Margaine

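    // Keep every character except U+200B (char code 8203),
    // then join the remaining characters back into a string.
    var result = [].filter.call( str, function( c ) {
        return c.charCodeAt( 0 ) !== 8203;
    } ).join( '' );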

Filter each character to remove the one with char code 8203 (the Unicode number of the zero-width space).
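One thing to keep in mind with this approach is that filter yields an array of characters, so a join is needed to get a string back. A minimal sketch of the same idea wrapped in a function, using Array.from instead of [].filter.call (removeZeroWidthSpace is a hypothetical name):

```javascript
// Sketch: same filtering idea as above, returning a string.
function removeZeroWidthSpace(str) {
    return Array.from(str)
        .filter(function (c) {
            // 8203 is the decimal char code of U+200B.
            return c.charCodeAt(0) !== 8203;
        })
        .join('');
}

// Without join, the filter step alone produces an array, not a string.
var asArray = [].filter.call('a\u200Bb', function (c) {
    return c.charCodeAt(0) !== 8203;
});
console.log(Array.isArray(asArray));            // true
console.log(removeZeroWidthSpace('a\u200Bb'));  // ab
```

For long inputs the regular-expression version is likely simpler and faster, but the filter form can be handy when the per-character test is more involved than a single code.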