Original source: http://stackoverflow.com/questions/24205193/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
JavaScript remove ZERO WIDTH SPACE (unicode 8203) from string
Asked by Shaggydog
I'm writing some JavaScript that processes website content. My efforts are being thwarted by the SharePoint text editor's tendency to put the "zero width space" character in the text when the user presses backspace. The character's Unicode value is 8203, or B200 in hexadecimal. I've tried to use the default "replace" function to get rid of it. I've tried many variants, and none of them worked:
var a = "o?m"; //the invisible character is between o and m
var b = a.replace(/\u8203/g,'');
b = a.replace(/\uB200/g,'');
b = a.replace("\uB200",'');
And so on and so forth. I've tried quite a few variations on this theme. None of these expressions work (tested in Chrome and Firefox). The only thing that works is typing the actual character into the expression:
var b = a.replace("?",''); //it's there, believe me
This poses potential problems. The character is invisible, so that line in itself doesn't make sense. I can get around that with comments. But if the code is ever reused and the file is saved using a non-Unicode encoding (or when it's deployed to SharePoint, where there's no guarantee the encoding won't get messed up), it will stop working. Is there a way to write this using the Unicode notation instead of the character itself?
[My ramblings about the character]
In case you haven't met this character, (and you probably haven't, seeing as it's invisible to the naked eye, unless it broke your code and you discovered it while trying to locate the bug) it's a real a-hole that will cause certain types of pattern matching to malfunction. I've caged the beast for you:
[?] <- careful, don't let it escape.
If you want to see it, copy those brackets into a text editor and then iterate your cursor through them. You'll notice you'll need three steps to pass what seems like 2 characters, and your cursor will skip a step in the middle.
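Stepping the cursor through the text is one way to spot the character. Another is to print the code units of the string. The following is only a rough sketch: `showCodeUnits` is a hypothetical helper name, not something from the original post, and the sample string uses the `\u200B` escape so the invisible character cannot be lost when the file is saved.

```javascript
// Hypothetical helper (not from the original post): list each UTF-16 code
// unit of a string as "U+XXXX" so that invisible characters show up.
function showCodeUnits(str) {
  return str.split('').map(function (ch) {
    var hex = ch.charCodeAt(0).toString(16).toUpperCase();
    // Pad to at least four hex digits, e.g. "5B" -> "005B".
    while (hex.length < 4) {
      hex = '0' + hex;
    }
    return 'U+' + hex;
  }).join(' ');
}

// Two brackets with a zero-width space caged between them.
var caged = '[\u200B]';

console.log(caged.length);         // 3
console.log(showCodeUnits(caged)); // U+005B U+200B U+005D
```

The length of 3 for what looks like two characters is the same effect as the extra cursor step described above.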
Answered by T.J. Crowder
The number in a unicode escape should be in hex, and the hex for 8203 is 200B (which is indeed a Unicode zero-width space), so:
var b = a.replace(/\u200B/g,'');
Live example:
var a = "o\u200Bm"; // the invisible character is between o and m
var b = a.replace(/\u200B/g,'');
console.log("a.length = " + a.length); // 3
console.log("a === 'om'? " + (a === 'om')); // false
console.log("b.length = " + b.length); // 2
console.log("b === 'om'? " + (b === 'om')); // true
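If in doubt about the hex value, the conversion can be checked in JavaScript itself. This is just an illustrative sketch (the variable names are made up for the example): it converts the decimal code 8203 to hex and confirms that `\u200B`, not `\u8203`, denotes that character. `\u8203` is read as hex 8203, which is a different character altogether.

```javascript
var decimal = 8203;

// Convert the decimal code to the hex digits a \u escape expects.
var hex = decimal.toString(16).toUpperCase();
console.log(hex); // "200B"

// Build the character from its decimal code and compare it with the escapes.
var zwsp = String.fromCharCode(decimal);
console.log(zwsp === '\u200B'); // true
console.log(zwsp === '\u8203'); // false: \u8203 means hex 8203, another character

// The character built from the decimal code can also drive the replacement.
var cleaned = 'o\u200Bm'.replace(new RegExp(zwsp, 'g'), '');
console.log(cleaned); // "om"
```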
Answered by Adrian Rosca
The accepted answer didn't work for my case.
But this one did:
text.replace(/(^[\s\u200b]*|[\s\u200b]*$)/g, '')
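Note that this pattern behaves differently from the accepted answer: it is anchored to the start and end of the string, so it trims leading and trailing whitespace and zero-width spaces but leaves any in the middle untouched. A small comparison, using a made-up sample string, might look like this:

```javascript
// Sample input (an assumption for illustration): zero-width spaces at both
// ends and one in the middle, plus ordinary spaces.
var text = '\u200B  o\u200Bm \u200B';

// Anchored pattern from this answer: trims only the two ends.
var trimmed = text.replace(/(^[\s\u200b]*|[\s\u200b]*$)/g, '');
console.log(trimmed.length);          // 3: "o", zero-width space, "m"
console.log(trimmed === 'o\u200Bm');  // true

// Global pattern from the accepted answer: removes every zero-width space
// but keeps ordinary whitespace.
var stripped = text.replace(/\u200B/g, '');
console.log(stripped === '  om ');    // true

// Combining both ideas: remove all zero-width spaces, then trim whitespace.
var both = text.replace(/\u200B/g, '').trim();
console.log(both === 'om');           // true
```

Which variant fits depends on whether the stray characters can appear inside the text or only around it.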