JavaScript: How to reliably strip invisible characters that break code?
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/11554416/
How to reliably strip invisible characters that break code?
Asked by Steven Lu
I am trying to build a bookmarklet and got slammed with this issue, which I was only just able to figure out: a \u8203 character in my block of code, which Chrome unhelpfully reports (upon pasting into the JS console) as "Invalid character ILLEGAL".
Luckily, Safari was the one that told me it was a \u8203.
I am editing the code in the Sublime Text 2 editor and somehow copying in and out of it (I also tried TextEdit) fails to remove it.
Is there some sort of website somewhere that will strip all characters other than ASCII?
When I try to save as ISO 8859, it saves the file back as UTF-8 "because of unsupported characters".
... Yeah. That's the point. Get rid of my unsupported evil characters.
What am I supposed to do? Edit my file in a hex editor?
FYI I actually solved it by re-typing the code (which originated from this site by the way).
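Before stripping anything, it can help to see exactly where the offending character sits. The decimal value 8203 reported in the question corresponds to U+200B (ZERO WIDTH SPACE). The sketch below is only an illustration: the helper name findSuspectChars and the choice of "allowed" characters (tab, line feed, carriage return, and printable ASCII) are assumptions, not part of the original question.

```javascript
// Hypothetical helper: report every character outside printable ASCII
// (tab, LF, and CR are allowed) together with its position and code point.
function findSuspectChars(text) {
  const hits = [];
  let line = 1;
  let column = 1;
  for (const ch of text) { // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    const allowed =
      cp === 0x09 || cp === 0x0a || cp === 0x0d || (cp >= 0x20 && cp <= 0x7e);
    if (!allowed) {
      hits.push({
        line,
        column,
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
      });
    }
    if (ch === "\n") {
      line += 1;
      column = 1;
    } else {
      column += 1;
    }
  }
  return hits;
}

// A zero width space hidden after the identifier "b" on the second line.
const sample = "var a = 1;\nvar b\u200B = 2;";
console.log(findSuspectChars(sample));
```

Running this in a browser console or in Node lists one entry per suspect character, which makes it easier to delete the character by hand instead of re-typing the whole snippet.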
Accepted answer by Adi
Well, the easiest way I can think of is to use sed
sed -i 's/[^[:print:]]//g' your_script.js
//            ^^^^^ this can also be 'ascii'
or using tr
tr -cd '\11\12\15\40-\176' < old_script.js > new_script.js
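The same whitelist idea can be written in JavaScript, the language of the bookmarklet itself. This is a minimal sketch, assuming the goal is to keep only tab, line feed, carriage return, and the printable ASCII range (space through tilde). The function name stripToPrintableAscii is made up for illustration.

```javascript
// Keep tab, LF, CR, and printable ASCII (space through tilde); drop the rest.
function stripToPrintableAscii(text) {
  return text.replace(/[^\t\n\r\x20-\x7E]/g, "");
}

// A zero width space after "x" and curly quotes around "ok".
const dirty = "if (x\u200B) {\r\n\treturn \u201Cok\u201D;\n}";
console.log(JSON.stringify(stripToPrintableAscii(dirty)));
```

Note that a whitelist like this deletes characters rather than converting them: the curly quotes in the sample vanish instead of becoming straight quotes, so the result may still need a manual look.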
Answer by Esailija
Is there some sort of website somewhere that will strip all characters other than ASCII?
You could use this website
You can recreate the website using this code:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>- jsFiddle demo</title>
<script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"></script>
<link rel="stylesheet" type="text/css" href="/css/normalize.css">
<link rel="stylesheet" type="text/css" href="/css/result-light.css">
<style type="text/css">
textarea {
    width: 800px;
    height: 480px;
    outline: none;
    font-family: Monaco, Consolas, monospace;
    border: 0;
    padding: 15px;
    color: hsl(0, 0%, 27%);
    background-color: #F6F6F6;
}
</style>
<script type="text/javascript">
//<![CDATA[
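$(function () {
    $("button").click(function () {
        // Remove every character outside U+0000-U+007E from the textarea...
        $("textarea").val(
            $("textarea").val().replace(/[^\u0000-\u007E]/g, "")
        );
        // ...then focus it and select the cleaned text for copying.
        $("textarea").focus()[0].select();
    });
}); //]]>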
</script>
</head>
<body>
<textarea></textarea>
<button>Remove</button>
</body>
</html>
Answer by Matt Kim
You can use a regex to filter out everything outside the 0-127 range. For example, in JavaScript:
text.replace(/[^\x00-\x7F]/g, "")
\x00 = 0, \x7F = 127
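An ASCII-only filter also removes legitimate non-ASCII text, such as accented letters inside string literals. A narrower alternative (a different technique from the one in this answer) is to remove only known zero-width characters. The sketch below assumes a short, non-exhaustive list of such characters, and the names INVISIBLES and stripInvisibles are hypothetical.

```javascript
// Zero-width characters that commonly sneak in via copy/paste.
// This list is an assumption, not an exhaustive inventory:
// U+200B zero width space, U+200C zero width non-joiner,
// U+200D zero width joiner, U+2060 word joiner, U+FEFF BOM.
const INVISIBLES = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

function stripInvisibles(text) {
  return text.replace(INVISIBLES, "");
}

// The string literal contains a real "é" (written here as \u00E9).
const code = 'var name\u200B = "Jos\u00E9";';
console.log(stripInvisibles(code));              // keeps the accented letter
console.log(code.replace(/[^\x00-\x7F]/g, "")); // ASCII-only filter drops it too
```

Which variant fits depends on the input: for a bookmarklet that is pure ASCII anyway, the blanket filter is simpler, while the targeted list is safer for code containing non-English text.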
Answer by ERM
Nontechnical solution: paste your text into a new email message in Gmail and click Tx (clear formatting, in the formatting menu). Worked for me.