JavaScript: replacing UTF-8 characters

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/25043024/

Date: 2020-10-28 03:47:16  Source: igfitidea

Replacing UTF-8 characters

Tags: javascript, jquery, html, utf-8

Asked by Mike 'Pomax' Kamermans

I am working with an open source library, jsPDF. The library does not support UTF-8 characters. Is there any way to remove all the UTF-8 quote characters from my HTML string, using a regex or any other method?

Pseudocode:

$(htmlstring).replace("utf-8 quotes character" , "") 

Answered by Mike 'Pomax' Kamermans

First off: I urge you to stop using jsPDF if it doesn't support Unicode. It's mid-2014, and the lack of support should have meant the death of the project two years ago. But that's just my personal conviction and not part of the answer you're looking for.

If jsPDF only supports ANSI (a 256-character block, rather than ASCII's 128-character block), then you can simply do a regex replace for everything above \xFF:

"lolテスト".replace(/[\u0100-\uFFFF]/g,'');
// gives us "lol"

If you only want to get rid of quotation marks (but leave in other, potentially jsPDF-breaking, Unicode), you can use a pattern for "just quotation marks" based on where they live in the Unicode map:

string.replace(/[\u2018-\u201F\u275B-\u275E]/g, '')

will catch ['‘','’','‚','‛','“','”','„','‟','❛','❜','❝','❞'], although of course what you probably want to do is replace them with the corresponding safe character instead. Good news: just make a replacement array for the list just presented, and work with that.
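
A minimal sketch of that replacement idea, assuming the single-style quotes should become a plain apostrophe and the double-style quotes a plain double quote. The mapping and the names `QUOTE_REPLACEMENTS` and `toPlainQuotes` are illustrative choices, not part of the original answer:

```javascript
// Hypothetical mapping: each typographic quote -> an ASCII stand-in.
const QUOTE_REPLACEMENTS = {
  '\u2018': "'", '\u2019': "'", '\u201A': "'", '\u201B': "'",
  '\u201C': '"', '\u201D': '"', '\u201E': '"', '\u201F': '"',
  '\u275B': "'", '\u275C': "'", '\u275D': '"', '\u275E': '"',
};

// Same character class as the pattern above; the callback looks up
// each matched character in the map instead of deleting it.
function toPlainQuotes(text) {
  return text.replace(/[\u2018-\u201F\u275B-\u275E]/g, (ch) => QUOTE_REPLACEMENTS[ch]);
}

console.log(toPlainQuotes('\u201CHello\u201D, it\u2019s fine')); // "Hello", it's fine
```

Whether U+201A (the low single quote, which looks like a comma) should map to an apostrophe depends on your text, so adjust the table as needed.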

2017 edit:

ES6 introduced a new escape syntax for Unicode code points, the \u{...} pattern, which accepts any number of hex digits inside the curly braces, so a full Unicode 9 compatible regexp would now be:

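// \u{...} is only understood inside a regexp when the `u` flag is set,
// and without `u` the range would be applied to UTF-16 code units
// rather than to whole code points.
const start = `\u{100}`;
const end = `\u{10FFFF}`; // highest Unicode code point
const searchPattern = new RegExp(`[${start}-${end}]`, `gu`);
const c = `lolテスト`.replace(searchPattern, ``);
// c is "lol"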
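
As a hedged aside: with the `u` flag, `\u{...}` escapes can also be written directly in a regex literal, and the pattern then matches whole code points instead of individual UTF-16 code units. A small sketch, where the function name `stripAboveLatin1` is made up for illustration:

```javascript
// With the `u` flag, \u{...} is valid inside a regex literal, and
// characters outside the BMP (such as emoji) match as one code point.
function stripAboveLatin1(text) {
  return text.replace(/[\u{100}-\u{10FFFF}]/gu, '');
}

console.log(stripAboveLatin1('lolテスト'));            // "lol"
console.log(stripAboveLatin1('caf\u00E9 \u{1F600}')); // "café " (é is below U+0100, so it stays)
```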

Answered by Valerij

Use

htmlstring.replace(/[^\x00-\x7F]/g, '')

to remove all non-ASCII characters

(via regex-any-ascii-character)
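
For illustration, here is that pattern applied to a plain string. This assumes `htmlstring` holds the HTML as text rather than as a jQuery object, and the sample value is invented. Note that it drops accented letters along with the curly quotes:

```javascript
// Invented sample: curly double quotes plus an accented letter.
const htmlstring = '<p>\u201CQuoted\u201D caf\u00E9</p>';

// Remove every character outside the 7-bit ASCII range.
const asciiOnly = htmlstring.replace(/[^\x00-\x7F]/g, '');

console.log(asciiOnly); // <p>Quoted caf</p>
```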
