Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/10992921/

Time: 2020-08-24 03:55:20  Source: igfitidea

How to remove emoji code using javascript?

Tags: javascript, unicode, emoji

Asked by manraj82

How do I remove emoji code using JavaScript? I thought I had taken care of it using the code below, but I still have characters like 🔴.

function removeInvalidChars() {
    return this.replace(/[\uE000-\uF8FF]/g, '');
}

Answer by bobince

The range you have selected is the Private Use Area, containing non-standard characters. Carriers used to encode emoji as different, inconsistent values inside this range.

More recently, the emoji have been given standardised 'unified' codepoints. Many of these are outside of the Basic Multilingual Plane, in the block U+1F300–U+1F5FF, including your example U+1F534 Large Red Circle.

You could detect these characters with [\U0001F300-\U0001F5FF] in a regex engine that supports non-BMP characters, but JavaScript's RegExp (at least without the ES6 u flag) is not such a beast. Unfortunately the JS string model is based on UTF-16 code units, so you have to work with the UTF-16 surrogates in a regexp:

return this.replace(/([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g, '')
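
To see where the surrogate ranges in that regex come from, here is a minimal sketch of the arithmetic that maps a code point above U+FFFF to its UTF-16 surrogate pair. The helper names toSurrogatePair and hex are illustrative only and are not part of the original answer.

```javascript
// Convert a code point above U+FFFF into its UTF-16 surrogate pair.
// (Hypothetical helper, written for illustration.)
function toSurrogatePair(codePoint) {
  var offset = codePoint - 0x10000;
  var high = 0xD800 + (offset >> 10);   // top 10 bits of the offset
  var low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits of the offset
  return [high, low];
}

// Format a number as upper-case hexadecimal.
function hex(n) {
  return n.toString(16).toUpperCase();
}

// U+1F534 Large Red Circle, the example from the question.
console.log(toSurrogatePair(0x1F534).map(hex).join(' ')); // D83D DD34
```

The pair D83D DD34 falls inside \uD83D[\uDC00-\uDDFF], which is why that alternative catches the red circle while the Private Use Area range alone does not.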

However, note that there are other characters in the Basic Multilingual Plane that are used as emoji by phones but which long predate emoji. For example U+2665 is the traditional Heart Suit character ♥, but it may be rendered as an emoji graphic on some devices. It's up to you whether you treat this as emoji and try to remove it. See this list for more examples.

Answer by jony89

For me, none of the answers completely removed all emojis, so I had to do some work myself, and this is what I got:

text.replace(/([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g, '');

Also, take into account that if the string is later inserted into a database, replacing emoji with an empty string could expose a security issue. Replace them with the replacement character U+FFFD instead; see: http://www.unicode.org/reports/tr36/#Deletion_of_Noncharacters
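
A rough sketch of that substitution, reusing the pattern from this answer. The names emojiPattern and replaceEmojis are made up for the example, and the input string is only there to show how deletion can silently join the text on either side of a removed character.

```javascript
// Same pattern as above, but each match becomes U+FFFD instead of vanishing.
var emojiPattern = /([\u2700-\u27BF]|[\uE000-\uF8FF]|\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDFFF]|[\u2011-\u26FF]|\uD83E[\uDD10-\uDDFF])/g;

// Hypothetical helper: swap every matched emoji for the replacement character.
function replaceEmojis(text) {
  return text.replace(emojiPattern, '\uFFFD');
}

// Deleting the emoji would turn 'scr' + emoji + 'ipt' into 'script';
// substituting U+FFFD keeps the two halves apart.
console.log(replaceEmojis('scr\uD83D\uDE00ipt'));
```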

Answer by lucas

I've found many suggestions around, but the regex that solved my problem is:

/(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g

A short example

function removeEmojis (string) {
  var regex = /(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])/g;
  return string.replace(regex, '');
}

Hope it can help you

Answer by sandre89

@bobince's solution didn't work for me. Either the emojis stayed there or they were swapped for a different emoji.

This solution did the trick for me:

var ranges = [
  '\ud83c[\udf00-\udfff]', // U+1F300 to U+1F3FF
  '\ud83d[\udc00-\ude4f]', // U+1F400 to U+1F64F
  '\ud83d[\ude80-\udeff]' // U+1F680 to U+1F6FF
];


$('#mybtn').on('click', function() {
  removeInvalidChars();
})

function removeInvalidChars() {
  var str = $('#myinput').val();

  str = str.replace(new RegExp(ranges.join('|'), 'g'), '');
  $("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput"/>
<input type="submit" id="mybtn" value="clear"/>

Source

Answer by Evangelos Aktoudianakis

I know this post is a bit old, but I stumbled across this very problem at work and a colleague came up with an interesting idea: instead of stripping emoji characters, only allow valid characters in. Consult this ASCII table:

http://www.asciitable.com/

A function such as this keeps only the legal characters (the range itself depends on what you are after):

// The original answer used an anonymous function; the name
// keepAsciiCharacters is illustrative so the snippet can run on its own.
function keepAsciiCharacters(input) {
    var result = '';
    for (var i = 0; i < input.length; i++) {
        var code = input.charCodeAt(i);
        // Keep printable ASCII only: space (32) through tilde (126).
        if (code >= 32 && code <= 126) {
            result += input[i];
        }
    }
    return result;
}

This preserves all the digits, letters, and special characters of printable ASCII, for situations where you only want to keep the English alphabet, numbers, and special characters. Hope it helps someone :)
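
The same allowlist can probably be written as a single negated character class. This is a sketch under the same assumption (printable ASCII, code points 32 to 126, is the range you want); the name keepPrintableAscii is hypothetical.

```javascript
// Remove everything that is NOT printable ASCII (space through tilde).
function keepPrintableAscii(input) {
  return input.replace(/[^\x20-\x7E]/g, '');
}

console.log(keepPrintableAscii('Hi \uD83D\uDE00 there!'));
```

Note that this also drops accented letters, line breaks, and every non-Latin script, so widen the range if your input needs them.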

Answer by aeharding

None of the answers here worked for all the Unicode characters I tested (specifically characters in the miscellaneous symbols range).

Here is one that worked for me, (heavily) inspired by this SO PHP answer:

function _removeEmojis(str) {
  return str.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?/g, '');
}

(My use case is sorting in a data grid where emojis can come first in a string but users want the text ordered by the actual words.)
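
For that sorting use case, one possible shape is to strip emoji only when building the comparison key, so the displayed text keeps its emoji. This sketch uses a shortened subset of the pattern above to stay readable; emojiPattern, sortKey, and sortByText are hypothetical names.

```javascript
// Shortened subset of the pattern above (an assumption made for brevity).
var emojiPattern = /(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF]|[\u2600-\u27BF])[\uFE00-\uFEFF]?/g;

// Build the comparison key: the label without emoji or surrounding spaces.
function sortKey(label) {
  return label.replace(emojiPattern, '').trim();
}

// Sort a copy of the labels by their emoji-free text.
function sortByText(labels) {
  return labels.slice().sort(function (a, b) {
    return sortKey(a).localeCompare(sortKey(b));
  });
}

var rows = ['cherry', '\uD83C\uDF4E apple', '\u2615 banana'];
console.log(sortByText(rows)); // apple row, banana row, then cherry
```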

Answer by Spyryto

sandre89's answer is good but not perfect. I spent some time on the subject and have a working solution.

var ranges = [
  '[\u00A0-\u269f]',
  '[\u26A0-\u329f]',
  // The following characters could not be minified correctly
  // if specified with the ES6 syntax \u{1F400}
  '[🀄-🧀]'
  //'[\u{1F004}-\u{1F9C0}]'
];


$('#mybtn').on('click', function() {
  removeInvalidChars();
});

function removeInvalidChars() {
  var str = $('#myinput').val();
  str = str.replace(new RegExp(ranges.join('|'), 'ug'), '');
  $("#myinput").val(str);
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type="text" id="myinput" />
<input type="submit" id="mybtn" value="clear" />

Here is my CodePen

There are some points to note, though.

  1. Unicode characters from U+1F000 up need a special notation, so you can use sandre89's way, or opt for the \u{1F000} ES6 notation, which may or may not work with your minifier. I succeeded by pasting the emojis directly into the UTF-8 encoded script.

  2. Don't forget the u flag in the regex, or your JavaScript engine may throw an error.
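
As a rough illustration of the two points above, assuming an engine with ES6 (ES2015) regex support: with the u flag, \u{...} is read as a whole code point, so the astral range can be written directly. Without the flag, \u{...} is not treated as a code point escape. The names astralEmoji and removeAstralEmojis are made up for this sketch.

```javascript
// With the u flag the class covers code points U+1F004 through U+1F9C0.
var astralEmoji = /[\u{1F004}-\u{1F9C0}]/gu;

// Hypothetical helper: strip characters in that astral range only.
function removeAstralEmojis(str) {
  return str.replace(astralEmoji, '');
}

// U+1F534 (Large Red Circle) is inside the range and gets removed.
console.log(removeAstralEmojis('red \u{1F534} circle'));
```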

Beware that things may not work because of the file encoding, character set, or minifier. In my case nothing worked until I took the script out of an .isml file (Demandware) and pasted it into a .js file.

You may gain some insight by referring to the Wikipedia Emoji page and How many bytes does one Unicode character take?, and by tinkering with this Online Unicode converter, as I did.

Answer by sin-jung-il

var emoji =/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g;

str.replace(emoji, "");

I added this: '\uD83E[\udd00-\uddff]'

These emojis were added in the June 2018 update.

If you want to keep blocking emojis after future updates, use this instead:

str.replace(/[^0-9a-zA-Z?-?+×÷=%??☆?)(*&^/~#@!-:;,?`_|<>{}¥£$◇■□●○?°※¤《》???\[\]\"\' \]/g ,"");

This blocks all emojis; only English letters, numbers, Hangul, and some special characters are allowed. Thanks :)
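
A minimal sketch of that allowlist idea with explicit Unicode ranges. The Hangul ranges (Jamo U+1100 to U+11FF, Compatibility Jamo U+3130 to U+318F, Syllables U+AC00 to U+D7A3) and the short punctuation list are assumptions chosen for the example, not a reconstruction of the pattern above; the names disallowed and keepAllowed are hypothetical.

```javascript
// Anything outside digits, ASCII letters, Hangul, space and a few
// punctuation marks is removed. The ranges here are assumptions.
var disallowed = /[^0-9a-zA-Z\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3 .,!?]/g;

// Hypothetical helper: keep only the allowlisted characters.
function keepAllowed(str) {
  return str.replace(disallowed, '');
}

// The Hangul text survives, the emoji (a surrogate pair) does not.
console.log(keepAllowed('hello \uD55C\uAE00 \uD83D\uDE00!'));
```

Because the class is negated, emoji added in later Unicode versions are rejected without touching the pattern.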

Answer by Mahesh Thippala

You can use this function to replace emojis with nothing:

function msgAfterClearEmojis(msg)
{
    var new_msg = msg.replace(/([#0-9]\u20E3)|[\xA9\xAE\u203C\u2047-\u2049\u2122\u2139\u3030\u303D\u3297\u3299][\uFE00-\uFEFF]?|[\u2190-\u21FF][\uFE00-\uFEFF]?|[\u2300-\u23FF][\uFE00-\uFEFF]?|[\u2460-\u24FF][\uFE00-\uFEFF]?|[\u25A0-\u25FF][\uFE00-\uFEFF]?|[\u2600-\u27BF][\uFE00-\uFEFF]?|[\u2900-\u297F][\uFE00-\uFEFF]?|[\u2B00-\u2BF0][\uFE00-\uFEFF]?|(?:\uD83C[\uDC00-\uDFFF]|\uD83D[\uDC00-\uDEFF])[\uFE00-\uFEFF]?|[\u20E3]|[\u26A0-\u3000]|\uD83E[\udd00-\uddff]|[\u00A0-\u269F]/g, '').trim();
    return new_msg;
}

Answer by Sumit

<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script>
function isEmoji(str) {
    var ranges = [       
       '[\uE000-\uF8FF]',
       '\uD83C[\uDC00-\uDFFF]',
       '\uD83D[\uDC00-\uDFFF]',
       '[\u2011-\u26FF]',
       '\uD83E[\uDD10-\uDDFF]'         
    ];
    if (str.match(ranges.join('|'))) {
        return true;
    } else {
        return false;
    }
}
$(document).ready(function(){
 $('input').on('input',function(){
    var $th = $(this);
    console.log("Value of Input"+$th.val());
    var emojiInput = isEmoji($th.val()); // declared locally to avoid an implicit global
    if (emojiInput==true) {
        $th.val("");
    }
});
});
</script>
</head>
<body>
Enter your name: <input type="text">
</body>
</html>