Java: check if a letter is an emoji

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/28366172/

Time: 2020-11-02 13:27:38  Source: igfitidea

Check if letter is emoji

Tags: java, regex, emoji

Asked by bdv

I want to check if a letter is an emoji. I've found some similar questions on SO and found this regex:

private final String emo_regex = "([\u20a0-\u32ff\ud83c\udc00-\ud83d\udeff\udbb9\udce5-\udbb9\udcee])";

However, when I run the following on a sentence:

for (int k=0; k<letters.length;k++) {    
    if (letters[k].matches(emo_regex)) {
        emoticon.add(letters[k]);
    }
}

It doesn't add any of the emoji letters. I've also tried with a Matcher and a Pattern, but that didn't work either. Is there something wrong with the regex, or am I missing something obvious in my code?

This is how I get the letter:

sentence = "Jij staat op 10 "
String[] letters = sentence.split("");

The last character should be recognized and added to emoticon.

Accepted answer by tobias_k

It seems like those emojis are two characters long, but with split("") you are splitting between every single character, so none of the resulting letters can be the emoji you are looking for.
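
To make the "two characters long" point concrete, here is a minimal sketch. The class name SurrogateDemo and the sample emoji U+1F600 are illustrative choices, not taken from the question. It shows that one emoji outside the Basic Multilingual Plane occupies two Java chars but counts as a single code point.

```java
final class SurrogateDemo {
    // U+1F600 written as its UTF-16 surrogate pair; chosen only for illustration.
    static final String EMOJI = "\uD83D\uDE00";

    /** Number of UTF-16 chars, which is the unit charAt and length work with. */
    static int charLength(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the whole String. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(charLength(EMOJI));      // 2
        System.out.println(codePointLength(EMOJI)); // 1
        // Each half on its own is only a surrogate, not a complete symbol.
        System.out.println(Character.isHighSurrogate(EMOJI.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(EMOJI.charAt(1)));  // true
    }
}
```

Any approach that inspects one char at a time therefore only ever sees half of such an emoji.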

Instead, you could try splitting between words:

for (String word : sentence.split(" ")) {
    if (word.matches(emo_regex)) {
        System.out.println(word);
    }
}

But of course this will miss emojis that are joined to a word or to punctuation.

Alternatively, you could just use a Matcher to find any group in the sentence that matches the regex.

Matcher matcher = Pattern.compile(emo_regex).matcher(sentence);
while (matcher.find()) {
    System.out.println(matcher.group());
}
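
For reference, a self-contained version of the Matcher approach might look like the sketch below. The class name EmojiFinder and the two code point ranges are assumptions made for illustration; the ranges are deliberately incomplete and are not the regex from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmojiFinder {
    // Assumed, incomplete ranges: U+2600..U+27BF and U+1F300..U+1F6FF.
    // The \x{...} syntax lets the pattern name code points above U+FFFF directly.
    private static final Pattern EMOJI =
            Pattern.compile("[\\x{2600}-\\x{27BF}\\x{1F300}-\\x{1F6FF}]");

    /** Returns every match in order; each emoji comes back as one whole String. */
    static List<String> findEmojis(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMOJI.matcher(sentence);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The sentence ends with U+1F600, written as a surrogate pair.
        String sentence = "Jij staat op 10 \uD83D\uDE00";
        System.out.println(findEmojis(sentence).size()); // 1
    }
}
```

Because find() scans the whole sentence, this also catches emojis that are attached to a word or to punctuation.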

Answer by Chaitanya

You could use the emoji4j library. The following should solve the issue.

String htmlifiedText = EmojiUtils.htmlify(text);
// regex to identify html entitities in htmlified text
Matcher matcher = htmlEntityPattern.matcher(htmlifiedText);

while (matcher.find()) {
    String emojiCode = matcher.group();
    if (isEmoji(emojiCode)) {

        emojis.add(EmojiUtils.getEmoji(emojiCode).getEmoji());
    }
}

Answer by liheyuan

Try this project: simple-emoji-4j

Compatible with Emoji 12.0 (2018.10.15)

Usage is simple:

EmojiUtils.containsEmoji(str)

Answer by Noamaw

This function I created checks whether the given String consists only of emojis. In other words, if the String contains any character not covered by the regex, it returns false.

private static boolean isEmoji(String message){
    return message.matches("(?:[\uD83C\uDF00-\uD83D\uDDFF]|[\uD83E\uDD00-\uD83E\uDDFF]|" +
            "[\uD83D\uDE00-\uD83D\uDE4F]|[\uD83D\uDE80-\uD83D\uDEFF]|" +
            "[\u2600-\u26FF]\uFE0F?|[\u2700-\u27BF]\uFE0F?|\u24C2\uFE0F?|" +
            "[\uD83C\uDDE6-\uD83C\uDDFF]{1,2}|" +
            "[\uD83C\uDD70\uD83C\uDD71\uD83C\uDD7E\uD83C\uDD7F\uD83C\uDD8E\uD83C\uDD91-\uD83C\uDD9A]\uFE0F?|" +
            "[\u0023\u002A\u0030-\u0039]\uFE0F?\u20E3|[\u2194-\u2199\u21A9-\u21AA]\uFE0F?|[\u2B05-\u2B07\u2B1B\u2B1C\u2B50\u2B55]\uFE0F?|" +
            "[\u2934\u2935]\uFE0F?|[\u3030\u303D]\uFE0F?|[\u3297\u3299]\uFE0F?|" +
            "[\uD83C\uDE01\uD83C\uDE02\uD83C\uDE1A\uD83C\uDE2F\uD83C\uDE32-\uD83C\uDE3A\uD83C\uDE50\uD83C\uDE51]\uFE0F?|" +
            "[\u203C\u2049]\uFE0F?|[\u25AA\u25AB\u25B6\u25C0\u25FB-\u25FE]\uFE0F?|" +
            "[\u00A9\u00AE]\uFE0F?|[\u2122\u2139]\uFE0F?|\uD83C\uDC04\uFE0F?|\uD83C\uDCCF\uFE0F?|" +
            "[\u231A\u231B\u2328\u23CF\u23E9-\u23F3\u23F8-\u23FA]\uFE0F?)+");
}

Example of implementation:

public static int detectEmojis(String message){
    int len = message.length(), numEmoji = 0;
    // Only count when the given String consists solely of emojis.
    if(isEmoji(message)){
        for (int i = 0; i < len; i++) {
            // If charAt(i) is an emoji by itself, count it.
            if (isEmoji(message.charAt(i)+"")) {
                numEmoji++;
            } else {
                // The emoji may be two chars long, so check for a surrogate pair.
                if (i < (len - 1)) { // some emojis are two chars long in Java, e.g. a rocket emoji is "\uD83D\uDE80"
                    if (Character.isSurrogatePair(message.charAt(i), message.charAt(i + 1))) {
                        i += 1; // also skip the second char of the emoji
                        numEmoji++;
                    }
                }
            }
        }
        return numEmoji;
    }
    return 0;
}

The function above runs on a string (of only emojis) and returns the number of emojis in it (written with the help of other answers I found here on StackOverflow).

Answer by user2474486

You can use the Character class to determine whether a letter is part of a surrogate pair. There are some helpful methods for dealing with emoji symbols encoded as surrogate pairs, for example:

String text = "";
if (text.length() > 1 && Character.isSurrogatePair(text.charAt(0), text.charAt(1))) {
    int codePoint = Character.toCodePoint(text.charAt(0), text.charAt(1));
    char[] c = Character.toChars(codePoint);
}
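
As a rough sketch of how these Character methods can walk a whole string rather than only its first two chars, the helper below joins each surrogate pair into one code point. The class name CodePointWalker and the method toCodePoints are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class CodePointWalker {
    /** Splits a String into code points, joining each surrogate pair into one value. */
    static List<Integer> toCodePoints(String text) {
        List<Integer> codePoints = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char first = text.charAt(i);
            if (i + 1 < text.length() && Character.isSurrogatePair(first, text.charAt(i + 1))) {
                codePoints.add(Character.toCodePoint(first, text.charAt(i + 1)));
                i += 2; // skip both halves of the pair
            } else {
                codePoints.add((int) first);
                i += 1;
            }
        }
        return codePoints;
    }

    public static void main(String[] args) {
        // "a", then U+1F680 as a surrogate pair, then "b"
        for (int cp : toCodePoints("a\uD83D\uDE80b")) {
            System.out.println(Integer.toHexString(cp)); // 61, 1f680, 62
        }
    }
}
```

Each resulting value can then be compared against whatever set or range of emoji code points the application cares about.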

Answer by slim

It's worth bearing in mind that Java code can be written in Unicode. So you can just do:

@Test
public void containsEmoji_detects_smileys() {
    assertTrue(containsEmoji("This  is a smiley "));
    assertTrue(containsEmoji("This  is a different smiley"));
    assertFalse(containsEmoji("No smiley here"));
}

private boolean containsEmoji(String s) {
    String pattern = ".*[].*";
    return s.matches(pattern);
}

Although see "Should source code be saved in UTF-8 format" for discussion on whether that's a good idea.
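
If non-ASCII source files are a concern, one possible variant keeps the same check but spells the emojis as code point escapes. This is only a sketch: the class name EscapedEmojiCheck is hypothetical, and the choice of U+1F600 and U+1F601 as the two smileys is an assumption.

```java
import java.util.regex.Pattern;

final class EscapedEmojiCheck {
    // Assumption: the two smileys of interest are U+1F600 and U+1F601.
    private static final Pattern SMILEYS = Pattern.compile("[\\x{1F600}\\x{1F601}]");

    static boolean containsEmoji(String s) {
        // find() looks for a match anywhere, so no ".*" padding is needed.
        return SMILEYS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsEmoji("This \uD83D\uDE00 is a smiley")); // true
        System.out.println(containsEmoji("No smiley here"));               // false
    }
}
```

The source file stays pure ASCII, at the cost of the pattern being harder to read at a glance.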



You can split a String into Unicode codepoints in Java 8 using String.codePoints(), which returns an IntStream. That means you can do something like:

Set<Integer> emojis = new HashSet<>();
emojis.add("".codePointAt(0));
emojis.add("".codePointAt(0));
String s = "1345";
s.codePoints().forEach( codepoint -> {
    System.out.println(
        new String(Character.toChars(codepoint)) 
        + " " 
        + emojis.contains(codepoint));
});

... prints ...

1 false
 true
3 false
4 false
 true
5 false

Of course, if you prefer not to have literal Unicode chars in your code, you can just put numbers in your set:

emojis.add(0x1F601);
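
Putting the numeric set together with String.codePoints(), a compact check might look like the following sketch. The class name EmojiCounter and the two-entry set are assumptions for illustration; a real list of emoji code points would be much longer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EmojiCounter {
    // Assumption: a tiny hand-picked set of emoji code points.
    private static final Set<Integer> EMOJIS =
            new HashSet<>(Arrays.asList(0x1F600, 0x1F601));

    /** Counts the code points of s that are in the set. */
    static long countEmojis(String s) {
        return s.codePoints().filter(cp -> EMOJIS.contains(cp)).count();
    }

    /** True as soon as one code point of s is in the set. */
    static boolean containsEmoji(String s) {
        return s.codePoints().anyMatch(cp -> EMOJIS.contains(cp));
    }

    public static void main(String[] args) {
        // "1", U+1F600, "34", U+1F601, "5"
        String s = "1" + "\uD83D\uDE00" + "34" + "\uD83D\uDE01" + "5";
        System.out.println(countEmojis(s));       // 2
        System.out.println(containsEmoji("1345")); // false
    }
}
```

Working on code points rather than chars means surrogate pairs never need to be handled by hand.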