Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/3060601/

Time: 2020-08-25 08:34:51  Source: igfitidea

Retrieve all hashtags from a tweet in a PHP function

Tags: php, regex, twitter

Asked by snorpey

I want to retrieve all hashtags from a tweet using a PHP function.

I know someone asked a similar question here, but there is no hint on how exactly to implement this in PHP. Since I'm not very familiar with regular expressions, I don't know how to write a function that returns an array of all hashtags in a tweet.

So how do I do this using the following regular expression:

#\S*\w

Answer by trante

I created my own solution. It does:

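  • Finds all hashtags in a string
  • Removes duplicate ones
  • Sorts hashtags by how often they occur in the text (most frequent first)
  • Supports Unicode characters

    function getHashtags($string) {
        $hashtags = FALSE;
        // "#" followed by word characters; with /u, \w also matches non-ASCII letters
        if (preg_match_all("/(#\w+)/u", $string, $matches)) {
            // Map each distinct hashtag to its number of occurrences
            $hashtagsArray = array_count_values($matches[0]);
            // Most frequent first; the keys are the hashtags themselves
            arsort($hashtagsArray);
            $hashtags = array_keys($hashtagsArray);
        }
        // Still FALSE when the string contains no hashtags
        return $hashtags;
    }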

The output looks like this:

(
    [0] => #_?OllOw_
    [1] => #FF
    [2] => #neslitükendi
    [3] => #F_0_L_L_O_W_
    [4] => #takipede?erdost
    [5] => #G?nüldenTakiple?iyorum
)
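
The function above returns only the distinct hashtags. If the counts themselves are useful, a minimal sketch (assuming PHP 7 or later; the name getHashtagCounts is hypothetical, not part of the answer) can return the array built by array_count_values() after sorting it with arsort():

```php
<?php
// Hypothetical helper: maps each distinct hashtag to how often it occurs,
// most frequent first.
function getHashtagCounts(string $string): array
{
    // preg_match_all() returns the number of matches (0 when there are none)
    if (!preg_match_all('/#\w+/u', $string, $matches)) {
        return [];
    }
    // Keys are the hashtags, values their frequencies.
    // Matching is case-sensitive, so "#PHP" and "#php" are counted separately.
    $counts = array_count_values($matches[0]);
    // Sort by frequency, highest first, keeping the hashtag keys.
    arsort($counts);
    return $counts;
}

print_r(getHashtagCounts('#php is fun #PHP #php #regex'));
```

Twitter itself treats hashtags case-insensitively, so lowercasing the matches before counting may be preferable, depending on the use case.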

Answer by Cups

$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

preg_match_all("/(#\w+)/", $tweet, $matches);

var_dump( $matches );

*Dashes are not valid in hashtags, but underscores are allowed.
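
If the tag names are wanted without the "#", the parentheses can be moved so that the capture group covers only the \w+ part. This is a small variation of the snippet above (the change is an assumption, not part of the original answer); it also shows that \w stops at the dash:

```php
<?php
$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

// Group 1 captures only the part after "#", so $matches[1] holds the bare
// tag names while $matches[0] still holds the full "#tag" strings.
preg_match_all('/#(\w+)/', $tweet, $matches);

// \w does not match "-", so "#badhash-tag" yields just "badhash".
print_r($matches[1]); // hashtag, badhash, goodhash_tag
```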

Answer by minaz

Don't forget about hashtags that contain Unicode characters, digits and underscores:

$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #g?yp?landet!";

preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);

print_r( $matches );

\p{Pc} - connector punctuation, such as the underscore

\p{N} - numeric character in any script

\p{L} - letter from any language

\p{Mn} - any non-spacing mark (combining accents, umlauts, etc.)
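
The pattern can be wrapped in a small helper that returns the tag names without the leading "#" (capture group 1). This is a sketch assuming PHP 7 or later (for the "\u{...}" string escape) and a UTF-8 source file; the function name extractHashtagNames is hypothetical:

```php
<?php
// Hypothetical helper built on the Unicode-aware pattern above.
function extractHashtagNames(string $text): array
{
    // Group 1: connector punctuation (\p{Pc}), numbers (\p{N}),
    // letters (\p{L}) and non-spacing marks (\p{Mn}).
    // The /u modifier makes PCRE treat the subject as UTF-8.
    if (!preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $text, $matches)) {
        // No hashtags, or invalid UTF-8 input
        return [];
    }
    return $matches[1];
}

// "e\u{0301}" is "e" followed by a combining acute accent (category Mn).
print_r(extractHashtagNames("Valid: #hashtag #NYC2016 #NYC_2016 #café #cafe\u{0301}!"));
```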

Answer by Wireblue

Try this regular expression:

/#[^\s]*/i

Or use this one if several hashtags are joined together (e.g. #foo#bar):

/#[^\s#]*/i

Running it in PHP would look like this:

preg_match_all('/#[^\s#]*/i', $tweet_string, $result);

The matches are stored in $result (the third argument); $result[0] is an array containing all the hashtags in the tweet.

Lastly, check out this site. I've found it really handy for testing regular expressions. http://regex.larsolavtorvik.com/

EDIT: I tried your regular expression and it worked great too!

EDIT 2: Added another regex to extract hash tags, even if they're consecutive.
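
A quick check of the consecutive-hashtag pattern on a made-up tweet shows how it behaves at the edges. The tags are split at each "#", but because [^\s#] accepts any character other than whitespace and "#", trailing punctuation stays attached, and because of the "*" a lone "#" is matched too:

```php
<?php
$tweet_string = 'Tags: #foo#bar #baz, and a lone # sign';

// [^\s#]* stops at whitespace and at the next "#", so "#foo#bar" splits in two.
preg_match_all('/#[^\s#]*/i', $tweet_string, $result);

// $result[0] holds the full matches: "#foo", "#bar", "#baz," and "#".
print_r($result[0]);
```

Using "+" instead of "*" would skip the bare "#"; trailing punctuation would still need a narrower character class such as \w.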

Answer by BoltClock

Use the preg_match_all() function:

function get_hashtags($tweet)
{
    $matches = array();
    preg_match_all('/#\S*\w/i', $tweet, $matches);
    return $matches[0];
}
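
To see what the question's pattern #\S*\w actually returns, here is the function above (repeated so the example runs on its own) called on a made-up tweet. Because \S* may contain any non-space character but the match must end on a word character, trailing punctuation is dropped, while inner dashes and even a second "#" are kept:

```php
<?php
// Same function as in the answer above.
function get_hashtags($tweet)
{
    $matches = array();
    // "#", then any non-whitespace characters, ending on a word character,
    // so trailing punctuation such as "!" or "," is left out.
    preg_match_all('/#\S*\w/i', $tweet, $matches);
    return $matches[0];
}

print_r(get_hashtags('Loving #php! and #regex-fu, also #foo#bar.'));
// #php, #regex-fu, #foo#bar
```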