Original question: http://stackoverflow.com/questions/3060601/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Retrieve all hashtags from a tweet in a PHP function
Asked by snorpey
I want to retrieve all hashtags from a tweet using a PHP function.
I know someone asked a similar question here, but there is no hint on exactly how to implement this in PHP. Since I'm not very familiar with regular expressions, I don't know how to write a function that returns an array of all hashtags in a tweet.
So how do I do this using the following regular expression?
#\S*\w
Answered by trante
I created my own solution. It:
- Finds all hashtags in a string
- Removes duplicates
- Sorts hashtags by how often they occur in the text
- Supports Unicode characters
function getHashtags($string) {
    $hashtags = FALSE;
    // The /u modifier treats the pattern and the subject as UTF-8
    preg_match_all("/(#\w+)/u", $string, $matches);
    if ($matches) {
        // Count occurrences; the array keys are the unique hashtags
        $hashtagsArray = array_count_values($matches[0]);
        // Sort by count, most frequent first
        arsort($hashtagsArray);
        $hashtags = array_keys($hashtagsArray);
    }
    return $hashtags;
}
The output looks like this:
(
[0] => #_ƒOllOw_
[1] => #FF
[2] => #neslitükendi
[3] => #F_0_L_L_O_W_
[4] => #takipedeğerdost
[5] => #GönüldenTakipleşiyorum
)
Answered by Cups
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";
preg_match_all("/(#\w+)/", $tweet, $matches);
var_dump( $matches );
Note: dashes are not valid characters in hashtags; underscores are allowed.
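To make the effect of the dash rule concrete, here is a minimal sketch that runs the same `\w+` pattern on the same tweet and prints only the full matches. The variable names are illustrative.

```php
<?php
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";

// \w matches letters, digits and underscores, so the match stops at the dash.
preg_match_all('/#\w+/', $tweet, $matches);

// $matches[0] holds the full matches.
print_r($matches[0]);
// Array
// (
//     [0] => #hashtag
//     [1] => #badhash
//     [2] => #goodhash_tag
// )
```

The dash cuts `#badhash-tag` down to `#badhash`, while the underscore in `#goodhash_tag` is kept.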
Answered by minaz
Don't forget about hashtags that contain Unicode characters, digits and underscores:
$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet!";
preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);
print_r( $matches );
\p{Pc} - connector punctuation, which includes the underscore
\p{N} - a numeric character in any script
\p{L} - a letter from any language
\p{Mn} - a non-spacing mark (combining accents, umlauts, etc.)
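As a rough sketch of how this pattern could be wrapped in a reusable function, the example below returns the unique hashtag names without the leading `#`. The function name `extractHashtagNames` and the duplicate-removal step are assumptions added for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: returns unique hashtag names without the leading '#'.
function extractHashtagNames(string $tweet): array
{
    // /u treats pattern and subject as UTF-8, so multi-byte letters
    // are matched as single characters by the \p{...} classes.
    $count = preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);
    if ($count === false || $count === 0) {
        return [];
    }
    // $matches[1] holds the first capture group, i.e. the text after '#'.
    return array_values(array_unique($matches[1]));
}

$names = extractHashtagNames(
    "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet! #hashtag"
);
print_r($names);
// Array
// (
//     [0] => hashtag
//     [1] => NYC2016
//     [2] => NYC_2016
//     [3] => gøypålandet
// )
```

The trailing `!` is excluded because it belongs to none of the four character classes.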
Answered by Wireblue
Try this regular expression:
/#[^\s]*/i
Or use this one if multiple hashtags may be joined together (e.g. #foo#bar):
/#[^\s#]*/i
Running it in PHP would look like this:
preg_match_all('/#[^\s#]*/i', $tweet_string, $result);
The matches are saved in $result, the third argument; $result[0] is an array containing all the hashtags in the tweet.
Lastly, check out this site, which I've found really handy for testing regular expressions: http://regex.larsolavtorvik.com/
EDIT: I tried your regular expression and it worked great too!
EDIT 2: Added another regex to extract hashtags, even if they're consecutive.
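The difference between the two patterns can be seen in a small side-by-side sketch. The sample string and variable names are made up for illustration.

```php
<?php
$tweet = "#foo#bar baz #qux";

// Pattern 1: runs up to the next whitespace, so joined tags stay glued together.
preg_match_all('/#[^\s]*/', $tweet, $joined);

// Pattern 2: also stops at the next '#', so joined tags are split apart.
preg_match_all('/#[^\s#]*/', $tweet, $split);

print_r($joined[0]); // #foo#bar, #qux
print_r($split[0]);  // #foo, #bar, #qux
```

Because both patterns use `*`, a bare `#` with nothing after it would also count as a match; replacing `*` with `+` is one way to avoid that.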
Answered by BoltClock
Use the preg_match_all() function:
function get_hashtags($tweet)
{
    $matches = array();
    preg_match_all('/#\S*\w/i', $tweet, $matches);
    return $matches[0];
}
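A short usage sketch may help show what the `#\S*\w` pattern actually captures. The function is repeated so the example runs on its own, and the sample tweet is invented for illustration.

```php
<?php
function get_hashtags($tweet)
{
    $matches = array();
    // '#', then any run of non-whitespace, ending on a word character.
    preg_match_all('/#\S*\w/', $tweet, $matches);
    return $matches[0];
}

$tags = get_hashtags("Loving #PHP and #regex-fun, right? #done.");
print_r($tags);
// Array
// (
//     [0] => #PHP
//     [1] => #regex-fun
//     [2] => #done
// )
```

Trailing punctuation is dropped because each match must end on a word character, but a dash inside a tag is kept, unlike with the stricter `#\w+` pattern.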

