PHP: strip out HTML and special characters
Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/7128856/
Strip out HTML and Special Characters
Asked by Reham Fahmy
I'd like to use a PHP function (or anything else) to remove all HTML code and special characters, so that I get only alphanumeric output.
$des = "Hello world)<b> (*&^%$#@! it's me: and; love you.<p>";
I want the output to become Hello world it s me and love you
(just A-Z, a-z, 0-9, and whitespace).
I've tried strip_tags, but it removes only the HTML tags:
$clear = strip_tags($des); echo $clear;
So, is there any way to do it? Thanks.
Answer by Mez
A regex replace is probably the better fit here:
// Strip HTML Tags
$clear = strip_tags($des);
// Clean up things like &amp;
$clear = html_entity_decode($clear);
// Strip out any url-encoded stuff
$clear = urldecode($clear);
// Replace non-AlNum characters with space
$clear = preg_replace('/[^A-Za-z0-9]/', ' ', $clear);
// Replace Multiple spaces with single space
$clear = preg_replace('/ +/', ' ', $clear);
// Trim the string of leading/trailing space
$clear = trim($clear);
Or, in one go:
$clear = trim(preg_replace('/ +/', ' ', preg_replace('/[^A-Za-z0-9 ]/', ' ', urldecode(html_entity_decode(strip_tags($des))))));
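As a quick check, the steps above can be wrapped in a small helper and run against the string from the question. This is only a sketch: the function name strip_to_alnum is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the steps above; the name is illustrative only.
function strip_to_alnum(string $input): string
{
    $clear = strip_tags($input);                            // drop HTML tags
    $clear = html_entity_decode($clear);                    // turn entities such as &amp; into characters
    $clear = urldecode($clear);                             // decode %XX sequences (and + to space)
    $clear = preg_replace('/[^A-Za-z0-9 ]/', ' ', $clear);  // everything else becomes a space
    $clear = preg_replace('/ +/', ' ', $clear);             // collapse runs of spaces
    return trim($clear);
}

// The sample string from the question (single-quoted so $ is never interpolated).
$des = 'Hello world)<b> (*&^%$#@! it\'s me: and; love you.<p>';
echo strip_to_alnum($des), PHP_EOL; // prints "Hello world it s me and love you"
```

Because every unwanted character is replaced with a space before the runs are collapsed, "it's" comes out as "it s", which is exactly what the question asked for.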
Answer by Matt Stein
Strip out tags, leave only alphanumeric characters and whitespace:
$clear = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags($des));
Edit: all credit to DaveRandom for the perfect solution:
$clear = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags(html_entity_decode($des)));
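To see why decoding entities before calling strip_tags matters, here is a small sketch. The sample input is an assumption of mine (markup that arrives entity-encoded), not something from the answer.

```php
<?php
// Assumed sample input: markup that arrives entity-encoded.
$encoded = '&lt;b&gt;Hi&lt;/b&gt; there';

// strip_tags() alone sees no real tags, so the entity names leak into the result.
$withoutDecode = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags($encoded));

// Decoding first turns &lt;b&gt; into <b>, which strip_tags() can then remove.
$withDecode = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags(html_entity_decode($encoded)));

echo $withoutDecode, PHP_EOL; // prints "ltbgtHiltbgt there"
echo $withDecode, PHP_EOL;    // prints "Hi there"
```

Note that this variant replaces unwanted characters with an empty string, so on the question's sample "it's" becomes "its" rather than "it s".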
Answer by João Pimentel Ferreira
All the other solutions are creepy because they come from someone who arrogantly assumes that English is the only language in the world :)
All those solutions also strip letters with diacritics, such as à.
The perfect solution, as stated in the PHP documentation, is simply:
$clear = strip_tags($des);
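The answer itself proposes only strip_tags. If you still want to drop punctuation while keeping accented letters, one possible approach (my assumption, not part of the answer) is a Unicode-aware character class. The sample text below is made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Assumed sample input containing accented letters (file saved as UTF-8).
$text = '<p>Café, ça va?</p>';

// ASCII-only class: the accented letters are removed along with the punctuation.
$ascii = preg_replace('/[^a-zA-Z0-9\s]/', '', strip_tags($text));

// \p{L} = any letter, \p{N} = any number; the u modifier treats the input as UTF-8.
$unicode = preg_replace('/[^\p{L}\p{N}\s]/u', '', strip_tags($text));

echo $ascii, PHP_EOL;   // prints "Caf a va"
echo $unicode, PHP_EOL; // prints "Café ça va"
```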
Answer by Aditya P Bhatt
To expand on the examples above, suppose this is your string:
$string = '<div>This..</div> <a>is<a/> <strong>hello</strong> <i>world</i> ! ??? ?? ????? ??????! !@#$%^&&**(*)<>?:";p[]"/.,\|`~1@#$%^&^&*(()908978867564564534423412313`1`` "Arabic Text ?? ???? test 123 ?,.m,............ ~~~ ??]??}~?]?}"; ';
Code:
echo preg_replace('/[^A-Za-z0-9 !@#$%^&*().]/u','', strip_tags($string));
Allows:
English letters (capital and small), 0 to 9, spaces, and the characters !@#$%^&*().
Removes:
All HTML tags, and special characters other than those above
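Here is the same whitelist applied to a shorter string, as a sketch. The input is a trimmed-down sample I made up for illustration, not the string from the answer.

```php
<?php
// Assumed, shortened sample input for illustration.
$string = '<div>This..</div> <strong>hello</strong> <i>world</i> ! ~~~ [test] 123';

// Keep letters, digits, spaces and the characters !@#$%^&*().
$clean = preg_replace('/[^A-Za-z0-9 !@#$%^&*().]/u', '', strip_tags($string));

echo $clean, PHP_EOL; // prints "This.. hello world !  test 123"
```

The two spaces after the "!" are the ones that surrounded the removed "~~~"; the pattern deletes characters but does not collapse whitespace.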
Answer by nodws
You can do it in a single line :) especially useful for GET or POST requests:
$clear = preg_replace('/[^A-Za-z0-9\-]/', '', urldecode($_GET['id']));
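A sketch of the same one-liner as a reusable function, so it can be tried without a real request. The name clean_id and the sample value are hypothetical.

```php
<?php
// Hypothetical helper; the name clean_id is illustrative, not from the answer.
function clean_id(string $raw): string
{
    // Decode %XX sequences, then keep only letters, digits and hyphens.
    return preg_replace('/[^A-Za-z0-9\-]/', '', urldecode($raw));
}

// In a real request this would be clean_id($_GET['id'] ?? '').
echo clean_id('abc-123%3Cscript%3E'), PHP_EOL; // prints "abc-123script"
```

Unlike the other answers, this pattern also removes spaces, which suits identifiers better than free text.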
Answer by Viktor
Here's a function I've been using, put together from various threads around the net, that removes everything, including all tags, and leaves you with a clean phrase. Does anyone know how to modify this script to allow periods (.)? In other words, leave everything as is, but leave periods alone, or other punctuation like ! or a comma? Let me know.
function stripAlpha( $item )
{
    $search = array(
        '@<script[^>]*?>.*?</script>@si'    // Strip out javascript
        ,'@<style[^>]*?>.*?</style>@siU'    // Strip style tags properly
        ,'@<[\/\!]*?[^<>]*?>@si'            // Strip out HTML tags
        ,'@<![\s\S]*?--[ \t\n\r]*>@'        // Strip multi-line comments including CDATA
        ,'/\s{2,}/'                         // Strip runs of two or more whitespace characters
        ,'/(\s){2,}/'                       // Same as the previous pattern (redundant)
    );
    $pattern = array(
        '#[^a-zA-Z ]#'                      // Non alpha characters
        ,'/\s+/'                            // More than one whitespace
    );
    $replace = array(
        ''
        ,' '
    );
    $item = preg_replace( $search, '', html_entity_decode( $item ) );
    $item = trim( preg_replace( $pattern, $replace, strip_tags( $item ) ) );
    return $item;
}
Answer by Tom
To allow periods and any other characters, just add them to the character class like so:
change: '#[^a-zA-Z ]#'
to: '#[^a-zA-Z .()!]#'
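A small sketch of the difference between the two character classes. The sample string is an assumption for illustration.

```php
<?php
// Assumed sample input for illustration.
$item = 'Hello, world! (It works.) 100%';

// Original class: only letters and spaces survive.
$strict = trim(preg_replace('#[^a-zA-Z ]#', '', $item));

// Widened class: periods, parentheses and exclamation marks survive too.
$relaxed = trim(preg_replace('#[^a-zA-Z .()!]#', '', $item));

echo $strict, PHP_EOL;  // prints "Hello world It works"
echo $relaxed, PHP_EOL; // prints "Hello world! (It works.)"
```

Any character listed after the ^ inside the brackets is kept, so a comma can be allowed the same way by adding it to the class.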
Answer by suika
preg_replace('/[^a-zA-Z0-9\s]/', '', $string)
This removes only the special characters and keeps the whitespace between words.
Answer by Siddharth Shukla
Remove all special characters without leaving extra spaces, written as a single expression:
trim(preg_replace('/ +/', ' ', preg_replace('/[^A-Za-z0-9 ]/', ' ',
urldecode(html_entity_decode(strip_tags($string))))));