PHP: How do I sanitize a user-submitted URL?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/11780976/

Date: 2020-08-25 02:04:53  Source: igfitidea

How do I sanitize a user-submitted URL?

Tags: php, regex, security, url

Asked by Anonymous

I want to store users' personal URLs as plain text, encoded with htmlspecialchars().

Then I would retrieve this data and generate and display a link, as follows:

echo '<a href="'.$retrieved_string.'" target="_blank">';

And yet, even with special characters and quotes encoded, the href may not be safe because of potentially inserted JavaScript. An example of a bad link:

javascript:alert(document.cookie);

So what I'm thinking is to strip out a potential 'javascript:' prefix (before encoding the special characters, of course), as follows:

preg_replace('/^javascript:?/', '', $submitted_and_trimmed_input);

So, putting it all together:

$input=htmlspecialchars(preg_replace('/^javascript:?/', '', trim($_POST['link'])),11,'UTF-8',true);
mysql_query("update users set link='".mysql_real_escape_string($input)."'");

//And retrieving:

$query=mysql_query("select link from users");
$a=mysql_fetch_assoc($query);
echo '<a href="'.$a['link'].'" target="_blank">';

Now the question is: would this be enough to make a URL link safe, or are there other potential surprises I should watch out for?
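As a rough illustration of the kind of surprise in question, here is a minimal sketch that runs the one-pass strip from above against a few inputs. The helper name strip_js_prefix and the sample strings are assumptions made for this example; they are not part of the original code.

```php
<?php
// Hypothetical helper wrapping the one-pass strip shown in the question.
function strip_js_prefix(string $input): string
{
    return preg_replace('/^javascript:?/', '', trim($input));
}

$samples = [
    'javascript:alert(1)',                      // prefix is removed
    'JavaScript:alert(1)',                      // kept: the pattern is case-sensitive
    'javascript:javascript:alert(1)',           // only the first prefix is removed
    'data:text/html,<script>alert(1)</script>', // kept: a different scheme entirely
];

foreach ($samples as $sample) {
    echo $sample, ' => ', strip_js_prefix($sample), PHP_EOL;
}
```

The sketch suggests that removing one known-bad prefix leaves other spellings and other schemes untouched, which is why an allow-list of accepted schemes is often easier to reason about than a block-list.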

EDIT:

I've read a bit about filter_var() and it seems to fail in many ways. It doesn't validate international domains with Unicode characters, and yet the following string passes the test:

http://example.com/"><script>alert(document.cookie)</script>
  • I mean, come on... that's just ridiculous; there must be a better way.
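To make the observation above concrete, here is a small sketch that validates that same string and then escapes it for use in an href attribute. It assumes the literal 11 used elsewhere on this page is meant as ENT_QUOTES | ENT_SUBSTITUTE (3 + 8); the variable names are illustrative only.

```php
<?php
$url = 'http://example.com/"><script>alert(document.cookie)</script>';

// FILTER_VALIDATE_URL checks the overall shape of the URL,
// not whether the string is safe to print into HTML.
$validated = filter_var($url, FILTER_VALIDATE_URL);
var_dump($validated !== false);

// Escaping at output time stops the quote and angle brackets
// from closing the attribute or opening a new tag.
$escaped = htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo '<a href="' . $escaped . '" target="_blank">link</a>', PHP_EOL;
```

Seen this way, validation and output escaping are separate steps: the first decides whether the value looks like a URL, the second makes it safe in an HTML context.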

Accepted answer by Anonymous

This is how I'm going to do it. It looks to me like the best way is to prepend http:// to it:

$link=preg_replace('/^(http(s)?)?:?\/*/u','http://',trim($_POST['website']));

So even if a script gets in there, I couldn't care less. Then actually convert the characters:

$link= htmlspecialchars($link, 11,'UTF-8',true);

That's it. No beating around the bush, and it should be UTF-8 compatible as well.
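For readers who want to see what the rewrite above does to specific inputs, here is a minimal sketch. Both helper names (force_http, prepend_scheme) are hypothetical. The second function is only one possible variant, assuming you would rather keep an existing https:// prefix than have it rewritten to http://.

```php
<?php
// The rewrite from the answer above, wrapped in a hypothetical helper.
// preg_replace() returns null if the input is not valid UTF-8 under the /u flag.
function force_http(string $input): ?string
{
    return preg_replace('/^(http(s)?)?:?\/*/u', 'http://', trim($input));
}

// Hypothetical variant: keep an existing http(s) scheme, otherwise prepend http://.
function prepend_scheme(string $input): string
{
    $input = trim($input);
    if (preg_match('/^https?:\/\//i', $input)) {
        return $input;
    }
    // Drop a leading ":" or "//" so "//example.com" does not become "http:////example.com".
    return 'http://' . ltrim($input, ':/');
}

foreach (['example.com', 'https://example.com', 'javascript:alert(1)'] as $sample) {
    echo $sample, ' => ', force_http($sample), ' | ', prepend_scheme($sample), PHP_EOL;
}
```

In both versions a javascript: input ends up behind an http:// prefix, so the browser treats it as an ordinary (broken) web address. The result still needs htmlspecialchars() before it is printed.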

Answer by John Conde

Try using filter_var()

filter_var('http://example.com', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)
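Since FILTER_VALIDATE_URL does not by itself restrict which scheme a URL uses, one way to build on this suggestion is to pair it with a scheme allow-list. The sketch below is an assumption about how that could look, not part of the original answer; the helper name is_http_url is hypothetical.

```php
<?php
// Hypothetical helper: accept only syntactically valid http/https URLs.
function is_http_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // parse_url() returns null when the component is missing and false on failure.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(is_http_url('https://example.com/profile'));
var_dump(is_http_url('javascript://example.com/%0Aalert(1)'));
```

A value that passes this check would still be escaped with htmlspecialchars() at output time, since validation alone does not make a string safe inside an HTML attribute.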