Note: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute the content to the original authors (not me): Stack Overflow.
Original URL: http://stackoverflow.com/questions/1414986/
Using safe filter in Django for rich text fields
Asked by Ned Batchelder
I am using the TinyMCE editor for textarea fields in Django forms.
Now, in order to display the rich text back to the user, I am forced to use the "safe" filter in Django templates so that HTML rich text can be displayed on the browser.
Suppose JavaScript is disabled in the user's browser: TinyMCE won't load, and the user could submit <script> or other XSS tags through such a textarea field. Such HTML won't be safe to display back to the user.
How do I take care of such unsafe HTML Text that doesn't come from TinyMCE?
Answer by Ned Batchelder
You are right to be concerned about raw HTML, but not just for JavaScript-disabled browsers. When considering the security of your server, you have to ignore any work done in the browser and look solely at what the server accepts and what happens to it. Your server accepts HTML and displays it on the page. This is unsafe.
The fact that TinyMCE quotes HTML gives a false sense of security: the server trusts whatever it accepts, and it should not.
The solution to this is to process the HTML when it arrives, to remove dangerous constructs. This is a complicated problem to solve. Take a look at the XSS Cheat Sheet to see the wide variety of inputs that could cause a problem.
lxml has a function to clean HTML: http://lxml.de/lxmlhtml.html#cleaning-up-html, but I've never used it, so I can't vouch for its quality.
Answer by seddonym
Use django-bleach. It provides a bleach template filter that lets through only the tags you allow:
{% load bleach_tags %}
{{ mymodel.my_html_field|bleach }}
The trick is to configure the editor to produce the same tags as you're willing to 'let through' in your bleach settings.
Here's an example of my bleach settings:
# Which HTML tags are allowed
BLEACH_ALLOWED_TAGS = ['p', 'h3', 'h4', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'blockquote']
# Which HTML attributes are allowed
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'name']
BLEACH_STRIP_TAGS = True
Then you can configure TinyMCE (or whatever WYSIWYG editor you're using) to offer only the buttons that create the allowed tags.
Answer by AbeEstrada
You can use the template filter "removetags" and just remove 'script'.
Note that removetags has been removed from Django 2.0. Here is the deprecation notice from the docs:
Deprecated since version 1.8: removetags cannot guarantee HTML safe output and has been deprecated due to security concerns. Consider using bleach instead.
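To see why stripping only script tags is not enough, here is a minimal sketch. The strip_script_tags helper and its regex are hypothetical stand-ins for "just remove script"; they are not Django's removetags implementation. Both payloads still carry executable JavaScript after the filter runs:

```python
import re

# Hypothetical stand-in for "just remove script tags" (not Django's removetags code).
SCRIPT_TAG = re.compile(r"</?script[^>]*>", re.IGNORECASE)


def strip_script_tags(html: str) -> str:
    """Remove <script ...> and </script> tags in a single pass."""
    return SCRIPT_TAG.sub("", html)


# An event-handler attribute needs no <script> tag at all.
img_payload = '<img src="x" onerror="alert(1)">'

# A nested tag is reassembled once the inner tag has been removed.
nested_payload = "<scr<script>ipt>alert(1)</scr</script>ipt>"

print(strip_script_tags(img_payload))     # unchanged: the onerror handler survives
print(strip_script_tags(nested_payload))  # <script>alert(1)</script>
```

The first payload never contained the blocked tag, and the second only becomes a script tag because of the removal itself.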
Answer by Paul McMillan
There isn't a good answer to this one. TinyMCE generates HTML, and Django's auto-escaping specifically neutralizes HTML by escaping it.
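As a rough illustration of what auto-escaping does to a value, Python's html.escape can serve as a stand-in. This is an assumption for demonstration only: in a template Django applies its own escaping automatically, and the safe filter switches it off for that variable. The sample input is made up:

```python
from html import escape

# Made-up rich-text input, as an editor or a hand-written request might send it.
user_input = '<p>Hello</p><script>alert("xss")</script>'

# Markup characters become entities, so the browser shows the tags as
# literal text instead of interpreting them.
escaped = escape(user_input)
print(escaped)
# &lt;p&gt;Hello&lt;/p&gt;&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

This is why rich text cannot be shown with escaping left on (the formatting turns into visible tags), and why turning it off hands the whole safety problem to whatever cleaned the HTML beforehand.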
The traditional solution to this problem has been either to use some non-HTML markup language on the user input side (BBCode, Markdown, etc.) or to whitelist a limited number of HTML tags. TinyMCE/HTML are generally only appropriate input solutions for more or less trusted users.
The whitelist approach is tricky to implement without any security holes. The one thing you don't want to do is try to just detect "bad" tags - you WILL miss edge cases.
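To make the whitelist idea concrete, here is a minimal sketch using only Python's standard library. Everything in it is hypothetical (the AllowlistSanitizer class, the sanitize helper, and the tag, attribute, and scheme lists), and it is deliberately simplistic. Given how easy it is to leave holes, treat it as an illustration of the principle and use a maintained sanitizer for real input:

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlists: anything not named here is dropped.
ALLOWED_TAGS = {"p", "em", "strong", "a", "ul", "ol", "li", "blockquote"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}  # relative URLs are rejected too


class AllowlistSanitizer(HTMLParser):
    """Rebuild the markup from scratch, emitting only allowlisted pieces."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text still arrives via handle_data
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href":
                scheme = urlsplit(value.strip()).scheme.lower()
                if scheme not in ALLOWED_SCHEMES:
                    continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value)}"')
        self.parts.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always re-escaped, so it can never introduce new markup.
        self.parts.append(escape(data))


def sanitize(html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(sanitize('<a href="javascript:alert(1)" title="x">click</a>'))
print(sanitize('<p onclick="evil()">Hi <em>there</em></p>'))
```

The design choice that matters is the direction of the check: the output is built only from things that were explicitly allowed, so an unknown tag or attribute is dropped by default rather than having to be recognized as bad.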