Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/9010678/

Time: 2020-08-28 22:06:56  Source: igfitidea

HTML: Should I encode greater than or not? ( > &gt; )

html encoding xss

Asked by Bryan Field

When encoding possibly unsafe data, is there a reason to encode >?

  • It validates either way.
  • The browser interprets it the same either way (in the cases of attr="data", attr='data', and <tag>data</tag>).

I think the reasons somebody would do this are

  • To simplify regex-based tag removal: <[^>]+>? (rare)
  • Non-quoted attribute values: attr=data. :-o (not happening!)
  • Aesthetics in the code. (so what?)

Am I missing anything?

Accepted answer by Niet the Dark Absol

Strictly speaking, to prevent HTML injection, you need only encode < as &lt;.

If user input is going to be put in an attribute, also encode " as &quot;.

If you're doing things right and using properly quoted attributes, you don't need to worry about >. However, if you're not certain of this you should encode it just for peace of mind - it won't do any harm.
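
The rules in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: encode_minimal and encode_cautious are hypothetical helper names, not part of any library, and the extra step of encoding & is an addition (the answer does not mention it) so that the generated entities are not ambiguous.

```python
def encode_minimal(text: str, in_attribute: bool = False) -> str:
    """Hypothetical helper: encode only what the answer above calls for."""
    # '&' goes first so the entities produced below are not double-encoded.
    # (Encoding '&' is an assumption added here; the answer only names '<'.)
    out = text.replace("&", "&amp;").replace("<", "&lt;")
    if in_attribute:
        # Only sufficient when the attribute value is wrapped in double quotes.
        out = out.replace('"', "&quot;")
    return out


def encode_cautious(text: str) -> str:
    """Hypothetical helper: the 'peace of mind' variant that also encodes '>'."""
    return encode_minimal(text, in_attribute=True).replace(">", "&gt;")


print(encode_minimal("1 < 2 > 0"))                        # '>' is left alone
print(encode_minimal('" onclick="x', in_attribute=True))  # quotes are encoded
print(encode_cautious("<b>hi</b>"))                       # '>' is encoded too
```

In practice a library routine such as Python's html.escape covers all of these characters at once, which is usually simpler than choosing a subset by hand.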

Answer by Basile Starynkevitch

The HTML4 specification, in section 5.3.2, says that

authors should use "&gt;" (ASCII decimal 62) in text instead of ">"

so I believe you should encode the greater-than sign > as &gt; (because you should obey the standards).

Answer by user123444555621

Current browsers' HTML parsers have no problems with unquoted >s.

Unfortunately, however, using regular expressions to "parse" HTML in JS is pretty common (example: Ext.util.Format.stripTags). Poorly written command line tools, IDEs, Java classes, etc. may also not be sophisticated enough to determine where an opening tag ends.

So, you may run into problems with code like this:

<script data-usercontent=">malicious();//"></script>

(Note how the syntax highlighter treats this snippet!)
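
The failure can be reproduced with a naive regex-based tag stripper. The sketch below is an illustration in Python, not the Ext.util.Format.stripTags implementation; strip_tags is a hypothetical name and the pattern is only close to the one quoted in the question.

```python
import re

# Naive tag stripper: "<", then anything that is not ">", then ">".
# This is a stand-in for regex-based tools, not a real HTML parser.
TAG = re.compile(r"<[^>]+>")


def strip_tags(markup: str) -> str:
    return TAG.sub("", markup)


raw = '<script data-usercontent=">malicious();//"></script>'
encoded = '<script data-usercontent="&gt;malicious();//"></script>'

# With a raw ">" the pattern ends the tag inside the attribute value,
# so part of the attribute leaks out as if it were text.
print(repr(strip_tags(raw)))

# With "&gt;" the whole opening tag is matched and removed.
print(repr(strip_tags(encoded)))
```

The first call leaves `malicious();//">` behind, while the second returns an empty string, which is the practical argument for encoding > even though browsers do not need it.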

Answer by coder

Yes. If these signs are not encoded, that allows XSS on forms, social media, and many other sites, because an attacker can use a <script> tag. If you encode the signs, the browser will not execute the markup but will display the signs instead.

Answer by mrlee

Always

This is to prevent XSS injections (through users using any of your forms to submit raw HTML or JavaScript). By escaping your output, the browser knows not to parse or execute any of it, only to display it as text.

This may feel like less of an issue if you're not dealing with dynamic output based on user input, but it's important to at least understand it, if not to make it a habit.
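
As a rough illustration of escaping on output, here is a minimal sketch using Python's standard html.escape. The user_input value and the surrounding <p> wrapper are made up for the example; a real application would normally rely on its template engine's auto-escaping instead.

```python
import html

# Hypothetical value submitted through a form.
user_input = '<script>alert("xss")</script>'

# html.escape encodes &, < and >; with quote=True (the default) it also
# encodes double and single quotes.
safe = html.escape(user_input)

# The escaped value is placed in the page as text, not as markup.
page = f"<p>{safe}</p>"
print(page)
```

The browser then shows the submitted string literally instead of running it as a script.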

Answer by albanx

Encoding HTML characters is always a delicate job. You should always encode what needs to be encoded and always follow the standards. Using double quotes is standard, and even quotes inside double quotes should be encoded. Always encode. Imagine something like this:

<div> this is my text an img></div>

The img> will probably be parsed by the browser as an image tag. Browsers always try to resolve unclosed tags or quotes. As Basile says, use standards; otherwise you could get unexpected results without understanding the source of the errors.
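
How a given parser treats the snippet can be checked directly. The sketch below uses Python's html.parser as a stand-in; it is not a browser parser, so it only shows one tokenizer's behaviour, and TagLogger is a hypothetical class name. With this parser, the stray > stays in the text and no img tag is reported.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records start tags and text as they are parsed."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


logger = TagLogger()
logger.feed("<div> this is my text an img></div>")
logger.close()

print(logger.start_tags)           # tags the parser recognised
print(repr("".join(logger.text)))  # text content, including the stray '>'
```

Other tools may still disagree, which is why encoding > remains the predictable choice.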
