How to use C# to sanitize input on an HTML page?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/188870/

Date: 2020-08-03 17:18:41  Source: igfitidea

Tags: c#, html-sanitizing, antixsslibrary

Asked by NotMe

Is there a library or acceptable method for sanitizing the input to an html page?

In this case I have a form with just a name, phone number, and email address.

Code must be C#.

For example:

"<script src='bobs.js'>John Doe</script>" should become "John Doe"

Accepted answer by Julian

We are using the HtmlSanitizer.Net library, which:

Also on NuGet

Answer by Mitchel Sellers

If by sanitize you mean REMOVE the tags entirely, the RegEx example referenced by Bryant is the type of solution you want.
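
The RegEx example Bryant referenced is not reproduced on this page, so the following is only a minimal sketch of the general tag-stripping idea, not that exact solution. The class name TagStripper and the pattern are assumptions made for illustration.

```csharp
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Matches anything that looks like a tag: '<', any run of non-'>' characters, then '>'.
    private static readonly Regex Tag = new Regex("<[^>]*>", RegexOptions.Compiled);

    // Removes every tag-like span and keeps the text between the tags.
    public static string StripTags(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }
        return Tag.Replace(input, string.Empty);
    }
}
```

With the input from the question, TagStripper.StripTags("<script src='bobs.js'>John Doe</script>") returns "John Doe". A pattern like this does not handle malformed markup (for example a '<' that is never closed), so it is probably wise to still HTML encode the result when writing it to the page.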

If you just want to ensure that the code DOESN'T mess with your design when it is rendered to the user, you can use the HttpUtility.HtmlEncode method to prevent that!
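
As a rough illustration of the encoding route, here is a small sketch built on HttpUtility.HtmlEncode. The wrapper class OutputEncoding and its method name are hypothetical; only the HtmlEncode call comes from the answer above. On .NET Framework this assumes the project references System.Web.

```csharp
using System.Web;

public static class OutputEncoding
{
    // Encode untrusted text right before it is written into HTML markup,
    // so that characters such as '<' and '>' are emitted as entities.
    public static string ForHtml(string untrusted)
    {
        return HttpUtility.HtmlEncode(untrusted ?? string.Empty);
    }
}
```

Passing the question's sample through OutputEncoding.ForHtml produces text in which the angle brackets have become &lt; and &gt;, so the browser displays the script tag as literal text instead of loading bobs.js.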

Answer by Joel Coehoorn

Based on the comment you made to this answer, you might find some useful info in this question:
https://stackoverflow.com/questions/72394/what-should-a-developer-know-before-building-a-public-web-site

Here's a parameterized query example. Instead of this:

string sql = "UPDATE UserRecord SET FirstName='" + txtFirstName.Text + "' WHERE UserID=" + UserID;

Do this:

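using System.Data;
using System.Data.SqlClient;

// Values are sent as parameters, separate from the SQL text, so they cannot alter the query.
SqlCommand cmd = new SqlCommand("UPDATE UserRecord SET FirstName = @FirstName WHERE UserID = @UserID");
cmd.Parameters.Add("@FirstName", SqlDbType.VarChar, 50).Value = txtFirstName.Text;
cmd.Parameters.Add("@UserID", SqlDbType.Int).Value = UserID;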


Edit: Since there was no injection, I removed the portion of the answer dealing with that. I left the basic parameterized query example, since that may still be useful to anyone else reading the question.
--Joel

Answer by Jeremy Cook

It sounds like you have users that submit content but you cannot fully trust them, and yet you still want to render the content they provide as super safe HTML. Here are three techniques: HTML encode everything, HTML encode and/or remove just the evil parts, or use a DSL that compiles to HTML you are comfortable with.

  1. Should it become "John Doe"? I would HTML encode that string and let the user, "John Doe" (if indeed that is his real name...), have the stupid-looking name <script src='bobs.js'>John Doe</script>. He shouldn't have wrapped his name in script tags or any tags in the first place. This is the approach I use in all cases unless there is a really good business case for one of the other techniques.

  2. Accept HTML from the user and then sanitize it (on output) using a whitelist approach like the sanitization method @Bryant mentioned. Getting this right is (extremely) hard, and I defer pulling that off to greater minds. Note that some sanitizers will HTML encode evil where others would have removed the offending bits completely.

  3. Another approach is to use a DSL that "compiles" to HTML. Make sure to whitehat your DSL compiler because some (like MarkdownSharp) will allow arbitrary HTML like <script> tags and evil attributes through unencoded (which by the way is perfectly reasonable but may not be what you need or expect). If that is the case you will need to use technique #2 and sanitize what your compiler outputs.
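
To make the whitelist idea in technique #2 a little more concrete, here is a deliberately tiny sketch that HTML encodes everything first and then restores a short list of attribute-free formatting tags. It is not the method @Bryant mentioned and not a production sanitizer; the class name WhitelistSketch and the allowed tag list (b, i, em, strong) are assumptions chosen for illustration.

```csharp
using System.Text.RegularExpressions;
using System.Web;

public static class WhitelistSketch
{
    // Hypothetical whitelist: only these tags, with no attributes, are restored.
    // The pattern runs against the already encoded text, hence &lt; and &gt;.
    private static readonly Regex AllowedTag = new Regex(
        @"&lt;(/?)(b|i|em|strong)&gt;",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // Step 1: encode everything, so no markup from the user survives as markup.
        string encoded = HttpUtility.HtmlEncode(input);

        // Step 2: turn the encoded form of whitelisted tags back into real tags.
        return AllowedTag.Replace(encoded, "<$1$2>");
    }
}
```

For example, WhitelistSketch.Sanitize("<b>hi</b><script>x</script>") keeps the bold tags and leaves the script tag encoded. The sketch does not balance tags (a whitelisted closing tag can be restored even when its opening tag carried attributes and stayed encoded), which is one small hint at why the answer calls getting this right extremely hard.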

Closing thoughts:
