Using C# Regular expression to replace XML element content

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original URL and author information. Stack Overflow original: http://stackoverflow.com/questions/448376/

Time: 2020-08-04 03:46:13  Source: igfitidea

Tags: c#, .net, xml, regex, parsing

Asked by Millhouse

I'm writing some code that handles logging XML data, and I would like to be able to replace the content of certain elements (e.g. passwords) in the document. I'd rather not serialize and parse the document, as my code will be handling a variety of schemas.

Sample input documents:

doc #1:

<user>
    <userid>jsmith</userid>
    <password>myPword</password>
</user>

doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>myPword</ns:password>
</secinfo>

What I'd like my output to be:

output doc #1:

<user>
    <userid>jsmith</userid>
    <password>XXXXX</password>
</user>

output doc #2:

<secinfo>
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>

Since the documents I'll be processing could have a variety of schemas, I was hoping to come up with a nice generic regular expression solution that could find elements with "password" in their names and mask the content accordingly.

Can I solve this using regular expressions and C# or is there a more efficient way?

Accepted answer by Andrew Hare

This problem is best solved with XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="//password">
        <xsl:copy>
            <xsl:text>XXXXX</xsl:text>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

This will work for both inputs as long as you handle the namespaces properly.

Edit: Clarification of what I mean by "handle namespaces properly"

Make sure that a source document using the ns name prefix has a namespace defined for that prefix, like so:

<?xml version="1.0" encoding="utf-8"?>
<secinfo xmlns:ns="urn:foo">
    <ns:username>jsmith</ns:username>
    <ns:password>XXXXX</ns:password>
</secinfo>
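
For readers who want to run a stylesheet like this from C#, here is a minimal sketch using XslCompiledTransform. It is not part of the original answer. The class name XsltPasswordMasker and the method name Mask are hypothetical, and the match pattern is swapped for a local-name() test, which is an assumption made here so that ns:password matches without declaring each namespace in the stylesheet. The input must still be well-formed, so any prefix it uses needs a namespace declaration as described above.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies an identity transform that masks password elements.
public static class XsltPasswordMasker
{
    // Identity template plus one override. The local-name() test is an assumption
    // added for this sketch; it ignores the namespace prefix of the element.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output omit-xml-declaration=""yes""/>
  <xsl:template match=""@* | node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@* | node()""/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""*[local-name() = 'password']"">
    <xsl:copy>
      <xsl:text>XXXXX</xsl:text>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    public static string Mask(string inputXml)
    {
        // Compile the stylesheet from the string constant above.
        var transform = new XslCompiledTransform();
        using (var xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(xsltReader);
        }

        // Run the transform and collect the result as a string.
        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(inputReader, writer);
        }
        return output.ToString();
    }
}
```

Calling XsltPasswordMasker.Mask on the namespaced document above should leave the structure intact and replace only the text inside ns:password.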

Answer by Welbog

I'd say you're better off parsing the content with a .NET XmlDocument object and finding password elements using XPath, then changing their InnerXml properties. It has the advantage of being more correct (since XML isn't a regular language in the first place), and it's conceptually easy to understand.
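
A minimal sketch of that approach, under a few assumptions: the class and method names are hypothetical, the XPath uses local-name() so that both password and ns:password match, and InnerText is set rather than InnerXml so the mask is written as plain text.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Hypothetical helper illustrating the XmlDocument + XPath approach.
public static class XPathPasswordMasker
{
    public static string Mask(string inputXml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // keep the original layout of the document
        doc.LoadXml(inputXml);

        // local-name() ignores the prefix, so password and ns:password both match.
        // The result is copied to a list before the document is modified.
        List<XmlNode> matches = doc
            .SelectNodes("//*[local-name() = 'password']")
            .Cast<XmlNode>()
            .ToList();

        foreach (XmlNode node in matches)
        {
            node.InnerText = "XXXXX";
        }
        return doc.OuterXml;
    }
}
```

The exact-name test could be relaxed to contains(local-name(), 'password') if elements such as oldPassword should be masked as well; note that XPath 1.0 comparisons are case-sensitive.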

Answer by John Conrad

You can use regular expressions if you know enough about what you are trying to match. For example, if you are looking for any tag that has the word "password" in it with no inner tags, this regular expression would work:

(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)

You could use the same C# replace statement in zowat's answer as well, but for the replacement string you would want to use "$1XXXXX$4" instead.
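
Since zowat's replace statement is not reproduced on this page, here is a rough sketch of what such a call could look like with Regex.Replace. The class and method names are hypothetical. The pattern assumes the closing tag is matched with a backreference (\2) to the tag name captured by the second group, and RegexOptions.IgnoreCase is an extra assumption so that names like Password also match.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the regex approach for simple, flat elements.
public static class RegexPasswordMasker
{
    // Group 1: opening tag, group 2: tag name, group 3: content, group 4: closing tag.
    // \2 requires the closing tag to repeat the captured tag name.
    private static readonly Regex PasswordElement = new Regex(
        @"(<([^>]*?password[^>]*?)>)([^<]*?)(<\/\2>)",
        RegexOptions.IgnoreCase);

    public static string Mask(string inputXml)
    {
        // Keep the opening and closing tags, replace only the content between them.
        return PasswordElement.Replace(inputXml, "$1XXXXX$4");
    }
}
```

This only covers elements without attributes or nested tags; content in CDATA sections, comments, or attributes is outside what this pattern handles.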

Answer by Michael Kohne

From experience with systems that try to parse and/or modify XML without proper parsers, let me say: DON'T DO IT. Use an XML parser (there are other answers here that have ways to do that quickly and easily).

Using non-xml methods to parse and/or modify an XML stream will ALWAYS lead you to pain at some point in the future. I know, because I have felt that pain.

I know that it seems like it would be quicker-at-runtime/simpler-to-code/easier-to-understand/whatever if you use the regex solution. But you're just going to make someone's life miserable later.

Answer by Kev

Regex is the wrong approach for this; I've seen it go badly wrong when you least expect it.

XDocument is way more fun anyway:

using System.Xml.Linq;

// Element without a namespace: look it up by its plain name.
XDocument doc = XDocument.Parse(@"
    <user>
        <userid>jsmith</userid>
        <password>password</password>
    </user>");

doc.Element("user").Element("password").Value = "XXXXX";

// Temp namespace just for the purposes of the example.
XDocument doc2 = XDocument.Parse(@"
    <secinfo xmlns:ns='http://tempuru.org/users'>
        <ns:userid>jsmith</ns:userid>
        <ns:password>password</ns:password>
    </secinfo>");

// Namespaced element: use the expanded name {namespace}localName.
doc2.Element("secinfo").Element("{http://tempuru.org/users}password").Value = "XXXXX";
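
The lookups above depend on knowing each document's structure. As a sketch of how the same idea might be made schema-independent (the class and method names are hypothetical, and matching on the local name is an assumption about what "generic" should mean here), every descendant element can be filtered by name instead:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: masks any element whose local name contains "password".
public static class LinqPasswordMasker
{
    public static string Mask(string inputXml)
    {
        XDocument doc = XDocument.Parse(inputXml);

        // Snapshot the matches with ToList() before changing the tree.
        var passwordElements = doc.Descendants()
            .Where(e => e.Name.LocalName.IndexOf(
                "password", StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();

        foreach (XElement element in passwordElements)
        {
            element.Value = "XXXXX";
        }

        // DisableFormatting avoids re-indenting the output.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because the filter looks at Name.LocalName, the namespace prefix does not matter, though the input still has to declare any prefix it uses.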

Answer by Millhouse

Here is what I came up with when I went with XmlDocument. It may not be as slick as XSLT, but it should be generic enough to handle a variety of documents:

using System.Xml;

// input is a string containing well-formed XML
XmlDocument doc = new XmlDocument();
doc.LoadXml(input);
XmlNodeList nodeList = doc.SelectNodes("//*"); // every element in the document

foreach (XmlNode node in nodeList)
{
    if (node.Name.ToUpperInvariant().Contains("PASSWORD"))
    {
        // Mask the content of any element whose name contains "password".
        node.InnerText = "XXXXX";
    }
    else if (node.Attributes.Count > 0)
    {
        // Otherwise mask any attribute whose name contains "password".
        foreach (XmlAttribute a in node.Attributes)
        {
            if (a.LocalName.ToUpperInvariant().Contains("PASSWORD"))
            {
                a.InnerText = "XXXXX";
            }
        }
    }
}

Answer by pelle

The main reason XSLT exists is to transform XML structures: an XSLT is a type of stylesheet that can be used to alter the order of elements or change the content of elements. This is therefore a typical situation where it's highly recommended to use XSLT instead of parsing, as Andrew Hare said in a previous post.