C#: How to get the XPath from an XmlNode instance

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/241238/

Time: 2020-08-03 19:27:31  Source: igfitidea

How to get xpath from an XmlNode instance

Tags: c#, .net, xml, .net-2.0

Asked by joe

Could someone supply some code that would get the xpath of a System.Xml.XmlNode instance?

Thanks!

Accepted answer by Jon Skeet

Okay, I couldn't resist having a go at it. It'll only work for attributes and elements, but hey... what can you expect in 15 minutes :) Likewise there may very well be a cleaner way of doing it.

It is superfluous to include the index on every element (particularly the root one!) but it's easier than trying to work out whether there's any ambiguity otherwise.

using System;
using System.Text;
using System.Xml;

class Test
{
    static void Main()
    {
        string xml = @"
<root>
  <foo />
  <foo>
     <bar attr='value'/>
     <bar other='va' />
  </foo>
  <foo><bar /></foo>
</root>";
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode("//@attr");
        Console.WriteLine(FindXPath(node));
        Console.WriteLine(doc.SelectSingleNode(FindXPath(node)) == node);
    }

    static string FindXPath(XmlNode node)
    {
        StringBuilder builder = new StringBuilder();
        while (node != null)
        {
            switch (node.NodeType)
            {
                case XmlNodeType.Attribute:
                    builder.Insert(0, "/@" + node.Name);
                    node = ((XmlAttribute) node).OwnerElement;
                    break;
                case XmlNodeType.Element:
                    int index = FindElementIndex((XmlElement) node);
                    builder.Insert(0, "/" + node.Name + "[" + index + "]");
                    node = node.ParentNode;
                    break;
                case XmlNodeType.Document:
                    return builder.ToString();
                default:
                    throw new ArgumentException("Only elements and attributes are supported");
            }
        }
        throw new ArgumentException("Node was not in a document");
    }

    static int FindElementIndex(XmlElement element)
    {
        XmlNode parentNode = element.ParentNode;
        if (parentNode is XmlDocument)
        {
            return 1;
        }
        XmlElement parent = (XmlElement) parentNode;
        int index = 1;
        foreach (XmlNode candidate in parent.ChildNodes)
        {
            if (candidate is XmlElement && candidate.Name == element.Name)
            {
                if (candidate == element)
                {
                    return index;
                }
                index++;
            }
        }
        throw new ArgumentException("Couldn't find element within parent");
    }
}

Answer by Jon Skeet

There's no such thing as "the" xpath of a node. For any given node there may well be many xpath expressions which will match it.

You can probably work up the tree to build an expression that will match it, taking into account the index of particular elements and so on, but it's not going to be terribly nice code.

Why do you need this? There may be a better solution.
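
To make the "many expressions, one node" point concrete, here is a minimal sketch. The sample document, the three expressions, and the helper name `SelectSameNode` are all made up for illustration; they are not from the original answers.

```csharp
using System;
using System.Xml;

static class ManyPathsDemo
{
    // True when both expressions select the very same node object in doc.
    public static bool SelectSameNode(XmlDocument doc, string xpathA, string xpathB)
    {
        XmlNode a = doc.SelectSingleNode(xpathA);
        return a != null && ReferenceEquals(a, doc.SelectSingleNode(xpathB));
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/></foo></root>");

        // Three hand-written expressions that differ in style but not in result.
        string[] candidates =
        {
            "/root/foo[2]/bar",      // names plus positions
            "//bar[@attr='value']",  // search by attribute value
            "/*/*[2]/*[1]"           // positions only
        };

        foreach (string xpath in candidates)
        {
            Console.WriteLine(xpath + " -> " + SelectSameNode(doc, candidates[0], xpath));
        }
    }
}
```

Each line should report True, which is why any generator has to pick one convention (names, positions, or both) rather than find "the" path.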

Answer by Robert Rossney

Jon's correct that there are any number of XPath expressions that will yield the same node in an instance document. The simplest way to build an expression that unambiguously yields a specific node is a chain of node tests that use the node position in the predicate, e.g.:

/node()[1]/node()[2]/node()[6]/node()[1]/node()[2]

Obviously, this expression isn't using element names, but then if all you're trying to do is locate a node within a document, you don't need its name. It also can't be used to find attributes (because attributes aren't children of their element and don't have a position; you can only find them by name), but it will find all other node types.

To build this expression, you need to write a method that returns a node's position in its parent's child nodes, because XmlNode doesn't expose that as a property:

static int GetNodePosition(XmlNode child)
{
   for (int i=0; i<child.ParentNode.ChildNodes.Count; i++)
   {
       if (child.ParentNode.ChildNodes[i] == child)
       {
          // tricksy XPath, not starting its positions at 0 like a normal language
          return i + 1;
       }
   }
   throw new InvalidOperationException("Child node somehow not found in its parent's ChildNodes property.");
}

(There's probably a more elegant way to do that using LINQ, since XmlNodeList implements IEnumerable, but I'm going with what I know here.)
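
One possible LINQ version is sketched below. It is not from the original answer: it assumes .NET 3.5 or later (the question is tagged .net-2.0, where LINQ is unavailable), and the class name `NodePositionSketch` is invented for the example.

```csharp
using System;
using System.Linq;
using System.Xml;

static class NodePositionSketch
{
    // 1-based position of child among all of its parent's child nodes.
    public static int GetNodePosition(XmlNode child)
    {
        if (child.ParentNode == null)
            throw new ArgumentException("Node has no parent.", nameof(child));

        // XmlNodeList only implements the non-generic IEnumerable, so
        // Cast<XmlNode>() is needed before other LINQ operators apply.
        return child.ParentNode.ChildNodes
            .Cast<XmlNode>()
            .TakeWhile(n => !ReferenceEquals(n, child))
            .Count() + 1;
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><a/><!-- note --><b/></root>");
        XmlNode b = doc.DocumentElement.LastChild;
        Console.WriteLine(GetNodePosition(b)); // 3: <a/>, the comment, then <b/>
    }
}
```

Counting the nodes that come before the target and adding one gives the 1-based position XPath expects. Attributes have no ParentNode, so this sketch rejects them with an exception instead of returning a position.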

Then you can write a recursive method like this:

static string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have
        // to be matched by name, not found by position
        return String.Format(
            "{0}/@{1}",
            GetXPathToNode(((XmlAttribute)node).OwnerElement),
            node.Name
            );            
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }
    // the path to a node is the path to its parent, plus "/node()[n]", where 
    // n is its position among its siblings.
    return String.Format(
        "{0}/node()[{1}]",
        GetXPathToNode(node.ParentNode),
        GetNodePosition(node)
        );
}

As you can see, I hacked in a way for it to find attributes as well.
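
As a quick sanity check of the positional approach, the sketch below restates it in compact form and round-trips a text node, a comment, an element, and an attribute through SelectSingleNode. The sample document and the class name `PositionalPathDemo` are assumptions made for this example.

```csharp
using System;
using System.Xml;

static class PositionalPathDemo
{
    // Compact restatement of the positional approach described above.
    public static string GetXPathToNode(XmlNode node)
    {
        XmlAttribute attribute = node as XmlAttribute;
        if (attribute != null)
            return GetXPathToNode(attribute.OwnerElement) + "/@" + attribute.Name;
        if (node.ParentNode == null)
            return ""; // the document node has no path

        // Count preceding siblings of any type to get the 1-based position.
        int position = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
            position++;
        return GetXPathToNode(node.ParentNode) + "/node()[" + position + "]";
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><a/>hello<!--c--><b id='x'/></root>");
        XmlNode root = doc.DocumentElement;
        XmlNode[] samples =
        {
            root.ChildNodes[1],                  // text node
            root.ChildNodes[2],                  // comment
            root.ChildNodes[3],                  // element
            root.ChildNodes[3].Attributes["id"]  // attribute
        };
        foreach (XmlNode sample in samples)
        {
            string path = GetXPathToNode(sample);
            bool roundTrips = ReferenceEquals(doc.SelectSingleNode(path), sample);
            Console.WriteLine(sample.NodeType + ": " + path + " -> " + roundTrips);
        }
    }
}
```

One caveat worth keeping in mind: the XPath 1.0 data model merges adjacent character data into a single text node, so documents that place a text node next to a CDATA section may not line up position for position with the DOM's ChildNodes list.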

Jon slipped in with his version while I was writing mine. There's something about his code that's going to make me rant a bit now, and I apologize in advance if it sounds like I'm ragging on Jon. (I'm not. I'm pretty sure that the list of things Jon has to learn from me is exceedingly short.) But I think the point I'm going to make is a pretty important one for anyone who works with XML to think about.

I suspect that Jon's solution emerged from something I see a lot of developers do: thinking of XML documents as trees of elements and attributes. I think this largely comes from developers whose primary use of XML is as a serialization format, because all the XML they're used to using is structured this way. You can spot these developers because they're using the terms "node" and "element" interchangeably. This leads them to come up with solutions that treat all other node types as special cases. (I was one of these guys myself for a very long time.)

This feels like it's a simplifying assumption while you're making it. But it's not. It makes problems harder and code more complex. It leads you to bypass the pieces of XML technology (like the node() function in XPath) that are specifically designed to treat all node types generically.
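
A small illustration of that generic treatment, using a made-up document: node() matches every child regardless of type, while the element-only test * and the type-specific tests text() and comment() each see a subset.

```csharp
using System;
using System.Xml;

static class NodeTestDemo
{
    // Number of nodes the expression selects in the given document.
    public static int Count(XmlDocument doc, string xpath)
    {
        return doc.SelectNodes(xpath).Count;
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><a/>hello<!--c--><b/></root>");

        Console.WriteLine(Count(doc, "/root/node()"));    // 4: element, text, comment, element
        Console.WriteLine(Count(doc, "/root/*"));         // 2: elements only
        Console.WriteLine(Count(doc, "/root/text()"));    // 1
        Console.WriteLine(Count(doc, "/root/comment()")); // 1
    }
}
```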

There's a red flag in Jon's code that would make me query it in a code review even if I didn't know what the requirements are, and that's GetElementsByTagName. Whenever I see that method in use, the question that leaps to mind is always "why does it have to be an element?" And the answer is very often "oh, does this code need to handle text nodes too?"

Answer by Robert Rossney

This is even easier

    ''' <summary>
    ''' Gets the full XPath of a single node.
    ''' </summary>
    ''' <param name="node"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Private Function GetXPath(ByVal node As Xml.XmlNode) As String
        Dim temp As String
        Dim sibling As Xml.XmlNode
        Dim previousSiblings As Integer = 1

        'Skip the document node itself; it contributes nothing to the path
        If node.Name = "#document" Then Return ""

        'Prime it
        sibling = node.PreviousSibling
        'Percolate up, counting this node's preceding siblings.
        While sibling IsNot Nothing
            'Only count if the sibling has the same name as this node
            If sibling.Name = node.Name Then
                previousSiblings += 1
            End If
            sibling = sibling.PreviousSibling
        End While

        'Append this node's 1-based index among its same-named siblings.
        'previousSiblings starts at 1, so the condition below always holds and the index is always emitted.
        temp = node.Name + IIf(previousSiblings > 0 OrElse node.NextSibling IsNot Nothing, "[" + previousSiblings.ToString() + "]", "").ToString()

        If node.ParentNode IsNot Nothing Then
            Return GetXPath(node.ParentNode) + "/" + temp
        End If

        Return temp
    End Function

Answer by James Randle

My 10p worth is a hybrid of Robert and Corey's answers. I can only claim credit for the actual typing of the extra lines of code.

    private static string GetXPathToNode(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Attribute)
        {
            // attributes have an OwnerElement, not a ParentNode; also they have
            // to be matched by name, not found by position
            return String.Format(
                "{0}/@{1}",
                GetXPathToNode(((XmlAttribute)node).OwnerElement),
                node.Name
                );
        }
        if (node.ParentNode == null)
        {
            // the only node with no parent is the root node, which has no path
            return "";
        }
        //get the index
        int iIndex = 1;
        XmlNode xnIndex = node;
        while (xnIndex.PreviousSibling != null) { iIndex++; xnIndex = xnIndex.PreviousSibling; }
        // the path to a node is the path to its parent, plus "/node()[n]", where 
        // n is its position among its siblings.
        return String.Format(
            "{0}/node()[{1}]",
            GetXPathToNode(node.ParentNode),
            iIndex
            );
    }

Answer by René Endress

If you do this, you will get a path made of the node names plus their positions, which distinguishes nodes that share a name, like this: "/Service[1]/System[1]/Group[1]/Folder[2]/File[2]"

public string GetXPathToNode(XmlNode node)
{         
    if (node.NodeType == XmlNodeType.Attribute)
    {             
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {             
        // the only node with no parent is the root node, which has no path
        return "";
    }

    //get the index
    int iIndex = 1;
    XmlNode xnIndex = node;
    while (xnIndex.PreviousSibling != null && xnIndex.PreviousSibling.Name == xnIndex.Name)
    {
         iIndex++;
         xnIndex = xnIndex.PreviousSibling; 
    }

    // the path to a node is the path to its parent, plus "/name[n]", where
    // n is its position among its same-named siblings.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, iIndex);
}

Answer by cjbarth

I found that none of the above worked with XDocument, so I wrote my own code to support XDocument and used recursion. I think this code handles multiple identical nodes better than some of the other code here because it first tries to go as deep into the XML path as it can and then backs up to build only what is needed. So if you have /home/white/bob and /home/white/mike and you want to create /home/white/bob/garage, the code will know how to create that. However, I didn't want to mess with predicates or wildcards, so I explicitly disallowed those; but it would be easy to add support for them.
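
For readers following along in C#, here is a rough sketch of the same "create only what is missing" idea. It is not a translation of the VB code: it walks the path top-down instead of recursing from the deepest path, it takes a path relative to the given root element, and `EnsurePath` is a hypothetical helper name. Like the VB version, it assumes plain element names with no predicates or wildcards.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class EnsurePathSketch
{
    // Walks relativePath (for example "white/bob/garage") below root, creating
    // any element that does not exist yet, and returns the deepest element.
    public static XElement EnsurePath(XElement root, string relativePath)
    {
        XElement current = root;
        string[] names = relativePath.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string name in names)
        {
            // Element(name) returns the first child with that name, or null.
            XElement child = current.Element(name);
            if (child == null)
            {
                child = new XElement(name);
                current.Add(child);
            }
            current = child;
        }
        return current;
    }

    static void Main()
    {
        XElement home = XElement.Parse("<home><white><bob/><mike/></white></home>");
        XElement garage = EnsurePath(home, "white/bob/garage");

        Console.WriteLine(garage.Parent.Name);                  // bob
        Console.WriteLine(home.Descendants("bob").Count());     // 1, the existing bob was reused
        Console.WriteLine(home.Descendants("garage").Count());  // 1
    }
}
```

When several siblings share a name, this sketch silently follows the first one, whereas the VB code above throws; which behaviour is preferable depends on the caller.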

Private Sub NodeItterate(XDoc As XElement, XPath As String)
    'get the deepest path
    Dim nodes As IEnumerable(Of XElement)

    nodes = XDoc.XPathSelectElements(XPath)

    'if it doesn't exist, try the next shallow path
    If nodes.Count = 0 Then
        NodeItterate(XDoc, XPath.Substring(0, XPath.LastIndexOf("/")))
        'by this time all the required parent elements will have been constructed
        Dim ParentPath As String = XPath.Substring(0, XPath.LastIndexOf("/"))
        Dim ParentNode As XElement = XDoc.XPathSelectElement(ParentPath)
        Dim NewElementName As String = XPath.Substring(XPath.LastIndexOf("/") + 1, XPath.Length - XPath.LastIndexOf("/") - 1)
        ParentNode.Add(New XElement(NewElementName))
    End If

    'if we find there are more than 1 elements at the deepest path we have access to, we can't proceed
    If nodes.Count > 1 Then
        Throw New ArgumentOutOfRangeException("There are too many paths that match your expression.")
    End If

    'if there is just one element, we can proceed
    If nodes.Count = 1 Then
        'just proceed
    End If

End Sub

Public Sub CreateXPath(ByVal XDoc As XElement, ByVal XPath As String)

    If XPath.Contains("//") Or XPath.Contains("*") Or XPath.Contains(".") Then
        Throw New ArgumentException("Can't create a path based on searches, wildcards, or relative paths.")
    End If

    'Character class: reject the path if it contains any predicate-related character
    If Regex.IsMatch(XPath, "[\[\]()@='<>|]") Then
        Throw New ArgumentException("Can't create a path based on predicates.")
    End If

    'we will process this recursively.
    NodeItterate(XDoc, XPath)

End Sub

Answer by rugg

Here's a simple method that I've used; it worked for me.

    static string GetXpath(XmlNode node)
    {
        if (node.Name == "#document")
            return String.Empty;
        return GetXpath(node.SelectSingleNode("..")) + "/" +  (node.NodeType == XmlNodeType.Attribute ? "@":String.Empty) + node.Name;
    }
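
One thing to be aware of with a name-only path is that it carries no positions, so it can match several nodes when siblings share a name. The sketch below restates the method (checking NodeType instead of the "#document" name) against a made-up document; the class name `NameOnlyPathDemo` is an assumption for the example.

```csharp
using System;
using System.Xml;

static class NameOnlyPathDemo
{
    // Restatement of the name-only approach above: no positional predicates.
    public static string GetXpath(XmlNode node)
    {
        if (node.NodeType == XmlNodeType.Document)
            return string.Empty;
        string prefix = node.NodeType == XmlNodeType.Attribute ? "@" : string.Empty;
        // ".." selects the parent, or the owner element for an attribute.
        return GetXpath(node.SelectSingleNode("..")) + "/" + prefix + node.Name;
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><foo/><foo><bar attr='value'/></foo></root>");
        XmlNode secondFoo = doc.DocumentElement.LastChild;

        string path = GetXpath(secondFoo);
        Console.WriteLine(path);                        // /root/foo
        Console.WriteLine(doc.SelectNodes(path).Count); // 2, both <foo> elements match
        // SelectSingleNode returns the first match, which is not the node we started from.
        Console.WriteLine(ReferenceEquals(doc.SelectSingleNode(path), secondFoo)); // False
    }
}
```

So the short version is fine when element names are unique along the path, and the indexed variants are the safer choice otherwise.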

Answer by Roemer

I know this is an old post, but the version I liked the most (the one with names) was flawed: when a parent node has child nodes with different names, it stopped counting the index as soon as it found the first non-matching node name.
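
The difference is easy to reproduce. The sketch below isolates the two counting loops; the method names `IndexStoppingEarly` and `IndexCountingAll` and the sample document are invented for the illustration.

```csharp
using System;
using System.Xml;

static class SiblingIndexDemo
{
    // Stops at the first preceding sibling with a different name (the flawed behaviour).
    public static int IndexStoppingEarly(XmlNode node)
    {
        int index = 1;
        XmlNode current = node;
        while (current.PreviousSibling != null && current.PreviousSibling.Name == current.Name)
        {
            index++;
            current = current.PreviousSibling;
        }
        return index;
    }

    // Walks all preceding siblings and counts only those with the same name.
    public static int IndexCountingAll(XmlNode node)
    {
        int index = 1;
        for (XmlNode s = node.PreviousSibling; s != null; s = s.PreviousSibling)
        {
            if (s.Name == node.Name) index++;
        }
        return index;
    }

    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<root><item/><other/><item/></root>");
        XmlNode secondItem = doc.DocumentElement.LastChild;

        Console.WriteLine(IndexStoppingEarly(secondItem)); // 1, which would point at the first <item>
        Console.WriteLine(IndexCountingAll(secondItem));   // 2, matching /root/item[2]
    }
}
```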

Here is my fixed version of it:

/// <summary>
/// Gets the X-Path to a given Node
/// </summary>
/// <param name="node">The Node to get the X-Path from</param>
/// <returns>The X-Path of the Node</returns>
public string GetXPathToNode(XmlNode node)
{
    if (node.NodeType == XmlNodeType.Attribute)
    {
        // attributes have an OwnerElement, not a ParentNode; also they have             
        // to be matched by name, not found by position             
        return String.Format("{0}/@{1}", GetXPathToNode(((XmlAttribute)node).OwnerElement), node.Name);
    }
    if (node.ParentNode == null)
    {
        // the only node with no parent is the root node, which has no path
        return "";
    }

    // Get the Index
    int indexInParent = 1;
    XmlNode siblingNode = node.PreviousSibling;
    // Loop thru all Siblings
    while (siblingNode != null)
    {
        // Increase the Index if the Sibling has the same Name
        if (siblingNode.Name == node.Name)
        {
            indexInParent++;
        }
        siblingNode = siblingNode.PreviousSibling;
    }

    // the path to a node is the path to its parent, plus "/name[n]", where n is its position among its same-named siblings.
    return String.Format("{0}/{1}[{2}]", GetXPathToNode(node.ParentNode), node.Name, indexInParent);
}

Answer by Plasmabubble

What about using extension methods? ;) My version (building on others' work) uses the syntax name[index], with the index omitted if the element has no same-named siblings ("brothers"). The loop that finds the element index lives in a separate routine (also an extension method).

Just paste the following into any static utility class (or the main Program class, if it is declared static):

static public int GetRank( this XmlNode node )
{
    // return 0 if unique, else return position 1...n in siblings with same name
    try
    {
        if( node is XmlElement ) 
        {
            int rank = 1;
            bool alone = true, found = false;

            foreach( XmlNode n in node.ParentNode.ChildNodes )
                if( n.Name == node.Name ) // sibling with same name
                {
                    if( n.Equals(node) )
                    {
                        if( ! alone ) return rank; // no need to continue
                        found = true;
                    }
                    else
                    {
                        if( found ) return rank; // no need to continue
                        alone = false;
                        rank++;
                    }
                }

        }
    }
    catch{}
    return 0;
}

static public string GetXPath( this XmlNode node )
{
    try
    {
        if( node is XmlAttribute )
            return String.Format( "{0}/@{1}", (node as XmlAttribute).OwnerElement.GetXPath(), node.Name );

        if( node is XmlText || node is XmlCDataSection )
            return node.ParentNode.GetXPath();

        if( node.ParentNode == null )   // the only node with no parent is the root node, which has no path
            return "";

        int rank = node.GetRank();
        if( rank == 0 ) return String.Format( "{0}/{1}",        node.ParentNode.GetXPath(), node.Name );
        else            return String.Format( "{0}/{1}[{2}]",   node.ParentNode.GetXPath(), node.Name, rank );
    }
    catch{}
    return "";
}