How to execute XPath one-liners from shell?
Notice: this page is a Chinese-English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original address and authors, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/15461737/
Asked by clacke
Is there a package out there, for Ubuntu and/or CentOS, that has a command-line tool that can execute an XPath one-liner like foo //element@attribute filename.xml or foo //element@attribute < filename.xml and return the results line by line?
I'm looking for something that I can just apt-get install foo or yum install foo and that then works out of the box, with no wrappers or other adaptation necessary.
Here are some examples of things that come close:
Nokogiri. If I write this wrapper I could call the wrapper in the way described above:
#!/usr/bin/ruby
require 'nokogiri'
# Parse XML from stdin and print every node matched by the XPath in the first argument.
Nokogiri::XML(STDIN).xpath(ARGV[0]).each do |row|
  puts row
end
XML::XPath. Would work with this wrapper:
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;
# Parse XML from stdin and print the data of every node matched by the XPath in the first argument.
my $root = XML::XPath->new(ioref => 'STDIN');
for my $node ($root->find($ARGV[0])->get_nodelist) {
    print($node->getData, "\n");
}
xpath from XML::XPath returns too much noise, -- NODE -- and attribute = "value".
xml_grep from XML::Twig cannot handle expressions that do not return elements, so it cannot be used to extract attribute values without further processing.
EDIT:
echo cat //element/@attribute | xmllint --shell filename.xml returns noise similar to xpath.
xmllint --xpath //element/@attribute filename.xml returns attribute = "value".
xmllint --xpath 'string(//element/@attribute)' filename.xml returns what I want, but only for the first match.
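The attribute = "value" noise can also be stripped after the fact. The following is a minimal sketch, assuming the tool's output consists of name="value" pairs separated by whitespace (the exact format may differ between libxml2 versions). The helper name attribute_values is made up for illustration and is not part of any tool mentioned here.

```python
import re

# Hypothetical helper: pull bare values out of text such as
#   attribute="a" attribute="b"
# Assumes double-quoted values without embedded double quotes.
# XML entities (e.g. &amp;) are left escaped.
PAIR = re.compile(r'[\w:.-]+\s*=\s*"([^"]*)"')

def attribute_values(tool_output):
    """Return the quoted values found in the given text, in order."""
    return PAIR.findall(tool_output)

# Print one value per line.
for value in attribute_values(' attribute="a" attribute="b"'):
    print(value)
```

In a pipeline, the text passed to attribute_values would presumably come from reading standard input instead of a literal string.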
For another solution almost satisfying the question, here is an XSLT that can be used to evaluate arbitrary XPath expressions (requires dyn:evaluate support in the XSLT processor):
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:dyn="http://exslt.org/dynamic" extension-element-prefixes="dyn">
  <xsl:output omit-xml-declaration="yes" indent="no" method="text"/>
  <!-- Both parameters are supplied on the command line via stringparam. -->
  <xsl:param name="pattern"/>
  <xsl:param name="value"/>
  <xsl:template match="/">
    <xsl:for-each select="dyn:evaluate($pattern)">
      <xsl:value-of select="dyn:evaluate($value)"/>
      <xsl:value-of select="'&#10;'"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Run with xsltproc --stringparam pattern //element/@attribute --stringparam value . arbitrary-xpath.xslt filename.xml.
Answered by Gilles Quenot
You should try these tools:
- xmlstarlet: can edit, select, transform... Not installed by default, xpath1
- xmllint: often installed by default with libxml2, xpath1 (check my wrapper to have newline-delimited output)
- xpath: installed via perl's module XML::XPath, xpath1
- xml_grep: installed via perl's module XML::Twig, xpath1 (limited xpath usage)
- xidel: xpath3
- saxon-lint: my own project, wrapper over @Michael Kay's Saxon-HE Java library, xpath3
xmllint comes with libxml2-utils (can be used as an interactive shell with the --shell switch)
xmlstarlet is xmlstarlet.
xpath comes with perl's module XML::XPath
xml_grep comes with perl's module XML::Twig
xidel is xidel
saxon-lint uses SaxonHE 9.6, XPath 3.x (+ backward compatibility)
Examples:
xmllint --xpath '//element/@attribute' file.xml
xmlstarlet sel -t -v "//element/@attribute" file.xml
xpath -q -e '//element/@attribute' file.xml
xidel -se '//element/@attribute' file.xml
saxon-lint --xpath '//element/@attribute' file.xml
Answered by BeniBela
You can also try my Xidel. It is not in a package in the repository, but you can just download it from the webpage (it has no dependencies).
It has simple syntax for this task:
xidel filename.xml -e '//element/@attribute'
And it is one of the few tools listed here that support XPath 2.
Answered by clacke
One package that is very likely to be installed on a system already is python-lxml. If so, this is possible without installing any extra package:
python -c "from lxml.etree import parse; from sys import stdin; print('\n'.join(parse(stdin).xpath('//element/@attribute')))"
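Where lxml is not installed, the standard library's xml.etree.ElementTree can approximate this for simple expressions. This is only a sketch under assumptions: ElementTree supports a limited subset of XPath and cannot select attributes directly, so the hypothetical helper attr_values below takes the element path and the attribute name as separate arguments.

```python
import io
import xml.etree.ElementTree as ET

def attr_values(source, element_path, attribute):
    """Return the given attribute from every element matching element_path.

    element_path uses ElementTree's limited XPath subset, e.g. './/element'.
    Elements that lack the attribute are skipped.
    """
    root = ET.parse(source).getroot()
    return [el.get(attribute)
            for el in root.iterfind(element_path)
            if el.get(attribute) is not None]

# Example with an in-memory document; a script could pass sys.stdin instead.
sample = io.StringIO(
    '<root><element attribute="a"/><element attribute="b"/><element/></root>'
)
print("\n".join(attr_values(sample, ".//element", "attribute")))
```

Note that './/element' matches descendants of the root only, so a root element named element would not itself be included.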
Answered by Mike
While looking for a way to query Maven pom.xml files, I ran across this question. However, I had the following limitations:
- must run cross-platform
- must exist on all major Linux distributions without any additional module installation
- must handle complex XML files such as Maven pom.xml files
- must have a simple syntax
I have tried many of the above without success:
- python lxml.etree is not part of the standard Python distribution
- xml.etree is, but it does not handle complex Maven pom.xml files well; I have not dug deep enough
- python xml.etree does not handle Maven pom.xml files, for unknown reasons
- xmllint does not work either; it often core dumps on Ubuntu 12.04 ("xmllint: using libxml version 20708")
The solution I came across that is stable, short, mature, and works on many platforms is the REXML library built into Ruby:
ruby -r rexml/document -e 'include REXML;
puts XPath.first(Document.new($stdin), "/project/version/text()")' < pom.xml
What inspired me to find this one were the following articles:
Answered by Michael Kay
Saxon will do this not only for XPath 2.0, but also for XQuery 1.0 and (in the commercial version) 3.0. It doesn't come as a Linux package, but as a jar file. Syntax (which you can easily wrap in a simple script) is
java net.sf.saxon.Query -s:source.xml -qs://element/attribute
Answered by choroba
Answered by sideshowbarker
clacke's answer is great, but I think it only works if your source is well-formed XML, not normal HTML.
So to do the same for normal Web content—HTML docs that aren't necessarily well-formed XML:
echo "<p>foo<div>bar</div><p>baz" | python -c "from sys import stdin; \
from lxml import html; \
print('\n'.join(html.tostring(node, encoding='unicode') for node in html.parse(stdin).xpath('//p')))"
To use html5lib instead (to ensure you get the same parsing behavior as Web browsers, because like browser parsers, html5lib conforms to the parsing requirements in the HTML spec):
echo "<p>foo<div>bar</div><p>baz" | python -c "from sys import stdin; \
import html5lib; from lxml import html; \
doc = html5lib.parse(stdin, treebuilder='lxml', namespaceHTMLElements=False); \
print('\n'.join(html.tostring(node, encoding='unicode') for node in doc.xpath('//p')))"
Answered by pdr
Similar to Mike's and clacke's answers, here is a Python one-liner (using Python >= 2.5) that gets the build version from a pom.xml file. It works around the fact that pom.xml files don't normally have a DTD or default namespace, so they don't appear well-formed to libxml:
python -c "import xml.etree.ElementTree as ET; \
print(ET.parse(open('pom.xml')).getroot().find('\
{http://maven.apache.org/POM/4.0.0}version').text)"
Tested on Mac and Linux, and doesn't require any extra packages to be installed.
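The same lookup can also be written with a prefix map instead of the inline {uri} form, which tends to stay readable when several elements are queried. This is a sketch using an in-memory sample. The prefix m, the function name pom_version, and the sample content are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Maven POM namespace, as used in the one-liner above; the prefix "m" is arbitrary.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

def pom_version(pom_text):
    """Return the text of the top-level <version> element, or None if absent."""
    root = ET.fromstring(pom_text)
    return root.findtext("m:version", namespaces=NS)

# Made-up sample document for demonstration.
sample_pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <artifactId>demo</artifactId>
  <version>1.2.3</version>
</project>"""

print(pom_version(sample_pom))
```

Because the path is relative to the root element, only a version element directly under project is matched, not versions nested inside dependencies.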
Answered by Geoff Nixon
It bears mentioning that nokogiri itself ships with a command line tool, which should be installed with gem install nokogiri.
You might find this blog post useful.
Answered by G. Cito
In addition to XML::XSH and XML::XSH2, there are some grep-like utilities such as App::xml_grep2 and XML::Twig (which includes xml_grep rather than xml_grep2). These can be quite useful when working on large or numerous XML files for quick one-liners or Makefile targets. XML::Twig is especially nice to work with for a perl scripting approach when you want to do a bit more processing than your $SHELL and xmllint or xsltproc offer.
The numbering scheme in the application names indicates that the "2" versions are newer/later versions of essentially the same tool, which may require later versions of other modules (or of perl itself).

