Linux: How to parse XML using a shell script?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/4680143/

Date: 2020-08-04 00:08:16  Source: igfitidea

How to parse XML using shellscript?

Tags: linux, bash, shell

Asked by Spredzy

I would like to know the best way to parse an XML file using a shell script.

  • Should one do it by hand?
  • Do third-party libraries exist?

If you have already done this, please let me know how you managed it.

Accepted answer by Joel

You could try xmllint.

The xmllint program parses one or more XML files, specified on the command line as xmlfile. It prints various types of output, depending upon the options selected. It is useful for detecting errors both in XML code and in the XML parser itself.

It allows you to select elements in the XML document by XPath, using the --pattern option. (Recent versions of xmllint also provide an --xpath option that evaluates an XPath expression directly.)

On Mac OS X (Yosemite), it is installed by default.
On Ubuntu, if it is not already installed, you can run apt-get install libxml2-utils
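
As a rough illustration of the xmllint approach, here is a minimal sketch. It assumes an xmllint build that supports the --xpath option, and the sample file and its element names are made up for the demonstration; they are not from the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created only for this demonstration.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<config>
  <logs>
    <path>/path/to/logs</path>
  </logs>
</config>
EOF

if command -v xmllint >/dev/null 2>&1; then
    # string(...) yields the text content of the first matching node.
    log_path=$(xmllint --xpath 'string(/config/logs/path)' "$xml_file")
    echo "log path: $log_path"
else
    echo "xmllint is not installed" >&2
fi
rm -f "$xml_file"
```

Wrapping the expression in string() keeps the surrounding tags out of the result, which is usually what a shell script wants.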

Answer by Keith

Try sgrep. It's not clear exactly what you are trying to do, but I surely would not attempt writing an XML parser in bash.

Answer by tim

There's also xmlstarlet (which is available for Windows as well).

http://xmlstar.sourceforge.net/doc/xmlstarlet.txt
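
For a sense of what xmlstarlet looks like in a script, here is a small sketch. The sample file, its element names, and the level attribute are hypothetical; on some systems the binary may be installed under a different name (for example xml), so treat the command name as an assumption too.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, created only for this demonstration.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<config>
  <logs level="debug">
    <path>/path/to/logs</path>
  </logs>
</config>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # sel = select; -t starts a template; -v prints the value of an XPath expression.
    log_path=$(xmlstarlet sel -t -v '/config/logs/path' "$xml_file")
    log_level=$(xmlstarlet sel -t -v '/config/logs/@level' "$xml_file")
    echo "path=$log_path level=$log_level"
else
    echo "xmlstarlet is not installed" >&2
fi
rm -f "$xml_file"
```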

Answer by David W.

This really is beyond the capabilities of shell scripts. Shell scripts and the standard Unix tools are okay at parsing line-oriented files, but things change when you talk about XML. Even simple tags can present a problem:

<MYTAG>Data</MYTAG>

<MYTAG>
     Data
</MYTAG>

<MYTAG param="value">Data</MYTAG>

<MYTAG><ANOTHER_TAG>Data
</ANOTHER_TAG></MYTAG>

Imagine trying to write a shell script that can read the data enclosed in <MYTAG>. These four very simple XML examples all show different ways this can be an issue. The first two examples mean the same thing in XML. The third simply has an attribute attached to it. The fourth contains the data in another tag. Simple sed, awk, and grep commands cannot catch all possibilities.
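
To make the point concrete, here is a small sketch (not from the original answer) that runs one plausible line-oriented sed pattern over the four fragments above. The pattern is just one example of a naive attempt; other patterns fail in other ways.

```shell
#!/usr/bin/env bash
# The four fragments, each carrying "Data" somewhere inside MYTAG.
xml=$(cat <<'EOF'
<MYTAG>Data</MYTAG>
<MYTAG>
     Data
</MYTAG>
<MYTAG param="value">Data</MYTAG>
<MYTAG><ANOTHER_TAG>Data
</ANOTHER_TAG></MYTAG>
EOF
)

# Naive attempt: expect the open tag, the data, and the close tag on one line.
matches=$(printf '%s\n' "$xml" | sed -n 's|^<MYTAG>\(.*\)</MYTAG>$|\1|p')
count=$(printf '%s\n' "$matches" | grep -c .)
echo "naive sed found $count of 4 values: $matches"
# prints: naive sed found 1 of 4 values: Data
```

Only the first fragment matches. The multi-line form, the attribute, and the nested tag each defeat the pattern for a different reason.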

You need to use a full-blown scripting language like Perl, Python, or Ruby. Each of these has modules that can parse XML data and make the underlying structure easier to access. I've used XML::Simple in Perl. It took me a few tries to understand it, but it did what I needed, and made my programming much easier.

Answer by frankc

Do you have xml_grep installed? It's a Perl-based utility that comes standard on some distributions (it came pre-installed on my CentOS system). Rather than giving it a regular expression, you give it an XPath expression.

Answer by user321

A rather new project is the xml-coreutils package featuring xml-cat, xml-cp, xml-cut, xml-grep, ...

http://xml-coreutils.sourceforge.net/contents.html

Answer by freethinker

Here's a function which will convert XML name-value pairs and attributes into bash variables.

http://www.humbug.in/2010/parse-simple-xml-files-using-bash-extract-name-value-pairs-and-attributes/
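
The linked function is not reproduced here. As a rough idea of how such a conversion can work in pure bash, here is a minimal sketch under strong assumptions: simple, well-formed input with no CDATA, comments, or entities. The function name xml_to_vars and the xml_ variable prefix are hypothetical choices for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turns <name>value</name> pairs into variables named xml_<name>.
xml_to_vars() {
    local entity content name
    # Read one "tag>text" chunk at a time: '<' ends a chunk, '>' splits tag from text.
    while IFS='>' read -r -d '<' entity content; do
        name=${entity%% *}    # drop any attributes after the tag name
        # Skip closing tags, declarations, and names that are not valid identifiers.
        [[ $name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || continue
        # Trim leading and trailing whitespace from the text.
        content="${content#"${content%%[![:space:]]*}"}"
        content="${content%"${content##*[![:space:]]}"}"
        if [[ -n $content ]]; then
            printf -v "xml_$name" '%s' "$content"
        fi
    done
    return 0
}

xml_to_vars <<'EOF'
<config>
  <logs>
    <path>/path/to/logs</path>
    <level>debug</level>
  </logs>
</config>
EOF
echo "path=$xml_path level=$xml_level"
# prints: path=/path/to/logs level=debug
```

This ignores nesting entirely, so two elements with the same name at different levels overwrite each other. That limitation is exactly why the other answers recommend a real XML tool for anything beyond trivial files.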

Answer by Mark Rose

Try using xpath. You can use it to parse elements out of an xml tree.

http://www.ibm.com/developerworks/xml/library/x-tipclp/index.html

Answer by Ed K

Here's a solution using xml_grep (because xpath wasn't part of our distributable and I didn't want to add it to all production machines)...

If you are looking for a specific setting in an XML file, and if all elements at a given tree level are unique, and there are no attributes, then you can use this handy function:

# File to be parsed
xmlFile="xxxxxxx"

# use xml_grep to find settings in an XML file
# Input ($1): path to setting
function getXmlSetting() {

    # Filter out the element name for parsing
    local element=`echo $1 | sed 's/^.*\///'`

    # Verify the element is not empty
    local check=${element:?getXmlSetting invalid input: $1}

    # Parse out the CDATA from the XML element
    # 1) Find the element (xml_grep)
    # 2) Remove newlines (tr -d \n)
    # 3) Extract CDATA by looking for *element> CDATA <element*
    # 4) Remove leading and trailing spaces
    local getXmlSettingResult=`xml_grep --cond $1 $xmlFile 2>/dev/null | tr -d '\n' | sed -n -e "s/.*$element>[[:space:]]*\([^[:space:]].*[^[:space:]]\)[[:space:]]*<\/$element.*/\1/p"`

    # Return the result
    echo $getXmlSettingResult
}

#EXAMPLE
logPath=`getXmlSetting //config/logs/path`
check=${logPath:?"XML file missing //config/logs/path"}

This will work with this structure:

<config>
  <logs>
     <path>/path/to/logs</path>
  </logs>
</config>

It will also work with this (but it won't keep the newlines):

<config>
  <logs>
     <path>
          /path/to/logs
     </path>
  </logs>
</config>

If you have duplicate <config> or <logs> or <path>, then it will only return the last one. You can probably modify the function to return an array if it finds multiple matches.

FYI: This code works on RedHat 6.3 with GNU Bash 4.1.2, but I don't think I'm doing anything specific to that setup, so it should work everywhere.

NOTE: For anybody new to scripting, make sure you use the right types of quotes. All three are used in this code: the normal single quote (' = literal), the backward single quote or backtick (` = execute), and the double quote (" = group).

Answer by user49310

I am surprised no one has mentioned xmlsh. The mission statement:

A command line shell for XML Based on the philosophy and design of the Unix Shells

xmlsh provides a familiar scripting environment, but specifically tailored for scripting xml processes.

A list of shell-like commands is provided here.

I use the xed command a lot; it is the equivalent of sed for XML and allows XPath-based search and replace.