Parsing XML using the Unix terminal
Source: http://stackoverflow.com/questions/29004/ (StackOverflow)
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors and release it under the same license.
Asked by Mattias
Sometimes I need to quickly extract some arbitrary data from XML files and put it into CSV format. What are your best practices for doing this in the Unix terminal? I would love some code examples. For instance, how can I solve the following problem?
Example XML input:
<root>
<myel name="Foo" />
<myel name="Bar" />
</root>
My desired CSV output:
Foo,
Bar,
Accepted answer by Peter Hilton
If you just want the name attributes of any element, here is a quick but incomplete solution.
(Your example text is in the file example)
grep "name" example | cut -d"\"" -f2,2 | xargs -I{} echo "{},"
Answer by jelovirt
Peter's answer is correct, but it outputs a trailing line feed. This stylesheet emits the line feed only between rows:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="root">
<xsl:for-each select="myel">
<xsl:value-of select="@name"/>
<xsl:text>,</xsl:text>
<xsl:if test="not(position() = last())">
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Just run, e.g.:
xsltproc stylesheet.xsl source.xml
to generate the CSV results into standard output.
Answer by Peter Hilton
Use a command-line XSLT processor such as xsltproc, saxon or xalan to parse the XML and generate CSV. Here's an example stylesheet for your case:
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="root">
<xsl:apply-templates select="myel"/>
</xsl:template>
<xsl:template match="myel">
<xsl:for-each select="@*">
<xsl:value-of select="."/>
<xsl:value-of select="','"/>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>
Answer by DaveP
XMLStarlet is a command-line toolkit to query, edit, check and transform XML documents (for more information see http://xmlstar.sourceforge.net/).
No files to write, just pipe your file to xmlstarlet and apply an xpath filter.
cat file.xml | xml sel -t -m 'xpathExpression' -v 'elemName' -o 'literal' -v 'elname' -n
-m: match expression, -v: print a value, -o: print an included literal, -n: print a newline
So for your example, the XPath expression would be //myel/@name, which would provide the two attribute values.
Very handy tool.
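For the XML in the question, the generic command above could be filled in roughly as follows. This is a sketch, assuming XMLStarlet is installed and that its binary is called xmlstarlet (some packages install it as xml, as in the command above); names_to_csv is a hypothetical helper name used only for illustration.

```shell
# Sketch only: assumes XMLStarlet is installed and that the binary is named
# "xmlstarlet" (some packages install it as "xml" instead).
# names_to_csv is a hypothetical helper, not part of XMLStarlet.
names_to_csv() {
  # -m '//myel'  iterate over every myel element
  # -v '@name'   print the value of its name attribute
  # -o ','       print a literal comma
  # -n           print a newline
  xmlstarlet sel -t -m '//myel' -v '@name' -o ',' -n "$1"
}
```

Calling names_to_csv file.xml on the example input should print one "name," row per myel element.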
Answer by Uday Thombre
Answering the original question, assuming the XML file is "test.xml" and contains:
<root>
<myel name="Foo" />
<myel name="Bar" />
</root>
cat test.xml | tr -s "\"" " " | awk '/<myel/ {printf "%s,\n", $3}'
Answer by AndrewR
Here's a little Ruby script that does exactly what your question asks (pull an attribute called 'name' out of elements called 'myel'). It should be easy to generalize.
#!/usr/bin/ruby -w
require 'rexml/document'
xml = REXML::Document.new(File.open(ARGV[0].to_s))
xml.elements.each("//myel") { |el| puts "#{el.attributes['name']}," if el.attributes['name'] }
Answer by AndrewR
Assuming your test file is test.xml:
sed -n 's/^\s*<myel\s*name="\([^"]*\)".*$/\1,/p' test.xml
It has its pitfalls. For example, if it is not guaranteed that each myel is on its own line, you have to "normalize" the XML file first (so that each myel ends up on a separate line).
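One way to do that normalization is to join the whole file onto a single line and then start a new line before every myel tag. The following is a minimal sketch, not a general XML parser: extract_names and sample.xml are hypothetical names, and it still assumes that name is the first attribute of each myel.

```shell
#!/bin/sh
# Hypothetical helper: print "name," for every myel element, regardless of
# how the elements are spread across lines.
extract_names() {
  # 1. tr turns every newline into a space, so the file becomes one line.
  # 2. The first sed starts a new line before each <myel tag
  #    (the replacement contains a backslash-escaped literal newline).
  # 3. The second sed prints the name attribute followed by a comma.
  tr '\n' ' ' < "$1" |
    sed 's/<myel/\
<myel/g' |
    sed -n 's/^<myel[[:space:]]*name="\([^"]*\)".*$/\1,/p'
}

# Hypothetical sample input: one element is split across two lines and
# two elements share a line.
cat > sample.xml <<'EOF'
<root><myel
   name="Foo" /><myel name="Bar" />
</root>
EOF

extract_names sample.xml
```

For anything beyond quick one-off extraction, the XSLT or XMLStarlet approaches above are less fragile, since they parse the XML instead of pattern-matching it.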

