Original question: http://stackoverflow.com/questions/14726951/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
xmllint problems to output lines
Asked by Metal3d
I know that my question includes 2 questions...
First, I want to use xmllint to output the content of the "loc" tags. The sitemap I load declares a default namespace with xmlns="...".
In the xmllint shell, I need to do this:
setrootns
xpath //defaultns:loc
That works, no problem. But I need to do this in a bash script.
(AFAIK) xmllint has no command-line option equivalent to setrootns, so neither of these works:
xmllint --xpath "//loc" sitemaps.xml
# or
xmllint --xpath "//defaultns:loc" sitemaps.xml
This is the first question: how can I tell xmllint to load the default namespace?
If I can't, let's take a look at my second approach:
I can remove the xmlns attribute, and then there is no namespace to deal with:
xmllint --xpath "//loc" <(sed 's/xmlns="[^"]*"//' sitemaps.xml)
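A side note on the sed expression used for this step: sed regular expressions have no lazy quantifier, so a pattern like `.*?` does not stop at the first closing quote and can swallow other attributes on the same line. A minimal sketch of a safer pattern follows, assuming the namespace is declared with double quotes on a single line. The helper name strip_default_ns and the sample tag are made up for illustration.

```shell
# strip_default_ns is a hypothetical helper name, not part of xmllint or sed.
# It removes the first xmlns="..." attribute found on each input line.
strip_default_ns() {
  # [^"]* stops at the first closing quote, unlike a greedy .*
  sed 's/xmlns="[^"]*"//'
}

# Made-up root tag with a second attribute after the namespace declaration.
sample='<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" id="demo">'
stripped=$(printf '%s\n' "$sample" | strip_default_ns)
printf '%s\n' "$stripped"
```

With the real file this could be used as xmllint --xpath "//loc" <(strip_default_ns < sitemaps.xml).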
But now the content of all my 500 "loc" elements is concatenated on one line!
I tried this too:
xmllint --shell sitemaps.xml <<EOF
setrootns
xpath //defaultns:loc/text()
EOF
Or:
xmllint --shell sitemaps.xml <<EOF
setrootns
cat //defaultns:loc
EOF
The first gives me (for example):
465 TEXT
content=http://...
with a truncated URL.
The second gives me "------" every two lines, and a "/>" on the last line.
And I'm starting to get very nervous... :)
Many thanks if you find a solution.
The goal is to have every location, one per line.
Answered by BrnVrn
I used to do something similar:
clean_xml_message=$(echo "$xml_message" | sed 's/xmlns/ignore/')
Then you could try to put the newlines back:
sed 's/></>\n</g'
I guess you only want the URLs, without the <loc></loc> tags?
Then I would select all the loc elements with xmllint:
<loc>...</loc><loc>...</loc><loc>...</loc>
Then add the newlines:
sed 's/<loc>/<loc>\n/g' | sed 's#</loc>#\n</loc>#g'
<loc>
...
</loc><loc>
...
</loc><loc>
...
</loc>
Finally, remove the tag lines with grep -v "<loc>" | grep -v "</loc>"; a single grep -v "^<" could also do it. (-v inverts the selection: http://unixhelp.ed.ac.uk/CGI/man-cgi?grep)
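Put together, the steps of this answer can be sketched as a single pipeline. This is only an illustration under a few assumptions: GNU sed (which interprets \n in the replacement as a newline), a made-up sample string standing in for the concatenated xmllint output, and a hypothetical helper name split_locs.

```shell
# Made-up sample standing in for the concatenated output of xmllint.
concatenated='<loc>http://example.com/a</loc><loc>http://example.com/b</loc>'

# split_locs is a hypothetical helper name used only for this sketch.
split_locs() {
  # 1) newline after each opening tag, 2) newline before each closing tag,
  # 3) drop every line that starts with a tag.
  sed 's/<loc>/<loc>\n/g' | sed 's#</loc>#\n</loc>#g' | grep -v '^<'
}

urls=$(printf '%s\n' "$concatenated" | split_locs)
printf '%s\n' "$urls"
```

On a sed that does not expand \n in the replacement, the first two steps would need to be written differently.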
Answered by Metal3d
@BrnVrn is right, I only had to append "\n" after the tags.
Then I found my answer about namespaces: I can use local-name() to match elements by name without having to deal with the default namespace.
So, I did this:
xmllint --xpath "//*[local-name()='loc']/text()" <(sed 's/<loc>/<loc>\n/g' sitemaps.xml)
And it works!
Thanks to all
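As a variation on the local-name() approach that avoids editing the file with sed, one could ask xmllint for one node at a time. The sketch below assumes that xmllint prints the result of count() as a bare number and the result of string() on its own line. print_locs is a hypothetical helper name, and the file is parsed again for every URL, so this is only reasonable for small sitemaps.

```shell
# print_locs is a hypothetical helper; its argument is the sitemap path.
print_locs() {
  local file=$1 total i=1
  # count() yields a number, printed by xmllint as plain text.
  total=$(xmllint --xpath "count(//*[local-name()='loc'])" "$file")
  while [ "$i" -le "$total" ]; do
    # string() of the i-th loc element, printed on a line of its own.
    xmllint --xpath "string((//*[local-name()='loc'])[$i])" "$file"
    i=$((i + 1))
  done
}

# Usage (assuming sitemaps.xml exists in the current directory):
#   print_locs sitemaps.xml
```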
Answered by Cyker
For the newline problem you can have a look at this repo:
https://gitlab.gnome.org/cykerway/libxml2
And its merge request:
https://gitlab.gnome.org/GNOME/libxml2/merge_requests/8
Basically, it lets you choose the separator printed between the nodes of an XPath node-set result. So with this example.xml:
<?xml version="1.0" encoding="UTF-8"?>
<menu>
  <food>
    <name>Hot Chocolate</name>
    <price>.99</price>
  </food>
  <food>
    <name>Iced Tea</name>
    <price>.99</price>
  </food>
</menu>
You can do:
# xmllint --xpath "//name/text()" --xpath-separator "\n" example.xml
Output:
Hot Chocolate
Iced Tea
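Since the --xpath-separator option comes from the merge request above, the xmllint installed on a given system may not accept it. As a rough fallback for simple files like this one, the same list can be produced with text tools alone. This is a sketch, not real XML parsing: it assumes each name element sits on one line and contains no nested markup, CDATA, or entities, and extract_names is a hypothetical helper name.

```shell
# extract_names is a hypothetical helper; it reads XML on standard input.
extract_names() {
  # grep -o prints each matching element on its own line,
  # then sed strips the surrounding tags.
  grep -o '<name>[^<]*</name>' | sed 's/<[^>]*>//g'
}

# Inline copy of the two name elements from example.xml, for illustration.
names=$(printf '%s\n' '<name>Hot Chocolate</name>' '<name>Iced Tea</name>' | extract_names)
printf '%s\n' "$names"
```

With the file itself, the call would be extract_names < example.xml.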

