bash: xmllint problems to output lines

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original StackOverflow address: http://stackoverflow.com/questions/14726951/

Time: 2020-09-18 04:29:02  Source: igfitidea

xmllint problems to output lines

Tags: xml, bash, xml-parsing, sitemap, xmllint

Asked by Metal3d

I know that my question actually contains two questions...

First, I want to use xmllint to output the content of the "loc" tags. The sitemap I load declares a default namespace (xmlns="...").

In the xmllint shell, I need to do this:

setrootns
xpath //defaultns:loc

That works... no problem. But I need to do this in a bash script.

As far as I know, xmllint has no command-line option that does what setrootns does, so I cannot do this:

xmllint --xpath "//loc" sitemaps.xml
# or
xmllint --xpath "//defaultns:loc" sitemaps.xml

This is the first question: how can I tell xmllint to load the default namespace?

If I can't, let's look at my second approach:

I can remove the xmlns attribute, and then there is no namespace to deal with:

xmllint --xpath "//loc" <(sed -r 's/xmlns="[^"]*"//' sitemaps.xml)

But now the content of all my 500 "loc" elements is concatenated on a single line!

I tried this too:

xmllint --shell sitemaps.xml <<EOF
setrootns
xpath //defaultns:loc/text()
EOF

Or this:

xmllint --shell sitemaps.xml <<EOF
setrootns
cat //defaultns:loc
EOF

The first gives me (for example):

465  TEXT
    content=http://... 

with the URL truncated.

The second gives me a "------" line every two lines, and a "/>" on the last line.

And I am starting to get really annoyed... :)

Many thanks if you can find a solution.

The goal is to have every location, one per line.

Answered by BrnVrn

I used to do something similar:

clean_xml_message=$(echo "$xml_message" | sed 's/xmlns/ignore/')
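
The reason this helps: once the attribute is no longer named xmlns, it is just an ordinary attribute, so the elements are in no namespace and a plain //loc can match them. A minimal sketch of the substitution, assuming a made-up one-line sitemap held in xml_message (the sample URL is hypothetical):

```shell
# Hypothetical one-line sitemap with a default namespace declaration.
xml_message='<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>http://example.com/a</loc></url></urlset>'

# Rename the first "xmlns" on each line so it becomes a plain attribute.
clean_xml_message=$(printf '%s\n' "$xml_message" | sed 's/xmlns/ignore/')

# Show the result: the root element now carries ignore="..." instead.
printf '%s\n' "$clean_xml_message"
```

Without the g flag, sed only renames the first xmlns on each line, so a later xmlns:xsi declaration on the same line is left alone.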

If needed, you could then put the newlines back:

sed 's/></>\n</g' 

I guess you only want the URLs, without the <loc></loc> tags? Then I would select all the loc elements with xmllint, which gives:

<loc>...</loc><loc>...</loc><loc>...</loc>

Then add the newlines: sed 's/<loc>/<loc>\n/g' | sed 's#</loc>#\n</loc>#g'

<loc>
...
</loc><loc>
...
</loc><loc>
...
</loc>

Finally, remove the tag lines: grep -v "<loc>" | grep -v "</loc>", or a single grep -v "^<", could do it. (-v inverts the selection: http://unixhelp.ed.ac.uk/CGI/man-cgi?grep)
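
Put together, the steps above might look like the sketch below. The sample string is a made-up stand-in for the concatenated xmllint output, and the \n in the sed replacement assumes GNU sed.

```shell
# Hypothetical sample standing in for the one-line xmllint output.
concatenated='<loc>http://example.com/a</loc><loc>http://example.com/b</loc>'

# 1. Break the line after each <loc> and before each </loc>.
# 2. Drop every line that starts with a tag, leaving only the URLs.
# Note: "\n" in the replacement text is a GNU sed extension.
urls=$(printf '%s\n' "$concatenated" \
  | sed 's/<loc>/<loc>\n/g' \
  | sed 's#</loc>#\n</loc>#g' \
  | grep -v '^<')

# One URL per line.
printf '%s\n' "$urls"
```

This only works as long as no URL line itself starts with "<", which should hold for sitemap locations.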

Answered by Metal3d

@BrnVrn is right, I only had to append "\n" after the tags.

Then I found the answer to my namespace question: I can use local-name() to match elements regardless of the default namespace.

So, I did this:

xmllint  --xpath "//*[local-name()='loc']/text()" <(sed 's/<loc>/<loc>\n/g' sitemaps.xml)

And it works!

Thanks to all.
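
For reference, here is a self-contained sketch of the same idea. The two-entry sitemap is invented for illustration, and the pipe into "xmllint -" replaces the process substitution so the snippet also runs outside bash. The final grep is an extra step that is not in the answer: some libxml2 versions print their own newline after each text node, so empty lines are dropped to get the same result either way. The \n in the sed replacement assumes GNU sed.

```shell
# Hypothetical two-entry sitemap with a default namespace.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>
EOF

# local-name() matches "loc" whatever namespace it is in.
# sed puts a newline in front of each URL; grep drops the empty lines.
urls=$(sed 's/<loc>/<loc>\n/g' "$tmp" \
  | xmllint --xpath "//*[local-name()='loc']/text()" - \
  | grep -v '^$')

printf '%s\n' "$urls"
rm -f "$tmp"
```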

Answered by Cyker

For the newline problem you can have a look at this repo:

https://gitlab.gnome.org/cykerway/libxml2

And its merge request:

https://gitlab.gnome.org/GNOME/libxml2/merge_requests/8

Basically, it lets you choose the separator used in an XPath node-set result. So with this example.xml:

<?xml version="1.0" encoding="UTF-8"?>
<menu>
    <food>
        <name>Hot Chocolate</name>
        <price>.99</price>
    </food>
    <food>
        <name>Iced Tea</name>
        <price>.99</price>
    </food>
</menu>

You can do:

# xmllint --xpath "//name/text()" --xpath-separator "\n" example.xml

Output:

Hot Chocolate
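
The --xpath-separator option comes from the fork and merge request linked above, so it may not exist in the xmllint you have installed. A different technique that relies only on the standard --xpath option is to ask for count() first and then fetch each match with string(), which prints one value per line. A sketch, using a cut-down copy of example.xml (the temporary file and the variable names are illustrative):

```shell
# Cut-down copy of example.xml from the answer (prices left out).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<menu>
    <food><name>Hot Chocolate</name></food>
    <food><name>Iced Tea</name></food>
</menu>
EOF

# count() prints how many nodes match.
n=$(xmllint --xpath 'count(//name)' "$tmp")

# string() on the i-th match prints its text followed by a newline.
names=$(
  i=1
  while [ "$i" -le "$n" ]; do
    xmllint --xpath "string((//name)[$i])" "$tmp"
    i=$((i + 1))
  done
)

printf '%s\n' "$names"
rm -f "$tmp"
```

This starts xmllint once per match, so it is slower than a single call on large files, but the output does not depend on how a given version separates node-set results.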
Iced Tea