How to parse XML in Bash?

Note: this page is a copy of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and follow the same license. Original question: http://stackoverflow.com/questions/893585/

Retrieved 2020-09-06 12:30:54 via igfitidea.

Tags: xml, bash, xhtml, shell, xpath

Asked by chad

Ideally, what I would like to be able to do is:

cat xhtmlfile.xhtml |
getElementViaXPath --path='/html/head/title' |
sed -e 's%(^<title>|</title>$)%%g' > titleOfXHTMLPage.txt

Answer by chad

This is really just an explanation of Yuzem's answer, but I didn't feel that this much editing should be done to someone else's answer, and comments don't allow formatting, so...

rdom () { local IFS=\> ; read -d \< E C ;}

Let's call that "read_dom" instead of "rdom", space it out a bit and use longer variables:

read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
}

Okay, so it defines a function called read_dom. The first line makes IFS (the internal field separator) local to this function and changes it to >. That means that when you read data, instead of being split automatically on spaces, tabs or newlines, it gets split on '>'. The next line says to read input from stdin and, instead of stopping at a newline, to stop at a '<' character (the -d flag sets the delimiter). What is read is then split using IFS and assigned to the variables ENTITY and CONTENT. So take the following:

<tag>value</tag>

The first call to read_dom gets an empty string (since the '<' is the first character). That gets split by IFS into just '', since there isn't a '>' character. Read then assigns an empty string to both variables. The second call gets the string 'tag>value'. That then gets split by IFS into the two fields 'tag' and 'value'. Read then assigns the variables like: ENTITY=tag and CONTENT=value. The third call gets the string '/tag>' plus whatever trails it (here, the final newline), but it reaches end of file before finding another '<'. Read still assigns ENTITY=/tag, with CONTENT holding only that trailing text, but it returns a non-zero status, so a while loop driven by read_dom stops there and never processes the last closing tag.

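The sequence of calls can be checked with a small trace. This is only a sketch: trace_dom is a made-up helper name, and the input is piped in with printf so that no trailing newline is added. It prints the exit status and both variables for every call, including the final one that reaches end of file.

```shell
#!/usr/bin/env bash
# Sketch: trace every read_dom call on a one-element document.

read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
}

# trace_dom is a hypothetical helper: one output line per read_dom call,
# showing the exit status and what landed in each variable.
trace_dom () {
    local n=0 status
    while :; do
        n=$((n + 1))
        if read_dom; then status=0; else status=$?; fi
        printf 'call %d: status=%d ENTITY=[%s] CONTENT=[%s]\n' \
            "$n" "$status" "$ENTITY" "$CONTENT"
        if [ "$status" -ne 0 ]; then
            break   # read hit end of file before finding another '<'
        fi
    done
}

printf '%s' '<tag>value</tag>' | trace_dom
```

Note that the variables are still filled in on the failing call; only the status tells you that no further '<' was found.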

Now his while loop, cleaned up a bit to match the above:

while read_dom; do
    if [[ $ENTITY = "title" ]]; then
        echo $CONTENT
        exit
    fi
done < xhtmlfile.xhtml > titleOfXHTMLPage.txt

The first line just says, "while the read_dom function returns a zero status, do the following." The second line checks if the entity we've just seen is "title". The next line echoes the content of the tag. The fourth line exits. If it wasn't the title entity, then the loop repeats on the sixth line. We redirect "xhtmlfile.xhtml" into standard input (for the read_dom function) and redirect standard output to "titleOfXHTMLPage.txt" (the echo from earlier in the loop).

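If you want to try the loop without creating files, here is a self-contained sketch. It is a variation, not the loop above: get_title is a hypothetical function name, the markup is made up, and it uses return instead of exit so that pasting it into an interactive shell doesn't close the session. The content is printed with quotes so that inner whitespace survives.

```shell
#!/usr/bin/env bash
# Sketch: the title loop wrapped in a function (get_title is a hypothetical
# name), so it can "return" after the first match instead of calling "exit".

read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
}

get_title () {
    local ENTITY CONTENT
    while read_dom; do
        if [[ $ENTITY = "title" ]]; then
            printf '%s\n' "$CONTENT"
            return 0
        fi
    done
    return 1   # no <title> element was seen
}

# Made-up sample document, fed on stdin through a here-document.
get_title <<'EOF'
<html>
  <head><title>My  Page</title></head>
  <body><p>Hello</p></body>
</html>
EOF
```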

Now given the following (similar to what you get from listing a bucket on S3) for input.xml:

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>sth-items</Name>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>[email protected]</Key>
    <LastModified>2011-07-25T22:23:04.000Z</LastModified>
    <ETag>&quot;0032a28286680abee71aed5d059c6a09&quot;</ETag>
    <Size>1785</Size>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>

and the following loop:

while read_dom; do
    echo "$ENTITY => $CONTENT"
done < input.xml

You should get:

 => 
ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/" => 
Name => sth-items
/Name => 
IsTruncated => false
/IsTruncated => 
Contents => 
Key => [email protected]
/Key => 
LastModified => 2011-07-25T22:23:04.000Z
/LastModified => 
ETag => &quot;0032a28286680abee71aed5d059c6a09&quot;
/ETag => 
Size => 1785
/Size => 
StorageClass => STANDARD
/StorageClass => 
/Contents => 

So if we wrote a while loop like Yuzem's:

while read_dom; do
    if [[ $ENTITY = "Key" ]] ; then
        echo $CONTENT
    fi
done < input.xml

We'd get a listing of all the files in the S3 bucket.

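If you need the keys later in the script rather than on stdout, one option is to collect them into an array. A minimal sketch, assuming a bash with array support: collect_keys and KEYS are made-up names, and the two keys in the sample are invented. Because the input comes from a here-document rather than a pipe, the loop runs in the current shell and the array is still set afterwards.

```shell
#!/usr/bin/env bash
# Sketch: gather every <Key> value into a bash array.

read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
}

# collect_keys (hypothetical helper) fills the global array KEYS.
collect_keys () {
    local ENTITY CONTENT
    KEYS=()
    while read_dom; do
        if [[ $ENTITY = "Key" ]]; then
            KEYS+=("$CONTENT")
        fi
    done
}

# Invented sample listing with two entries.
collect_keys <<'EOF'
<ListBucketResult>
  <Contents><Key>photos/cat.jpg</Key><Size>1785</Size></Contents>
  <Contents><Key>notes.txt</Key><Size>12</Size></Contents>
</ListBucketResult>
EOF

printf 'found %d keys\n' "${#KEYS[@]}"
printf '%s\n' "${KEYS[@]}"
```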

EDIT: If for some reason local IFS=\> doesn't work for you and you set it globally, you should reset it at the end of the function like:

read_dom () {
    ORIGINAL_IFS=$IFS
    IFS=\>
    read -d \< ENTITY CONTENT
    IFS=$ORIGINAL_IFS
}

Otherwise, any line splitting you do later in the script will be messed up.

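Another way to avoid touching the global IFS, offered as a sketch rather than as part of the original answer, is to put the assignment in front of read itself. A variable assignment that prefixes a regular builtin such as read applies only to that one command, so there is nothing to save or restore.

```shell
#!/usr/bin/env bash
# Sketch: scope IFS to the single read command with a prefix assignment.

read_dom () {
    IFS=\> read -d \< ENTITY CONTENT
}

before=$IFS

# Two calls on the same input: the first consumes the (empty) text before
# the first '<', the second reads "tag>value".
{ read_dom; read_dom; } <<< '<tag>value</tag>'

echo "ENTITY=$ENTITY CONTENT=$CONTENT"
if [[ $IFS == "$before" ]]; then
    echo "IFS unchanged"
fi
```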

EDIT 2: To split out attribute name/value pairs you can augment read_dom() like so:

read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
    local ret=$?
    TAG_NAME=${ENTITY%% *}
    ATTRIBUTES=${ENTITY#* }
    return $ret
}

Then write your function to parse and get the data you want like this:

parse_dom () {
    if [[ $TAG_NAME = "foo" ]] ; then
        eval local $ATTRIBUTES
        echo "foo size is: $size"
    elif [[ $TAG_NAME = "bar" ]] ; then
        eval local $ATTRIBUTES
        echo "bar type is: $type"
    fi
}

Then, while you read_dom, call parse_dom:

while read_dom; do
    parse_dom
done

Then given the following example markup:

<example>
  <bar size="bar_size" type="metal">bars content</bar>
  <foo size="1789" type="unknown">foos content</foo>
</example>

You should get this output:

$ cat example.xml | ./bash_xml.sh 
bar type is: metal
foo size is: 1789
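
A word of caution on eval local $ATTRIBUTES: eval runs whatever the attribute text contains, so it is only reasonable for input you trust. As a hedged alternative, here is a sketch that pulls name="value" pairs into an associative array with a regular expression. It assumes bash 4 or later; parse_attributes and ATTR are made-up names; it only understands double-quoted values and does not decode entities such as &quot;.

```shell
#!/usr/bin/env bash
# Sketch (needs bash 4+ for associative arrays): parse name="value" pairs
# without eval. parse_attributes and ATTR are hypothetical names.

declare -A ATTR

parse_attributes () {
    local rest=$1
    # one leading name="value" pair, then the remainder of the string
    local re='^[[:space:]]*([A-Za-z_:][-A-Za-z0-9_:.]*)="([^"]*)"(.*)$'
    ATTR=()
    while [[ $rest =~ $re ]]; do
        ATTR[${BASH_REMATCH[1]}]=${BASH_REMATCH[2]}
        rest=${BASH_REMATCH[3]}
    done
}

parse_attributes 'size="1789" type="unknown"'
echo "size=${ATTR[size]} type=${ATTR[type]}"
```

You would call parse_attributes "$ATTRIBUTES" inside parse_dom and read ${ATTR[size]} instead of $size.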

EDIT 3: Another user said they were having problems with it in FreeBSD and suggested saving the exit status from read and returning it at the end of read_dom like:

read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
    local RET=$?
    TAG_NAME=${ENTITY%% *}
    ATTRIBUTES=${ENTITY#* }
    return $RET
}

I don't see any reason why that shouldn't work.

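To see why saving the status matters once assignments follow the read: a function returns the status of its last command, and a plain assignment succeeds. A small sketch (lossy and keeps_status are made-up names; /dev/null stands in for an input that is already at end of file):

```shell
#!/usr/bin/env bash
# Sketch: assignments placed after "read" hide read's end-of-file status
# unless that status is saved and returned explicitly.

lossy () {            # hypothetical name: forgets read's status
    read -d \< ENTITY
    TAG_NAME=${ENTITY%% *}
}

keeps_status () {     # hypothetical name: saves and returns read's status
    read -d \< ENTITY
    local ret=$?
    TAG_NAME=${ENTITY%% *}
    return $ret
}

if lossy < /dev/null; then echo "lossy: 0"; else echo "lossy: non-zero"; fi
if keeps_status < /dev/null; then echo "keeps_status: 0"; else echo "keeps_status: non-zero"; fi
```

A while loop driven by the first variant would never see a failure and so would never stop.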

Answer by Yuzem

You can do that very easily using only bash. You only have to add this function:

rdom () { local IFS=\> ; read -d \< E C ;}

Now you can use rdom like read, but for HTML documents. When called, rdom will assign the element to variable E and the content to variable C.

For example, to do what you wanted to do:

while rdom; do
    if [[ $E = title ]]; then
        echo $C
        exit
    fi
done < xhtmlfile.xhtml > titleOfXHTMLPage.txt

Answer by Nat

Command-line tools that can be called from shell scripts include:

  • 4xpath - command-line wrapper around Python's 4Suite package
  • XMLStarlet
  • xpath - command-line wrapper around Perl's XPath library
  • Xidel - works with URLs as well as files; also works with JSON

I also use xmllint and xsltproc with little XSL transform scripts to do XML processing from the command line or in shell scripts.

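As a rough sketch of the xmllint route, assuming an xmllint build that supports the --xpath option: the markup below is made up and deliberately has no default namespace, which keeps the XPath expression simple. The script skips the lookup when xmllint isn't installed.

```shell
#!/usr/bin/env bash
# Sketch: extract a title with xmllint --xpath, if xmllint is available.

# Made-up sample document without a default namespace.
xml='<html><head><title>Example page</title></head><body/></html>'

if command -v xmllint >/dev/null 2>&1; then
    # "-" tells xmllint to read the document from stdin;
    # string(...) returns the text value instead of the element markup.
    title=$(printf '%s' "$xml" | xmllint --xpath 'string(/html/head/title)' -)
    echo "title: $title"
else
    echo "xmllint not found; skipping" >&2
fi
```

Real XHTML files usually declare the XHTML namespace, in which case a plain /html/head/title path won't match and the expression has to account for the namespace.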

Answer by Grisha

You can use the xpath utility. It's installed with the Perl XML-XPath package.

Usage:

/usr/bin/xpath [filename] query

Or use XMLStarlet. To install it on openSUSE, use:

sudo zypper install xmlstarlet

or try cnf xml on other platforms.

Answer by teknopaul

This is sufficient...

xpath xhtmlfile.xhtml '/html/head/title/text()' > titleOfXHTMLPage.txt

Answer by scavenger

Starting from chad's answer, here is the COMPLETE working solution to parse XML, with proper handling of comments, with just 2 little functions (more than 2, but you can mix them all). I don't say chad's one didn't work at all, but it had too many issues with badly formatted XML files, so you have to be a bit more tricky to handle comments and misplaced spaces/CR/TAB/etc.

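To illustrate why comments need care, here is a rough sketch that is not part of this answer's functions. It reuses chad's read_dom and skips the simplest cases by looking at how each chunk starts; list_tags is a made-up helper name and the markup is invented. A comment that itself contains a '<' (commented-out markup, for instance) gets cut into several chunks, which is why a real solution has to track "inside a comment" state across calls.

```shell
#!/usr/bin/env bash
# Sketch: list element names while skipping simple comments,
# processing instructions and closing tags.

read_dom () {
    local IFS=\>
    read -d \< ENTITY CONTENT
}

list_tags () {   # hypothetical helper
    local ENTITY CONTENT
    while read_dom; do
        case $ENTITY in
            ''|'!--'*|'?'*|/*) continue ;;   # empty chunk, comment, <?...?>, closing tag
        esac
        printf '%s\n' "${ENTITY%%[[:space:]]*}"   # tag name without attributes
    done
}

list_tags <<'EOF'
<?xml version="1.0"?>
<!-- a simple comment -->
<root>
  <item id="1">one</item>
</root>
EOF
```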

The purpose of this answer is to give ready-to-use, out-of-the-box bash functions to anyone who needs to parse XML without complex tools using perl, python or anything else. As for me, I cannot install cpan or perl modules on the old production OS I'm working on, and python isn't available.

First, a definition of the XML terms used in this post:

<!-- comment... -->
<tag attribute="value">content...</tag>

EDIT: updated functions, which now handle:

  • Websphere xml (xmi and xmlns attributes)
  • must have a compatible terminal with 256 colors
  • 24 shades of grey
  • compatibility added for IBM AIX bash 3.2.16(1)

The functions: first is xml_read_dom, which is called repeatedly by xml_read:

xml_read_dom() {
# https://stackoverflow.com/questions/893585/how-to-parse-xml-in-bash
local ENTITY IFS=\>
if $ITSACOMMENT; then
  read -d \< COMMENTS
  COMMENTS="$(rtrim "${COMMENTS}")"
  return 0
else
  read -d \< ENTITY CONTENT
  CR=$?
  [ "x${ENTITY:0:1}x" == "x/x" ] && return 0
  TAG_NAME=${ENTITY%%[[:space:]]*}
  [ "x${TAG_NAME}x" == "x?xmlx" ] && TAG_NAME=xml
  TAG_NAME=${TAG_NAME%%:*}
  ATTRIBUTES=${ENTITY#*[[:space:]]}
  ATTRIBUTES="${ATTRIBUTES//xmi:/}"
  ATTRIBUTES="${ATTRIBUTES//xmlns:/}"
fi

# when comments sticks to !-- :
[ "x${TAG_NAME:0:3}x" == "x!--x" ] && COMMENTS="${TAG_NAME:3} ${ATTRIBUTES}" && ITSACOMMENT=true && return 0

# http://tldp.org/LDP/abs/html/string-manipulation.html
# INFO: oh wait it doesn't work on IBM AIX bash 3.2.16(1):
# [ "x${ATTRIBUTES:(-1):1}x" == "x/x" -o "x${ATTRIBUTES:(-1):1}x" == "x?x" ] && ATTRIBUTES="${ATTRIBUTES:0:(-1)}"
[ "x${ATTRIBUTES:${#ATTRIBUTES} -1:1}x" == "x/x" -o "x${ATTRIBUTES:${#ATTRIBUTES} -1:1}x" == "x?x" ] && ATTRIBUTES="${ATTRIBUTES:0:${#ATTRIBUTES} -1}"
return $CR
}

and the second one:

xml_read() {
# https://stackoverflow.com/questions/893585/how-to-parse-xml-in-bash
ITSACOMMENT=false
local MULTIPLE_ATTR LIGHT FORCE_PRINT XAPPLY XCOMMAND XATTRIBUTE GETCONTENT fileXml tag attributes attribute tag2print TAGPRINTED attribute2print XAPPLIED_COLOR PROSTPROCESS USAGE
local TMP LOG LOGG
LIGHT=false
FORCE_PRINT=false
XAPPLY=false
MULTIPLE_ATTR=false
XAPPLIED_COLOR=g
TAGPRINTED=false
GETCONTENT=false
PROSTPROCESS=cat
Debug=${Debug:-false}
TMP=/tmp/xml_read.$RANDOM
USAGE="${C}${FUNCNAME}${c} [-cdlp] [-x command <-a attribute>] <file.xml> [tag | \"any\"] [attributes .. | \"content\"]
${nn[2]}  -c = NOCOLOR${END}
${nn[2]}  -d = Debug${END}
${nn[2]}  -l = LIGHT (no \"attribute=\" printed)${END}
${nn[2]}  -p = FORCE PRINT (when no attributes given)${END}
${nn[2]}  -x = apply a command on an attribute and print the result instead of the former value, in green color${END}
${nn[1]}  (no attribute given will load their values into your shell; use '-p' to print them as well)${END}"

! (($#)) && echo2 "$USAGE" && return 99
(( $# < 2 )) && ERROR nbaram 2 0 && return 99
# getopts:
while getopts :cdlpx:a: _OPT 2>/dev/null
do
{
  case ${_OPT} in
    c) PROSTPROCESS="${DECOLORIZE}" ;;
    d) local Debug=true ;;
    l) LIGHT=true; XAPPLIED_COLOR=END ;;
    p) FORCE_PRINT=true ;;
    x) XAPPLY=true; XCOMMAND="${OPTARG}" ;;
    a) XATTRIBUTE="${OPTARG}" ;;
    *) _NOARGS="${_NOARGS}${_NOARGS+, }-${OPTARG}" ;;
  esac
}
done
shift $((OPTIND - 1))
unset _OPT OPTARG OPTIND
[ "X${_NOARGS}" != "X" ] && ERROR param "${_NOARGS}" 0

fileXml=$1
tag=$2
(( $# > 2 )) && shift 2 && attributes=$*
(( $# > 1 )) && MULTIPLE_ATTR=true

[ -d "${fileXml}" -o ! -s "${fileXml}" ] && ERROR empty "${fileXml}" 0 && return 1
$XAPPLY && $MULTIPLE_ATTR && [ -z "${XATTRIBUTE}" ] && ERROR param "-x command " 0 && return 2
# nb attributes == 1 because $MULTIPLE_ATTR is false
[ "${attributes}" == "content" ] && GETCONTENT=true

while xml_read_dom; do
  # (( CR != 0 )) && break
  (( PIPESTATUS[1] != 0 )) && break

  if $ITSACOMMENT; then
    # oh wait it doesn't work on IBM AIX bash 3.2.16(1):
    # if [ "x${COMMENTS:(-2):2}x" == "x--x" ]; then COMMENTS="${COMMENTS:0:(-2)}" && ITSACOMMENT=false
    # elif [ "x${COMMENTS:(-3):3}x" == "x-->x" ]; then COMMENTS="${COMMENTS:0:(-3)}" && ITSACOMMENT=false
    if [ "x${COMMENTS:${#COMMENTS} - 2:2}x" == "x--x" ]; then COMMENTS="${COMMENTS:0:${#COMMENTS} - 2}" && ITSACOMMENT=false
    elif [ "x${COMMENTS:${#COMMENTS} - 3:3}x" == "x-->x" ]; then COMMENTS="${COMMENTS:0:${#COMMENTS} - 3}" && ITSACOMMENT=false
    fi
    $Debug && echo2 "${N}${COMMENTS}${END}"
  elif test "${TAG_NAME}"; then
    if [ "x${TAG_NAME}x" == "x${tag}x" -o "x${tag}x" == "xanyx" ]; then
      if $GETCONTENT; then
        CONTENT="$(trim "${CONTENT}")"
        test ${CONTENT} && echo "${CONTENT}"
      else
        # eval local $ATTRIBUTES => eval test "\"\$${attribute}\"" will be true for matching attributes
        eval local $ATTRIBUTES
        $Debug && (echo2 "${m}${TAG_NAME}: ${M}$ATTRIBUTES${END}"; test ${CONTENT} && echo2 "${m}CONTENT=${M}$CONTENT${END}")
        if test "${attributes}"; then
          if $MULTIPLE_ATTR; then
            # we don't print "tag: attr=x ..." for a tag passed as argument: it's usefull only for "any" tags so then we print the matching tags found
            ! $LIGHT && [ "x${tag}x" == "xanyx" ] && tag2print="${g6}${TAG_NAME}: "
            for attribute in ${attributes}; do
              ! $LIGHT && attribute2print="${g10}${attribute}${g6}=${g14}"
              if eval test "\"\$${attribute}\""; then
                test "${tag2print}" && ${print} "${tag2print}"
                TAGPRINTED=true; unset tag2print
                if [ "$XAPPLY" == "true" -a "${attribute}" == "${XATTRIBUTE}" ]; then
                  eval ${print} "%s%s\ " "\${attribute2print}" "\${${XAPPLIED_COLOR}}\"\$(\$XCOMMAND \$${attribute})\"\${END}" && eval unset ${attribute}
                else
                  eval ${print} "%s%s\ " "\${attribute2print}" "\"\$${attribute}\"" && eval unset ${attribute}
                fi
              fi
            done
            # this trick prints a CR only if attributes have been printed during the loop:
            $TAGPRINTED && ${print} "\n" && TAGPRINTED=false
          else
            if eval test "\"\$${attributes}\""; then
              if $XAPPLY; then
                eval echo "\${g}\$(\$XCOMMAND \$${attributes})" && eval unset ${attributes}
              else
                eval echo "\$${attributes}" && eval unset ${attributes}
              fi
            fi
          fi
        else
          echo eval $ATTRIBUTES >>$TMP
        fi
      fi
    fi
  fi
  unset CR TAG_NAME ATTRIBUTES CONTENT COMMENTS
done < "${fileXml}" | ${PROSTPROCESS}
# http://mywiki.wooledge.org/BashFAQ/024
# INFO: I set variables in a "while loop" that's in a pipeline. Why do they disappear? workaround:
if [ -s "$TMP" ]; then
  $FORCE_PRINT && ! $LIGHT && cat $TMP
  # $FORCE_PRINT && $LIGHT && perl -pe 's/[[:space:]].*?=/ /g' $TMP
  $FORCE_PRINT && $LIGHT && sed -r 's/[^\"]*([\"][^\"]*[\"][,]?)[^\"]*/ /g' $TMP
  . $TMP
  rm -f $TMP
fi
unset ITSACOMMENT
}

and lastly, the rtrim, trim and echo2 (to stderr) functions:

rtrim() {
local var=$@
var="${var%"${var##*[![:space:]]}"}"   # remove trailing whitespace characters
echo -n "$var"
}
trim() {
local var=$@
var="${var#"${var%%[![:space:]]*}"}"   # remove leading whitespace characters
var="${var%"${var##*[![:space:]]}"}"   # remove trailing whitespace characters
echo -n "$var"
}
echo2() { echo -e "$@" 1>&2; }
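
For anyone who wants to check the trimming on its own, here is a small standalone sketch. It uses the same two parameter expansions as trim above; printf replaces echo -n, and the sample string is made up.

```shell
#!/usr/bin/env bash
# Sketch: trim leading and trailing whitespace with parameter expansion only.

trim () {
    local var=$*
    var="${var#"${var%%[![:space:]]*}"}"   # drop the leading run of whitespace
    var="${var%"${var##*[![:space:]]}"}"   # drop the trailing run of whitespace
    printf '%s' "$var"
}

# Brackets make the result's edges visible.
printf '[%s]\n' "$(trim $'  \t bars content \n')"
```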

Colorization:

Oh, and you will need some neat colorizing dynamic variables to be defined first, and exported, too:

set -a
TERM=xterm-256color
case ${UNAME} in
AIX|SunOS)
  M=$(${print} '\033[1;35m')
  m=$(${print} '\033[0;35m')
  END=$(${print} '\033[0m')
;;
*)
  m=$(tput setaf 5)
  M=$(tput setaf 13)
  # END=$(tput sgr0)          # issue on Linux: it can produces ^[(B instead of ^[[0m, more likely when using screenrc
  END=$(${print} '\033[0m')
;;
esac
# 24 shades of grey:
for i in $(seq 0 23); do eval g$i="$(${print} \"\033\[38\;5\;$((232 + i))m\")" ; done
# another way of having an array of 5 shades of grey:
declare -a colorNums=(238 240 243 248 254)
for num in 0 1 2 3 4; do nn[$num]=$(${print} "\033[38;5;${colorNums[$num]}m"); NN[$num]=$(${print} "\033[48;5;${colorNums[$num]}m"); done
# piped decolorization:
DECOLORIZE='eval sed "s,${END}\[[0-9;]*[m|K],,g"'
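
If you only need the decolorizing step, here is a minimal standalone sketch. It is an illustration under its own assumptions, not the DECOLORIZE definition above: strip_colors is a made-up name, and it removes sequences of the form ESC [ ... m or ESC [ ... K from whatever is piped through it.

```shell
#!/usr/bin/env bash
# Sketch: strip ANSI color/erase sequences from piped text.

strip_colors () {
    local esc=$'\033'                 # literal ESC character
    sed "s/${esc}\[[0-9;]*[mK]//g"
}

printf '\033[1;35mwarning\033[0m: disk almost full\n' | strip_colors
```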

How to load all that stuff:

Either you know how to create functions and load them via FPATH (ksh) or an emulation of FPATH (bash).

If not, just copy/paste everything on the command line.

How does it work:

xml_read [-cdlp] [-x command <-a attribute>] <file.xml> [tag | "any"] [attributes .. | "content"]
  -c = NOCOLOR
  -d = Debug
  -l = LIGHT (no "attribute=" printed)
  -p = FORCE PRINT (when no attributes given)
  -x = apply a command on an attribute and print the result instead of the former value, in green color
  (no attribute given will load their values into your shell as $ATTRIBUTE=value; use '-p' to print them as well)

xml_read server.xml title content     # print content between <title></title>
xml_read server.xml Connector port    # print all port values from Connector tags
xml_read server.xml any port          # print all port values from any tags

With Debug mode (-d), comments and parsed attributes are printed to stderr.

Answer by simon04

Check out XML2 from http://www.ofb.net/~egnor/xml2/ which converts XML to a line-oriented format.

Answer by mirod

I am not aware of any pure shell XML parsing tool. So you will most likely need a tool written in another language.

My XML::Twig Perl module comes with such a tool: xml_grep, where you would probably write what you want as xml_grep -t '/html/head/title' xhtmlfile.xhtml > titleOfXHTMLPage.txt (the -t option gives you the result as text instead of XML).

Answer by BeniBela

Another command line tool is my new Xidel. It also supports XPath 2 and XQuery, contrary to the already mentioned xpath/xmlstarlet.

The title can be read like:

xidel xhtmlfile.xhtml -e /html/head/title > titleOfXHTMLPage.txt

And it also has a cool feature to export multiple variables to bash. For example:

eval $(xidel xhtmlfile.xhtml -e 'title := //title, imgcount := count(//img)' --output-format bash )

sets $title to the title and $imgcount to the number of images in the file, which should be as flexible as parsing it directly in bash.

Answer by user485380

After some research for translation between Linux and Windows formats of the file paths in XML files I found interesting tutorials and solutions on:
