Bash Parse XML to Array
Original question: http://stackoverflow.com/questions/14698173/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Aaron Henderson
I am writing a mixed-language script whose parent script is bash (don't ask why, it's a long story). Part of my script pulls the source of an XML page into a variable. I want to use bash to process the XML in that variable into several arrays. The XML is set up as follows:
<event>
<id>34287352</id>
<what>New Post</what>
<when>1 Minute Ago 03:50 PM</when>
<title>This is a title</title>
<preview>sdfasd</preview>
<poster>
<![CDATA[ USERNAME ]]>
</poster>
<threadid>2346566</threadid>
<postid>34287352</postid>
<lastpost>1360021837</lastpost>
<userid>3291696</userid>
<forumid>2</forumid>
<forumname>General Discussion</forumname>
<views>201,913</views>
<replies>6,709</replies>
<statusicon>images/statusicon/thread.gif</statusicon>
</event>
There are 20 <event> elements in the XML file. I want to pull what, title, and preview from each of them and put each set of values into its own array.
I followed an example here on SO:
for tag in what title preview
do
OUT=`grep $tag $source | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/' `
# This is what I call the eval_trick, difficult to explain in words.
eval ${tag}=`echo -ne \""${OUT}"\"`
done
W_ARRAY=( `echo ${what}` )
T_ARRAY=( `echo ${title}` )
P_ARRAY=( `echo ${preview}` )
echo ${W_ARRAY[0]}
echo ${T_ARRAY[0]}
echo ${P_ARRAY[0]}
But with the above, my script always freaks out and repeats grep: <part of the xml>: No such file or directory
Thoughts?
EDIT:
Well, it is ugly as hell, but I managed to get the pseudo-XML into arrays:
windex=0
tindex=0
pindex=0
while read -r line
do
# Split the line on the opening/closing tag; field 2 is the text between them.
WHAT=$(echo ${line} | awk -F "</?what>" '{ print $2 }')
if [ "$WHAT" != "" ]; then
W_ARRAY[$windex]=$WHAT
let windex+=1
fi
TITLE=$(echo ${line} | awk -F "</?title>" '{ print $2 }')
if [ "$TITLE" != "" ]; then
T_ARRAY[$tindex]=$TITLE
let tindex+=1
fi
PREVIEW=$(echo ${line} | awk -F "</?preview>" '{ print $2 }')
if [ "$PREVIEW" != "" ]; then
P_ARRAY[$pindex]=$PREVIEW
let pindex+=1
fi
done <<< "$source"
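The same line-by-line extraction can be done without spawning awk for every line. The following is a minimal sketch, not the asker's code: it assumes each element opens and closes on a single line, as in the sample above, and the inline sample data and the re_* variable names are hypothetical.

```shell
#!/bin/bash
# Sample data standing in for the real page source (an assumption for this sketch).
source='<event>
<what>New Post</what>
<title>This is a title</title>
<preview>sdfasd</preview>
</event>
<event>
<what>New Thread</what>
<title>Another title</title>
<preview>hello world</preview>
</event>'

W_ARRAY=() T_ARRAY=() P_ARRAY=()

# Regexes are kept in variables so bash does not try to parse < and > inside [[ ]].
re_what='<what>([^<]*)</what>'
re_title='<title>([^<]*)</title>'
re_preview='<preview>([^<]*)</preview>'

while IFS= read -r line
do
    # BASH_REMATCH[1] holds the text captured between the tags.
    if [[ $line =~ $re_what ]]; then
        W_ARRAY+=("${BASH_REMATCH[1]}")
    elif [[ $line =~ $re_title ]]; then
        T_ARRAY+=("${BASH_REMATCH[1]}")
    elif [[ $line =~ $re_preview ]]; then
        P_ARRAY+=("${BASH_REMATCH[1]}")
    fi
done <<< "$source"

echo "What: ${#W_ARRAY[@]}, first: ${W_ARRAY[0]}"
echo "Title: ${#T_ARRAY[@]}, first: ${T_ARRAY[0]}"
echo "Preview: ${#P_ARRAY[@]}, first: ${P_ARRAY[0]}"
```

Unlike the loop above, this sketch keeps empty elements, so index i in each array still refers to the same event. It will not handle CDATA sections or elements that span several lines.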
Accepted answer by Aaron Henderson
Got everything working. For those of you who ever plan to do anything similar, here is what I ended up with. First, the AppleScript (FRcli.scpt) that pulls the page source through Safari:
on run argv
set region to item 1 of argv
set XML_URL to "http://" & region & ".<URL REMOVED>.com/board/vaispy-secret.php?do=xml"
try
tell application "Safari"
set URL of tab 1 of front window to XML_URL
my waitforload()
--delay 5
-- Get page source
set currentTab to current tab of front window
set currentSource to currentTab's source
return currentSource
end tell
on error err
log "Could not retrieve source."
log err
display dialog err
--return "NULL"
end try
end run
on waitforload()
--check if page has loaded
local loadflag, zarg, test_html
set loadflag to 0
repeat until loadflag is 1
delay 0.5
tell application "Safari"
set test_html to source of document 1
end tell
try
set zarg to text ((count of characters in test_html) - 10) thru (count of characters in test_html) of test_html
if "</events>" is in text ((count of characters in test_html) - 10) thru (count of characters in test_html) of test_html then
set loadflag to 1
end if
end try
end repeat
end waitforload
Then create the bash script:
#!/bin/bash
clear
if [ "$1" == "na" ]; then
region="na"
elif [ "$1" == "eu" ]; then
region="euw"
else
echo "FRcli requires an argument."
echo "usage: [eu|na]"
echo "[eu scans EUW & EUNE]"
echo "[na scans NA]"
exit 1
fi
while true; do
clear
echo "Region: $region"
echo "...Importing Naughty"
declare -a NAUGHTY=()
nindex=0
while read line
do
NAUGHTY[$nindex]=$line
let nindex+=1
done < $HOME/Desktop/naughty.txt
NC=${#NAUGHTY[@]}
let NC-=1
echo "...Pulling Source"
source=$(osascript FRcli.scpt $region)
echo "...Extracting Arrays"
windex=0
tindex=0
pindex=0
dindex=0
while read -r line
do
# sed -n ... /p prints only lines where the tag matched; \1 is the text after the opening tag.
#WHAT=$(echo ${line} | awk -F "</?what>" '{ print $2 }')
WHAT=$(echo ${line} | sed -n 's/^.*<what>\([^<]*\).*/\1/p')
if [ "$WHAT" != "" ]; then
W_ARRAY[$windex]=$WHAT
let windex+=1
fi
#TITLE=$(echo ${line} | awk -F "</?title>" '{ print $2 }')
TITLE=$(echo ${line} | sed -n 's/^.*<title>\([^<]*\).*/\1/p')
if [ "$TITLE" != "" ]; then
T_ARRAY[$tindex]=$TITLE
let tindex+=1
fi
#PREVIEW=$(echo ${line} | awk -F "</?preview>" '{ print $2 }')
#PREVIEW=$(echo ${line} | sed -n '/<preview*/,/<\/preview>/p')
PREVIEW=$(echo ${line} | sed -n 's/^.*<preview>\([^<]*\).*/\1/p')
if [ "$PREVIEW" != "" ]; then
P_ARRAY[$pindex]=$PREVIEW
let pindex+=1
fi
POSTID=$(echo ${line} | sed -n 's/^.*<postid>\([^<]*\).*/\1/p')
if [ "$POSTID" != "" ]; then
D_ARRAY[$dindex]=$POSTID
let dindex+=1
fi
done <<< "$source"
echo "What: ${#W_ARRAY[@]}"
echo "Title: ${#T_ARRAY[@]}"
echo "Preview: ${#P_ARRAY[@]}"
echo "PostID: ${#D_ARRAY[@]}"
for ((i=0; i <= 19; i++))
do
found=0
fpid=""
if [ "${W_ARRAY[$i]}" = "New Thread" ]; then
echo "Scanning Thread"
scan=$(echo ${T_ARRAY[$i]} ${P_ARRAY[$i]})
echo "Title: ${T_ARRAY[$i]}"
echo "Post: ${P_ARRAY[$i]}"
else
echo "Scanning Post"
scan=$(echo ${P_ARRAY[$i]})
echo "Post: ${scan}"
fi
sleep .5
for ((n=0; n<=$NC; n++))
do
nw=${NAUGHTY[$n]}
a=$(echo ${scan} | tr '[:lower:]' '[:upper:]')
b=$(echo ${nw} | tr '[:lower:]' '[:upper:]')
echo "Checking: $b"
#echo "$a"
if [[ $a == *$b* ]]; then
## Change != to == in release
echo "Found: $b"
found=1
echo "...Loading PID"
declare -a PID=()
pindex=0
while read line
do
PID[$pindex]=$line
let pindex+=1
done < $HOME/Desktop/pid.txt
PIDC=${#PID[@]}
# Treat the post as new until a matching "region postid" line is found in pid.txt.
fpid=0
for (( p=0; p<PIDC; p++ ))
do
lpid=${PID[$p]}
if [ "$region ${D_ARRAY[$i]}" == "$lpid" ]; then
echo "Found: $lpid"
echo "Ignoring Flag"
fpid=1
break
fi
done
if [ "$fpid" == "0" ]; then
echo "PID not found, opening URL."
fi
if [ "$found" == "1" -a "$fpid" == "0" ]; then
FFURL="http://$region.<URL REMOVED>.com/board/showthread.php?p=${D_ARRAY[$i]}&highlight=$nw"
open -a Firefox "$FFURL"
echo $region ${D_ARRAY[$i]} >> $HOME/Desktop/pid.txt
found=0
fpid=""
fi
fi
done
sleep .5
done
if [ "$1" == "eu" ]; then
if [ "$region" == "euw" ]; then
region="eune"
else
region="euw"
fi
fi
clear
done
I'm sure there are far more efficient ways of doing this. Using cURL in the bash script would have made this a one-script deal (I couldn't with this script due to the security in place for this board's iSpy). But this works and it is pretty zippy. It uses only 32.7 Mem on average and, as far as I can tell, doesn't have any memory leaks (unlike my 100% AppleScript version of this).
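The word-list scan in the script above can be sketched more compactly with bash's own case-insensitive matching instead of two tr calls per comparison. This is a minimal sketch rather than the author's method: the contains_word function, the MATCHED variable, and the sample word list are hypothetical, and it assumes a bash new enough to support shopt -s nocasematch.

```shell
#!/bin/bash
# Hypothetical word list; the original script reads it from $HOME/Desktop/naughty.txt.
NAUGHTY=("spam" "free gold")

# Return 0 if the text contains any listed word, ignoring case.
# The matched word is left in the global variable MATCHED.
contains_word() {
    local text=$1 word
    MATCHED=""
    shopt -s nocasematch
    for word in "${NAUGHTY[@]}"; do
        # The quoted "$word" is matched literally, so glob characters in it are not special.
        if [[ $text == *"$word"* ]]; then
            MATCHED=$word
            shopt -u nocasematch
            return 0
        fi
    done
    shopt -u nocasematch
    return 1
}

if contains_word "Get FREE GOLD here"; then
    echo "Found: $MATCHED"
fi
if ! contains_word "This is a title"; then
    echo "Clean"
fi
```

Wrapping the scan in a function keeps the nocasematch setting from leaking into the rest of the script, where exact comparisons such as the region check still need to be case-sensitive.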
Answer by X Tian
I had something very similar, parsing-wise; here's a hacked version.
I use xsltproc (which is available on Ubuntu, but I can't remember whether I installed it specifically).
Command line:
xsltproc tfile.xslt tfile.xml
tfile.xml is your example copied 3 times and wrapped in events tags, i.e.:
<events>
<event> ... </event>
<event> ... </event>
<event> ... </event>
</events>
tfile.xslt:
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method='text'/>
<!-- ================================================================== -->
<xsl:template match="/">
<xsl:apply-templates select="//event"/>
</xsl:template>
<xsl:template match="event">
<xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['id']=</xsl:text>
<xsl:value-of select="id"/> <xsl:text> </xsl:text>
<xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['what']=</xsl:text>
<xsl:value-of select="what"/><xsl:text> </xsl:text>
<xsl:text>event[</xsl:text><xsl:value-of select="position()"/><xsl:text>]['preview']=</xsl:text>
<xsl:value-of select="preview"/><xsl:text> </xsl:text>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Output:
event[1]['id']=34287352 event[1]['what']=New Post event[1]['preview']=sdfasd
event[2]['id']=34287353 event[2]['what']=New Post3 event[2]['preview']=sdfasd
event[3]['id']=34287354 event[3]['what']=New Post4 event[3]['preview']=sdfasd
Hope you know a bit of XSLT processing; change the output as you want.
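To get from text output like this back into bash arrays, one option is to have the stylesheet emit one event per line with the fields separated by a delimiter, and to split on that delimiter in a read loop. The following is a minimal sketch under that assumption: the pipe-delimited id|what|preview layout is hypothetical (it is not what the stylesheet above prints), and the emit_events function stands in for the xsltproc call so the example runs on its own.

```shell
#!/bin/bash
# Stand-in for: xsltproc tfile.xslt tfile.xml
# Assumes a stylesheet changed to print "id|what|preview" per event (hypothetical layout).
emit_events() {
    printf '%s\n' \
        '34287352|New Post|sdfasd' \
        '34287353|New Post3|sdfasd' \
        '34287354|New Post4|sdfasd'
}

ID_ARRAY=() W_ARRAY=() P_ARRAY=()

# Process substitution keeps the loop in the current shell, so the arrays survive it.
while IFS='|' read -r id what preview
do
    ID_ARRAY+=("$id")
    W_ARRAY+=("$what")
    P_ARRAY+=("$preview")
done < <(emit_events)

echo "Events: ${#ID_ARRAY[@]}"
echo "Second event: ${ID_ARRAY[1]} / ${W_ARRAY[1]} / ${P_ARRAY[1]}"
```

Because preview is the last variable given to read, it receives the rest of the line, so a pipe character inside the preview text would not shift the other fields. A pipe inside id or what would, so pick a delimiter that cannot occur in the data.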
Answer by BeniBela
Well, right now this is completely unhelpful, but I'm currently working on a command-line XML parser. If it were finished (it already would be, if I weren't distracted by a TopCoder marathon match...), you could write it as simply as:
eval $(echo "$source" | xidel - -e '<event>
<what>{$W_ARRAY}</what>
<title>{$T_ARRAY}</title>
<preview>{$P_ARRAY}</preview>
</event>*' --output-format bash)
Looks amazing, doesn't it?
Answer by Majid Laissi
To recap my comments, here's what's wrong with your code:
1- As your $source variable is not a filename, in your grep you should use:
OUT=`echo $source | grep $tag | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/' `
2- Your tr command deletes all the tabs in your XML-like variable. However, your variable does not contain tabs but 4 spaces instead.
So instead you need to have:
... | tr -d ' ' | ...
3- An alternative solution would be:
OUT=`echo $source | grep $tag | sed 's/<.*>\([^<].*\)<.*>$/\1/' `
(note that the ^ in the sed is removed)
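Putting these points together, a minimal sketch of the loop from the question might look like the following. It assumes one element per line, as in the sample; the inline sample data and the values array are hypothetical. Matching on the specific <tag>...</tag> pair, quoting "$source" so the newlines survive, and using case instead of eval are swapped-in simplifications, not part of the answer above.

```shell
#!/bin/bash
# Sample data standing in for the page source held in $source (an assumption).
source='<event>
    <what>New Post</what>
    <title>This is a title</title>
    <preview>sdfasd</preview>
</event>'

for tag in what title preview
do
    values=()
    # Feed the variable to grep on stdin; "$source" is text, not a filename.
    while IFS= read -r value
    do
        values+=("$value")
    done < <(printf '%s\n' "$source" | grep "<$tag>" | sed "s/.*<$tag>\([^<]*\)<\/$tag>.*/\1/")

    # Copy into W_ARRAY, T_ARRAY, or P_ARRAY without eval.
    case $tag in
        what)    W_ARRAY=("${values[@]}") ;;
        title)   T_ARRAY=("${values[@]}") ;;
        preview) P_ARRAY=("${values[@]}") ;;
    esac
done

echo "${W_ARRAY[0]}"
echo "${T_ARRAY[0]}"
echo "${P_ARRAY[0]}"
```

Reading the values line by line keeps multi-word entries such as "New Post" as a single array element, which the unquoted ( `echo ${what}` ) form in the question would split on spaces.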

