Simple regular expression parsing in bash
Original question: http://stackoverflow.com/questions/3968111/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Salman A
I want to parse a log file (log.txt) which contains rows similar to these:
2010-10-19 07:56:14 URL:http://www.website.com/page.php?ID=26 [13676] -> "www.website.com/page.php?ID=26" [1]
2010-10-19 07:56:14 URL:http://www.website.com/page.php?ID=44 [14152] -> "www.website.com/page.php?ID=44" [1]
2010-10-19 07:56:14 URL:http://www.website.com/page.php?ID=13 [13681] -> "www.website.com/page.php?ID=13" [1]
2010-10-19 07:56:14 ERROR:Something bad happened
2010-10-19 07:56:14 ERROR:Something really bad happened
2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=14 [12627] -> "www.website.com/page.php?ID=14" [1]
2010-10-19 07:56:14 ERROR:Page not found
2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]
As you might have guessed:
1) I need to extract this portion from each row:
2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]
------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2) This portion goes to another file (log.html) like this:
<a href="http://www.website.com/page.php?ID=29">http://www.website.com/page.php?ID=29</a>
I need to do this with a bash script that will run on a *nix platform. I know nothing about shell programming, so a detailed script would be much appreciated; pointers to a bash programming reference would also help.
Accepted answer by mouviciel
This should work:
sed -n 's%^.* URL:\(.*\) \[[0-9]*\] -> .*$%<a href="\1">\1</a>%p' log.txt
Answer by ghostdog74
Here's a bash solution:
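#!/bin/bash
# Read log.txt through file descriptor 4.
exec 4<"log.txt"
while read -r line <&4
do
    case "$line" in
        *URL:* )
            url="${line#*URL:}"   # drop everything up to and including "URL:"
            url=${url%% \[*}      # drop everything from the first " [" onward
            echo "<a href=\"${url}\">${url}</a>"
            ;;
    esac
done
exec 4<&-   # close file descriptor 4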
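As an alternative to the parameter-expansion approach above (and not the answerer's method), bash also has a built-in regex operator. Here is a minimal sketch using `[[ ... =~ ... ]]` and `BASH_REMATCH`. The function name `line_to_link` is made up for illustration, and the pattern assumes the URL contains no spaces and is followed by ` [`, as in the sample rows.

```shell
#!/bin/bash
# Hypothetical helper: print an HTML link if the given log line carries a URL.
line_to_link() {
    local line=$1
    # Keep the pattern in a variable so shell quoting does not alter the regex.
    # Assumption: the URL has no spaces and is followed by " [".
    local re='URL:([^ ]+) \['
    if [[ $line =~ $re ]]; then
        # BASH_REMATCH[1] holds the text matched by the first (...) group.
        local url=${BASH_REMATCH[1]}
        printf '<a href="%s">%s</a>\n' "$url" "$url"
    fi
}

# Process log.txt (the file name from the question) only if it is readable.
if [[ -r log.txt ]]; then
    while IFS= read -r line; do
        line_to_link "$line"
    done < log.txt
fi
```

Redirecting the script's output, e.g. `./script.sh > log.html`, would produce the file the question asks for.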
Answer by a'r
Here's a small awk script that should do what you need.
awk '/URL:/ { sub(/^URL:/, "", $3); printf "<a href=\"%s\">%s</a>\n", $3, $3 }' log.txt
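The awk approach relies on awk's default whitespace splitting, which puts the `URL:...` token in the third field (`$3`) for rows shaped like the samples. A slightly stricter sketch, assuming that layout holds, tests `$3` itself instead of the whole line, so an error message that merely mentions "URL:" is skipped. The wrapper name `links_awk` is hypothetical.

```shell
# Hypothetical helper: read log lines on stdin, print one link per URL row.
# Assumption: fields are "date time URL:<url> ...", split on whitespace.
links_awk() {
    awk '$3 ~ /^URL:/ {
        url = substr($3, 5)   # skip the 4-character "URL:" prefix
        printf "<a href=\"%s\">%s</a>\n", url, url
    }'
}
```

It could be run as `links_awk < log.txt > log.html`.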
Answer by Zsolt Botykai
What about sed:
sed -n 's/.*URL:\([^ ]\+\) .*/<a href="\1">\1<\/a>/;/<a href/p' logfile
(Please note: you can anchor the URL part more precisely, e.g. by matching the fixed-length date string in front of it, but I was just lazy.)
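One way to read that remark is to anchor the pattern on the timestamp at the start of each row. The following is only a sketch under that assumption: it expects every URL row to begin with `YYYY-MM-DD HH:MM:SS URL:`, as in the sample log, and the function name `links_anchored` is invented for illustration. It uses `%` as the sed delimiter so the slashes in the replacement need no escaping.

```shell
# Hypothetical helper: read log lines on stdin, print one link per URL row.
# Assumption: URL rows start with a "YYYY-MM-DD HH:MM:SS " timestamp.
links_anchored() {
    sed -n 's%^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\} URL:\([^ ][^ ]*\) .*$%<a href="\1">\1</a>%p'
}
```

Usage would look like `links_anchored < log.txt > log.html`. Rows without the timestamp prefix, or with something other than `URL:` after it, produce no output.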
Answer by codaddict
Something like this:
while read -r line
do
    # Quote "$line" so the "?" in the URL is not treated as a glob pattern.
    URL=$(printf '%s\n' "$line" | grep -Eo 'URL:[^ ]+' | sed 's/^URL://')
    if [ -n "$URL" ]; then
        echo "<a href=\"$URL\">$URL</a>" >> output.txt
    fi
done < input.txt

