bash: Simple regular expression parsing in bash

Question

Asked by Salman A

I want to parse a log file (log.txt) which contains rows similar to these:

2010-10-19 07:56:14 URL:http://www.website.com/page.php?ID=26 [13676] -> "www.website.com/page.php?ID=26" [1]
2010-10-19 07:56:14 URL:http://www.website.com/page.php?ID=44 [14152] -> "www.website.com/page.php?ID=44" [1]
2010-10-19 07:56:14 URL:http://www.website.com/page.php?ID=13 [13681] -> "www.website.com/page.php?ID=13" [1]
2010-10-19 07:56:14 ERROR:Something bad happened
2010-10-19 07:56:14 ERROR:Something really bad happened
2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=14 [12627] -> "www.website.com/page.php?ID=14" [1]
2010-10-19 07:56:14 ERROR:Page not found
2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]

As you might have guessed:

1) I need to extract this portion from each row:

2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]
------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

2) This portion goes to another file (log.html) like this:

<a href="http://www.website.com/page.php?ID=29">http://www.website.com/page.php?ID=29</a>

I need to do this with a bash script that will run on a *nix platform. I have no experience with shell programming, so a detailed script would be much appreciated; pointers to a bash programming reference would also help.

Answer 1

Accepted answer by mouviciel

This should work:

sed -n 's%^.* URL:\(.*\) \[[0-9]*\] -> .*$%<a href="\1">\1</a>%p' log.txt
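
To see how this kind of sed substitution behaves end to end, here is a minimal sketch. It assumes a temporary directory and a three-row sample taken from the question's log; the `\1` back-reference reuses the text captured by `\( ... \)`, and the output is redirected to log.html as the question asks.

```shell
#!/bin/bash
# Assumption: a throwaway directory holding a small sample of the question's log.
workdir=$(mktemp -d)
cat > "$workdir/log.txt" <<'EOF'
2010-10-19 07:56:14 URL:http://www.website.com/page.php?ID=26 [13676] -> "www.website.com/page.php?ID=26" [1]
2010-10-19 07:56:14 ERROR:Something bad happened
2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]
EOF

# -n suppresses sed's default printing; the trailing p prints only the lines
# where the substitution succeeded, so ERROR rows are dropped.
# % is used as the delimiter so the slashes in </a> need no escaping.
# \( ... \) captures the URL and \1 inserts it twice in the replacement.
sed -n 's%^.* URL:\(.*\) \[[0-9]*\] -> .*$%<a href="\1">\1</a>%p' \
    "$workdir/log.txt" > "$workdir/log.html"

cat "$workdir/log.html"
```

With the real file, the same command followed by `> log.html` writes the links next to log.txt.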

Answer 2

Answer by ghostdog74

Here's a bash solution

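#!/bin/bash
# Open log.txt on file descriptor 4 and read it line by line
exec 4<"log.txt"
while read -r line <&4
do
  case "$line" in
    *URL:* )
      url="${line#*URL:}"   # drop everything up to and including "URL:"
      url=${url%% \[*}      # drop everything from the first " [" onward
      echo "<a href=\"${url}\">${url}</a>"
      ;;
  esac
done
exec 4<&-                   # close file descriptor 4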
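
The two parameter expansions do the actual extraction, so it may help to watch them on a single row. The sketch below uses one line from the question's log; the variable name `rest` is introduced here only for illustration.

```shell
#!/bin/bash
# One sample row from the question's log.
line='2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]'

# ${var#pattern} removes the shortest match of pattern from the front.
# "*URL:" therefore strips the timestamp and the "URL:" label.
rest=${line#*URL:}
echo "$rest"

# ${var%%pattern} removes the longest match of pattern from the end.
# " \[*" matches from the first " [" through the end of the string.
url=${rest%% \[*}
echo "$url"
```

No external process is started for either step, which is why this approach stays entirely inside the shell.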

Answer 3

Answer by a'r

Here's a small awk script that should do what you need.

awk '/URL:/ { sub(/^URL:/, "", $3); printf "<a href=\"%s\">%s</a>\n", $3, $3 }' log.txt

Answer 4

Answer by Zsolt Botykai

What about sed:

sed -n 's/.*URL:\([^ ]\+\) .*/<a href="\1">\1<\/a>/;/<a href/p' logfile

(Note: you could anchor the URL part more precisely, e.g. by matching the fixed-length date string in front of it, but I was just lazy.)
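
One way to read that note is to require the timestamp prefix explicitly. The following is a sketch under assumptions, not the answerer's command: the second sample row is a made-up ERROR line whose message contains "URL:", and `[^ ]*` replaces the GNU-specific `\+` so the pattern stays within basic regular expressions.

```shell
#!/bin/bash
# Hypothetical input: the second row is an ERROR whose message happens to
# contain "URL:". It does not appear in the question's sample log.
sample='2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]
2010-10-19 07:56:16 ERROR:Bad URL:foo in request'

# Loose pattern: accepts "URL:" anywhere in the line.
loose=$(printf '%s\n' "$sample" |
    sed -n 's/.*URL:\([^ ]*\) .*/<a href="\1">\1<\/a>/p')

# Stricter pattern: "URL:" must directly follow a YYYY-MM-DD HH:MM:SS prefix.
strict=$(printf '%s\n' "$sample" |
    sed -n 's/^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9:]\{8\} URL:\([^ ]*\) .*/<a href="\1">\1<\/a>/p')

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

The loose pattern also turns the ERROR row into a link to "foo", while the anchored pattern emits only the genuine URL row.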

Answer 5

Answer by codaddict

Something like this:

while read -r line
do
        # grep -Eo prints only the "URL:..." token; sed strips the label
        URL=$(echo "$line" | grep -Eo 'URL:[^ ]+' | sed 's/^URL://')
        if [ -n "$URL" ]; then
                echo "<a href=\"$URL\">$URL</a>" >> output.txt
        fi
done < input.txt
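
The loop above starts grep and sed once per line. As a possible variant (not the answerer's method), bash's built-in `[[ string =~ regex ]]` test can do the same match without child processes. The function name `extract_links` is a hypothetical helper introduced for this sketch.

```shell
#!/bin/bash
# extract_links is a hypothetical helper name, not part of the original answer.
# It reads log lines on stdin and prints one anchor tag per URL row.
extract_links() {
    local line url
    local re='URL:([^ ]+)'          # capture the non-space run after "URL:"
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            url=${BASH_REMATCH[1]}  # text matched by the first capture group
            printf '<a href="%s">%s</a>\n' "$url" "$url"
        fi
    done
}

# Demo on two rows from the question's log.
printf '%s\n' \
    '2010-10-19 07:56:14 ERROR:Page not found' \
    '2010-10-19 07:56:15 URL:http://www.website.com/page.php?ID=29 [13694] -> "www.website.com/page.php?ID=29" [1]' |
    extract_links
```

With the question's file names, the call would be `extract_links < log.txt > log.html`.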
