bash: How to decode a URL-encoded string in shell?

Question

Asked by user785717

I have a file with a list of user-agents which are encoded. E.g.:

Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

I want a shell script which can read this file and write to a new file with decoded strings.

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en

I have been trying to use this example to get it going but it is not working so far.

$ echo -e "$(echo "%31+%32%0A%33+%34" | sed 'y/+/ /; s/%/\x/g')"

My script looks like:

#!/bin/bash
for f in *.log; do
  echo -e "$(cat $f | sed 'y/+/ /; s/%/\x/g')" > y.log
done

Answer 1

Answered by guest

Here is a simple one-line solution.

$ function urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }

It may look like perl :) but it is just pure bash. No awks, no seds ... no overheads. It uses the : builtin, special parameters, pattern substitution and the echo builtin's -e option to translate hex codes into characters. See bash's manpage for further details. You can use this function as a separate command:

$ urldecode https%3A%2F%2Fgoogle.com%2Fsearch%3Fq%3Durldecode%2Bbash
https://google.com/search?q=urldecode+bash

or in variable assignments, like so:

$ x="http%3A%2F%2Fstackoverflow.com%2Fsearch%3Fq%3Durldecode%2Bbash"
$ y=$(urldecode "$x")
$ echo "$y"
http://stackoverflow.com/search?q=urldecode+bash
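
The question asks for a script that reads a file and writes the decoded lines to a new file. A minimal sketch of that, built on the function above, might look like the following. The helper name decode_file and its two arguments are assumptions made for illustration, not part of the answer.

```shell
#!/bin/bash
# guest's function, as given above
urldecode() { : "${*//+/ }"; echo -e "${_//%/\\x}"; }

# Hypothetical helper: decode every line of file $1 into file $2.
decode_file() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        urldecode "$line"
    done < "$1" > "$2"
}
```

Because echo -e expands every backslash escape, input that already contains literal backslashes will not survive unchanged, so this is only suitable for data such as the user-agent list in the question.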

Answer 2

Answered by Steven Penny

GNU awk

#!/usr/bin/awk -fn
@include "ord"
BEGIN {
  RS = "%.."
}
{
  printf RT ? $0 chr("0x" substr(RT, 2)) : $0
}

Or

#!/bin/sh
awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..

Using awk printf to urldecode text

Answer 3

Answered by brendan

With Bash, to read percent-encoded URLs from standard input and decode them:

while read; do echo -e ${REPLY//%/\\x}; done

Press CTRL-D to signal the end of file (EOF) and quit gracefully.

You can decode the contents of a file by redirecting the file to standard input:

while read; do echo -e ${REPLY//%/\\x}; done < file

You can also decode input from a pipe, for example:

echo 'a%21b' | while read; do echo -e ${REPLY//%/\\x}; done

The read builtin command reads standard input until it sees a line feed character. It sets a variable called REPLY equal to the line of text it just read.
${REPLY//%/\\x} replaces all instances of '%' with '\x'.
echo -e interprets \xNN as the ASCII character with hexadecimal value NN.
while repeats this loop until the read command fails, e.g. because EOF has been reached.

The above does not change '+' to ' '. To also change '+' to ' ', as in guest's answer:

while read; do : "${REPLY//%/\\x}"; echo -e ${_//+/ }; done

: is a Bash builtin command. Here it just takes a single argument and does nothing with it.
The double quotes make everything inside them a single parameter.
_ is a special parameter that is equal to the last argument of the previous command, after argument expansion. Here, this is the value of REPLY with all instances of '%' replaced with '\x'.
${_//+/ } replaces all instances of '+' with ' '.

This uses only BASH and doesn't start any other process, similar to guest's answer.
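
To make the roles of : and $_ more concrete, the steps described above can be traced on a single string. This is only an illustrative sketch: the variable names step1, step2 and decoded are made up for the example, and the sample input is arbitrary.

```shell
#!/bin/bash
REPLY='a%21+b'

# ':' does nothing, but its (expanded) last argument is remembered in $_
: "${REPLY//%/\\x}"
step1=$_                  # every '%' is now '\x'

: "${step1//+/ }"
step2=$_                  # every '+' is now a space

# echo -e finally turns each \xNN escape into the corresponding character
decoded=$(echo -e "$step2")

printf '%s\n' "$step1" "$step2" "$decoded"
```

The intermediate values make it easier to see that the decoding itself happens only in the final echo -e; the two pattern substitutions merely rewrite the string.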

Answer 4

Answered by Jay

If you are a Python developer, this may be preferable:

For Python 3.x (default):

echo -n "%21%20" | python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"

For Python 2.x (deprecated):

echo -n "%21%20" | python -c "import sys, urllib as ul; print ul.unquote(sys.stdin.read());"

urllib is really good at handling URL parsing.
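
Note that unquote leaves '+' untouched. If the input uses '+' for spaces, as the other answers assume, urllib.parse also provides unquote_plus. A small sketch, assuming python3 is on the PATH; the wrapper name urldecode_py is hypothetical:

```shell
#!/bin/bash
# Hypothetical wrapper: decode standard input with Python 3,
# turning '+' into a space as well as decoding %XX sequences.
urldecode_py() {
    python3 -c 'import sys
from urllib.parse import unquote_plus
sys.stdout.write(unquote_plus(sys.stdin.read()))'
}

printf '%s\n' 'urldecode+bash%21' | urldecode_py
```

Using sys.stdout.write instead of print avoids adding an extra newline to the decoded text.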

Answer 5

Answered by user785717

This is what seems to be working for me.

#!/bin/bash
urldecode(){
  echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;')"
}

for f in /opt/logs/*.log; do
    name=${f##/*/}
    cat "$f" | urldecode > "/opt/logs/processed/$HOSTNAME.$name"
done

Replacing '+'s with spaces, and % signs with '\x' escapes, and letting echo interpret the \x escapes using the '-e' option was not working. For some reason, the cat command was printing the % sign as its own encoded form %25. So sed was simply replacing %25 with \x25. When the -e option was used, it was simply evaluating \x25 as % and the output was same as the original.

Trace:

Original: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

sed: Mozilla\x252F5.0\x2520\x2528Macintosh\x253B\x2520U\x253B\x2520Intel\x2520Mac\x2520OS\x2520X\x252010.6\x253B\x2520en

echo -e: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en

Fix: Basically ignore the 2 characters after the % in sed.

sed: Mozilla\x2F5.0\x20\x28Macintosh\x3B\x20U\x3B\x20Intel\x20Mac\x20OS\x20X\x2010.6\x3B\x20en

echo -e: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
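
The two stages of the fixed trace can be reproduced on a shortened sample, which makes it easy to check what sed hands over before the escapes are expanded. This is a sketch only: the variable names are made up, and printf '%b' is used here in place of echo -e for the final step.

```shell
#!/bin/bash
encoded='Mozilla%2F5.0%20%28Macintosh'

# Stage 1: capture the two characters after each '%' and prefix them with \x
escaped=$(printf '%s\n' "$encoded" | sed 's/+/ /g;s/%\(..\)/\\x\1/g')

# Stage 2: let bash's printf expand the \xNN escapes
decoded=$(printf '%b' "$escaped")

printf 'sed:     %s\n' "$escaped"
printf 'decoded: %s\n' "$decoded"
```

Inspecting the intermediate string this way is a quick check that the sed stage emits \x2F rather than \x252F.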

I am not sure what complications this might lead to under more extensive testing, but it works for now.

Answer 6

Answered by Stephane Chazelas

perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/pack H2,$1/gie' ./*.log

With -i, perl updates the files in place (some sed implementations have borrowed that from perl), with .back as the backup extension.

s/x/y/e substitutes x with the evaluation of the y perl code.

The perl code in this case uses pack to pack the hex number captured in $1 (the first parentheses pair in the regexp) as the corresponding character.

An alternative to pack is to use chr(hex($1)):

perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/chr hex $1/gie' ./*.log

If available, you could also use uri_unescape() from URI::Escape:

perl -pi.back -MURI::Escape -e 'y/+/ /;$_=uri_unescape$_' ./*.log

Answer 7

Answered by Janus Troelsen

Bash script for doing it in native Bash (original source):

LANG=C

urlencode() {
    local l=${#1}
    for (( i = 0 ; i < l ; i++ )); do
        local c=${1:i:1}
        case "$c" in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf + ;;
            *) printf '%%%.2X' "'$c"
        esac
    done
}

urldecode() {
    local data=${1//+/ }
    printf '%b' "${data//%/\\x}"
}

If you want to urldecode file content, just put the file content as an argument.

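Here's a test that will halt if the decoded content differs from the original file content (if it runs for a few seconds, the script probably works correctly):

while true
  do cat /dev/urandom | tr -d '\0' | head -c1000 > /tmp/tmp;
     A="$(cat /tmp/tmp; printf x)"
     A=${A%x}
     A=$(urlencode "$A")
     urldecode "$A" > /tmp/tmp2
     cmp /tmp/tmp /tmp/tmp2
     if [ $? != 0 ]
       then break
     fi
done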
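
For a quicker, deterministic check than the random loop, a single round trip on a fixed string may be enough to see both functions at work. This sketch repeats the two functions so that it runs on its own; the sample string and the variable names are chosen only for illustration.

```shell
#!/bin/bash
LANG=C

urlencode() {
    local l=${#1}
    for (( i = 0 ; i < l ; i++ )); do
        local c=${1:i:1}
        case "$c" in
            [a-zA-Z0-9.~_-]) printf "$c" ;;   # unreserved characters pass through
            ' ') printf + ;;                   # space becomes '+'
            *) printf '%%%.2X' "'$c"           # everything else becomes %XX
        esac
    done
}

urldecode() {
    local data=${1//+/ }
    printf '%b' "${data//%/\\x}"
}

sample='a b&c=d/e?'
encoded=$(urlencode "$sample")
decoded=$(urldecode "$encoded")
printf '%s\n%s\n' "$encoded" "$decoded"
```

To decode a whole file this way, its content can be passed as the argument, for example urldecode "$(cat file)"; command substitution strips trailing newlines, which is why the test above appends and removes an x.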

Answer 8

Answered by Oleg Bondar'

If you have php installed on your server, you can "cat" or even "tail" any file containing URL-encoded strings and decode it very easily.

tail -f nginx.access.log | php -R 'echo urldecode($argn)."\n";'

Answer 9

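Answered by Johnsyweb

As @barti_ddu said in the comments, \x "should be [double-]escaped".

% echo -e "$(echo "Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en" | sed 'y/+/ /; s/%/\\x/g')"
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en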

Rather than mixing up Bash and sed, I would do this all in Python. Here's a rough cut of how:

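#!/usr/bin/env python3

import glob
import os
from urllib.parse import unquote  # Python 3 location of unquote

# Decode every *.log file in the current directory into <name>.log.new.
for logfile in glob.glob(os.path.join('.', '*.log')):
    with open(logfile) as current:
        new_log_filename = logfile + '.new'
        with open(new_log_filename, 'w') as new_log_file:
            for url in current:
                unquoted = unquote(url.strip())
                new_log_file.write(unquoted + '\n')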

Answer 10

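Answered by Stephane Chazelas

With GNU awk:

gawk -vRS='%[0-9a-fA-F]{2}' 'RT{sub("%","0x",RT);RT=sprintf("%c",strtonum(RT))}
                             {gsub(/\+/," ");printf "%s", $0 RT}'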
