Original source: http://stackoverflow.com/questions/6250698/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
How to decode URL-encoded string in shell?
Asked by user785717
I have a file with a list of user-agents which are encoded. E.g.:
Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
I want a shell script which can read this file and write to a new file with decoded strings.
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
I have been trying to use this example to get it going but it is not working so far.
$ echo -e "$(echo "%31+%32%0A%33+%34" | sed 'y/+/ /; s/%/\x/g')"
My script looks like:
#!/bin/bash
for f in *.log; do
echo -e "$(cat $f | sed 'y/+/ /; s/%/\x/g')" > y.log
done
Answered by guest
Here is a simple one-line solution.
$ function urldecode() { : "${*//+/ }"; echo -e "${_//%/\x}"; }
It may look like Perl :) but it is pure bash: no awk, no sed, no overhead. It uses the : builtin, special parameters, pattern substitution and the echo builtin's -e option to translate hex codes into characters. See bash's man page for further details. You can use this function as a separate command:
$ urldecode https%3A%2F%2Fgoogle.com%2Fsearch%3Fq%3Durldecode%2Bbash
https://google.com/search?q=urldecode+bash
or in variable assignments, like so:
$ x="http%3A%2F%2Fstackoverflow.com%2Fsearch%3Fq%3Durldecode%2Bbash"
$ y=$(urldecode "$x")
$ echo "$y"
http://stackoverflow.com/search?q=urldecode+bash
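To apply the same idea to a whole file, as the question asks, a minimal sketch might look like the following. The helper name decode_file and the file names in the usage line are hypothetical, and the sketch assumes a bash whose printf '%b' understands \xHH escapes. It uses printf rather than echo -e so that an input line such as -n is not mistaken for an option. Literal backslashes in the input would still be interpreted by %b.

```shell
#!/usr/bin/env bash
# Sketch only: decode_file is a hypothetical helper, not part of the answer above.

urldecode() {
    local data=${1//+/ }               # '+' stands for a space in form-encoded data
    printf '%b\n' "${data//%/\\x}"     # %41 becomes \x41, which %b expands to 'A'
}

decode_file() {
    local line
    # IFS= and -r keep leading whitespace and backslashes in each line intact;
    # the [[ -n ]] test also handles a last line without a trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        urldecode "$line"
    done < "$1"
}
```

It could then be called as decode_file encoded.log > decoded.log, one decoded line per input line.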
Answered by Steven Penny
GNU awk
#!/usr/bin/awk -fn
@include "ord"
BEGIN {
   RS = "%.."
}
{
   printf RT ? $0 chr("0x" substr(RT, 2)) : $0
}
Or
#!/bin/sh
awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..
Answered by brendan
With Bash, to read a percent-encoded URL from standard input and decode it:
while read; do echo -e ${REPLY//%/\\x}; done
Press CTRL-D to signal the end of file (EOF) and quit gracefully.
You can decode the contents of a file by redirecting the file to standard input:
while read; do echo -e ${REPLY//%/\\x}; done < file
You can also decode input from a pipe, for example:
echo 'a%21b' | while read; do echo -e ${REPLY//%/\\x}; done
- The read builtin command reads standard input until it sees a line feed character. It sets a variable called REPLY equal to the line of text it just read.
- ${REPLY//%/\\x} replaces all instances of '%' with '\x'.
- echo -e interprets \xNN as the ASCII character with hexadecimal value NN.
- while repeats this loop until the read command fails, e.g. when EOF has been reached.
The above does not change '+' to ' '. To also change '+' to ' ', like guest's answer:
while read; do : "${REPLY//%/\\x}"; echo -e ${_//+/ }; done
- : is a Bash builtin command. Here it just takes in a single argument and does nothing with it.
- The double quotes make everything inside one single parameter.
- _ is a special parameter that is equal to the last argument of the previous command, after argument expansion. This is the value of REPLY with all instances of '%' replaced with '\x'.
- ${_//+/ } replaces all instances of '+' with ' '.
This uses only BASH and doesn't start any other process, similar to guest's answer.
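The roles of : and $_ are easier to see when the one-liner is split into steps. This is a minimal sketch, assuming bash; the sample string and the variable names escaped and decoded are made up for illustration and are not part of the answer.

```shell
#!/usr/bin/env bash
REPLY='a%21+b'                 # sample input; read would normally set REPLY

: "${REPLY//%/\\x}"            # no-op command; bash remembers its expanded argument in $_
escaped=$_                     # copy it before any other command overwrites $_

# Turn '+' into spaces, then let echo -e expand the \xNN escapes.
decoded=$(echo -e "${escaped//+/ }")
echo "$decoded"
```

Copying $_ into a named variable straight away avoids depending on which command happened to run last.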
Answered by Jay
If you are a Python developer, this may be preferable:
For Python 3.x (default):
echo -n "%21%20" | python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"
For Python 2.x (deprecated):
echo -n "%21%20" | python -c "import sys, urllib as ul; print ul.unquote(sys.stdin.read());"
urllib is really good at handling URL parsing.
Answered by user785717
This is what seems to be working for me.
#!/bin/bash
urldecode(){
  echo -e "$(sed 's/+/ /g;s/%\(..\)/\\x\1/g;')"
}
for f in /opt/logs/*.log; do
    name=${f##/*/}
    cat $f | urldecode > /opt/logs/processed/$HOSTNAME.$name
done
Replacing '+'s with spaces, and % signs with '\x' escapes, and letting echo interpret the \x escapes using the '-e' option was not working. For some reason, the cat command was printing the % sign as its own encoded form %25. So sed was simply replacing %25 with \x25. When the -e option was used, it was simply evaluating \x25 as % and the output was same as the original.
Trace:
Original: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
sed: Mozilla\x252F5.0\x2520\x2528Macintosh\x253B\x2520U\x253B\x2520Intel\x2520Mac\x2520OS\x2520X\x252010.6\x253B\x2520en
echo -e: Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en
Fix: Basically ignore the 2 characters after the % in sed.
sed: Mozilla\x2F5.0\x20\x28Macintosh\x3B\x20U\x3B\x20Intel\x20Mac\x20OS\x20X\x2010.6\x3B\x20en
echo -e: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en
Not sure what complications this would result in, after extensive testing, but works for now.
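The two steps of the corrected trace can be reproduced in isolation. This is a minimal sketch, assuming a POSIX-style sed and bash's echo -e; the variable names are illustrative only, and the sed expression here matches two hex digits explicitly rather than any two characters.

```shell
#!/usr/bin/env bash
encoded='Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en'

# Step 1: sed turns '+' into a space and rewrites each %NN as \xNN,
# carrying the two hex digits over with the \1 back-reference.
escaped=$(printf '%s\n' "$encoded" |
    sed 's/+/ /g; s/%\([0-9A-Fa-f][0-9A-Fa-f]\)/\\x\1/g')

# Step 2: echo -e expands the \xNN escapes into the actual characters.
decoded=$(echo -e "$escaped")

printf 'sed:     %s\n' "$escaped"
printf 'echo -e: %s\n' "$decoded"
```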
Answered by Stephane Chazelas
perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/pack H2,$1/gie' ./*.log
With -i, perl updates the files in place (some sed implementations have borrowed that from perl), using .back as the backup extension.
s/x/y/e substitutes x with the evaluation of the y perl code.
The perl code in this case uses pack to pack the hex number captured in $1 (first parentheses pair in the regexp) as the corresponding character.
An alternative to pack is to use chr(hex($1)):
perl -pi.back -e 'y/+/ /;s/%([\da-f]{2})/chr hex $1/gie' ./*.log
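To try the chr(hex($1)) substitution without modifying any files, the same perl expression can be run as a filter on standard input. This is a minimal sketch, assuming perl is installed; the wrapper name urldecode_perl is hypothetical and the -i.back option is deliberately left out.

```shell
#!/usr/bin/env bash
# Sketch only: urldecode_perl is a hypothetical wrapper name.
urldecode_perl() {
    # -p loops over the input lines and prints each one after the edits:
    # y/+/ / maps '+' to a space, then every %HH is replaced by chr(hex(HH)).
    perl -pe 'y/+/ /; s/%([\da-f]{2})/chr hex $1/gie'
}

printf '%s\n' 'Mozilla%2F5.0%20%28Macintosh%3B%20U' | urldecode_perl
```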
If available, you could also use uri_unescape() from URI::Escape:
perl -pi.back -MURI::Escape -e 'y/+/ /;$_=uri_unescape$_' ./*.log
Answered by Janus Troelsen
Bash script for doing it in native Bash (original source):
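LANG=C
urlencode() {
    local l=${#1}
    for (( i = 0 ; i < l ; i++ )); do
        local c=${1:i:1}
        case "$c" in
            [a-zA-Z0-9.~_-]) printf "$c" ;;
            ' ') printf + ;;
            *) printf '%%%.2X' "'$c"
        esac
    done
}
urldecode() {
    local data=${1//+/ }
    printf '%b' "${data//%/\x}"
}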
If you want to urldecode file content, just pass the file content as an argument.
Here's a test that will halt if the decoded encoded file content differs from the original (if it runs for a few seconds, the script probably works correctly):
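while true
  do cat /dev/urandom | tr -d '\0' | head -c1000 > /tmp/tmp;
     A="$(cat /tmp/tmp; printf x)"
     A=${A%x}
     A=$(urlencode "$A")
     urldecode "$A" > /tmp/tmp2
     cmp /tmp/tmp /tmp/tmp2
     if [ $? != 0 ]
       then break
     fi
done
Answered by Oleg Bondar'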
If you have PHP installed on your server, you can very easily "cat" or even "tail" any file containing URL-encoded strings.
tail -f nginx.access.log | php -R 'echo urldecode($argn)."\n";'
Answered by Johnsyweb
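As @barti_ddu said in the comments, \\x "should be [double-]escaped".
% echo -e "$(echo "Mozilla%2F5.0%20%28Macintosh%3B%20U%3B%20Intel%20Mac%20OS%20X%2010.6%3B%20en" | sed 'y/+/ /; s/%/\\x/g')"
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en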
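Rather than mixing up Bash and sed, I would do this all in Python. Here's a rough cut of how:
#!/usr/bin/env python
import glob
import os
import urllib
for logfile in glob.glob(os.path.join('.', '*.log')):
    with open(logfile) as current:
        new_log_filename = logfile + '.new'
        with open(new_log_filename, 'w') as new_log_file:
            for url in current:
                unquoted = urllib.unquote(url.strip())
                new_log_file.write(unquoted + '\n')
Answered by Stephane Chazelas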
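With GNU awk:
gawk -vRS='%[0-9a-fA-F]{2}' 'RT{sub("%","0x",RT);RT=sprintf("%c",strtonum(RT))}
                             {gsub(/\+/," ");printf "%s", $0 RT}'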