How to remove ^[, and all of the escape sequences in a file using linux shell scripting
Original question: http://stackoverflow.com/questions/6534556/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by hasan
We want to remove ^[ and all of the escape sequences.
sed is not working and is giving us this error:
$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command
$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command
Answer by sehe
Are you looking for ansifilter?
Two things you can do. The first is to enter the literal escape character (in bash):
Using keyboard entry (press Ctrl-v, then Esc):
sed 's/Ctrl-v Esc//g'
alternatively
sed 's/Ctrl-v Ctrl-[//g'
Or, with GNU sed, you can use character escapes:
sed 's/\x1b//g'
or for all control characters:
sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!
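Here is a minimal sketch, assuming bash. The $'...' quoting form puts a real ESC byte into the sed script, which is the same byte that Ctrl-v Esc would type. Because of that, it also works with sed implementations that lack the \x1b escape. strip_esc_bytes is a hypothetical helper name. As with the commands above, deleting only the ESC byte leaves the rest of each sequence in the text.

```shell
#!/usr/bin/env bash
# Assumption: run under bash, where $'\e' expands to the ESC byte (0x1b).

# strip_esc_bytes (hypothetical name): delete every ESC byte read from stdin.
strip_esc_bytes() {
    sed $'s/\e//g'
}

# Example input: bold text followed by an attribute reset.
printf '\033[1mbold\033[0m\n' | strip_esc_bytes   # prints "[1mbold[0m"
```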
Answer by gronostaj
I stumbled upon this post when looking for a way to strip extra formatting from man pages. ansifilter did it, but the result was far from what I wanted (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).
For that task, the correct command is col -bx, for example:
groff -man -Tascii fopen.3 | col -bx > fopen.3.txt
Answer by Luke H
I managed with the following for my purposes, but this doesn't include all possible ANSI escapes:
sed -r 's/\x1b\[[0-9;]*m?//g'
This removes m commands (colors and text attributes). For all escapes (as commented by @lethalman), use:
sed -r 's/\x1b\[[^@-~]*[@-~]//g'
Also see "Python regex to match VT100 escape sequences".
There is also a table of common escape sequences.
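A small sample makes the difference between the two commands visible. This sketch assumes GNU sed (for -r and \x1b). It also sets LC_ALL=C so that bracket ranges like [@-~] compare plain byte values. The sample string is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed. The C locale makes [@-~] a plain byte range (0x40 to 0x7e).
export LC_ALL=C

# Bold red text, a reset, and an erase-to-end-of-line (ESC [ K).
sample=$(printf 'a\033[1;31mred\033[0m\033[Kb')

# "m"-only version: ESC [ K loses just "ESC [", so a stray "K" is left behind.
sgr_only=$(printf '%s\n' "$sample" | sed -r 's/\x1b\[[0-9;]*m?//g')

# General version: any parameter bytes, then one final byte in the range @ to ~.
all_csi=$(printf '%s\n' "$sample" | sed -r 's/\x1b\[[^@-~]*[@-~]//g')

printf '%s\n' "$sgr_only"   # prints "aredKb"
printf '%s\n' "$all_csi"    # prints "aredb"
```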
Answer by sdaau
Just a note: let's say you have a file like this (such line endings are generated by git remote reports):
echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt
As a hex dump, this looks like this:
$ cat chartest.txt | hexdump -C
00000000 72 65 6d 6f 74 65 3a 20 2a 20 32 37 36 32 35 61 |remote: * 27625a|
00000010 38 20 28 48 45 41 44 2c 20 6d 61 73 74 65 72 29 |8 (HEAD, master)|
00000020 20 31 73 74 20 67 69 74 20 63 6f 6d 6d 69 74 1b | 1st git commit.|
00000030 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 |[K.remote: .[K.r|
00000040 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 65 6d 6f 74 |emote: .[K.remot|
00000050 65 3a 20 1b 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b |e: .[K.remote: .|
00000060 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 |[K.remote: .[K.r|
00000070 65 6d 6f 74 65 3a 20 43 75 72 72 65 6e 74 20 62 |emote: Current b|
00000080 72 61 6e 63 68 20 6d 61 73 74 65 72 20 69 73 20 |ranch master is |
00000090 75 70 20 74 6f 20 64 61 74 65 2e 1b 5b 4b 0a |up to date..[K.|
0000009f
You can see that git adds the sequence 0x1b 0x5b 0x4b (ESC [ K) before each line ending (0x0a).
Note that while you can match 0x1b with the literal format \x1b in sed, you CANNOT do the same for 0x5b, which represents the left square bracket [ (at least in the sed version used here):
$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression
You might think you can escape the representation with an extra backslash \, which ends up as \\x5b. But while that "passes", it doesn't match anything as intended:
$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000 72 65 6d 6f 74 65 3a 20 2a 20 32 37 36 32 35 61 |remote: * 27625a|
00000010 38 20 28 48 45 41 44 2c 20 6d 61 73 74 65 72 29 |8 (HEAD, master)|
00000020 20 31 73 74 20 67 69 74 20 63 6f 6d 6d 69 74 1b | 1st git commit.|
00000030 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 |[K.remote: .[K.r|
00000040 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 65 6d 6f 74 |emote: .[K.remot|
...
So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[. The rest of the values can then be entered with the escaped \x notation:
$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000 72 65 6d 6f 74 65 3a 20 2a 20 32 37 36 32 35 61 |remote: * 27625a|
00000010 38 20 28 48 45 41 44 2c 20 6d 61 73 74 65 72 29 |8 (HEAD, master)|
00000020 20 31 73 74 20 67 69 74 20 63 6f 6d 6d 69 74 0a | 1st git commit.|
00000030 72 65 6d 6f 74 65 3a 20 0a 72 65 6d 6f 74 65 3a |remote: .remote:|
00000040 20 0a 72 65 6d 6f 74 65 3a 20 0a 72 65 6d 6f 74 | .remote: .remot|
00000050 65 3a 20 0a 72 65 6d 6f 74 65 3a 20 0a 72 65 6d |e: .remote: .rem|
00000060 6f 74 65 3a 20 43 75 72 72 65 6e 74 20 62 72 61 |ote: Current bra|
00000070 6e 63 68 20 6d 61 73 74 65 72 20 69 73 20 75 70 |nch master is up|
00000080 20 74 6f 20 64 61 74 65 2e 0a | to date..|
0000008a
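Here is a sketch of the same fix as a self-checking script, assuming GNU sed and bash. It rebuilds a shortened version of the git sample in a temporary file, removes ESC [ K (writing the bracket as the escaped \[ described above), and then uses grep to confirm that no ESC byte is left.

```shell
#!/usr/bin/env bash
# Assumptions: GNU sed (for \x1b) and bash (for $'\e' in the grep check).
tmp=$(mktemp)

# Shortened version of the git "remote:" sample: every line ends in ESC [ K.
printf 'remote: * 27625a8 (HEAD, master) 1st git commit\033[K\nremote: \033[K\nremote: Current branch master is up to date.\033[K\n' > "$tmp"

# The left bracket is matched as \[ ; the final K could also be written \x4b.
cleaned=$(sed 's/\x1b\[K//g' "$tmp")
rm -f "$tmp"

# grep -q succeeds only if some line still contains an ESC byte.
if printf '%s\n' "$cleaned" | grep -q $'\e'; then
    echo "escape bytes remain"
else
    echo "clean"
fi
```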
Answer by soorajmr
The ansi2txt command (part of the kbtin package) seems to do the job perfectly on Ubuntu.
Answer by Tom Hale
commandlinefu gives the correct answer, which strips ANSI colours as well as movement commands:
sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
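The sketch below checks that expression, assuming GNU sed (which understands \x1B). It uses a made-up sample containing a color code (ESC [ 32 m), a cursor-up movement (ESC [ 2 A) and a bare reset (ESC [ m). strip_csi is a hypothetical name. Parameters containing ?, such as ESC [ ? 25 l, fall outside [0-9;], so those sequences are left untouched.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed. LC_ALL=C keeps [a-zA-Z] a plain ASCII byte range.
export LC_ALL=C

# strip_csi (hypothetical name): the commandlinefu expression wrapped as a filter.
strip_csi() {
    sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
}

printf '\033[32mok\033[2A done\033[m\n' | strip_csi   # prints "ok done"
```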
Answer by lunixbochs
I built vtclean for this. It strips escape sequences using these regular expressions, in order (explained in regex.txt):
// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\
// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).
// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)
It additionally does basic line-edit emulation, so backspace and other movement characters (like the left arrow key) are parsed.
Answer by AGipson
I don't have enough reputation to add a comment to the answer given by Luke H, but I did want to share the regular expression that I've been using to eliminate all of the ANSI escape sequences.
sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'
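This sketch shows what the optional parts of that expression handle, assuming GNU sed. The \x01 and \x02 bytes are the markers bash uses for \[ and \] in a PS1 prompt. ESC ( B followed by ESC [ m is, for example, what tput sgr0 emits with the common xterm terminfo entry. The sample string is made up, and strip_prompt_codes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed (for -r and the \x escapes).
export LC_ALL=C

# strip_prompt_codes (hypothetical name): AGipson's expression as a filter.
strip_prompt_codes() {
    sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'
}

# Green text, then an sgr0-style reset wrapped in \001 ... \002, then a prompt.
printf '\033[1;32mgreen\001\033(B\033[m\002 $ \n' | strip_prompt_codes   # prints "green $ "
```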
Answer by pyjama
You can remove all non-printable characters with this:
sed 's/[^[:print:]]//g'
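One caveat, shown on a made-up sample: [^[:print:]] removes only the ESC byte itself, so the printable remainder of each sequence (such as [31m) stays in the text. Tabs are removed as well, because TAB is not a printable character. The sketch assumes the C locale.

```shell
#!/usr/bin/env bash
# Assumption: C locale, where [:print:] covers the bytes 0x20 to 0x7e.
export LC_ALL=C

result=$(printf '\033[31mred\033[0m\tend\n' | sed 's/[^[:print:]]//g')
printf '%s\n' "$result"   # prints "[31mred[0mend"
```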
Answer by kbulgrien
Tom Hale's answer left unwanted codes, but it was a good base to work from. Adding additional filtering cleared out the leftover, unwanted codes:
sed -e "s,^[[[(][0-9;?]*[a-zA-Z],,g" \
-e "s/^[[[][0-9][0-9]*[@]//" \
-e "s/^[[=0-9]<[^>]*>//" \
-e "s/^[[)][0-9]//" \
-e "s/.^H//g" \
-e "s/^M//g" \
-e "s/^^H//" \
file.dirty > file.clean
As this was done on a non-GNU version of sed, where you see ^[, ^H, and ^M, I used Ctrl-V <Esc>, Ctrl-V Ctrl-H, and Ctrl-V Ctrl-M respectively. The ^> is literally a caret (^) and a greater-than character, not Ctrl-<.
TERM=xterm was in use at the time.
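Here is a sketch of the same seven expressions for bash users who would rather not type control characters with Ctrl-V. The $'...' quoting supplies them instead: \e for ^[, \b for ^H and \r for ^M. clean_terminal_log is a hypothetical name. This version assumes bash and GNU sed rather than the non-GNU sed used in the answer.

```shell
#!/usr/bin/env bash
# Assumptions: bash ($'...' quoting) and GNU sed; C locale for the [a-zA-Z] range.
export LC_ALL=C

# clean_terminal_log (hypothetical name): reads the named files, or stdin.
# Usage: clean_terminal_log file.dirty > file.clean
clean_terminal_log() {
    sed -e $'s,\e[[(][0-9;?]*[a-zA-Z],,g' \
        -e $'s/\e[[][0-9][0-9]*[@]//' \
        -e $'s/\e[=0-9]<[^>]*>//' \
        -e $'s/\e[)][0-9]//' \
        -e $'s/.\b//g' \
        -e $'s/\r//g' \
        -e $'s/^\b//' \
        "$@"
}
```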