How to remove ^[ and all escape sequences from a file using Linux shell scripting

Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/6534556/

Date: 2020-08-05 04:51:21  Source: igfitidea

How to remove ^[, and all of the escape sequences in a file using linux shell scripting

Tags: linux, shell, scripting

Asked by hasan

We want to remove ^[, and all of the escape sequences.

sed is not working and is giving us this error:

$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command

$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command

Answered by sehe

Are you looking for ansifilter?

There are two things you can do. The first is to enter the literal escape character (in bash):

Using keyboard entry:

sed 's/<Ctrl-v><Esc>//g'    # press Ctrl-v, then Esc, to insert the literal character

alternatively

sed 's/<Ctrl-v><Ctrl-[>//g'    # press Ctrl-v, then Ctrl-[, to insert the literal character

Or you can use character escapes:

sed 's/\x1b//g'

or for all control characters:

sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!
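
If TAB should survive, the range can be split around it. A minimal sketch, assuming GNU sed (the \xHH escapes are a GNU extension) and the C locale so that byte ranges behave predictably; the function name strip_ctrl_keep_tab is only illustrative:

```shell
# Remove control characters except TAB (\x09). sed works line by line and
# never sees the terminating newline, so \x0A is left out of the range too.
strip_ctrl_keep_tab() {
    LC_ALL=C sed 's/[\x01-\x08\x0B-\x1F\x7F]//g'
}

# ESC and \001 are dropped, the TAB between "b" and "c" is kept.
printf 'a\033b\tc\001d\n' | strip_ctrl_keep_tab
```

Note that carriage returns (\x0D) still fall inside the deleted range.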

Answered by gronostaj

I stumbled upon this post while looking for a way to strip extra formatting from man pages. ansifilter did it, but the result was far from what I wanted (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).

For that task the correct command would be col -bx, for example:

groff -man -Tascii fopen.3 | col -bx > fopen.3.txt

Answered by Luke H

The following worked for my purposes, but it does not cover all possible ANSI escapes:

sed -r 's/\x1b\[[0-9;]*m?//g'

This removes m commands, but for all escapes (as commented by @lethalman) use:

sed -r 's/\x1b\[[^@-~]*[@-~]//g'

Also see "Python regex to match VT100 escape sequences".

There is also a table of common escape sequences.
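
To see what the broader pattern does: it matches ESC [ followed by any number of bytes outside the @-~ range and then one final byte inside it, so it consumes colour codes (final byte m) and erase commands such as ESC [ K alike. A quick check, assuming GNU sed; strip_csi is a hypothetical helper name:

```shell
# Strip CSI sequences: ESC [ ... <final byte in @-~>.
# LC_ALL=C keeps the @-~ range a plain byte range (0x40-0x7E).
strip_csi() {
    LC_ALL=C sed -r 's/\x1b\[[^@-~]*[@-~]//g'
}

# Bold red text, a reset, and an erase-to-end-of-line sequence.
printf '\033[1;31mred\033[0m text\033[K\n' | strip_csi   # prints: red text
```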

Answered by sdaau

Just a note: let's say you have a file like this (such line endings are generated by git remote reports):

echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt

As a hex dump, this looks like:

$ cat chartest.txt | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
00000050  65 3a 20 1b 5b 4b 0a 72  65 6d 6f 74 65 3a 20 1b  |e: .[K.remote: .|
00000060  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000070  65 6d 6f 74 65 3a 20 43  75 72 72 65 6e 74 20 62  |emote: Current b|
00000080  72 61 6e 63 68 20 6d 61  73 74 65 72 20 69 73 20  |ranch master is |
00000090  75 70 20 74 6f 20 64 61  74 65 2e 1b 5b 4b 0a     |up to date..[K.|
0000009f

It is visible that git here adds the sequence 0x1b 0x5b 0x4b before the line ending (0x0a).

Note that, while you can match 0x1b with the literal notation \x1b in sed, you CANNOT do the same for 0x5b, which represents the left square bracket [:

$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression

You might think you can escape the representation with an extra backslash \, which ends up as \\x5b; but while that "passes", it doesn't match anything as intended:

$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
...

So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[. The rest of the values can then be entered with escaped \x notation:

$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 0a  | 1st git commit.|
00000030  72 65 6d 6f 74 65 3a 20  0a 72 65 6d 6f 74 65 3a  |remote: .remote:|
00000040  20 0a 72 65 6d 6f 74 65  3a 20 0a 72 65 6d 6f 74  | .remote: .remot|
00000050  65 3a 20 0a 72 65 6d 6f  74 65 3a 20 0a 72 65 6d  |e: .remote: .rem|
00000060  6f 74 65 3a 20 43 75 72  72 65 6e 74 20 62 72 61  |ote: Current bra|
00000070  6e 63 68 20 6d 61 73 74  65 72 20 69 73 20 75 70  |nch master is up|
00000080  20 74 6f 20 64 61 74 65  2e 0a                    | to date..|
0000008a
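
The same check can be scripted instead of reading the dump by eye. A small sketch, assuming GNU sed; strip_el and count_esc are illustrative names. It strips the ESC [ K sequence and counts how many ESC bytes remain:

```shell
# Remove the "erase to end of line" sequence; \[ is an escaped literal
# bracket, while \x1b (ESC) and \x4b (K) use hex notation.
strip_el() {
    sed 's/\x1b\[\x4b//g'
}

# Delete everything except ESC bytes, then count what is left.
count_esc() {
    LC_ALL=C tr -cd '\033' | wc -c | tr -d ' '
}

printf 'remote: \033[K\nremote: done.\033[K\n' | count_esc              # prints: 2
printf 'remote: \033[K\nremote: done.\033[K\n' | strip_el | count_esc   # prints: 0
```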

Answered by soorajmr

The ansi2txt command (part of the kbtin package) seems to do the job perfectly on Ubuntu.

Answered by Tom Hale

commandlinefu gives the correct answer, which strips ANSI colours as well as movement commands:

sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"

Answered by lunixbochs

I built vtclean for this. It strips escape sequences using these regular expressions in order (explained in regex.txt):

// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\

// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).

// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)

It additionally does basic line-edit emulation, so backspace and other movement characters (like left arrow key) are parsed.

Answered by AGipson

I don't have enough reputation to add a comment to the answer given by Luke H, but I did want to share the regular expression that I've been using to eliminate all of the ASCII escape sequences.

sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'

Answered by pyjama

You can remove all non-printable characters with this:

sed 's/[^[:print:]]//g'
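
Note that this drops only the non-printable bytes themselves. In an ANSI sequence only the ESC byte is non-printable, so the rest of the sequence (for example [31m) stays behind as visible text, and TAB characters are removed as well. A short demonstration, run in the C locale; strip_nonprint is an illustrative name:

```shell
# Delete every byte that is not in the [:print:] class.
strip_nonprint() {
    LC_ALL=C sed 's/[^[:print:]]//g'
}

# The ESC bytes and the TAB disappear, but "[31m" and "[0m" remain.
printf '\033[31mred\033[0m\tend\n' | strip_nonprint   # prints: [31mred[0mend
```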

Answered by kbulgrien

Tom Hale's answer left unwanted codes, but was a good base to work from. Adding additional filtering cleared out the leftover, unwanted codes:

sed -e "s,^[[[(][0-9;?]*[a-zA-Z],,g" \
    -e "s/^[[[][0-9][0-9]*[@]//" \
    -e "s/^[[=0-9]<[^>]*>//" \
    -e "s/^[[)][0-9]//" \
    -e "s/.^H//g" \
    -e "s/^M//g" \
    -e "s/^^H//" \
        file.dirty > file.clean

As this was done on a non-GNU version of sed, where you see ^[, ^H, and ^M, I used Ctrl-V <Esc>, Ctrl-V Ctrl-H, and Ctrl-V Ctrl-M respectively. The ^> is literally a caret (^) and a greater-than character, not Ctrl-<.

TERM=xterm was in use at the time.
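
On a sed without \xHH support, an alternative to typing the control character by hand is to generate it with printf and splice it into the expression. A minimal sketch covering only CSI sequences (roughly the first expression above); the variable esc and the function strip_ansi_portable are illustrative names, not part of the original answer:

```shell
# Build a real ESC byte with printf so that no literal control character
# has to be typed into the script.
esc=$(printf '\033')

# Inside double quotes the backslash before [ is preserved, so sed still
# receives \[ and treats the bracket as a literal character.
strip_ansi_portable() {
    sed "s/${esc}\[[0-9;?]*[a-zA-Z]//g"
}

# Hide-cursor, bold, and reset sequences are all removed.
printf '\033[?25l\033[1mbold\033[0m\n' | strip_ansi_portable   # prints: bold
```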
