Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25778587/

Identify and remove specific hidden characters from text file

bash unix sed

Asked by p014k

I have a text file that contains several hidden characters. Using cat -v I am able to see that they include the following:

^M

^[[A

^M

^[[A

There are also \n characters at the end of each line. I would like to be able to display these as well somehow.

Then I would like to be able to selectively cut and sed these hidden characters. How would I go about accomplishing this?

I've tried dos2unix, but that didn't remove any of the ^M characters. I've also tried sed s/^M//g, where I entered the ^M by pressing Ctrl+V M.

Raw data

Output from cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC

^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
^MFinished

Output wanted

Also available at: http://pastebin.com/wfDnrELm

rescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
Finished

Answered by Ram

Try the tr command below, which translates or deletes characters. It removes every character other than the ones specified in octal within the quotes.

Octal \11 is TAB (^I), octal \12 is newline (\n), and octal \40-\176 are the printable characters worth keeping.

For a complete reference of octal values refer to this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html

tr -cd '\11\12\40-\176' < org.txt > new.txt

The file new.txt will contain the text with those characters removed.
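
To see what this kind of filter does, here is a minimal sketch that builds a small sample file and prints it before and after filtering. The file names sample.txt and cleaned.txt and the sample contents are made up for illustration. It assumes GNU cat, whose -A option shows TAB as ^I and marks each line end with $, which also makes the \n characters visible.

```shell
# Build a hypothetical sample containing CR, ESC [ A, a TAB and normal text.
printf '\rCopying\r\033[Arescued:\t0 B\n\rFinished\n' > sample.txt

# Show every hidden character; -A (GNU cat) prints TAB as ^I and line ends as $.
cat -A sample.txt

# Keep only TAB (\11), newline (\12) and printable ASCII (\40-\176).
tr -cd '\11\12\40-\176' < sample.txt > cleaned.txt

# Inspect the result the same way.
cat -A cleaned.txt
```

Note that a filter like this deletes only the ESC byte of each ^[[A sequence. The [A that follows consists of printable characters, so it is left behind.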

To also remove the text between two ^M characters, along with the unnecessary control characters, use the command below:

sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt
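
One caveat: when tr keeps only TAB, newline and printable ASCII, it removes just the ESC byte of each ^[[A sequence, so the printable [A would survive in the output. The variant below is not part of the original answer. It is a sketch that adds one more sed expression to delete the whole sequence first. It assumes GNU sed, which understands \r and \x1b in patterns, and the file names raw.txt and clean.txt and the sample data are hypothetical, loosely modelled on the raw data in the question.

```shell
# Hypothetical sample modelled on the raw data in the question.
printf '\rCopying non-tried blocks... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n   ipos: 0 B\n\rFinished\n' > raw.txt

# 1. Drop everything between the first and last carriage return on a line.
# 2. Drop each cursor-up escape sequence (ESC [ A) as a whole.
# 3. Let tr delete whatever non-printable bytes remain (e.g. a lone CR).
sed -e 's/\r.*\r//g' -e 's/\x1b\[A//g' raw.txt | tr -cd '\11\12\40-\176' > clean.txt

cat clean.txt
```

Removing the escape sequences with sed before tr runs matters here: once tr has deleted the ESC byte, the leftover [A can no longer be told apart from ordinary text.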