Original question: http://stackoverflow.com/questions/25778587/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Identify and remove specific hidden characters from text file
Asked by p014k
I have a text file that contains several hidden characters. Using cat -v, I can see that they include the following:
^M
^[[A
There are also \n characters at the end of each line. I would like to be able to display these as well.
Then I would like to be able to selectively remove these hidden characters with cut and sed. How would I go about accomplishing this?
I've tried dos2unix, but that didn't remove any of the ^M characters. I've also tried sed s/^M//g, where I entered the ^M by pressing ctrl+v m.
Raw data
Output from cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC
^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
^MFinished
Output wanted
Also available at: http://pastebin.com/wfDnrELm
rescued: 0 B, errsize: 0 B, current rate: 0 B/s
ipos: 0 B, errors: 0, average rate: 0 B/s
opos: 0 B, run time: 1 s, successful read: 1 s ago
Finished
Answered by Ram
Try the tr command below, which translates or deletes characters. With -c (complement) and -d (delete), it removes every character except those listed, in octal, within the quotes:
Octal \12 is newline (\n), octal \11 is TAB (^I), and octal \40-\176 covers the printable ASCII characters (space through ~).
For a complete reference of octal values refer to this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html
tr -cd '\11\12\40-\176' < org.txt > new.txt
The file new.txt will contain the original text with all other characters removed.
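To see what the filter keeps and drops, here is a minimal sketch run on a made-up sample line rather than the asker's file. The variable names sample and cleaned are illustrative only. od -c is one common way to display every byte, including \r, \n and the escape byte, so the effect can be checked before and after.

```shell
# Hypothetical sample: carriage return, text, then an ESC [ A (cursor-up) sequence.
sample=$(printf '\rFinished\033[A')

# Dump every byte in readable form; \r, 033 and the trailing \n become visible.
printf '%s\n' "$sample" | od -c

# Keep only TAB (\11), newline (\12) and printable ASCII (\40-\176).
cleaned=$(printf '%s\n' "$sample" | tr -cd '\11\12\40-\176')
printf '%s\n' "$cleaned"   # prints: Finished[A
```

Note that only the escape byte itself is deleted. The "[A" that follows it is made of printable characters, so tr alone leaves it in place.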
To remove the text between two ^M characters on a line and also strip the unnecessary control characters, use the command below:
sed "s/\r.*\r//g" org.txt | tr -cd '\11\12\40-\176' > new.txt
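Because tr deletes only the escape byte, the "[A" part of each ^[[A sequence would likely survive the pipeline above. The following is a sketch of one possible extension, not part of the original answer: it adds a sed expression that drops whole ESC [ ... letter sequences before tr runs. The function name strip_hidden and the shortened sample input are assumptions for illustration. The \r and escape bytes are generated with printf so the script does not rely on GNU sed's \r escape.

```shell
# Literal carriage return and escape bytes, produced portably with printf.
cr=$(printf '\r')
esc=$(printf '\033')

# strip_hidden (hypothetical helper): reads stdin, writes cleaned text to stdout.
strip_hidden() {
  # 1) drop text that sits between two carriage returns on a line
  # 2) drop ESC [ <digits/semicolons> <letter> sequences such as ESC [ A
  # 3) drop any remaining non-printable bytes except TAB and newline
  sed -e "s/${cr}.*${cr}//" -e "s/${esc}\[[0-9;]*[A-Za-z]//g" |
    tr -cd '\11\12\40-\176'
}

# Shortened stand-in for the raw data shown in the question.
result=$(printf '\rCopying... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n' | strip_hidden)
printf '%s\n' "$result"
```

To apply it to a file, something like strip_hidden < org.txt > new.txt should work, assuming the file uses the same kinds of sequences as the sample.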