bash: Trying to remove non-printable characters (junk values) from a UNIX file
Original question: http://stackoverflow.com/questions/34412754/
Note: this page is a translation of a popular StackOverFlow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Pranav
I am trying to remove non-printable characters (e.g. ^@) from the records in my file. Because the file holds a very large number of records, looping over the output of cat is not an option: the loop takes too much time.
I tried using
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\{}|;'\'':",.\/<>?]//g' FILENAME
but the ^@ characters are still not removed.
I also tried using
awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print }' FILENAME > NEWFILE
but that did not help either.
Can anybody suggest an alternative way to remove non-printable characters?
I also tried tr -cd, but it removes accented characters, and those are required in the file.
Answered by Tom Fenech
Perhaps you could go with the complement of [:print:], which contains all printable characters:
tr -cd '[:print:]' < file > newfile
If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):
sed 's/[^[:print:]]//g' file
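One caveat with the tr command above: newline is a control character and is not part of [:print:], so tr -cd '[:print:]' also joins all records onto a single line. Below is a minimal sketch that keeps line breaks by adding \n to the set. The function name strip_nonprint is made up for illustration, and LC_ALL=C is an assumption chosen to make the byte-wise behaviour explicit; under it, non-ASCII bytes such as UTF-8 accented letters are deleted too, which is the limitation the sed variant is meant to avoid.

```shell
#!/bin/sh
# strip_nonprint: hypothetical helper, not part of the original answer.
# Deletes every byte that is neither printable nor a newline.
# With LC_ALL=C, tr works on single bytes, so any non-ASCII byte
# (e.g. the bytes of a UTF-8 accented letter) is removed as well.
strip_nonprint() {
    LC_ALL=C tr -cd '[:print:]\n'
}

# Two records: one with an embedded NUL (often displayed as ^@),
# one with a BEL character (octal 007).
printf 'rec\0ord1\nrec\007ord2\n' | strip_nonprint
```

Because the newline is kept, the two sample records stay on separate lines after filtering.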
Answered by Pranav
Remove all control characters first:
tr -dc '\007-\011\012-\015\040-\376' < file > newfile
Then try your string:
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\{}|;'\'':",.\/<>?]//g' newfile
I believe that what you see as ^@ is in fact a zero value (\0). The tr filter above will remove those as well.
Answered by derek
strings -1 file... > outputfile
seems to work
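To check the point made above that ^@ is a NUL byte, and to remove only that byte while leaving accented characters alone, something like the following may help. It is a sketch, not taken from any of the answers: count_nul, strip_nul and sample are illustrative names, and it assumes tr is run byte-wise (forced here with LC_ALL=C) so that multi-byte UTF-8 characters pass through untouched.

```shell
#!/bin/sh
# count_nul: hypothetical helper; prints the number of NUL bytes on stdin.
# tr -cd '\000' deletes everything except NUL, wc -c counts what is left,
# and the final tr strips the padding some wc implementations add.
count_nul() {
    LC_ALL=C tr -cd '\000' | wc -c | tr -d ' '
}

# strip_nul: hypothetical helper; deletes NUL bytes and nothing else,
# so bytes >= 0x80 (accented letters in UTF-8 or Latin-1) are kept.
strip_nul() {
    LC_ALL=C tr -d '\000'
}

# sample: one made-up record, "café au lait" in UTF-8 (\303\251 is é),
# with two stray NUL bytes in it.
sample() {
    printf 'caf\303\251\0 au\0 lait\n'
}

sample | count_nul              # NUL bytes before cleaning: 2
sample | strip_nul | count_nul  # NUL bytes after cleaning: 0
sample | strip_nul | od -c      # inspect the bytes that remain
```

On a real file the same filters would be applied with redirection, for example strip_nul < file > newfile, and od -c can be used to see what the remaining junk bytes actually are before deciding which ones to delete.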