
Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/18953667/

"Illegal Byte sequence" error while using shell commands in mac bash terminal

Tags: bash, shell, unix, sed, uniq

Asked by Abhineet Prasad

I am getting an "illegal byte sequence" error while trying to extract non-English characters from a large file in the macOS bash shell. This is the script that I am trying to use:

sed 's/[][a-z,0-9,A-Z,!@#$%^&*(){}":/_-|. -][\;''=?]*//g' <  >Abhineet_extract1.txt;
sed 's/\(.\)/\
/g' <Abhineet_extract1.txt | sort | uniq |tr -d '\n' >&1;
rm Abhineet_extract1.txt;

And here is the error that I am getting:

uniq: stdin: Illegal byte sequence

'+?


Answered by devnull

It seems that a UTF-8 locale is causing the "Illegal byte sequence" error: the input contains bytes that are not valid UTF-8.

Instead, run the command with the C locale for character handling:

LC_CTYPE=C your_command

man locale says:

   These environment variables affect each locale categories for all
   locale-aware programs:

   LC_CTYPE

           Character classification and case conversion.
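
As a rough illustration of the suggestion above, the following sketch feeds uniq a small file containing a byte that is never valid in UTF-8 and sets LC_CTYPE=C for that one invocation only. The sample file, its contents, and the variable names are hypothetical and not taken from the question; whether the error appears without the override depends on the platform's uniq and the active locale.

```shell
# Hypothetical input: two identical lines plus a line holding the byte 0xFF,
# which can never occur in valid UTF-8.
sample=$(mktemp)
printf 'abc\nabc\n\377\n' > "$sample"

# The assignment prefix sets LC_CTYPE only for this one uniq invocation,
# so uniq treats every byte as a character instead of decoding UTF-8.
unique_lines=$(LC_CTYPE=C uniq "$sample" | wc -l | tr -d ' ')
echo "unique lines: $unique_lines"

rm -f "$sample"
```

Note that if LC_ALL is set in the environment, it takes precedence over LC_CTYPE, so the override above would have no effect; in that case LC_ALL=C your_command may be needed instead.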