How to search for non-ASCII characters with bash tools?

Question

Asked by Jonas Stein

I have a large text file that contains a few Unicode characters that make LaTeX crash. How can I find the non-ASCII characters in a file using sed or similar tools in bash on Linux?

Answer 1

Answered by pixelbeat

Try:

nonascii() { LANG=C grep --color=always '[^ -~]\+'; }

Which can be used like:

printf 'ŨTF8\n' | nonascii

Inside [], ^ means "not", so [^ -~] matches any character that is not between space and ~. Apart from control characters, which it also matches, this finds non-ASCII characters; it is a more portable though slightly less accurate version of [^\x00-\x7f] below. The \+ means "one or more" and makes grep color a whole multibyte character at once, rather than inserting color codes between its individual bytes and thus corrupting the multibyte sequence.
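Because the pattern above also matches control characters, a tab-indented line would be reported even if it is pure ASCII. A minimal sketch of one way around that, assuming tabs should be allowed: nonascii_lines is a hypothetical helper name (not from the answer), and LC_ALL=C is used instead of LANG=C on the assumption that LC_ALL may already be set in the environment, in which case it overrides LANG.

```shell
# nonascii_lines is a hypothetical helper, not part of the original answer.
# It prints "LINE:TEXT" for every line that holds a byte outside printable
# ASCII, treating a literal tab as allowed so tab-indented lines are skipped.
nonascii_lines() {
    tab=$(printf '\t')    # literal tab character for the bracket expression
    LC_ALL=C grep -n "[^ -~${tab}]" "$@"
}

# Usage: line 2 contains the UTF-8 bytes for e-acute; line 3 only has a tab.
printf 'plain\ncaf\303\251\n\ttabbed\n' | nonascii_lines
```

With the sample input only the second line should be reported, prefixed with its line number; a file name can be passed as an argument instead of piping.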

Answer 2

Answered by kev

Try this command:

grep -P '[^\x00-\x7f]' file
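When the goal is to clean up a file for LaTeX, it can help to see which offending characters occur rather than every matching line. A rough sketch under stated assumptions: list_nonascii is a hypothetical helper name, grep is assumed to be GNU grep built with -P (PCRE) support, and LC_ALL=C is an added choice so that matching is done byte by byte regardless of the current locale.

```shell
# list_nonascii is a hypothetical helper, not part of the original answer.
# Assumes GNU grep with -P support. In the C locale the pattern matches
# runs of bytes >= 0x80; -o prints each run on its own line, and sort -u
# keeps one copy of each distinct run.
list_nonascii() {
    LC_ALL=C grep -o -P '[^\x00-\x7f]+' "$@" | LC_ALL=C sort -u
}

# Usage: e-acute appears twice in the input, i-diaeresis once.
printf 'caf\303\251\nna\303\257ve\ncaf\303\251\n' | list_nonascii
```

Each distinct run should be listed once. Adjacent non-ASCII characters are reported together as one run, since the + keeps the bytes of a multibyte sequence from being split.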
