bash: How to replace Unicode characters with ASCII

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must share it under the same CC BY-SA license, include the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27052194/

Time: 2020-09-18 11:50:25  Source: igfitidea

How to replace Unicode characters with ASCII

Tags: bash, shell, unix, unicode, sed

Question by Sandeep Johal

I have the following command to replace Unicode characters with ASCII ones.


sed -i 's/Ã/A/g'

The problem is that Ã isn't recognized by the sed command in my Unix environment, so I assume I have to replace it with its hexadecimal value. What would the syntax look like if I were to use C3 instead?


I'm using this command as a template for other characters I'd like to replace with blank spaces, such as:


sed -i 's/?/ /g'

sed -i 's/?/ /g'

Answer by ajaaskel

It is possible to use hex values in "sed".


echo "Ã" | hexdump -C
00000000  c3 83 0a                                          |...|
00000003

OK, that character is the two-byte combination "c3 83". Let's replace it with the single byte "A":


echo "Ã" | sed 's/\xc3\x83/A/g'
A

Explanation: \x tells sed that a hexadecimal byte value follows (\xHH is a GNU sed extension; other sed implementations may not support it).

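Working out the bytes with hexdump is fine for one character, but the question uses the command as a template for several. The following is a minimal sketch that builds the `\xHH` escapes automatically. It assumes bash, GNU sed (for `\x`) and coreutils `od`; the function names `hex_escape` and `replace_char` are hypothetical helpers, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the UTF-8 bytes of "$1" as sed-style \xHH escapes.
# Example: "Ã" (bytes c3 83) becomes \xc3\x83.
hex_escape() {
  # od -An -tx1 prints each byte as two lowercase hex digits separated by spaces;
  # tr strips the spaces and newlines, then sed prefixes every digit pair with \x.
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n' | sed 's/../\\x&/g'
}

# Hypothetical helper: replace every occurrence of character "$1" with "$2"
# in stdin. Assumes GNU sed (\xHH in the regex) and that "$2" contains
# no '/', '&' or '\', which sed would treat specially in the replacement.
replace_char() {
  local pattern
  pattern=$(hex_escape "$1")
  sed "s/${pattern}/$2/g"
}
```

For example, `printf 'Ã\n' | replace_char 'Ã' A` should print `A`, and `sed -i "s/$(hex_escape 'Ã')/A/g" file.txt` (with `file.txt` as a placeholder name) edits a file in place.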

Answer by midori

You can use iconv:


iconv -f utf-8 -t ascii//translit
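iconv reads the text from stdin (or from file arguments) and writes to stdout. With glibc, the `//TRANSLIT` results also depend on the current locale, and in the C locale accented letters may come out as `?`. A minimal sketch, assuming glibc's iconv and an installed UTF-8 locale; `TRANSLIT_LOCALE` and `to_ascii` are hypothetical names, and `en_US.UTF-8` is only an assumed default:

```shell
#!/usr/bin/env bash
# Transliterate UTF-8 text on stdin to plain ASCII with iconv.
# Assumptions: glibc iconv, whose //TRANSLIT tables come from LC_CTYPE,
# and a UTF-8 locale named by the hypothetical TRANSLIT_LOCALE variable
# (defaulting to en_US.UTF-8, which must be installed for good results).
# -c drops anything that still cannot be represented instead of stopping.
to_ascii() {
  LC_ALL="${TRANSLIT_LOCALE:-en_US.UTF-8}" iconv -c -f UTF-8 -t ASCII//TRANSLIT
}
```

Usage: `to_ascii < input.txt > output.txt` (placeholder file names). Since iconv writes to stdout, rewriting a file in place means going through a temporary file.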

Answer by midori

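Try setting LANG=C so that sed treats the input as raw bytes, and then delete every byte in the range 0x80 to 0xFF, which covers all bytes of non-ASCII UTF-8 characters: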
echo "hi ? there ?" | LANG=C sed "s/[\x80-\xFF]//g"

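A sketch of the same byte-range deletion with two adjustments, both assumptions about the environment: `LC_ALL=C` is used because `LC_ALL` overrides `LANG` when it is set, and the `\x80`-style escapes inside the bracket expression require GNU sed. The `tr` variant is a swapped-in alternative that also works on raw bytes. Both function names are hypothetical. Note that this deletes non-ASCII characters outright instead of transliterating them:

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every byte in the range 0x80-0xFF from stdin.
# In UTF-8 every byte of a non-ASCII character falls in that range,
# so whole characters such as "é" (c3 a9) disappear and plain ASCII stays.
# LC_ALL=C (which overrides LANG) makes GNU sed match single bytes.
strip_non_ascii() {
  LC_ALL=C sed 's/[\x80-\xFF]//g'
}

# Same idea without sed: tr works on bytes; \200-\377 is 0x80-0xFF in octal.
strip_non_ascii_tr() {
  LC_ALL=C tr -d '\200-\377'
}
```

For example, piping `hi é there` through either function should leave `hi  there` (two spaces, because only the accented character is removed).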

Answer by julp

There is also uconv, from ICU.


Examples:


  • uconv -x "::NFD; [:Nonspacing Mark:] > ; ::NFC;": to remove accents
  • uconv -x "::Latin; ::Latin-ASCII;": for Latin-to-ASCII transliteration
  • uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;": for Latin-to-ASCII transliteration plus removal of any remaining code points above 0x7F
  • ...
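The sketch below wraps the last transform in reusable functions. It assumes uconv from ICU is installed (on Debian and Ubuntu it is typically packaged in `icu-devtools`); the function names are hypothetical. The `-f utf-8 -t utf-8` options are added because uconv otherwise uses the platform's default encoding, which may not be UTF-8 in a C/POSIX locale:

```shell
#!/usr/bin/env bash
# Hypothetical helper: transliterate Latin text to ASCII with ICU's uconv,
# then drop any code point still above 0x7F (same rule chain as above).
# Input and output encodings are forced to UTF-8 instead of the default.
ascii_fold() {
  uconv -f utf-8 -t utf-8 -x '::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;'
}

# Hypothetical helper: apply ascii_fold to a file in place.
# uconv writes to stdout, so the result goes through a temporary file.
ascii_fold_file() {
  local tmp
  tmp=$(mktemp) || return 1
  if ascii_fold < "$1" > "$tmp"; then
    cat "$tmp" > "$1"   # cat keeps the original file's permissions
  fi
  rm -f "$tmp"
}
```

Writing back with `cat` rather than `mv` is a deliberate choice: the original file keeps its inode and permissions, similar to what `sed -i` users expect.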

echo "à l'école ?" | uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;" gives: a l'ecole
