bash: How to replace Unicode characters with ASCII

Question

Asked by Sandeep Johal

I have the following command to replace Unicode characters with ASCII ones.

sed -i 's/Ã/A/g'

The problem is that Ã isn't recognized by the sed command in my Unix environment, so I assume I need to replace it with its hexadecimal value. What would the syntax look like if I were to use C3 instead?

I'm using this command as a template for other characters I'd like to replace with blank spaces, such as:

sed -i 's/?/ /g'

Answer 1

Answered by ajaaskel

It is possible to use hex values in sed (the \xHH escape is a GNU sed extension). First, look at the bytes that make up the character:

echo "Ã" | hexdump -C
00000000  c3 83 0a                                          |...|
00000003

OK, that character is the two-byte sequence "c3 83" (its UTF-8 encoding). Let's replace it with the single byte "A":

echo "Ã" | sed 's/\xc3\x83/A/g'
A

Explanation: \x tells sed that a two-digit hex code follows.
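
The same pattern extends to the other characters the question wants to turn into spaces. The following is a minimal sketch, assuming GNU sed (the \x escape is not part of POSIX sed) and UTF-8 input. The sample string and the choice of "é" (bytes c3 a9) as a second character are illustrative assumptions, not taken from the original post.

```shell
# Assumes GNU sed; \xHH escapes are a GNU extension.
# printf octal escapes build the UTF-8 bytes without depending on the
# terminal encoding:
#   \303\203 = c3 83 = "Ã"
#   \303\251 = c3 a9 = "é" (illustrative second character)
sample=$(printf 'x\303\251y \303\203BC')

# Inspect the bytes first so the hex codes used below can be confirmed.
printf '%s' "$sample" | od -An -tx1

# One -e expression per character; LC_ALL=C makes sed match raw bytes.
result=$(printf '%s\n' "$sample" | LC_ALL=C sed -e 's/\xc3\x83/A/g' -e 's/\xc3\xa9/ /g')
printf '%s\n' "$result"
```

To apply the same expressions to a file in place, add -i and a file name, as in the command from the question.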

Answer 2

Answered by midori

You can use iconv:

iconv -f utf-8 -t ascii//translit
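
As written, the command reads standard input and writes standard output. Below is a small sketch of running it on a file. The temporary file names and the sample text are assumptions for illustration. The exact replacement characters depend on the iconv implementation and the current locale (some setups emit "?" where no approximation is known), so no expected output is shown.

```shell
# iconv has no in-place option, so write to a second file.
# Both files are throwaway temp files created for this example.
in_file=$(mktemp)
out_file=$(mktemp)
printf 'na\303\257ve caf\303\251\n' > "$in_file"   # "naïve café" in UTF-8

# -f: source encoding, -t: target encoding.
# //TRANSLIT asks iconv to approximate characters that ASCII cannot
# represent instead of stopping at the first one.
iconv -f UTF-8 -t ASCII//TRANSLIT "$in_file" > "$out_file"

cat "$out_file"
```

If the result looks acceptable, the output file can be moved over the original with mv.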

Answer 3

Answered by midori

Try setting LANG=C and then run it over the non-ASCII byte range (0x80-0xFF):
echo "hi ? there ?" | LANG=C sed "s/[\x80-\xFF]//g"
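
One caveat: if LC_ALL is set in the environment it overrides LANG, so setting LC_ALL=C is the more robust way to force byte-wise matching. The sketch below does that, and also shows tr as an alternative that does not rely on GNU sed's \x escapes. The function names are hypothetical, and the sample uses "Ã" (c3 83) plus a three-byte character, both built with printf octal escapes.

```shell
# Delete every byte in 0x80-0xFF, i.e. every byte of any multi-byte
# UTF-8 character. This removes characters; it does not transliterate.
strip_non_ascii_sed() {
    LC_ALL=C sed 's/[\x80-\xFF]//g'    # GNU sed only (\x escapes)
}

strip_non_ascii_tr() {
    LC_ALL=C tr -d '\200-\377'         # same byte range, written in octal
}

# "hi Ã there €": \303\203 is "Ã", \342\202\254 is the euro sign.
sample=$(printf 'hi \303\203 there \342\202\254')

via_sed=$(printf '%s\n' "$sample" | strip_non_ascii_sed)
via_tr=$(printf '%s\n' "$sample" | strip_non_ascii_tr)

# Brackets make the leftover spaces visible.
printf '[%s]\n[%s]\n' "$via_sed" "$via_tr"
```

Because the characters are deleted rather than replaced, the spaces that surrounded them are left behind.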

Answer 4

Answered by julp

There is also uconv, from ICU.

Examples:

uconv -x "::NFD; [:Nonspacing Mark:] > ; ::NFC;": removes accents
uconv -x "::Latin; ::Latin-ASCII;": transliterates to Latin script, then to ASCII
uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;": transliterates as above and removes any remaining code points > 0x7F
...

echo "à l'école ?" | uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;" gives: a l'ecole
