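bash: How to replace Unicode characters with ASCII
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/27052194/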
Question by Sandeep Johal
I have the following command to replace Unicode characters with ASCII ones.
sed -i 's/Ã/A/g'
The problem is that Ã isn't recognized by the sed command in my Unix environment, so I assume I'd have to replace it with its hexadecimal value. What would the syntax look like if I were to use C3 instead?
I'm using this command as a template for other characters I'd like to replace with blank spaces, such as:
sed -i 's/?/ /g'
Answer by ajaaskel
It is possible to use hex values in "sed".
echo "Ã" | hexdump -C
00000000 c3 83 0a |...|
00000003
OK, that character is the two-byte combination "c3 83" (the trailing 0a is the newline added by echo). Let's replace it with the single byte "A":
echo "Ã" | sed 's/\xc3\x83/A/g'
A
Explanation: \x tells sed that a hex byte value follows (\xHH is a GNU sed extension), so \xc3\x83 matches the two bytes of the character.
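Following the hexdump approach above, the \xHH pattern can be built automatically instead of being copied by hand, which also covers the question's other template characters. This is a minimal sketch, assuming GNU sed (for \xHH) and UTF-8 input; `hex_pattern` and `replace_char` are hypothetical helper names, not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch: derive a GNU sed \xHH pattern from a character's bytes, so the
# character itself never has to appear in the sed script.
# Assumes GNU sed (\xHH is a GNU extension) and UTF-8 encoded input.
# Intended for non-ASCII characters; ASCII regex metacharacters are not escaped.

# hex_pattern (hypothetical helper): print the bytes of $1 as \xHH escapes,
# e.g. the UTF-8 bytes c3 83 become \xc3\x83.
hex_pattern() {
  printf '%s' "$1" | od -An -v -tx1 | tr -d ' \n' | sed 's/../\\x&/g'
}

# replace_char (hypothetical helper): replace every occurrence of the
# character $1 with the string $2 on stdin. LC_ALL=C makes sed match bytes.
# $2 must not contain '/', '&' or '\', which are special in a sed replacement.
replace_char() {
  local pattern
  pattern=$(hex_pattern "$1")
  LC_ALL=C sed "s/$pattern/$2/g"
}

# Usage: same effect as the answer's sed 's/\xc3\x83/A/g'
echo 'Ã la carte' | replace_char 'Ã' 'A'
```

For the question's "replace with a blank" case, the same helper would be called with a single space as the second argument.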
Answer by midori
You can use iconv:
iconv -f utf-8 -t ascii//translit
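Unlike sed -i, iconv does not edit files in place, and what //TRANSLIT produces depends on the locale. This is a minimal sketch under two assumptions: glibc's iconv, and an installed UTF-8 locale named en_US.UTF-8 (under the C locale glibc typically outputs "?" for accented letters instead of the base letter). The function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: use iconv's transliteration like the question's `sed -i`.
# Assumptions: glibc iconv and an installed en_US.UTF-8 locale; //TRANSLIT
# follows LC_CTYPE, and under the C locale accented letters typically become '?'.

# to_ascii (hypothetical helper): UTF-8 on stdin -> ASCII on stdout.
to_ascii() {
  LC_ALL=en_US.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT
}

# to_ascii_in_place (hypothetical helper): iconv has no in-place option,
# so convert into a temporary file, then copy it back over the original
# (copying rather than mv keeps the original file's permissions).
to_ascii_in_place() {
  local file=$1 tmp
  tmp=$(mktemp) || return 1
  if to_ascii < "$file" > "$tmp"; then
    cat -- "$tmp" > "$file" && rm -f -- "$tmp"
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```

With the locale available, piping "naïve café" through to_ascii should yield "naive cafe"; if the conversion fails, to_ascii_in_place leaves the original file untouched.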
Answer by midori
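Try setting LANG=C and then run sed over the non-ASCII byte range \x80-\xFF. In the C locale each byte of a multibyte UTF-8 character is matched separately, so whole characters are removed:
echo "hi ? there ?" | LANG=C sed "s/[\x80-\xFF]//g"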
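The same idea can be packaged as a reusable filter. This sketch uses LC_ALL=C rather than LANG=C, because LC_ALL takes precedence over LANG if it is already set in the environment. It assumes GNU sed for \xHH inside the bracket expression; the tr variant is a swapped-in POSIX alternative, and both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: delete every non-ASCII byte (0x80-0xFF) from stdin.
# LC_ALL=C makes sed treat the input as single bytes; LC_ALL is used instead
# of LANG=C because LC_ALL overrides LANG when both are set.
# Assumes GNU sed, which understands \xHH inside the bracket expression.
strip_non_ascii() {
  LC_ALL=C sed 's/[\x80-\xFF]//g'
}

# Alternative without GNU extensions: tr with octal byte ranges (POSIX).
strip_non_ascii_tr() {
  LC_ALL=C tr -d '\200-\377'
}

echo 'hi é there' | strip_non_ascii
```

Note that this deletes characters rather than transliterating them, so "é" disappears entirely instead of becoming "e".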
Answer by julp
There is also uconv, from ICU.
Examples:
- uconv -x "::NFD; [:Nonspacing Mark:] > ; ::NFC;" : to remove accents
- uconv -x "::Latin; ::Latin-ASCII;" : for a Latin-to-ASCII transliteration
- uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;" : for a Latin-to-ASCII transliteration plus removal of any remaining code points above 0x7F
- ...
echo "À l'école ?" | uconv -x "::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;"
gives: A l'ecole
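The uconv recipes above can be wrapped as filters. This is a minimal sketch, assuming ICU's uconv binary is installed (on Debian/Ubuntu it usually comes from the icu-devtools package) and that the text is UTF-8. Because of that, -f/-t are passed explicitly instead of relying on the locale's default codepage. The function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: wrap the uconv transliterations from the answer as filters.
# Assumes ICU's uconv is on PATH and input/output are UTF-8.

# strip_accents (hypothetical helper): decompose (NFD), drop nonspacing
# marks such as combining accents, then recompose (NFC).
strip_accents() {
  uconv -f UTF-8 -t UTF-8 -x '::NFD; [:Nonspacing Mark:] > ; ::NFC;'
}

# latin_to_ascii (hypothetical helper): transliterate to Latin, then to
# ASCII, and delete whatever code points above U+007F are left.
latin_to_ascii() {
  uconv -f UTF-8 -t UTF-8 -x '::Latin; ::Latin-ASCII; ([^\x00-\x7F]) > ;'
}

echo "À l'école" | latin_to_ascii
```

strip_accents only removes combining marks, so non-Latin scripts and symbols pass through unchanged. latin_to_ascii guarantees ASCII-only output at the cost of silently dropping anything it cannot transliterate.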