Remove Unicode characters from text files - sed, other bash/shell methods
Original URL: http://stackoverflow.com/questions/8562354/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by alvas
How do I remove unicode characters from a bunch of text files on the terminal? I've tried this but it didn't work:
sed 'g/\u'U+200E'//' -i *.txt
I need to remove these Unicode characters from the text files:
U+0091 - a weird sort of "control" space (a C1 control character)
U+0092 - the same sort of weird "control" space (also a C1 control character)
U+00A0 - no-break space
U+200E - left-to-right mark
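To see which of the text files actually contain any of the characters listed above before changing anything, a sketch like the following may help. It assumes bash and a grep that supports -F, -l and repeated -e options (GNU and BSD grep both do); the function name files_with_listed_chars is made up for illustration.

```shell
# Hypothetical helper: list the files that contain U+0091, U+0092, U+00A0 or U+200E.
# Each printf octal escape sequence is the UTF-8 encoding of one of those characters.
# LC_ALL=C makes grep compare raw bytes, so the result does not depend on the locale.
files_with_listed_chars() {
    LC_ALL=C grep -lF \
        -e "$(printf '\302\221')" \
        -e "$(printf '\302\222')" \
        -e "$(printf '\302\240')" \
        -e "$(printf '\342\200\216')" \
        -- "$@"
}
```

For example, `files_with_listed_chars *.txt` prints one matching file name per line.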
Answer by kev
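Clear all non-ASCII characters from file.txt; both commands print the result to standard output. Note that strings is a rougher tool: it only prints runs of at least four printable characters, one run per line.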
$ iconv -c -f utf-8 -t ascii file.txt
$ strings file.txt
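These commands only write to standard output, while the question is about editing a bunch of files. A minimal in-place sketch follows; it swaps in tr instead of iconv and deletes every byte in the octal range 200-377, which for valid UTF-8 input removes exactly the non-ASCII characters. The function name strip_non_ascii_in_place and the "$f.tmp" temporary file are hypothetical choices.

```shell
# Hypothetical helper: delete every non-ASCII character from each file, in place.
# Assumes valid UTF-8 input: every byte of a non-ASCII character is >= octal 200,
# so deleting bytes 200-377 (with LC_ALL=C, so tr works on bytes) removes them.
strip_non_ascii_in_place() {
    local f
    for f in "$@"; do
        # Replace the original only if tr succeeded.
        LC_ALL=C tr -d '\200-\377' < "$f" > "$f.tmp" && mv -f "$f.tmp" "$f"
    done
}
```

Usage: `strip_non_ascii_in_place *.txt`.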
Answer by Michał Šrajer
If you want to remove only particular characters and you have Python, you can:
# Python 3 version; run sed in a UTF-8 locale so each character in [...] is matched whole
CHARS=$(python3 -c 'print("\u0091\u0092\u00a0\u200e")')
sed 's/['"$CHARS"']//g' < /tmp/utf8_input.txt > /tmp/ascii_output.txt
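If Python is not available, the same CHARS string can be produced with printf and octal escapes of each character's UTF-8 bytes, as in the sketch below. It assumes a UTF-8 locale so that sed's bracket expression treats each multibyte sequence as a single character; remove_listed_chars is a hypothetical name.

```shell
# Build CHARS without Python: these octal escapes are the UTF-8 bytes of
# U+0091, U+0092, U+00A0 and U+200E.
CHARS=$(printf '\302\221\302\222\302\240\342\200\216')

# Hypothetical helper: remove those four characters from standard input.
# Assumes a UTF-8 locale; in the C locale [...] would match single bytes
# and could damage other non-ASCII characters.
remove_listed_chars() {
    sed 's/['"$CHARS"']//g'
}
```

Usage: `remove_listed_chars < /tmp/utf8_input.txt > /tmp/ascii_output.txt`.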
Answer by choroba
For UTF-8-encoded text, you can use this regular expression with GNU sed (which supports the \xHH escapes and \| alternation used here):
sed 's/\xc2\x91\|\xc2\x92\|\xc2\xa0\|\xe2\x80\x8e//g'
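The expression above reads standard input. To apply it to many files in place, as the question's `sed ... -i *.txt` attempt intended, a sketch assuming GNU sed could look like this. Running it with LC_ALL=C makes sed compare the escaped bytes literally whatever the user's locale; remove_listed_chars_in_place is a hypothetical name.

```shell
# Hypothetical helper: remove U+0091, U+0092, U+00A0 and U+200E from each file, in place.
# Assumes GNU sed: -i without a suffix, \xHH escapes and \| are GNU extensions.
# In valid UTF-8 these byte sequences can only be those four characters.
remove_listed_chars_in_place() {
    LC_ALL=C sed -i 's/\xc2\x91\|\xc2\x92\|\xc2\xa0\|\xe2\x80\x8e//g' -- "$@"
}
```

Usage: `remove_listed_chars_in_place *.txt`. Other non-ASCII characters, such as "é", are left untouched.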
Answer by Michał Šrajer
Use iconv:
iconv -f utf8 -t ascii//TRANSLIT < /tmp/utf8_input.txt > /tmp/ascii_output.txt
This transliterates characters like "Š" into "S" (the most similar-looking ASCII characters); the exact replacements depend on the iconv implementation and the current locale.
Answer by ma11hew28
Convert Swift files from UTF-8 to ASCII in place:
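for file in *.swift; do
    # Overwrite the original only if iconv succeeded; without -c, iconv normally
    # stops with an error at the first character that has no ASCII equivalent.
    iconv -f utf-8 -t ascii "$file" > "$file".tmp && mv -f "$file".tmp "$file"
done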
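With GNU libc's iconv, a strict conversion like this (no -c or //TRANSLIT) fails on any file that still contains non-ASCII characters. A dry-run sketch that lists those files first is shown below. It deletes every ASCII byte (octal 000-177) with tr and reports the file if anything is left. The function name non_ascii_files is hypothetical.

```shell
# Hypothetical helper: print the name of each file that contains non-ASCII bytes.
# LC_ALL=C makes tr work on raw bytes; deleting all ASCII bytes (000-177)
# leaves output only if the file has at least one byte >= octal 200.
non_ascii_files() {
    local f
    for f in "$@"; do
        if [ -n "$(LC_ALL=C tr -d '\000-\177' < "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Usage: `non_ascii_files *.swift`.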