Remove Unicode characters from text files - sed, other bash/shell methods

Question

Asked by alvas

How do I remove unicode characters from a bunch of text files on the terminal? I've tried this but it didn't work:

sed 'g/\u'U+200E'//' -i *.txt

I need to remove these Unicode characters from the text files:

U+0091 - a sort of weird "control" space
U+0092 - the same sort of weird "control" space
U+00A0 - non-breaking space
U+200E - left-to-right mark

Answer 1

Answered by kev

Remove all non-ASCII characters from file.txt:

$ iconv -c -f utf-8 -t ascii file.txt
$ strings file.txt
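
Before deleting anything, it may help to see which lines would be affected. The following is a minimal sketch, assuming GNU or BSD grep; `list_non_ascii_lines` is a hypothetical helper name and is not part of the answer above.

```shell
# Print, with line numbers, every line that contains a byte outside
# printable ASCII and whitespace. LC_ALL=C makes grep work byte by byte,
# and -a keeps it from treating the input as a binary file.
# Note: non-printable ASCII control characters are reported too.
list_non_ascii_lines() {
    LC_ALL=C grep -a -n '[^[:print:][:space:]]' -- "$@"
}
```

Called as `list_non_ascii_lines file.txt`, it exits with a non-zero status when no such line is found, which is how grep signals "no match".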

Answer 2

Answered by Michał Šrajer

If you want to remove ONLY particular characters and you have Python, you can:

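# Build the UTF-8 bytes of the characters to delete (Python 3 syntax).
CHARS=$(python3 -c 'import sys; sys.stdout.buffer.write("\u0091\u0092\u00a0\u200e".encode("utf-8"))')
# Delete every occurrence of those characters; sed must run in a UTF-8 locale.
sed 's/['"$CHARS"']//g' < /tmp/utf8_input.txt > /tmp/ascii_output.txt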
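
To check that a variable such as CHARS really holds the intended UTF-8 bytes, it can help to dump it as hex. This is a small sketch using the POSIX `od` utility; `utf8_hex` is a hypothetical helper name.

```shell
# Print the bytes of the first argument as one lowercase hex string.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}
```

For the four characters in the question, `utf8_hex "$CHARS"` should show the byte sequences c2 91, c2 92, c2 a0 and e2 80 8e run together.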

Answer 3

Answered by choroba

For the UTF-8 encoding of Unicode, you can use this regular expression with sed:

sed 's/\xc2\x91\|\xc2\x92\|\xc2\xa0\|\xe2\x80\x8e//g'
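
The question asks for in-place editing of many files, so here is a sketch of how the expression above might be applied that way. It assumes GNU sed (for `-i` without a suffix, `\xHH` escapes and `\|`); `strip_marks_in_place` is a hypothetical name. Setting LC_ALL=C makes sed match raw bytes regardless of the current locale.

```shell
# Delete the UTF-8 byte sequences for U+0091, U+0092, U+00A0 and U+200E
# from every file given as an argument, editing the files in place.
# Assumes GNU sed; other characters, including other non-ASCII ones, are kept.
strip_marks_in_place() {
    LC_ALL=C sed -i -e 's/\xc2\x91\|\xc2\x92\|\xc2\xa0\|\xe2\x80\x8e//g' -- "$@"
}

# Example: strip_marks_in_place *.txt
```

Since the files are overwritten, trying it on copies first is a reasonable precaution.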

Answer 4

Answered by Michał Šrajer

Use iconv:

iconv -f utf8 -t ascii//TRANSLIT < /tmp/utf8_input.txt > /tmp/ascii_output.txt

This will translate characters like "Š" into "S" (the most similar-looking ASCII characters).

Answer 5

Answered by ma11hew28

Convert Swift files from UTF-8 to ASCII:

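for file in *.swift; do
    # Replace the original only if iconv converted the whole file;
    # otherwise discard the partial output and leave the file untouched.
    if iconv -f utf-8 -t ascii "$file" > "$file.tmp"; then
        mv -f "$file.tmp" "$file"
    else
        rm -f "$file.tmp"
    fi
done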

swift auto completion not working in Xcode6-Beta