bash: Convert text to 7-bit ASCII from the command line
Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/212745/
Convert text to 7-bit ASCII from command-line
Asked by Alexander Gladysh
I'm on OS X 10.5.5 (though I guess it does not matter much).
I have a set of text files with fancy characters like double backquotes, ellipses ("...") in one character, etc.
I need to convert these files to good old plain 7-bit ASCII, preferably without losing character meaning (that is, convert those ellipses to three periods, backquotes to the usual "s, etc.).
Please suggest a smart command-line (bash) tool or script to do that.
Accepted answer by Josh Lee
The ELinks web browser will convert Unicode entities to their ASCII equivalents, giving things like "--" for "—" and "..." for "…", etc. There is a Python module, python-elinks, which uses the same conversion table, and it would be trivial to turn it into a shell filter, like this:
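#!/usr/bin/env python
# Python 2 filter: reads UTF-8 text on stdin, writes 7-bit ASCII on stdout.
import elinks  # python-elinks; provides the 'elinks' encoding error handler used below
import sys

for line in sys.stdin:
    line = line.decode('utf-8')
    # Characters with no ASCII code point are handed to the 'elinks' handler.
    sys.stdout.write(line.encode('ASCII', 'elinks'))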
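For readers without python-elinks installed, the same idea can be approximated with only the Python 3 standard library. The sketch below registers a custom codec error handler; the handler name `asciify`, the helper `to_ascii`, and the small `REPLACEMENTS` table are hypothetical and are not the ELinks conversion table, which covers far more characters.

```python
import codecs

# Hypothetical, deliberately small table (an assumption, not the ELinks table).
REPLACEMENTS = {
    "\u2026": "...",  # horizontal ellipsis
    "\u2014": "--",   # em dash
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
}

def asciify(error):
    """Codec error handler: swap unencodable characters for ASCII stand-ins."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    # Characters missing from the table become "?" so the output stays ASCII.
    replacement = "".join(REPLACEMENTS.get(ch, "?") for ch in bad)
    return replacement, error.end

codecs.register_error("asciify", asciify)

def to_ascii(text):
    """Return `text` with every non-ASCII character replaced via `asciify`."""
    return text.encode("ascii", "asciify").decode("ascii")
```

To use it as a shell filter, one could loop over `sys.stdin` and write `to_ascii(line)` to `sys.stdout`, assuming the input is decoded as UTF-8.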
Answer by unwind
iconv should do it, as far as I know. I'm not 100% certain how it handles conversions where one input character should or could become several output characters, as with the ellipsis example. Something to try!
Update: I did try it, and it seems it doesn't work. It fails, possibly because it doesn't know how to express the ellipsis (the test character I used) in a "smaller" encoding. Converting from UTF-8 to UTF-16 went fine. :/ Still, iconv might be worth investigating further.
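The failure described here is what any strict conversion does when a character has no ASCII code point. As a rough illustration in Python (not a description of what iconv does internally), the sketch below shows the strict failure and one possible workaround using Unicode compatibility normalization (NFKD) from the standard library. The function names are made up for this example, and it is an assumption that dropping undecomposable characters is acceptable: the ellipsis decomposes to three periods, but curly quotes have no decomposition and are simply lost.

```python
import unicodedata

def strict_ascii(text):
    """Strict conversion: raises UnicodeEncodeError on any non-ASCII character."""
    return text.encode("ascii")

def nfkd_ascii(text):
    """Decompose compatibility characters, then drop whatever is still non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Because characters such as curly quotes are silently dropped, this approach works best combined with an explicit replacement table for the characters that matter.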
Answer by glennkentwell
I have used iconv to convert a file from UTF-16LE (little-endian, as I found out by trial and error) that was created by TextPad on Windows into ASCII on OS X, like this:
cat utf16file.txt | iconv -f UTF-16LE -t ASCII > asciifile.txt
You can also pipe through hexdump to view the characters and make sure you're getting the right output. The terminal knows how to interpret UTF-16 and displays it properly, so you can't tell just by doing 'cat' on the file:
cat utf16file.txt | iconv -f UTF-16LE -t ASCII | hexdump -C
This shows the layout with the hex char codes and the ASCII characters to the right-hand side, and you can try different encodings in the -f "from" parameter to figure out what you're dealing with.
Use 'iconv -l' to list the character sets iconv can use on your system.
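The trial-and-error step can also be scripted. The following Python sketch tries a few candidate encodings on the raw bytes and reports which ones decode without error, and includes a very rough stand-in for the hexdump view. The names `candidate_encodings` and `hex_view` and the default candidate list are illustrative assumptions. Decoding without error does not prove a guess is right (UTF-16 accepts most even-length byte strings), so looking at the decoded result is still necessary.

```python
def candidate_encodings(data, candidates=("ascii", "utf-8", "utf-16-le", "utf-16-be")):
    """Return the candidate encodings that decode `data` without raising."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok

def hex_view(data, width=16):
    """Rough imitation of a hex dump: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} |{text_part}|")
    return "\n".join(lines)
```

For a file, the bytes would come from something like `open("utf16file.txt", "rb").read()`, using the file name from the commands above.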
Answer by Jonathan Leffler
There was a question yesterday or the day before about file renaming, and I showed a Perl script, rename.pl, that would be usable for the task. The hard part is knowing how the odd characters are encoded and devising the correct sequence of transliterations. I'd probably do it with an adaptation of that script that did all the mappings sequentially. Doing it one character at a time would be unduly fiddly.
Question was: How to rename with prefix/suffix
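As an illustration of doing all the mappings sequentially, here is a minimal Python sketch. It is not the Perl rename.pl script mentioned in this answer, which is not reproduced here. The `MAPPINGS` list and both function names are hypothetical; the list would need to be extended for the characters that actually occur in the files, and `leftover_non_ascii` is there to help discover which ones are still unhandled.

```python
# Hypothetical mapping list, applied in order (an assumption, extend as needed).
MAPPINGS = [
    ("\u2026", "..."),  # horizontal ellipsis
    ("\u201c", '"'),    # left double quotation mark
    ("\u201d", '"'),    # right double quotation mark
    ("\u2018", "'"),    # left single quotation mark
    ("\u2019", "'"),    # right single quotation mark
    ("\u2014", "--"),   # em dash
]

def apply_mappings(text, mappings=MAPPINGS):
    """Apply every (old, new) replacement in sequence over the whole text."""
    for old, new in mappings:
        text = text.replace(old, new)
    return text

def leftover_non_ascii(text):
    """List the distinct non-ASCII characters that no mapping has handled yet."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

Running `leftover_non_ascii(apply_mappings(text))` on a sample file shows which characters still need an entry before the output is pure 7-bit ASCII.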

