bash: How to convert filenames from Unicode to ASCII

Note: This page is adapted from a popular StackOverflow question and is licensed under CC BY-SA 4.0. If you use or share it, you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3011569/

Date: 2020-09-17 22:11:37  Source: igfitidea

How do I convert filenames from unicode to ascii

Tags: bash, unicode, scripting, ascii

Asked by zedwarth

I have a bunch of music files on an NTFS partition mounted on Linux whose filenames contain Unicode characters. I'm having trouble writing a script to rename the files so that all of the filenames use only ASCII characters. I think the iconv command should work, but I'm having trouble escaping the characters for the mv command.

EDIT: It doesn't matter if there isn't a direct transliteration for the Unicode characters. I guess I'll just replace those with a "?" character.

Accepted answer by Thanatos

I don't think iconv has any character replacement facilities. This in Python might help:

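#!/usr/bin/env python3
import sys

def unistrip(s):
    # Command-line arguments are already text in Python 3; decode only raw bytes.
    if isinstance(s, bytes):
        s = s.decode('utf-8')
    chars = []
    for ch in s:
        # Anything outside the 7-bit ASCII range is replaced with '?'.
        if ord(ch) > 0x7f:
            chars.append('?')
        else:
            chars.append(ch)
    return ''.join(chars)

if __name__ == '__main__':
    print(unistrip(sys.argv[1]))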

Then call as:

$ ./unistrip.py "yikes__oh_look_a_file_火"
yikes_?_oh_look_a_file_?

Also:

$ mv "yikes__oh_look_a_file_火" "$(./unistrip.py "yikes__oh_look_a_file_火")"

You might test it a bit first. For large move operations, generating a list of mv commands (i.e., writing code that writes a script) is advisable, as you can look over the move commands before executing them.
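
The "generate a list of mv commands" idea could be sketched roughly as follows. This is a minimal illustration, not part of the original answer: the names `ascii_name` and `mv_commands` are hypothetical, it only looks at the entries directly inside one directory, and it assumes an `mv` that supports `-n` (no-clobber), as GNU and BSD `mv` do.

```python
import os
import shlex

def ascii_name(name):
    # Same rule as unistrip: every character outside 7-bit ASCII becomes '?'.
    return ''.join(ch if ord(ch) <= 0x7f else '?' for ch in name)

def mv_commands(directory):
    """Return one shell-quoted mv command per entry whose name would change."""
    commands = []
    for name in sorted(os.listdir(directory)):
        new_name = ascii_name(name)
        if new_name != name:
            src = shlex.quote(os.path.join(directory, name))
            dst = shlex.quote(os.path.join(directory, new_name))
            # -n keeps mv from overwriting when two names collapse to the same
            # ASCII result; -- stops option parsing for names starting with '-'.
            commands.append('mv -n -- {} {}'.format(src, dst))
    return commands
```

Printing the returned lines into a file (for example with `print('\n'.join(mv_commands('.')))`) gives a script that can be read through before it is run with `sh`.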

Answer by Hefnawi

Sometimes mv will not be able to read the filename in a shell, so you can try referring to the file by its inode instead.

To get the inode of a file:

$ ls -il

Output will be something like this:

13377799 -rw-r--r--  1 draco  draco      11809 Apr 25 01:39 some_filename.ext
9340462  -rw-r--r--  1 draco  draco      81648 Apr 23 02:27 some_strange_filename.ext
9340480  -rw-r--r--  1 draco  draco       4717 Apr 23 03:54 yikes__oh_look_a_file_火

Then use find to get your file, perhaps together with the Python code by Thanatos:

$ find . -inum 9340480 -exec ./unistrip.py {} \;

You could also use the above command with iconv in a shell.
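
If you would rather stay in Python than go through find, the same inode lookup might look something like the sketch below. The function names `find_by_inode` and `rename_by_inode` are hypothetical, and unlike `find . -inum` this version only searches the entries directly inside one directory, not its subdirectories.

```python
import os

def find_by_inode(directory, inode):
    """Return paths of entries directly inside `directory` with this inode number."""
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # st_ino is the inode number; do not follow symlinks so that a
            # link is matched by its own inode rather than its target's.
            if entry.stat(follow_symlinks=False).st_ino == inode:
                matches.append(entry.path)
    return matches

def rename_by_inode(directory, inode, new_name):
    """Rename the single entry with this inode to `new_name` in the same directory."""
    matches = find_by_inode(directory, inode)
    if len(matches) != 1:
        raise LookupError('expected exactly one match, found {}'.format(len(matches)))
    target = os.path.join(directory, new_name)
    if os.path.exists(target):
        # Refuse to overwrite an existing file.
        raise FileExistsError(target)
    os.rename(matches[0], target)
    return target
```

For the listing above, a call such as `rename_by_inode('.', 9340480, 'yikes_oh_look_a_file')` would rename the third file without the awkward name ever being typed in the shell; the new name here is only an example.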

Hope this helps someone out, and excuse me for any mistakes [first answer].

Answer by Florian Diesch

convmv is a good Perl script for converting file name encodings, but it can't handle characters that aren't in the destination encoding.

You can change any character not in ASCII to '?' using the rename utility distributed with Perl:

rename 's/[^ -~]/?/g' *

Unfortunately, this replaces multi-byte characters with multiple '?'s. Depending on the Unicode encoding that is used and the characters involved, changing the regex may help, e.g.

rename 's/[^ -~]{2}/?/g' *

for 2-byte characters.
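
The difference between replacing per byte and per character can be illustrated with a short Python sketch. It assumes the filenames are UTF-8 encoded; the function names are made up for this example. Decoding before matching sidesteps the fixed `{2}` width, which only fits characters that happen to be exactly two bytes long.

```python
import re

def replace_per_byte(name_bytes):
    # Byte-oriented matching: every byte outside printable ASCII becomes '?',
    # so one multi-byte character turns into several '?'s.
    return re.sub(rb'[^ -~]', b'?', name_bytes)

def replace_per_character(name_bytes, encoding='utf-8'):
    # Decode first so the regex sees whole characters, then replace each once.
    text = name_bytes.decode(encoding)
    return re.sub(r'[^ -~]', '?', text)

raw = 'file_火_\u00e9'.encode('utf-8')  # 火 is 3 bytes in UTF-8, é (U+00E9) is 2
print(replace_per_byte(raw))        # b'file_???_??'
print(replace_per_character(raw))   # file_?_?
```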