macos: How do I tell Python that sys.argv is in Unicode?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/5113618/

Time: 2020-10-21 07:44:54  Source: igfitidea

How do I tell Python that sys.argv is in Unicode?

Tags: python, macos, unicode, terminal

Asked by vy32

Here is a little program:

import sys

f = sys.argv[1]
print type(f)
print u"f=%s" % (f)

Here is my running of the program:

$ python x.py 'Recent/????? ???????.LNK'
<type 'str'>
Traceback (most recent call last):
  File "x.py", line 5, in <module>
    print u"f=%s" % (f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 7: ordinal not in range(128)
$ 

The problem is that Python treats sys.argv[1] as an ASCII string, which it can't convert to Unicode. But I'm using a Mac with a fully Unicode-aware Terminal, so x.py is actually getting a Unicode string. How do I tell Python that sys.argv[] is Unicode and not ASCII? Failing that, how do I convert ASCII (that has Unicode inside it) into Unicode? The obvious conversions don't work.

Answered by jfs

The UnicodeDecodeError you see occurs because you're mixing the Unicode string u"f=%s" with the sys.argv[1] bytestring. Make both operands the same type:

  • both bytestrings:

    $ python -c'import sys; print "f=%s" % (sys.argv[1],)' 'Recent/????? ???????'
    

    This passes bytes transparently from/to your terminal. It works for any encoding.

  • both Unicode:

    $ python -c'import sys; print u"f=%s" % (sys.argv[1].decode("utf-8"),)' 'Rec..
    

    Here you should replace 'utf-8' with the encoding your terminal uses. You might use sys.getfilesystemencoding() here if the terminal is not Unicode-aware.

Both commands produce the same output:

f=Recent/????? ???????

In general you should convert bytestrings that you consider to be text to Unicode as soon as possible.
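
The failure and the fix can be reproduced without a terminal. The sketch below is written for Python 3, where a bytestring is a `bytes` object. The sample bytes are a hypothetical UTF-8 encoded argument, not the asker's actual file name, and both function names are made up for illustration. It spells out the ASCII decode that Python 2 performs implicitly when a bytestring meets a Unicode string, then decodes explicitly at the boundary instead.

```python
# Hypothetical raw argument: "Recent/" followed by UTF-8 encoded Hebrew text.
raw_arg = b"Recent/\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"


def implicit_ascii_decode(data):
    """Mimic the implicit ASCII decode Python 2 applies when mixing types."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc


def decode_argument(data, encoding="utf-8"):
    """Decode once, at the boundary. The encoding must match the terminal's."""
    return data.decode(encoding)


# The ASCII decode fails on the first non-ASCII byte.
print(implicit_ascii_decode(raw_arg))

# After decoding, both operands are text, so formatting works.
# ascii() keeps the printed result safe on any output encoding.
print(ascii(u"f=%s" % decode_argument(raw_arg)))
```

Once the argument is decoded, the rest of the program can treat it as text and only encode again when writing bytes out.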

Answered by sherpya

sys.argv = map(lambda arg: arg.decode(sys.stdout.encoding), sys.argv)

Alternatively, you can take the encoding from locale.getdefaultlocale()[1].
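
Either source of the encoding can come back empty; in Python 2, for example, sys.stdout.encoding is None when output is piped. A minimal sketch of a fallback chain follows, under stated assumptions: the function names are hypothetical, the final "utf-8" default is a guess rather than something the answer prescribes, and locale.getpreferredencoding(False) is swapped in for locale.getdefaultlocale()[1], which newer Python 3 releases deprecate.

```python
import locale
import sys


def guess_argument_encoding():
    """Return the first encoding that is actually set, ending with UTF-8."""
    candidates = (
        getattr(sys.stdout, "encoding", None),  # may be None or missing
        locale.getpreferredencoding(False),     # the locale's encoding
        "utf-8",                                # assumed last-resort default
    )
    for name in candidates:
        if name:
            return name


def decode_arguments(raw_args, encoding=None):
    """Decode bytestring arguments; leave already-decoded text untouched."""
    encoding = encoding or guess_argument_encoding()
    return [
        arg.decode(encoding) if isinstance(arg, bytes) else arg
        for arg in raw_args
    ]


# Hypothetical input: one raw bytestring and one value that is already text.
print(decode_arguments([b"x.py", u"already-text"], "utf-8"))
```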

Answered by Andreas Jung

Command-line parameters are passed into Python as byte strings, in whatever encoding the shell that started Python uses. So the only way to get command-line parameters as Unicode strings is to convert them to Unicode yourself inside your application.
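
This holds for Python 2, which the question uses. As a hedged aside for readers on Python 3: there the interpreter decodes the arguments itself, so sys.argv already holds text strings, and on Unix os.fsencode() can recover the original bytes. The sketch below does not read the real command line. It round-trips hypothetical sample bytes through os.fsdecode() and os.fsencode(), the same pair of conversions.

```python
import os

# Hypothetical raw argument bytes, as the operating system would pass them.
raw = b"Recent/\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"

# Python 3 hands the program text, decoded with the filesystem encoding.
text = os.fsdecode(raw)

# The original bytes stay recoverable from that text.
recovered = os.fsencode(text)

print(type(text).__name__, recovered == raw)
```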

Answered by mkelley33

  1. sys.argv is never "in Unicode"; it's encoded for sure, but Unicode is not an encoding, rather it is a set of code points (numbers), where each number uniquely represents a character. http://www.unicode.org/standard/WhatIsUnicode.html

  2. Go to Terminal.app > Terminal > Preferences > Settings > Character encoding, and select UTF-8 from the drop-down list.

  3. Also, the default Python that ships with Mac OS X has one flaw with regard to Unicode: it's built using the deprecated UCS-2 by default; see: http://webamused.wordpress.com/2011/01/31/building-64-bit-python-python-org-using-ucs-4-on-mac-os-x-10-6-6-snow-leopard/
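
Points 1 and 3 can be checked from Python itself. The sketch below is an illustration only: the sample character is a hypothetical choice and the helper name describe_unicode_build is made up. It shows one code point producing different bytes under two encodings, and reads sys.maxunicode to tell a narrow (UCS-2) build from a wide one. Python 3.3 and later always report the wide value.

```python
import sys

# Hypothetical sample: HEBREW LETTER SHIN, a single code point.
letter = u"\u05e9"

code_point = ord(letter)               # the number Unicode assigns: 0x5e9
as_utf8 = letter.encode("utf-8")       # b'\xd7\xa9'
as_utf16 = letter.encode("utf-16-le")  # b'\xe9\x05'


def describe_unicode_build():
    """Report whether this interpreter is a narrow or a wide Unicode build."""
    if sys.maxunicode == 0xFFFF:
        return "narrow (UCS-2)"
    return "wide (UCS-4)"


# Same code point, different bytes: the encoding is a separate choice.
print(hex(code_point), as_utf8 != as_utf16, describe_unicode_build())
```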

Answered by DmitrySandalov

Try either:

f = sys.argv[1].decode('utf-8')

or:

f = unicode(sys.argv[1], 'utf-8')
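
Both spellings do the same decode. A small check, written for Python 3, where the text constructor is named str rather than unicode; the sample bytes are a hypothetical stand-in for sys.argv[1]:

```python
# Hypothetical stand-in for a raw sys.argv[1] bytestring.
raw = b"Recent/\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"

via_method = raw.decode("utf-8")     # same call as in Python 2
via_constructor = str(raw, "utf-8")  # Python 3 name for unicode(raw, "utf-8")

print(via_method == via_constructor)
```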