Unicode filenames on Windows with Python & subprocess.Popen()
Source: http://stackoverflow.com/questions/1910275/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Norman
Why does the following occur:
>>> u'\u0308'.encode('mbcs') #UMLAUT
'\xa8'
>>> u'\u041A'.encode('mbcs') #CYRILLIC CAPITAL LETTER KA
'?'
>>>
I have a Python application accepting filenames from the operating system. It works for some international users, but not others.
For example, this unicode filename: u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'
will not encode with Windows 'mbcs' encoding (the one used by the filesystem, returned by sys.getfilesystemencoding()). I get '???????', indicating the encoder fails on those characters. But this makes no sense, since the filename came from the user to begin with.
Update: Here's the background to my question. I have a file on my system with a Cyrillic name. I want to call subprocess.Popen() with that file as an argument. Popen won't handle unicode. Normally I can get away with encoding the argument with the codec given by sys.getfilesystemencoding(). In this case it won't work.
Answered by kxr
In Py3K, at least from Python 3.2, subprocess.Popen and sys.argv work consistently with (default unicode) strings on Windows. CreateProcessW and GetCommandLineW are evidently used.
In Python 2, up to v2.7.2 at least, subprocess.Popen is buggy with Unicode arguments. It sticks to CreateProcessA (while the os.* functions are consistent with Unicode). And shlex.split creates additional nonsense.
Pywin32's win32process.CreateProcess also doesn't auto-switch to the W version, nor is there a win32process.CreateProcessW. The same goes for GetCommandLine. Thus ctypes.windll.kernel32.CreateProcessW... needs to be used. The subprocess module perhaps should be fixed regarding this issue.
Passing UTF-8 in argv[1:] between private apps remains clumsy on a Unicode OS. Such tricks may be legitimate on 8-bit "Latin1" string OSes like Linux.
UPDATE: vaab has created a patched version of Popen for Python 2.7 which fixes the issue.
See https://gist.github.com/vaab/2ad7051fc193167f15f85ef573e54eb9
Blog post with explanations: http://vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/
Answered by vaab
DISCLAIMER: I'm the author of the fix mentioned in the following.
To support Unicode command lines on Windows with Python 2.7, you can use this patch to subprocess.Popen(..)
The situation
Python 2's support for Unicode command lines on Windows is very poor.
Two things are severely bugged: issuing the Unicode command line to the system from the caller side (via subprocess.Popen(..)), and reading the current command line's Unicode arguments from the callee side (via sys.argv).
This is acknowledged and won't be fixed on Python 2. Both problems are fixed in Python 3.
Technical Reasons
In Python 2, the Windows implementation of subprocess.Popen(..) uses the non-Unicode-ready Windows system call CreateProcess(..) (see the Python code and the MSDN doc of CreateProcess), and sys.argv does not use GetCommandLineW(..).
In Python 3, the Windows implementation of subprocess.Popen(..) makes use of the correct Windows system call CreateProcessW(..) starting from 3.0 (see the code in 3.0), and sys.argv uses GetCommandLineW(..) starting from 3.3 (see the code in 3.3).
How is it fixed
The given patch leverages the ctypes module to call the Windows system function CreateProcessW(..) directly. It proposes a new, fixed Popen object by overriding the private method Popen._execute_child(..) and the private function _subprocess.CreateProcess(..) to set up and use CreateProcessW(..) from the Windows system library in a way that mimics, as much as possible, how it is done in Python 3.6.
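For orientation, the core of such an approach can be sketched as follows. This is not vaab's patch: it is a minimal Python 3 illustration of calling CreateProcessW through ctypes, and the function name spawn_unicode is hypothetical. It does no argument quoting, handle redirection, or environment handling, all of which a real Popen replacement would need.

```python
import ctypes
import sys

# Plain ctypes types are used so the definitions import on any platform;
# only spawn_unicode() touches the Windows API.
class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", ctypes.c_uint32),
        ("lpReserved", ctypes.c_wchar_p),
        ("lpDesktop", ctypes.c_wchar_p),
        ("lpTitle", ctypes.c_wchar_p),
        ("dwX", ctypes.c_uint32),
        ("dwY", ctypes.c_uint32),
        ("dwXSize", ctypes.c_uint32),
        ("dwYSize", ctypes.c_uint32),
        ("dwXCountChars", ctypes.c_uint32),
        ("dwYCountChars", ctypes.c_uint32),
        ("dwFillAttribute", ctypes.c_uint32),
        ("dwFlags", ctypes.c_uint32),
        ("wShowWindow", ctypes.c_uint16),
        ("cbReserved2", ctypes.c_uint16),
        ("lpReserved2", ctypes.c_void_p),
        ("hStdInput", ctypes.c_void_p),
        ("hStdOutput", ctypes.c_void_p),
        ("hStdError", ctypes.c_void_p),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("hProcess", ctypes.c_void_p),
        ("hThread", ctypes.c_void_p),
        ("dwProcessId", ctypes.c_uint32),
        ("dwThreadId", ctypes.c_uint32),
    ]


def spawn_unicode(command_line):
    """Start a process from a Unicode command line and return its PID."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateProcessW.argtypes = [
        ctypes.c_wchar_p, ctypes.c_wchar_p,   # application name, command line
        ctypes.c_void_p, ctypes.c_void_p,     # process / thread security attrs
        ctypes.c_int, ctypes.c_uint32,        # inherit handles, creation flags
        ctypes.c_void_p, ctypes.c_wchar_p,    # environment, current directory
        ctypes.POINTER(STARTUPINFOW),
        ctypes.POINTER(PROCESS_INFORMATION),
    ]
    kernel32.CreateProcessW.restype = ctypes.c_int
    kernel32.CloseHandle.argtypes = [ctypes.c_void_p]

    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(STARTUPINFOW)
    info = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line, so pass a writable buffer.
    buffer = ctypes.create_unicode_buffer(command_line)
    ok = kernel32.CreateProcessW(
        None, buffer, None, None, False, 0, None, None,
        ctypes.byref(startup), ctypes.byref(info),
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    # This sketch does not wait for the child, so release both handles.
    kernel32.CloseHandle(info.hThread)
    kernel32.CloseHandle(info.hProcess)
    return info.dwProcessId
```

On Windows, a call such as spawn_unicode(u'notepad.exe "C:\\some\\path.txt"') would hand the wide-character command line to the system unchanged; on other platforms the function raises OSError.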
How to use it
How to use the given patch is demonstrated in this blog post explanation. It additionally shows how to read the current process's sys.argv with another fix.
Answered by John Machin
Docs for sys.getfilesystemencoding() say that for Windows NT and later, file names are natively Unicode. If you have a valid unicode file name, why would you bother encoding it using mbcs?
Docs for the codecs module say that mbcs encodes using the "ANSI code page" (which differs depending on the user's locale), so if the locale doesn't use Cyrillic characters, splat.
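The effect can be reproduced on any platform with explicit code-page codecs. This is a minimal Python 3 sketch, assuming cp1252 (Western European) and cp1251 (Cyrillic) as stand-ins for the ANSI code page that mbcs would pick on two differently configured machines. errors="replace" is used here to imitate the '?' substitution seen in the question.

```python
# Stand-ins for the ANSI code page of two differently configured systems.
WESTERN = "cp1252"
CYRILLIC = "cp1251"

name = "\u041a\u0433\u044b\u044b\u0448\u0444\u0442"  # filename from the question


def encode_for_code_page(text, code_page):
    """Encode text, substituting '?' for characters the code page lacks."""
    return text.encode(code_page, errors="replace")


western_bytes = encode_for_code_page(name, WESTERN)
cyrillic_bytes = encode_for_code_page(name, CYRILLIC)

print(western_bytes)                             # every character is replaced
print(cyrillic_bytes.decode(CYRILLIC) == name)   # the name survives a round trip
```

The same filename is representable under one code page and destroyed under the other, which matches the report that the application works for some international users but not for others.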
Edit: So your process is calling subprocess.Popen(). If your invoked process is under your control, the two processes should be able to agree to use UTF-8 as the Unicode Transport Format. Otherwise, you may need to ask on the pywin32 mailing list. In any case, edit your question to state the degree of control you have over the invoked process.
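One way two cooperating processes might agree on UTF-8 is sketched below in Python 3. As an assumption of this sketch, the filename travels over the child's stdin rather than argv, so it never passes through the ANSI code page. The helper name send_name_as_utf8 and the inline child script are hypothetical.

```python
import subprocess
import sys

# Child: read UTF-8 bytes from stdin, decode them, and report what arrived
# in ASCII-escaped form so the output does not depend on console encoding.
CHILD_CODE = (
    "import sys\n"
    "name = sys.stdin.buffer.read().decode('utf-8')\n"
    "sys.stdout.write(ascii(name))\n"
)


def send_name_as_utf8(name):
    """Pass a Unicode filename to a child process as UTF-8 over stdin."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=name.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii")


received = send_name_as_utf8("\u041a\u0433\u044b\u044b\u0448\u0444\u0442")
print(received)
```

Both sides have to apply the same convention, which is why this only helps when the invoked program is under your control.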
Answered by tzot
If you need to pass the name of an existing file, then you might have a better chance of success by passing the 8.3 version of the Unicode filename.
You need to have the pywin32 package installed, then you can do:
>>> import win32api
>>> win32api.GetShortPathName(u"C:\Program Files")
'C:\PROGRA~1'
I believe these short filenames use only ASCII characters, and therefore you should be able to use them as arguments to a command line.
Should you also need to specify filenames to be created, you can create them in advance with zero size from Python using Unicode filenames, and pass the short name of the file as an argument.
UPDATE: user bogdan says correctly that 8.3 filename generation can be disabled (I had it disabled, too, when I had Windows XP on my laptop), so you can't rely on short names. So, as another, more far-fetched approach when working on NTFS volumes, one can hard link the Unicode filenames to plain ASCII ones, pass the ASCII filenames to an external command, and delete them afterwards.
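The hard-link idea could look roughly like this in Python 3, where os.link is available on Windows as well. This is only a sketch: the helper name ascii_alias and the fixed alias name alias.tmp are hypothetical, and a real implementation would need a collision-free alias name.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def ascii_alias(unicode_path, alias_name="alias.tmp"):
    """Yield a temporary ASCII-named hard link to unicode_path."""
    # Create the link next to the original so both live on the same volume.
    alias = os.path.join(os.path.dirname(unicode_path), alias_name)
    os.link(unicode_path, alias)  # both names now refer to the same file
    try:
        yield alias
    finally:
        os.remove(alias)  # removes only the extra name, not the data


with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt")
    with open(original, "w", encoding="utf-8") as handle:
        handle.write("payload")

    with ascii_alias(original) as alias:
        # An external command would receive `alias` here.
        with open(alias, encoding="utf-8") as handle:
            content = handle.read()

    alias_gone = not os.path.exists(alias)
    original_kept = os.path.exists(original)
```

Because a hard link is just a second name for the same file, anything the external command writes through the ASCII name is visible under the Unicode name too.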
Answered by Florian Winter
With Python 3, just don't encode the string. Windows filenames are natively Unicode, all strings in Python 3 are Unicode, and Popen uses the Unicode version of the CreateProcess Windows API function.
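A small Python 3 sketch of that advice: the filename is passed as an ordinary str, with no .encode() call. The child here is just another Python interpreter running a hypothetical one-line script that echoes its first argument.

```python
import subprocess
import sys

name = "\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt"

# The child echoes its first argument in ASCII-escaped form so the
# output does not depend on the console encoding.
child_code = "import sys; sys.stdout.write(ascii(sys.argv[1]))"

result = subprocess.run(
    [sys.executable, "-c", child_code, name],  # plain str, no .encode()
    stdout=subprocess.PIPE,
    check=True,
)
echoed = result.stdout.decode("ascii")
print(echoed == ascii(name))
```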
With Python 2.7, the easiest solution is to use the third-party module https://pypi.org/project/subprocessww/. There is no "built-in" solution to get full Unicode support (independent of system locale), and the maintainers of Python 2.7 consider this a feature request rather than a bugfix, so this is not going to change.
For a detailed technical explanation of why things are as they are, please see the other answers.