Unicode filenames on Windows with Python and subprocess.Popen()

Question

Asked by Norman

Why does the following occur:

>>> u'\u0308'.encode('mbcs')   # COMBINING DIAERESIS (umlaut)
'\xa8'
>>> u'\u041A'.encode('mbcs')   # CYRILLIC CAPITAL LETTER KA
'?'

I have a Python application accepting filenames from the operating system. It works for some international users, but not others.

For example, this unicode filename: u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'

will not encode with Windows 'mbcs' encoding (the one used by the filesystem, returned by sys.getfilesystemencoding()). I get '???????', indicating the encoder fails on those characters. But this makes no sense, since the filename came from the user to begin with.

Update: Here is the background. I have a file on my system whose name is in Cyrillic, and I want to call subprocess.Popen() with that file as an argument. Popen won't handle unicode. Normally I can get away with encoding the argument with the codec given by sys.getfilesystemencoding(), but in this case that doesn't work.

Answer 1

Answered by kxr

In Py3K, at least from Python 3.2 onward, subprocess.Popen and sys.argv work consistently with (default Unicode) strings on Windows. CreateProcessW and GetCommandLineW are evidently used.

In Python 2, up to v2.7.2 at least, subprocess.Popen is buggy with Unicode arguments. It sticks to CreateProcessA (while the os.* functions are consistent with Unicode), and shlex.split creates additional nonsense.

Pywin32's win32process.CreateProcess also doesn't auto-switch to the W version, nor is there a win32process.CreateProcessW. The same goes for GetCommandLine. Thus ctypes.windll.kernel32.CreateProcessW... needs to be used. The subprocess module should perhaps be fixed regarding this issue.

Passing UTF-8 in argv[1:] between your own applications remains clumsy on a Unicode OS. Such tricks may be acceptable on OSes with 8-bit ("Latin1"-style) strings, such as Linux.

UPDATE: vaab has created a patched version of Popen for Python 2.7 which fixes the issue.
See https://gist.github.com/vaab/2ad7051fc193167f15f85ef573e54eb9
Blog post with explanations: http://vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/

Answer 2

Answered by vaab

DISCLAIMER: I'm the author of the fix mentioned below.

To support Unicode command lines on Windows with Python 2.7, you can use this patch to subprocess.Popen(..).

The situation

Python 2's support for Unicode command lines on Windows is very poor.

Both of the following are severely bugged:

issuing the Unicode command line to the system from the caller side (via subprocess.Popen(..)),
and reading the current command line's Unicode arguments from the callee side (via sys.argv).

This is acknowledged and won't be fixed on Python 2. Both issues are fixed in Python 3.

Technical Reasons

In Python 2, the Windows implementation of subprocess.Popen(..) uses the non-Unicode-ready Windows system call CreateProcess(..) (see the Python code and the MSDN doc of CreateProcess), and sys.argv is not populated via GetCommandLineW(..).

In Python 3, the Windows implementation of subprocess.Popen(..) makes use of the correct Windows system call CreateProcessW(..) starting from 3.0 (see the code in 3.0), and sys.argv uses GetCommandLineW(..) starting from 3.3 (see the code in 3.3).

How is it fixed

The given patch leverages the ctypes module to call the Windows system function CreateProcessW(..) directly. It proposes a new, fixed Popen object by overriding the private method Popen._execute_child(..) and the private function _subprocess.CreateProcess(..) to set up and use CreateProcessW(..) from the Windows system library, in a way that mimics as closely as possible how it is done in Python 3.6.
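
To give a rough idea of what calling CreateProcessW(..) through ctypes involves, here is a minimal sketch. It is not the patch itself: the function name run_command_line_w is made up for illustration, it takes an already-quoted command line, and it does not handle stdio redirection, environment blocks, or any of the other things Popen does. The structure layouts follow the Windows API documentation for STARTUPINFOW and PROCESS_INFORMATION.

```python
import ctypes
import sys
from ctypes import wintypes


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", wintypes.DWORD),
        ("lpReserved", wintypes.LPWSTR),
        ("lpDesktop", wintypes.LPWSTR),
        ("lpTitle", wintypes.LPWSTR),
        ("dwX", wintypes.DWORD),
        ("dwY", wintypes.DWORD),
        ("dwXSize", wintypes.DWORD),
        ("dwYSize", wintypes.DWORD),
        ("dwXCountChars", wintypes.DWORD),
        ("dwYCountChars", wintypes.DWORD),
        ("dwFillAttribute", wintypes.DWORD),
        ("dwFlags", wintypes.DWORD),
        ("wShowWindow", wintypes.WORD),
        ("cbReserved2", wintypes.WORD),
        ("lpReserved2", ctypes.POINTER(ctypes.c_ubyte)),
        ("hStdInput", wintypes.HANDLE),
        ("hStdOutput", wintypes.HANDLE),
        ("hStdError", wintypes.HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("hProcess", wintypes.HANDLE),
        ("hThread", wintypes.HANDLE),
        ("dwProcessId", wintypes.DWORD),
        ("dwThreadId", wintypes.DWORD),
    ]


INFINITE = 0xFFFFFFFF


def run_command_line_w(command_line):
    """Hypothetical helper: run a Unicode command line, return its exit code."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW is only available on Windows")

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateProcessW.argtypes = [
        wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.LPVOID, wintypes.LPVOID,
        wintypes.BOOL, wintypes.DWORD, wintypes.LPVOID, wintypes.LPCWSTR,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION),
    ]
    kernel32.CreateProcessW.restype = wintypes.BOOL
    kernel32.WaitForSingleObject.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.WaitForSingleObject.restype = wintypes.DWORD
    kernel32.GetExitCodeProcess.argtypes = [
        wintypes.HANDLE, ctypes.POINTER(wintypes.DWORD)]
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(startup)
    info = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line, so pass a writable buffer.
    buffer = ctypes.create_unicode_buffer(command_line)

    if not kernel32.CreateProcessW(None, buffer, None, None, False, 0,
                                   None, None,
                                   ctypes.byref(startup), ctypes.byref(info)):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        kernel32.WaitForSingleObject(info.hProcess, INFINITE)
        exit_code = wintypes.DWORD()
        kernel32.GetExitCodeProcess(info.hProcess, ctypes.byref(exit_code))
        return exit_code.value
    finally:
        kernel32.CloseHandle(info.hThread)
        kernel32.CloseHandle(info.hProcess)
```

On Windows, something like run_command_line_w(u'notepad.exe "C:\\some\\folder\\\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt"') would pass the Cyrillic name through unmodified (the path here is only a placeholder). For real use, the linked patch is the better starting point.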

How to use it

How to use the given patch is demonstrated in this blog post. It additionally shows how to read the current process's sys.argv with another fix.

Answer 3

Answered by John Machin

The docs for sys.getfilesystemencoding() say that for Windows NT and later, file names are natively Unicode. If you have a valid unicode file name, why would you bother encoding it using mbcs?

The docs for the codecs module say that mbcs encodes using the "ANSI code page" (which differs depending on the user's locale), so if that code page doesn't include Cyrillic characters, splat.
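
The effect can be reproduced on any platform by naming the code pages explicitly instead of going through mbcs. The sketch below assumes a Western European user has cp1252 as the ANSI code page and a Russian user has cp1251; the helper name encode_like_ansi is made up, and the "replace" error handler only approximates what mbcs does (the real Windows conversion also applies best-fit mappings, which is probably why the umlaut in the question came out as '\xa8').

```python
# -*- coding: utf-8 -*-
filename = u"\u041a\u0433\u044b\u044b\u0448\u0444\u0442"


def encode_like_ansi(text, code_page):
    """Encode with an explicit code page, turning unmappable characters into '?'."""
    return text.encode(code_page, "replace")


# cp1252 has no Cyrillic letters, so every character is replaced.
western = encode_like_ansi(filename, "cp1252")

# cp1251 is a Cyrillic code page, so the name survives a round trip.
cyrillic = encode_like_ansi(filename, "cp1251")

print(western == b"???????")
print(cyrillic.decode("cp1251") == filename)
```

So the same filename encodes fine for one user and collapses into question marks for another, depending only on the system locale.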

Edit: So your process is calling subprocess.Popen(). If your invoked process is under your control, the two processes should be able to agree to use UTF-8 as the transport format. Otherwise, you may need to ask on the pywin32 mailing list. In any case, edit your question to state the degree of control you have over the invoked process.
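
As an illustration of two cooperating processes agreeing on UTF-8, here is a small Python 3 sketch. Note the assumption: it sends the filename over the child's stdin rather than in argv, since a byte pipe is not affected by the code page; the child script is a stand-in for a program you control.

```python
import subprocess
import sys

filename = "\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt"

# Stand-in child: read UTF-8 bytes from stdin, decode them, and echo an
# ASCII-only repr so the result does not depend on the console encoding.
child_code = (
    "import sys\n"
    "name = sys.stdin.buffer.read().decode('utf-8')\n"
    "sys.stdout.write(ascii(name))\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input=filename.encode("utf-8"),  # the agreed transport format
    stdout=subprocess.PIPE,
    check=True,
)
echoed = result.stdout.decode("ascii")
print(echoed == ascii(filename))
```

This only works when both ends follow the same convention, which is why it matters how much control you have over the invoked process.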

Answer 4

Answered by tzot

If you need to pass the name of an existing file, then you might have a better chance of success by passing the 8.3 version of the Unicode filename.

You need to have the pywin32 package installed; then you can do:

>>> import win32api
>>> win32api.GetShortPathName(u"C:\Program Files")
'C:\PROGRA~1'

I believe these short filenames use only ASCII characters, and therefore you should be able to use them as arguments to a command line.

Should you also need to specify filenames to be created, you can create them in advance from Python as zero-size files using their Unicode filenames, and pass the short name of each file as an argument.

UPDATE: user bogdan correctly points out that 8.3 filename generation can be disabled (I had it disabled, too, when I had Windows XP on my laptop), so you can't rely on short names. As another, more far-fetched approach when working on NTFS volumes, one can hard link the Unicode filenames to plain ASCII ones, pass the ASCII filenames to an external command, and delete them afterwards.
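
A sketch of the hard link idea, assuming Python 3 (where os.link is also available on Windows) and a filesystem that supports hard links. The helper name with_ascii_alias and the alias name are made up; here the "external command" is replaced by a plain function that reads the file, just to keep the example self-contained.

```python
import os
import shutil
import tempfile


def with_ascii_alias(unicode_path, alias_name, action):
    """Hard link unicode_path to an ASCII name, run action on it, then unlink."""
    alias_path = os.path.join(os.path.dirname(unicode_path), alias_name)
    os.link(unicode_path, alias_path)
    try:
        return action(alias_path)
    finally:
        # Removing the alias leaves the original file untouched.
        os.remove(alias_path)


def read_bytes(path):
    with open(path, "rb") as handle:
        return handle.read()


workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt")
with open(original, "wb") as handle:
    handle.write(b"payload")

content = with_ascii_alias(original, "alias.txt", read_bytes)
alias_gone = not os.path.exists(os.path.join(workdir, "alias.txt"))
original_kept = os.path.exists(original)

shutil.rmtree(workdir)
```

In practice the action would call the external program with alias_path as its argument. The alias has to live on the same volume as the original, since hard links cannot cross volumes.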

Answer 5

Answered by Florian Winter

With Python 3, just don't encode the string. Windows filenames are natively Unicode, all strings in Python 3 are Unicode, and Popen uses the Unicode version of the CreateProcess Windows API function.
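
A minimal Python 3 sketch of what "don't encode" looks like in practice. The child here is just another Python interpreter that echoes its first argument, standing in for whatever program you actually need to run; it prints an ASCII-only repr so the comparison does not depend on the console encoding.

```python
import subprocess
import sys

filename = "\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt"

# Stand-in child process: echo argv[1] as an ASCII-only repr.
child_code = "import sys; print(ascii(sys.argv[1]))"

# The str is passed as-is; no .encode() call anywhere.
result = subprocess.run(
    [sys.executable, "-c", child_code, filename],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
received = result.stdout.strip()
print(received == ascii(filename))
```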

With Python 2.7, the easiest solution is to use the third-party module https://pypi.org/project/subprocessww/. There is no "built-in" solution to get full Unicode support (independent of system locale), and the maintainers of Python 2.7 consider this a feature request rather than a bugfix, so this is not going to change.

For a detailed technical explanation of why things are as they are, please see the other answers.
