Unicode filename to Python subprocess.call()

Question

Asked by otrov

I'm trying to run subprocess.call() with a unicode filename. Here is a simplified version of the problem:

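import subprocess

# Backslashes are doubled so that \n and \t are not read as escape sequences.
n = u'c:\\windows\\notepad.exe '
f = u'c:\\temp\\nèw.txt'

subprocess.call(n + f)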

which raises the famous error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'

Encoding to utf-8 produces the wrong filename, and mbcs passes the filename as new.txt, without the accent.

I just can't read any more about this confusing subject; I keep going in circles. I have found a lot of answers here to many different problems in the past, so I thought I'd join and ask for help myself.

Thanks

Answer 1

Answered by RedOrav

I found a fine workaround. It's a bit messy, but it works.

subprocess.call is going to pass the text to the terminal in its own encoding, which may or may not be the one the terminal expects. Because you want it to be portable, you'll need to determine the machine's encoding at runtime.

The following

import subprocess
import sys
notepad = u'C://Notepad.exe'  # unicode literal, so encode() has something to encode
subprocess.call([notepad.encode(sys.getfilesystemencoding())])

attempts to figure out the current encoding and so applies the correct one to the subprocess.call argument.

As a side note, I have also found that if you compose a string with the current directory using

os.getcwd()

Python (or the OS, I don't know which) will mess up directories with accented characters. To prevent this, I have found the following to work:

os.getcwd().decode(sys.getfilesystemencoding())

Which is very similar to the solution above.

Hope it helps.

Answer 2

Answered by WGH

If your file exists, you can use its short filename (also known as the 8.3 name). This name is defined for existing files and should cause no trouble for programs that are not Unicode-aware when passed as an argument.

One way to obtain it (requires pywin32 to be installed):

import win32api
short_path = win32api.GetShortPathName(unicode_path)

Alternatively, you can also use ctypes:

import ctypes
import ctypes.wintypes

ctypes.windll.kernel32.GetShortPathNameW.argtypes = [
    ctypes.wintypes.LPCWSTR, # lpszLongPath
    ctypes.wintypes.LPWSTR, # lpszShortPath
    ctypes.wintypes.DWORD # cchBuffer
]
ctypes.windll.kernel32.GetShortPathNameW.restype = ctypes.wintypes.DWORD

buf = ctypes.create_unicode_buffer(1024) # adjust buffer size, if necessary
ctypes.windll.kernel32.GetShortPathNameW(unicode_path, buf, len(buf))

short_path = buf.value

Answer 3

Answered by clahey

It appears that to make this work, the subprocess code would have to be modified to use a wide-character version of CreateProcess (assuming that one exists). There's a PEP discussing the same change made for the file object at http://www.python.org/dev/peps/pep-0277/. Perhaps you could research the Windows C calls and propose a similar change for subprocess.

Answer 4

Answered by tzot

You can try opening the file as:

subprocess.call((n + f).encode("cp437"))

or whichever code page chcp reports as being used in a command prompt window. If you try chcp 65001 as starbuck suggested, you'll have to edit the stdlib encodings\aliases.py file and add cp65001 as an alias for 'utf-8' beforehand. It's an open issue in the Python source.

UPDATE: since this is a multiple-target scenario, before running such a command, make sure you run a single chcp command first, analyse the output and retrieve the current "Command Prompt" (DOS) code page. Then use the discovered code page to encode the subprocess.call argument.
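
The update above could be sketched roughly as follows. The function names parse_chcp_output and console_codepage are made up for this illustration, and the parser simply assumes that the last number chcp prints is the code page, since the surrounding text varies with the Windows language.

```python
import re
import subprocess


def parse_chcp_output(text):
    """Return a codec name such as 'cp437' from the text printed by chcp."""
    numbers = re.findall(r"\d+", text)
    if not numbers:
        raise ValueError("no code page number found in %r" % text)
    # Assumption: the last number in the output is the active code page.
    return "cp" + numbers[-1]


def console_codepage():
    """Run chcp through the shell (Windows only) and return the codec name."""
    output = subprocess.check_output("chcp", shell=True)
    return parse_chcp_output(output.decode("ascii", "replace"))
```

On Windows, the result would then replace the hard-coded "cp437", as in subprocess.call((n + f).encode(console_codepage())).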

Answer 5

Answered by newtover

As ΤΖΩΤΖΙΟΥ and starbuck mentioned, the problem is with the console code page, which in your case is 866 (in the Russian localization of Windows) and not 1251. Just run chcp in the console.
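
Whether a given code page can represent the filename at all can be checked directly in Python. This is only a small sketch: can_encode is a hypothetical helper, and the four code pages are just the ones mentioned in this thread plus Western European cp1252 for comparison.

```python
def can_encode(text, codec):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


name = u"n\xe8w.txt"  # the filename from the question, with the accent escaped
for codec in ("cp437", "cp866", "cp1251", "cp1252"):
    print("%s %s" % (codec, can_encode(name, codec)))
```

cp437 and cp1252 contain the accented letter, while the Cyrillic code pages cp866 and cp1251 do not, so on those no choice of byte encoding can carry the name through unchanged.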

The problem is the same as when you want to output unicode to the Windows console. Unfortunately, you will need at least to register an alias for unicode as 'cp866' in encodings\aliases.py (or do it programmatically on script start), change the code page of the console to 65001 before running notepad, and set it back afterwards.

chcp 65001 & c:\WINDOWS\notepad.exe nèw.txt & chcp 866

By the way, to be able to run the command in the console and see the filename correctly, you will need to change the console font to Lucida Console in the console window properties.

It might be even worse: you will need to change the code page of the current process. To do that, you will need either to run chcp 65001 right before the script starts or to use pywin32 to do it within the script.

Answer 6

Answered by newtover

I don't have an answer for you, but I've done a fair amount of research into this problem. Python converts all output (including system calls) to the same character encoding as the terminal it is running in. Windows terminals use code pages for character mapping; the default code page is 437, but it can be changed with the chcp command. chcp 65001 will theoretically change the code page to utf-8, but as far as I know Python doesn't know what to do with this, so you're SOL.

Answer 7

Answered by Rishi Sharma

Use os.startfile with the operation edit. This will work better, as it will open the default application for your file extension.
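
A minimal sketch of that suggestion, assuming the file type has an "edit" action registered in Windows; open_for_editing is a hypothetical wrapper added here so the snippet fails clearly on other platforms.

```python
import os


def open_for_editing(path):
    """Open path with the application registered for the "edit" verb."""
    if not hasattr(os, "startfile"):
        raise OSError("os.startfile is only available on Windows")
    os.startfile(path, "edit")
```

With the question's example this would be called as open_for_editing(u'c:\\temp\\nèw.txt'), with no command line to encode.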
