Unicode filename to python subprocess.call()
Source: http://stackoverflow.com/questions/2595448/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by otrov
I'm trying to run subprocess.call() with a Unicode filename. Here is a simplified version of the problem:
import subprocess
# backslashes are escaped so that \n and \t are not read as newline and tab
n = u'c:\\windows\\notepad.exe '
f = u'c:\\temp\\nèw.txt'
subprocess.call(n + f)
which raises the famous error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'
Encoding to utf-8 produces the wrong filename, and encoding to mbcs passes the filename as new.txt, without the accent.
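To see why neither encoding gives the expected name, it can help to look at the bytes each codec produces for the filename. The sketch below is only an illustration: the helper name encoded_forms and the list of code pages are assumptions that do not come from the question, and mbcs is left out because that codec exists only on Windows.

```python
# -*- coding: utf-8 -*-
name = u'n\xe8w.txt'  # the same text as u'nèw.txt'


def encoded_forms(text, codec_names=('utf-8', 'cp1252', 'cp437', 'cp866')):
    """Map each codec name to the encoded bytes, or None if it cannot encode."""
    forms = {}
    for codec_name in codec_names:
        try:
            forms[codec_name] = text.encode(codec_name)
        except UnicodeEncodeError:
            # this code page has no byte for at least one character
            forms[codec_name] = None
    return forms


forms = encoded_forms(name)
# What a program sees if it reads the UTF-8 bytes as if they were cp1252
misread = forms['utf-8'].decode('cp1252')
```

The letter è is a single byte in cp1252 (0xE8) and in cp437 (0x8A), two bytes in UTF-8, and has no representation at all in cp866, so the receiving program only sees the right name when both sides agree on the code page.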
I can't read any more on this confusing subject; I just keep going in circles. I have found answers here to many different problems in the past, so I thought I would join and ask for help myself.
Thanks
Answered by RedOrav
I found a fine workaround; it's a bit messy, but it works.
subprocess.call is going to pass the text in its own encoding to the terminal, which might or might not be the one the terminal is expecting. Because you want to make it portable, you'll need to know the machine's encoding at runtime.
The following
import subprocess
import sys
notepad = 'C://Notepad.exe'
subprocess.call([notepad.encode(sys.getfilesystemencoding())])
attempts to figure out the current encoding and then applies it to the argument of subprocess.call.
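The same idea can be wrapped in a small helper. This is a sketch aimed at Python 2, where the arguments end up as byte strings; encode_args is a hypothetical name, not something from the answer. On Python 3, subprocess accepts str arguments directly, so the encoding step should not be needed there.

```python
import sys


def encode_args(args, encoding=None):
    """Encode every text argument; leave byte strings untouched."""
    if encoding is None:
        # the encoding Python uses for file names on this machine
        encoding = sys.getfilesystemencoding()
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [arg.encode(encoding) if isinstance(arg, text_type) else arg
            for arg in args]

# Hypothetical usage (Python 2 on Windows):
# subprocess.call(encode_args([u'c:\\windows\\notepad.exe',
#                              u'c:\\temp\\n\xe8w.txt']))
```

Passing the arguments as a list rather than one concatenated string also avoids having to quote paths that contain spaces.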
As a side note, I have also found that if you compose a string with the current directory, using
os.getcwd()
Python (or the OS, I don't know which) will mess up directories with accented characters. To prevent this, I have found the following to work:
os.getcwd().decode(sys.getfilesystemencoding())
This is very similar to the solution above.
Hope it helps.
Answered by WGH
If your file exists, you can use its short filename (also known as the 8.3 name). This name is defined for existing files and should cause no trouble for programs that are not Unicode-aware when passed as an argument.
One way to obtain it (requires Pywin32 to be installed):
import win32api
short_path = win32api.GetShortPathName(unicode_path)
Alternatively, you can also use ctypes:
import ctypes
import ctypes.wintypes
ctypes.windll.kernel32.GetShortPathNameW.argtypes = [
    ctypes.wintypes.LPCWSTR,  # lpszLongPath
    ctypes.wintypes.LPWSTR,   # lpszShortPath
    ctypes.wintypes.DWORD     # cchBuffer
]
ctypes.windll.kernel32.GetShortPathNameW.restype = ctypes.wintypes.DWORD
buf = ctypes.create_unicode_buffer(1024) # adjust buffer size, if necessary
ctypes.windll.kernel32.GetShortPathNameW(unicode_path, buf, len(buf))
short_path = buf.value
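Rather than guessing a buffer size, the call can be made twice: once to ask for the required length and once to fill a buffer of that length. The following is a sketch under a few assumptions: get_short_path is a hypothetical helper name, and returning the path unchanged on other platforms is a choice made here only so the function can be imported anywhere.

```python
import ctypes
import sys


def get_short_path(long_path):
    """Return the 8.3 short form of an existing path (Windows only)."""
    if sys.platform != 'win32':
        return long_path  # assumption: nothing to convert off Windows
    from ctypes import wintypes
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    get_short = kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD
    # First call with no buffer: the result is the size needed,
    # including the terminating null character (0 means failure).
    needed = get_short(long_path, None, 0)
    if needed == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buf = ctypes.create_unicode_buffer(needed)
    if get_short(long_path, buf, needed) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf.value
```

Note that short names can be disabled on a volume, in which case the returned path may still contain the original long name.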
Answered by clahey
It appears that to make this work, the subprocess code would have to be modified to use a wide-character version of CreateProcess (assuming that one exists). There's a PEP discussing the same change made for the file object at http://www.python.org/dev/peps/pep-0277/. Perhaps you could research the Windows C calls and propose a similar change for subprocess.
Answered by tzot
You can try opening the file as:
subprocess.call((n + f).encode("cp437"))
or whichever codepage chcp reports as being used in a command prompt window. If you try chcp 65001, as starbuck suggested, you'll have to edit the stdlib encodings\aliases.py file beforehand and add cp65001 as an alias to 'utf-8'. It's an open issue in the Python source.
UPDATE: since this is a multiple-target scenario, before running such a command, make sure you first run a single chcp command, analyse the output and retrieve the current "Command Prompt" (DOS) codepage. Then use the discovered codepage to encode the subprocess.call argument.
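The steps in the update could be sketched roughly as follows. The function names are hypothetical, and the parser assumes that the code page number is the first run of digits in the chcp output (for example "Active code page: 437" on an English system); localized output may need more care.

```python
import re
import subprocess


def parse_chcp_output(output):
    """Turn chcp output such as 'Active code page: 437' into 'cp437'."""
    match = re.search(r'(\d+)', output)
    if match is None:
        raise ValueError('no code page number found in %r' % (output,))
    return 'cp' + match.group(1)


def console_codec():
    """Run chcp once and return the matching Python codec name (Windows only)."""
    raw = subprocess.check_output('chcp', shell=True)
    # only the digits matter, so undecodable localized text can be replaced
    return parse_chcp_output(raw.decode('ascii', 'replace'))

# Hypothetical usage (Python 2 on Windows), with n and f as in the question:
# subprocess.call((n + f).encode(console_codec()))
```

If the reported code page cannot represent a character in the filename, the encode step will still raise UnicodeEncodeError, so this only helps when the name fits the console code page.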
Answered by newtover
As ΤΖΩΤΖΙΟΥ and starbuck mentioned, the problem is with the console code page, which in your case is 866 (in the Russian localization of Windows) and not 1251. Just run chcp in the console.
The problem is the same as when you want to output Unicode to the Windows console. Unfortunately, you will need at least to register an alias for unicode as 'cp866' in encodings\aliases.py (or do it programmatically on script start), change the code page of the console to 65001 before running Notepad, and set it back afterwards.
chcp 65001 & c:\WINDOWS\notepad.exe nèw.txt & chcp 866
By the way, to be able to run the command in the console and see the filename correctly, you will need to change the console font to Lucida Console in the console window properties.
It might be even worse: you will need to change the code page of the current process. To do that, you will need either to run chcp 65001 right before the script starts or to use pywin32 to do it within the script.
Answered by newtover
I don't have an answer for you, but I've done a fair amount of research into this problem. Python converts all output (including system calls) to the same character encoding as the terminal it is running in. Windows terminals use code pages for character mapping; the default code page is 437, but it can be changed with the chcp command. chcp 65001 will theoretically change the code page to utf-8, but as far as I know Python doesn't know what to do with this, so you're SOL.
Answered by Rishi Sharma
Use os.startfile with the operation edit. This will work better, as it will open the default application for your file extension.