Python os.stat and unicode file names
Original URL: http://stackoverflow.com/questions/2076708/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by interstar
In my Django application, a user has uploaded a file with a unicode character in the name.
When I'm downloading files, I call:
os.path.exists(media)
to test that the file is there. This, in turn, seems to call
st = os.stat(path)
which then blows up with the error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xcf' in position 92: ordinal not in range(128)
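A minimal Python 3 sketch of what this traceback means: the unicode path had to be turned into bytes for the OS call, and the codec that ended up being used was ASCII (presumably because the process locale was not UTF-8), which cannot represent U+00CF (Ï). The helper `ascii_encode_failure` and the example path are hypothetical; the sketch calls `encode("ascii")` explicitly to show which character fails and at which index.

```python
def ascii_encode_failure(path):
    """Return (character, index) of the first character ASCII cannot encode,
    or None if the whole path is plain ASCII."""
    try:
        path.encode("ascii")
    except UnicodeEncodeError as err:
        # err.start is the index of the first unencodable character
        return path[err.start], err.start
    return None

# Hypothetical upload path containing U+00CF (Ï)
example = "/srv/media/uploads/\u00cfmage.png"
print(ascii_encode_failure(example))  # ('Ï', 19)
```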
What can I do about this? Is there an option to os.path.exists that handles it?
Update: Actually, all I had to do was encode the argument to exists, i.e.:
os.path.exists(media.encode('utf-8'))
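In Python 3 the `os` functions take `str` paths directly, but they still accept `bytes`, so the workaround in the update can be sketched as follows. `exists_utf8` is a hypothetical helper, and hard-coding UTF-8 assumes the names on disk are stored as UTF-8 bytes, which is what a UTF-8 locale produces.

```python
import os
import tempfile

def exists_utf8(path):
    """Check whether `path` exists by handing os.path.exists UTF-8 bytes,
    mirroring os.path.exists(media.encode('utf-8'))."""
    # On POSIX a bytes path is passed to the OS unchanged, so this only
    # matches files whose names were written as UTF-8.
    return os.path.exists(path.encode("utf-8"))

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "\u00cfmage.png")
    # Create the file through a UTF-8 bytes path so the name on disk is UTF-8.
    with open(target.encode("utf-8"), "wb") as fh:
        fh.write(b"data")
    print(exists_utf8(target))  # True
```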
Thanks everyone who answered.
Answer by Glenn Maynard
I'm assuming you're in Unix. If not, please remember to say which OS you're in.
Make sure your locale is set to UTF-8. All modern Linux systems do this by default, usually by setting the environment variable LANG to "en_US.UTF-8" or to the UTF-8 locale of another language. Also, make sure your filenames are encoded in UTF-8.
With that set, there's no need to mess with encodings to access files in any language, even in Python 2.x.
[~/test] echo $LANG
en_US.UTF-8
[~/test] echo testing > 漢字
[~/test] python2.6
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.stat("漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> os.stat(u"漢字")
posix.stat_result(st_mode=33188, st_ino=548583333L, st_dev=2049L, st_nlink=1, st_uid=1000, st_gid=1000, st_size=8L, st_atime=1263634240, st_mtime=1263634230, st_ctime=1263634230)
>>> open("漢字").read()
'testing\n'
>>> open(u"漢字").read()
'testing\n'
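The same experiment in Python 3, as a hedged sketch: it assumes a filesystem encoding that can represent 漢字 (UTF-8 on current Linux and Windows builds), and `stat_both_ways` is a hypothetical helper. `os.fsencode` produces the bytes form of a path using `sys.getfilesystemencoding()`.

```python
import os
import sys
import tempfile

def stat_both_ways(directory, name):
    """Return os.stat results for the same file via a str path and a bytes path."""
    text_path = os.path.join(directory, name)
    bytes_path = os.fsencode(text_path)  # encoded with the filesystem encoding
    return os.stat(text_path), os.stat(bytes_path)

with tempfile.TemporaryDirectory() as d:
    # Equivalent of `echo testing > 漢字`
    with open(os.path.join(d, "漢字"), "w", encoding="utf-8", newline="") as fh:
        fh.write("testing\n")
    by_text, by_bytes = stat_both_ways(d, "漢字")
    print(sys.getfilesystemencoding(), by_text.st_size, by_bytes.st_size)
```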
If this doesn't work, run "locale"; if the values are "C" instead of en_US.UTF-8, you may not have the locale installed correctly.
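From inside Python, a rough counterpart to running "locale" is to look at the encodings the interpreter derived from its environment. `encoding_report` is a hypothetical helper. On Python 3.7+ a plain C locale is usually coerced to UTF-8 (PEP 538 and PEP 540), so an ASCII result mostly shows up on older interpreters or unusual locales.

```python
import locale
import sys

def encoding_report(sample="漢字"):
    """Report the encodings Python derived from the environment and whether
    `sample` can be encoded with the filesystem encoding."""
    fs_encoding = sys.getfilesystemencoding()
    try:
        sample.encode(fs_encoding)
        encodable = True
    except UnicodeEncodeError:
        encodable = False
    return {
        "filesystem": fs_encoding,
        "preferred": locale.getpreferredencoding(False),
        "sample_encodable": encodable,
    }

print(encoding_report())
```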
If you're in Windows, I think Unicode filenames should always just work (at least for the os/posix modules), since the Unicode file API in Windows is supported transparently.
Answer by Deleet
None of these solutions worked for me. However, I did find the (a?) solution. If you use WSGI, there is yet another place in the Apache settings where the locale has to be set, and the official docs describe it. Add the following two lines to /etc/apache2/envvars (on Ubuntu):
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
Then restart the server. This solved my problem.
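One way to confirm that the exported locale actually reaches the Python process is a throwaway WSGI app (the PEP 3333 `application(environ, start_response)` interface) that reports the interpreter's filesystem encoding. This is a sketch and not part of the answer: the premise is that the WSGI interpreter inherits Apache's environment, mounting the app at a test URL is left to your own Apache/mod_wsgi configuration, and since the encoding is fixed when the interpreter starts, check it only after the restart.

```python
import sys

def application(environ, start_response):
    """Throwaway WSGI app that reports the interpreter's filesystem encoding,
    to confirm the locale exported in envvars took effect."""
    body = "filesystem encoding: {}\n".format(sys.getfilesystemencoding())
    payload = body.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]

# Quick local check without a server: call the app with a stub start_response.
def _stub_start_response(status, headers):
    print(status)

print(b"".join(application({}, _stub_start_response)).decode("utf-8"))
```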
Answer by Ignacio Vazquez-Abrams
Encode to the filesystem encoding before calling. See the locale module.
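A sketch of this suggestion in Python 3 terms: rather than hard-coding 'utf-8', encode with whatever encoding the interpreter uses for filenames. Python exposes that encoding as `sys.getfilesystemencoding()`, and `os.fsencode`/`os.fsdecode` apply it. `to_fs_bytes` is a hypothetical name used only for illustration.

```python
import os
import sys

def to_fs_bytes(path):
    """Encode a str path with the interpreter's filesystem encoding.

    os.fsencode uses sys.getfilesystemencoding() (with the 'surrogateescape'
    error handler on POSIX), so on POSIX these are the bytes os.stat(path)
    hands to the OS.
    """
    return os.fsencode(path)

name = "\u00cfmage.png"
encoded = to_fs_bytes(name)
print(sys.getfilesystemencoding(), encoded)
print(os.fsdecode(encoded) == name)  # round-trips back to the same str
```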
Answer by Ming C
Change your HTTP server to use a UTF-8 locale. For example, I use apache2 on CentOS. I changed the locale setting in /etc/sysconfig/httpd via HTTPD_LANG:
# CentOS uses /etc/sysconfig/httpd to configure environment variables.
#
# By default, the httpd process is started in the C locale; to
# change the locale in which the server runs, the HTTPD_LANG
# variable can be set.
#
# HTTPD_LANG=C
# Change this to your own UTF-8 locale if needed.
HTTPD_LANG=en_US.UTF-8
Answer by HVNSweeting
It is easy to get this kind of error when running a service (e.g. gunicorn) from Upstart.
To fix that, set the environment variables in the Upstart job file:
env LANG=en_US.UTF-8
env LC_CTYPE=en_US.UTF-8
env LC_ALL=en_US.UTF-8
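This fix works because a service's locale comes from the environment it is started with. A hedged way to check that from Python is to start a fresh interpreter with chosen variables and ask it for its filesystem encoding; `child_fs_encoding` is a hypothetical helper. Results on Python 3.7+ may read utf-8 even for a C locale, so the difference is most visible with older interpreters.

```python
import os
import subprocess
import sys

def child_fs_encoding(extra_env):
    """Start a new Python interpreter with extra environment variables and
    return the filesystem encoding it reports."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# Same variables as the Upstart job above.
print(child_fs_encoding({
    "LANG": "en_US.UTF-8",
    "LC_CTYPE": "en_US.UTF-8",
    "LC_ALL": "en_US.UTF-8",
}))
```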