Python Requests: encoding POST data

Question

Asked by TheMagician

Version: Python 2.7.3

Other libraries: Python-Requests 1.2.3, jinja2 (2.6)

I have a script that submits data to a forum and the problem is that non-ascii characters appear as garbage. For instance a name like André Téchiné comes out as Andr?? T??chin??.

Here's how the data is submitted:

1) Data is initially loaded from a UTF-8 encoded CSV file like so:

entries = []
with codecs.open(filename, 'r', 'utf-8') as f:
    for row in unicode_csv_reader(f.readlines()[1:]):
        entries.append(dict(zip(csv_header, row)))

unicode_csv_reader is from the bottom of Python CSV documentation page: http://docs.python.org/2/library/csv.html

When I inspect an entry's name in the interpreter, I see it as u'Andr\xe9 T\xe9chin\xe9'.

2) Next I render the data through jinja2:

tpl = tpl_env.get_template(u'forumpost.html')
rendered = tpl.render(entries=entries)

When I inspect rendered in the interpreter, I again see the same name: u'Andr\xe9 T\xe9chin\xe9'

Now, if I write the rendered variable to a file like this, it displays correctly:

with codecs.open('out.txt', 'a', 'utf-8') as f:
    f.write(rendered)

But I must send it to the forum:

3) In the POST request code I have:

params = {u'post': rendered}
headers = {u'content-type': u'application/x-www-form-urlencoded'}
session.post(posturl, data=params, headers=headers, cookies=session.cookies)

session is a Requests session.

The name is still displayed garbled in the forum post. I have tried the following:

Leave out headers
Encode rendered as rendered.encode('utf-8') (same result)
rendered = urllib.quote_plus(rendered) (comes out as all %XY)

If I type rendered.encode('utf-8') I see the following:

'Andr\xc3\xa9 T\xc3\xa9chin\xc3\xa9'

How could I fix the issue? Thanks.

Answer 1

Accepted answer by jfs

Your client behaves as it should. For example, running nc -l 8888 as a server and making a request:

import requests

requests.post('http://localhost:8888', data={u'post': u'Andr\xe9 T\xe9chin\xe9'})

shows:

POST / HTTP/1.1
Host: localhost:8888
Content-Length: 33
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/1.2.3 CPython/2.7.3

post=Andr%C3%A9+T%C3%A9chin%C3%A9

You can check that it is correct:

>>> import urllib
>>> urllib.unquote_plus(b"Andr%C3%A9+T%C3%A9chin%C3%A9").decode('utf-8')
u'Andr\xe9 T\xe9chin\xe9'
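
The same check can be sketched offline, without a listening server. The block below is a Python 3 approximation (the question uses Python 2.7, so the module names differ) that uses only the standard library; it does not call Requests itself, and is only assumed to mirror how a form body is built from a dict. It also shows what plausibly happens when the value is passed through quote_plus before being form-encoded, as tried in the question.

```python
from urllib.parse import urlencode, parse_qs, quote_plus

# Same text as u'Andr\xe9 T\xe9chin\xe9' in the Python 2 session.
name = "Andr\xe9 T\xe9chin\xe9"

# Form-encode the value: UTF-8 bytes, percent-escaped, spaces as '+'.
body = urlencode({"post": name})
print(body)       # the form body
print(len(body))  # compare with the Content-Length header above

# A server that decodes the body as UTF-8 recovers the original text.
decoded = parse_qs(body)["post"][0]
print(decoded == name)

# Quoting the value first and then form-encoding it escapes it twice:
# every '%' becomes '%25', so the server decodes back to literal %XY text.
double = urlencode({"post": quote_plus(name)})
print(double)
print(parse_qs(double)["post"][0])
```

If the printed body matches the one captured by nc, the client side is encoding the data as UTF-8, and the double-escaped variant would explain a post that shows up as raw %XY sequences.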

Check that the server decodes the request correctly. You could try specifying the charset:
```
headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
```
The body contains only ASCII characters, so this shouldn't hurt, and a correct server would ignore any parameters on the x-www-form-urlencoded type anyway. Look for the gory details in "URL-encoded form data".
Also check that the issue is not a display artefact, i.e., the value is stored correctly but displayed incorrectly.
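
To see what a server-side decoding mistake would look like, here is a minimal Python 3 sketch. It takes the percent-encoded value from the request above, recovers the raw bytes, and decodes them three ways. The two wrong decodings are hypothetical examples of a misconfigured server; how the actual forum decodes its input is not known from the question.

```python
from urllib.parse import unquote_to_bytes

# Value as it appears in the request body captured above.
body_value = "Andr%C3%A9+T%C3%A9chin%C3%A9"

# Undo form encoding: '+' means space, %XY means one raw byte.
raw = unquote_to_bytes(body_value.replace("+", " "))

# Correct: the bytes are UTF-8.
correct = raw.decode("utf-8")

# Hypothetical mistake 1: the server assumes Latin-1.
# Each two-byte UTF-8 sequence turns into two unrelated characters.
as_latin1 = raw.decode("latin-1")

# Hypothetical mistake 2: the server assumes ASCII and replaces
# every byte it cannot decode with a placeholder.
as_ascii = raw.decode("ascii", errors="replace").replace("\ufffd", "?")

print(correct)
print(as_latin1)
print(as_ascii)
```

Each é occupies two bytes in UTF-8, so a decoder that replaces undecodable bytes one at a time yields two placeholders per accented letter, which matches the Andr?? T??chin?? pattern reported in the question.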

Answer 2

Answer by dikkini

Try decoding the byte string from UTF-8 into unicode:

unicode(my_string_variable, "utf8")

or decode and encode:

sometext = gettextfromsomewhere().decode('utf-8')
env = jinja2.Environment(loader=jinja2.PackageLoader('jinjaapplication', 'templates'))
template = env.get_template('mypage.html')
print template.render( sometext = sometext ).encode('utf-8')
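
The decode-then-encode pattern above can be sketched in Python 3 without jinja2. In this illustration, str.format is only a stand-in for template rendering, and the byte string is the one shown in the question; neither is taken from this answer's code.

```python
# UTF-8 bytes as they would arrive from a file or socket.
raw = b"Andr\xc3\xa9 T\xc3\xa9chin\xc3\xa9"

# Decode once at the input boundary: bytes -> text.
text = raw.decode("utf-8")

# Work with text internally (stand-in for template.render).
page = "<p>{}</p>".format(text)

# Encode once at the output boundary: text -> bytes.
out = page.encode("utf-8")

print(repr(text))
print(out)
```

Keeping text as unicode everywhere between the two boundaries avoids mixing byte strings and unicode strings, which is where implicit ASCII conversions tend to fail in Python 2.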
