Python UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/40619675/

Tags: python, python-2.7, web-scraping, python-unicode

Asked by dtrinh

I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor; however after the 308th restaurant I get the following error:

Traceback (most recent call last):
  File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py", line 43, in <module>
    writer.writerow(rest_array)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I tried several things I found on StackOverflow, but nothing has worked so far. It would be great if someone could take a look at my code and suggest a possible solution.

        for item in soup2.findAll('div', attrs={'class', 'title'}):
            if 'Cuisine' in item.text:
                item.text.strip()
                content = item.findNext('div', attrs=('class', 'content'))
                cuisine_type = content.text.encode('utf8', 'ignore').strip().split(r'\xa0')
        rest_array = [account_name, rest_address, postcode, phonenumber, cuisine_type]
        #print rest_array
        with open('ListingsPull-Amsterdam.csv', 'a') as file:
                writer = csv.writer(file)
                writer.writerow(rest_array)
    break

Answered by Laurent LAPORTE

The rest_array list contains unicode strings. When you use csv.writer to write rows, you need to serialise them to byte strings first (you are on Python 2.7).
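
To illustrate the point above, here is a minimal sketch that reproduces the same failure outside of csv. It assumes that Python 2.7's csv.writer falls back to the default ASCII codec when it is handed a unicode value; the restaurant name is made up so that the offending character sits at position 6, as in the traceback.

```python
# -*- coding: utf-8 -*-
# Hypothetical restaurant name: a right single quotation mark (U+2019) at index 6.
name = u"Mother\u2019s Kitchen"

try:
    # ASCII only covers code points 0..127, so U+2019 cannot be encoded.
    name.encode("ascii")
    failed_at = None
except UnicodeEncodeError as exc:
    # Index of the first character that could not be encoded.
    failed_at = exc.start

# Encoding explicitly with UTF-8 succeeds and yields a byte string.
encoded = name.encode("utf-8")
```

Here failed_at matches the "position 6" reported in the error message, and encoded is a byte string that csv.writer can write on Python 2.7.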

I suggest using the "utf8" encoding:

with open('ListingsPull-Amsterdam.csv', mode='a') as fd:
    writer = csv.writer(fd)
    rest_array = [text.encode("utf8") for text in rest_array]
    writer.writerow(rest_array)
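
One caveat, offered as a sketch rather than a definitive fix: in the question, cuisine_type is already an encoded byte string that has been split into a list, so calling .encode() on every item of rest_array may not work as is. The helper below (encode_cell is a hypothetical name, not part of csv) normalises each cell to a UTF-8 byte string, assuming list cells should be joined with ", " and that byte strings are already UTF-8.

```python
# -*- coding: utf-8 -*-
text_type = type(u"")  # unicode on Python 2, str on Python 3


def encode_cell(value, encoding="utf-8"):
    """Return one CSV cell as a byte string (hypothetical helper)."""
    if isinstance(value, (list, tuple)):
        # Decode any byte strings first so the pieces can be joined as text.
        parts = [v.decode(encoding) if isinstance(v, bytes) else text_type(v)
                 for v in value]
        value = u", ".join(parts)
    if isinstance(value, bytes):
        # Assumed to be encoded already; leave it untouched.
        return value
    return text_type(value).encode(encoding)


# Made-up row mixing a unicode string, a byte string and a list cell.
row = [u"Mother\u2019s Kitchen", b"1012 AB", [b"Dutch", u"Caf\u00e9"]]
encoded_row = [encode_cell(cell) for cell in row]
```

On Python 2.7, encoded_row could then be passed to writer.writerow() in place of the list comprehension above.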

Note: please don't use file as a variable name, because it shadows the built-in file() (in Python 2, the constructor of the file type, which behaves much like open()).

If you want to open this CSV file with Microsoft Excel, you may consider using another encoding, for instance "cp1252" (it can represent the u"\u2019" character).
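
As a quick check of the cp1252 suggestion, the sketch below encodes the same character with both codecs. The "replace" error handler on the last line is an assumption about how you might want to treat characters that cp1252 cannot represent; it is not part of the original answer.

```python
# -*- coding: utf-8 -*-
apostrophe = u"\u2019"

# cp1252 maps U+2019 to a single byte, UTF-8 to three bytes.
as_cp1252 = apostrophe.encode("cp1252")
as_utf8 = apostrophe.encode("utf-8")

# A character outside cp1252 (here a CJK character) would raise
# UnicodeEncodeError; with "replace" it is substituted by "?" instead.
fallback = u"\u4e2d".encode("cp1252", "replace")
```

The trade-off is that cp1252 only covers a limited Western European repertoire, so scraped names in other scripts would be lost or replaced.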

Answered by Irmen de Jong

You're writing non-ASCII characters to your CSV output file. Make sure you open the output file with a character encoding that can represent those characters. UTF-8 is often a safe bet. Try this:

with open('ListingsPull-Amsterdam.csv', 'a', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(rest_array)

Edit: this is for Python 3.x, sorry.
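
For readers who are on Python 3, a self-contained sketch of this approach might look like the following. The row contents and the temporary directory are made up for illustration, and newline="" is added because the csv documentation recommends it for files passed to csv.writer.

```python
import csv
import os
import tempfile

# Made-up row containing the character from the traceback (U+2019).
rest_array = ["Mother\u2019s Kitchen", "Example Street 1", "1012 AB"]

# Write into a temporary directory so the sketch leaves no stray files behind.
path = os.path.join(tempfile.mkdtemp(), "ListingsPull-Amsterdam.csv")

# In Python 3, open() takes the encoding directly and csv.writer accepts str.
with open(path, "a", encoding="utf-8", newline="") as fd:
    csv.writer(fd).writerow(rest_array)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8", newline="") as fd:
    rows = list(csv.reader(fd))
```

No manual .encode() call is needed here, since the file object performs the encoding.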
