How to display Chinese characters in a pandas DataFrame?

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/39308065/

Date: 2020-09-14 01:57:16  Source: igfitidea

python csv pandas encoding chinese-locale

Asked by Daniel

I can read a CSV file in which one column contains Chinese characters (the other columns contain English text and numbers). However, the Chinese characters don't display correctly. See the screenshot below.

[Image: screenshot from the original post, not included here]

I loaded the CSV file with pd.read_csv().

Neither display(data06_16) nor data06_16.head() displays the Chinese characters correctly.

I tried adding the following lines to my .bash_profile:

export LC_ALL=zh_CN.UTF-8
export LANG=zh_CN.UTF-8

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

but it didn't help.

I also tried passing the encoding argument to pd.read_csv():

pd.read_csv('data.csv', encoding='utf_8')
pd.read_csv('data.csv', encoding='utf_16')
pd.read_csv('data.csv', encoding='utf_32')

None of these worked.

How can I display the Chinese characters properly?

Answered by Daniel

I just remembered that the source dataset was created using encoding='GBK', so I tried again using:

data06_16 = pd.read_csv("../data/stocks1542monthly.csv", encoding="GBK")

Now I can see all the Chinese characters.
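When the source encoding is not known in advance, one option is to try a short list of candidate encodings and keep the first one that decodes the file without error. The following is a minimal sketch, not part of the original answer: the helper name guess_encoding, the candidate list, and the sample data are assumptions for illustration, and a clean decode is only a hint that the guess is right, not proof.

```python
import os
import tempfile


def guess_encoding(path, candidates=("utf-8", "gbk", "gb18030", "big5")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper for illustration only; the candidate list is an
    assumption and should be adapted to the data at hand.
    """
    with open(path, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


# Build a small GBK-encoded CSV (made-up sample data) to try the helper on.
with tempfile.NamedTemporaryFile("w", suffix=".csv", encoding="gbk",
                                 delete=False, newline="") as tmp:
    tmp.write("id,name\n1,中文\n")
    sample_path = tmp.name

detected = guess_encoding(sample_path)
print(detected)
os.remove(sample_path)
```

The detected name could then be passed on, for example as pd.read_csv(path, encoding=detected).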

Thanks, guys!

Answered by vlad.rad

I see three possibilities here:

1) You can try this:

import codecs
x = codecs.open("testdata.csv", "r", "utf-8")

2) Another possibility, in theory, is this:

import pandas as pd
df = pd.read_csv('testdata.csv', encoding='utf-8')  # read_csv already returns a DataFrame

3) Maybe you should convert your CSV file to UTF-8 before importing it with Python (for example, in Notepad++). Of course, this works for a one-time import, not for an automated process.
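For an automated workflow, the conversion described in option 3 can also be scripted rather than done by hand in an editor. Below is a minimal sketch using only the standard library. The function name convert_to_utf8, the file names, and the assumption that the source file is GBK-encoded are all illustrative and do not come from the original answer.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="gbk"):
    """Re-encode a text file to UTF-8.

    Hypothetical helper; src_encoding="gbk" is an assumption and must
    match the real encoding of the source file.
    """
    # newline="" keeps the line endings exactly as they are in the source.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Demonstration on a throwaway GBK file with made-up sample data.
workdir = tempfile.mkdtemp()
gbk_path = os.path.join(workdir, "testdata_gbk.csv")
utf8_path = os.path.join(workdir, "testdata_utf8.csv")
with open(gbk_path, "w", encoding="gbk", newline="") as f:
    f.write("id,name\n1,中文\n")

convert_to_utf8(gbk_path, utf8_path)

# The converted file can now be read as UTF-8.
with open(utf8_path, "r", encoding="utf-8", newline="") as f:
    converted_text = f.read()
print(converted_text)
```

After such a conversion, the file can be read with encoding='utf-8' as in option 2.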

Answered by blacksheep

Try this:

df = pd.read_csv(path, engine='python', encoding='utf-8-sig')
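The utf-8-sig codec matters when a file starts with a UTF-8 byte order mark (BOM), which some Windows tools add when saving UTF-8 text. The sketch below shows the difference at the decoding level, using only the standard library. The sample data is made up, and this only illustrates the codec itself, not how pandas handles the BOM internally.

```python
import codecs

# Bytes of a UTF-8 CSV that starts with a BOM (made-up sample data).
raw = codecs.BOM_UTF8 + "name,price\n中文,1\n".encode("utf-8")

plain = raw.decode("utf-8")          # the BOM survives as U+FEFF at the start
stripped = raw.decode("utf-8-sig")   # the BOM is removed during decoding

first_header_plain = plain.split(",")[0]
first_header_stripped = stripped.split(",")[0]
print(repr(first_header_plain))
print(repr(first_header_stripped))
```

Note that utf-8-sig only helps with UTF-8 files. A file saved as GBK, as in the accepted answer, still needs encoding="GBK".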