Python Pandas.read_csv() with special characters (accents) in column names?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/39650407/

Date: 2020-08-19 22:32:05  Source: igfitidea

python pandas unicode utf-8 special-characters

Asked by farhawa

I have a CSV file that contains some data with these column names:

  • "PERIODE"
  • "IAS_brut"
  • "IAS_lissé"
  • "Incidence_Sentinelles"

I have a problem with the third one, "IAS_lissé", which is misinterpreted by the pd.read_csv() method and returned as ?.

What is that character?

Because it's causing a bug in my Flask application, is there a way to read that column another way without modifying the file?

In [1]: import pandas as pd

In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns

Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss?', u'Incidence_Sentinelles'], dtype='object')

Answered by shawnheide

You can change the encoding parameter of read_csv; see the pandas documentation for read_csv. The standard encodings that Python supports are listed in the Python documentation.

I believe for your example you can use the utf-8 encoding (assuming that your language is French).

df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')


Here's an example showing some sample output. All I did was make a CSV file with one column, using the problem characters.

df = pd.read_csv('sample.csv', encoding='utf-8')

Output:

    IAS_lissé
0   1
1   2
2   3
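
One plausible reason for the ? in the question is a mismatch between the bytes stored in the file and the encoding used to decode them. The sketch below uses only the standard library and assumes the header was saved as Latin-1; the question does not say which encoding the file actually uses, so treat this as an illustration rather than a diagnosis.

```python
# Assumption: the header was saved as Latin-1; the real file's encoding is not stated.
header = "IAS_lissé"

latin1_bytes = header.encode("latin-1")
utf8_bytes = header.encode("utf-8")

# "é" is a single byte in Latin-1 and two bytes in UTF-8.
print(latin1_bytes)  # b'IAS_liss\xe9'
print(utf8_bytes)    # b'IAS_liss\xc3\xa9'

# Decoding with the matching codec recovers the original name.
assert latin1_bytes.decode("latin-1") == header

# Strict UTF-8 decoding of the Latin-1 bytes raises an error...
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while lenient decoding substitutes U+FFFD, the Unicode replacement character.
lenient = latin1_bytes.decode("utf-8", errors="replace")
print(strict_ok, repr(lenient))
```

If the file is in fact UTF-8, encoding='utf-8' is the right choice; if it is Latin-1, the same argument would fail, and a Latin-1 codec would be needed instead.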

Answered by Francisco del Valle Bas

I ran into the same problem with Spanish and solved it with the "latin1" encoding:

import pandas as pd

pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='latin1')

Hope it helps!
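
When it is unclear whether a file is UTF-8 or Latin-1, one option is to try the candidates in order and keep the first that decodes cleanly. The helper below is a hypothetical sketch, not part of pandas; pick_encoding and the sample bytes are names invented for illustration. Latin-1 accepts every possible byte, so it only makes sense as the last candidate.

```python
def pick_encoding(raw, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample bytes standing in for the header line of the CSV file.
sample_latin1 = "PERIODE;IAS_brut;IAS_lissé".encode("latin-1")
sample_utf8 = "PERIODE;IAS_brut;IAS_lissé".encode("utf-8")

print(pick_encoding(sample_latin1))  # latin-1
print(pick_encoding(sample_utf8))    # utf-8
```

With a real file, the bytes could be read with open(path, "rb") and the returned name passed as the encoding argument of pd.read_csv. A clean decode does not prove the guess is right, so it is still worth checking the resulting column names.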

Answered by pantherentheitroade

Using utf-8 didn't work for me. E.g. this piece of code:

    import pandas as pd

    bla = pd.DataFrame(data=[1, 2])
    bla.to_csv('funkyNamé , things.csv')
    blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
    blabla

Ultimately returned: OSError: Initializing from file failed

I know you said you didn't want to modify the file. If you meant the file content rather than the filename, I would rename the file to something without an accent, read the CSV file under its new name, then rename it back to its original name.

    import os
    import pandas as pd

    originalfilepath = r'C:\Users\myself\funkyNamé , things.csv'
    originalfolder = r'C:\Users\myself'
    # os.path.join avoids the "\t" in "\tempName.csv" being read as a tab
    temppath = os.path.join(originalfolder, "tempName.csv")
    os.rename(originalfilepath, temppath)
    df = pd.read_csv(temppath, encoding='ISO-8859-1')
    os.rename(temppath, originalfilepath)

If you did mean "without modifying the filename", my apologies for not being helpful to you, and I hope this helps someone else.
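
For readers who cannot rename the file, a possible alternative is to open it with Python's built-in open() and pass the file object to pd.read_csv, which accepts file-like objects. The sketch below builds its own small test file in a temporary folder; the file contents and the choice of UTF-8 are assumptions for illustration, and whether this avoids the OSError on a given pandas version is not something the original answer confirms.

```python
import os
import tempfile

import pandas as pd

# Hypothetical setup: create a small CSV whose file name contains an accent.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "funkyNamé , things.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("PERIODE;IAS_lissé\n1;2\n")

# Let Python open the file, then hand the open handle to pandas,
# so pandas does not have to resolve the accented path itself.
with open(path, encoding="utf-8", newline="") as f:
    df = pd.read_csv(f, delimiter=";")

print(list(df.columns))
```

Because the file is already decoded by open(), the encoding is given there rather than to read_csv.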
