Pandas.read_csv() with special characters (accents) in column names?
Source: http://stackoverflow.com/questions/39650407/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by farhawa
I have a CSV file that contains some data with the following column names:
- "PERIODE"
- "IAS_brut"
- "IAS_lissé"
- "Incidence_Sentinelles"
I have a problem with the third one, "IAS_lissé", which is misinterpreted by the pd.read_csv() method: the é comes back as ?.
What is that character?
Because it is causing a bug in my Flask application, is there a way to read that column some other way without modifying the file?
In [1]: import pandas as pd
In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns
Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss?', u'Incidence_Sentinelles'], dtype='object')
Answered by shawnheide
You can change the encoding parameter of read_csv; see the pandas documentation for details. The encodings that Python supports are listed under the standard encodings in the Python documentation.
I believe that for your example you can use the utf-8 encoding (assuming that your language is French).
df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')
Here's an example showing some sample output. All I did was make a csv file with one column, using the problem characters.
df = pd.read_csv('sample.csv', encoding='utf-8')
Output:
IAS_lissé
0 1
1 2
2 3
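To illustrate why the encoding argument matters, here is a minimal standard-library sketch of how the column name looks as raw bytes under two encodings, and what happens when the wrong codec is used to decode them. It assumes the file in the question was saved as Latin-1, which the question does not confirm; the variable names are only illustrative.

```python
# Illustrative sketch: the same column name as bytes under two encodings.
name = "IAS_liss\u00e9"  # "IAS_lissé", written with an escape so the source stays ASCII

utf8_bytes = name.encode("utf-8")      # é becomes two bytes: 0xC3 0xA9
latin1_bytes = name.encode("latin-1")  # é becomes one byte: 0xE9

# Decoding with the matching codec round-trips cleanly.
assert utf8_bytes.decode("utf-8") == name
assert latin1_bytes.decode("latin-1") == name

# Decoding Latin-1 bytes as UTF-8 fails, because a lone 0xE9 is not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    strict_result = "decoded"
except UnicodeDecodeError:
    strict_result = "UnicodeDecodeError"

# With errors="replace" the bad byte becomes U+FFFD, the replacement character.
lenient = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_bytes, latin1_bytes, strict_result, ascii(lenient))
```

A substituted character like the "?" in the question is plausibly the result of this kind of mismatch, so the encoding passed to read_csv needs to match how the file was actually saved.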
Answered by Francisco del Valle Bas
I found the same problem with Spanish and solved it with the "latin1" encoding:
import pandas as pd
pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";", encoding='latin1')
Hope it helps!
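If you are not sure whether a file is UTF-8 or Latin-1, one rough approach is to try decoding its bytes with each candidate in turn. The sketch below uses only the standard library; guess_encoding is a hypothetical helper name, not a pandas function, and the sample header is taken from the column names in the question.

```python
import os
import tempfile


def guess_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes the whole file (hypothetical helper)."""
    with open(path, "rb") as handle:
        raw = handle.read()
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("none of the candidate encodings could decode the file")


# Demo: write the question's header line as Latin-1 bytes to a temporary file.
header = "PERIODE;IAS_brut;IAS_liss\u00e9;Incidence_Sentinelles\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(header.encode("latin-1"))
    sample_path = tmp.name

detected = guess_encoding(sample_path)  # UTF-8 fails on the 0xE9 byte, so Latin-1 is chosen
print(detected)
os.remove(sample_path)
```

The returned name can then be passed as the encoding argument of read_csv. Keep Latin-1 last in the list: it accepts any byte sequence, so a successful Latin-1 decode does not prove the guess is right.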
Answered by pantherentheitroade
Using utf-8 didn't work for me. E.g. this piece of code:
bla = pd.DataFrame(data = [1, 2])
bla.to_csv('funkyNamé , things.csv')
blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
blabla
This ultimately raised: OSError: Initializing from file failed
I know you said you didn't want to modify the file. If you meant the file's content rather than its name, I would rename the file to something without an accent, read the CSV file under its new name, and then rename it back to its original name.
import os
import pandas as pd

originalfilepath = r'C:\Users\myself\funkyNamé , things.csv'
originalfolder = r'C:\Users\myself'
temppath = os.path.join(originalfolder, 'tempName.csv')  # avoids "\t" being read as a tab
os.rename(originalfilepath, temppath)  # temporarily use an accent-free name
df = pd.read_csv(temppath, encoding='ISO-8859-1')
os.rename(temppath, originalfilepath)  # restore the original name
If you did mean "without modifying the filename", my apologies for not being helpful to you, and I hope this helps someone else.
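One weakness of the rename approach is that the file keeps its temporary name if the read fails partway through. A possible way around that, sketched here with only the standard library, is to wrap the rename in a context manager that restores the name in a finally block. The helper name temporarily_renamed and the demo file are assumptions for illustration, not part of the original answer.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporarily_renamed(path, temp_name):
    """Rename path to temp_name in the same folder, then restore it on exit (hypothetical helper)."""
    temp_path = os.path.join(os.path.dirname(path), temp_name)
    os.rename(path, temp_path)
    try:
        yield temp_path
    finally:
        # Runs even if the body raises, so the original name always comes back.
        os.rename(temp_path, path)


# Demo in a throwaway folder; the accented name mirrors the one in the answer.
folder = tempfile.mkdtemp()
original = os.path.join(folder, "funkyNam\u00e9 , things.csv")
with open(original, "w", encoding="utf-8") as handle:
    handle.write("a;b\n1;2\n")

with temporarily_renamed(original, "tempName.csv") as temp_path:
    # A pd.read_csv(temp_path, ...) call would go here; plain open() keeps the sketch stdlib-only.
    with open(temp_path, encoding="utf-8") as handle:
        first_line = handle.readline().strip()

print(first_line, os.path.exists(original))
```

Inside the with block the file exists only under its temporary name, so anything else that expects the original name at that moment will not find it.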