Original question: http://stackoverflow.com/questions/21946933/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page).
Pandas not recognizing csv columns
Asked by cubedNnoobed
I am using pandas to read .csv data files. For one of my files I can index by column title. For the other I get this error message:
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 1023, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named State'
The code I used is:
import pandas as pd

filename = "PovertyEstimates.csv"
#filename = "nm.csv"
f = open(filename)
data = pd.read_csv(f)  #, index_col=0)
print data['State']  # Python 2 print statement; this line raises the KeyError above
Even when I use index_col I get the same error (unless it is 0). I have found that when I print the non-working csv file in my terminal, it is not separated into columns like the one that works. Instead, the items in each row are printed consecutively, separated by spaces. I believe this incorrect separation is the problem.
I am using LibreOffice Calc on Ubuntu Linux. For the improperly formatted file (which appears in perfect format in LibreOffice) the terminal output is:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3194 entries, 0 to 3193
Data columns:
FIPStxt State   Area_name   Rural-urban_Continuum Code_2003       Urban_Influence_Code_2003 Rural-urban_Continuum Code_20013      Urban_Influence_Code_20013    POVALL_2011 CI90LBAll_2011    CI90UBALL_2011    PCTPOVALL_2011  CI90LBALLP_2011 CI90UBALLP_2011 POV017_2011 CI90LB017_2011  CI90UB017_2011  PCTPOV017_2011  CI90LB017P_2011 CI90UB017P_2011 POV517_2011 CI90LB517_2011  CI90UB517_2011  PCTPOV517_2011  CI90LB517P_2011 CI90UB517P_2011 MEDHHINC_2011   CI90LBINC_2011  CI90UBINC_2011  POV05_2011  CI90LB05_2011   CI90UB05_2011   PCTPOV05_2011   CI90LB05P_2011       CI90UB05P_2011    3194  non-null values
dtypes: object(1)
The first few lines of the csv file are:
FIPStxt State   Area_name   Rural-urban_Continuum Code_2003       
01000   AL  Alabama      
01001   AL  Autauga County  2   2
01003   AL  Baldwin County  4   5
Answered by Will
The spaces are probably the problem. You need to tell pandas what separator to use when parsing the CSV.
data = pd.read_csv(f, sep=" ")
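As a rough illustration of why the separator matters, here is a minimal Python 3 sketch. The in-memory sample is a trimmed, hypothetical stand-in for the real file (the column name `Code` and the single-space layout are assumptions made for the example), and `dtype=str` is added only to keep the leading zeros in `FIPStxt`:

```python
import io

import pandas as pd

# Hypothetical, trimmed stand-in for the real file: fields separated by ONE space.
sample = "FIPStxt State Code\n01001 AL 2\n01003 AL 4\n"

# Default separator is a comma, so each line is parsed as a single field.
default = pd.read_csv(io.StringIO(sample))
print(list(default.columns))  # a single column holding the whole header line

# With an explicit separator, the header splits into real columns.
fixed = pd.read_csv(io.StringIO(sample), sep=" ", dtype=str)
print(fixed["State"].tolist())
```

With the default separator there is no column called `State`, which matches the KeyError in the question.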
The problem, though, is that it will treat all spaces as valid separators (e.g. Alabama County becomes 2 columns). The best option would be to convert that one file to an actual comma-separated (or semicolon-separated, or similar) file, or to make sure that compound values are quoted ("Alabama County") and then specify the quotechar:
data = pd.read_csv(f, sep=" ", quotechar='"')
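A small Python 3 sketch of the quoting approach, assuming the file has already been rewritten so that compound values are wrapped in double quotes. The sample rows below are a hypothetical, trimmed version of the data in the question, not the actual file:

```python
import io

import pandas as pd

# Without quoting, splitting on single spaces breaks a compound value apart.
pieces = "01001 AL Autauga County".split(" ")
print(pieces)  # the county name ends up in two separate pieces

# Hypothetical sample where compound values are quoted.
sample_quoted = (
    "FIPStxt State Area_name\n"
    '01001 AL "Autauga County"\n'
    '01003 AL "Baldwin County"\n'
)

# quotechar tells the parser to keep the text inside the quotes as one field.
quoted = pd.read_csv(io.StringIO(sample_quoted), sep=" ", quotechar='"', dtype=str)
print(quoted["Area_name"].tolist())
```

Note that `'"'` is already the default `quotechar` in `read_csv`, so passing it explicitly mainly documents the intent.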

