Pandas concat ValueError:缓冲区数据类型不匹配,预期为“Python 对象”但得到“长长”
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/25822604/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas concat ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
提问by lapolonio
I am trying to analyze the Gizette Datasetfrom a Feature Selection Challenge
我正在尝试从特征选择挑战中分析Gizette 数据集
when i try to concat the train dataframe with the label series based on pandas exampleit
当我尝试将火车数据框与基于Pandas示例的标签系列连接起来时
throws
投掷
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
ValueError:缓冲区数据类型不匹配,预期为“Python 对象”但得到“长长”
Code:
代码:
import pandas as pd
trainData = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.data'
,delim_whitespace=True
,header=None
,names=['AA','AB','AC','AD','AE','AF','AG','AH','AI','AJ','AK','AL','AM','AN','AO','AP','AQ','AR','AS','AT','AU','AV','AW','AX','AY','AZ','BA','BB','BC','BD','BE','BF','BG','BH','BI','BJ','BK','BL','BM','BN','BO','BP','BQ','BR','BS','BT','BU','BV','BW','BX','BY','BZ','CA','CB','CC','CD','CE','CF','CG','CH','CI','CJ','CK','CL','CM','CN','CO','CP','CQ','CR','CS','CT','CU','CV','CW','CX','CY','CZ','DA','DB','DC','DD','DE','DF','DG','DH','DI','DJ','DK','DL','DM','DN','DO','DP','DQ','DR','DS','DT','DU','DV','DW','DX','DY','DZ','EA','EB','EC','ED','EE','EF','EG','EH','EI','EJ','EK','EL','EM','EN','EO','EP','EQ','ER','ES','ET','EU','EV','EW','EX','EY','EZ','FA','FB','FC','FD','FE','FF','FG','FH','FI','FJ','FK','FL','FM','FN','FO','FP','FQ','FR','FS','FT','FU','FV','FW','FX','FY','FZ','GA','GB','GC','GD','GE','GF','GG','GH','GI','GJ','GK','GL','GM','GN','GO','GP','GQ','GR','GS','GT','GU','GV','GW','GX','GY','GZ','HA','HB','HC','HD','HE','HF','HG','HH','HI','HJ','HK','HL','HM','HN','HO','HP','HQ','HR','HS','HT','HU','HV','HW','HX','HY','HZ','IA','IB','IC','ID','IE','IF','IG','IH','II','IJ','IK','IL','IM','IN','IO','IP','IQ','IR','IS','IT','IU','IV','IW','IX','IY','IZ','JA','JB','JC','JD','JE','JF','JG','JH','JI','JJ','JK','JL','JM','JN','JO','JP','JQ','JR','JS','JT','JU','JV','JW','JX','JY','JZ','KA','KB','KC','KD','KE','KF','KG','KH','KI','KJ','KK','KL','KM','KN','KO','KP','KQ','KR','KS','KT','KU','KV','KW','KX','KY','KZ','LA','LB','LC','LD','LE','LF','LG','LH','LI','LJ','LK','LL','LM','LN','LO','LP','LQ','LR','LS','LT','LU','LV','LW','LX','LY','LZ','MA','MB','MC','MD','ME','MF','MG','MH','MI','MJ','MK','ML','MM','MN','MO','MP','MQ','MR','MS','MT','MU','MV','MW','MX','MY','MZ','NA','NB','NC','ND','NE','NF','NG','NH','NI','NJ','NK','NL','NM','NN','NO','NP','NQ','NR','NS','NT','NU','NV','NW','NX','NY','NZ','OA','OB','OC','OD','OE','OF','OG','OH','OI','OJ','OK','OL','OM','ON','OO','OP','OQ','OR','OS','OT','OU','OV','OW','OX','OY','OZ','PA','PB','PC','PD','PE','PF','PG','PH','PI','PJ','PK','PL','PM','PN','PO','PP','PQ','PR','PS','PT','PU','PV','PW','PX','PY','PZ','QA','QB','QC','QD','QE','QF','QG','QH','QI','QJ','QK','QL','QM','QN','QO','QP','QQ','QR','QS','QT','QU','QV','QW','QX','QY','QZ','RA','RB','RC','RD','RE','RF','RG','RH','RI','RJ','RK','RL','RM','RN','RO','RP','RQ','RR','RS','RT','RU','RV','RW','RX','RY','RZ','SA','SB','SC','SD','SE','SF','SG','SH','SI','SJ','SK','SL','SM','SN','SO','SP','SQ','SR','SS','ST','SU','SV','SW','SX','SY','SZ','TA','TB','TC','TD','TE','TF'])
# print 'finished with train data'
trainLabel = pd.read_table(filepath_or_buffer='GISETTE/gisette_train.labels'
,squeeze=True
,names=['label']
,delim_whitespace=True
,header=None)
trainData.info()
# outputs
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries
Columns: 500 entries, AA to TF
dtypes: int64(500)None
trainLabel.describe()
#outputs
count 6000.000000
mean 0.000000
std 1.000083
min -1.000000
25% -1.000000
50% 0.000000
75% 1.000000
max 1.000000
dtype: float64
readyToTrain = pd.concat([trainData, trainLabel], axis=1)
full stack trace
完整的堆栈跟踪
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 717, in concat
verify_integrity=verify_integrity)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 848, in __init__
self.new_axes = self._get_new_axes()
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 898, in _get_new_axes
new_axes[i] = self._get_comb_axis(i)
File "C:\env\Python27\lib\site-packages\pandas\tools\merge.py", line 924, in _get_comb_axis
return _get_combined_index(all_indexes, intersect=self.intersect)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3991, in _get_combined_index
union = _union_indexes(indexes)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 4017, in _union_indexes
result = result.union(other)
File "C:\env\Python27\lib\site-packages\pandas\core\index.py", line 3753, in union
uniq_tuples = lib.fast_unique_multiple([self.values, other.values])
File "lib.pyx", line 366, in pandas.lib.fast_unique_multiple (pandas\lib.c:8378)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'
edit: installed library from binary from lfd.uci.edu/~gohlke/pythonlibs pandas-0.14.1.win-amd64-py2.7
编辑:从 lfd.uci.edu/~gohlke/pythonlibs pandas-0.14.1.win-amd64-py2.7 的二进制安装库
tried suggestion to convert series to frame (did not work same stacktrace as above) frame info:
尝试将系列转换为框架的建议(与上面的堆栈跟踪不同)框架信息:
dataframe info (trainData)
数据框信息(trainData)
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 6000 entries, (550, 0, 495, 0, 0, 0, 0, 976, 0, 0, 0, 0, 983, 0, 995, 0, 983, 0, 0, 983, 0, 0, 0, 0, 0, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 983, 0, 0, 0, 0, 0, 0, 0, 0, 0, 808, 0, 778, 0, 983, 0, 0, 0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 991, 983, 983, 0, 0, 0, 0, 0, 0, 0, 983, 735, 0, 0, 983, 983, 0, 0, 0, 0, 569, 0, 0, 0, 0, 713, 0, 0, 0, 0, 0, 983, 983, 0, ...) to (0, 0, 991, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 948, 995, 348, 0, 0, 0, 0, 0, 0, 0, 0, 0, 751, 0, 0, 0, 0, 0, 0, 0, 0, 804, 0, 0, 0, 862, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 991, 0, 0, 0, 0, 995, 0, 0, 0, 0, 0, 0, 840, 0, 0, 0, 976, 0, 0, 0, 0, 0, 0, 777, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)
Columns: 500 entries, AA to TF
dtypes: int64(500)None
series to dataframe info (trainLabel):
系列到数据框信息(trainLabel):
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6000 entries, 0 to 5999
Data columns (total 1 columns):
label 6000 non-null int64
dtypes: int64(1)None
回答by The Unfun Cat
Like joris pointed out (and like I had to figure out myself because I did not read the comments first) the problems are your indexes.
就像 joris 指出的那样(就像我必须弄清楚自己,因为我没有先阅读评论)问题是您的索引。
Change your code from
更改您的代码
pd.concat(to_concat, axis=1)
to
到
pd.concat([s.reset_index(drop=True) for s in to_concat], axis=1)
回答by Samuel Padilla
I've read it is easier to just create the whole table as a nested list, and then create the pandas DataFrame using that nested list.
我读过将整个表创建为嵌套列表更容易,然后使用该嵌套列表创建 Pandas DataFrame。
回答by Kriti Pawar
This error is coming because of mismatch in index of both dataframe when you try to concat it. I figured out the way to concat this two dataframe. Try this:
当您尝试连接它时,由于两个数据帧的索引不匹配而出现此错误。我想出了连接这两个数据框的方法。尝试这个:
import pandas as pd
trainData = pd.read_table(filepath_or_buffer='/Users/embibe/Desktop/kaggle/gisette/gisette_train.data'
,delim_whitespace=True
,header=None
,names=['AA','AB','AC','AD','AE','AF','AG','AH','AI','AJ','AK','AL','AM','AN','AO','AP','AQ','AR','AS','AT','AU','AV','AW','AX','AY','AZ','BA','BB','BC','BD','BE','BF','BG','BH','BI','BJ','BK','BL','BM','BN','BO','BP','BQ','BR','BS','BT','BU','BV','BW','BX','BY','BZ','CA','CB','CC','CD','CE','CF','CG','CH','CI','CJ','CK','CL','CM','CN','CO','CP','CQ','CR','CS','CT','CU','CV','CW','CX','CY','CZ','DA','DB','DC','DD','DE','DF','DG','DH','DI','DJ','DK','DL','DM','DN','DO','DP','DQ','DR','DS','DT','DU','DV','DW','DX','DY','DZ','EA','EB','EC','ED','EE','EF','EG','EH','EI','EJ','EK','EL','EM','EN','EO','EP','EQ','ER','ES','ET','EU','EV','EW','EX','EY','EZ','FA','FB','FC','FD','FE','FF','FG','FH','FI','FJ','FK','FL','FM','FN','FO','FP','FQ','FR','FS','FT','FU','FV','FW','FX','FY','FZ','GA','GB','GC','GD','GE','GF','GG','GH','GI','GJ','GK','GL','GM','GN','GO','GP','GQ','GR','GS','GT','GU','GV','GW','GX','GY','GZ','HA','HB','HC','HD','HE','HF','HG','HH','HI','HJ','HK','HL','HM','HN','HO','HP','HQ','HR','HS','HT','HU','HV','HW','HX','HY','HZ','IA','IB','IC','ID','IE','IF','IG','IH','II','IJ','IK','IL','IM','IN','IO','IP','IQ','IR','IS','IT','IU','IV','IW','IX','IY','IZ','JA','JB','JC','JD','JE','JF','JG','JH','JI','JJ','JK','JL','JM','JN','JO','JP','JQ','JR','JS','JT','JU','JV','JW','JX','JY','JZ','KA','KB','KC','KD','KE','KF','KG','KH','KI','KJ','KK','KL','KM','KN','KO','KP','KQ','KR','KS','KT','KU','KV','KW','KX','KY','KZ','LA','LB','LC','LD','LE','LF','LG','LH','LI','LJ','LK','LL','LM','LN','LO','LP','LQ','LR','LS','LT','LU','LV','LW','LX','LY','LZ','MA','MB','MC','MD','ME','MF','MG','MH','MI','MJ','MK','ML','MM','MN','MO','MP','MQ','MR','MS','MT','MU','MV','MW','MX','MY','MZ','NA','NB','NC','ND','NE','NF','NG','NH','NI','NJ','NK','NL','NM','NN','NO','NP','NQ','NR','NS','NT','NU','NV','NW','NX','NY','NZ','OA','OB','OC','OD','OE','OF','OG','OH','OI','OJ','OK','OL','OM','ON','OO','OP','OQ','OR','OS','OT','OU','OV','OW','OX','OY','OZ','PA','PB','PC','PD','PE','PF','PG','PH','PI','PJ','PK','PL','PM','PN','PO','PP','PQ','PR','PS','PT','PU','PV','PW','PX','PY','PZ','QA','QB','QC','QD','QE','QF','QG','QH','QI','QJ','QK','QL','QM','QN','QO','QP','QQ','QR','QS','QT','QU','QV','QW','QX','QY','QZ','RA','RB','RC','RD','RE','RF','RG','RH','RI','RJ','RK','RL','RM','RN','RO','RP','RQ','RR','RS','RT','RU','RV','RW','RX','RY','RZ','SA','SB','SC','SD','SE','SF','SG','SH','SI','SJ','SK','SL','SM','SN','SO','SP','SQ','SR','SS','ST','SU','SV','SW','SX','SY','SZ','TA','TB','TC','TD','TE','TF'])
trainLabel = pd.read_table(filepath_or_buffer='/Users/embibe/Desktop/kaggle/gisette/gisette_train.labels'
,squeeze=True
,names=['label']
,delim_whitespace=True
,header=None)
trainData.describe()
trainData = trainData.reset_index()
trainData = trainData.iloc[:,-500:]
trainData.describe()
trainData.shape
trainLabel.shape
df = pd.concat([trainData, trainLabel], axis=1)
df.head()
Hope this will resolve your issue.
希望这能解决您的问题。

