Python Pandas concat yields ValueError: Plan shapes are not aligned
Original URL: http://stackoverflow.com/questions/26226343/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Lt.Fr0st
I am quite new to pandas. I am attempting to concatenate a set of dataframes, and I am getting this error:
ValueError: Plan shapes are not aligned
My understanding of .concat() is that it will join where the columns are the same, and fill any column it can't find in a given dataframe with NaN. That doesn't seem to be the case here.
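For frames whose column labels are unique, that understanding matches pandas' default behaviour (join="outer"). A minimal sketch, using two made-up frames named jun and jul, showing pd.concat taking the union of the columns and filling the gaps with NaN:

```python
import pandas as pd

# Two hypothetical frames with unique but only partly overlapping column labels
jun = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
jul = pd.DataFrame({"A": [5], "C": [6]})

# The default join="outer" keeps the union of the columns;
# cells for which a frame has no column are filled with NaN
combined = pd.concat([jun, jul], ignore_index=True)
print(combined)
```

Because the labels line up by name, the mismatched columns are not a problem in themselves; the trouble starts when a label is repeated inside one frame.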
Here's the concat statement:
dfs = [npo_jun_df, npo_jul_df, npo_may_df, npo_apr_df, npo_feb_df]
alpha = pd.concat(dfs)
Answer by user3805082
In case it helps, I have also hit this error when I tried to concatenate two data frames (and, as of the time of writing, this is the only related hit I can find on Google other than the source code).
I don't know whether this answer would have solved the OP's problem (since he/she didn't post enough information), but for me, this was caused when I tried to concat dataframe df1 with columns ['A', 'B', 'B', 'C'] (see the duplicate column headings?) with dataframe df2 with columns ['A', 'B']. Understandably, the duplication caused pandas to throw a wobbly. Changing df1 to ['A', 'B', 'C'] (i.e. dropping one of the duplicate columns) made everything work fine.
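A minimal sketch of that fix, assuming the repeated columns can simply be dropped and the first occurrence of each label kept. Index.duplicated() flags every repeat of a label after its first appearance, so negating it selects one column per name. The values below are made up; also note that the exact exception raised for duplicate labels has varied between pandas versions, so the sketch only shows the repaired concat.

```python
import pandas as pd

# df1 reproduces the duplicated header ['A', 'B', 'B', 'C'];
# the values are illustrative only
df1 = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "B", "C"])
df2 = pd.DataFrame([[5, 6]], columns=["A", "B"])

# columns.duplicated() is True for each repeat after a label's first
# occurrence, so the negated mask keeps exactly one column per label
df1_unique = df1.loc[:, ~df1.columns.duplicated()]

# With unique labels, concat aligns on names and fills the gaps with NaN
result = pd.concat([df1_unique, df2], ignore_index=True)
print(result)
```

Keeping the first occurrence is an arbitrary choice; if the duplicated columns hold different data, renaming them (rather than dropping) avoids losing values.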
Answer by William Welsh
I recently got this message too, and I found, like users @jason and @user3805082 above, that several of the hundreds of dataframes I was trying to concat, each with dozens of enigmatic varnames, had duplicate columns. Manually searching for duplicates was not practical.
In case anyone else has the same problem, I wrote the following function which might help out.
def duplicated_varnames(df):
    """Return a dict of all variable names that
    are duplicated in a given dataframe."""
    repeat_dict = {}
    var_list = list(df)  # list of varnames as strings
    for varname in var_list:
        # make a list of all instances of that varname
        test_list = [v for v in var_list if v == varname]
        # if more than one instance, report duplications in repeat_dict
        if len(test_list) > 1:
            repeat_dict[varname] = len(test_list)
    return repeat_dict
Then you can iterate over that dict to report how many duplicates there are, delete the duplicated variables, or rename them in some systematic way.
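A sketch of the "report, then rename in some systematic way" route for a whole list of frames, using only built-in pandas Index methods. The helper names report_duplicates and dedupe_columns are hypothetical, as is the _1, _2 suffix scheme; it assumes no suffixed name (such as x_1) already exists in the frame, otherwise the renaming could create a new collision.

```python
import pandas as pd


def report_duplicates(dfs):
    """Map each frame's position in dfs to {label: count} for repeated labels (hypothetical helper)."""
    report = {}
    for pos, df in enumerate(dfs):
        # value_counts() counts how often each label occurs in the header
        counts = df.columns.value_counts()
        repeated = counts[counts > 1]
        if not repeated.empty:
            report[pos] = repeated.to_dict()
    return report


def dedupe_columns(df):
    """Return a copy of df whose repeated labels get _1, _2, ... suffixes.

    Assumes none of the suffixed names already exists in df.
    """
    seen = {}
    new_names = []
    for name in df.columns:
        if name in seen:
            # a repeat: number it by how many times the label has appeared so far
            seen[name] += 1
            new_names.append(f"{name}_{seen[name]}")
        else:
            seen[name] = 0
            new_names.append(name)
    out = df.copy()
    out.columns = new_names
    return out


# Illustrative frames: the first one repeats the label "x"
frames = [
    pd.DataFrame([[1, 2, 3]], columns=["x", "y", "x"]),
    pd.DataFrame([[4, 5]], columns=["x", "y"]),
]
print(report_duplicates(frames))
clean = [dedupe_columns(f) for f in frames]
print(pd.concat(clean, ignore_index=True))
```

For a single frame, report_duplicates gives the same label counts as duplicated_varnames, but value_counts avoids rescanning the whole header once per column.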
Answer by DiMithras
I wrote a small function that merges duplicated columns by concatenating their values (as strings) into a single column. Note that it sorts the columns: if the original dataframe's columns are unsorted, the output's columns will be sorted.
import pandas as pd

def concat_duplicate_columns(df):
    """Merge columns that share a name by concatenating their values as strings.

    Columns without duplicates are kept unchanged; the output columns are
    sorted by name. Duplicates are matched by label rather than by position,
    so they do not need to be adjacent.
    """
    merged = {}
    # sorted() requires the column labels to be mutually comparable
    for name in sorted(df.columns.unique()):
        # every column carrying this label, in their original order
        block = df.loc[:, df.columns == name]
        if block.shape[1] == 1:
            # not duplicated: keep the column (and its dtype) as is
            merged[name] = block.iloc[:, 0]
            continue
        # concatenate the values of the duplicates onto the first one
        combined = block.iloc[:, 0].astype(str)
        for k in range(1, block.shape[1]):
            combined = combined + block.iloc[:, k].astype(str)
        merged[name] = combined
    return pd.DataFrame(merged, index=df.index)
Answer by user12326346
You need to have the same header names in all the dataframes you want to concat.
For example:
headername = list(df)  # column names of the reference dataframe
Data = Data.filter(headername)  # keep only those columns in Data
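A sketch of the same idea applied to a whole list of frames, assuming the shared header should be the columns present in every frame, in the order of the first one (align_to_common_columns is a hypothetical name). Selecting with .loc keeps that order explicitly; filter(items=...) likewise keeps only the listed labels that exist and silently skips the rest. Any column missing from even one frame is dropped, so check that nothing you need is lost. Passing join="inner" to pd.concat is a built-in way to keep only the shared columns.

```python
import pandas as pd


def align_to_common_columns(dfs):
    """Restrict every frame to the columns shared by all of them (hypothetical helper)."""
    # Keep the first frame's column order, dropping any label another frame lacks
    common = [c for c in dfs[0].columns if all(c in d.columns for d in dfs[1:])]
    return [d.loc[:, common] for d in dfs]


# Illustrative frames: "X" and "Y" each exist in only one of them
a = pd.DataFrame({"A": [1], "B": [2], "X": [3]})
b = pd.DataFrame({"B": [4], "A": [5], "Y": [6]})

aligned = align_to_common_columns([a, b])
out = pd.concat(aligned, ignore_index=True)
print(out)
```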

