Python Pandas concat produces ValueError: Plan shapes are not aligned

Question

Asked by Lt.Fr0st

I am quite new to pandas. I am attempting to concatenate a set of dataframes and I am getting this error:

ValueError: Plan shapes are not aligned

My understanding of .concat() is that it will join where columns are the same, and that any columns it can't find in a given dataframe will be filled with NA. This doesn't seem to be the case here.

Here's the concat statement:

dfs = [npo_jun_df, npo_jul_df, npo_may_df, npo_apr_df, npo_feb_df]
alpha = pd.concat(dfs)

Answer 1

Answered by user3805082

In case it helps, I also hit this error when I tried to concatenate two dataframes (and at the time of writing, this is the only related hit I can find on Google other than the source code).

I don't know whether this answer would have solved the OP's problem (since he/she didn't post enough information), but for me, this was caused when I tried to concat dataframe df1 with columns ['A', 'B', 'B', 'C'] (see the duplicate column headings?) with dataframe df2 with columns ['A', 'B']. Understandably, the duplication caused pandas to throw a wobbly. Change df1 to ['A', 'B', 'C'] (i.e. drop one of the duplicate columns) and everything works fine.
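
A minimal sketch of that fix, assuming two small hypothetical frames `df1` and `df2` with the column layouts described above (the values are made up for illustration). It keeps only the first occurrence of each column name before concatenating; whether dropping the later duplicates is appropriate depends on your data.

```python
import pandas as pd

# Hypothetical frames mirroring the column layouts described above.
df1 = pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "B", "C"])
df2 = pd.DataFrame([[5, 6]], columns=["A", "B"])

# Keep only the first occurrence of each column name.
df1_unique = df1.loc[:, ~df1.columns.duplicated()]

# With unique column names, concat aligns on columns and fills gaps with NaN.
result = pd.concat([df1_unique, df2], ignore_index=True)
print(result)
```

Here `result` has the columns A, B and C, with NaN in column C for the row that came from `df2`.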

Answer 2

Answered by William Welsh

I recently got this message too, and I found, like users @jason and @user3805082 above, that I had duplicate columns in several of the hundreds of dataframes I was trying to concat, each with dozens of enigmatic varnames. Manually searching for duplicates was not practical.

In case anyone else has the same problem, I wrote the following function which might help out.

def duplicated_varnames(df):
    """Return a dict of all variable names that 
    are duplicated in a given dataframe."""
    repeat_dict = {}
    var_list = list(df) # list of varnames as strings
    for varname in var_list:
        # make a list of all instances of that varname
        test_list = [v for v in var_list if v == varname] 
        # if more than one instance, report duplications in repeat_dict
        if len(test_list) > 1: 
            repeat_dict[varname] = len(test_list)
    return repeat_dict

Then you can iterate over that dict to report how many duplicates there are, delete the duplicated variables, or rename them in some systematic way.
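
pandas can also report duplicates directly through `Index.duplicated()`. The sketch below is an alternative to the function above, not the answerer's code: it checks a whole list of dataframes and reports which ones have repeated column names. The helper name `frames_with_duplicate_columns` and the sample frames are hypothetical.

```python
import pandas as pd

def frames_with_duplicate_columns(dfs):
    """Map the position of each dataframe in dfs to its repeated column names."""
    report = {}
    for position, df in enumerate(dfs):
        # duplicated() marks every occurrence of a name after the first
        repeated = df.columns[df.columns.duplicated()].unique().tolist()
        if repeated:
            report[position] = repeated
    return report

# Hypothetical sample data: only the second frame has a repeated name.
dfs = [
    pd.DataFrame([[1, 2]], columns=["A", "B"]),
    pd.DataFrame([[1, 2, 3, 4]], columns=["A", "B", "B", "C"]),
]
print(frames_with_duplicate_columns(dfs))
```

Running this check before `pd.concat(dfs)` narrows the search to the frames that actually need attention.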

Answer 3

Answered by DiMithras

I wrote a small function that concatenates the values of columns sharing a duplicated name. Note that the function sorts the columns: if the original dataframe's columns are unsorted, the output's columns will be sorted.

import re

def concat_duplicate_columns(df):
    dupli = {}
    # populate dictionary with column names and count for duplicates
    for column in df.columns:
        dupli[column] = dupli[column] + 1 if column in dupli else 1
    # rename duplicated keys with °°° number suffix
    for key, val in dict(dupli).items():
        del dupli[key]
        if val > 1:
            for i in range(val):
                dupli[key+'°°°'+str(i)] = val
        else:
            dupli[key] = 1
    # rename columns so that we can now access ambiguous column names
    # note: the dict order matches the table only if duplicated columns are adjacent
    df.columns = list(dupli)
    # for each duplicated column name
    for i in set(re.sub('°°°(.*)','',j) for j in dupli if '°°°' in j):
        # for each duplicate of a column name
        for k in range(dupli[i+'°°°0']-1):
            # concatenate values in duplicated columns
            df[i+'°°°0'] = df[i+'°°°0'].astype(str) + df[i+'°°°'+str(k+1)].astype(str)
            # drop duplicated columns from which we have acquired data
            df = df.drop(columns=i+'°°°'+str(k+1))
    # strip the suffixes, then sort the columns by name
    df.columns = [re.sub('°°°(.*)','',c) for c in df.columns]
    df = df.reindex(columns=sorted(df.columns))
    return df

Answer 4

Answered by user12326346

All the dataframes you want to concat need to have the same header names.

For example:

headername = list(df)

Data = Data.filter(headername)
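
A sketch of how this might look end to end, assuming `df` is the reference dataframe whose headers you want to keep and `Data` is the one being trimmed (the sample values are invented). `filter(items=...)` keeps only the listed columns that are present, whereas `reindex(columns=...)` also adds any missing ones, filled with NaN, which may be closer to "the same header names" if `Data` lacks some of them.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1], "B": [2]})
Data = pd.DataFrame({"B": [3], "C": [4]})

headername = list(df)

# filter keeps only the listed columns that are present in Data.
filtered = Data.filter(items=headername)

# reindex returns exactly the listed columns, adding missing ones as NaN.
reindexed = Data.reindex(columns=headername)

print(list(filtered.columns))
print(list(reindexed.columns))
```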
