pandas concat failing

Question

Asked by user308827

I am trying to concat dataframes based on the following two CSV files:

df_a: https://www.dropbox.com/s/slcu7o7yyottujl/df_current.csv?dl=0

df_b: https://www.dropbox.com/s/laveuldraurdpu1/df_climatology.csv?dl=0

Both of these have the same number and names of columns. However, when I do this:

pandas.concat([df_a, df_b])

I get the error:

AssertionError: Number of manager items must equal union of block items
# manager items: 20, # tot_items: 21

How can I fix this?

Answer 1

Answered by phil_20686

I believe that this error occurs if the following two conditions are met:

The data frames have different columns (i.e. (df1.columns == df2.columns) is False).
The columns contain a repeated name.

Basically, if you concat dataframes with columns [A,B,C] and [B,C,D], pandas can work out how to make one series for each distinct column name. But if I then try to join a third dataframe with columns [B,B,C], it does not know which column to append to and ends up with fewer distinct columns than it thinks it needs.

If your dataframes are such that df1.columns == df2.columns, then it will work anyway. So you can join [B,B,C] to [B,B,C], but not to [C,B,B]; when the columns are identical it probably just uses the integer positions or something similar.
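
To check whether your frames fall into this situation, something like the following might help. It is only a diagnostic sketch: describe_columns is a made-up helper name, and the two small frames stand in for df_a and df_b, since the original CSV files are not available.

```python
import pandas as pd

def describe_columns(df_a, df_b):
    """Report repeated column names and whether both frames share the same columns."""
    return {
        # Index.duplicated() marks every occurrence after the first one
        "duplicates_a": df_a.columns[df_a.columns.duplicated()].tolist(),
        "duplicates_b": df_b.columns[df_b.columns.duplicated()].tolist(),
        # Index.equals() compares names and their order
        "same_columns": bool(df_a.columns.equals(df_b.columns)),
    }

# Stand-in frames using the [B,B,C] / [C,B,B] layout from the answer
df_a = pd.DataFrame([[1, 2, 3]], columns=["B", "B", "C"])
df_b = pd.DataFrame([[4, 5, 6]], columns=["C", "B", "B"])

report = describe_columns(df_a, df_b)
print(report)
```

If either duplicates list is non-empty and same_columns is False, both conditions described above are met.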

Answer 2

Answered by kmader

You can get around this issue with a 'manual' concatenation. In this case your list of dataframes is:

list_of_dfs = [df_a, df_b]

And instead of running

giant_concat_df = pd.concat(list_of_dfs, axis=0)

You can turn all of the dataframes into lists of dictionaries and then build a new dataframe from those lists (merged with chain):

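from itertools import chain
import pandas as pd

# Turn each frame into one dictionary per row ({column: value}), then rebuild a single frame
list_of_dicts = [cur_df.T.to_dict().values() for cur_df in list_of_dfs]
giant_concat_df = pd.DataFrame(list(chain(*list_of_dicts)))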
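
For illustration, here is a small self-contained variant of the same idea. It is a sketch rather than the exact code above: manual_concat is a made-up name, it uses to_dict("records") instead of the transpose, and the two frames are toy data. One caveat worth keeping in mind: a dictionary can hold only one value per key, so if a frame has repeated column names, only one of the repeated columns can survive this round trip.

```python
from itertools import chain

import pandas as pd

def manual_concat(frames):
    """Rebuild one frame from the row dictionaries of several frames."""
    # to_dict("records") yields one {column: value} dictionary per row
    rows = chain.from_iterable(df.to_dict("records") for df in frames)
    return pd.DataFrame(list(rows))

# Toy frames with partly overlapping columns
df_a = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_b = pd.DataFrame({"B": [5], "C": [6]})

result = manual_concat([df_a, df_b])
print(result)
```

Columns missing from a given frame are filled with NaN in the rebuilt frame, and the row index is renumbered from zero.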

Answer 3

Answered by Karatheodory

Unfortunately, the source files are no longer available, so I can't check my solution against your case. In my case the error occurred when:

The data frames have two columns with the same name (I had ID and id columns, which I then converted to lower case, so they became the same)
The value types of the same-named columns are different

Here is an example which gives me the error in question:

df1 = pd.DataFrame(data=[
    ['a', 'b', 'id', 1],
    ['a', 'b', 'id', 2]
], columns=['A', 'B', 'id', 'id'])

df2 = pd.DataFrame(data=[
    ['b', 'c', 'id', 1],
    ['b', 'c', 'id', 2]
], columns=['B', 'C', 'id', 'id'])
pd.concat([df1, df2])
>>> AssertionError: Number of manager items must equal union of block items
 # manager items: 4, # tot_items: 5

Removing / renaming one of the columns makes this code work.
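
One possible way to do the renaming is sketched below. The make_unique helper and the _1 suffix scheme are my own illustration, not something pandas provides, and the scheme is naive: it would clash if a column called id_1 already existed.

```python
import pandas as pd

def make_unique(columns):
    """Append a numeric suffix to repeated names: id, id -> id, id_1."""
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    return result

df1 = pd.DataFrame(data=[
    ['a', 'b', 'id', 1],
    ['a', 'b', 'id', 2]
], columns=['A', 'B', 'id', 'id'])

df2 = pd.DataFrame(data=[
    ['b', 'c', 'id', 1],
    ['b', 'c', 'id', 2]
], columns=['B', 'C', 'id', 'id'])

# Give every column a distinct name before concatenating
df1.columns = make_unique(df1.columns)
df2.columns = make_unique(df2.columns)

combined = pd.concat([df1, df2], ignore_index=True, sort=False)
print(combined)
```

After the rename both frames have unique column names, so concat can align them by name without ambiguity.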
