Pandas concat failing
Original URL: http://stackoverflow.com/questions/35137952/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by user308827
I am trying to concatenate dataframes built from the following two CSV files:
df_a: https://www.dropbox.com/s/slcu7o7yyottujl/df_current.csv?dl=0
df_b: https://www.dropbox.com/s/laveuldraurdpu1/df_climatology.csv?dl=0
Both have the same number of columns, with the same names. However, when I do this:
pandas.concat([df_a, df_b])
I get the error:
AssertionError: Number of manager items must equal union of block items
# manager items: 20, # tot_items: 21
How to fix this?
Answer by phil_20686
I believe that this error occurs if the following two conditions are met:
- The data frames have different columns (i.e. (df1.columns == df2.columns) is False).
- The columns contain a repeated value (a duplicate column name).
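The two conditions above can be checked before calling pd.concat. The following is a minimal sketch. The helper name concat_risk is hypothetical, not a pandas function, and it only reports the two conditions described in this answer, not every possible cause of the error:

```python
import pandas as pd


def concat_risk(dfs):
    """Hypothetical helper: report the two conditions described above."""
    # Condition 2: at least one frame repeats a column name
    has_dup = any(df.columns.has_duplicates for df in dfs)
    # Condition 1: the frames do not all share an identical column Index
    first = dfs[0].columns
    differ = any(not df.columns.equals(first) for df in dfs[1:])
    return {"duplicate_columns": has_dup, "columns_differ": differ}


df1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
df2 = pd.DataFrame([[4, 5, 6]], columns=["B", "B", "C"])
print(concat_risk([df1, df2]))
```

If both flags are True, the combination matches the one this answer associates with the error.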
Basically, if you concat dataframes with columns [A,B,C] and [B,C,D], pandas can work out how to make one series for each distinct column name. But if I try to join a third dataframe with columns [B,B,C], it does not know which column to append to, and it ends up with fewer distinct columns than it thinks it needs.
If your dataframes are such that df1.columns == df2.columns, it will work anyway. So you can join [B,B,C] to [B,B,C], but not to [C,B,B]. When the columns are identical, pandas probably just uses the integer positions or something similar.
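The two working cases described in this answer can be illustrated with small made-up frames. This is only a sketch. Concatenating [B,B,C] with [C,B,B] is deliberately left out, because it raises an error whose exact exception type depends on the pandas version:

```python
import pandas as pd

# Distinct column names: concat aligns columns by name and fills gaps with NaN
abc = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
bcd = pd.DataFrame([[4, 5, 6]], columns=["B", "C", "D"])
aligned = pd.concat([abc, bcd])
print(aligned)

# Identical column Index, even with a repeated name: no alignment is needed
bbc = pd.DataFrame([[7, 8, 9]], columns=["B", "B", "C"])
stacked = pd.concat([bbc, bbc])
print(stacked)
```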
Answer by kmader
You can get around this issue with a "manual" concatenation. In this case, your list of dataframes is:
list_of_dfs = [df_a, df_b]
And instead of running
giant_concat_df = pd.concat(list_of_dfs, axis=0)
you can turn all of the dataframes into lists of dictionaries (one dictionary per row) and then build a new data frame from these lists, merged with itertools.chain:
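from itertools import chain
import pandas as pd
# One dict per row; keys are column names, so repeated names collapse into one key
list_of_dicts = [cur_df.to_dict(orient='records') for cur_df in list_of_dfs]
# Flatten all rows into one list; the new frame's columns are the union of the keys
giant_concat_df = pd.DataFrame(list(chain(*list_of_dicts)))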
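One caveat is worth checking before you rely on this workaround. Because every row becomes a Python dict, two columns that share a name cannot both survive: only one value per name is kept. The sketch below shows this with small made-up frames. The data is hypothetical, and pandas emits a UserWarning about non-unique columns, which the sketch silences:

```python
import warnings
from itertools import chain

import pandas as pd

# Hypothetical frames: df_b repeats the column name "B"
df_a = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
df_b = pd.DataFrame([[4, 5, 6]], columns=["B", "B", "C"])

with warnings.catch_warnings():
    # to_dict warns that non-unique columns will be omitted
    warnings.simplefilter("ignore", UserWarning)
    rows = list(chain(*[df.to_dict(orient="records") for df in [df_a, df_b]]))

manual = pd.DataFrame(rows)
print(manual)  # three columns, not four: one of the two "B" columns is gone
```

So the manual route avoids the AssertionError, but it silently merges duplicate names instead of keeping both columns.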
Answer by Karatheodory
Unfortunately, the source files are no longer available, so I can't check my solution in your case. In my case, the error occurred when:
- The data frames have two columns with the same name (I had ID and id columns, which I then converted to lower case, so they became the same).
- The value types of the same-named columns are different.
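Because the duplicates here came from lower-casing ID and id, such collisions can be spotted before the names are converted. The following is a minimal sketch. The helper name lowercase_collisions is hypothetical, and it assumes string column names:

```python
import pandas as pd


def lowercase_collisions(columns):
    """Hypothetical helper: return the names that become equal after .lower()."""
    lowered = pd.Index(columns).str.lower()
    # keep=False flags every member of a duplicated group, not just the repeats
    mask = lowered.duplicated(keep=False)
    return sorted(pd.Index(columns)[mask])


df = pd.DataFrame([[1, 2, 3]], columns=["ID", "id", "name"])
print(lowercase_collisions(df.columns))  # ['ID', 'id']
```

A non-empty result means lower-casing would produce a duplicate column name.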
Here is an example which gives me the error in question:
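import pandas as pd

# Both frames repeat the column name 'id' (string values, then ints), and their column sets differ
df1 = pd.DataFrame(data=[
    ['a', 'b', 'id', 1],
    ['a', 'b', 'id', 2]
], columns=['A', 'B', 'id', 'id'])
df2 = pd.DataFrame(data=[
    ['b', 'c', 'id', 1],
    ['b', 'c', 'id', 2]
], columns=['B', 'C', 'id', 'id'])
pd.concat([df1, df2])
# Error as reported at the time; newer pandas versions may raise a different exception
>>> AssertionError: Number of manager items must equal union of block items
# manager items: 4, # tot_items: 5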
Removing / renaming one of the columns makes this code work.
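The renaming can also be automated. The sketch below applies it to the same two frames as the example above. It assumes a simple suffix scheme (_1, _2, ...). dedupe_columns is a hypothetical helper, not a pandas function, and it does not guard against a generated name clashing with a name that already exists:

```python
import pandas as pd


def dedupe_columns(df):
    """Hypothetical helper: suffix repeated column names with _1, _2, ..."""
    seen = {}
    new_cols = []
    for col in df.columns:
        count = seen.get(col, 0)
        # First occurrence keeps its name; later ones get a numeric suffix
        new_cols.append(col if count == 0 else f"{col}_{count}")
        seen[col] = count + 1
    out = df.copy()
    out.columns = new_cols
    return out


df1 = pd.DataFrame([['a', 'b', 'id', 1], ['a', 'b', 'id', 2]],
                   columns=['A', 'B', 'id', 'id'])
df2 = pd.DataFrame([['b', 'c', 'id', 1], ['b', 'c', 'id', 2]],
                   columns=['B', 'C', 'id', 'id'])

result = pd.concat([dedupe_columns(df1), dedupe_columns(df2)])
print(result.columns.tolist())
```

With unique names on both sides, pd.concat simply aligns the columns by name, as in the [A,B,C] / [B,C,D] case from the first answer.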