Python pandas concat ignore_index doesn't work

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/32801806/

Date: 2020-08-19 12:17:15  Source: igfitidea

Tags: python, pandas, append, concat

Asked by muon

I am trying to column-bind dataframes and am having an issue with pandas concat, as ignore_index=True doesn't seem to work:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])

df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])
df1
#     A   B   D
# 0  A0  B0  D0
# 2  A1  B1  D1
# 3  A2  B2  D2
# 4  A3  B3  D3

df2
#    A1   C  D2
# 5  A4  C4  D4
# 6  A5  C5  D5
# 7  A6  C6  D6
# 3  A7  C7  D7

dfs = [df1, df2]
df = pd.concat(dfs, axis=1, ignore_index=True)
print(df)

and the result is:

     0    1    2    3    4    5    
0   A0   B0   D0  NaN  NaN  NaN  
2   A1   B1   D1  NaN  NaN  NaN    
3   A2   B2   D2   A7   C7   D7   
4   A3   B3   D3  NaN  NaN  NaN  
5  NaN  NaN  NaN   A4   C4   D4  
6  NaN  NaN  NaN   A5   C5   D5  
7  NaN  NaN  NaN   A6   C6   D6           

Even if I reset the index using

df1.reset_index()
df2.reset_index()

and then try

pd.concat([df1, df2], axis=1)

it still produces the same result!

Accepted answer by cel

If I understood you correctly, this is what you would like to do:

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 2, 3,4])

df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                    index=[ 4, 5, 6 ,7])


df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)

df = pd.concat( [df1, df2], axis=1) 

Which gives:

    A   B   D  A1   C  D2
0  A0  B0  D0  A4  C4  D4
1  A1  B1  D1  A5  C5  D5
2  A2  B2  D2  A6  C6  D6
3  A3  B3  D3  A7  C7  D7
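A non-mutating variant of the same idea, as a minimal sketch. reset_index returns a new DataFrame unless inplace=True is passed, so calling df1.reset_index() without using the return value (as in the question) leaves df1 unchanged. The name side_by_side is only illustrative, and the frames are the ones from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])

# The return value is discarded here, so df1 keeps its original index.
df1.reset_index()
print(list(df1.index))  # [0, 2, 3, 4]

# Pass the reset copies straight to concat; drop=True discards the old labels
# instead of turning them into a column.
side_by_side = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)
print(side_by_side)
```

Because both copies now carry the index 0..3, the rows are paired by position and no NaN padding appears; df1 and df2 themselves are left untouched.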

Actually, I would have expected df = pd.concat(dfs, axis=1, ignore_index=True) to give the same result.

This is the excellent explanation from jreback:

ignore_index=True 'ignores', meaning it doesn't align on the joining axis. It simply pastes them together in the order that they are passed, then reassigns a range for the actual index (e.g. range(len(index))). So the difference between joining on non-overlapping indexes (assume axis=1 in the example) is that with ignore_index=False (the default) you get the concat of the indexes, and with ignore_index=True you get a range.
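A small sketch of what this explanation describes for axis=1, using two made-up single-column frames (left, right, kept and ranged are illustrative names, not from the original post). Only the labels along the concatenation axis, here the columns, are replaced by a range; the row index is still aligned as usual.

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 2])
right = pd.DataFrame({'C': ['C0', 'C1']}, index=[5, 6])

# Default (ignore_index=False): the column labels are concatenated.
kept = pd.concat([left, right], axis=1)
# ignore_index=True: the column labels are replaced by 0..n-1.
ranged = pd.concat([left, right], axis=1, ignore_index=True)

print(list(kept.columns))    # ['A', 'C']
print(list(ranged.columns))  # [0, 1]

# The row labels are untouched by ignore_index: they are the union of both
# indexes, so rows that exist in only one frame are padded with NaN.
print(ranged)
```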

Answered by Alex

The ignore_index option is working in your example; you just need to know that it is ignoring the axis of concatenation, which in your case is the columns. (Perhaps a better name would be ignore_labels.) If you want the concatenation to ignore the index labels, then your axis argument has to be set to 0 (the default).
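To illustrate the axis=0 case mentioned here, a minimal sketch with made-up frames (top, bottom and stacked are illustrative names): when stacking rows, ignore_index=True discards the row labels, including the duplicated label 2, and assigns a fresh range.

```python
import pandas as pd

top = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 2])
bottom = pd.DataFrame({'A': ['A2', 'A3']}, index=[2, 7])

# axis=0 stacks rows; the row labels lie along the concatenation axis,
# so ignore_index=True replaces them with 0..n-1.
stacked = pd.concat([top, bottom], axis=0, ignore_index=True)

print(list(stacked.index))  # [0, 1, 2, 3]
print(list(stacked['A']))   # ['A0', 'A1', 'A2', 'A3']
```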

Answered by Dickster

I agree with the comments: it is always best to post the expected output.

Is this what you are seeking?

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                    index=[0, 2, 3,4])

df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                    index=[ 5, 6, 7,3])


df1 = df1.transpose().reset_index(drop=True).transpose()
df2 = df2.transpose().reset_index(drop=True).transpose()


dfs = [df1, df2]
df = pd.concat(dfs, axis=0, ignore_index=True)

print(df)



    0   1   2
0  A0  B0  D0
1  A1  B1  D1
2  A2  B2  D2
3  A3  B3  D3
4  A4  C4  D4
5  A5  C5  D5
6  A6  C6  D6
7  A7  C7  D7
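The double transpose above only serves to replace the column labels with positions. A possibly simpler sketch of the same result, assuming the goal is just positional column labels; with_positional_columns is a hypothetical helper introduced for illustration, not a pandas function.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])


def with_positional_columns(frame):
    """Return a copy of frame whose column labels are 0..n-1 (hypothetical helper)."""
    out = frame.copy()
    out.columns = range(out.shape[1])
    return out


# With matching column labels, axis=0 stacks the frames column by column,
# and ignore_index=True renumbers the rows.
stacked = pd.concat(
    [with_positional_columns(df1), with_positional_columns(df2)],
    axis=0,
    ignore_index=True,
)
print(stacked)
```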

Answered by Yury Wallet

Thanks for asking. I had the same issue. For some reason "ignore_index=True" doesn't help in my case. I wanted to keep the index from the first dataset and ignore the second index, and this worked for me:

# reset_index(..., inplace=True) returns None, so pass the returned copy to concat instead
X_train = pd.concat([train_sp, X_train.reset_index(drop=True)], axis=1)
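One caveat: reset_index with inplace=True returns None rather than a DataFrame. A minimal sketch of one way to keep the first frame's index and ignore the second one, assuming both frames have the same number of rows in the same order. The data below is made up, and aligned and combined are illustrative names.

```python
import pandas as pd

# Made-up stand-ins for the answer's train_sp and X_train.
train_sp = pd.DataFrame({'s': [1, 2, 3]}, index=[10, 20, 30])
X_train = pd.DataFrame({'x': [0.1, 0.2, 0.3]}, index=[7, 8, 9])

# With inplace=True the frame is modified and the call returns None.
returned = X_train.copy().reset_index(drop=True, inplace=True)
print(returned)  # None

# Relabel a copy of the second frame with the first frame's index,
# then concatenate column-wise; rows are paired by position.
aligned = X_train.copy()
aligned.index = train_sp.index
combined = pd.concat([train_sp, aligned], axis=1)
print(combined)
```

Assigning the index directly raises a ValueError if the two frames differ in length, which acts as a useful guard against silently misaligned rows.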