Python pandas: concat ignore_index doesn't work
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32801806/
Asked by muon
I am trying to column-bind dataframes and am having an issue with pandas concat, as ignore_index=True doesn't seem to work:
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])
df1
#     A   B   D
# 0  A0  B0  D0
# 2  A1  B1  D1
# 3  A2  B2  D2
# 4  A3  B3  D3
df2
#    A1   C  D2
# 5  A4  C4  D4
# 6  A5  C5  D5
# 7  A6  C6  D6
# 3  A7  C7  D7
dfs = [df1, df2]
df = pd.concat(dfs, axis=1, ignore_index=True)
print(df)
and the result is:
     0    1    2    3    4    5
0   A0   B0   D0  NaN  NaN  NaN
2   A1   B1   D1  NaN  NaN  NaN
3   A2   B2   D2   A7   C7   D7
4   A3   B3   D3  NaN  NaN  NaN
5  NaN  NaN  NaN   A4   C4   D4
6  NaN  NaN  NaN   A5   C5   D5
7  NaN  NaN  NaN   A6   C6   D6
Even if I reset the index using
df1.reset_index()
df2.reset_index()
and then try
pd.concat([df1,df2],axis=1)
it still produces the same result!
Accepted answer by cel
If I understood you correctly, this is what you would like to do.
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[4, 5, 6, 7])
df1.reset_index(drop=True, inplace=True)
df2.reset_index(drop=True, inplace=True)
df = pd.concat([df1, df2], axis=1)
Which gives:
    A   B   D  A1   C  D2
0  A0  B0  D0  A4  C4  D4
1  A1  B1  D1  A5  C5  D5
2  A2  B2  D2  A6  C6  D6
3  A3  B3  D3  A7  C7  D7
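A likely reason this works while the attempt in the question did not: reset_index returns a new DataFrame by default, so calling it without assigning the result (or without inplace=True) leaves the original index untouched. A minimal sketch with made-up two-row frames, not the data from the question:

```python
import pandas as pd

# Small illustrative frames with mismatched indexes (made-up data).
df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 2])
df2 = pd.DataFrame({'C': ['C4', 'C5']}, index=[5, 3])

# reset_index() returns a new DataFrame; without assignment df1 is unchanged.
df1.reset_index(drop=True)
unchanged_index = list(df1.index)

# Assigning the result replaces the index, so the rows line up by position.
left = df1.reset_index(drop=True)
right = df2.reset_index(drop=True)
side_by_side = pd.concat([left, right], axis=1)
print(side_by_side)
```

Assigning to new names (left, right) keeps the original frames intact, which can be easier to reason about than inplace=True.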
Actually, I would have expected df = pd.concat(dfs, axis=1, ignore_index=True) to give the same result.
This is the excellent explanation from jreback:
ignore_index=True 'ignores', meaning doesn't align on the joining axis. it simply pastes them together in the order that they are passed, then reassigns a range for the actual index (e.g. range(len(index))) so the difference between joining on non-overlapping indexes (assume axis=1 in the example), is that with ignore_index=False (the default), you get the concat of the indexes, and with ignore_index=True you get a range.
Answer by Alex
The ignore_index option is working in your example; you just need to know that it ignores the labels along the axis of concatenation, which in your case is the columns. (Perhaps a better name would be ignore_labels.) If you want the concatenation to ignore the index labels, then your axis variable has to be set to 0 (the default).
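To make the point about the axis of concatenation concrete, here is a small sketch comparing the two axes. The frames are made-up two-row examples, and the variable names wide and tall are only illustrative:

```python
import pandas as pd

# Made-up frames: they share the row label 2 but no column names.
df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 2])
df2 = pd.DataFrame({'C': ['C4', 'C5']}, index=[5, 2])

# axis=1: rows are still aligned on the index; only the column labels are reset.
wide = pd.concat([df1, df2], axis=1, ignore_index=True)
print(wide)

# axis=0: columns are aligned by name; the row labels are reset to 0..n-1.
tall = pd.concat([df1, df2], axis=0, ignore_index=True)
print(tall)
```

In both cases the other axis is still aligned by label, which is why NaN values appear wherever the labels do not match.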
Answer by Dickster
I agree with the comments: it is always best to post the expected output.
Is this what you are seeking?
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])
# Transposing, resetting the index, and transposing back renames the columns to 0, 1, 2
df1 = df1.transpose().reset_index(drop=True).transpose()
df2 = df2.transpose().reset_index(drop=True).transpose()
dfs = [df1, df2]
df = pd.concat(dfs, axis=0, ignore_index=True)
print(df)
    0   1   2
0  A0  B0  D0
1  A1  B1  D1
2  A2  B2  D2
3  A3  B3  D3
4  A4  C4  D4
5  A5  C5  D5
6  A6  C6  D6
7  A7  C7  D7
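The double transpose above is only there to replace the column names with positions. As a possible alternative, not part of the original answer, DataFrame.set_axis can relabel the columns directly. A sketch with smaller made-up frames:

```python
import pandas as pd

# Made-up frames with different column names and mismatched indexes.
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 2])
df2 = pd.DataFrame({'A1': ['A4', 'A5'], 'C': ['C4', 'C5']}, index=[5, 3])

# Replace the column names with positions 0..k-1 so the frames line up by position.
frames = [df.set_axis(list(range(df.shape[1])), axis=1) for df in (df1, df2)]

# Stack the rows and renumber the row labels.
stacked = pd.concat(frames, axis=0, ignore_index=True)
print(stacked)
```

This assumes every frame has the same number of columns in the same order; otherwise the positional relabeling would pair up unrelated columns.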
Answer by Yury Wallet
Thanks for asking. I had the same issue. For some reason, ignore_index=True doesn't help in my case. I wanted to keep the index from the first dataset and ignore the second index, and this worked for me:
# reset_index(..., inplace=True) returns None (which concat silently drops), so pass the returned copy instead
X_train = pd.concat([train_sp, X_train.reset_index(drop=True)], axis=1)
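Resetting only the second frame lines the rows up only when the first frame already has a default 0..n-1 index. If it does not, one option is to copy the first frame's index onto the second. The sketch below assumes both frames have the same number of rows in the same order; the data and column names f1 and f2 are hypothetical stand-ins for train_sp and X_train:

```python
import pandas as pd

# Hypothetical stand-ins for train_sp and X_train from the answer.
train_sp = pd.DataFrame({'f1': [1.0, 2.0, 3.0]}, index=[10, 20, 30])
X_train = pd.DataFrame({'f2': ['a', 'b', 'c']}, index=[7, 8, 9])

# Give the second frame the first frame's index labels, row by row,
# so that axis=1 alignment becomes purely positional.
aligned = X_train.set_axis(train_sp.index, axis=0)
combined = pd.concat([train_sp, aligned], axis=1)
print(combined)
```

The result keeps the labels 10, 20, 30 from the first frame and discards the second frame's original labels.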