Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/29221502/

Date: 2020-09-13 23:05:15  Source: igfitidea

Pandas selecting discontinuous columns from a dataframe

python, pandas

Asked by dartdog

I am using the following to select specific columns from the dataframe comb, which I would like to bring into a new dataframe. The individual selections work fine, e.g. comb.ix[:,0:1], but when I attempt to combine them using + I get a bad result: the first selection ([:,0:1]) gets stuck on the end of the dataframe, and the values contained in the original column 1 are wiped out while appearing at the end of the row. What is the right way to get just the columns I want? (I'd include sample data but, as you may see, there are too many columns, which is why I'm trying to do it this way.)

comb.ix[:,0:1]+comb.ix[:,17:342]

Answered by EdChum

If you want to concatenate a sub selection of your df columns then use pd.concat:

pd.concat([comb.ix[:,0:1],comb.ix[:,17:342]], axis=1)

As long as the indices of the selections match, the rows will align correctly.
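To illustrate the difference between `+` and `pd.concat`, here is a minimal sketch on a small made-up frame. The column names and values are assumptions for illustration only (not the asker's data), and `.iloc` stands in for `.ix`.

```python
import pandas as pd

# Hypothetical 3-row frame with six columns; not the asker's data.
df = pd.DataFrame({f"col{i}": [i, i + 10, i + 20] for i in range(6)})

left = df.iloc[:, 0:1]   # col0
right = df.iloc[:, 3:5]  # col3, col4

# `+` is element-wise addition with label alignment: the result carries the
# union of the column labels, and any column missing on one side becomes NaN.
added = left + right

# concat with axis=1 places the selections side by side, aligned on the index.
combined = pd.concat([left, right], axis=1)

print(sorted(added.columns))   # the union of the labels
print(added.isna().all().all())
print(list(combined.columns))
```

Because the two selections share no column labels here, every value in `added` is NaN, while `combined` keeps the original values of the three selected columns.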

Thanks to @iHightower for pointing out that you can also sub-select by passing the labels:

pd.concat([df.ix[:,'Col1':'Col5'],df.ix[:,'Col9':'Col15']],axis=1)

Note that .ix will be deprecated in a future version (it was deprecated in pandas 0.20 and removed in 1.0), so the following, which uses .loc, should work instead:

In [115]:
df = pd.DataFrame(columns=['col' + str(x) for x in range(10)])
df

Out[115]:
Empty DataFrame
Columns: [col0, col1, col2, col3, col4, col5, col6, col7, col8, col9]
Index: []

In [118]:
pd.concat([df.loc[:, 'col2':'col4'], df.loc[:, 'col7':'col8']], axis=1)
Out[118]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []

Or using iloc:

In [127]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')]], axis=1)

Out[127]:
Empty DataFrame
Columns: [col2, col3, col7]
Index: []

Note that iloc slicing is half-open (the start position is included, the end position is not), unlike label slicing with loc, which includes both ends. To include the last column of interest you therefore have to slice up to the position one past it:

In [128]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')+1], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')+1]], axis=1)

Out[128]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []
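The `get_loc(...) + 1` pattern above can be wrapped in a small function. This is only a sketch: `select_label_ranges` is a hypothetical helper name, not part of pandas, and it assumes the column labels are unique so that `get_loc` returns a single integer.

```python
import pandas as pd

def select_label_ranges(frame, ranges):
    """Hypothetical helper: select several inclusive (first, last) label ranges."""
    positions = []
    for first, last in ranges:
        start = frame.columns.get_loc(first)
        # +1 because positional ranges exclude their end point.
        stop = frame.columns.get_loc(last) + 1
        positions.extend(range(start, stop))
    return frame.iloc[:, positions]

df = pd.DataFrame(columns=['col' + str(x) for x in range(10)])
subset = select_label_ranges(df, [('col2', 'col4'), ('col7', 'col8')])
print(list(subset.columns))  # ['col2', 'col3', 'col4', 'col7', 'col8']
```

Collecting all positions first and calling `iloc` once avoids building and concatenating intermediate frames.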

Answered by neves

NumPy has a nice index helper named r_ (np.r_), which turns slices and scalars into a single array of positions and lets you solve this with the modern DataFrame selection interface, iloc:

df.iloc[:, np.r_[0:1, 17:342]]

I believe this is a more elegant solution.

The method even supports more complex selections:

df.iloc[:, np.r_[0:1, 5, 16, 17:342:2, -5:0]]  # -5:0 expands to the last five positions (-5 .. -1)
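To see what `np.r_` hands to `iloc`, it can help to print the expanded positions. The sketch below uses a hypothetical 12-column frame and smaller ranges than the answer, purely for illustration.

```python
import numpy as np
import pandas as pd

# np.r_ flattens slices and scalars into one array of integer positions.
# Negative positions count from the end, as in ordinary Python indexing.
positions = np.r_[0:2, 5, 6:10:2, -2:0]
print(positions.tolist())  # [0, 1, 5, 6, 8, -2, -1]

# Hypothetical one-row frame with 12 columns, used only to show the selection.
df = pd.DataFrame([range(12)], columns=[f"col{i}" for i in range(12)])
selected = df.iloc[:, positions]
print(list(selected.columns))
# ['col0', 'col1', 'col5', 'col6', 'col8', 'col10', 'col11']
```

Note that `np.r_` does not know the width of the frame, so ranges have to be written with explicit end points.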

Answered by David Hernandez Mendez

I recently solved it by just appending ranges:

r1 = pd.Series(range(5))
r2 = pd.Series([10,15,20])
final_range = pd.concat([r1, r2])  # Series.append was removed in pandas 2.0
df.iloc[:,final_range]

You will then get columns 0 through 4 (positions 0:5) plus columns 10, 15, and 20.
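The same positions can also be built without going through a Series. This is an alternative sketch, not the answerer's code; the 21-column frame is a made-up example so that position 20 exists.

```python
import pandas as pd

# Hypothetical one-row frame with 21 columns (positions 0..20).
df = pd.DataFrame([range(21)], columns=[f"col{i}" for i in range(21)])

# Plain Python lists concatenate with +, giving the positions directly.
positions = list(range(5)) + [10, 15, 20]
subset = df.iloc[:, positions]
print(list(subset.columns))
# ['col0', 'col1', 'col2', 'col3', 'col4', 'col10', 'col15', 'col20']
```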
