Original source: http://stackoverflow.com/questions/29221502/
Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Pandas selecting discontinuous columns from a dataframe
Asked by dartdog
I am using the following to select specific columns from the dataframe comb, which I would like to bring into a new dataframe. The individual selections work fine, e.g. comb.ix[:,0:1], but when I attempt to combine them using +, I get a bad result: the first selection ([:,0:1]) gets stuck on the end of the dataframe, and the values contained in the original col 1 are wiped out while appearing at the end of the row. What is the right way to get just the columns I want? (I'd include sample data but, as you may see, there are too many columns, which is why I'm trying to do it this way.)
comb.ix[:,0:1]+comb.ix[:,17:342]
Answer by EdChum
If you want to concatenate a sub-selection of your df columns, then use pd.concat:
pd.concat([comb.ix[:,0:1],comb.ix[:,17:342]], axis=1)
As long as the indices match, this will align correctly.
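A minimal sketch of the pd.concat approach on a small frame may make the difference from + easier to see. The DataFrame comb below is a made-up stand-in with eight columns (the names and values are illustrative, not the asker's data), and it uses iloc in place of .ix:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's wide DataFrame: 3 rows, 8 columns.
comb = pd.DataFrame(
    np.arange(24).reshape(3, 8),
    columns=[f"c{i}" for i in range(8)],
)

# Take column 0 and columns 5..7 by position, then join them side by side.
first = comb.iloc[:, 0:1]
rest = comb.iloc[:, 5:8]
selected = pd.concat([first, rest], axis=1)

print(selected.columns.tolist())  # ['c0', 'c5', 'c6', 'c7']

# By contrast, `+` adds values after aligning on column labels,
# so selections with no columns in common yield all-NaN columns.
added = first + rest
print(added.isna().all().all())  # True
```

Both pieces come from the same frame here, so their row indices match and concat lines the rows up as expected.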
Thanks to @iHightower for pointing out that you can also sub-select by passing the labels:
pd.concat([df.ix[:,'Col1':'Col5'],df.ix[:,'Col9':'Col15']],axis=1)
Note that .ix is deprecated (and has since been removed from pandas); the following should work instead:
In [115]:
df = pd.DataFrame(columns=['col' + str(x) for x in range(10)])
df
Out[115]:
Empty DataFrame
Columns: [col0, col1, col2, col3, col4, col5, col6, col7, col8, col9]
Index: []
In [118]:
pd.concat([df.loc[:, 'col2':'col4'], df.loc[:, 'col7':'col8']], axis=1)
Out[118]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []
Or using iloc:
In [127]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')]], axis=1)
Out[127]:
Empty DataFrame
Columns: [col2, col3, col7]
Index: []
Note that iloc slicing is half-open: the end position is not included, so if you want to include the column of interest you have to slice up to the position after it:
In [128]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')+1], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')+1]], axis=1)
Out[128]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []
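The label-to-position conversion above can be wrapped in a small helper. This is only a sketch: the function name select_label_ranges and its signature are hypothetical, and it assumes unique column labels so that get_loc returns a single integer.

```python
import pandas as pd


def select_label_ranges(df, ranges):
    """Return the columns covered by inclusive (first_label, last_label) pairs.

    Hypothetical helper; assumes unique column labels so that
    Index.get_loc returns a single integer position.
    """
    positions = []
    for first, last in ranges:
        start = df.columns.get_loc(first)
        # +1 because positional slices exclude the end position
        stop = df.columns.get_loc(last) + 1
        positions.extend(range(start, stop))
    return df.iloc[:, positions]


# One-row illustrative frame with the same column names as the example above.
df = pd.DataFrame([list(range(10))], columns=["col" + str(x) for x in range(10)])
result = select_label_ranges(df, [("col2", "col4"), ("col7", "col8")])
print(result.columns.tolist())  # ['col2', 'col3', 'col4', 'col7', 'col8']
```

Collecting all positions into one list means a single iloc call is enough, with no need for pd.concat.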
Answer by neves
NumPy has a nice indexing helper named r_ that lets you solve this with the modern DataFrame selection interface, iloc:
df.iloc[:, np.r_[0:1, 17:342]]
I believe this is a more elegant solution.
This approach even supports more complex selections:
df.iloc[:, np.r_[0:1, 5, 16, 17:342:2, -5:]]
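To see what np.r_ is doing, it can help to print the index array it builds. The sketch below assumes a made-up 12-column frame. Since np.r_ expands each slice into a range of integers and knows nothing about the frame's width, this example gives the negative slice an explicit stop (-2:0) rather than leaving it open-ended; an open-ended negative slice may not expand the way you expect.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-row, 12-column frame used only for illustration.
df = pd.DataFrame(
    np.arange(24).reshape(2, 12),
    columns=[f"col{i}" for i in range(12)],
)

# np.r_ concatenates slices and scalars into one flat integer array.
idx = np.r_[0:1, 3, 5:9:2, -2:0]
print(idx.tolist())  # [0, 3, 5, 7, -2, -1]

# iloc accepts that array as a list of column positions;
# negative values count back from the last column.
subset = df.iloc[:, idx]
print(subset.columns.tolist())
# ['col0', 'col3', 'col5', 'col7', 'col10', 'col11']
```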
Answer by David Hernandez Mendez
I recently solved it by just appending ranges:
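import pandas as pd

r1 = pd.Series(range(5))
r2 = pd.Series([10, 15, 20])
# Series.append was removed in pandas 2.0; pd.concat joins the two Series instead
final_range = pd.concat([r1, r2])
df.iloc[:, final_range]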
Then you will get columns 0 through 4 (that is, 0:5) and columns 10, 15 and 20.
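The same list of positions can also be built from plain Python sequences, without going through Series. A minimal sketch, assuming a made-up frame with 21 columns so that position 20 exists:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 21 columns (positions 0..20).
df = pd.DataFrame(np.zeros((2, 21)), columns=[f"col{i}" for i in range(21)])

# Concatenate plain Python sequences of column positions.
positions = list(range(5)) + [10, 15, 20]
subset = df.iloc[:, positions]
print(subset.columns.tolist())
# ['col0', 'col1', 'col2', 'col3', 'col4', 'col10', 'col15', 'col20']
```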

