pandas: selecting non-contiguous columns from a DataFrame

Question

Asked by dartdog

I am using the following to select specific columns from the DataFrame comb, which I would like to bring into a new DataFrame. The individual selections work fine, e.g. comb.ix[:,0:1], but when I try to combine them with + I get a bad result: the first selection ([:,0:1]) ends up stuck on the end of the DataFrame, and the values from the original column 1 are wiped out while appearing at the end of the row. What is the right way to get just the columns I want? (I'd include sample data, but as you can see there are too many columns, which is why I'm trying to do it this way.)

comb.ix[:,0:1]+comb.ix[:,17:342]

Answer 1

Answered by EdChum

If you want to concatenate a sub-selection of your df columns, use pd.concat:

pd.concat([comb.ix[:,0:1],comb.ix[:,17:342]], axis=1)

As long as the row indices match, the columns will align correctly.
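To see why `+` misbehaves where `pd.concat` does not, here is a minimal sketch on a made-up frame (the column names and values are assumptions for illustration, not the asker's data). `+` performs element-wise addition after aligning on column labels, so it never places columns side by side. `pd.concat(..., axis=1)` does exactly that.

```python
import pandas as pd

# Hypothetical 3-row frame standing in for `comb`
comb = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "d": [10, 11, 12]}
)

left = comb.iloc[:, 0:1]   # column "a"
right = comb.iloc[:, 2:4]  # columns "c" and "d"

# `+` aligns on column labels and adds; the two selections share no label,
# so every cell of the result is NaN
added = left + right

# concat with axis=1 puts the selections next to each other,
# aligned on the row index
combined = pd.concat([left, right], axis=1)
print(combined)
```

The arithmetic result contains the union of the column labels filled with NaN, which is consistent with the "wiped out" values described in the question.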

Thanks to @iHightower for pointing out that you can also sub-select by passing the labels:

pd.concat([df.ix[:,'Col1':'Col5'],df.ix[:,'Col9':'Col15']],axis=1)

Note that .ix is deprecated (it was removed entirely in pandas 1.0), so use .loc instead; the following should work:

In [115]:
df = pd.DataFrame(columns=['col' + str(x) for x in range(10)])
df

Out[115]:
Empty DataFrame
Columns: [col0, col1, col2, col3, col4, col5, col6, col7, col8, col9]
Index: []

In [118]:
pd.concat([df.loc[:, 'col2':'col4'], df.loc[:, 'col7':'col8']], axis=1)
Out[118]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []

Or using iloc:

In [127]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')]], axis=1)

Out[127]:
Empty DataFrame
Columns: [col2, col3, col7]
Index: []

Note that iloc slicing is half-open (the start position is included, the end position is not), so if you want to include the last column of interest you have to slice up to the position after it:

In [128]:
pd.concat([df.iloc[:, df.columns.get_loc('col2'):df.columns.get_loc('col4')+1], df.iloc[:, df.columns.get_loc('col7'):df.columns.get_loc('col8')+1]], axis=1)

Out[128]:
Empty DataFrame
Columns: [col2, col3, col4, col7, col8]
Index: []
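The `get_loc` plus `+1` pattern above can be wrapped in a small helper. This is only a sketch: `select_label_ranges` is a hypothetical name, not a pandas function, and it assumes the column labels are unique so that `get_loc` returns a single integer position.

```python
import pandas as pd


def select_label_ranges(df, ranges):
    """Return the columns covered by each (first_label, last_label) pair,
    with both end labels included."""
    parts = []
    for first, last in ranges:
        start = df.columns.get_loc(first)
        stop = df.columns.get_loc(last) + 1  # +1: iloc excludes the end position
        parts.append(df.iloc[:, start:stop])
    return pd.concat(parts, axis=1)


# One row of sample values so the selection is visible
df = pd.DataFrame([list(range(10))], columns=["col" + str(x) for x in range(10)])

subset = select_label_ranges(df, [("col2", "col4"), ("col7", "col8")])
print(list(subset.columns))
```

For plain label ranges, `.loc` slicing is shorter because it already includes both end labels. The positional route is mainly useful when labels and integer offsets need to be mixed.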

Answer 2

Answered by neves

NumPy has a handy indexing helper named r_ that lets you solve this with the modern DataFrame selection interface, iloc:

df.iloc[:, np.r_[0:1, 17:342]]

I believe this is a more elegant solution.

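This approach even supports more complex selections. Write the trailing slice with an explicit stop (-5:0), because np.r_ expands an open-ended -5: to an empty range:

df.iloc[:, np.r_[0:1, 5, 16, 17:342:2, -5:0]]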
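To make explicit what `np.r_` hands to `iloc`, here is a small sketch on a made-up 10-column frame (the frame and its column names are assumptions for illustration). `np.r_` simply expands the slices and scalars into one flat integer array of positions.

```python
import numpy as np
import pandas as pd

# Slices and single positions are flattened into one integer array
positions = np.r_[0:2, 5, 7:10]
print(positions)

# A negative slice written with an explicit stop yields the trailing positions
tail = np.r_[-3:0]
print(tail)

df = pd.DataFrame([list(range(10))], columns=["col" + str(x) for x in range(10)])
subset = df.iloc[:, positions]
print(list(subset.columns))
```

Because the result is an ordinary array, it can be inspected or reused before indexing, which helps when checking a long selection.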

Answer 3

Answered by David Hernandez Mendez

I recently solved it by just appending ranges:

r1 = pd.Series(range(5))
r2 = pd.Series([10,15,20])
final_range = pd.concat([r1, r2])  # Series.append was removed in pandas 2.0
df.iloc[:,final_range]

You will then get columns 0 through 4, plus columns 10, 15, and 20.
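The same positions can also be built without a Series. The sketch below uses a plain Python list and a made-up 25-column frame (both are assumptions for illustration, not part of the answer).

```python
import pandas as pd

# Hypothetical frame with 25 columns and one row of sample values
df = pd.DataFrame([list(range(25))], columns=["col" + str(x) for x in range(25)])

# Positions 0..4 followed by 10, 15 and 20
positions = list(range(5)) + [10, 15, 20]

subset = df.iloc[:, positions]
print(list(subset.columns))
```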
