Slice a Pandas dataframe by an array of indices and column names

Note: this page reproduces a popular Stack Overflow question and its answer, licensed under CC BY-SA 4.0. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/23686561/

Tags: python, numpy, pandas, dataframe, slice

Asked by Artturi Björk

I'm looking to replicate the behavior of a numpy array with a pandas dataframe. I want to pass an array of indices and column names and get a list of objects that are found in the corresponding index and column name.

import pandas as pd
import numpy as np

In numpy:

array=np.array(range(9)).reshape([3,3])
print(array)
print(array[[0,1],[0,1]])

[[0 1 2]
 [3 4 5]
 [6 7 8]]

[0 4]

In pandas:

prng = pd.period_range('1/1/2011', '1/1/2013', freq='A')
df=pd.DataFrame(array,index=prng)
print(df)

      0  1  2
2011  0  1  2
2012  3  4  5
2013  6  7  8

df[[2011,2012],[0,1]]

Expected output:

[0 4]

How should I slice this dataframe to get it to return the same as numpy?

Answered by Jeff

Pandas doesn't support this directly. It could, but the issue is how to specify that you want coordinates rather than a separate selection along each axis; e.g. df.iloc[[0,1],[0,1]] means "give me rows 0 and 1 and columns 0 and 1", which is a 2x2 block rather than two individual cells.
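To make the difference concrete, here is a small sketch that runs both lookups on the 3x3 data from the question. It is only an illustration: the frame is rebuilt with freq='Y', which is an assumption for recent pandas versions (the question used freq='A').

```python
import numpy as np
import pandas as pd

array = np.arange(9).reshape(3, 3)
# Assumption: freq='Y' is the annual alias in recent pandas; the question used 'A'.
prng = pd.period_range('2011', '2013', freq='Y')
df = pd.DataFrame(array, index=prng)

# NumPy pairs the two lists elementwise: positions (0, 0) and (1, 1).
pairwise = array[[0, 1], [0, 1]]

# pandas treats each list as a selection along its own axis: a 2x2 block.
block = df.iloc[[0, 1], [0, 1]]

print(pairwise)     # [0 4]
print(block.shape)  # (2, 2)
```

So the same pair of lists yields two scalars in NumPy and four cells in pandas.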

That said, you can do this. Since you updated the question to say you want to start from the index values, first convert the row and column labels to integer positions with get_indexer, then build a boolean mask:

In [19]: row_indexer = df.index.get_indexer([pd.Period('2011'),pd.Period('2012')])

In [20]: col_indexer = df.columns.get_indexer([0,1])

In [21]: z = np.zeros(df.shape,dtype=bool)

In [22]: z[row_indexer,col_indexer] = True

In [23]: df.where(z)
Out[23]: 
       0   1   2
2011   0 NaN NaN
2012 NaN   4 NaN
2013 NaN NaN NaN

This seems easier, though (the indexers here are integer positions, not labels):

In [63]: df.values[[0,1],[0,1]]
Out[63]: array([0, 4])
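
The label-to-position step and the positional lookup above can be combined into one helper. The sketch below is not part of the original answer: `lookup_pairs` is a hypothetical name, the frame is rebuilt with freq='Y' (an assumption for recent pandas), and the helper assumes that a label which cannot be found should raise rather than return a value.

```python
import numpy as np
import pandas as pd


def lookup_pairs(frame, row_labels, col_labels):
    """Return the values at (row_labels[i], col_labels[i]), NumPy-style."""
    rows = frame.index.get_indexer(row_labels)
    cols = frame.columns.get_indexer(col_labels)
    # get_indexer marks labels it cannot find with -1; fail loudly instead of
    # silently reading the last row or column.
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("some labels were not found")
    return frame.to_numpy()[rows, cols]


array = np.arange(9).reshape(3, 3)
# Assumption: freq='Y' is the annual alias in recent pandas; the question used 'A'.
prng = pd.period_range('2011', '2013', freq='Y')
df = pd.DataFrame(array, index=prng)

result = lookup_pairs(df, [pd.Period('2011'), pd.Period('2012')], [0, 1])
print(result)
```

The row labels are passed as Period objects, as in the answer's get_indexer call, while the column labels are the plain integers 0 and 1.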

Or look up each value by label with .loc. The Period index resolves strings such as '2011' correctly (don't use integers for the row labels here):

In [26]: df.loc['2011',0]
Out[26]: 0

In [27]: df.loc['2012',1]
Out[27]: 4
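
If you want to stay with label-based lookups, the two .loc calls above generalize to a loop over (row, column) pairs. A minimal sketch, assuming the frame from the question (rebuilt here with freq='Y', an assumption for recent pandas) and string row labels as recommended above; `row_labels` and `col_labels` are illustrative names:

```python
import numpy as np
import pandas as pd

array = np.arange(9).reshape(3, 3)
# Assumption: freq='Y' is the annual alias in recent pandas; the question used 'A'.
prng = pd.period_range('2011', '2013', freq='Y')
df = pd.DataFrame(array, index=prng)

row_labels = ['2011', '2012']  # strings, resolved against the Period index
col_labels = [0, 1]

# One scalar lookup per (row, column) pair, in order.
values = [df.loc[r, c] for r, c in zip(row_labels, col_labels)]
print(values)
```

This does one lookup per pair, so it is likely to be slower than the positional approach on large inputs, but it keeps everything in terms of labels.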