Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/51508682/

Date: 2020-09-14 05:50:27  Source: igfitidea

What is the NumPy equivalent of dataframe.loc() in Pandas?

python pandas numpy

Asked by Chris

I have a 120,000 x 4 NumPy array, as shown below. Each row is a sample. The first column is time in seconds, i.e. the index in Pandas terminology.

0.014      14.175  -29.97  -22.68 
0.022      13.905  -29.835 -22.68
0.030      12.257  -29.32  -22.67
... ...
1259.980   -0.405   2.205   3.825
1259.991   -0.495   2.115   3.735

I want to select the rows recorded between 100 and 200 seconds and save them into a new array. If this were a Pandas DataFrame, I would simply write df.loc[100:200]. What is the equivalent operation in NumPy?

This is NOT a question of feasibility. I simply wonder if there are any pythonic one-line solutions.

Accepted answer by rafaelc

This matches .loc only when the index (the first column) is sorted:

If I understand correctly (IIUC):

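import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])

# Keep rows whose first column lies in [5, 9], both ends inclusive like .loc
x[(x[:, 0] >= 5) & (x[:, 0] <= 9)]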

So you would have 100 and 200 instead of 5 and 9.
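If the time column is already sorted in ascending order, as it appears to be in the question, a different technique also works: `np.searchsorted` locates the two boundary rows by binary search instead of building a full boolean mask. The result is a plain slice, so it is a view of the original array rather than a copy. The following is a minimal sketch under that sorted-column assumption. `select_time_range` is a hypothetical helper name, and the sample rows are the five shown in the question.

```python
import numpy as np

def select_time_range(arr, start, end):
    """Return rows whose first column lies in [start, end], both ends inclusive.

    Assumes arr[:, 0] is sorted ascending (hypothetical helper, not a NumPy API).
    The result is a slice, i.e. a view that shares memory with arr.
    """
    t = arr[:, 0]
    lo = np.searchsorted(t, start, side="left")   # first row with t >= start
    hi = np.searchsorted(t, end, side="right")    # first row with t > end
    return arr[lo:hi]

# The sample rows shown in the question
data = np.array([[0.014, 14.175, -29.97, -22.68],
                 [0.022, 13.905, -29.835, -22.68],
                 [0.030, 12.257, -29.32, -22.67],
                 [1259.980, -0.405, 2.205, 3.825],
                 [1259.991, -0.495, 2.115, 3.735]])

subset = select_time_range(data, 0.02, 1259.98)   # rows at 0.022, 0.030 and 1259.980
```

On the full array the call would be `select_time_range(data, 100, 200)`. Use `.copy()` on the result if you need a new, independent array. On unsorted data this sketch silently returns wrong rows, so the boolean mask above is the safer default.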

For a more general solution, check Wen's answer.

Answer by YOBEN_S

Using the data from Raf's answer:

x[np.where(x[:,0]==5)[0][0]:np.where(x[:,0]==9)[0][0]+1,:]
Out[341]: 
array([[ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
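In this one-liner, `np.where(cond)[0][0]` is the position of the first row whose key equals the given label. The `+1` is needed because a Python slice excludes its stop position, while `.loc` includes its end label. The same idea can be wrapped in a small helper, shown below as a sketch. `loc_by_position` is a hypothetical name, and `np.flatnonzero` is used here as an equivalent of `np.where(...)[0]`. The sketch assumes each label occurs exactly once in the key column. It uses exact equality, so it raises `IndexError` if a label is missing, which can happen with floating-point times that are not stored exactly.

```python
import numpy as np

def loc_by_position(arr, start_label, end_label, key_col=0):
    """Mimic df.loc[start_label:end_label] on a NumPy array (hypothetical helper).

    Slices from the row whose key equals start_label to the row whose key
    equals end_label, inclusive, regardless of the key values in between.
    Raises IndexError if either label is not present.
    """
    keys = arr[:, key_col]
    start = np.flatnonzero(keys == start_label)[0]
    stop = np.flatnonzero(keys == end_label)[0]
    return arr[start:stop + 1]   # +1 because .loc includes the end label

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])

print(loc_by_position(x, 5, 9))
```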

Note

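Using only greater-than and less-than comparisons cannot fully replace .loc. Label slicing with .loc is resolved to index positions (from the position of the start label to the position of the end label, inclusive), not to a range of values.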

For example

df
Out[348]: 
       0   1   2   3
0      1   2   3   4
1      5   6   7   8
4444   9  10  11  12
3     13  14  15  16

df.loc[1:3]
Out[347]: 
       0   1   2   3
1      5   6   7   8
4444   9  10  11  12
3     13  14  15  16
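The same contrast can be reproduced in plain NumPy by keeping the row labels in a separate array. In this sketch, `labels` and `values` are illustrative names for the DataFrame's index and data above. A value-range mask drops the row labelled 4444, because 4444 is not between 1 and 3. A position-based slice keeps that row, just as `df.loc[1:3]` does.

```python
import numpy as np

# Row labels (the DataFrame index above) and the row data, as separate arrays
labels = np.array([0, 1, 4444, 3])
values = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]])

# Value-range mask: keeps only rows whose label is in [1, 3], so 4444 is dropped
by_value = values[(labels >= 1) & (labels <= 3)]

# Position-based slice like df.loc[1:3]: from the row labelled 1 to the row labelled 3
start = np.flatnonzero(labels == 1)[0]
stop = np.flatnonzero(labels == 3)[0]
by_position = values[start:stop + 1]
```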