How can I slice a numpy array by the value of the ith field?
Original question: http://stackoverflow.com/questions/12290844/
Note: this content comes from StackOverFlow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by user1621048
I have a 2D numpy array with 4 columns and a lot of rows (>10000, this number is not fixed).
I need to create n subarrays by the value of one of the columns; the closest question I found was "How slice Numpy array by column value". Nevertheless, I don't know the exact values in the field (they're floats and they change in every file I need), but I know there are no more than 20 distinct values.
I guess I could read line by line, record the different values and then make the split, but I figure there is a more efficient way to do this.
Thank you.
Answered by Taro Sato
You can use multidimensional slicing conveniently:
import numpy as np
# just creating a random 2d array of integers.
a = (np.random.random((10, 5)) * 100).astype(int)
print(a)
print()
# build a boolean mask from the 3rd column and keep the rows where it is > 50.
b = a[a[:, 2] > 50]
# showing the rows for which the 3rd column value is > 50.
print(b)
Another example, closer to what you are asking in the comment (?):
import numpy as np
# just creating a random 2d array of floats.
a = np.random.random((10000, 5)) * 100
print(a)
print()
# filter on the 3rd column in two steps: first keep values > 50.0,
# then keep values <= 50.2 among the remaining rows.
b = a[a[:, 2] > 50.0]
b = b[b[:, 2] <= 50.2]
# showing the rows for which the 3rd column value is in (50, 50.2].
print(b)
This selects the rows whose 3rd column value lies in the half-open interval (50, 50.2].
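The same masking idea can be taken one step further. The sketch below is not part of the original answer: it combines the two comparisons into a single mask, and then uses `np.unique` to find the distinct values of a column and build one subarray per value, which is what the question asks for. The helper name `split_by_column` and the sample values are hypothetical, and the approach assumes the column holds exactly repeated floats (so `==` comparison is safe), not values that merely lie close together.

```python
import numpy as np


def split_by_column(a, col):
    """Return {value: rows of `a` whose column `col` equals that value}.

    Hypothetical helper; assumes the values in `col` repeat exactly.
    """
    values = np.unique(a[:, col])
    return {v: a[a[:, col] == v] for v in values}


# Sample data (assumption): 4 columns, and the 3rd column only takes
# a handful of distinct float values.
rng = np.random.default_rng(0)
a = rng.random((10000, 4)) * 100
a[:, 2] = rng.choice([0.5, 1.25, 50.1, 99.9], size=len(a))

# Both conditions combined into one boolean mask.
col = a[:, 2]
b = a[(col > 50.0) & (col <= 50.2)]

# One subarray per distinct value of the 3rd column.
groups = split_by_column(a, 2)
for value, rows in groups.items():
    print(value, rows.shape)
```

With at most 20 distinct values, the loop inside `split_by_column` runs at most 20 vectorized comparisons, so there is no need to read the data row by row.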
Answered by Daniel Velkov
You can use pandas for that task, and more specifically the groupby method of DataFrame. Here's some example code:
import numpy as np
import pandas as pd
# generate a random 20x5 DataFrame
x = np.random.randint(0, 10, 100)
x.shape = (20, 5)
df = pd.DataFrame(x)
# group by the values in the 1st column
g = df.groupby(0)
# make a dict with the numbers from the 1st column as keys and
# the slice of the DataFrame corresponding to each number as
# values of the dict
d = {k: v for (k, v) in g}
Some sample output:
In [74]: d[3]
Out[74]:
    0  1  2  3  4
2   3  2  5  4  3
5   3  9  4  3  2
12  3  3  9  6  2
16  3  2  1  6  5
17  3  5  3  1  8
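Since the question starts from a numpy array of floats and not from a DataFrame, here is a minimal sketch of how the groupby approach might be applied to that case and converted back to plain arrays. It is an illustration, not the answerer's code: the sample data and the name `subarrays` are assumptions, and `DataFrame.to_numpy` requires a reasonably recent pandas (0.24 or later).

```python
import numpy as np
import pandas as pd

# Sample data (assumption): 4 float columns, where the 2nd column
# only takes a few distinct values.
rng = np.random.default_rng(0)
a = rng.random((1000, 4))
a[:, 1] = rng.choice([0.1, 0.2, 0.3], size=len(a))

# Wrap the array in a DataFrame; the columns are labelled 0..3.
df = pd.DataFrame(a)

# Group by the 2nd column and convert each group back to a numpy array.
subarrays = {key: group.to_numpy() for key, group in df.groupby(1)}

for key, rows in subarrays.items():
    print(key, rows.shape)
```

The dict keys are the distinct float values found in the grouping column, so nothing about them needs to be known in advance.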

