pandas: How to slice a numpy array by the value of the i-th field?

Question

Asked by user1621048

I have a 2D numpy array with 4 columns and a lot of rows (>10000, this number is not fixed).

I need to create n subarrays by the value of one of the columns. The closest question I found was "How slice Numpy array by column value"; nevertheless, I don't know the exact values in the field (they're floats and they change in every file I need), but I know they are no more than 20.

I guess I could read line by line, record the different values and then make the split, but I figure there is a more efficient way to do this.

Thank you.

Answer 1

Answered by Taro Sato

You can use multidimensional slicing conveniently:

import numpy as np

# Create a random 10x5 integer array.
a = (np.random.random((10, 5)) * 100).astype(int)
print(a)
print()

# Boolean mask on the 3rd column: keep the rows where it exceeds 50.
b = a[a[:, 2] > 50]

# Show the rows for which the 3rd column value is > 50.
print(b)

Another example, closer to what you are asking in the comment (?):

import numpy as np

# Create a random 10000x5 float array.
a = np.random.random((10000, 5)) * 100
print(a)
print()

# Filter on the 3rd column in two steps: first > 50.0, then <= 50.2.
b = a[a[:, 2] > 50.0]
b = b[b[:, 2] <= 50.2]

# Show the rows for which the 3rd column value is in (50, 50.2].
print(b)

This selects the rows whose 3rd column value lies in the interval (50, 50.2].
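
The masks above use thresholds known in advance, whereas the question says the values in the field are not known beforehand. One possible sketch, not part of the original answer, is to discover the distinct values with np.unique and build one boolean mask per value. The helper name split_by_column and the toy data are assumptions made for illustration. It also assumes that equal values compare exactly equal as floats, which normally holds for values parsed from the same text but may need rounding for computed values.

```python
import numpy as np

def split_by_column(arr, col):
    """Return {value: rows of arr whose column `col` equals value}.

    Hypothetical helper, not from the original answer.
    """
    values = np.unique(arr[:, col])  # sorted distinct values in the column
    return {v: arr[arr[:, col] == v] for v in values}

# Toy data: 4 columns, with column 1 holding a handful of float values.
rng = np.random.default_rng(0)
data = rng.random((12, 4))
data[:, 1] = rng.choice([0.5, 1.25, 3.0], size=12)

groups = split_by_column(data, 1)
for value, rows in groups.items():
    print(value, rows.shape)
```

Each mask scans the column once, so with at most about 20 distinct values this stays cheap even for tens of thousands of rows.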

Answer 2

Answered by Daniel Velkov

You can use pandas for this task, specifically the groupby method of DataFrame. Here's some example code:

import numpy as np
import pandas as pd

# generate a random 20x5 DataFrame
x = np.random.randint(0, 10, 100)
x.shape = (20, 5)
df = pd.DataFrame(x)

# group by the values in the 1st column
g = df.groupby(0)

# make a dict with the numbers from the 1st column as keys and
# the slice of the DataFrame corresponding to each number as
# values of the dict
d = {k: v for (k, v) in g}

Some sample output:

In [74]: d[3]
Out[74]: 
    0  1  2  3  4
2   3  2  5  4  3
5   3  9  4  3  2
12  3  3  9  6  2
16  3  2  1  6  5
17  3  5  3  1  8
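
The example above groups integer data and leaves each group as a DataFrame. As a rough sketch of how the same idea might apply to the 4-column float array in the question, the array can be wrapped in a DataFrame, grouped, and each group converted back with to_numpy(). The toy data and the names arr, key_col and subarrays are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 4-column float array described in the question.
rng = np.random.default_rng(1)
arr = rng.random((15, 4))
arr[:, 2] = rng.choice([0.1, 2.5, 7.75], size=15)

key_col = 2  # hypothetical: index of the column to group on
df = pd.DataFrame(arr)

# One NumPy sub-array per distinct value found in the key column.
subarrays = {key: group.to_numpy() for key, group in df.groupby(key_col)}

for key, sub in subarrays.items():
    print(key, sub.shape)
```

The distinct values do not need to be known ahead of time, since groupby finds them while grouping.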
