pandas: binning a numpy array

Question

Asked by deltap

I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.

I suspect there is numpy, scipy, or pandas functionality to do this.

Example:

data = [4,2,5,6,7,5,4,3,5,7]

For a bin size of 2:

bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]

For a bin size of 3:

bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]

Answer 1

Answered by Joe Kington

Just use reshape and then mean(axis=1).

As the simplest possible example:

import numpy as np

data = np.array([4,2,5,6,7,5,4,3,5,7])

print(data.reshape(-1, 2).mean(axis=1))

More generally, to drop the last bin when the array length isn't an exact multiple of the bin width, we'd need to do something like this:

import numpy as np

width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])

result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)

print(result)
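
The slice-then-reshape expression above can be wrapped in a small helper. This is only a sketch: the name bin_means is made up for illustration, and it assumes a 1-D input and a positive integer width.

```python
import numpy as np

def bin_means(data, width):
    """Mean of consecutive, non-overlapping bins of `width` samples.

    Assumes `data` is 1-D. Trailing samples that do not fill a
    whole bin are dropped.
    """
    data = np.asarray(data)
    if width <= 0:
        raise ValueError("width must be a positive integer")
    n_bins = data.size // width           # number of complete bins
    trimmed = data[:n_bins * width]       # drop the incomplete tail
    return trimmed.reshape(n_bins, width).mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means(data, 2))  # means of (4,2), (5,6), (7,5), (4,3), (5,7)
print(bin_means(data, 3))  # means of (4,2,5), (6,7,5), (4,3,5); the last 7 is dropped
```

If the array is shorter than one bin, this returns an empty array rather than raising.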

Answer 2

Answered by TomAugspurger

Since you already have a numpy array, you can avoid for loops by using reshape and treating the new dimension as the bin:

In [33]: data.reshape(2, -1)
Out[33]: 
array([[4, 2, 5, 6, 7],
       [5, 4, 3, 5, 7]])

In [34]: data.reshape(2, -1).mean(0)
Out[34]: array([ 4.5,  3. ,  4. ,  5.5,  7. ])

Actually, this only works if the size of data is divisible by n. I'll edit in a fix.

Looks like Joe Kington has an answer that handles that.
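
Which samples end up in the same bin depends on the shape passed to reshape, so it may be worth comparing the two layouts side by side. A minimal sketch (the names by_rows and by_columns are illustrative only):

```python
import numpy as np

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])
n = 2  # bin size

# One bin per row: each row holds n consecutive samples,
# i.e. (4,2), (5,6), (7,5), (4,3), (5,7).
by_rows = data.reshape(-1, n).mean(axis=1)

# Two rows of five: each column pairs data[i] with data[i + 5],
# i.e. (4,5), (2,4), (5,3), (6,5), (7,7).
by_columns = data.reshape(n, -1).mean(axis=0)

print(by_rows)     # 3, 5.5, 6, 3.5, 6
print(by_columns)  # 4.5, 3, 4, 5.5, 7
```

For bins of consecutive samples, as in the question's example, the row-wise layout is the one that matches.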

Answer 3

Answered by Óscar López

Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use:

data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]

# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]

# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]

# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
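
For Python 3, roughly the same idea could look like the sketch below: range replaces xrange, and / is already true division, so float() is not needed. The function name partition_means is made up for illustration.

```python
def partition_means(data, n):
    # Slice the list into chunks of n items; the last chunk may be shorter.
    partitions = [data[i:i + n] for i in range(0, len(data), n)]
    # Keep only the chunks that have exactly n items.
    partitions = [p for p in partitions if len(p) == n]
    # In Python 3, / always performs true division.
    return [sum(p) / n for p in partitions]

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(partition_means(data, 2))  # [3.0, 5.5, 6.0, 3.5, 6.0]
```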

Answer 4

Answered by Alexandre Kempf

I just wrote a function that applies this to arrays of any size or dimension.

data is your array
axis is the axis you want to bin
binstep is the number of points between the starts of consecutive bins (this allows overlapping bins)
binsize is the size of each bin

func is the function you want to apply to each bin (np.max for max pooling, np.mean for an average, ...)

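import numpy as np

def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    # Swap the binning axis to the front so bins can be taken along axis 0.
    argdims = np.arange(data.ndim)
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # Count only the bins that fit entirely inside the array, so that
    # overlapping bins never index past the end.
    nbins = (dims[axis] - binsize) // binstep + 1
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep+binsize)), 0), 0)
            for i in np.arange(nbins)]
    # Swap the axes back to their original order.
    data = np.array(data).transpose(argdims)
    return data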

In your case it will be:

data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)

Or, for a bin size of 3:

bin_data_mean = binArray(data, 0, 3, 3, np.mean)
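
For overlapping bins on a 1-D array, one possible alternative is NumPy's sliding_window_view. The sketch below is not part of the function above: the name windowed_means is hypothetical, it averages only (no func argument), and it assumes NumPy 1.20 or later, where sliding_window_view is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_means(data, binstep, binsize):
    """Means of windows of `binsize` samples whose starts are `binstep` apart.

    Assumes 1-D input; windows that would run past the end are not produced.
    """
    data = np.asarray(data, dtype=float)
    # Every window of length binsize that fits inside the array (a view, not a copy).
    windows = sliding_window_view(data, binsize)
    # Keep every binstep-th window, then average each one.
    return windows[::binstep].mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(windowed_means(data, 3, 3))  # non-overlapping: (4,2,5), (6,7,5), (4,3,5)
print(windowed_means(data, 2, 3))  # overlapping: (4,2,5), (5,6,7), (7,5,4), (4,3,5)
```

With binstep equal to binsize this gives the same bins as the plain reshape approach; with a smaller binstep the windows overlap.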
