Binning a numpy array
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/21921178/
Asked by deltap
I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.
I suspect there is numpy, scipy, or pandas functionality to do this.
Example:
data = [4,2,5,6,7,5,4,3,5,7]
For a bin size of 2:
bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]
For a bin size of 3:
bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]
Answered by Joe Kington
Just use reshape and then mean(axis=1).
As the simplest possible example:
import numpy as np
data = np.array([4,2,5,6,7,5,4,3,5,7])
print(data.reshape(-1, 2).mean(axis=1))
More generally, we'd need to do something like this to drop the last bin when the array length is not an exact multiple of the bin width:
import numpy as np
width=3
data = np.array([4,2,5,6,7,5,4,3,5,7])
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)
print(result)
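The trim-then-reshape step above can be wrapped in a small helper. This is only a minimal sketch: the name bin_means is hypothetical (it is not a NumPy function), and it assumes a one-dimensional input.

```python
import numpy as np

def bin_means(data, width):
    """Mean of consecutive, non-overlapping bins of `width` items.

    Hypothetical helper for illustration. Trailing items that do not
    fill a complete bin are dropped.
    """
    data = np.asarray(data)
    n_bins = data.size // width          # number of complete bins
    trimmed = data[:n_bins * width]      # drop the incomplete tail, if any
    return trimmed.reshape(n_bins, width).mean(axis=1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means(data, 2))  # values: 3.0, 5.5, 6.0, 3.5, 6.0
print(bin_means(data, 3))  # values: about 3.67, 6.0, 4.0
```

Passing the number of bins explicitly to reshape, rather than -1, is a design choice that keeps the resulting shape obvious even when no complete bin fits.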
Answered by TomAugspurger
Since you already have a numpy array, to avoid for loops, you can use reshape and consider the new dimension to be the bin:
In [33]: data.reshape(2, -1)
Out[33]: 
array([[4, 2, 5, 6, 7],
       [5, 4, 3, 5, 7]])
In [34]: data.reshape(2, -1).mean(0)
Out[34]: array([ 4.5,  3. ,  4. ,  5.5,  7. ])
Actually, this only works if the size of data is divisible by n. I'll edit in a fix.
Looks like Joe Kington has an answer that handles that.
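One detail worth checking in the session above is how reshape groups the values. The sketch below uses the question's sample data and assumes, as in the question, that consecutive values should share a bin. It compares reshape(2, -1).mean(axis=0), which pairs each value with the one five positions later, against reshape(-1, 2).mean(axis=1), which averages neighbouring values. It also shows the divisibility limitation mentioned above.

```python
import numpy as np

data = np.array([4, 2, 5, 6, 7, 5, 4, 3, 5, 7])

# Two rows of five: averaging over axis 0 pairs data[i] with data[i + 5].
column_means = data.reshape(2, -1).mean(axis=0)

# Five rows of two: averaging over axis 1 pairs data[0] with data[1], and so on.
row_means = data.reshape(-1, 2).mean(axis=1)

print(column_means)  # values: 4.5, 3.0, 4.0, 5.5, 7.0
print(row_means)     # values: 3.0, 5.5, 6.0, 3.5, 6.0

# reshape cannot split 10 items into rows of 3, so it raises ValueError.
try:
    data.reshape(-1, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```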
Answered by Óscar López
Try this, using standard Python (NumPy isn't necessary for this). Assuming Python 2.x is in use:
data = [ 4, 2, 5, 6, 7, 5, 4, 3, 5, 7 ]
# example: for n == 2
n=2
partitions = [data[i:i+n] for i in xrange(0, len(data), n)]
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]
# the above produces a list of lists
partitions
=> [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]
# now the mean
[sum(x)/float(n) for x in partitions]
=> [3.0, 5.5, 6.0, 3.5, 6.0]
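For readers on Python 3, where xrange no longer exists and / already performs true division, the same approach might look like the sketch below. The function name bin_means_py is made up for illustration.

```python
def bin_means_py(data, n):
    """Pure-Python bin means (hypothetical helper).

    Incomplete trailing bins are dropped.
    """
    # Number of leading items that fill complete bins of size n.
    usable = len(data) - len(data) % n
    partitions = [data[i:i + n] for i in range(0, usable, n)]
    return [sum(part) / n for part in partitions]

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means_py(data, 2))  # [3.0, 5.5, 6.0, 3.5, 6.0]
print(bin_means_py(data, 3))  # three values; the first is 11/3
```

Computing the usable length up front avoids building a short final partition and then discarding it.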
Answered by Alexandre Kempf
I just wrote a function that applies this to an array of any size or dimension.
- data is your array
- axis is the axis you want to bin
- binstep is the number of points between the starts of consecutive bins (allows overlapping bins)
- binsize is the size of each bin
- func is the function you want to apply to each bin (np.max for max pooling, np.mean for an average, ...)
import numpy as np
def binArray(data, axis, binstep, binsize, func=np.nanmean):
    data = np.array(data)
    dims = np.array(data.shape)
    # swap the axis to bin with axis 0 so the bins can be taken along axis 0
    argdims = np.arange(data.ndim)
    argdims[0], argdims[axis] = argdims[axis], argdims[0]
    data = data.transpose(argdims)
    # apply func to each window of binsize points, starting every binstep points
    data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep+binsize)), 0), 0) for i in np.arange(dims[axis]//binstep)]
    # swap the axes back to their original order
    data = np.array(data).transpose(argdims)
    return data
In your case it would be:
data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)
Or, for a bin size of 3:
bin_data_mean = binArray(data, 0, 3, 3, np.mean)
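As a point of comparison, a similar windowed reduction can be sketched with numpy.lib.stride_tricks.sliding_window_view, assuming NumPy 1.20 or later. This is not the author's function: the name bin_along_axis is hypothetical, and unlike binArray it keeps only windows that fit entirely inside the array, so the number of bins can differ when bins overlap.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def bin_along_axis(data, axis, binstep, binsize, func=np.mean):
    """Apply `func` to windows of `binsize` items taken every `binstep` items.

    Hypothetical helper for illustration. Only windows that fit
    entirely inside the array are kept.
    """
    data = np.asarray(data)
    axis = axis % data.ndim  # allow negative axis values
    # One window per valid start position; window contents sit on a new last axis.
    windows = sliding_window_view(data, binsize, axis=axis)
    # Keep every binstep-th window along the binned axis.
    index = [slice(None)] * windows.ndim
    index[axis] = slice(None, None, binstep)
    return func(windows[tuple(index)], axis=-1)

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_along_axis(data, 0, 2, 2))          # values: 3.0, 5.5, 6.0, 3.5, 6.0
print(bin_along_axis(data, 0, 3, 3))          # values: about 3.67, 6.0, 4.0
print(bin_along_axis(data, 0, 1, 3, np.max))  # overlapping bins: 5, 6, 7, 7, 7, 5, 5, 7
```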

