Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21921178/

Time: 2020-09-13 21:43:19  Source: igfitidea

Binning a numpy array

Tags: python, arrays, numpy, pandas, scipy

Asked by deltap

I have a numpy array which contains time series data. I want to bin that array into equal partitions of a given length (it is fine to drop the last partition if it is not the same size) and then calculate the mean of each of those bins.

I suspect there is numpy, scipy, or pandas functionality to do this.

Example:

data = [4,2,5,6,7,5,4,3,5,7]

For a bin size of 2:

bin_data = [(4,2),(5,6),(7,5),(4,3),(5,7)]
bin_data_mean = [3,5.5,6,3.5,6]

For a bin size of 3:

bin_data = [(4,2,5),(6,7,5),(4,3,5)]
bin_data_mean = [3.67,6,4]
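
The question asks whether pandas can do this. One possible pandas route, sketched here under the assumption that pandas is installed, is to label each element with its bin index (position // width) and group on that label. bin_mean_pandas is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def bin_mean_pandas(data, width):
    """Hypothetical helper: mean of consecutive bins of `width` points via groupby."""
    s = pd.Series(data, dtype=float)
    n_full = (len(s) // width) * width   # drop the incomplete last bin
    s = s.iloc[:n_full]
    labels = np.arange(n_full) // width  # 0, 0, 1, 1, ... for width 2
    return s.groupby(labels).mean()


data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_mean_pandas(data, 2).tolist())  # [3.0, 5.5, 6.0, 3.5, 6.0]
```

The result is a Series indexed by bin number; call .tolist() or .to_numpy() if a plain list or array is needed.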

Answer by Joe Kington

Just use reshape and then mean(axis=1).

As the simplest possible example:

import numpy as np

data = np.array([4,2,5,6,7,5,4,3,5,7])

print(data.reshape(-1, 2).mean(axis=1))

More generally, we'd need to do something like this to drop the last bin when the array length is not an even multiple of the bin width:

import numpy as np

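width = 3  # bin size
data = np.array([4,2,5,6,7,5,4,3,5,7])

# trim to a multiple of width (dropping the incomplete last bin), then average each row
result = data[:(data.size // width) * width].reshape(-1, width).mean(axis=1)

print(result)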
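
The slice-then-reshape approach above can be wrapped in a small reusable function. This is a sketch; bin_mean is a hypothetical name, and it assumes non-overlapping bins along a 1-D array.

```python
import numpy as np


def bin_mean(data, width):
    """Mean of consecutive, non-overlapping bins of `width` points.

    The incomplete last bin (if any) is dropped, as in the answer above.
    """
    data = np.asarray(data)
    n_full = (data.size // width) * width  # largest multiple of width <= size
    return data[:n_full].reshape(-1, width).mean(axis=1)


data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_mean(data, 2))
print(bin_mean(data, 3))
```

With width=3 this yields approximately [3.67, 6.0, 4.0], i.e. the means of (4,2,5), (6,7,5) and (4,3,5); the trailing 7 is discarded.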

Answer by TomAugspurger

Since you already have a numpy array, you can avoid for loops by using reshape and treating the new dimension as the bin:

In [33]: data.reshape(-1, 2)
Out[33]: 
array([[4, 2],
       [5, 6],
       [7, 5],
       [4, 3],
       [5, 7]])

In [34]: data.reshape(-1, 2).mean(1)
Out[34]: array([3. , 5.5, 6. , 3.5, 6. ])

Note that this only works if the size of data is divisible by the bin size n. I'll edit a fix.

Looks like Joe Kington has an answer that handles that.
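
The divisibility restriction can also be handled by keeping the short final bin instead of dropping it (the question says dropping is fine, so this is optional). A sketch, assuming NaN padding is acceptable: pad the array with NaN up to the next multiple of n, reshape, and average with np.nanmean. bin_mean_keep_tail is a hypothetical name; note that any NaNs already present in the data would also be ignored.

```python
import numpy as np


def bin_mean_keep_tail(data, n):
    """Hypothetical variant: bin means that keep a short last bin."""
    data = np.asarray(data, dtype=float)
    nbins = -(-data.size // n)  # ceiling division
    padded = np.full(nbins * n, np.nan)
    padded[:data.size] = data
    # nanmean skips the NaN padding, so the last bin averages only real values
    return np.nanmean(padded.reshape(-1, n), axis=1)


print(bin_mean_keep_tail([4, 2, 5, 6, 7, 5, 4, 3, 5, 7], 3))
```

For n=3 the fourth bin contains only the trailing 7, so its mean is 7.0.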

Answer by Óscar López

Try this, using standard Python (NumPy isn't necessary for this). The code below is written for Python 3:

data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]

# example: bin size n == 2
n = 2
partitions = [data[i:i+n] for i in range(0, len(data), n)]
# drop the last partition if it is shorter than n
partitions = partitions if len(partitions[-1]) == n else partitions[:-1]

# the above produces a list of lists
print(partitions)  # [[4, 2], [5, 6], [7, 5], [4, 3], [5, 7]]

# now the mean of each partition
print([sum(x) / n for x in partitions])  # [3.0, 5.5, 6.0, 3.5, 6.0]
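
A shorter pure-Python variant of the same idea uses the zip(*[iter(s)] * n) idiom from the Python zip documentation: zip stops at the shortest input, so an incomplete last group is dropped automatically. bin_means is a hypothetical helper name; this is a sketch, not the answerer's code.

```python
def bin_means(data, n):
    """Mean of consecutive bins of n items; a short final bin is dropped."""
    # n references to the *same* iterator, so each tuple from zip holds
    # n consecutive items, and zip stops when the iterator runs out
    groups = zip(*[iter(data)] * n)
    return [sum(g) / n for g in groups]


data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_means(data, 2))  # [3.0, 5.5, 6.0, 3.5, 6.0]
```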

Answer by Alexandre Kempf

I wrote a function that applies this to arrays of any size and any number of dimensions.

  • data is your array
  • axis is the axis along which you want to bin
  • binstep is the number of points between the starts of consecutive bins (a binstep smaller than binsize gives overlapping bins)
  • binsize is the size of each bin
  • func is the function to apply to each bin (np.max for max pooling, np.mean for the average, ...)

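    import numpy as np

    def binArray(data, axis, binstep, binsize, func=np.nanmean):
        data = np.array(data)
        dims = np.array(data.shape)
        # permutation that swaps the binning axis with axis 0
        argdims = np.arange(data.ndim)
        argdims[0], argdims[axis] = argdims[axis], argdims[0]
        data = data.transpose(argdims)
        # only bins that fit completely; avoids indexing past the end when binstep < binsize
        nbins = int((dims[axis] - binsize) // binstep) + 1
        data = [func(np.take(data, np.arange(int(i*binstep), int(i*binstep + binsize)), 0), 0)
                for i in range(nbins)]
        # swap the axes back (the swap permutation is its own inverse)
        data = np.array(data).transpose(argdims)
        return data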
    
In your case it will be:

data = [4,2,5,6,7,5,4,3,5,7]
bin_data_mean = binArray(data, 0, 2, 2, np.mean)

Or, for a bin size of 3:

bin_data_mean = binArray(data, 0, 3, 3, np.mean)
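
The overlapping-bin option of binArray (binstep smaller than binsize) can also be expressed with NumPy's sliding_window_view, available in numpy.lib.stride_tricks since NumPy 1.20. The sketch below keeps binArray's argument order, but bin_array_windows is a hypothetical name and this is an alternative technique, not the answerer's code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def bin_array_windows(data, axis, binstep, binsize, func=np.nanmean):
    """Hypothetical alternative to binArray using sliding_window_view."""
    data = np.asarray(data)
    axis = axis % data.ndim  # normalise negative axes before adding a dimension
    # all windows of length binsize along `axis`; the window dimension is appended last
    windows = sliding_window_view(data, binsize, axis=axis)
    # keep only windows that start every binstep points along `axis`
    starts = np.arange(0, windows.shape[axis], binstep)
    windows = np.take(windows, starts, axis=axis)
    # reduce each window to a single value
    return func(windows, axis=-1)


data = [4, 2, 5, 6, 7, 5, 4, 3, 5, 7]
print(bin_array_windows(data, 0, 3, 3, np.mean))  # non-overlapping bins of 3
print(bin_array_windows(data, 0, 1, 2, np.mean))  # moving average of width 2
```

sliding_window_view returns a read-only view, so no data is copied until np.take selects the windows; it raises a ValueError if binsize exceeds the length of the chosen axis.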