Warning: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/13166842/

Time: 2020-09-09 00:07:32  Source: igfitidea

pandas dataframe multiply with a series

Tags: dataframe, pandas, multiplying

Question by jianpan

What is the best way to multiply all the columns of a Pandas DataFrame by a column vector stored in a Series? I used to do this in Matlab with repmat(), which doesn't exist in Pandas. I can use np.tile(), but it looks ugly to convert the data structure back and forth each time.

Thanks.

Answer by spencerlyon2

This can be accomplished quite simply with the DataFrame method apply.

In[1]: import pandas as pd; import numpy as np

In[2]: df = pd.DataFrame(np.arange(40.).reshape((8, 5)), columns=list('abcde')); df
Out[2]: 
        a   b   c   d   e
    0   0   1   2   3   4
    1   5   6   7   8   9
    2  10  11  12  13  14
    3  15  16  17  18  19
    4  20  21  22  23  24
    5  25  26  27  28  29
    6  30  31  32  33  34
    7  35  36  37  38  39

In[3]: ser = pd.Series(np.arange(8) * 10); ser
Out[3]: 
    0     0
    1    10
    2    20
    3    30
    4    40
    5    50
    6    60
    7    70

Now that we have our DataFrame and Series, we need a function to pass to apply.

In[4]: func = lambda x: np.asarray(x) * np.asarray(ser)

We can pass this to df.apply and we are good to go:

In[5]: df.apply(func)
Out[5]:
          a     b     c     d     e
    0     0     0     0     0     0
    1    50    60    70    80    90
    2   200   220   240   260   280
    3   450   480   510   540   570
    4   800   840   880   920   960
    5  1250  1300  1350  1400  1450
    6  1800  1860  1920  1980  2040
    7  2450  2520  2590  2660  2730
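As an aside that is not part of this answer: pandas also provides DataFrame.mul, whose axis argument says which axis the Series index is matched against, so neither apply nor np.asarray is needed. A minimal sketch, rebuilding the same df and ser as in the session above:

```python
import numpy as np
import pandas as pd

# Same data as In[2] and In[3] above.
df = pd.DataFrame(np.arange(40.0).reshape((8, 5)), columns=list("abcde"))
ser = pd.Series(np.arange(8) * 10)

# axis=0 (or "index"): match ser's index against df's row labels,
# then multiply every column by the aligned values.
result = df.mul(ser, axis=0)
print(result)
```

Unlike np.asarray(x) * np.asarray(ser), which pairs values by position, mul pairs them by label; here both objects use the default 0..7 index, so the two approaches agree.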

df.apply acts column-wise by default, but it can also act row-wise when axis=1 is passed as an argument to apply.
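One quick way to see which direction apply is working in is to pass a function that only reports the length of what it receives. This illustrative sketch (not from the original answer) uses the built-in len on the same 8x5 df:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40.0).reshape((8, 5)), columns=list("abcde"))

# Default axis=0: the function receives each column (8 values),
# and the results are indexed by column label.
col_lengths = df.apply(len)

# axis=1: the function receives each row (5 values),
# and the results are indexed by row label.
row_lengths = df.apply(len, axis=1)

print(col_lengths)
print(row_lengths)
```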

In[6]: ser2 = pd.Series(np.arange(5) *5); ser2
Out[6]: 
    0     0
    1     5
    2    10
    3    15
    4    20

In[7]: func2 = lambda x: np.asarray(x) * np.asarray(ser2)

In[8]: df.apply(func2, axis=1)
Out[8]: 
       a    b    c    d    e
    0  0    5   20   45   80
    1  0   30   70  120  180
    2  0   55  120  195  280
    3  0   80  170  270  380
    4  0  105  220  345  480
    5  0  130  270  420  580
    6  0  155  320  495  680
    7  0  180  370  570  780

This could be done more concisely by defining the anonymous function inside apply:

In[9]: df.apply(lambda x: np.asarray(x) * np.asarray(ser))
Out[9]: 
          a     b     c     d     e
    0     0     0     0     0     0
    1    50    60    70    80    90
    2   200   220   240   260   280
    3   450   480   510   540   570
    4   800   840   880   920   960
    5  1250  1300  1350  1400  1450
    6  1800  1860  1920  1980  2040
    7  2450  2520  2590  2660  2730

In[10]: df.apply(lambda x: np.asarray(x) * np.asarray(ser2), axis=1)
Out[10]:
       a    b    c    d    e
    0  0    5   20   45   80
    1  0   30   70  120  180
    2  0   55  120  195  280
    3  0   80  170  270  380
    4  0  105  220  345  480
    5  0  130  270  420  580
    6  0  155  320  495  680
    7  0  180  370  570  780
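The row-wise case can likewise be written without apply (again an alternative, not this answer's code). One caveat: DataFrame.mul matches a Series by label, and ser2 is labelled 0..4 while the columns are 'a'..'e', so df.mul(ser2, axis=1) would find no matching labels and return NaN. The sketch below either relabels the Series or passes a plain NumPy array, which is matched by position:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40.0).reshape((8, 5)), columns=list("abcde"))
ser2 = pd.Series(np.arange(5) * 5)

# Option 1: give the multiplier the frame's column labels, then align on them.
weights = pd.Series(ser2.to_numpy(), index=df.columns)
by_label = df.mul(weights, axis=1)  # axis=1 (or "columns"): match on column labels

# Option 2: a 1-D array with one entry per column is broadcast across every row.
by_position = df * ser2.to_numpy()

print(by_label)
```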

Answer by Andy Hayden

Why not create your own dataframe tile function:

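import pandas as pd

def tile_df(df, n, m):
    # Repeat the columns m times: stack m copies of the transpose, then transpose back.
    # (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.)
    dfn = pd.concat([df.T] * m, ignore_index=True).T
    # Repeat the rows n times by stacking n copies vertically.
    return pd.concat([dfn] * n, ignore_index=True)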

Example:

import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]])
tile_df(df, 2, 3)
#    0  1  2  3  4  5
# 0  1  2  1  2  1  2
# 1  3  4  3  4  3  4
# 2  1  2  1  2  1  2
# 3  3  4  3  4  3  4
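For comparison, NumPy's np.tile (which the question mentions) produces the same 4x6 layout in a single call: np.tile(A, (n, m)) repeats A n times down and m times across, like Matlab's repmat(A, n, m). The cost is the round trip through a NumPy array that the question wanted to avoid. A short sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# Repeat the 2x2 block 2 times down and 3 times across, then wrap it back up.
tiled = pd.DataFrame(np.tile(df.to_numpy(), (2, 3)))
print(tiled)
```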

However, the docs note: "DataFrame is not intended to be a drop-in replacement for ndarray as its indexing semantics are quite different in places from a matrix." This should presumably be interpreted as "use numpy if you are doing lots of matrix stuff".
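Following that advice, the original problem needs no explicit tiling in NumPy: giving the vector shape (8, 1) lets broadcasting stretch it across every column. A minimal sketch with the same df and ser as the first answer; the np.newaxis reshape is the only extra step:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40.0).reshape((8, 5)), columns=list("abcde"))
ser = pd.Series(np.arange(8) * 10)

values = df.to_numpy()                    # shape (8, 5)
col_vec = ser.to_numpy()[:, np.newaxis]   # shape (8, 1), a column vector

# Broadcasting repeats the (8, 1) column across all 5 columns implicitly,
# which is what repmat/np.tile would otherwise do explicitly.
product = values * col_vec

# Wrap the result back up only if DataFrame labels are still needed.
result = pd.DataFrame(product, index=df.index, columns=df.columns)
print(result)
```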