pandas: adding values to all rows of a dataframe

Disclaimer: this page is a Chinese-English bilingual translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/46967581/

Time: 2020-09-14 04:42:17  Source: igfitidea

Adding values to all rows of dataframe

Tags: python, pandas, dataframe, addition

Asked by Jagruth

I have two pandas dataframes, df1 (of length 2) and df2 (of length about 30 rows). The index values of df1 are always different and never occur in df2. I would like to add the average of each column of df1 to the corresponding column of df2. Example: add 0.6 to all rows of c1, 0.9 to all rows of c2, and so on.

df1: 
  Date       c1   c2   c3   c4    c5   c6 ...  c10
2017-09-10  0.5  0.6  1.2   0.7  1.3  1.8 ...  1.3
2017-09-11  0.7  1.2  1.3   0.4  0.7  0.4 ...  1.5


df2:
  Date       c1   c2   c3   c4    c5   c6 ...  c10
2017-09-12  0.9  0.1  1.4   0.9  1.5  1.9 ...  1.9
2017-09-13  0.2  1.8  1.2   1.4  2.7  0.8 ...  1.1
    :                                  :  
    :                                  :     
2017-10-10  1.5  0.9  1.5   0.9  1.6  1.8 ...  1.7
2017-10-11  2.7  1.1  1.9   0.4  0.8  0.8 ...  1.3

How can I do that?

Answer by piRSquared

When using mean on df1, it calculates over each column by default and produces a pd.Series.

When adding a pd.Series to a pd.DataFrame, it aligns the index of the pd.Series with the columns of the pd.DataFrame and broadcasts along the index of the pd.DataFrame... by default.
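The alignment described above can be seen in a minimal sketch. The two toy frames below are assumptions made up for illustration (they are not the asker's data), and the columns of df2 are deliberately in a different order to show that matching happens by label, not by position.

```python
import pandas as pd

# Hypothetical toy frames; the values are illustrative only.
df1 = pd.DataFrame({"c1": [0.5, 0.7], "c2": [0.6, 1.2]})
df2 = pd.DataFrame({"c2": [0.1, 1.8, 0.9], "c1": [0.9, 0.2, 1.5]})

# mean() reduces over rows, giving a Series indexed by column name.
m = df1.mean()

# The Series index (c1, c2) is matched against df2's column labels,
# and each mean is broadcast down every row of the matching column.
result = df2 + m

print(type(m).__name__)
print(result)
```

Because the match is by label, a column present in only one of the two operands would come out as NaN rather than raise an error.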

The only tricky bit is handling the Date column.

Option 1

m = df1.mean()
df2.loc[:, m.index] += m

df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

If I know that 'Date' is always in the first column, I can:

df2.iloc[:, 1:] += df1.mean()
df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7
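For readers who want to run Option 1 end to end, here is a minimal sketch. It assumes Date is stored as a plain string column in both frames, and it passes numeric_only=True to mean so that the non-numeric column is skipped explicitly; that argument is an addition for this sketch and is not part of the answer above. The frames are cut down to two rows and two columns.

```python
import pandas as pd

# Cut-down versions of the question's frames (assumed layout: Date is a column).
df1 = pd.DataFrame({
    "Date": ["2017-09-10", "2017-09-11"],
    "c1": [0.5, 0.7],
    "c2": [0.6, 1.2],
})
df2 = pd.DataFrame({
    "Date": ["2017-09-12", "2017-09-13"],
    "c1": [0.9, 0.2],
    "c2": [0.1, 1.8],
})

# Column means of df1, restricted to numeric columns (assumption of this sketch).
m = df1.mean(numeric_only=True)

# Update only the columns that have a mean; Date is left untouched.
df2.loc[:, m.index] += m

print(df2)
```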


Option 2
Notice that I use the append=True parameter in set_index, just in case there are things in the index you don't want to mess up.

df2.set_index('Date', append=True).add(df1.mean()).reset_index('Date')

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

If you don't care about the index, you can shorten this to

df2.set_index('Date').add(df1.mean()).reset_index()

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7
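The difference between the two Option 2 variants only shows up when df2 has an index worth keeping. The sketch below assumes a hypothetical string index ('a', 'b') on df2 to make that visible; as in the earlier sketch, numeric_only=True is an assumption added here so that the string Date column of df1 is skipped.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["2017-09-10", "2017-09-11"],
    "c1": [0.5, 0.7],
    "c2": [0.6, 1.2],
})
# Hypothetical non-default index, to show what append=True preserves.
df2 = pd.DataFrame(
    {"Date": ["2017-09-12", "2017-09-13"], "c1": [0.9, 0.2], "c2": [0.1, 1.8]},
    index=["a", "b"],
)

m = df1.mean(numeric_only=True)

# append=True keeps the existing index as an extra level, so it survives.
kept = df2.set_index("Date", append=True).add(m).reset_index("Date")

# Without append, the old index is replaced and then reset to 0..n-1.
short = df2.set_index("Date").add(m).reset_index()

print(kept)
print(short)
```

Neither variant modifies df2 in place; both return a new frame.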

Answer by jondo

If all columns are in both data frames, then just

for col in df2.columns:
    df2[col] = df2[col] + df1[col].mean()

If the columns are not necessarily in both, then:

for col in df2.columns:
    if col in df1.columns:
        df2[col] = df2[col] + df1[col].mean()
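The loop above can also be written without iterating, as a sketch under the assumption that only numeric columns shared by both frames should be updated. The column c9 and the sample values are hypothetical, added to show a column that exists in df2 only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["2017-09-10", "2017-09-11"],
    "c1": [0.5, 0.7],
    "c2": [0.6, 1.2],
})
df2 = pd.DataFrame({
    "Date": ["2017-09-12", "2017-09-13"],
    "c1": [0.9, 0.2],
    "c2": [0.1, 1.8],
    "c9": [5.0, 6.0],   # hypothetical column missing from df1
})

# Numeric columns of df1 that also exist in df2.
common = df2.columns.intersection(df1.select_dtypes("number").columns)

# Add the df1 means to just those columns; Date and c9 are not touched.
df2[common] = df2[common] + df1[common].mean()

print(df2)
```

Restricting to numeric columns matters when Date is a regular column, since taking the mean of a string column is not meaningful.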

Answer by chumbak

There is probably a more efficient way but here is a quick and dirty solution. I hope this helps!

import pandas as pd

# Same sample values for both frames; only the index (dates) differs
d = {'c1': [0.5, 0.7], 'c2': [0.6, 1.2], 'c3': [1.2, 1.3]}
df1 = pd.DataFrame(data=d, index=['2017-09-10', '2017-09-11'])
df2 = pd.DataFrame(data=d, index=['2017-09-12', '2017-09-13'])

df1

      Date   c1   c2   c3
2017-09-10  0.5  0.6  1.2
2017-09-11  0.7  1.2  1.3

df2

      Date   c1   c2   c3
2017-09-12  0.5  0.6  1.2
2017-09-13  0.7  1.2  1.3

The average of each column in df1 can be obtained using the describe() function:

df1.describe().loc['mean']  # .ix was removed in pandas 1.0; use .loc

c1    0.60
c2    0.90
c3    1.25

And now, simply add the series to df2:

df2 + df1.describe().loc['mean']

      Date   c1   c2    c3
2017-09-12  1.1  1.5  2.45
2017-09-13  1.3  2.1  2.55
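As a self-contained check of this approach, the sketch below rebuilds the same sample frames and compares the 'mean' row of describe() with a direct call to mean(). It uses .loc, on the assumption that a current pandas version is installed, where the older .ix indexer is no longer available.

```python
import pandas as pd

d = {"c1": [0.5, 0.7], "c2": [0.6, 1.2], "c3": [1.2, 1.3]}
df1 = pd.DataFrame(data=d, index=["2017-09-10", "2017-09-11"])
df2 = pd.DataFrame(data=d, index=["2017-09-12", "2017-09-13"])

# describe() computes several statistics; the 'mean' row is one of them.
via_describe = df1.describe().loc["mean"]

# mean() computes only the column means, which is all that is needed here.
via_mean = df1.mean()

# Either Series can be added to df2; the result is a new frame.
result = df2 + via_mean

print(via_describe)
print(result)
```

Calling mean() directly avoids computing the other summary statistics that describe() produces.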

Answer by pankaj mishra

This could be another way of doing it; I have just simplified it a little bit.

import pandas as pd
from datetime import datetime

date_today = datetime.now()

#Creating df1 & df2 
df1=pd.DataFrame(
    {
        'Date':[date_today,date_today],
        'c1':[0.5,0.4],
        'c2':[0.6,0.3]
    }
)
df2=pd.DataFrame(
    {
        'Date':[date_today,date_today,date_today],
        'c1':[0.9,0.7,0.6],
        'c2':[0.8,0.4,0.3]
    }
)


# Get the average of column c1 in df1
avg = df1["c1"].mean()

# Add the average to the existing c1 column of df2 and store the result
df2['c1'] = df2['c1'] + avg