pandas: add a value to all rows of a DataFrame

Question

Asked by Jagruth

I have two pandas DataFrames, df1 (2 rows) and df2 (about 30 rows). The index values of df1 are always different from those of df2 and never occur in it. I would like to add the average of each column in df1 to the corresponding column of df2. For example: add 0.6 to all rows of c1, 0.9 to all rows of c2, and so on.

df1:
      Date   c1   c2   c3   c4   c5   c6 ...  c10
2017-09-10  0.5  0.6  1.2  0.7  1.3  1.8 ...  1.3
2017-09-11  0.7  1.2  1.3  0.4  0.7  0.4 ...  1.5


df2:
      Date   c1   c2   c3   c4   c5   c6 ...  c10
2017-09-12  0.9  0.1  1.4  0.9  1.5  1.9 ...  1.9
2017-09-13  0.2  1.8  1.2  1.4  2.7  0.8 ...  1.1
    :                                 :
    :                                 :
2017-10-10  1.5  0.9  1.5  0.9  1.6  1.8 ...  1.7
2017-10-11  2.7  1.1  1.9  0.4  0.8  0.8 ...  1.3

How can I do that?

Answer 1

Answered by piRSquared

When you call mean on df1, it computes the mean of each column by default and returns a pd.Series.

When you add a pd.Series to a pd.DataFrame, pandas by default aligns the index of the pd.Series with the columns of the pd.DataFrame and broadcasts along the index of the pd.DataFrame.
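The alignment described above can be seen on a toy example. This is only a sketch: the small frame and the hand-written Series below are made-up stand-ins for df2 and df1.mean(), not data from the question.

```python
import pandas as pd

# Hypothetical stand-in for df2 (two rows, two numeric columns).
df = pd.DataFrame({"c1": [0.9, 0.2], "c2": [0.1, 1.8]})

# Hypothetical stand-in for df1.mean(); the labels are deliberately
# listed in a different order than the DataFrame's columns.
s = pd.Series({"c2": 0.9, "c1": 0.6})

# The Series labels are matched to the column names (not to positions),
# and each value is then added to every row of its column.
result = df + s
print(result)
```

Because matching is by label, the order of the entries in the Series does not matter; a label that exists on only one side would produce a column of NaN.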

The only tricky bit is handling the Date column.

Option 1


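# numeric_only=True skips the non-numeric Date column
# (recent pandas versions raise on it instead of dropping it silently)
m = df1.mean(numeric_only=True)
df2.loc[:, m.index] += m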

df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

If I know that 'Date' is always the first column, I can do:

df2.iloc[:, 1:] += df1.mean(numeric_only=True)
df2

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

Option 2

Notice that I pass append=True to set_index, just in case there is something in the existing index that you don't want to lose.

df2.set_index('Date', append=True).add(df1.mean(numeric_only=True)).reset_index('Date')

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7

If you don't care about the existing index, you can shorten this to:

df2.set_index('Date').add(df1.mean(numeric_only=True)).reset_index()

         Date   c1   c2    c3    c4   c5   c6  c10
0  2017-09-12  1.5  1.0  2.65  1.45  2.5  3.0  3.3
1  2017-09-13  0.8  2.7  2.45  1.95  3.7  1.9  2.5
2  2017-10-10  2.1  1.8  2.75  1.45  2.6  2.9  3.1
3  2017-10-11  3.3  2.0  3.15  0.95  1.8  1.9  2.7
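To try both options end to end, the following sketch builds two small frames and checks that the two approaches agree. The data is a cut-down, made-up version of the question's tables (two numeric columns only), and it assumes Date is stored as a string column, as in the output shown above.

```python
import pandas as pd

# Hypothetical miniature versions of df1 and df2 from the question.
df1 = pd.DataFrame({
    "Date": ["2017-09-10", "2017-09-11"],
    "c1": [0.5, 0.7],
    "c2": [0.6, 1.2],
})
df2 = pd.DataFrame({
    "Date": ["2017-09-12", "2017-09-13"],
    "c1": [0.9, 0.2],
    "c2": [0.1, 1.8],
})

# Column means of df1, skipping the non-numeric Date column.
m = df1.mean(numeric_only=True)

# Option 1 style: select the numeric columns by label and add the means.
opt1 = df2.copy()
opt1[m.index] = opt1[m.index] + m

# Option 2 style: move Date into the index, add, then move it back.
opt2 = df2.set_index("Date").add(m).reset_index()

print(opt1)
print(opt2)
```

Option 1 here works on a copy so that df2 itself is left untouched; the in-place `+=` form from the answer modifies df2 directly.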

Answer 2

Answered by jondo

If all columns are present in both data frames, then just:

for col in df2.columns:
    df2[col] = df2[col] + df1[col].mean()

If the columns are not necessarily in both, then:

for col in df2.columns:
    if col in df1.columns:
        df2[col] = df2[col] + df1[col].mean()
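The second loop can also be written without iterating, by restricting both frames to the columns they share. This is a sketch of one possible alternative rather than part of the original answer; the sample frames are made up, and it assumes every shared column is numeric.

```python
import pandas as pd

# Hypothetical frames that share only the column c1.
df1 = pd.DataFrame({"c1": [0.5, 0.7], "c2": [0.6, 1.2]})
df2 = pd.DataFrame({"c1": [0.9, 0.2], "c3": [1.4, 1.2]})

# Column labels present in both frames.
shared = df2.columns.intersection(df1.columns)

# Add df1's column means to the shared columns only;
# columns that exist only in df2 (here c3) are left unchanged.
df2[shared] = df2[shared] + df1[shared].mean()
print(df2)
```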

Answer 3

Answered by chumbak

There is probably a more efficient way, but here is a quick and dirty solution. I hope this helps!

import pandas as pd

# The same values are reused for both frames; only the index differs
d = {'c1': [0.5, 0.7], 'c2': [0.6, 1.2], 'c3': [1.2, 1.3]}
df1 = pd.DataFrame(data=d, index=['2017-09-10', '2017-09-11'])
df2 = pd.DataFrame(data=d, index=['2017-09-12', '2017-09-13'])

df1

      Date   c1   c2   c3
2017-09-10  0.5  0.6  1.2
2017-09-11  0.7  1.2  1.3

df2

      Date   c1   c2   c3
2017-09-12  0.5  0.6  1.2
2017-09-13  0.7  1.2  1.3

The average of each column in df1 can be obtained with the describe() function:

df1.describe().loc['mean']

c1    0.60
c2    0.90
c3    1.25

Now simply add the Series to df2:

df2 + df1.describe().loc['mean']

      Date   c1   c2    c3
2017-09-12  1.1  1.5  2.45
2017-09-13  1.3  2.1  2.55

Answer 4

Answered by pankaj mishra

Here is another way of doing it, simplified a little:

import pandas as pd
from datetime import datetime

date_today = datetime.now()

# Create df1 and df2
df1=pd.DataFrame(
    {
        'Date':[date_today,date_today],
        'c1':[0.5,0.4],
        'c2':[0.6,0.3]
    }
)
df2=pd.DataFrame(
    {
        'Date':[date_today,date_today,date_today],
        'c1':[0.9,0.7,0.6],
        'c2':[0.8,0.4,0.3]
    }
)


# Get the average of column c1 in df1
avg = df1["c1"].mean()

# Add the average to the existing c1 column of df2
# (assign the result back, otherwise df2 is left unchanged)
df2["c1"] = df2["c1"] + avg
