pandas: adding values to all rows of a DataFrame
Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/46967581/
Adding values to all rows of dataframe
Asked by Jagruth
I have two pandas dataframes, df1 (of length 2) and df2 (about 30 rows long). The index values of df1 are always different and never occur in df2. I would like to add the average of each column of df1 to the corresponding column of df2. Example: add 0.6 to all rows of c1, 0.9 to all rows of c2, and so on.
df1:
Date c1 c2 c3 c4 c5 c6 ... c10
2017-09-10 0.5 0.6 1.2 0.7 1.3 1.8 ... 1.3
2017-09-11 0.7 1.2 1.3 0.4 0.7 0.4 ... 1.5
df2:
Date c1 c2 c3 c4 c5 c6 ... c10
2017-09-12 0.9 0.1 1.4 0.9 1.5 1.9 ... 1.9
2017-09-13 0.2 1.8 1.2 1.4 2.7 0.8 ... 1.1
: :
: :
2017-10-10 1.5 0.9 1.5 0.9 1.6 1.8 ... 1.7
2017-10-11 2.7 1.1 1.9 0.4 0.8 0.8 ... 1.3
How can I do that?
Answered by piRSquared
When you call mean on df1, it calculates over each column by default and produces a pd.Series.
When adding a pd.Series to a pd.DataFrame, pandas aligns the index of the pd.Series with the columns of the pd.DataFrame and broadcasts along the index of the pd.DataFrame, by default.
The only tricky bit is handling the Date column.
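The alignment rule described above can be seen on a tiny example. This is only a sketch on made-up data (the frame `df` and Series `s` below are illustrative, not the asker's frames): the Series labels are deliberately listed in a different order than the DataFrame columns, and the addition still pairs values by label.

```python
import pandas as pd

# Hypothetical toy data, not the frames from the question.
df = pd.DataFrame({"c1": [1.0, 2.0], "c2": [10.0, 20.0]})
s = pd.Series({"c2": 0.5, "c1": 0.25})  # labels in a different order on purpose

# The Series index is matched against the column labels of the DataFrame,
# and each matched value is then broadcast down every row.
result = df + s
print(result)
```

Because matching is by label rather than by position, the order of the entries in the Series does not matter.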
Option 1
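# numeric_only=True skips the non-numeric Date column (needed in recent pandas versions)
m = df1.mean(numeric_only=True)
df2.loc[:, m.index] += m
df2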
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
If I know that 'Date' is always in the first column, I can:
# numeric_only=True skips the non-numeric Date column (needed in recent pandas versions)
df2.iloc[:, 1:] += df1.mean(numeric_only=True)
df2
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
Option 2
Notice that I use the append=True parameter in set_index, just in case there are things in the index you don't want to mess up.
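# numeric_only=True skips the non-numeric Date column of df1 (needed in recent pandas versions)
df2.set_index('Date', append=True).add(df1.mean(numeric_only=True)).reset_index('Date')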
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
If you don't care about the index, you can shorten this to:
df2.set_index('Date').add(df1.mean(numeric_only=True)).reset_index()
Date c1 c2 c3 c4 c5 c6 c10
0 2017-09-12 1.5 1.0 2.65 1.45 2.5 3.0 3.3
1 2017-09-13 0.8 2.7 2.45 1.95 3.7 1.9 2.5
2 2017-10-10 2.1 1.8 2.75 1.45 2.6 2.9 3.1
3 2017-10-11 3.3 2.0 3.15 0.95 1.8 1.9 2.7
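For readers who want to run both options end to end, here is a self-contained sketch. The frames are cut-down stand-ins for the question's data (three value columns only), Date is assumed to be a plain string column, and `numeric_only=True` is passed so that Date stays out of the mean. The names `opt1` and `opt2` are introduced only for this illustration.

```python
import pandas as pd

# Toy frames shaped like the question's data (assumed, three value columns only).
df1 = pd.DataFrame({
    "Date": ["2017-09-10", "2017-09-11"],
    "c1": [0.5, 0.7], "c2": [0.6, 1.2], "c3": [1.2, 1.3],
})
df2 = pd.DataFrame({
    "Date": ["2017-09-12", "2017-09-13"],
    "c1": [0.9, 0.2], "c2": [0.1, 1.8], "c3": [1.4, 1.2],
})

# Column means of df1, with the Date column left out.
m = df1.mean(numeric_only=True)

# Option 1 style: select the numeric columns by label and add in place (on a copy).
opt1 = df2.copy()
opt1.loc[:, m.index] += m

# Option 2 style: move Date into the index, add, then move Date back to a column.
opt2 = df2.set_index("Date", append=True).add(m).reset_index("Date")

print(opt1)
print(opt2)
```

Both approaches should produce the same values; Option 1 modifies a frame in place, while Option 2 returns a new frame and leaves df2 untouched.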
Answered by jondo
If all columns are in both data frames, then just:
for col in df2.columns:
    df2[col] = df2[col] + df1[col].mean()
If the columns are not necessarily in both, then:
for col in df2.columns:
    if col in df1.columns:
        df2[col] = df2[col] + df1[col].mean()
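The guarded loop above can also be written without an explicit Python loop. The following is a sketch on made-up frames that contain only numeric columns (a Date column would need to be excluded first); the name `shared` is introduced here for illustration.

```python
import pandas as pd

# Hypothetical frames: c2 exists only in df1, c3 only in df2.
df1 = pd.DataFrame({"c1": [0.5, 0.7], "c2": [0.6, 1.2]})
df2 = pd.DataFrame({"c1": [0.9, 0.2], "c3": [1.4, 1.2]})

# Column labels present in both frames.
shared = df2.columns.intersection(df1.columns)

# One vectorized addition over the shared columns; other columns are untouched.
df2[shared] = df2[shared] + df1[shared].mean()
print(df2)
```

Restricting both sides to `shared` avoids the NaN columns that label alignment would otherwise create for columns missing from one frame.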
Answered by chumbak
There is probably a more efficient way, but here is a quick and dirty solution. I hope this helps!
import pandas as pd
d = {'c1': [0.5, 0.7], 'c2': [0.6, 1.2], 'c3': [1.2, 1.3]}
df1 = pd.DataFrame(data=d, index=['2017-09-10', '2017-09-11'])
df2 = pd.DataFrame(data=d, index=['2017-09-12', '2017-09-13'])
df1
Date c1 c2 c3
2017-09-10 0.5 0.6 1.2
2017-09-11 0.7 1.2 1.3
df2
Date c1 c2 c3
2017-09-12 0.5 0.6 1.2
2017-09-13 0.7 1.2 1.3
The average of each column in df1 can be obtained using the describe() function:
# .ix has been removed from pandas; .loc is the label-based replacement
df1.describe().loc['mean']
c1 0.60
c2 0.90
c3 1.25
And now, simply add the series to df2:
df2 + df1.describe().loc['mean']
Date c1 c2 c3
2017-09-12 1.1 1.5 2.45
2017-09-13 1.3 2.1 2.55
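As a side note, describe() computes several statistics at once, so if only the mean is needed, calling mean() directly should give the same Series. The sketch below reuses the answer's data and uses .loc, since .ix is no longer available in current pandas; the names `via_describe` and `via_mean` are introduced for this comparison only.

```python
import pandas as pd

d = {"c1": [0.5, 0.7], "c2": [0.6, 1.2], "c3": [1.2, 1.3]}
df1 = pd.DataFrame(data=d, index=["2017-09-10", "2017-09-11"])
df2 = pd.DataFrame(data=d, index=["2017-09-12", "2017-09-13"])

# Two ways to get the per-column means of df1.
via_describe = df1.describe().loc["mean"]
via_mean = df1.mean()

# Adding the Series to df2 aligns on column labels and broadcasts over rows.
result = df2 + via_mean
print(result)
```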
Answered by pankaj mishra
This could be another way of doing it, simplified a little:
import pandas as pd
from datetime import datetime

date_today = datetime.now()

# Create df1 and df2
df1 = pd.DataFrame(
    {
        'Date': [date_today, date_today],
        'c1': [0.5, 0.4],
        'c2': [0.6, 0.3],
    }
)
df2 = pd.DataFrame(
    {
        'Date': [date_today, date_today, date_today],
        'c1': [0.9, 0.7, 0.6],
        'c2': [0.8, 0.4, 0.3],
    }
)

# Average of column c1 in df1
avg = df1['c1'].mean()

# Add the average to the existing c1 column of df2 (assigned back so the change persists)
df2['c1'] = df2['c1'] + avg