Python Pandas Dynamically Create a Dataframe
Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/47109931/
Asked by Kyle
The code below generates the desired output in ONE dataframe. However, I would like to create the data frames dynamically in a for loop and assign the shifted values to each one. For example, data frame df_lags_12 would contain only column1_t12 and column2_t12. I tried to create 12 dataframes dynamically using exec, but a Google search suggests this is poor practice. Any ideas would be greatly appreciated.
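import pandas as pd

list1=list(range(0,20))
list2=list(range(19,-1,-1))
d={'column1':list(range(0,20)),
   'column2':list(range(19,-1,-1))}
df=pd.DataFrame(d)

# Build one wide frame holding lags 12..1 of every column, then the column itself
df_lags=pd.DataFrame()
for col in df.columns:
    for i in range(12,0,-1):
        df_lags[col+'_t'+str(i)]=df[col].shift(i)
    df_lags[col]=df[col].values
print(df_lags)

# Attempt with exec: creates empty frames model_data_lag_12 ... model_data_lag_1
# (loop variable named i so that it does not overwrite df)
for i in range(12,0,-1):
    exec('model_data_lag_'+str(i)+'=pd.DataFrame()')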
Desired output for the dynamically created dataframe df_lags_12:
var_list=['column1_t12','column2_t12']
df_lags_12=df_lags[var_list]
print(df_lags_12)
Answered by jezrael
I think the best approach is to create a dictionary of DataFrames:
d = {}
for i in range(12,0,-1):
    d['t' + str(i)] = df.shift(i).add_suffix('_t' + str(i))
If you need to specify the columns first:
d = {}
cols = ['column1','column2']
for i in range(12,0,-1):
    d['t' + str(i)] = df[cols].shift(i).add_suffix('_t' + str(i))
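As a minimal sketch, the same loop can be wrapped in a function that keys the dictionary by names like the ones the question asks for (df_lag_12, df_lag_11, and so on). The helper name make_lag_frames and its parameters are hypothetical; they are not part of pandas or of the original answer.

```python
import pandas as pd

def make_lag_frames(df, cols, max_lag):
    """Hypothetical helper: return {'df_lag_<i>': frame of cols shifted by i rows}."""
    frames = {}
    for i in range(max_lag, 0, -1):
        # shift(i) moves values down i rows; add_suffix renames column1 -> column1_t<i>
        frames['df_lag_' + str(i)] = df[cols].shift(i).add_suffix('_t' + str(i))
    return frames

# Same sample data as in the question
df = pd.DataFrame({'column1': list(range(0, 20)),
                   'column2': list(range(19, -1, -1))})

frames = make_lag_frames(df, ['column1', 'column2'], 12)
print(frames['df_lag_12'].columns.tolist())
```

Each frame is then reached by a string key, for example frames['df_lag_12'], instead of by a variable created at run time.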
Dict comprehension solution:
d = {'t' + str(i): df.shift(i).add_suffix('_t' + str(i)) for i in range(12,0,-1)}
print (d['t10'])
column1_t10 column2_t10
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 0.0 19.0
11 1.0 18.0
12 2.0 17.0
13 3.0 16.0
14 4.0 15.0
15 5.0 14.0
16 6.0 13.0
17 7.0 12.0
18 8.0 11.0
19 9.0 10.0
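To connect this back to the question, here is a short sketch of how the dictionary might be used: one lookup gives the equivalent of df_lags_12, and pd.concat can rebuild a single wide frame when that is needed after all. The variable name wide is an assumption made for illustration.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'column1': list(range(0, 20)),
                   'column2': list(range(19, -1, -1))})

d = {'t' + str(i): df.shift(i).add_suffix('_t' + str(i)) for i in range(12, 0, -1)}

# Equivalent of df_lags_12 from the question: only the lag-12 columns
df_lags_12 = d['t12']
print(df_lags_12.columns.tolist())

# Rebuild one wide frame: the original columns plus every lagged column
wide = pd.concat([df] + list(d.values()), axis=1)
print(wide.shape)
```

Keeping the lagged frames separate in the dictionary does not prevent combining them later, so little is lost compared with building one large frame up front.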
EDIT: It is possible with globals, but a dictionary is much better:
cols = ['column1','column2']
for i in range(12,0,-1):
    # creates module-level variables df12, df11, ..., df1
    globals()['df' + str(i)] = df[cols].shift(i).add_suffix('_t' + str(i))
print (df10)
column1_t10 column2_t10
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 0.0 19.0
11 1.0 18.0
12 2.0 17.0
13 3.0 16.0
14 4.0 15.0
15 5.0 14.0
16 6.0 13.0
17 7.0 12.0
18 8.0 11.0
19 9.0 10.0
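One practical reason the dictionary tends to be easier to work with than globals() is that every frame can be visited in a plain loop, without rebuilding variable names from strings. A minimal sketch, reusing the sample data from the question; the name complete_rows is an assumption made for illustration:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'column1': list(range(0, 20)),
                   'column2': list(range(19, -1, -1))})

d = {'t' + str(i): df.shift(i).add_suffix('_t' + str(i)) for i in range(12, 0, -1)}

# For each lagged frame, count the rows that contain no NaN
complete_rows = {}
for name, frame in d.items():
    complete_rows[name] = len(frame.dropna())

print(complete_rows['t10'])
```

Doing the same with df1 ... df12 created through globals() would require looking each name up by string again, which is exactly the bookkeeping the dictionary already does.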