Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original on StackOverflow: http://stackoverflow.com/questions/48810726/

Time: 2020-09-14 05:11:12  Source: igfitidea

Concatenate pandas DataFrames generated with a loop

Tags: python, pandas, loops, dataframe, append

Asked by Annalix

For each day, I am creating a new DataFrame named data_day that contains new features; the days are extracted from the day timestamp of an existing DataFrame, df.

My new data_day DataFrames are 30 independent DataFrames that I need to concatenate/append at the end into a single DataFrame (final_data_day).

The for loop over the days is defined as follows:

num_days=len(list_day)

#list_day= random.sample(list_day,num_days_to_simulate)
data_frame = pd.DataFrame()

for i, day in enumerate(list_day):

    print('*** ',day,' ***')

    data_day=df[df.day==day]
    .....................
    final_data_day = pd.concat()

I hope that is clear. This is basically a problem of appending/concatenating DataFrames generated in a non-trivial for loop.

Answered by David Rinck

Pandas concat takes a list of dataframes. If you can generate a list of dataframes with your looping function, once you are finished you can concatenate the list together:

data_day_list = []
for i, day in enumerate(list_day):
  data_day = df[df.day==day]
  data_day_list.append(data_day)
final_data_day = pd.concat(data_day_list)
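
As a self-contained illustration of this list-then-concat pattern, here is a minimal sketch on made-up data. The sample df, its column names, and the n_rows "feature" are assumptions for the example and do not come from the question. ignore_index=True is optional and only renumbers the rows of the result.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "day": ["2018-01-01", "2018-01-01", "2018-01-02", "2018-01-03"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
list_day = df["day"].unique()

data_day_list = []
for day in list_day:
    # .copy() so that adding a column below does not touch the original df
    data_day = df[df["day"] == day].copy()
    # Placeholder "new feature": number of rows recorded for that day
    data_day["n_rows"] = len(data_day)
    data_day_list.append(data_day)

# One concat call after the loop, instead of growing a DataFrame inside it
final_data_day = pd.concat(data_day_list, ignore_index=True)
print(final_data_day)
```

Calling pd.concat once after the loop avoids re-copying the accumulated rows on every iteration, which is what happens when a DataFrame is grown inside the loop.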

Answered by jpp

Exhausting a generator is more efficient than appending to a list. For example:

def yielder(df, list_day):
    for i, day in enumerate(list_day):
        data_day = df[df['day'] == day]
        yield data_day

final_data_day = pd.concat(list(yielder(df, list_day)))
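
A runnable variant of the generator approach on made-up data is sketched below; the sample df and its values are assumptions for illustration. In practice pd.concat can consume a generator directly, so the explicit list() call is dropped here. pd.concat still has to collect all the frames before combining them, so for around 30 frames the gain over the list version is probably small.

```python
import pandas as pd

def yielder(df, list_day):
    """Yield one per-day DataFrame at a time instead of storing them in a list."""
    for day in list_day:
        yield df[df["day"] == day]

# Hypothetical sample data standing in for the question's df
df = pd.DataFrame({
    "day": [1, 1, 2, 3],
    "value": [10, 20, 30, 40],
})
list_day = [1, 2, 3]

# The generator is consumed by pd.concat itself; no explicit list() is needed
final_data_day = pd.concat(yielder(df, list_day))
print(final_data_day)
```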

Answered by mechanical_meat

Appending or concatenating pd.DataFrames is slow. You can use a list in the interim and then create the final pd.DataFrame at the end with pd.DataFrame.from_records(), e.g.:

interim_list = []
for i,(k,g) in enumerate(df.groupby(['[*name of your date column here*]'])):
    if i % 1000 == 0 and i != 0:
        print('iteration: {}'.format(i)) # just tells you where you are in iteration
    # add your "new features" here...
    for v in g.values:
        interim_list.append(v)

# here you want to specify the resulting df's column list...
df_final = pd.DataFrame.from_records(interim_list,columns=['a','list','of','columns'])
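
A small runnable version of this idea is sketched below. The column names (day, value) and the value_doubled "feature" are made up for illustration and are not from the question. Rows are collected as plain tuples, and the DataFrame is built once at the end.

```python
import pandas as pd

# Hypothetical sample data; "day" plays the role of the date column
df = pd.DataFrame({
    "day": ["mon", "mon", "tue"],
    "value": [1, 2, 3],
})

interim_list = []
for day, group in df.groupby("day"):
    for value in group["value"]:
        # Placeholder "new feature": the value doubled
        interim_list.append((day, value, value * 2))

# Build the final DataFrame once, naming the columns explicitly
df_final = pd.DataFrame.from_records(
    interim_list, columns=["day", "value", "value_doubled"]
)
print(df_final)
```

This keeps the per-row work in plain Python objects, at the cost of having to list the output columns by hand.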