pandas: Filling an empty Python DataFrame using loops

Notice: This page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/28910089/

Time: 2020-09-13 23:01:32  Source: igfitidea

Filling empty python dataframe using loops

python pandas iteration

Asked by ccsv

Let's say I want to create an empty DataFrame and fill it with values from a loop.

import pandas as pd
import numpy as np

years = [2013, 2014, 2015]
dn=pd.DataFrame()
for year in years:
    df1 = pd.DataFrame({'Incidents': [ 'C', 'B','A'],
                 year: [1, 1, 1 ],
                }).set_index('Incidents')
    print (df1)
    dn=dn.append(df1, ignore_index = False)

The append gives a staggered, diagonal-looking result padded with NaN, even when ignore_index is False:

>>> dn
           2013  2014  2015
Incidents
C             1   NaN   NaN
B             1   NaN   NaN
A             1   NaN   NaN
C           NaN     1   NaN
B           NaN     1   NaN
A           NaN     1   NaN
C           NaN   NaN     1
B           NaN   NaN     1
A           NaN   NaN     1

[9 rows x 3 columns]

It should look like this:

>>> dn
           2013  2014  2015
Incidents
C             1     1     1
B             1     1     1
A             1     1     1

[3 rows x 3 columns]

Is there a better way of doing this, and is there a way to fix the append?

I have pandas version '0.13.1-557-g300610e'

Answered by unutbu

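import pandas as pd

years = [2013, 2014, 2015]
dn = []  # collect the per-year DataFrames in a plain list
for year in years:
    df1 = pd.DataFrame({'Incidents': ['C', 'B', 'A'],
                        year: [1, 1, 1],
                        }).set_index('Incidents')
    dn.append(df1)  # list.append, not DataFrame.append
# Join the pieces side by side (axis=1), aligning rows on the index
dn = pd.concat(dn, axis=1)
print(dn)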

yields

           2013  2014  2015
Incidents                  
C             1     1     1
B             1     1     1
A             1     1     1
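
The difference between the question's result and this one comes down to the concatenation axis. The following is a minimal sketch, assuming a recent pandas release: it uses pd.concat with axis=0 to stand in for the row-wise DataFrame.append from the question (that method was removed in pandas 2.0) and compares it with axis=1. The names frames, stacked and side_by_side are made up for illustration.

```python
import pandas as pd

years = [2013, 2014, 2015]
frames = [
    pd.DataFrame({"Incidents": ["C", "B", "A"], year: [1, 1, 1]}).set_index("Incidents")
    for year in years
]

# axis=0 stacks rows and takes the union of the columns, so each frame
# only fills its own year column and the other cells become NaN.
stacked = pd.concat(frames, axis=0)

# axis=1 places the frames side by side and aligns rows on the index.
side_by_side = pd.concat(frames, axis=1)

print(stacked.shape)       # (9, 3)
print(side_by_side.shape)  # (3, 3)
```

Because all three frames share the same index labels, the axis=1 result has one row per incident and no missing values.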


Note that calling pd.concat once outside the loop is more time-efficient than calling pd.concat with each iteration of the loop.

Each time you call pd.concat, new space is allocated for a new DataFrame, and all the data from each component DataFrame is copied into the new DataFrame. If you call pd.concat from within the for-loop, then you end up doing on the order of n**2 copies, where n is the number of years.

If you accumulate the partial DataFrames in a list and call pd.concat once outside the loop, then pandas only needs to perform n copies to make dn.

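The copy-count argument can be sketched roughly as follows. This is an illustration under assumptions, not a benchmark: the 20-year list and the make_frame helper are hypothetical, and the tally counts how many columns each newly built result holds, as a simple stand-in for the amount of data copied.

```python
import pandas as pd

# Assumed for illustration: a longer list of years than in the question.
years = list(range(2000, 2020))

def make_frame(year):
    """Hypothetical helper: one single-column frame per year, as in the question."""
    return pd.DataFrame(
        {"Incidents": ["C", "B", "A"], year: [1, 1, 1]}
    ).set_index("Incidents")

# Pattern 1: concat inside the loop. Each pass builds a brand-new frame
# holding every column accumulated so far.
grown = make_frame(years[0])
columns_copied_in_loop = 0
for year in years[1:]:
    grown = pd.concat([grown, make_frame(year)], axis=1)
    columns_copied_in_loop += grown.shape[1]

# Pattern 2: collect the pieces in a list, then concat once.
pieces = [make_frame(year) for year in years]
once = pd.concat(pieces, axis=1)
columns_copied_once = once.shape[1]

print(columns_copied_in_loop, columns_copied_once)
```

With 20 years the tally is 209 columns for the in-loop pattern versus 20 for the single call, and both patterns end with the same values. The gap grows roughly quadratically as the list of years gets longer.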

Answered by Donbeo

As far as I know, you should avoid adding rows to a DataFrame one at a time, because it is slow.

What I usually do is:

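import pandas as pd

n = 5  # example loop length (assumed; the original leaves n undefined)
l1 = []
l2 = []

for i in range(n):
    v1 = i        # placeholder for "compute value v1"
    v2 = i ** 2   # placeholder for "compute value v2"
    l1.append(v1)
    l2.append(v2)

# Build the DataFrame only after the loop has finished
d = pd.DataFrame()
d['l1'] = l1
d['l2'] = l2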
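
The same list-accumulation idea could be applied to the data from the question. This is only a sketch: the columns dict and the constant [1, 1, 1] values are placeholders for whatever a real loop would compute, and the DataFrame is built in a single constructor call at the end.

```python
import pandas as pd

years = [2013, 2014, 2015]
incidents = ["C", "B", "A"]

columns = {}
for year in years:
    # Placeholder values; a real loop would compute the per-year counts here.
    columns[year] = [1, 1, 1]

# One constructor call: dict keys become columns, the list order gives the rows.
dn = pd.DataFrame(columns, index=pd.Index(incidents, name="Incidents"))
print(dn)
```

Plain Python lists and dicts are cheap to grow, so the only DataFrame allocation happens once, after the loop.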