pandas: best way to add a dictionary to a DataFrame

Disclaimer: this page is a Chinese–English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/49817715/

Date: 2020-09-14 05:27:52  Source: igfitidea

Best way to add dictionary to dataframe

python, pandas

Asked by Rutger Hofste

I have a pandas DataFrame and want to add the data from a dictionary uniformly to all rows of the DataFrame. Currently I loop over the dictionary and assign each value to a new column. Is there a more efficient way to do this?

notebook

# coding: utf-8
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}

# Broadcast each scalar value in d to a new column named after its key
for k, v in d.items():
    df[k] = v

print(df)

Answer by jezrael

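Use assign if all keys are strings (not numeric). The **d syntax unpacks the dictionary into keyword arguments, and keyword names must be strings: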

df = df.assign(**d)
print(df)
   age    name  blah blah-blah
0    1     Foo    42       bar
1    2     Bar    42       bar
2    3  Barbie    42       bar
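A minimal sketch of why the string-key condition matters, using the same illustrative data. **d turns each key into a keyword argument, so a string such as "blah-blah" works even though it is not a valid Python identifier. A numeric key fails with a TypeError before assign even runs. The sketch also shows that assign returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})

# String keys (even ones like "blah-blah" that are not valid identifiers)
# can be unpacked as keyword arguments to assign
d = {"blah": 42, "blah-blah": "bar"}
result = df.assign(**d)
print(result.columns.tolist())  # ['age', 'name', 'blah', 'blah-blah']

# assign returns a new DataFrame; the original df is left unchanged
print(df.columns.tolist())  # ['age', 'name']

# A numeric key cannot become a keyword argument, so Python raises TypeError
d_numeric = {8: 42, "blah-blah": "bar"}
try:
    df.assign(**d_numeric)
except TypeError as exc:
    print("TypeError:", exc)
```

The exact wording of the TypeError message varies between Python versions, so it is printed rather than quoted here.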

If some keys may be numeric, join works well instead:

d = {8:42,"blah-blah":"bar"}
df = df.join(pd.DataFrame(d, index=df.index))
print(df)

   age    name   8 blah-blah
0    1     Foo  42       bar
1    2     Bar  42       bar
2    3  Barbie  42       bar
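In the join answer, the index=df.index argument is what broadcasts the scalars. pandas will not build a DataFrame from a dict that holds only scalars, because it cannot infer how many rows to create. Passing the target frame's index supplies the row count and also lets join align the new columns row by row. A minimal sketch with the same illustrative dictionary:

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {8: 42, "blah-blah": "bar"}

# Without an index, a dict of scalars has no row count, so pandas raises ValueError
try:
    pd.DataFrame(d)
except ValueError as exc:
    print("ValueError:", exc)

# With index=df.index, every scalar is repeated once per row of df
extra = pd.DataFrame(d, index=df.index)

# join aligns on the index, so the new columns line up with the existing rows
joined = df.join(extra)
print(joined.columns.tolist())  # ['age', 'name', 8, 'blah-blah']
```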

Answer by Anton vBR

In my opinion, the answer is no. Looping over the key-value pairs of a dict is already efficient, and assigning columns with df[k] = v is more readable. In the future you will mainly want to remember why you did something, and you won't care much about saving a few microseconds. The only thing missing is a comment explaining why you do it.

d = {"blah": 42, "blah-blah": "bar"}

# Add columns to compensate for missing values in document XXX
for k, v in d.items():
    df[k] = v
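If the same loop is needed in several places, one option is to wrap it in a small helper so the explanatory comment lives in one spot. The name add_constant_columns is hypothetical and not a pandas API. In this sketch the helper works on a copy, so the caller's DataFrame is not modified, which mirrors what assign does. Because it uses plain item assignment, it also accepts numeric keys.

```python
import pandas as pd


def add_constant_columns(df: pd.DataFrame, values: dict) -> pd.DataFrame:
    """Return a copy of df with one new column per key in values.

    Each value is a scalar broadcast to every row. Keys may be strings or
    numbers, unlike df.assign(**values), which needs string keys.
    """
    out = df.copy()
    for key, value in values.items():
        out[key] = value
    return out


df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
print(add_constant_columns(df, {"blah": 42, 8: "bar"}))
```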


Timings (the measurement error is too large to tell them apart; I'd say they are equivalent in speed):

Your solution:

809 μs ± 70 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

df.assign():

893 μs ± 24.2 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
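The figures above look like IPython %timeit output. Outside a notebook, a comparable measurement can be sketched with the standard-library timeit module. This sketch runs 100 calls per repeat instead of 1000 to keep the run short. The loop version also copies the base frame on every call so each run starts from the same three columns, and that copy adds some overhead the original measurement may not have included. Absolute numbers will depend on the machine and pandas version. Before timing anything, the sketch checks that both approaches produce the same DataFrame.

```python
import statistics
import timeit

import pandas as pd

base = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}


def with_loop():
    # Copy first so every run starts from the same three-column frame
    df = base.copy()
    for k, v in d.items():
        df[k] = v
    return df


def with_assign():
    # assign already returns a new DataFrame, so no explicit copy is needed
    return base.assign(**d)


# Both approaches should produce identical frames before comparing speed
pd.testing.assert_frame_equal(with_loop(), with_assign())

NUMBER = 100  # calls per repeat (the quoted %timeit output used 1000)
for name, func in [("loop", with_loop), ("assign", with_assign)]:
    # 7 repeats, as in the %timeit output quoted above
    runs = timeit.repeat(func, repeat=7, number=NUMBER)
    per_call = [t / NUMBER * 1e6 for t in runs]  # microseconds per call
    print(f"{name}: {statistics.mean(per_call):.0f} µs ± "
          f"{statistics.stdev(per_call):.1f} µs per loop")
```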