Best way to add a dictionary to a pandas DataFrame
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/49817715/
Asked by Rutger Hofste
I have a Pandas Dataframe and want to add the data from a dictionary uniformly to all rows in my dataframe. Currently I loop over the dictionary and set the value to my new columns. Is there a more efficient way to do this?
# coding: utf-8
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}

# Add each dictionary entry as a new column; the scalar is broadcast to every row
for k, v in d.items():
    df[k] = v
df
Answered by jezrael
Use assign if none of the keys are numeric (unpacking with ** requires every key to be a string):
df = df.assign(**d)
print(df)
   age    name  blah blah-blah
0    1     Foo    42       bar
1    2     Bar    42       bar
2    3  Barbie    42       bar
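A minimal sketch of why the string-key condition matters. The names `out`, `bad`, and `raised` are illustrative and not part of the original answer. Because `**d` turns the dictionary into keyword arguments, Python rejects a non-string key before pandas does any work.

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}

# assign returns a new DataFrame and leaves df untouched;
# each scalar value is broadcast to every row.
out = df.assign(**d)

# A non-string key cannot be passed as a keyword argument,
# so the call fails with a TypeError.
bad = {8: 42}
try:
    df.assign(**bad)
    raised = False
except TypeError:
    raised = True

print(out)
print("TypeError for numeric key:", raised)
```

Note that a key such as "blah-blah" is not a valid Python identifier, yet it still works here because it is a string and is only ever passed through `**` unpacking.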
If some keys may be numeric, join works nicely:
d = {8:42,"blah-blah":"bar"}
df = df.join(pd.DataFrame(d, index=df.index))
print(df)
   age    name   8 blah-blah
0    1     Foo  42       bar
1    2     Bar  42       bar
2    3  Barbie  42       bar
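The join approach can be sketched as a self-contained snippet, starting from a fresh DataFrame. The names `extra`, `out`, and `overlap_error` are illustrative. The second half shows a pitfall worth knowing: if the DataFrame already contains one of the dictionary's keys as a column (for example because the assign example was run on the same `df` first), `join` raises a ValueError unless a suffix is supplied.

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {8: 42, "blah-blah": "bar"}

# Passing index=df.index lets pandas broadcast the scalar values
# into one row per existing row of df.
extra = pd.DataFrame(d, index=df.index)
out = df.join(extra)

# Joining the same columns a second time overlaps with the existing
# ones; without lsuffix/rsuffix pandas raises a ValueError.
try:
    out.join(extra)
    overlap_error = False
except ValueError:
    overlap_error = True

print(out)
print("ValueError on overlapping columns:", overlap_error)
```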
Answered by Anton vBR
In my opinion, the answer is no. Looping through the key-value pairs of a dict is already efficient, and assigning columns with df[k] = v is more readable. Remember that in the future you will mostly want to recall why you did something, and you won't care much about saving a few microseconds. The only thing missing is a comment explaining why you do it.
d = {"blah":42,"blah-blah":"bar"}
# Add columns to compensate for missing values in document XXX
for k, v in d.items():
    df[k] = v
Timings (the measurement error is too large to separate them; I'd say they are equivalent in speed):
Your solution:
809 μs ± 70 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
df.assign():
893 μs ± 24.2 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
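For anyone who wants to repeat the comparison, here is a rough sketch using the standard-library `timeit` module. The helper names `with_loop` and `with_assign` and the repeat count `n` are assumptions made for illustration, not the setup used for the numbers above, and absolute timings will differ by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}

def with_loop():
    # Copy first: the loop mutates the DataFrame in place,
    # while assign already works on a copy internally.
    df = base.copy()
    for k, v in d.items():
        df[k] = v
    return df

def with_assign():
    return base.assign(**d)

n = 200  # number of calls per measurement (arbitrary choice)
loop_time = timeit.timeit(with_loop, number=n) / n
assign_time = timeit.timeit(with_assign, number=n) / n

print(f"loop:   {loop_time * 1e6:.0f} µs per call")
print(f"assign: {assign_time * 1e6:.0f} µs per call")
```

Including the copy in `with_loop` keeps the comparison fair, since both functions then leave `base` unchanged and return an equivalent result.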