Original question: http://stackoverflow.com/questions/28056171/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
How to build and fill pandas dataframe from for loop?
Asked by c.j.mcdonn
Here is a simple example of the code I am running, and I would like the results put into a pandas dataframe (unless there is a better option):
for p in game.players.passing():
    print(p, p.team, p.passing_att, p.passer_rating())
R.Wilson SEA 29 55.7
J.Ryan SEA 1 158.3
A.Rodgers GB 34 55.8
Using this code:
d = []
for p in game.players.passing():
    d = [{'Player': p, 'Team': p.team, 'Passer Rating':
        p.passer_rating()}]
pd.DataFrame(d)
I can get:
   Passer Rating     Player Team
0           55.8  A.Rodgers   GB
Which is a 1x3 dataframe, and I understand why it is only one row, but I can't figure out how to make it multi-row with the columns in the correct order. Ideally the solution would be able to deal with n rows (based on p), and it would be wonderful (although not essential) if the number of columns were set by the number of stats requested. Any suggestions? Thanks in advance!
Accepted answer by Amit Verma
Try this using a comprehension (here a generator expression passed straight to the constructor):
import pandas as pd
df = pd.DataFrame(
    [p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing()
)
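To get named columns in a chosen order from the same kind of comprehension, `columns=` can be passed to the constructor. A minimal sketch, assuming a hypothetical `Passer` namedtuple standing in for the objects yielded by `game.players.passing()` (the real objects come from the asker's library and are not reproduced here):

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the player objects in the question.
Passer = namedtuple("Passer", ["name", "team", "passing_att", "rating"])

players = [
    Passer("R.Wilson", "SEA", 29, 55.7),
    Passer("J.Ryan", "SEA", 1, 158.3),
    Passer("A.Rodgers", "GB", 34, 55.8),
]

# One inner list per row; `columns` fixes both the names and the order.
df = pd.DataFrame(
    [[p.name, p.team, p.passing_att, p.rating] for p in players],
    columns=["Player", "Team", "Attempts", "Passer Rating"],
)
print(df)
```

The frame gets one row per player, so it grows with however many items the loop yields.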
Answer by Nick Marinakis
The simplest answer is what Paul H said:
d = []
for p in game.players.passing():
    d.append(
        {
            'Player': p,
            'Team': p.team,
            'Passer Rating': p.passer_rating()
        }
    )
pd.DataFrame(d)
But if you really want to "build and fill a dataframe from a loop", (which, btw, I wouldn't recommend), here's how you'd do it.
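d = pd.DataFrame()
for p in game.players.passing():
    # Wrap the dict in a list: a dict of bare scalars raises
    # "ValueError: If using all scalar values, you must pass an index".
    temp = pd.DataFrame(
        [
            {
                'Player': p,
                'Team': p.team,
                'Passer Rating': p.passer_rating()
            }
        ]
    )
    # ignore_index gives the rows a running 0..n-1 index
    d = pd.concat([d, temp], ignore_index=True)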
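If building one DataFrame per iteration is unavoidable, a common alternative is to collect the pieces in a list and call `pd.concat` once, since every `pd.concat` call inside the loop copies everything accumulated so far. A sketch, using hypothetical `(name, team, rating)` tuples in place of the player objects:

```python
import pandas as pd

# Hypothetical rows standing in for game.players.passing().
rows = [
    ("R.Wilson", "SEA", 55.7),
    ("J.Ryan", "SEA", 158.3),
    ("A.Rodgers", "GB", 55.8),
]

frames = []
for name, team, rating in rows:
    # Each piece is a one-row DataFrame built from a list holding one dict.
    frames.append(
        pd.DataFrame([{"Player": name, "Team": team, "Passer Rating": rating}])
    )

# A single concat at the end; ignore_index renumbers the rows 0..n-1.
d = pd.concat(frames, ignore_index=True)
print(d)
```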
Answer by Seanny123
Make a list of tuples with your data and then create a DataFrame with it:
d = []
for p in game.players.passing():
    d.append((p, p.team, p.passer_rating()))
pd.DataFrame(d, columns=('Player', 'Team', 'Passer Rating'))
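The question also asks for the number of columns to follow the number of stats requested. One way to sketch that with the tuple approach is to drive both the row contents and the column names from a single list of attribute names. The `Passer` namedtuple and the `stats` list below are hypothetical stand-ins, and the sketch assumes every stat is a plain attribute (a method such as `passer_rating()` would have to be called instead):

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the player objects in the question.
Passer = namedtuple("Passer", ["name", "team", "passing_att", "rating"])
players = [
    Passer("R.Wilson", "SEA", 29, 55.7),
    Passer("J.Ryan", "SEA", 1, 158.3),
    Passer("A.Rodgers", "GB", 34, 55.8),
]

# Hypothetical list of requested stats; edit it to change the columns.
stats = ["team", "passing_att"]

d = []
for p in players:
    # One tuple per player: the name, then each requested attribute in order.
    d.append((p.name, *(getattr(p, s) for s in stats)))

df = pd.DataFrame(d, columns=["Player", *stats])
print(df)
```

Adding `"rating"` to `stats` would add a fourth column without touching the loop.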
A list of tuples should have less overhead than a list of dictionaries. I tested this below, but please remember to prioritize ease of code understanding over performance in most cases.
Testing functions:
import pandas as pd

def with_tuples(loop_size=1e5):
    # Build rows as tuples; column names are supplied separately.
    res = []
    for x in range(int(loop_size)):
        res.append((x-1, x, x+1))
    return pd.DataFrame(res, columns=("a", "b", "c"))

def with_dict(loop_size=1e5):
    # Build rows as dicts; column names come from the keys.
    res = []
    for x in range(int(loop_size)):
        res.append({"a":x-1, "b":x, "c":x+1})
    return pd.DataFrame(res)
Results:
%timeit -n 10 with_tuples()
# 10 loops, best of 3: 55.2 ms per loop
%timeit -n 10 with_dict()
# 10 loops, best of 3: 130 ms per loop
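`%timeit` is an IPython magic. Outside IPython, the standard-library `timeit` module gives a comparable measurement. The sketch below redefines the two functions so that it is self-contained, uses a smaller default loop size than the runs above, and first checks that both approaches build the same frame. Timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd


def with_tuples(loop_size=1e4):
    res = []
    for x in range(int(loop_size)):
        res.append((x - 1, x, x + 1))
    return pd.DataFrame(res, columns=("a", "b", "c"))


def with_dict(loop_size=1e4):
    res = []
    for x in range(int(loop_size)):
        res.append({"a": x - 1, "b": x, "c": x + 1})
    return pd.DataFrame(res)


# Sanity check: both approaches should produce identical frames.
same = with_tuples().equals(with_dict())
print("identical frames:", same)

# Average wall-clock time per call over a few repetitions.
for fn in (with_tuples, with_dict):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms per call")
```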
Answer by bzip2
I may be wrong, but I think the accepted answer by @amit has a bug.
from pandas import DataFrame as df
x = [1,2,3]
y = [7,8,9,10]
# this gives me a syntax error at 'for' (Python 3.7),
# so it is commented out here to keep the snippet runnable
# d1 = df[[a, "A", b, "B"] for a in x for b in y]
# this works
d2 = df([a, "A", b, "B"] for a in x for b in y)
# and if you want to add the column names on the fly
# note the additional parentheses
d3 = df(([a, "A", b, "B"] for a in x for b in y), columns = ("l","m","n","o"))