Python: How to build and fill a pandas DataFrame from a for loop?

Question

Asked by c.j.mcdonn

Here is a simple example of the code I am running, and I would like the results put into a pandas dataframe (unless there is a better option):

for p in game.players.passing():
    print p, p.team, p.passing_att, p.passer_rating()

R.Wilson SEA 29 55.7
J.Ryan SEA 1 158.3
A.Rodgers GB 34 55.8

Using this code:

d = []
for p in game.players.passing():
    d = [{'Player': p, 'Team': p.team, 'Passer Rating':
        p.passer_rating()}]

pd.DataFrame(d)

I can get:

    Passer Rating   Player      Team
  0 55.8            A.Rodgers   GB

This is a 1x3 DataFrame, and I understand why it has only one row, but I can't figure out how to make it multi-row with the columns in the correct order. Ideally the solution would handle n rows (based on p), and it would be wonderful (although not essential) if the number of columns were set by the number of stats requested. Any suggestions? Thanks in advance!

Answer 1

Accepted answer by Amit Verma

Try this using list comprehension:

import pandas as pd

df = pd.DataFrame(
    [p, p.team, p.passing_att, p.passer_rating()] for p in game.players.passing()
)
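The call above leaves the columns labeled 0 to 3. A minimal sketch of the same idea with named columns follows. Because `game.players.passing()` exists only in the asker's environment, the `Passer` records here are hypothetical stand-ins built from the sample output in the question, and `passer_rating` is stored as a plain attribute rather than a method.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the objects yielded by game.players.passing().
Passer = namedtuple("Passer", ["name", "team", "passing_att", "passer_rating"])
passers = [
    Passer("R.Wilson", "SEA", 29, 55.7),
    Passer("J.Ryan", "SEA", 1, 158.3),
    Passer("A.Rodgers", "GB", 34, 55.8),
]

# One inner list per player; `columns` fixes both the names and their order.
df = pd.DataFrame(
    [[p.name, p.team, p.passing_att, p.passer_rating] for p in passers],
    columns=["Player", "Team", "Attempts", "Passer Rating"],
)
print(df)
```

Passing `columns` explicitly also answers the ordering part of the question, since the column order no longer depends on how pandas orders dictionary keys.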

Answer 2

Answered by Nick Marinakis

The simplest answer is what Paul H said:

d = []
for p in game.players.passing():
    d.append(
        {
            'Player': p,
            'Team': p.team,
            'Passer Rating':  p.passer_rating()
        }
    )

pd.DataFrame(d)

But if you really want to "build and fill a dataframe from a loop" (which, by the way, I wouldn't recommend), here's how you'd do it.

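d = pd.DataFrame()

for p in game.players.passing():
    # Wrap the dict in a list: a dict of bare scalars raises
    # "If using all scalar values, you must pass an index".
    temp = pd.DataFrame(
        [{
            'Player': p,
            'Team': p.team,
            'Passer Rating': p.passer_rating()
        }]
    )

    # ignore_index=True renumbers the rows instead of repeating index 0
    d = pd.concat([d, temp], ignore_index=True)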
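If separate DataFrames are unavoidable (for example, each iteration already produces one), a common alternative to concatenating inside the loop is to collect the pieces in a list and call `pd.concat` once at the end, which avoids copying the accumulated data on every iteration. A sketch, with hypothetical tuples standing in for the player objects:

```python
import pandas as pd

# Hypothetical stand-in rows: (player, team, passer rating).
rows = [
    ("R.Wilson", "SEA", 55.7),
    ("J.Ryan", "SEA", 158.3),
    ("A.Rodgers", "GB", 55.8),
]

frames = []
for player, team, rating in rows:
    # Each piece is a one-row DataFrame built from a single-element list.
    frames.append(
        pd.DataFrame([{"Player": player, "Team": team, "Passer Rating": rating}])
    )

# Concatenate once; ignore_index=True renumbers the rows 0..n-1.
d = pd.concat(frames, ignore_index=True)
print(d)
```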

Answer 3

Answered by Seanny123

Make a list of tuples with your data and then create a DataFrame with it:

d = []
for p in game.players.passing():
    d.append((p, p.team, p.passer_rating()))

pd.DataFrame(d, columns=('Player', 'Team', 'Passer Rating'))
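The question also asks for the number of columns to follow the stats requested. One way to sketch that, under assumptions, is to keep a mapping from column name to a function that extracts the value, and to build each tuple from that mapping. The `Passer` records and the `stats` mapping below are hypothetical; with the real player objects the lambdas would read `p.team`, `p.passer_rating()`, and so on.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the objects yielded by game.players.passing().
Passer = namedtuple("Passer", ["name", "team", "passing_att", "passer_rating"])
passers = [
    Passer("R.Wilson", "SEA", 29, 55.7),
    Passer("J.Ryan", "SEA", 1, 158.3),
    Passer("A.Rodgers", "GB", 34, 55.8),
]

# Requested stats: column name -> function that pulls the value from a player.
# Adding or removing an entry changes the number of columns.
stats = {
    "Player": lambda p: p.name,
    "Team": lambda p: p.team,
    "Passer Rating": lambda p: p.passer_rating,
}

# One tuple per player, with values in the same order as the keys of `stats`.
rows = [tuple(getter(p) for getter in stats.values()) for p in passers]
df = pd.DataFrame(rows, columns=list(stats))
print(df)
```

This relies on dictionaries preserving insertion order, which Python guarantees from version 3.7 onward.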

A list of tuples should have less overhead than a list of dictionaries. I tested this below, but please remember to prioritize ease of code understanding over performance in most cases.

Testing functions:

def with_tuples(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append((x-1, x, x+1))

    return pd.DataFrame(res, columns=("a", "b", "c"))

def with_dict(loop_size=1e5):
    res = []

    for x in range(int(loop_size)):
        res.append({"a":x-1, "b":x, "c":x+1})

    return pd.DataFrame(res)

Results:

%timeit -n 10 with_tuples()
# 10 loops, best of 3: 55.2 ms per loop

%timeit -n 10 with_dict()
# 10 loops, best of 3: 130 ms per loop
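`%timeit` is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. The loop size and repeat counts below are arbitrary choices made to keep the run short, and timings vary by machine and pandas version, so no expected numbers are shown.

```python
import timeit

import pandas as pd


def with_tuples(loop_size=1e5):
    res = []
    for x in range(int(loop_size)):
        res.append((x - 1, x, x + 1))
    return pd.DataFrame(res, columns=["a", "b", "c"])


def with_dict(loop_size=1e5):
    res = []
    for x in range(int(loop_size)):
        res.append({"a": x - 1, "b": x, "c": x + 1})
    return pd.DataFrame(res)


for fn in (with_tuples, with_dict):
    # Best of 3 repeats, 5 calls per repeat, reported as time per call.
    best = min(timeit.repeat(lambda: fn(1e4), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

Both functions build the same DataFrame, so the comparison isolates the cost of the intermediate row representation.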

Answer 4

Answered by bzip2

I may be wrong, but I think the accepted answer by @amit has a bug.

from pandas import DataFrame as df
x = [1,2,3]
y = [7,8,9,10]

# this gives me a syntax error at 'for' (Python 3.7), so it is commented out here
# d1 = df[[a, "A", b, "B"] for a in x for b in y]

# this works
d2 = df([a, "A", b, "B"] for a in x for b in y)

# and if you want to add the column names on the fly
# note the additional parentheses
d3 = df(([a, "A", b, "B"] for a in x for b in y), columns = ("l","m","n","o"))
