Python: How to convert a list of model objects to a pandas DataFrame?

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original URL and author information.
StackOverflow original: http://stackoverflow.com/questions/34997174/

Time: 2020-08-19 15:51:33  Source: igfitidea

How to convert list of model objects to pandas dataframe?

python numpy pandas

Asked by ezamur

I have an array of objects of this class

class CancerDataEntity(Model):

    age = columns.Text(primary_key=True)
    gender = columns.Text(primary_key=True)
    cancer = columns.Text(primary_key=True)
    deaths = columns.Integer()
    ...

When printed, the array looks like this:

[CancerDataEntity(age=u'80-85+', gender=u'Female', cancer=u'All cancers (C00-97,B21)', deaths=15306), CancerDataEntity(...

I want to convert this to a data frame so I can work with it more conveniently: aggregate, count, sum and so on. I would like the data frame to look something like this:

     age     gender     cancer     deaths
0    80-85+  Female     ...        15306
1    ...

Is there a way to achieve this using numpy/pandas easily, without manually processing the input array?

Accepted answer by ezamur

Code that leads to the desired result:

import pandas as pd

variables = list(arr[0].keys())  # column names taken from the first object
df = pd.DataFrame([[getattr(i, j) for j in variables] for i in arr], columns=variables)

Thanks to @Serbitar for pointing me in the right direction.

Answer by Serbitar

Try:

variables = list(array[0].keys())
dataframe = pandas.DataFrame([[getattr(i,j) for j in variables] for i in array], columns = variables)
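
A self-contained sketch of the same idea follows. The question's `CancerDataEntity` needs a database-backed model to run, so the `FakeEntity` class below is a hypothetical stand-in that only mimics the `keys()` method the snippet relies on. The first row is taken from the question; the other rows are made-up sample values.

```python
import pandas as pd


class FakeEntity:
    """Hypothetical stand-in for a model class; only keys() is mimicked."""

    _fields = ["age", "gender", "cancer", "deaths"]

    def __init__(self, age, gender, cancer, deaths):
        self.age = age
        self.gender = gender
        self.cancer = cancer
        self.deaths = deaths

    def keys(self):
        # Assumed to return the column names, like the model in the question.
        return list(self._fields)


# First row is from the question; the remaining rows are invented for illustration.
array = [
    FakeEntity("80-85+", "Female", "All cancers (C00-97,B21)", 15306),
    FakeEntity("80-85+", "Male", "All cancers (C00-97,B21)", 100),
    FakeEntity("75-79", "Female", "All cancers (C00-97,B21)", 200),
]

# Read every named attribute off every object, one row per object.
variables = list(array[0].keys())
dataframe = pd.DataFrame(
    [[getattr(obj, name) for name in variables] for obj in array],
    columns=variables,
)

# Once the data is in a DataFrame, aggregating is a single call.
deaths_by_gender = dataframe.groupby("gender")["deaths"].sum()
```

This only works when every object exposes the same attribute names as the first one, since the column list is read from `array[0]`.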

Answer by OregonTrail

A much cleaner way to do this is to define a to_dict method on your class and then use pandas.DataFrame.from_records:

class Signal(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def to_dict(self):
        return {
            'x': self.x,
            'y': self.y,
        }

For example:

In [87]: signals = [Signal(3, 9), Signal(4, 16)]

In [88]: pandas.DataFrame.from_records([s.to_dict() for s in signals])
Out[88]:
   x   y
0  3   9
1  4  16

Answer by Shital Shah

Just use:

pd.DataFrame([o.__dict__ for o in my_objs])

Full example:

import pandas as pd

# define some class
class SomeThing:
    def __init__(self, x, y):
        self.x, self.y = x, y

# make an array of the class objects
things = [SomeThing(1,2), SomeThing(3,4), SomeThing(4,5)]

# fill dataframe with one row per object, one attribute per column
df = pd.DataFrame([t.__dict__ for t in things])

print(df)

This prints:

   x  y
0  1  2
1  3  4
2  4  5

Answer by typhon04

I would like to emphasize Jim Hunziker's comment.

pandas.DataFrame([vars(s) for s in signals])

It is far easier to write, less error-prone, and you don't have to change the to_dict() function every time you add a new attribute.

If you want the freedom to choose which attributes to keep, the columns parameter can be used.

pandas.DataFrame([vars(s) for s in signals], columns=['x', 'y'])

The downside is that it won't work for complex attributes, though that should rarely be the case.
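
To illustrate both points, here is a small sketch. It reuses the `Signal` class from the earlier answer and adds a hypothetical `meta` attribute holding a nested object. Reading "complex attributes" as nested objects is an assumption about what the author meant.

```python
import pandas as pd


class Meta:
    """Hypothetical nested object, used only for illustration."""

    def __init__(self, source):
        self.source = source


class Signal:
    def __init__(self, x, y, source):
        self.x = x
        self.y = y
        self.meta = Meta(source)  # nested ("complex") attribute


signals = [Signal(3, 9, "a"), Signal(4, 16, "b")]

# Without columns=, every instance attribute becomes a column, and the
# nested Meta objects are stored as-is rather than expanded into columns.
full = pd.DataFrame([vars(s) for s in signals])

# With columns=, only the named attributes are kept.
selected = pd.DataFrame([vars(s) for s in signals], columns=["x", "y"])
```

Here `full` ends up with a `meta` column whose cells are `Meta` instances, which is rarely useful for aggregation, while `selected` contains just `x` and `y`.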
