Moving data from SQLAlchemy to a pandas DataFrame

Question

Asked by David Collins

I am trying to load the results of an SQLAlchemy query into a pandas DataFrame.

When I do:

df = pd.DataFrame(LPRRank.query.all())

I get

>>> df
0        <M. Misty || 1 || 18>
1        <P. Patch || 2 || 18>
...
...

But, what I want is each column in the database to be a column in the dataframe:

0        M. Misty  1  18
1        P. Patch  2  18
...
...
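The single column of <... || ... || ...> values comes from how pd.DataFrame treats a list: when the elements are not list-like (an ORM instance defines no __iter__), each element becomes one cell of a single column, displayed through its __repr__. A minimal sketch, using a hypothetical Row class as a stand-in for the LPRRank instances, contrasts that with a list of tuples:

```python
import pandas as pd


class Row:
    """Hypothetical stand-in for an ORM instance such as LPRRank."""

    def __init__(self, candid, rank, user_id):
        self.candid, self.rank, self.user_id = candid, rank, user_id

    def __repr__(self):
        return "<{} || {} || {}>".format(self.candid, self.rank, self.user_id)


rows = [Row("M. Misty", 1, 18), Row("P. Patch", 2, 18)]

# A list of non-list-like objects is treated as 1-D data: one column of objects.
df_objects = pd.DataFrame(rows)

# A list of tuples is treated as 2-D data: one column per tuple position.
df_columns = pd.DataFrame(
    [(r.candid, r.rank, r.user_id) for r in rows],
    columns=["candid", "rank", "user_id"],
)

print(df_objects.shape)
print(df_columns)
```

The same list comprehension works unchanged on the objects returned by LPRRank.query.all().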

and when I try:

dff = pd.read_sql_query(LPRRank.query.all(), db.session())

I get an Attribute Error:

AttributeError: 'SignallingSession' object has no attribute 'cursor'

and

dff = pd.read_sql_query(LPRRank.query.all(), db.session)

also gives an error:

AttributeError: 'scoped_session' object has no attribute 'cursor'
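Both calls pass the wrong kinds of arguments: pd.read_sql_query expects a SQL string or SQLAlchemy selectable first (not the list returned by .all()) and an Engine/Connection, or a DBAPI connection such as sqlite3, second. A session is neither, so pandas appears to fall back to treating it as a raw DBAPI connection and calls .cursor() on it. A minimal sketch, assuming plain SQLAlchemy 1.4+ with a recent pandas instead of Flask-SQLAlchemy, an in-memory SQLite database, and a hypothetical table name lpr_rank:

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class LPRRank(Base):
    # Plain-SQLAlchemy stand-in for the Flask model; table name is assumed
    __tablename__ = "lpr_rank"
    id = Column(Integer, primary_key=True)
    candid = Column(String(40), index=True)
    rank = Column(Integer, index=True)
    user_id = Column(Integer)  # ForeignKey('lprvote.id') omitted in this sketch


engine = create_engine("sqlite://")  # in-memory database, assumed for the demo
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([LPRRank(candid="M. Misty", rank=1, user_id=18),
                     LPRRank(candid="P. Patch", rank=2, user_id=18)])
    session.commit()

# First argument: SQL text; second argument: an Engine, not a Session.
df = pd.read_sql_query(
    "SELECT candid, rank, user_id FROM lpr_rank ORDER BY id", engine
)
print(df)
```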

What I'm using to generate the list of objects is:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
db = SQLAlchemy(app)

class LPRRank(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    candid = db.Column(db.String(40), index=True, unique=False)
    rank = db.Column(db.Integer, index=True, unique=False)
    user_id = db.Column(db.Integer, db.ForeignKey('lprvote.id'))

    def __repr__(self):
        return '<{} || {} || {}>'.format(self.candid, self.rank, self.user_id)

This question, How to convert SQL Query result to PANDAS Data Structure?, is error free, but it gives each row as an object, which is not what I want. I can access the individual columns in the returned object, but it seems like there is a better way to do it.
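If the objects are already loaded, one way to avoid listing every column by hand is to ask SQLAlchemy's mapper for the mapped column attributes and build one dict per object. A sketch, assuming plain SQLAlchemy 1.4+, an in-memory SQLite database, a hypothetical table name lpr_rank, and a hypothetical helper name objects_to_frame:

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class LPRRank(Base):
    # Plain-SQLAlchemy stand-in for the Flask model; table name is assumed
    __tablename__ = "lpr_rank"
    id = Column(Integer, primary_key=True)
    candid = Column(String(40), index=True)
    rank = Column(Integer, index=True)
    user_id = Column(Integer)  # ForeignKey('lprvote.id') omitted in this sketch


def objects_to_frame(objects, model):
    """Hypothetical helper: one DataFrame column per mapped column attribute."""
    keys = [attr.key for attr in inspect(model).column_attrs]
    return pd.DataFrame(
        [{key: getattr(obj, key) for key in keys} for obj in objects],
        columns=keys,
    )


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([LPRRank(candid="M. Misty", rank=1, user_id=18),
                     LPRRank(candid="P. Patch", rank=2, user_id=18)])
    session.commit()
    # Equivalent of LPRRank.query.all() in plain SQLAlchemy
    objects = session.query(LPRRank).order_by(LPRRank.id).all()
    df = objects_to_frame(objects, LPRRank)

print(df)
```

The conversion runs inside the session block because the loaded attributes are read from live instances; the resulting DataFrame holds plain values and outlives the session.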

The documentation at pandas.pydata.org is great if you already understand what is going on and just need to review syntax. The documentation from April 20, 2016 (the 1,319-page PDF) identifies a pandas connection as still experimental on p. 872.

Now, SQLALCHEMY/PANDAS - SQLAlchemy reading column as CLOB for Pandas to_sql is about specifying the SQL type. Mine is SQLAlchemy, which is the default.

And sqlalchemy pandas to_sql OperationalError, Writing to MySQL database with pandas using SQLAlchemy, to_sql, and SQLAlchemy/pandas to_sql for SQLServer -- CREATE TABLE in master db are about writing to the SQL database, which produces an operational error, a database error, and a 'create table' error respectively, none of which are my problems.

This one, SQLAlchemy Pandas read_sql from jsonb, wants to turn a jsonb attribute into columns: not my cup o' tea.

This previous question, SQLAlchemy ORM conversion to pandas DataFrame, addresses my issue, but its solution, using query.session.bind, does not work for me. I'm opening/closing sessions with db.session.add() and db.session.commit(), but when I use db.session.bind as specified in the second answer there, I get an AttributeError:

AttributeError: 'list' object has no attribute '_execute_on_connection'
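This error points at the first argument rather than the bind: .all() has already run the query and returned a list of objects, while pandas needs an unexecuted statement. One way to sidestep both read_sql and session.bind is to execute a Core select() through the session and hand the rows and column names to the DataFrame constructor. A sketch, assuming plain SQLAlchemy 1.4+, an in-memory SQLite database, and a hypothetical table name lpr_rank:

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class LPRRank(Base):
    # Plain-SQLAlchemy stand-in for the Flask model; table name is assumed
    __tablename__ = "lpr_rank"
    id = Column(Integer, primary_key=True)
    candid = Column(String(40), index=True)
    rank = Column(Integer, index=True)
    user_id = Column(Integer)  # ForeignKey('lprvote.id') omitted in this sketch


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([LPRRank(candid="M. Misty", rank=1, user_id=18),
                     LPRRank(candid="P. Patch", rank=2, user_id=18)])
    session.commit()
    # Execute a statement (instead of calling .all() on a Query) and keep the Result
    result = session.execute(
        select(LPRRank.candid, LPRRank.rank, LPRRank.user_id).order_by(LPRRank.id)
    )
    columns = list(result.keys())          # column names taken from the statement
    rows = [tuple(row) for row in result]  # each Row behaves like a tuple
    df = pd.DataFrame(rows, columns=columns)

print(df)
```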

Answer 1

Answered by Parfait

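Simply add an __init__ method to your model and query the class objects before building the DataFrame. Specifically, the code below creates an iterable of tuples, one per row, which pandas.DataFrame() binds to named columns. (SQLAlchemy does not call __init__ when it loads rows from the database, so the constructor only matters when you create new LPRRank rows; the tuples are what produce the columns.)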

import pandas as pd

class LPRRank(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    candid = db.Column(db.String(40), index=True, unique=False)
    rank = db.Column(db.Integer, index=True, unique=False)
    user_id = db.Column(db.Integer, db.ForeignKey('lprvote.id'))

    def __init__(self, candid=None, rank=None, user_id=None):
        # Assign the mapped attributes so newly created rows carry their values
        self.candid = candid
        self.rank = rank
        self.user_id = user_id

    def __repr__(self):
        # __repr__ must return a str, not a tuple
        return '({}, {}, {})'.format(self.candid, self.rank, self.user_id)

data = db.session.query(LPRRank).all()
# One tuple per row; each tuple position becomes a named DataFrame column
df = pd.DataFrame([(d.candid, d.rank, d.user_id) for d in data],
                  columns=['candid', 'rank', 'user_id'])

Alternatively, use the SQLAlchemy ORM based on your defined Model class, LPRRank, to run read_sql:

df = pd.read_sql(sql=db.session.query(LPRRank)
                         .with_entities(LPRRank.candid,
                                        LPRRank.rank,
                                        LPRRank.user_id).statement,
                 con=db.session.bind)
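The with_entities(...).statement alternative can also be tried outside Flask. A minimal sketch, assuming plain SQLAlchemy 1.4+ with a recent pandas, a Session created on an in-memory SQLite engine (so session.bind is that engine), and a hypothetical table name lpr_rank. With Flask-SQLAlchemy, db.session.bind plays that role in the answer; session.get_bind(), which Answer 2 uses, is another way to obtain the engine:

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class LPRRank(Base):
    # Plain-SQLAlchemy stand-in for the Flask model; table name is assumed
    __tablename__ = "lpr_rank"
    id = Column(Integer, primary_key=True)
    candid = Column(String(40), index=True)
    rank = Column(Integer, index=True)
    user_id = Column(Integer)  # ForeignKey('lprvote.id') omitted in this sketch


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([LPRRank(candid="M. Misty", rank=1, user_id=18),
                     LPRRank(candid="P. Patch", rank=2, user_id=18)])
    session.commit()
    # Build, but do not execute, the SELECT from the ORM query
    stmt = (session.query(LPRRank)
            .with_entities(LPRRank.candid, LPRRank.rank, LPRRank.user_id)
            .order_by(LPRRank.id)
            .statement)
    # session.bind is the Engine this Session was created with
    df = pd.read_sql(sql=stmt, con=session.bind)

print(df)
```

The key difference from the failing calls in the question is .statement in place of .all(): pandas receives a selectable it can execute, plus an engine rather than a session.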

Answer 2

Answered by bioinfornatics

The Parfait answer is good but could have two problems:

Efficiency: each object creation implies duplicating data into a DataFrame, so creating a list of DataFrames could take time.
It does not mirror a DataFrame as a collection of rows.

Thus, the example below provides a parent class that corresponds to a DataFrame representation and a child class that corresponds to a row of a given DataFrame.

The code below provides two ways to get a DataFrame; the DataFrame object is created only on demand, so as not to waste CPU and memory.

If the DataFrame is needed at creation time, you only have to add a constructor (def __init__(self, rows: List[MyDataFrameRow] = None)...), create a new attribute, and assign it the result of self.data_frame.

from typing import Tuple

from pandas import DataFrame, read_sql
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session

Base = declarative_base()

class MyDataFrame(Base):
    __tablename__ = 'my_data_frame'
    id = Column(Integer, primary_key=True)
    rows = relationship('MyDataFrameRow', cascade='all,delete')

    @property
    def data_frame(self) -> DataFrame:
        # Built lazily from the already-loaded child rows
        columns = MyDataFrameRow.data_frame_columns()
        return DataFrame([[getattr(row, column) for column in columns] for row in self.rows],
                         columns=columns)

    @staticmethod
    def to_data_frame(identifier: int, session: Session) -> DataFrame:
        # Select the child rows of one parent in SQL and let pandas read the result
        query = session.query(MyDataFrameRow).join(MyDataFrame).filter(MyDataFrame.id == identifier)
        return read_sql(query.statement, session.get_bind())


class MyDataFrameRow(Base):

    __tablename__ = 'my_data_row'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)
    number_of_children = Column(Integer)
    height = Column(Integer)
    parent_id = Column(Integer, ForeignKey('my_data_frame.id'))

    @staticmethod
    def data_frame_columns() -> Tuple[str, ...]:
        # Data columns only: skip the primary key and foreign keys
        return tuple(column.name for column in MyDataFrameRow.__table__.columns
                     if len(column.foreign_keys) == 0 and not column.primary_key)
...
session = Session(...)
df1 = MyDataFrame.to_data_frame(1, session)
my_table_obj = session.query(MyDataFrame).filter(MyDataFrame.id == 1).one()
df2 = my_table_obj.data_frame
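The usage lines above elide the session setup (Session(...)). A self-contained sketch of the same parent/child pattern, trimmed to two data columns and assuming SQLAlchemy 1.4+, a recent pandas, an in-memory SQLite engine, and made-up sample rows, shows both access paths side by side. Note that to_data_frame returns every column of my_data_row (including id and parent_id), whereas the data_frame property keeps only the data columns:

```python
from typing import Tuple

from pandas import DataFrame, read_sql
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class MyDataFrame(Base):
    __tablename__ = "my_data_frame"
    id = Column(Integer, primary_key=True)
    rows = relationship("MyDataFrameRow", cascade="all,delete")

    @property
    def data_frame(self) -> DataFrame:
        # Built lazily, only when the property is accessed
        columns = MyDataFrameRow.data_frame_columns()
        return DataFrame([[getattr(r, c) for c in columns] for r in self.rows],
                         columns=columns)

    @staticmethod
    def to_data_frame(identifier: int, session: Session) -> DataFrame:
        # Query the child table directly and let pandas read the result set
        query = (session.query(MyDataFrameRow).join(MyDataFrame)
                 .filter(MyDataFrame.id == identifier))
        return read_sql(query.statement, session.get_bind())


class MyDataFrameRow(Base):
    __tablename__ = "my_data_row"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)
    parent_id = Column(Integer, ForeignKey("my_data_frame.id"))

    @staticmethod
    def data_frame_columns() -> Tuple[str, ...]:
        # Data columns only: primary and foreign keys are skipped
        return tuple(c.name for c in MyDataFrameRow.__table__.columns
                     if not c.foreign_keys and not c.primary_key)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    # Hypothetical sample data
    session.add(MyDataFrame(id=1, rows=[MyDataFrameRow(name="Ann", age=34),
                                        MyDataFrameRow(name="Bob", age=41)]))
    session.commit()
    df1 = MyDataFrame.to_data_frame(1, session)
    df2 = session.query(MyDataFrame).filter(MyDataFrame.id == 1).one().data_frame

print(df1)
print(df2)
```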
