Python Pandas Persistent Cache
Original question: http://stackoverflow.com/questions/51235360/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Python pandas persistent cache
Asked by Luca C.
Is there an implementation for Python pandas that caches the data on disk, so I can avoid reproducing it every time?
In particular, is there a caching method for get_yahoo_data for financial data?
A big plus would be:
- very few lines of code to write
- the possibility to merge the persisted series with new data downloaded for the same source
Answered by nijm
There are many ways to achieve this, but probably the easiest way is to use the built-in methods for writing and reading Python pickles. You can use pandas.DataFrame.to_pickle to store the DataFrame to disk and pandas.read_pickle to read the stored DataFrame from disk.
An example for a pandas.DataFrame:
import pandas

# Store your DataFrame
df.to_pickle('cached_dataframe.pkl')  # will be stored in the current directory
# Read your DataFrame
df = pandas.read_pickle('cached_dataframe.pkl')  # read from the current directory
The same methods also work for pandas.Series:
# Store your Series
series.to_pickle('cached_series.pkl')  # will be stored in the current directory
# Read your Series
series = pandas.read_pickle('cached_series.pkl')  # read from the current directory
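To also cover the asker's second wish (folding newly downloaded data into the persisted series), a small helper could sit on top of these pickle calls. This is only a minimal sketch: cached_fetch, the cache path, and the fetch function passed in are hypothetical names, not part of pandas or of the answer above.

import os
import pandas

def cached_fetch(cache_path, fetch_func):
    """Download new data and merge it into a pickle cache on disk."""
    new_data = fetch_func()  # e.g. a DataFrame/Series indexed by date
    if os.path.exists(cache_path):
        cached = pandas.read_pickle(cache_path)
        # prefer freshly downloaded values, fall back to cached rows
        # for index entries that were not re-downloaded
        new_data = new_data.combine_first(cached)
    new_data.to_pickle(cache_path)
    return new_data

Calling it as cached_fetch('yahoo_cache.pkl', my_download_function) would then return the full merged history while keeping the on-disk copy up to date.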
Answered by YaOzI
Depending on your requirements, there are a dozen methods to do that, back and forth, in CSV, Excel, JSON, Python pickle format, HDF5 and even SQL with a database, etc.
In terms of lines of code, the to_*/read_* calls for many of these formats are just one line of code in each direction. Python and pandas already make the code as clean as possible, so you can worry less about that.
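For illustration, a few of those one-line round trips (the file names are placeholders, and the HDF5 variant needs the optional tables package installed):

import pandas as pd

df = pd.DataFrame({'close': [170.5, 171.2], 'volume': [1000, 1200]})

df.to_csv('cache.csv', index=False)      # human-readable text
df = pd.read_csv('cache.csv')

df.to_json('cache.json')                 # data interchange
df = pd.read_json('cache.json')

df.to_hdf('cache.h5', key='prices')      # binary, good for large numeric data
df = pd.read_hdf('cache.h5', key='prices')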
I think there is no single solution that fits all requirements; it really is case by case:
- for human readability of saved data: CSV, Excel
- for binary Python object serialization (use cases): Pickle
- for data interchange: JSON
- for long-term storage and incremental updates: SQL
- etc.
And if you want to update the stock prices daily and keep them for later use, I prefer pandas with SQL queries; of course this adds a few lines of code to set up the DB connection:
import pandas as pd
from sqlalchemy import create_engine

new_data = getting_daily_price()  # placeholder for your own download function

# You can also choose other db drivers instead of `sqlalchemy`
engine = create_engine('sqlite:///:memory:')

with engine.connect() as conn:
    new_data.to_sql('table_name', conn)         # to write
    df = pd.read_sql_table('table_name', conn)  # to read
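For the incremental-update part, a plausible pattern (the table name 'table_name' is still just a placeholder) is to append each day's new rows instead of recreating the table, then read the accumulated history back, or just the slice you need, with a query:

with engine.connect() as conn:
    # append the new rows to the existing table instead of recreating it
    new_data.to_sql('table_name', conn, if_exists='append')
    # read everything back (or filter it in SQL)
    df = pd.read_sql('SELECT * FROM table_name', conn)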