Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/51235360/

Time: 2020-09-14 05:47:12  Source: igfitidea

Python pandas persistent cache

Tags: pandas, caching, persistence, financial

Asked by Luca C.

Is there an implementation for Python pandas that caches data on disk, so I can avoid regenerating it every time?

In particular, is there a caching method for get_yahoo_data for financial data?

A big plus would be:

  • very few lines of code to write
  • the possibility of merging newly downloaded data into the persisted series for the same source

Answered by nijm

There are many ways to achieve this, but probably the easiest is to use the built-in methods for writing and reading Python pickles. You can use pandas.DataFrame.to_pickle to store the DataFrame to disk and pandas.read_pickle to read the stored DataFrame from disk.

An example for a pandas.DataFrame:

import pandas

# Store your DataFrame (df is an existing pandas.DataFrame)
df.to_pickle('cached_dataframe.pkl') # will be stored in current directory

# Read your DataFrame
df = pandas.read_pickle('cached_dataframe.pkl') # read from current directory

The same methods also work for pandas.Series:

# Store your Series
series.to_pickle('cached_series.pkl') # will be stored in current directory

# Read your Series
series = pandas.read_pickle('cached_series.pkl') # read from current directory
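
The two calls above can be wrapped into a small "load it if it is already on disk, otherwise build it and store it" helper. This is only a minimal sketch: the name cached and its fetch argument are hypothetical and not part of pandas, and the sketch assumes that whatever fetch() returns has a to_pickle method (a DataFrame or a Series).

```python
import os

import pandas as pd


def cached(path, fetch):
    """Return the object pickled at `path`.

    If the file does not exist yet, call `fetch()` (a hypothetical,
    user-supplied function), pickle its result to `path`, and return it.
    """
    if os.path.exists(path):
        return pd.read_pickle(path)
    data = fetch()
    data.to_pickle(path)
    return data
```

A call such as cached('prices.pkl', download_prices), where download_prices stands for your own download function, would only hit the network the first time. Note that pickle files should only be loaded from sources you trust, and that this sketch never refreshes the cache; delete the file to force a new download.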

Answered by YaOzI

Depending on your requirements, there are a dozen methods for doing that, to and fro, in CSV, Excel, JSON, Python pickle format, HDF5 and even SQL with a database, etc.

In terms of lines of code, the to/read calls for many of these formats are just one line in each direction. Python and pandas already make the code as clean as possible, so you can worry less about that.
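
As a rough illustration of the one-line-per-direction point, here is a sketch of a round trip through CSV and JSON files. The sample data, the file names and the temporary directory are made up for the example. Excel and HDF5 follow the same to_/read_ pattern but need extra packages installed, so they are left out.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data standing in for downloaded prices
df = pd.DataFrame(
    {"close": [10.5, 11.0]},
    index=pd.Index(["2020-01-01", "2020-01-02"], name="date"),
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "prices.csv")
    json_path = os.path.join(tmp, "prices.json")

    # CSV: one line to write, one line to read
    df.to_csv(csv_path)
    from_csv = pd.read_csv(csv_path, index_col="date")

    # JSON: one line to write, one line to read
    df.to_json(json_path)
    from_json = pd.read_json(json_path)
```

Text formats do not always restore the original dtypes (dates in particular may come back as strings or be re-parsed), which is one reason pickle or SQL can be the more convenient choice for a cache.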

I think there is no single solution that fits all requirements; it really is case by case:

  • for human readability of saved data: CSV, Excel
  • for binary Python object serialization (use-cases): Pickle
  • for data interchange: JSON
  • for long-term storage and incremental updates: SQL
  • etc.

And if you want to update the stock prices daily and keep them for later use, I prefer pandas with SQL queries. Of course, this adds a few lines of code to set up the DB connection:

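import pandas as pd
from sqlalchemy import create_engine

new_data = getting_daily_price() # placeholder for your own download function
# You can also choose other db drivers instead of `sqlalchemy`
# Note: ':memory:' is lost when the process exits; use a file URL such as
# 'sqlite:///prices.db' if the data should persist on disk
engine = create_engine('sqlite:///:memory:')
with engine.connect() as conn:
    new_data.to_sql('table_name', conn, if_exists='append') # To Write (appends on later runs)
    df = pd.read_sql_table('table_name', conn) # To Read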
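
For the incremental case raised in the question (merging newly downloaded data into what is already stored), one possible approach is to append only the rows that are newer than the latest stored date. The sketch below is not from the answer: it uses the standard-library sqlite3 module instead of SQLAlchemy to stay dependency-free, the function name update_cache is hypothetical, and it assumes the frame's index holds ISO-format date strings (so that string comparison matches date order), stored in a column named date.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def update_cache(db_path, table, new_data):
    """Append rows of `new_data` that are newer than the stored ones,
    then return the whole cached table.

    Assumes `new_data.index` contains ISO date strings ('YYYY-MM-DD').
    """
    with closing(sqlite3.connect(db_path)) as conn:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?",
            (table,),
        ).fetchone()
        if exists:
            # Latest date already in the cache (None if the table is empty)
            last = conn.execute(f'SELECT MAX(date) FROM "{table}"').fetchone()[0]
            if last is not None:
                new_data = new_data[new_data.index > last]
        if not new_data.empty:
            new_data.to_sql(table, conn, if_exists="append", index_label="date")
            conn.commit()
        return pd.read_sql_query(
            f'SELECT * FROM "{table}" ORDER BY date', conn, index_col="date"
        )
```

Calling update_cache('prices.db', 'prices', new_data) once a day would grow the table without duplicating dates. Rows for dates that are already stored are ignored rather than overwritten, which is a design choice of this sketch; corrected historical prices would need a different strategy.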