How to set/get pandas.DataFrame to/from Redis?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/37943778/

Time: 2020-09-14 01:25:52  Source: igfitidea

Tags: python, pandas, dataframe, redis

Asked by Alex Luya

After setting a DataFrame in Redis and then getting it back, Redis returns a string, and I can't figure out how to convert this str back into a DataFrame.

How can I do both of these operations properly?

Answered by Alex Luya

set:

redisConn.set("key", df.to_msgpack(compress='zlib'))

get:

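pd.read_msgpack(redisConn.get("key"))

Note: to_msgpack and read_msgpack were deprecated in pandas 0.25 and removed in pandas 1.0, so this answer only works with older pandas versions.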
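
Because redis-py hands back raw bytes from `get`, any serializer that can write a DataFrame to bytes (or a string) and read it back will work for this set/get round trip. As a minimal sketch that avoids msgpack, the following uses pandas' JSON support. `FakeRedis` is a hypothetical in-memory stand-in for a real connection such as `redisConn`, and `set_df`/`get_df` are names made up for this illustration. JSON does not preserve every dtype, so treat this as one option rather than a drop-in replacement.

```python
import io

import pandas as pd


class FakeRedis:
    """Hypothetical in-memory stand-in for a redis-py client (stores bytes)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Like redis-py, encode str values to bytes before storing them.
        self._store[key] = value.encode("utf-8") if isinstance(value, str) else value
        return True

    def get(self, key):
        # Like redis-py, return bytes, or None when the key is missing.
        return self._store.get(key)


def set_df(conn, key, df):
    # orient="split" keeps the index, column names, and data separate.
    conn.set(key, df.to_json(orient="split"))


def get_df(conn, key):
    raw = conn.get(key)
    if raw is None:
        return None
    # Decode the bytes and wrap them in a file-like object for read_json.
    return pd.read_json(io.StringIO(raw.decode("utf-8")), orient="split")


conn = FakeRedis()
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
set_df(conn, "key", df)
restored = get_df(conn, "key")
```

With a real server, `conn` would be a `redis.Redis` instance and the two helper functions would stay the same.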

Answered by Mark Chackerian

I couldn't use msgpack because of Decimal objects in my dataframe. Instead I combined pickle and zlib like this, assuming a dataframe df and a local instance of Redis:

import pickle
import redis
import zlib

EXPIRATION_SECONDS = 600

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Set
r.setex("key", EXPIRATION_SECONDS, zlib.compress(pickle.dumps(df)))

# Get
rehydrated_df = pickle.loads(zlib.decompress(r.get("key")))

Nothing here is dataframe-specific; the same approach works for any picklable Python object.
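
To illustrate that point, here is a minimal sketch of the same pickle-plus-zlib round trip applied to a plain Python object that contains `Decimal` values. The `store` dict is a hypothetical stand-in for Redis, and `dumps_compressed`/`loads_compressed` are helper names invented for this example.

```python
import pickle
import zlib
from decimal import Decimal


def dumps_compressed(obj):
    # Pickle the object to bytes, then compress those bytes with zlib.
    return zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def loads_compressed(blob):
    # Reverse the steps: decompress first, then unpickle.
    # Only unpickle data that you wrote yourself.
    return pickle.loads(zlib.decompress(blob))


store = {}  # hypothetical stand-in for a Redis connection
record = {"price": Decimal("19.99"), "quantities": [1, 2, 3]}

store["key"] = dumps_compressed(record)
restored = loads_compressed(store["key"])
```

Swapping `record` for a DataFrame and `store` for a Redis client gives the set/get pair from the answer.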

Caveats

  • the other answer using msgpack is better -- use it if it works for you
  • pickling can be dangerous -- your Redis server needs to be secure or you're asking for trouble
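
One way to reduce the pickle risk, sketched here under assumptions, is to sign each payload with an HMAC and refuse to unpickle anything whose signature does not verify. `SECRET_KEY`, `sign_and_pickle`, and `verify_and_unpickle` are hypothetical names; the key would have to live outside Redis, and this does not replace securing the server itself.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from config in practice
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_and_pickle(obj):
    payload = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # Prepend the signature so it travels with the payload.
    return signature + payload


def verify_and_unpickle(blob):
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = sign_and_pickle({"rows": [1, 2, 3]})
restored = verify_and_unpickle(blob)
```

The signed bytes can be stored with `r.setex` exactly like the unsigned ones; only the read path gains the verification step.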

Answered by Lucky M.E.

To cache a dataframe, use this:

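import pyarrow as pa
import redis

def cache_df(alias, df):
    # 'host', 'port', and 'db' are placeholders: substitute your own Redis settings
    pool = redis.ConnectionPool(host='host', port='port', db='db')
    cur = redis.Redis(connection_pool=pool)
    # Note: pyarrow's serialization context API is deprecated since pyarrow 2.0
    context = pa.default_serialization_context()
    # Serialize the dataframe to raw bytes that Redis can store
    df_serialized = context.serialize(df).to_buffer().to_pybytes()

    res = cur.set(alias, df_serialized)
    if res:
        print('df cached')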

To fetch the cached dataframe, use this:

import pandas as pd

def get_cached_df(alias):
    # 'host', 'port', and 'db' are placeholders: substitute your own Redis settings
    pool = redis.ConnectionPool(host='host', port='port', db='db')
    cur = redis.Redis(connection_pool=pool)
    context = pa.default_serialization_context()

    # get() returns None for a missing key, so there is no need to scan all keys
    result = cur.get(alias)
    if result is None:
        return None

    return pd.DataFrame.from_dict(context.deserialize(result))
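
A cache writer and a cache reader like these are often combined into a read-through helper: try the cache first and compute only on a miss. A rough sketch of that idea follows. `get_or_compute` and `InMemoryCache` are hypothetical names, the in-memory class only mimics the `get`/`setex` calls of a Redis client, and pickle is used for serialization purely to keep the example free of third-party packages.

```python
import pickle


class InMemoryCache:
    """Hypothetical stand-in exposing the get/setex calls of a Redis client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None on a miss, as with Redis

    def setex(self, key, seconds, value):
        self._store[key] = value  # expiry is ignored in this stand-in
        return True


def get_or_compute(cache, key, compute, ttl_seconds=600):
    cached = cache.get(key)
    if cached is not None:
        return pickle.loads(cached)  # cache hit: skip the computation
    value = compute()
    cache.setex(key, ttl_seconds, pickle.dumps(value))
    return value


calls = []


def expensive():
    calls.append(1)  # record each real computation
    return {"total": 42}


cache = InMemoryCache()
first = get_or_compute(cache, "report", expensive)
second = get_or_compute(cache, "report", expensive)
```

Here the second call is served from the cache, so `expensive` runs only once. In real use, `compute` would build the dataframe and `cache` would be a Redis client.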