Python: Convert a Pandas DataFrame to a Dask DataFrame

Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/39721800/

Time: 2020-08-19 22:37:41  Source: igfitidea

Convert Pandas dataframe to Dask dataframe

Tags: python, pandas, dataframe, data-conversion, dask

Asked by rey

Suppose I have a pandas DataFrame such as:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

When I convert it into a Dask DataFrame, what should the name and divisions parameters consist of?

from dask import dataframe as dd 
sd=dd.DataFrame(df.to_dict(),divisions=1,meta=pd.DataFrame(columns=df.columns,index=df.index))

TypeError: __init__() missing 1 required positional argument: 'name'

Edit: Suppose I create a pandas dataframe like:

pd.DataFrame({'a':[1,2,3],'b':[4,5,6]})

Similarly, how do I create a Dask DataFrame directly, given that its constructor needs three additional arguments: name, divisions, and meta?

sd=dd.Dataframe({'a':[1,2,3],'b':[4,5,6]},name=,meta=,divisions=)

Thank you for your reply.

Answered by jezrael

I think you can use dask.dataframe.from_pandas:

from dask import dataframe as dd
sd = dd.from_pandas(df, npartitions=3)
print(sd)
dd.DataFrame<from_pa..., npartitions=2, divisions=(0, 1, 2)>

EDIT:

I found a solution:

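import pandas as pd
import dask.dataframe as dd
from dask.dataframe.utils import make_meta

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Task graph with a single partition; each key is (name, partition_number)
dsk = {('x', 0): df}

# meta is an empty frame that describes the column and index dtypes
meta = make_meta({'a': 'i8', 'b': 'i8'}, index=pd.Index([], dtype='i8'))
# One partition needs two division values: the min and max of the index
d = dd.DataFrame(dsk, name='x', meta=meta, divisions=[0, 2])
print(d)
dd.DataFrame<x, npartitions=1, divisions=(0, 2)>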