Pandas - Slice a large dataframe into chunks

Question

Asked by Walt Reed

I have a large dataframe (>3MM rows) that I'm trying to pass through a function (the one below is greatly simplified), and I keep getting a MemoryError message.

I think I'm passing too large a dataframe into the function, so I'm trying to:

1) Slice the dataframe into smaller chunks (preferably sliced by AcctName)
2) Pass each chunk into the function
3) Concatenate the chunks back into one large dataframe

def trans_times_2(df):
    df['Double_Transaction'] = df['Transaction'] * 2

large_df 
AcctName   Timestamp    Transaction
ABC        12/1         12.12
ABC        12/2         20.89
ABC        12/3         51.93    
DEF        12/2         13.12
DEF        12/8          9.93
DEF        12/9         92.09
GHI        12/1         14.33
GHI        12/6         21.99
GHI        12/12        98.81

I know that my function works properly, since it will work on a smaller dataframe (e.g. 40,000 rows). I tried the following, but I was unsuccessful with concatenating the small dataframes back into one large dataframe.

def split_df(df):
    new_df = []
    AcctNames = df.AcctName.unique()
    DataFrameDict = {elem: pd.DataFrame for elem in AcctNames}
    key_list = [k for k in DataFrameDict.keys()]
    new_df = []
    for key in DataFrameDict.keys():
        DataFrameDict[key] = df[:][df.AcctNames == key]
        trans_times_2(DataFrameDict[key])
    rejoined_df = pd.concat(new_df)

How I envision the dataframes being split:

df1
AcctName   Timestamp    Transaction  Double_Transaction
ABC        12/1         12.12        24.24
ABC        12/2         20.89        41.78
ABC        12/3         51.93        103.86

df2
AcctName   Timestamp    Transaction  Double_Transaction
DEF        12/2         13.12        26.24
DEF        12/8          9.93        19.86
DEF        12/9         92.09        184.18

df3
AcctName   Timestamp    Transaction  Double_Transaction
GHI        12/1         14.33        28.66
GHI        12/6         21.99        43.98
GHI        12/12        98.81        197.62

Answer 1

Answered by Scott Boston

You can use a list comprehension to split your dataframe into smaller dataframes contained in a list.

n = 200000  # chunk row size
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]

You can access the chunks with:

list_df[0]
list_df[1]
etc...

Then you can assemble the chunks back into one dataframe using pd.concat.
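
As a rough end-to-end sketch of the row-based approach, the snippet below splits a small copy of the sample data into chunks, runs each chunk through the function, and rejoins the results. It rests on a few assumptions that are not in the original posts: trans_times_2 is changed to work on a copy and return the result (the version in the question modifies its argument and returns nothing), the chunk size is shrunk to 4 rows purely for illustration, and the names processed and rejoined_df are chosen here.

```python
import pandas as pd

def trans_times_2(df):
    # Assumption: return a new dataframe instead of modifying the argument,
    # so each chunk (a slice of the original) is left untouched.
    df = df.copy()
    df['Double_Transaction'] = df['Transaction'] * 2
    return df

# Sample data taken from the question.
df = pd.DataFrame({
    'AcctName': ['ABC', 'ABC', 'ABC', 'DEF', 'DEF', 'DEF', 'GHI', 'GHI', 'GHI'],
    'Timestamp': ['12/1', '12/2', '12/3', '12/2', '12/8', '12/9',
                  '12/1', '12/6', '12/12'],
    'Transaction': [12.12, 20.89, 51.93, 13.12, 9.93, 92.09,
                    14.33, 21.99, 98.81],
})

n = 4  # chunk row size; tiny here, something like 200000 for real data
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]

# Process each chunk separately, then stitch the results back together.
processed = [trans_times_2(chunk) for chunk in list_df]
rejoined_df = pd.concat(processed)
```

Because the chunks are consecutive row slices, pd.concat puts the rows back in their original order with their original index labels.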

By AcctName

list_df = []

for n,g in df.groupby('AcctName'):
    list_df.append(g)
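
To connect the per-account split back to the three steps in the question, here is a minimal sketch that processes each AcctName group and concatenates the results. As an assumption for illustration, trans_times_2 is written to return a modified copy (the question's version returns nothing), and the sample frame and the name rejoined_df are introduced here rather than taken from the answer.

```python
import pandas as pd

def trans_times_2(df):
    # Assumption: return a modified copy rather than changing the group in place.
    df = df.copy()
    df['Double_Transaction'] = df['Transaction'] * 2
    return df

# A shortened version of the question's sample data.
df = pd.DataFrame({
    'AcctName': ['ABC', 'ABC', 'DEF', 'DEF', 'GHI', 'GHI'],
    'Timestamp': ['12/1', '12/2', '12/2', '12/8', '12/1', '12/6'],
    'Transaction': [12.12, 20.89, 13.12, 9.93, 14.33, 21.99],
})

list_df = []
for name, group in df.groupby('AcctName'):
    # Each group holds all rows for one account.
    list_df.append(trans_times_2(group))

# Rejoin the per-account results into one dataframe.
rejoined_df = pd.concat(list_df)
```

Note that groupby sorts the groups by key by default, so the rejoined rows come back ordered by AcctName; if the input was not already grouped that way, the original row order can be restored with rejoined_df.sort_index().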
