Split a Pandas DataFrame into N chunks

Question

Asked by Henrik Poulsen

I'm currently trying to split a pandas dataframe into an unknown number of chunks, each containing N rows.

I have tried using numpy.array_split(); this function, however, splits the dataframe into N chunks containing an unknown number of rows.

Is there a clever way to split a pandas dataframe into multiple dataframes, each containing a specific number of rows from the parent dataframe?

Answer 1

Answered by James Schinner

You can try this:

def rolling(df, window, step):
    # Yield (offset, chunk) pairs; the final chunk may hold fewer than `window` rows.
    count = 0
    df_length = len(df)
    while count < df_length:
        yield count, df.iloc[count:count + window]
        count += step

Usage:

for offset, window in rolling(df, 100, 100):
    # |     |                      |     |
    # |     The current chunk.     |     How many rows to step at a time.
    # The current offset index.    How many rows in each chunk.
    # your code here
    pass
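
As a small illustration of how the window and step arguments interact, here is a sketch on a hypothetical 10-row frame (the column name `x` and the sizes are made up for the example). It assumes the loop runs until the offset reaches the end of the frame, so the last window may be shorter than `window`.

```python
import pandas as pd


def rolling(df, window, step):
    # Assumed variant: iterate until the offset passes the last row,
    # so a shorter final window is still yielded.
    count = 0
    while count < len(df):
        yield count, df.iloc[count:count + window]
        count += step


df = pd.DataFrame({"x": range(10)})  # hypothetical sample data

# step == window: consecutive, non-overlapping chunks
chunk_sizes = [(offset, len(part)) for offset, part in rolling(df, 4, 4)]

# step < window: overlapping windows
overlap_sizes = [(offset, len(part)) for offset, part in rolling(df, 4, 2)]
```

With step equal to window the generator behaves like a plain chunker; a smaller step gives overlapping windows, which is probably not what the question asks for.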

There is also this simpler idea:

def chunk(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

Usage:

for df_chunk in chunk(df, 100):
    #                     |
    #                     The chunk size
    # your code here
    pass
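
To see what the generator produces, here is a minimal sketch on a hypothetical 10-row frame (the column name and sizes are invented for the example). Every chunk should have `size` rows except possibly the last, and concatenating the chunks should give back the original frame.

```python
import pandas as pd


def chunk(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))


df = pd.DataFrame({"x": range(10)})  # hypothetical sample data

parts = list(chunk(df, 4))
sizes = [len(p) for p in parts]

# The chunks keep their original index labels, so concatenation restores df.
restored = pd.concat(parts)
```

Because the function only relies on `len` and slicing, it should work the same way on lists or Series.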

By the way, all of this can be found on SO with a search.

Answer 2

Answered by nnnmmm

You can calculate the number of splits from N:

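import numpy as np

splits = len(df.index) // N  # number of full chunks of N rows
chunks = np.split(df.iloc[:splits * N], splits) if splits else []
if splits * N < len(df.index):
    chunks.append(df.iloc[splits * N:])  # remaining rows, fewer than N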
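
The arithmetic can be checked on a small case. The sketch below uses a hypothetical 10-row frame with N = 4, and it swaps in plain `iloc` slices in place of `np.split`; that substitution is an assumption made for the example, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # hypothetical sample data
N = 4

splits = len(df.index) // N     # number of full chunks
remainder = len(df.index) % N   # rows left over after the full chunks

# Full chunks of exactly N rows each
chunks = [df.iloc[i * N:(i + 1) * N] for i in range(splits)]

# Append the leftover rows only when there are any
if remainder:
    chunks.append(df.iloc[splits * N:])

sizes = [len(c) for c in chunks]
```

Here 10 rows with N = 4 give two full chunks and a final chunk holding the 2 remaining rows.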

Answer 3

Answered by Romain Jouin

Calculate the positions at which to split:

size_of_chunks = 3
n_rows = len(df)  # positional row count, since the chunks are taken with iloc
index_for_chunks = list(range(0, n_rows, size_of_chunks))
index_for_chunks.append(n_rows)

Use them to split the df:

dfs = {}
for i in range(len(index_for_chunks)-1):
    dfs[i] = df.iloc[index_for_chunks[i]:index_for_chunks[i+1]]
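
A worked case may make the boundaries easier to follow. This sketch assumes the boundaries are computed from `len(df)` rather than from the index values, so it also works when the index is not 0..n-1; the 7-row frame and its letter labels are hypothetical.

```python
import pandas as pd

# Hypothetical frame with non-numeric index labels
df = pd.DataFrame({"x": range(7)}, index=list("abcdefg"))

size_of_chunks = 3
n_rows = len(df)

# Start position of every chunk, plus the end position of the last one
index_for_chunks = list(range(0, n_rows, size_of_chunks)) + [n_rows]

# Pair each boundary with the next one and slice by position
dfs = {
    i: df.iloc[start:stop]
    for i, (start, stop) in enumerate(zip(index_for_chunks, index_for_chunks[1:]))
}
sizes = [len(part) for part in dfs.values()]
```

For 7 rows and a chunk size of 3 the boundaries are 0, 3, 6 and 7, so the last chunk holds the single row labelled `g`.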
