Original URL: http://stackoverflow.com/questions/33367142/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Split dataframe into relatively even chunks according to length
Asked by YKY
I have to create a function that splits a provided dataframe into chunks of a needed size. For instance, if the dataframe contains 1111 rows, I want to be able to specify a chunk size of 400 rows and get three smaller dataframes with sizes of 400, 400 and 311. Is there a convenience function to do the job? What would be the best way to store and iterate over the sliced dataframes?
Example DataFrame
import numpy as np
import pandas as pd
test = pd.concat([pd.Series(np.random.rand(1111)), pd.Series(np.random.rand(1111))], axis = 1)
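The sizes the question expects follow directly from integer division: 1111 = 2 × 400 + 311, so there are two full chunks plus one shorter chunk holding the remainder. A minimal sketch of that arithmetic, using a hypothetical helper name `expected_chunk_sizes` that is not part of pandas:

```python
def expected_chunk_sizes(n_rows: int, chunk_size: int) -> list[int]:
    """Sizes of consecutive chunks of at most chunk_size rows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full, remainder = divmod(n_rows, chunk_size)
    # `full` chunks of chunk_size rows, plus one shorter chunk if any rows are left over
    return [chunk_size] * full + ([remainder] if remainder else [])


print(expected_chunk_sizes(1111, 400))  # [400, 400, 311]
```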
Accepted answer by sinhrks
You can take the floor division of a sequence running up to the number of rows in the dataframe and use it as the groupby key, which splits the dataframe into chunks of n rows (the last chunk holds the remaining rows):
n = 400
for g, df in test.groupby(np.arange(len(test)) // n):
    print(df.shape)
# (400, 2)
# (400, 2)
# (311, 2)
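To see why this works, it helps to look at the grouping key itself: `np.arange(len(test)) // n` labels rows 0 to 399 with 0, rows 400 to 799 with 1, and rows 800 to 1110 with 2. The sketch below rebuilds a frame shaped like the question's `test` (the seeded `default_rng` is an assumption added only for reproducibility), prints how many rows each label covers, and stores the chunks in a dict keyed by chunk number, which is one possible answer to the "store and iterate" part of the question.

```python
import numpy as np
import pandas as pd

# Same shape as the question's example; the seed is added only for reproducibility.
rng = np.random.default_rng(0)
test = pd.DataFrame(rng.random((1111, 2)))

n = 400
labels = np.arange(len(test)) // n  # 0 for rows 0-399, 1 for 400-799, 2 for 800-1110
print(np.unique(labels, return_counts=True))

# Store each chunk under its chunk number; groupby yields the keys in sorted order.
chunks = {int(key): chunk for key, chunk in test.groupby(labels)}
print({key: chunk.shape for key, chunk in chunks.items()})
```

Because each chunk keeps its original index labels, `pd.concat(chunks.values())` reassembles the original frame.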
Answer by Scott Boston
A more Pythonic way to break large dataframes into smaller chunks based on a fixed number of rows is to use a list comprehension:
n = 400  # chunk row size
list_df = [test.iloc[i:i + n] for i in range(0, test.shape[0], n)]
[i.shape for i in list_df]
Output:
[(400, 2), (400, 2), (311, 2)]
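Since the question asks for a function, the same slicing can be wrapped in a generator. This is a minimal sketch rather than a pandas built-in: `iter_chunks` is a hypothetical name, it uses `.iloc` so the slicing is explicitly positional regardless of the index labels, and the seeded random frame stands in for the question's `test`.

```python
from typing import Iterator

import numpy as np
import pandas as pd


def iter_chunks(df: pd.DataFrame, chunk_size: int) -> Iterator[pd.DataFrame]:
    """Yield consecutive row blocks of at most chunk_size rows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(df), chunk_size):
        # .iloc slices by position, so this works whatever the index labels are
        yield df.iloc[start:start + chunk_size]


rng = np.random.default_rng(0)
test = pd.DataFrame(rng.random((1111, 2)))

# Iterate lazily, one chunk at a time...
for chunk in iter_chunks(test, 400):
    print(chunk.shape)

# ...or materialize all chunks into a list when you need to keep them.
list_df = list(iter_chunks(test, 400))
```

A generator avoids holding references to every chunk when you only process one at a time; wrap it in `list(...)` when you need to revisit them. If you plan to modify a chunk, calling `.copy()` on it first can avoid pandas' SettingWithCopyWarning.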