Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33922664/

Date: 2020-09-14 00:16:24  Source: igfitidea

Split pandas dataframe into multiple dataframes with equal numbers of rows

Tags: python, pandas, dataframe, split

Asked by adw

I have a dataframe df:

        a              b          c
0   0.897134    -0.356157   -0.396212
1   -2.357861   2.066570    -0.512687
2   -0.080665   0.719328    0.604294
3   -0.639392   -0.912989   -1.029892
4   -0.550007   -0.633733   -0.748733
5   -0.712962   -1.612912   -0.248270
6   -0.571474   1.310807    -0.271137
7   -0.228068   0.675771    0.433016
8   0.005606    -0.154633   0.985484
9   0.691329    -0.837302   -0.607225
10  -0.011909   -0.304162   0.422001
11  0.127570    0.956831    1.837523
12  -1.074771   0.379723    -1.889117
13  -1.449475   -0.799574   -0.878192
14  -1.029757   0.551023    2.519929
15  -1.001400   0.838614    -1.006977
16  0.677216    -0.403859   0.451338
17  0.221596    -0.323259   0.324158
18  -0.241935   -2.251687   -0.088494
19  -0.995426   0.665569    -2.228848
20  1.714709    -0.353391   0.671539
21  0.155050    1.136433    -0.005721
22  -0.502412   -0.610901   1.520165
23  -0.853906   0.648321    1.124464
24  1.149151    -0.187300   -0.412946
25  0.329229    -1.690569   -2.746895
26  0.165158    0.173424    0.896344
27  1.157766    0.525674    -1.279618
28  1.729730    -0.798158   0.644869
29  -0.107285   -1.290374   0.544023

that I need to split into multiple dataframes, each containing 10 consecutive rows of df, and I will write every small dataframe to a separate file. So I decided to create a multilevel dataframe, and for this I first assign an index to every 10 rows of my df with this method:

df['split'] = df['split'].apply(lambda x: np.searchsorted(df.iloc[::10], x, side='right')[0])

that throws out

TypeError: 'function' object has no attribute '__getitem__'

So, do you have an idea of how to fix it? Where is my method wrong?

If you have another approach for splitting my dataframe into multiple dataframes, each containing 10 rows of df, that is welcome too, because this approach was just the first one I thought of, and I'm not sure it's the best.

Answered by adw

There are many ways to do what you want; your method looks over-complicated. A groupby that uses a scaled index as the grouping key would work:

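import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(100, 3), columns=list('ABC'))
# Floor-divide the row position by 10: rows 0-9 get key 0, rows 10-19 get key 1, ...
groups = df.groupby(np.arange(len(df.index)) // 10)
for (frameno, frame) in groups:
    frame.to_csv("%s.csv" % frameno)  # writes 0.csv, 1.csv, ...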
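
The grouping idea can be checked without touching the file system. The following is a minimal sketch, assuming Python 3 (so floor division `//` is used to get integer group keys) and a 30-row stand-in for the frame in the question; the names `keys` and `chunks` are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# 30 rows of random data, standing in for the df from the question.
df = pd.DataFrame(np.random.rand(30, 3), columns=list("abc"))

# Row position // 10 gives 0 for rows 0-9, 1 for rows 10-19, and so on.
keys = np.arange(len(df)) // 10

# Collect each group as its own DataFrame instead of writing files.
chunks = [frame for _, frame in df.groupby(keys)]

print(len(chunks))               # number of pieces
print([len(c) for c in chunks])  # rows per piece
```

If the number of rows is not a multiple of 10, the last piece simply holds the remaining rows.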

Answered by Alexander

You can use a dictionary comprehension to save slices of the dataframe in groups of ten rows:

df_dict = {n: df.iloc[n:n+10, :] 
           for n in range(0, len(df), 10)}

>>> list(df_dict.keys())
[0, 10, 20]

>>> df_dict[10]
           a         b         c
10 -0.011909 -0.304162  0.422001
11  0.127570  0.956831  1.837523
12 -1.074771  0.379723 -1.889117
13 -1.449475 -0.799574 -0.878192
14 -1.029757  0.551023  2.519929
15 -1.001400  0.838614 -1.006977
16  0.677216 -0.403859  0.451338
17  0.221596 -0.323259  0.324158
18 -0.241935 -2.251687 -0.088494
19 -0.995426  0.665569 -2.228848
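
Since the question asks for each small dataframe to be written to a separate file, the dictionary can be looped over to do that. This is only a sketch under assumptions: the frame is random stand-in data, the output goes to a temporary directory, and the `rows_<start>.csv` file name pattern is hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for the 30-row df from the question.
df = pd.DataFrame(np.random.rand(30, 3), columns=list("abc"))

# Same slicing as above: keys are the starting row positions 0, 10, 20.
df_dict = {n: df.iloc[n:n + 10, :] for n in range(0, len(df), 10)}

# Write each slice to its own CSV file in a throwaway directory.
out_dir = tempfile.mkdtemp()
written = []
for start, frame in df_dict.items():
    path = os.path.join(out_dir, "rows_%d.csv" % start)  # hypothetical file name
    frame.to_csv(path)
    written.append(path)

print(sorted(os.path.basename(p) for p in written))
```

Each file keeps the original row labels, so reading one back with `pd.read_csv(path, index_col=0)` restores the same index as the slice.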