Split a Pandas dataframe into multiple dataframes with the same number of rows

Question

Asked by adw

I have a dataframe df:

        a              b          c
0   0.897134    -0.356157   -0.396212
1   -2.357861   2.066570    -0.512687
2   -0.080665   0.719328    0.604294
3   -0.639392   -0.912989   -1.029892
4   -0.550007   -0.633733   -0.748733
5   -0.712962   -1.612912   -0.248270
6   -0.571474   1.310807    -0.271137
7   -0.228068   0.675771    0.433016
8   0.005606    -0.154633   0.985484
9   0.691329    -0.837302   -0.607225
10  -0.011909   -0.304162   0.422001
11  0.127570    0.956831    1.837523
12  -1.074771   0.379723    -1.889117
13  -1.449475   -0.799574   -0.878192
14  -1.029757   0.551023    2.519929
15  -1.001400   0.838614    -1.006977
16  0.677216    -0.403859   0.451338
17  0.221596    -0.323259   0.324158
18  -0.241935   -2.251687   -0.088494
19  -0.995426   0.665569    -2.228848
20  1.714709    -0.353391   0.671539
21  0.155050    1.136433    -0.005721
22  -0.502412   -0.610901   1.520165
23  -0.853906   0.648321    1.124464
24  1.149151    -0.187300   -0.412946
25  0.329229    -1.690569   -2.746895
26  0.165158    0.173424    0.896344
27  1.157766    0.525674    -1.279618
28  1.729730    -0.798158   0.644869
29  -0.107285   -1.290374   0.544023

I need to split it into multiple dataframes, each containing 10 consecutive rows of df, and write each small dataframe to a separate file. So I decided to create a multilevel dataframe, and for this I first tried to assign an index to every 10 rows of my df with this method:

df['split'] = df['split'].apply(lambda x: np.searchsorted(df.iloc[::10], x, side='right')[0])

This raises:

TypeError: 'function' object has no attribute '__getitem__'

So, do you have an idea of how to fix it? Where is my method wrong?

If you have another approach for splitting my dataframe into multiple dataframes, each containing 10 rows of df, that is also welcome. This approach was just the first one I thought of, and I'm not sure it's the best.

Answer 1

Answered by adw

There are many ways to do what you want; your method looks over-complicated. A groupby using a scaled index as the grouping key would work:

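import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(100, 3), columns=list('ABC'))
# Floor division (//) maps rows 0-9 to key 0, rows 10-19 to key 1, and so on.
groups = df.groupby(np.arange(len(df.index)) // 10)
for frameno, frame in groups:
    frame.to_csv("%s.csv" % frameno)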
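As a rough illustration of the groupby idea, here is a minimal sketch that collects the chunks in a list instead of writing files. The helper name `split_by_groupby` and its `chunk_size` parameter are made up for this example, and the 25-row frame is only there to show what happens when the row count is not a multiple of ten.

```python
import numpy as np
import pandas as pd


def split_by_groupby(df, chunk_size=10):
    """Return consecutive row chunks of df as a list (hypothetical helper)."""
    # Positional keys: 0 for the first chunk_size rows, 1 for the next, ...
    # Floor division keeps the keys integral under Python 3.
    keys = np.arange(len(df)) // chunk_size
    return [frame for _, frame in df.groupby(keys)]


df = pd.DataFrame(np.random.rand(25, 3), columns=list("abc"))
chunks = split_by_groupby(df)
sizes = [len(chunk) for chunk in chunks]  # last chunk holds the leftover rows
print(sizes)
```

Because the keys are built from row positions rather than index labels, this should work the same way for a frame whose index is not 0..n-1.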

Answer 2

Answered by Alexander

You can use a dictionary comprehension to save slices of the dataframe in groups of ten rows:

df_dict = {n: df.iloc[n:n+10, :] 
           for n in range(0, len(df), 10)}

>>> list(df_dict.keys())
[0, 10, 20]

>>> df_dict[10]
           a         b         c
10 -0.011909 -0.304162  0.422001
11  0.127570  0.956831  1.837523
12 -1.074771  0.379723 -1.889117
13 -1.449475 -0.799574 -0.878192
14 -1.029757  0.551023  2.519929
15 -1.001400  0.838614 -1.006977
16  0.677216 -0.403859  0.451338
17  0.221596 -0.323259  0.324158
18 -0.241935 -2.251687 -0.088494
19 -0.995426  0.665569 -2.228848
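Since the question also asks for each small dataframe to be written to its own file, the dictionary of slices can be looped over for that. The following is only a sketch: the temporary output directory and the `rows_<start>.csv` file names are assumptions chosen for the example, not something from the original answer.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(30, 3), columns=list("abc"))

# Same slicing as in the answer: keys are the starting row positions.
df_dict = {n: df.iloc[n:n + 10, :] for n in range(0, len(df), 10)}

out_dir = tempfile.mkdtemp()  # assumption: any writable directory would do
paths = []
for start, chunk in df_dict.items():
    # Hypothetical naming scheme: rows_0.csv, rows_10.csv, rows_20.csv
    path = os.path.join(out_dir, "rows_%d.csv" % start)
    chunk.to_csv(path)
    paths.append(path)
```

Each file keeps the original row labels in its first column, so reading one back with `pd.read_csv(path, index_col=0)` should restore the chunk with its original index.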
