Pandas: split a DataFrame using row index

Question

Asked by Pradeep Tummala

I want to split a DataFrame into chunks with an uneven number of rows, using row indices as the split points.

The code below:

groups = df.groupby((np.arange(len(df.index))/l[1]).astype(int))

works only when every chunk has the same number of rows.
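To illustrate why this approach only yields equal-sized chunks: integer-dividing the row positions by a constant produces a label that changes every fixed number of rows. A minimal sketch, assuming a made-up 8-row frame and a chunk size of 3 (both chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 8-row frame, used only for illustration.
df = pd.DataFrame({"a": range(1, 9), "b": range(1, 9), "c": range(1, 9)})

size = 3  # assumed fixed chunk size
labels = np.arange(len(df)) // size  # one integer label per row position

print(labels)  # [0 0 0 1 1 1 2 2]

# Every group has `size` rows except possibly the last one.
chunk_sizes = [len(chunk) for _, chunk in df.groupby(labels)]
print(chunk_sizes)  # [3, 3, 2]
```

Because the label depends on a single divisor, there is no way to express boundaries such as 2, 5 and 7 with it.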

df

a b c  
1 1 1  
2 2 2  
3 3 3  
4 4 4  
5 5 5  
6 6 6  
7 7 7  
8 8 8  

l = [2, 5, 7]

df1  
1 1 1  
2 2 2  

df2  
3 3 3  
4 4 4  
5 5 5  

df3  
6 6 6  
7 7 7  

df4  
8 8 8

Answer 1

Answered by Scott Boston

You could use a list comprehension after making a small modification to your list, l.

print(df)

   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
4  5  5  5
5  6  6  6
6  7  7  7
7  8  8  8


l = [2,5,7]
l_mod = [0] + l + [max(l)+1]

list_of_dfs = [df.iloc[l_mod[n]:l_mod[n+1]] for n in range(len(l_mod)-1)]

Output:

list_of_dfs[0]

   a  b  c
0  1  1  1
1  2  2  2

list_of_dfs[1]

   a  b  c
2  3  3  3
3  4  4  4
4  5  5  5

list_of_dfs[2]

   a  b  c
5  6  6  6
6  7  7  7

list_of_dfs[3]

   a  b  c
7  8  8  8
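The `max(l)+1` end point above happens to match this 8-row frame. As a sketch of a more general variant, the hypothetical helper `split_at` below (not part of pandas) uses `len(df)` as the final boundary and pairs consecutive boundaries with `zip`:

```python
import pandas as pd


def split_at(df, points):
    """Split df by row position at each value in points (hypothetical helper)."""
    bounds = [0] + list(points) + [len(df)]
    # Pair each boundary with the next one, e.g. (0, 2), (2, 5), (5, 7), (7, 8).
    return [df.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]


df = pd.DataFrame({"a": range(1, 9), "b": range(1, 9), "c": range(1, 9)})
list_of_dfs = split_at(df, [2, 5, 7])
print([len(part) for part in list_of_dfs])  # [2, 3, 2, 1]
```

If the last split point equals `len(df)`, this version returns an empty frame as the final element, which may or may not be what you want.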

Answer 2

Answered by Mohit Motwani

I think this is what you need:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(1, 8),
                  'b': np.arange(1, 8),
                  'c': np.arange(1, 8)})
print(df)
    a   b   c
0   1   1   1
1   2   2   2
2   3   3   3
3   4   4   4
4   5   5   5
5   6   6   6
6   7   7   7

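last_check = 0  # start position of the next chunk
dfs = []
for ind in [2, 5, 7]:
    # .loc slices by label and includes the end label, hence ind-1
    # (this relies on the default 0..n-1 index)
    dfs.append(df.loc[last_check:ind-1])
    last_check = ind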

A list comprehension is usually more concise than an explicit for loop, but tracking last_check makes it easy to handle split points that follow no regular pattern.

dfs[0]

    a   b   c
0   1   1   1
1   2   2   2

dfs[2]

    a   b   c
5   6   6   6
6   7   7   7
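Note that `.loc` slices by index label and includes the end label, whereas `.iloc` slices by position and excludes the end. A small sketch of the difference, assuming a hypothetical frame whose index is not the default 0..n-1:

```python
import pandas as pd

# Hypothetical frame with string labels instead of the default RangeIndex.
df = pd.DataFrame({"a": range(1, 8)}, index=list("tuvwxyz"))

by_position = df.iloc[0:2]   # positions 0 and 1; end position excluded
by_label = df.loc["t":"u"]   # labels "t" through "u"; end label included

print(by_position.equals(by_label))  # True
```

With an index like this one, splitting by row position would need `df.iloc[last_check:ind]` rather than a `.loc` slice.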

Answer 3

Answered by Mohamed Thasin ah

I think this is what you are looking for:

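l = [2, 5, 7]
dfs = []
i = 0
for val in l:
    if i == 0:
        # first chunk: rows before the first split point
        temp = df.iloc[:val]
    else:
        # later chunks: rows between consecutive split points
        temp = df.iloc[l[i-1]:val]
    dfs.append(temp)
    i += 1
# last chunk: any rows from the final split point onward
if l[-1] < len(df):
    dfs.append(df.iloc[l[-1]:])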


Another solution:

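l = [2, 5, 7]
t = np.arange(l[-1])    # one slot per row up to the last split point
l.reverse()             # process the largest split point first
for val in l:
    t[:val] = val       # rows before val get label val; smaller values overwrite
temp = pd.DataFrame(t)  # single column named 0 holding the group labels
temp = pd.concat([df, temp], axis=1)
for u, v in temp.groupby(0):
    print(v)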

Output:

   a  b  c  0
0  1  1  1  2
1  2  2  2  2
   a  b  c  0
2  3  3  3  5
3  4  4  4  5
4  5  5  5  5
   a  b  c  0
5  6  6  6  7
6  7  7  7  7
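A similar group label can be built without reversing `l` or adding a helper column. This is a different technique from the one above, sketched under the assumption that `l` is sorted in ascending order: `np.searchsorted` with `side="right"` counts how many split points are less than or equal to each row position.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(1, 9), "b": range(1, 9), "c": range(1, 9)})
l = [2, 5, 7]  # split points, assumed sorted in ascending order

# For each row position, count the split points <= that position.
group_ids = np.searchsorted(l, np.arange(len(df)), side="right")
print(group_ids)  # [0 0 1 1 1 2 2 3]

# One sub-frame per distinct group id, in order.
parts = [chunk for _, chunk in df.groupby(group_ids)]
print([len(p) for p in parts])  # [2, 3, 2, 1]
```

Unlike the helper-column version, rows after the last split point get their own group here instead of being dropped.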

Answer 4

Answered by jpp

You can create an array to use for indexing via NumPy:

import pandas as pd, numpy as np

df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))

L = [2, 5, 7]
idx = np.cumsum(np.isin(np.arange(len(df.index)), L))

for _, chunk in df.groupby(idx):
    print(chunk, '\n')

   a  b  c
0  0  1  2
1  3  4  5 

    a   b   c
2   6   7   8
3   9  10  11
4  12  13  14 

    a   b   c
5  15  16  17
6  18  19  20 

    a   b   c
7  21  22  23
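To make the indexing array less opaque, the sketch below prints the intermediate steps for the same `L`, using `np.isin` for the membership test. The names `positions` and `is_start` are illustrative and do not appear in the answer's code.

```python
import numpy as np

L = [2, 5, 7]
positions = np.arange(8)           # row positions of an 8-row frame
is_start = np.isin(positions, L)   # True where a new chunk begins
idx = np.cumsum(is_start)          # running count of chunk starts

print(is_start.astype(int))  # [0 0 1 0 0 1 0 1]
print(idx)                   # [0 0 1 1 1 2 2 3]
```

Each `True` bumps the running total by one, so every row between two split points shares the same label, and `groupby` can use that label directly.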

Instead of defining a new variable for each dataframe, you can use a dictionary:

d = dict(tuple(df.groupby(idx)))

print(d[1])  # print second groupby value

    a   b   c
2   6   7   8
3   9  10  11
4  12  13  14
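This works because iterating over a groupby object yields (key, sub-frame) pairs, which `dict` accepts directly. A short sketch that inspects the resulting dictionary, rebuilding the same frame and labels so that it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list("abc"))
L = [2, 5, 7]
idx = np.cumsum(np.isin(np.arange(len(df.index)), L))

d = dict(tuple(df.groupby(idx)))

# Keys are the group labels; values are the corresponding sub-frames.
for key, chunk in d.items():
    print(key, chunk.shape)
```

The keys follow the cumulative-sum labels, so `d[0]` holds the rows before the first split point and `d[3]` the rows from the last one onward.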
