Disclaimer: this page is a translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/32547440/

Time: 2020-09-13 23:52:58  Source: igfitidea

Python Pandas: How to move one row to the first row of a Dataframe?

python, numpy, pandas, dataframe

Asked by Rex

Given an existing indexed DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df
          a         b         c         d         e
0 -0.131666 -0.315019  0.306728 -0.642224 -0.294562
1  0.769310 -1.277065  0.735549 -0.900214 -1.826320
2 -1.561325 -0.155571  0.544697  0.275880 -0.451564
3  0.612561 -0.540457  2.390871 -2.699741  0.534807
4 -1.504476 -2.113726  0.785208 -1.037256 -0.292959
5  0.467429  1.327839 -1.666649  1.144189  0.322896
6 -0.306556  1.668364  0.036508  0.596452  0.066755
7 -1.689779  1.469891 -0.068087 -1.113231  0.382235
8  0.028250 -2.145618  0.555973 -0.473131 -0.638056
9  0.633408 -0.791857  0.933033  1.485575 -0.021429
>>> df.set_index("a")
                  b         c         d         e
a                                                
-0.131666 -0.315019  0.306728 -0.642224 -0.294562
 0.769310 -1.277065  0.735549 -0.900214 -1.826320
-1.561325 -0.155571  0.544697  0.275880 -0.451564
 0.612561 -0.540457  2.390871 -2.699741  0.534807
-1.504476 -2.113726  0.785208 -1.037256 -0.292959
 0.467429  1.327839 -1.666649  1.144189  0.322896
-0.306556  1.668364  0.036508  0.596452  0.066755
-1.689779  1.469891 -0.068087 -1.113231  0.382235
 0.028250 -2.145618  0.555973 -0.473131 -0.638056
 0.633408 -0.791857  0.933033  1.485575 -0.021429

How can I move the 3rd row to the first row?

That is, the expected result is:

                  b         c         d         e
a                                                
-1.561325 -0.155571  0.544697  0.275880 -0.451564
-0.131666 -0.315019  0.306728 -0.642224 -0.294562
 0.769310 -1.277065  0.735549 -0.900214 -1.826320
 0.612561 -0.540457  2.390871 -2.699741  0.534807
-1.504476 -2.113726  0.785208 -1.037256 -0.292959
 0.467429  1.327839 -1.666649  1.144189  0.322896
-0.306556  1.668364  0.036508  0.596452  0.066755
-1.689779  1.469891 -0.068087 -1.113231  0.382235
 0.028250 -2.145618  0.555973 -0.473131 -0.638056
 0.633408 -0.791857  0.933033  1.485575 -0.021429

Now the original first row should become the second row.

Accepted answer by Rex

Reindexing is probably the best way to put the rows in any new order in one apparent step, except that it produces a new DataFrame, which could be prohibitively large.

For example:

import pandas as pd

t = pd.read_csv('table.txt', sep=r'\s+')
t
Out[81]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
1   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
2   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four

t.index
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')

t2 = t.reindex([2,0,1,3]) # cannot do this in place
t2
Out[93]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
2   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
0   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
1   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four

Now the index can be set back to range(4) without reindexing:

t2.index = range(4)
t2
Out[102]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
1   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
2   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four
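Overwriting the labels with range(4) is one way to renumber the rows. A minimal sketch, assuming a stand-in for table.txt reduced to its Name column, showing that it gives the same result as pandas' reset_index(drop=True), which builds a fresh 0..n-1 index and discards the old labels:

```python
import pandas as pd

# Stand-in for table.txt: only the Name column, in the original order.
t = pd.DataFrame({'Name': ['one', 'two', 'three', 'four']})
t2 = t.reindex([2, 0, 1, 3])

# Option 1: overwrite the labels directly, as in the answer above.
a = t2.copy()
a.index = range(len(a))

# Option 2: let pandas renumber the rows and drop the old labels.
b = t2.reset_index(drop=True)

print(a.equals(b))  # True
```

reset_index(drop=True) avoids hard-coding the row count, so it keeps working if the table grows.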

It can also be done without creating a new DataFrame, using row selection and tuple swapping ("tuple switching") as the basic mechanism. For example:

import pandas as pd

t = pd.read_csv('table.txt', sep=r'\s+')

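# .ix was removed in pandas 1.0; .loc selects by label here.
# .copy() keeps each right-hand row independent of the assignments.
t.loc[1], t.loc[2] = t.loc[2].copy(), t.loc[1].copy()
t.loc[0], t.loc[1] = t.loc[1].copy(), t.loc[0].copy()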
t
Out[96]: 
  DG/VD   TYPE State Access Consist Cache sCC   Size Units   Name
0   2/2  RAID1  Optl     RW      No  RWTD   -  1.818    TB  three
1   0/0  RAID1  Optl     RW      No  RWTD   -  1.818    TB    one
2   1/1  RAID1  Optl     RW      No  RWTD   -  1.818    TB    two
3   3/3  RAID1  Optl     RW      No  RWTD   -  1.818    TB   four
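The two swaps above bubble the third row up one position at a time. A minimal sketch generalizing this to any 0-based position, assuming a modern pandas without .ix; the helper name move_to_top_by_swaps and the stand-in table are hypothetical. Each row is copied before assignment because a selected row may be a view of the frame:

```python
import pandas as pd


def move_to_top_by_swaps(df: pd.DataFrame, pos: int) -> None:
    """Hypothetical helper: bubble the row at 0-based position `pos` up to
    position 0 by swapping adjacent rows, modifying `df` in place.
    Index labels stay where they are; only the row values move."""
    for i in range(pos, 0, -1):
        # Copy both rows first so neither is affected by the assignments.
        upper = df.iloc[i - 1].copy()
        lower = df.iloc[i].copy()
        df.iloc[i - 1] = lower
        df.iloc[i] = upper


# Stand-in for table.txt: the Name column plus one numeric column.
t = pd.DataFrame({'Name': ['one', 'two', 'three', 'four'],
                  'Size': [1.0, 2.0, 3.0, 4.0]})
move_to_top_by_swaps(t, 2)
print(t['Name'].tolist())  # ['three', 'one', 'two', 'four']
```

Moving row k this way costs k swaps, so for large k a single reindex or iloc selection is likely cheaper.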

Another in-place method assigns a new index that encodes the desired order (for example, the 3rd row gets index 0) and then sorts the DataFrame in place. It's encapsulated in the following function, which assumes the rows are indexed with range(m) for some positive integer m and that the DataFrame has a simple index (no MultiIndex), as in the example in the question.

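def putfirst(n, df):
    if not isinstance(n, int):
        print('error: 1st arg must be an int')
        return
    if n < 1:
        print('error: 1st arg must be an int > 0')
        return
    if n == 1:
        print('nothing to do when first arg == 1')
        return
    if n > len(df):
        print('error: n exceeds the number of rows in the DataFrame')
        return
    # The first n-1 rows get labels 1..n-1, the n-th row gets label 0,
    # and the remaining rows keep labels n..m-1.
    df.index = list(range(1, n)) + [0] + list(range(n, len(df)))
    # DataFrame.sort() was removed from pandas; sort_index() replaces it.
    df.sort_index(inplace=True)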

putfirst takes two arguments: n, the 1-based ordinal position of the row to relocate to the first position (so n = 3 relocates the 3rd row), and df, the DataFrame containing that row.
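To make the index trick concrete, here is a small sketch that builds only the label list putfirst assigns before sorting; the helper name putfirst_labels and the four-row frame are hypothetical. For n = 3, the 3rd row receives label 0 and the two rows above it shift to labels 1 and 2:

```python
import pandas as pd


def putfirst_labels(n: int, m: int) -> list[int]:
    """Hypothetical helper: the labels putfirst assigns to an m-row frame
    so that sorting by index moves the n-th row (1-based) to the top."""
    # First n-1 rows -> labels 1..n-1, n-th row -> 0, the rest keep n..m-1.
    return list(range(1, n)) + [0] + list(range(n, m))


print(putfirst_labels(3, 10))  # [1, 2, 0, 3, 4, 5, 6, 7, 8, 9]

df = pd.DataFrame({'a': ['A', 'B', 'C', 'D']})
df.index = putfirst_labels(3, len(df))
df = df.sort_index()
print(df['a'].tolist())  # ['C', 'A', 'B', 'D']
```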

Here is a demo:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])

df.set_index("a") # ineffective without assignment or inplace=True
Out[182]: 
                  b         c         d         e
a                                                
 1.394072 -1.076742 -0.192466 -0.871188  0.420852
-1.211411 -0.258867 -0.581647 -1.260421  0.464575
-1.070241  0.804223 -0.156736  2.010390 -0.887104
-0.977936 -0.267217  0.483338 -0.400333  0.449880
 0.399594 -0.151575 -2.557934  0.160807  0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745  0.044697 -0.897756  0.890874
-1.151185 -2.612303  1.141250 -0.867136  0.383583
-0.437030  0.347489 -1.230179  0.571078  0.060061
-0.225524  1.349726  1.350300 -0.386653  0.865990

df
Out[183]: 
          a         b         c         d         e
0  1.394072 -1.076742 -0.192466 -0.871188  0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421  0.464575
2 -1.070241  0.804223 -0.156736  2.010390 -0.887104
3 -0.977936 -0.267217  0.483338 -0.400333  0.449880
4  0.399594 -0.151575 -2.557934  0.160807  0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745  0.044697 -0.897756  0.890874
7 -1.151185 -2.612303  1.141250 -0.867136  0.383583
8 -0.437030  0.347489 -1.230179  0.571078  0.060061
9 -0.225524  1.349726  1.350300 -0.386653  0.865990

df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')

putfirst(3,df)
df
Out[186]: 
          a         b         c         d         e
0 -1.070241  0.804223 -0.156736  2.010390 -0.887104
1  1.394072 -1.076742 -0.192466 -0.871188  0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421  0.464575
3 -0.977936 -0.267217  0.483338 -0.400333  0.449880
4  0.399594 -0.151575 -2.557934  0.160807  0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745  0.044697 -0.897756  0.890874
7 -1.151185 -2.612303  1.141250 -0.867136  0.383583
8 -0.437030  0.347489 -1.230179  0.571078  0.060061
9 -0.225524  1.349726  1.350300 -0.386653  0.865990

Answer by Alexander

To move the third row to the first position, you can build a list of row positions with the target row first. I use a conditional list comprehension to collect the remaining positions and join the two lists.

Then just use iloc to select the rows in that order.

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c'])
>>> df
          a         b         c
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
2  0.950088 -0.151357 -0.103219
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]

>>> df.iloc[idx]
          a         b         c
2  0.950088 -0.151357 -0.103219
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863

If desired, you can also reset the index:

>>> df.iloc[idx].reset_index(drop=True)
          a         b         c
0  0.950088 -0.151357 -0.103219
1  1.764052  0.400157  0.978738
2  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
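The same position list can also be built without a comprehension. A minimal sketch, assuming NumPy's np.r_ index helper, which concatenates a scalar and integer slices into one array (an alternative technique, not part of the original answer):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c'])

target_row = 2
# np.r_ joins the scalar and the two slices into one position array:
# [target_row, 0, ..., target_row - 1, target_row + 1, ..., len(df) - 1]
idx = np.r_[target_row, 0:target_row, target_row + 1:len(df)]
moved = df.iloc[idx]
print(idx)  # [2 0 1 3 4]
```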

Alternatively, you can reindex the DataFrame using idx (this works here because the default index labels 0..4 coincide with the row positions):

>>> df.reindex(idx)
          a         b         c
2  0.950088 -0.151357 -0.103219
0  1.764052  0.400157  0.978738
1  2.240893  1.867558 -0.977278
3  0.410599  0.144044  1.454274
4  0.761038  0.121675  0.443863
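Because reindex looks rows up by label while iloc uses positions, the two only agree when the labels are the default 0..n-1. A minimal sketch with hypothetical string labels showing the difference:

```python
import pandas as pd

# Same idea, but with string labels instead of the default 0..n-1 index.
df = pd.DataFrame({'a': [10, 20, 30]}, index=['x', 'y', 'z'])
idx = [2, 0, 1]

by_position = df.iloc[idx]  # positions: rows 'z', 'x', 'y'
by_label = df.reindex(idx)  # labels 2, 0, 1 do not exist, so values are NaN

print(by_position['a'].tolist())  # [30, 10, 20]
```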

Answer by user3017048

This is not elegant, but it works so far:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df
          a         b         c         d         e
0  1.124763 -0.416770  1.347839 -0.944334  0.738686
1 -0.348112  0.786822 -1.161970 -1.645065 -0.075205
2  0.549966  0.357076 -0.880669 -0.187731 -0.221997
3  0.311057 -0.126432 -1.187644  2.151804  0.791835
4 -0.310849  0.753750 -1.087447  0.095884  1.449832
5 -0.272344  0.278788 -0.724369 -0.568442  0.164909
6  0.942927 -0.273203  0.203322  1.099572 -0.505160
7  0.526321  1.665012  0.915676 -1.174497 -2.270662
8 -0.959773  0.921732  1.396364 -1.383112  0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605  0.578198
>>> row = df.loc[0].copy()
>>> row
a    1.124763
b   -0.416770
c    1.347839
d   -0.944334
e    0.738686
Name: 0, dtype: float64
>>> df.loc[0] = df.loc[2]
>>> df.loc[2] = row
>>> df
          a         b         c         d         e
0  0.549966  0.357076 -0.880669 -0.187731 -0.221997
1 -0.348112  0.786822 -1.161970 -1.645065 -0.075205
2  1.124763 -0.416770  1.347839 -0.944334  0.738686
3  0.311057 -0.126432 -1.187644  2.151804  0.791835
4 -0.310849  0.753750 -1.087447  0.095884  1.449832
5 -0.272344  0.278788 -0.724369 -0.568442  0.164909
6  0.942927 -0.273203  0.203322  1.099572 -0.505160
7  0.526321  1.665012  0.915676 -1.174497 -2.270662
8 -0.959773  0.921732  1.396364 -1.383112  0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605  0.578198
>>> df.set_index('a')
                  b         c         d         e
a                                                
 0.549966  0.357076 -0.880669 -0.187731 -0.221997
-0.348112  0.786822 -1.161970 -1.645065 -0.075205
 1.124763 -0.416770  1.347839 -0.944334  0.738686
 0.311057 -0.126432 -1.187644  2.151804  0.791835
-0.310849  0.753750 -1.087447  0.095884  1.449832
-0.272344  0.278788 -0.724369 -0.568442  0.164909
 0.942927 -0.273203  0.203322  1.099572 -0.505160
 0.526321  1.665012  0.915676 -1.174497 -2.270662
-0.959773  0.921732  1.396364 -1.383112  0.603030
-2.802902 -0.572469 -1.599550 -1.305605  0.578198

If that's what you want... (note that this swaps rows 0 and 2, so the original first row ends up third rather than second).
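For comparison, a minimal sketch on a hypothetical four-row frame contrasting a swap of rows 0 and 2 with moving row 2 to the top, which is what the question's expected output shows:

```python
import pandas as pd

df = pd.DataFrame({'name': ['r0', 'r1', 'r2', 'r3']})

# Swap: rows 0 and 2 trade places, row 1 stays where it is.
swapped = df.copy()
row = swapped.loc[0].copy()
swapped.loc[0] = swapped.loc[2]
swapped.loc[2] = row

# Move: row 2 goes to the top, rows 0 and 1 each shift down by one.
moved = df.iloc[[2, 0, 1, 3]].reset_index(drop=True)

print(swapped['name'].tolist())  # ['r2', 'r1', 'r0', 'r3']
print(moved['name'].tolist())    # ['r2', 'r0', 'r1', 'r3']
```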

Answer by Nader Hisham

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

You can simply do the following:

df.reindex([2, 0, 1] + list(range(3, len(df))))

Or you can do the following:

pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])

# this line rearranges the first 3 rows
df.reindex([2, 0, 1])

# slice the rows from position 3 (the fourth row) onward
df.iloc[3:]

# concatenate both results together
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
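The concat approach generalizes to any row: take the target row, then everything before it, then everything after it. A minimal sketch, where the helper name move_to_top is hypothetical and k is a 0-based position:

```python
import numpy as np
import pandas as pd


def move_to_top(df: pd.DataFrame, k: int) -> pd.DataFrame:
    """Hypothetical helper: return a new frame with the row at 0-based
    position k first and all other rows in their original order."""
    # df.iloc[[k]] (a list, not a scalar) keeps the row as a one-row DataFrame.
    return pd.concat([df.iloc[[k]], df.iloc[:k], df.iloc[k + 1:]])


df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['a', 'b'])
out = move_to_top(df, 2)
print(out['a'].tolist())  # [4, 0, 2, 6, 8]
```

Because it slices by position, this does not depend on the index being 0..n-1, unlike the reindex version.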