Python Pandas: How to move one row to the first row of a Dataframe?
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32547440/
Question by Rex
Given an existing, indexed DataFrame:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 -0.131666 -0.315019 0.306728 -0.642224 -0.294562
1 0.769310 -1.277065 0.735549 -0.900214 -1.826320
2 -1.561325 -0.155571 0.544697 0.275880 -0.451564
3 0.612561 -0.540457 2.390871 -2.699741 0.534807
4 -1.504476 -2.113726 0.785208 -1.037256 -0.292959
5 0.467429 1.327839 -1.666649 1.144189 0.322896
6 -0.306556 1.668364 0.036508 0.596452 0.066755
7 -1.689779 1.469891 -0.068087 -1.113231 0.382235
8 0.028250 -2.145618 0.555973 -0.473131 -0.638056
9 0.633408 -0.791857 0.933033 1.485575 -0.021429
>>> df.set_index("a")
b c d e
a
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
-1.561325 -0.155571 0.544697 0.275880 -0.451564
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
How do I move the 3rd row to the first row?
That is, the expected result is:
b c d e
a
-1.561325 -0.155571 0.544697 0.275880 -0.451564
-0.131666 -0.315019 0.306728 -0.642224 -0.294562
0.769310 -1.277065 0.735549 -0.900214 -1.826320
0.612561 -0.540457 2.390871 -2.699741 0.534807
-1.504476 -2.113726 0.785208 -1.037256 -0.292959
0.467429 1.327839 -1.666649 1.144189 0.322896
-0.306556 1.668364 0.036508 0.596452 0.066755
-1.689779 1.469891 -0.068087 -1.113231 0.382235
0.028250 -2.145618 0.555973 -0.473131 -0.638056
0.633408 -0.791857 0.933033 1.485575 -0.021429
Now the original first row should become the second row.
Accepted answer by Rex
Reindexing is probably the best way to put the rows in any new order in one apparent step, although it may require producing a new DataFrame, which could be prohibitively large.
For example
import pandas as pd
t = pd.read_csv('table.txt', sep=r'\s+')
t
Out[81]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
t.index
Out[82]: Int64Index([0, 1, 2, 3], dtype='int64')
t2 = t.reindex([2,0,1,3]) # cannot do this in place
t2
Out[93]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
2 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
0 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
1 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
Now the index can be set back to range(4) without reindexing:
t2.index = range(4)
t2
Out[102]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
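Because table.txt is not included with the question, here is a self-contained sketch of the same reindex-then-relabel steps on a small stand-in DataFrame that reproduces only the Name column (an assumption for illustration). It also checks that assigning range(4) to the index should give the same frame as reset_index(drop=True).

```python
import pandas as pd

# Stand-in for table.txt: only the Name column is reproduced (assumption).
t = pd.DataFrame({"Name": ["one", "two", "three", "four"]})

t2 = t.reindex([2, 0, 1, 3])  # new DataFrame, rows in the requested label order
t2.index = range(4)           # relabel 0..3 without a second reindex

# Equivalent relabelling with reset_index; drop=True discards the old labels
t3 = t.reindex([2, 0, 1, 3]).reset_index(drop=True)

print(t2)
```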
It can also be done without creating a new DataFrame, using row selection and tuple swapping (a, b = b, a) as the basic mechanism. For example:
import pandas as pd
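t = pd.read_csv('table.txt', sep=r'\s+')
# .ix was removed in pandas 1.0; .loc selects the same rows by label here.
# .copy() makes each right-hand row an independent snapshot, not a view.
t.loc[1], t.loc[2] = t.loc[2].copy(), t.loc[1].copy()
t.loc[0], t.loc[1] = t.loc[1].copy(), t.loc[0].copy()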
t
Out[96]:
DG/VD TYPE State Access Consist Cache sCC Size Units Name
0 2/2 RAID1 Optl RW No RWTD - 1.818 TB three
1 0/0 RAID1 Optl RW No RWTD - 1.818 TB one
2 1/1 RAID1 Optl RW No RWTD - 1.818 TB two
3 3/3 RAID1 Optl RW No RWTD - 1.818 TB four
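The two pairwise swaps can also be collapsed into a single positional block assignment. This is a different technique from tuple swapping, shown only as a minimal sketch: the helper name rotate_to_top_inplace is hypothetical, and it assumes all columns share one numeric dtype, because writing a NumPy array back into mixed columns (such as the text columns of table.txt) may upcast or warn depending on the pandas version.

```python
import numpy as np
import pandas as pd


def rotate_to_top_inplace(df: pd.DataFrame, pos: int) -> None:
    """Hypothetical helper: move the row at position `pos` to position 0,
    shifting rows 0..pos-1 down by one. Modifies df in place; index labels
    stay where they are and only the row values move. Assumes one numeric dtype.
    """
    order = [pos] + list(range(pos))  # e.g. pos=2 -> [2, 0, 1]
    # df.iloc[order] with a list builds a new frame, so the source rows are
    # captured before any of the target rows are overwritten.
    df.iloc[: pos + 1] = df.iloc[order].to_numpy()


df = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), columns=["a", "b", "c"])
rotate_to_top_inplace(df, 2)
print(df)
```

As with the swaps, the index labels stay 0..3 and only the row contents move.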
Another in-place method assigns the DataFrame a new index that encodes the desired order (for example, the 3rd row gets label 0, the 1st row gets label 1, and so on) and then sorts the DataFrame by index in place. It is wrapped in the following function, which assumes that the rows are indexed by range(m) for some positive integer m and that the DataFrame has a simple index (no MultiIndex), as in the example in the question.
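def putfirst(n, df):
    """Move the n-th row (1-based) of df to the first position, in place."""
    if not isinstance(n, int):
        print('error: 1st arg must be an int')
        return
    if n < 1:
        print('error: 1st arg must be an int > 0')
        return
    if n == 1:
        print('nothing to do when first arg == 1')
        return
    if n > len(df):
        print('error: n exceeds the number of rows in the DataFrame')
        return
    # Rows before the n-th get labels 1..n-1 (shifted down by one), the n-th
    # row gets label 0, and later rows keep their labels (assumes range(m) index).
    # list() is needed in Python 3, where range objects cannot be concatenated.
    df.index = list(range(1, n)) + [0] + list(range(n, len(df)))
    # DataFrame.sort() was removed from pandas; sort_index() sorts by label.
    df.sort_index(inplace=True)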
putfirst takes two arguments: n, the 1-based ordinal position of the row to move to the first position (so to move the 3rd row, n = 3), and df, the DataFrame that contains that row.
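To make the relabel-and-sort trick concrete, this sketch builds the same kind of label list putfirst assigns for n = 3, on a hypothetical five-row frame with the default 0..4 index, and shows it before sorting.

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})  # hypothetical frame, index 0..4
n = 3                                           # move the 3rd row (1-based)

# Rows before the 3rd get 1..n-1, the 3rd row gets 0, later rows keep n..m-1.
new_labels = list(range(1, n)) + [0] + list(range(n, len(df)))
print(new_labels)             # [1, 2, 0, 3, 4]

df.index = new_labels
df.sort_index(inplace=True)   # label 0 (the old 3rd row) now comes first
print(df["x"].tolist())       # [30, 10, 20, 40, 50]
```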
Here is a demo:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df.set_index("a") # ineffective without assignment or inplace=True
Out[182]:
b c d e
a
1.394072 -1.076742 -0.192466 -0.871188 0.420852
-1.211411 -0.258867 -0.581647 -1.260421 0.464575
-1.070241 0.804223 -0.156736 2.010390 -0.887104
-0.977936 -0.267217 0.483338 -0.400333 0.449880
0.399594 -0.151575 -2.557934 0.160807 0.076525
-0.297204 -1.294274 -0.885180 -0.187497 -0.493560
-0.115413 -0.350745 0.044697 -0.897756 0.890874
-1.151185 -2.612303 1.141250 -0.867136 0.383583
-0.437030 0.347489 -1.230179 0.571078 0.060061
-0.225524 1.349726 1.350300 -0.386653 0.865990
df
Out[183]:
a b c d e
0 1.394072 -1.076742 -0.192466 -0.871188 0.420852
1 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
2 -1.070241 0.804223 -0.156736 2.010390 -0.887104
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
df.index
Out[184]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')
putfirst(3,df)
df
Out[186]:
a b c d e
0 -1.070241 0.804223 -0.156736 2.010390 -0.887104
1 1.394072 -1.076742 -0.192466 -0.871188 0.420852
2 -1.211411 -0.258867 -0.581647 -1.260421 0.464575
3 -0.977936 -0.267217 0.483338 -0.400333 0.449880
4 0.399594 -0.151575 -2.557934 0.160807 0.076525
5 -0.297204 -1.294274 -0.885180 -0.187497 -0.493560
6 -0.115413 -0.350745 0.044697 -0.897756 0.890874
7 -1.151185 -2.612303 1.141250 -0.867136 0.383583
8 -0.437030 0.347489 -1.230179 0.571078 0.060061
9 -0.225524 1.349726 1.350300 -0.386653 0.865990
Answer by Alexander
To move the third row to the top, you can build a list of positions with the target row first, followed by every other row in its original order. I use a conditional list comprehension to collect the other positions and join the two lists.
Then just use iloc to select the rows in that order.
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c'])
>>> df
a b c
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
2 0.950088 -0.151357 -0.103219
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
target_row = 2
# Move target row to first element of list.
idx = [target_row] + [i for i in range(len(df)) if i != target_row]
>>> df.iloc[idx]
a b c
2 0.950088 -0.151357 -0.103219
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
If desired, you can also reset the index:
>>> df.iloc[idx].reset_index(drop=True)
a b c
0 0.950088 -0.151357 -0.103219
1 1.764052 0.400157 0.978738
2 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
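Alternatively, you can reindex using idx. Note that reindex looks up index labels, not positions, so it matches the iloc result here only because df has the default 0 to 4 index: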
>>> df.reindex(idx)
a b c
2 0.950088 -0.151357 -0.103219
0 1.764052 0.400157 0.978738
1 2.240893 1.867558 -0.977278
3 0.410599 0.144044 1.454274
4 0.761038 0.121675 0.443863
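The iloc and reindex versions agree only because df has the default 0..4 index. This sketch, using a hypothetical frame with string labels, shows the difference: iloc treats idx as positions, while reindex treats it as labels and returns all-NaN rows for labels that do not exist. Passing df.index[idx] converts the positions to labels first.

```python
import pandas as pd

# Hypothetical frame whose index labels are strings, not positions.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=["x", "y", "z"])
target_row = 2
idx = [target_row] + [i for i in range(len(df)) if i != target_row]  # [2, 0, 1]

by_position = df.iloc[idx]               # positional: rows z, x, y
by_label = df.reindex(idx)               # looks up labels 2, 0, 1, which do not exist
by_label_fixed = df.reindex(df.index[idx])  # positions -> labels, then reindex

print(by_position)
print(by_label)                          # every value is NaN
```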
Answer by user3017048
This is not elegant, but works so far:
>>> df = pd.DataFrame(np.random.randn(10, 5),columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 1.124763 -0.416770 1.347839 -0.944334 0.738686
1 -0.348112 0.786822 -1.161970 -1.645065 -0.075205
2 0.549966 0.357076 -0.880669 -0.187731 -0.221997
3 0.311057 -0.126432 -1.187644 2.151804 0.791835
4 -0.310849 0.753750 -1.087447 0.095884 1.449832
5 -0.272344 0.278788 -0.724369 -0.568442 0.164909
6 0.942927 -0.273203 0.203322 1.099572 -0.505160
7 0.526321 1.665012 0.915676 -1.174497 -2.270662
8 -0.959773 0.921732 1.396364 -1.383112 0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605 0.578198
>>> row = df.loc[0].copy()
>>> row
a 1.124763
b -0.416770
c 1.347839
d -0.944334
e 0.738686
Name: 0, dtype: float64
>>> df.loc[0] = df.loc[2]
>>> df.loc[2] = row
>>> df
a b c d e
0 0.549966 0.357076 -0.880669 -0.187731 -0.221997
1 -0.348112 0.786822 -1.161970 -1.645065 -0.075205
2 1.124763 -0.416770 1.347839 -0.944334 0.738686
3 0.311057 -0.126432 -1.187644 2.151804 0.791835
4 -0.310849 0.753750 -1.087447 0.095884 1.449832
5 -0.272344 0.278788 -0.724369 -0.568442 0.164909
6 0.942927 -0.273203 0.203322 1.099572 -0.505160
7 0.526321 1.665012 0.915676 -1.174497 -2.270662
8 -0.959773 0.921732 1.396364 -1.383112 0.603030
9 -2.802902 -0.572469 -1.599550 -1.305605 0.578198
>>> df.set_index('a')
b c d e
a
0.549966 0.357076 -0.880669 -0.187731 -0.221997
-0.348112 0.786822 -1.161970 -1.645065 -0.075205
1.124763 -0.416770 1.347839 -0.944334 0.738686
0.311057 -0.126432 -1.187644 2.151804 0.791835
-0.310849 0.753750 -1.087447 0.095884 1.449832
-0.272344 0.278788 -0.724369 -0.568442 0.164909
0.942927 -0.273203 0.203322 1.099572 -0.505160
0.526321 1.665012 0.915676 -1.174497 -2.270662
-0.959773 0.921732 1.396364 -1.383112 0.603030
-2.802902 -0.572469 -1.599550 -1.305605 0.578198
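If that's what you want... Note that this swaps rows 0 and 2 rather than shifting the rows down, so the original first row ends up third instead of second.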
Answer by Nader Hisham
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
You can simply do the following:
df.reindex([2, 0, 1] + list(range(3, len(df))))  # list(): Python 3 cannot add a list and a range
Or you can do the following:
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
# this line rearranges the first 3 rows
df.reindex([2, 0, 1])
# slice the data from the fourth row (position 3) onward
df.iloc[3:]
# concatenate both results together
pd.concat([df.reindex([2, 0, 1]), df.iloc[3:]])
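The hard-coded [2, 0, 1] generalizes to any target position k by concatenating that single row with the frame minus that row. A minimal sketch follows; move_to_front is a hypothetical helper name, and it assumes unique index labels because drop() removes rows by label.

```python
import numpy as np
import pandas as pd


def move_to_front(df: pd.DataFrame, k: int) -> pd.DataFrame:
    """Hypothetical helper: return a new DataFrame with the row at position k first.

    Assumes unique index labels, because drop() removes rows by label.
    """
    # df.iloc[[k]] (double brackets) keeps a one-row DataFrame, not a Series.
    return pd.concat([df.iloc[[k]], df.drop(index=df.index[k])])


np.random.seed(0)
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
moved = move_to_front(df, 2)
print(moved.index.tolist())   # [2, 0, 1, 3, 4, 5, 6, 7, 8, 9]
```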

