Difference between df.reindex() and df.set_index() methods in pandas
Original question: http://stackoverflow.com/questions/50741330/
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Asked by Ricardo Guerreiro
I was confused by this, which is very simple but I didn't immediately find the answer on StackOverflow:
df.set_index('xcol') makes the column 'xcol' become the index (when it is a column of df). df.reindex(myList), however, takes its index labels from outside the dataframe, for example from a list named myList that we defined somewhere else.
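A minimal sketch of the distinction described above. The data, the 'val' column and the contents of myList are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical data; 'xcol' and myList mirror the names used in the question.
df = pd.DataFrame({'xcol': ['p', 'q', 'r'], 'val': [10, 20, 30]})
myList = [2, 0, 5]  # labels defined outside the dataframe; 5 is not in df.index

by_column = df.set_index('xcol')  # the values of 'xcol' become the index
by_list = df.reindex(myList)      # rows are aligned to the labels in myList

print(by_column)
print(by_list)
```

In this sketch, by_column is indexed by 'p', 'q', 'r' and no longer has an 'xcol' column, while by_list is indexed by 2, 0, 5 and the row for the unknown label 5 is filled with NaN.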
I hope this post clarifies it! Additions to this post are also welcome!
Answered by Ben.T
You can see the difference on a simple example. Let's consider this dataframe:
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)
   a  b
0  1  3
1  2  4
The index labels are then 0 and 1.
If you use set_index with the column 'a', then the index labels are 1 and 2. If you do df.set_index('a').loc[1,'b'], you will get 3.
Now if you use reindex with the same labels 1 and 2, as in df.reindex([1,2]), you will get 4.0 when you do df.reindex([1,2]).loc[1,'b'].
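The two lookups can be put side by side in a short sketch using the same dataframe as the answer. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# After set_index('a'), label 1 refers to the row where a == 1.
via_set_index = df.set_index('a').loc[1, 'b']

# After reindex([1, 2]), label 1 still refers to the original second row.
via_reindex = df.reindex([1, 2]).loc[1, 'b']

print(via_set_index)  # 3
print(via_reindex)    # 4.0
print(df.reindex([1, 2])['b'].dtype)  # float64
```

The result of reindex is 4.0 rather than 4 because label 2 does not exist in the original index, so its row is filled with NaN and column 'b' is upcast to float.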
What happened is that set_index has replaced the previous index labels (0,1) with (1,2) (the values from column 'a') without touching the order of the values in column 'b':
df.set_index('a')
   b
a
1  3
2  4
while reindex changes the index but keeps each value in column 'b' associated with its index label from the original df:
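df.reindex(df.a.values).drop(columns='a')  # equivalent to df.reindex([1, 2]).drop(columns='a')
     b
1  4.0
2  NaN
# drop(columns='a') is only there so that column 'a' can be ignored in this example; the positional form drop('a', 1) no longer works in pandas 2.0 and later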
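As a small aside on the NaN above, the sketch below shows one way to see which requested labels were missing from the original index, and how the fill_value parameter of reindex can supply a placeholder instead. The choice of 0 as placeholder is arbitrary and only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Label 2 is not in the original index (0, 1), so its row has no data.
missing = df.reindex([1, 2])['b'].isna()
print(missing.tolist())  # [False, True]

# fill_value replaces the NaN that reindex would otherwise insert.
filled = df.reindex([1, 2], fill_value=0)
print(filled.loc[2, 'b'])  # 0
```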
Finally, reindex changes the order of the index labels without changing the values of the row associated with each label, while set_index replaces the index with the values of a column without touching the order of the other values in the dataframe.
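The summary can be checked with a sketch in which both requested labels exist, so no NaN gets in the way. The values in column 'a' are changed to 1 and 0 here purely for illustration; they are not the values used in the answer.

```python
import pandas as pd

# Hypothetical values chosen so that labels 1 and 0 both exist in the index.
df = pd.DataFrame({'a': [1, 0], 'b': [3, 4]})

relabelled = df.set_index('a')  # rows stay in place, labels become 1 and 0
reordered = df.reindex([1, 0])  # each row keeps its label, rows are reordered

print(relabelled['b'].tolist())  # [3, 4]
print(reordered['b'].tolist())   # [4, 3]

# The same label now points at different rows in the two results.
print(relabelled.loc[1, 'b'])  # 3
print(reordered.loc[1, 'b'])   # 4
```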
Answered by prosti
Just to add, the undo for set_index would be the reset_index method (more or less):
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)
df.set_index('a', inplace=True)  # column 'a' becomes the index
print(df)
df.reset_index(inplace=True, drop=False)  # 'a' goes back to being a column
print(df)
   a  b
0  1  3
1  2  4
   b
a
1  3
2  4
   a  b
0  1  3
1  2  4
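One possible reading of "more or less" is sketched below: the round trip restores the original frame here because 'a' was already the first column, but reset_index always places the restored column first, and drop=True discards it altogether. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Round trip through set_index / reset_index without inplace.
round_trip = df.set_index('a').reset_index()
print(round_trip.equals(df))  # True

# The restored column is inserted at the front, so column order can change.
moved = df.set_index('b').reset_index()
print(list(moved.columns))  # ['b', 'a']

# drop=True throws the old index away instead of restoring it as a column.
dropped = df.set_index('a').reset_index(drop=True)
print(list(dropped.columns))  # ['b']
```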
Answered by Long
Besides the great answer from Ben.T, I would like to give one more example of how reindex and set_index differ when you apply them to a shuffled index:
import pandas as pd
import numpy as np
testdf = pd.DataFrame({'a': [1, 3, 2],'b': [3, 5, 4],'c': [5, 7, 6]})
print(testdf)
print(testdf.set_index(np.random.permutation(testdf.index)))
print(testdf.reindex(np.random.permutation(testdf.index)))
- With set_index, when the index column (the first column) is shuffled, the order of the other columns is kept intact.
- With reindex, the order of the rows changes according to the shuffle of the index column.
Output:
   a  b  c
0  1  3  5
1  3  5  7
2  2  4  6
   a  b  c
1  1  3  5
2  3  5  7
0  2  4  6
   a  b  c
2  2  4  6
1  3  5  7
0  1  3  5
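Because np.random.permutation gives a different order on each run, the sketch below swaps in fixed label orders so the result is reproducible. The orders [1, 2, 0] and [2, 1, 0] are an assumption, chosen only to match the output shown above.

```python
import pandas as pd

testdf = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 5, 4], 'c': [5, 7, 6]})

# Fixed label orders standing in for the random permutations (assumed values).
# set_index needs an Index or array here; a plain list would be read as column names.
relabelled = testdf.set_index(pd.Index([1, 2, 0]))
reordered = testdf.reindex([2, 1, 0])

print(relabelled['a'].tolist())  # [1, 3, 2]
print(reordered['a'].tolist())   # [2, 3, 1]

# Label 0 points at different data in the two results.
print(relabelled.loc[0, 'a'])  # 2
print(reordered.loc[0, 'a'])   # 1
```

With set_index the rows stay where they were and only get new labels, so label 0 ends up on the third row. With reindex each row keeps its original label and the rows move.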