Difference between the df.reindex() and df.set_index() methods in pandas

Question

Asked by Ricardo Guerreiro

I was confused by this. It is very simple, but I didn't immediately find the answer on StackOverflow:

df.set_index('xcol') makes the column 'xcol' become the index (when it is a column of df).
df.reindex(myList), however, takes the index labels from outside the dataframe, for example, from a list named myList that we defined somewhere else.
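
A minimal sketch of the two calls described above. The column values and the contents of my_list are made up for illustration; only set_index and reindex come from the question.

```python
import pandas as pd

# Hypothetical data: 'xcol' is an ordinary column of df
df = pd.DataFrame({'xcol': ['x', 'y', 'z'], 'val': [10, 20, 30]})

# set_index: the labels come from a column that is already in df
by_col = df.set_index('xcol')
print(by_col.index.tolist())  # ['x', 'y', 'z']

# reindex: the labels come from outside df (a list defined elsewhere)
my_list = [2, 0, 5]
by_list = df.reindex(my_list)
print(by_list.index.tolist())  # [2, 0, 5]
```

Label 5 does not exist in df, so its row in by_list is filled with NaN.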

I hope this post clarifies it! Additions to this post are also welcome!

Answer 1

Answered by Ben.T

You can see the difference on a simple example. Let's consider this dataframe:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)
   a  b
0  1  3
1  2  4

The index labels are then 0 and 1.

If you use set_index with the column 'a', then the index labels are 1 and 2. If you do df.set_index('a').loc[1,'b'], you will get 3.

Now if you use reindex with the same index labels 1 and 2, as in df.reindex([1,2]), you will get 4.0 when you do df.reindex([1,2]).loc[1,'b'].
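
The two lookups can be checked side by side. This is a small sketch that rebuilds the same df; the variable names via_set_index and via_reindex are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# set_index('a'): label 1 now points at the first row, where b is 3
via_set_index = df.set_index('a').loc[1, 'b']
print(via_set_index)  # 3

# reindex([1, 2]): label 1 still points at the original row 1, where b is 4
via_reindex = df.reindex([1, 2]).loc[1, 'b']
print(via_reindex)  # 4.0
```

The second result is a float because label 2 has no row in df, so reindex fills it with NaN and column 'b' is upcast to float.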

What happened is that set_index has replaced the previous index labels (0,1) with (1,2) (the values from column 'a') without touching the order of the values in column 'b':

df.set_index('a')
   b
a   
1  3
2  4

while reindex changes the index but keeps the values in column 'b' associated with their index labels in the original df:

df.reindex(df.a.values).drop(columns='a')  # equivalent to df.reindex([1, 2]).drop(columns='a')
     b
1  4.0
2  NaN
# drop(columns='a') just removes column 'a', which is not of interest in this example

Finally, reindex changes the order of the index labels without changing the values of the row associated with each label, while set_index replaces the index with the values of a column without touching the order of the other values in the dataframe.
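
To make the summary concrete, here is a short sketch on the same df. The label lists [1, 0] and [1, 2] are chosen for illustration, and fill_value is an optional reindex parameter that the answer itself does not use.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Labels 1 and 0 both exist, so reindex only reorders the rows;
# each label keeps the values it had in df.
swapped = df.reindex([1, 0])
print(swapped['b'].tolist())  # [4, 3]

# Label 2 does not exist in df; fill_value replaces the default NaN.
filled = df.reindex([1, 2], fill_value=0)
print(filled.loc[2].tolist())  # [0, 0]
```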

Answer 2

Answered by prosti

Just to add, the undo of set_index would be the reset_index method (more or less):

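import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(df)

# Move column 'a' into the index
df.set_index('a', inplace=True)
print(df)

# Move the index back into a regular column 'a'
df.reset_index(inplace=True, drop=False)
print(df)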
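
As a sketch of the "more or less" part, the round trip can be written without inplace=True and compared against the original. The names restored and dropped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# With the default drop=False, the index goes back to being column 'a'
restored = df.set_index('a').reset_index()
print(restored.equals(df))  # True

# With drop=True, the index is discarded, so column 'a' is lost
dropped = df.set_index('a').reset_index(drop=True)
print(list(dropped.columns))  # ['b']
```

The round trip is exact here because df started with the default 0, 1 index. reset_index always produces a fresh default index, so a custom original index would not be recovered.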

Answer 3

Answered by Long

Besides the great answer from Ben.T, I would like to give one more example of how they differ when you apply reindex and set_index to an index column:

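import pandas as pd
import numpy as np

testdf = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 5, 4], 'c': [5, 7, 6]})

print(testdf)
# set_index: attach shuffled labels to the rows, which stay in place
print(testdf.set_index(np.random.permutation(testdf.index)))
# reindex: move the rows into the shuffled label order
print(testdf.reindex(np.random.permutation(testdf.index)))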

Output:

With set_index, when the index column (the first column) is shuffled, the order of the values in the other columns is kept intact.
With reindex, the order of the rows changes according to the shuffle of the index column.
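
Because the random shuffle gives different output on every run, here is a deterministic sketch of the same comparison. The fixed order [2, 0, 1] stands in for np.random.permutation and is an assumption made for illustration. It is wrapped in pd.Index because set_index treats a plain list as a list of column keys.

```python
import pandas as pd

testdf = pd.DataFrame({'a': [1, 3, 2], 'b': [3, 5, 4], 'c': [5, 7, 6]})

# A fixed "shuffle" of the original labels 0, 1, 2
new_labels = [2, 0, 1]

# set_index: rows stay where they are, only the labels change
relabelled = testdf.set_index(pd.Index(new_labels))
print(relabelled['a'].tolist())  # [1, 3, 2]

# reindex: rows are moved so that they follow the new label order
reordered = testdf.reindex(new_labels)
print(reordered['a'].tolist())  # [2, 1, 3]
```

Both results have the index [2, 0, 1], but label 2 points at a different row in each: a is 1 after set_index and 2 after reindex.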
