pandas: Why does the "reset_index(drop=True)" function unexpectedly remove a column?

Question

Asked by Stanleyrr

I have a Pandas dataframe named data_match. It contains columns '_worker_id', '_unit_id', and 'caption'. (Please see attached screenshot for some of the rows in this dataframe)

Suppose the index is not in ascending order, and I want it to be 0, 1, 2, 3, 4...n. So I ran the following to reset the index:
data_match = data_match.reset_index(drop=True)

This returned the correct output on my computer using Python 3.6. However, when my coworker ran the same call on his computer, also using Python 3.6, the '_worker_id' column was removed.

Is this due to the drop=True argument passed to reset_index? If so, I don't understand why it worked on my computer but not on my coworker's. Can anybody advise?

Answer 1

Answered by unutbu

As the saying goes, "What happens in your interpreter stays in your interpreter". It's impossible to explain the discrepancy without seeing the full history of commands entered into both Python interactive sessions.

However, it is possible to venture a guess:

df.reset_index(drop=True) drops the current index of the DataFrame and replaces it with an index of increasing integers. It never drops columns.

So, in your interactive session, _worker_id was a column. In your co-worker's interactive session, _worker_id must have been an index level.

The visual difference can be somewhat subtle. For example, below, df has a _worker_id column while df2 has a _worker_id index level:

In [190]: df = pd.DataFrame({'foo':[1,2,3], '_worker_id':list('ABC')}); df
Out[190]: 
  _worker_id  foo
0          A    1
1          B    2
2          C    3

In [191]: df2 = df.set_index('_worker_id', append=True); df2
Out[191]: 
              foo
  _worker_id     
0 A             1
1 B             2
2 C             3

Notice that the name _worker_id appears one line below foo when it is an index level, and on the same line as foo when it is a column. That is the only visual clue you get when looking at the str or repr of a DataFrame.
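
Rather than relying on the printed layout, the same distinction can be checked programmatically. The following is a minimal sketch, not part of the original answer: describe_location is a hypothetical helper name, and it assumes the df and df2 constructed as in the example above.

```python
import pandas as pd

# Rebuild the two frames from the example: a column vs. an index level.
df = pd.DataFrame({'foo': [1, 2, 3], '_worker_id': list('ABC')})
df2 = df.set_index('_worker_id', append=True)


def describe_location(frame, name):
    """Hypothetical helper: say whether `name` is a column, an index level, or absent."""
    if name in frame.columns:
        return 'column'
    if name in frame.index.names:
        return 'index level'
    return 'missing'


print(describe_location(df, '_worker_id'))
print(describe_location(df2, '_worker_id'))
print(list(df2.index.names))
```

Running a check like this in both sessions before calling reset_index(drop=True) should show whether the two DataFrames really have the same shape.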

So to repeat: When _worker_id is a column, the column is unaffected by df.reset_index(drop=True):

In [194]: df.reset_index(drop=True)
Out[194]: 
  _worker_id  foo
0          A    1
1          B    2
2          C    3

But _worker_id is dropped when it is part of the index:

In [195]: df2.reset_index(drop=True)
Out[195]: 
   foo
0    1
1    2
2    3
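
If _worker_id does turn out to be an index level and should be kept, one possible approach is to move that level back into the columns before discarding the rest of the index. This is only a sketch under that assumption, reusing the df2 from the example; the name restored is introduced here for illustration.

```python
import pandas as pd

# Same df2 as in the example: _worker_id is the second index level.
df2 = pd.DataFrame({'foo': [1, 2, 3], '_worker_id': list('ABC')})
df2 = df2.set_index('_worker_id', append=True)

# Step 1: move only the _worker_id level back into the columns.
# Step 2: drop whatever index remains and replace it with 0..n-1.
restored = df2.reset_index(level='_worker_id').reset_index(drop=True)

print(restored)
```

The first call uses the level argument of reset_index without drop=True, so the values are preserved as a column; only the second call discards index data.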
