Why did 'reset_index(drop=True)' unexpectedly remove a column?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/44620465/

Date: 2020-09-14 03:49:49  Source: igfitidea

Tags: python-3.x, pandas, indexing

Asked by Stanleyrr

I have a pandas DataFrame named data_match. It contains the columns '_worker_id', '_unit_id', and 'caption'. (Please see the attached screenshot for some of the rows in this DataFrame.)

[Image: screenshot of some rows of data_match]

Suppose the index is not in ascending order, and I want it to be 0, 1, 2, 3, 4, ..., n. So I ran the following call to try to reset the index:
data_match = data_match.reset_index(drop=True)

The call returned the correct output on my computer using Python 3.6. However, when my coworker ran the same call on his computer, also using Python 3.6, the '_worker_id' column was removed.

Is this due to the '(drop=True)' argument passed to 'reset_index'? If so, I don't understand why it worked on my computer but not on my coworker's. Can anybody advise?

Answered by unutbu

As the saying goes, "What happens in your interpreter stays in your interpreter". It's impossible to explain the discrepancy without seeing the full history of commands entered into both Python interactive sessions.

However, it is possible to venture a guess:

df.reset_index(drop=True) drops the current index of the DataFrame and replaces it with an index of increasing integers. It never drops columns.
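
As a minimal sketch of that behavior (the frame and its values below are made up for illustration and are not the asker's data), a frame with a non-ascending index gets a fresh 0..n-1 index while its columns are left alone:

```python
import pandas as pd

# Hypothetical data with a non-ascending index, for illustration only.
df = pd.DataFrame(
    {"_worker_id": list("ABC"), "foo": [1, 2, 3]},
    index=[2, 0, 1],
)

result = df.reset_index(drop=True)

# The old index values (2, 0, 1) are discarded, not turned into a column.
print(list(result.index))    # [0, 1, 2]
# The columns are exactly the same as before.
print(list(result.columns))  # ['_worker_id', 'foo']
```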

So, in your interactive session, _worker_id was a column. In your co-worker's interactive session, _worker_id must have been an index level.

The visual difference can be somewhat subtle. For example, below, df has a _worker_id column while df2 has a _worker_id index level:

In [190]: df = pd.DataFrame({'foo':[1,2,3], '_worker_id':list('ABC')}); df
Out[190]: 
  _worker_id  foo
0          A    1
1          B    2
2          C    3

In [191]: df2 = df.set_index('_worker_id', append=True); df2
Out[191]: 
              foo
  _worker_id     
0 A             1
1 B             2
2 C             3

Notice that the name _worker_id appears one line below foo when it is an index level, and on the same line as foo when it is a column. That is the only visual clue you get when looking at the str or repr of a DataFrame.
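
If the visual clue is too easy to miss, the same question can be asked programmatically. The sketch below is not part of the original answer: where_is is a hypothetical helper written for illustration, and it relies only on the columns and index.names attributes of a DataFrame.

```python
import pandas as pd

# Same illustrative frames as in the session above.
df = pd.DataFrame({"_worker_id": list("ABC"), "foo": [1, 2, 3]})
df2 = df.set_index("_worker_id", append=True)


def where_is(frame, name):
    """Hypothetical helper: report where a name lives in a DataFrame."""
    if name in frame.columns:
        return "column"
    if name in frame.index.names:
        return "index level"
    return "missing"


print(where_is(df, "_worker_id"))   # column
print(where_is(df2, "_worker_id"))  # index level
print(list(df2.index.names))        # [None, '_worker_id']
```

Running a check like this in both sessions before calling reset_index(drop=True) would show whether the two frames really have the same structure.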

So to repeat: when _worker_id is a column, the column is unaffected by df.reset_index(drop=True):

In [194]: df.reset_index(drop=True)
Out[194]: 
  _worker_id  foo
0          A    1
1          B    2
2          C    3

But _worker_id is dropped when it is part of the index:

In [195]: df2.reset_index(drop=True)
Out[195]: 
   foo
0    1
1    2
2    3
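
If the guess is right and _worker_id has become an index level, one possible way to keep it is to move that level back into the columns before discarding the rest of the index. This goes beyond the original answer and is only a sketch: it assumes the frame looks like df2 above and uses the level parameter of reset_index.

```python
import pandas as pd

# Same illustrative frames as above.
df = pd.DataFrame({"_worker_id": list("ABC"), "foo": [1, 2, 3]})
df2 = df.set_index("_worker_id", append=True)

# Move only the '_worker_id' level back into the columns,
# then discard what is left of the index.
restored = df2.reset_index(level="_worker_id").reset_index(drop=True)

print(list(restored.columns))  # ['_worker_id', 'foo']
print(list(restored.index))    # [0, 1, 2]
```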