pandas: Why did reset_index(drop=True) unexpectedly remove a column?
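Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow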
Original URL: http://stackoverflow.com/questions/44620465/
Question by Stanleyrr
I have a pandas DataFrame named data_match. It contains the columns '_worker_id', '_unit_id', and 'caption'. (Please see the attached screenshot for some of the rows in this DataFrame.)
Let's say the index is not in ascending order, and I want it to be 0, 1, 2, 3, 4, ..., n. So I ran the following code to reset the index:
data_match=data_match.reset_index(drop=True)
The code returned the correct output on my computer, using Python 3.6. However, when my coworker ran the same code on his computer, also using Python 3.6, the '_worker_id' column got removed.
Is this caused by the drop=True argument to reset_index? If so, I don't understand why it worked on my computer but not on my coworker's. Can anybody advise?
Answer by unutbu
As the saying goes, "What happens in your interpreter stays in your interpreter". It's impossible to explain the discrepancy without seeing the full history of commands entered into both Python interactive sessions.
However, it is possible to venture a guess:
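df.reset_index(drop=True) drops the current index of the DataFrame and replaces it with an index of increasing integers. It never drops columns. (Without drop=True, the old index levels would be inserted back into the frame as columns instead of being discarded.)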
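A minimal sketch of the behavior described above, assuming a reasonably recent pandas. The frame and its out-of-order labels 7, 3, 5 are made up for illustration; the point is that drop=True only touches the index, while the default drop=False turns the old index into a column.

```python
import pandas as pd

# An out-of-order index, e.g. what is left after sorting or filtering rows.
df = pd.DataFrame({"_worker_id": ["A", "B", "C"], "foo": [1, 2, 3]},
                  index=[7, 3, 5])

# drop=True: the old labels 7, 3, 5 are discarded; both columns remain.
dropped = df.reset_index(drop=True)

# drop=False (the default): the old labels come back as a column named "index".
kept = df.reset_index()

print(dropped)
print(kept)
```

In both results the new index is 0, 1, 2; only the fate of the old labels differs.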
So, in your interactive session, _worker_id was a column. In your co-worker's interactive session, _worker_id must have been an index level.
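One plausible way the two sessions could have diverged is sketched below. This is a hypothetical reconstruction, not the actual command history: it assumes an earlier set_index('_worker_id') was run in only one session. The rows of data_match are made up; only the three column names come from the question.

```python
import pandas as pd

# Made-up rows; only the column names come from the question.
data_match = pd.DataFrame(
    {"_worker_id": ["A", "B", "C"],
     "_unit_id": [10, 11, 12],
     "caption": ["x", "y", "z"]},
    columns=["_worker_id", "_unit_id", "caption"],
)

# Session 1: _worker_id is still an ordinary column, so it survives.
session1 = data_match.reset_index(drop=True)

# Session 2 (hypothetical): an earlier command moved _worker_id into the index.
session2 = data_match.set_index("_worker_id")
# drop=True now throws it away together with the rest of the index.
session2 = session2.reset_index(drop=True)

print(list(session1.columns))  # ['_worker_id', '_unit_id', 'caption']
print(list(session2.columns))  # ['_unit_id', 'caption']
```

The same reset_index(drop=True) line gives different results purely because of what happened to the frame earlier in each session.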
The visual difference can be somewhat subtle. In the example below, df has a _worker_id column, while df2 has a _worker_id index level:
In [190]: df = pd.DataFrame({'_worker_id': list('ABC'), 'foo': [1, 2, 3]}); df
Out[190]:
   _worker_id  foo
0           A    1
1           B    2
2           C    3
In [191]: df2 = df.set_index('_worker_id', append=True); df2
Out[191]:
              foo
  _worker_id
0 A             1
1 B             2
2 C             3
Notice that the name _worker_id appears one line below foo when it is an index level, and on the same line as foo when it is a column. That is the only visual clue you get when looking at the str or repr of a DataFrame.
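Rather than relying on that visual clue, you can ask the DataFrame directly: df.columns lists the columns and df.index.names lists the index level names. The helper where_is below is hypothetical, written only for this sketch; it rebuilds df and df2 from the session above.

```python
import pandas as pd

df = pd.DataFrame({"_worker_id": list("ABC"), "foo": [1, 2, 3]})
df2 = df.set_index("_worker_id", append=True)


def where_is(frame, name):
    """Hypothetical helper: report whether `name` is a column, an index level, or absent."""
    if name in frame.columns:
        return "column"
    if name in frame.index.names:
        return "index level"
    return "missing"


print(where_is(df, "_worker_id"))   # column
print(where_is(df2, "_worker_id"))  # index level
print(list(df2.index.names))        # [None, '_worker_id']
```

Running a check like this in both sessions, before calling reset_index(drop=True), would have shown the difference immediately.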
So, to repeat: when _worker_id is a column, the column is unaffected by df.reset_index(drop=True):
In [194]: df.reset_index(drop=True)
Out[194]:
   _worker_id  foo
0           A    1
1           B    2
2           C    3
But _worker_id is dropped when it is part of the index:
In [195]: df2.reset_index(drop=True)
Out[195]:
   foo
0    1
1    2
2    3
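If the goal is to renumber the rows without losing _worker_id when it happens to be an index level, one option is to limit the reset to specific levels with the level argument of reset_index. This is a sketch reusing df2 from the session above, not the only approach.

```python
import pandas as pd

df = pd.DataFrame({"_worker_id": list("ABC"), "foo": [1, 2, 3]})
df2 = df.set_index("_worker_id", append=True)  # _worker_id is now an index level

# Keep _worker_id as data: move just that level back into the columns,
# then discard what is left of the old index and renumber 0..n-1.
as_column = df2.reset_index(level="_worker_id").reset_index(drop=True)

# Alternatively, drop only the unnamed positional level (level 0) and keep
# _worker_id as the (now single-level) index.
as_index = df2.reset_index(level=0, drop=True)

print(as_column)
print(as_index)
```

The first form gives the result the question expected, with _worker_id as a column and a fresh 0, 1, 2 index; the second keeps _worker_id as the index.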