Pandas: Index updating and changing value accessed by location
Original URL: http://stackoverflow.com/questions/20997082/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Question by Zhubarb
I have two index-related questions on Python Pandas dataframes.
import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'three', 'one', 'two'],
                   'amount' : np.random.randn(8)})
df = df.loc[df.B != 'three'] # remove rows where B == 'three' (the original used .ix, which was removed in pandas 1.0)
df.index
>> Int64Index([0, 1, 2, 4, 6, 7], dtype=int64) # the original index is preserved.
1) I do not understand why the index is not automatically updated after I modify the dataframe. Is there a way to update the index automatically while modifying a dataframe? If not, what is the most efficient manual way to do this?
2) I want to be able to set the B column of the 5th element of df to 'three'. But df.iloc[5]['B'] = 'three' does not do that. I checked the manual, but it does not cover how to change a specific cell value accessed by location.
If I were accessing by row label, I could do df.loc[5,'B'] = 'three', but I don't know what the equivalent is for access by position.
P.S. Link 1 and link 2 are relevant answers to my second question. However, they do not answer my question.
Accepted answer by Briford Wylie
1) I do not understand why the indexing is not automatically updated after I modify the dataframe.
If you want to reset the index after removing or adding rows, you can do this:
df = df[df.B != 'three'] # remove rows where B == 'three'
df.reset_index(drop=True) # returns a new DataFrame; assign it (df = df.reset_index(drop=True)) to keep the result
     B    amount  id
0  one -1.176137   1
1  one  0.434470   2
2  two -0.887526   3
3  two  0.126969   5
4  one  0.090442   7
5  two -1.511353   8
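A minimal, self-contained sketch of the filter-then-renumber pattern, assuming a recent pandas version. It uses .loc in place of the removed .ix, and the random amount column is replaced by fixed illustrative values (not from the original) so the result is deterministic.

```python
import pandas as pd

# Same shape as the question's frame, with fixed (illustrative) amounts
df = pd.DataFrame({'id': range(1, 9),
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'three', 'one', 'two'],
                   'amount': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]})

filtered = df.loc[df.B != 'three']   # boolean filtering keeps the original labels
print(list(filtered.index))          # [0, 1, 2, 4, 6, 7]

# reset_index returns a new frame; drop=True discards the old labels
# instead of inserting them as an 'index' column
renumbered = filtered.reset_index(drop=True)
print(list(renumbered.index))        # [0, 1, 2, 3, 4, 5]
print(renumbered['id'].tolist())     # [1, 2, 3, 5, 7, 8]
```

Without drop=True, the old labels would be kept as a new column named index, which is rarely what you want after a simple filter.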
Indexes are meant to label/tag/identify a row, so you might consider making your 'id' column the index; then you'll appreciate why pandas doesn't 'automatically update' the index when rows are deleted.
df.set_index('id')
        B    amount
id
1     one -0.410671
2     one  0.092931
3     two -0.100324
4   three  0.322580
5     two -0.546932
6   three -2.018198
7     one -0.459551
8     two  1.254597
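To make the point about labels concrete, here is a small sketch (again with fixed illustrative amounts as an assumption) in which id becomes the index: after filtering, the surviving rows keep their id labels, so loc still finds a row by its id.

```python
import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'three', 'one', 'two'],
                   'amount': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]})

by_id = df.set_index('id')              # 'id' values become the row labels
kept = by_id.loc[by_id.B != 'three']    # drop the 'three' rows

print(list(kept.index))                 # [1, 2, 3, 5, 7, 8]
print(kept.loc[5, 'B'])                 # two  (label 5 still refers to id 5)
```

Renumbering these labels after a deletion would silently change which id a label points to, which is why pandas leaves them alone.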
2) I want to be able to set the B column of the 5th element of df to 'three'. But df.iloc[5]['B'] = 'three' does not do that. I checked on the manual but it does not cover how to change a specific cell value accessed by location.
Jeff already answered this...
Answer by Jeff
In [5]: df = pd.DataFrame({'id' : range(1,9),
   ...:                    'B' : ['one', 'one', 'two', 'three',
   ...:                           'two', 'three', 'one', 'two'],
   ...:                    'amount' : np.random.randn(8)})
In [6]: df
Out[6]:
       B    amount  id
0    one -1.236735   1
1    one -0.427070   2
2    two -2.330888   3
3  three -0.654062   4
4    two  0.587660   5
5  three -0.719589   6
6    one  0.860739   7
7    two -2.041390   8
[8 rows x 3 columns]
Question 1): your code above is correct (see @Briford Wylie's answer for resetting the index, which I think is what you want).
In [7]: df.loc[df.B!='three']   # originally df.ix; .ix was removed in pandas 1.0
Out[7]:
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7  two -2.041390   8
[6 rows x 3 columns]
In [8]: df = df.loc[df.B!='three']
In [9]: df.index
Out[9]: Int64Index([0, 1, 2, 4, 6, 7], dtype='int64')
In [10]: df.iloc[5]
Out[10]:
B              two
amount    -2.04139
id               8
Name: 7, dtype: object
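The Name: 7 in the output above is the crux of question 2: iloc[5] counts positions from 0, while that row's label is 7. A small sketch of the difference, using fixed illustrative amounts as an assumption:

```python
import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'three', 'one', 'two'],
                   'amount': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]})
df = df.loc[df.B != 'three']   # labels are now [0, 1, 2, 4, 6, 7]

row_by_position = df.iloc[5]   # sixth row, counting from 0
row_by_label = df.loc[7]       # the row whose label is 7
print(row_by_position.name)                  # 7
print(row_by_position.equals(row_by_label))  # True

try:
    df.loc[5]                  # label 5 was filtered out
except KeyError:
    print('no row labelled 5')
```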
Question 2):
You are trying to set a value on a copy; in 0.13 this will raise or warn. See here.
In [11]: df.iloc[5]['B'] = 5
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
In [24]: df.iloc[5, df.columns.get_indexer(['B'])] = 'foo'
In [25]: df
Out[25]:
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7  foo -2.041390   8
[6 rows x 3 columns]
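The get_indexer call works because iloc needs an integer column position. A sketch of two equivalent spellings, assuming pandas Index.get_loc to turn the column name into that position: iloc with both coordinates in a single call, or iat for a single scalar. The fixed amounts are illustrative, and .copy() is added so the filtered frame is explicitly independent.

```python
import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'three', 'one', 'two'],
                   'amount': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]})
df = df.loc[df.B != 'three'].copy()   # explicit, independent frame

col = df.columns.get_loc('B')         # integer position of column 'B'

# Row position and column position in ONE indexing call, so the
# assignment goes to df itself rather than to an intermediate copy
df.iloc[5, col] = 'three'
print(df.loc[7, 'B'])                 # three

# iat is the scalar-only positional accessor
df.iat[4, col] = 'three'
print(df['B'].tolist())               # ['one', 'one', 'two', 'two', 'three', 'three']
```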
You can also do this. It is NOT setting a value on a copy: it selects a Series (that is what df['B'] is), and that Series CAN be set directly.
In [30]: df['B'].iloc[5] = 5
In [31]: df
Out[31]:
     B    amount  id
0  one -1.236735   1
1  one -0.427070   2
2  two -2.330888   3
4  two  0.587660   5
6  one  0.860739   7
7    5 -2.041390   8
[6 rows x 3 columns]
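A caveat on this last form, worth checking against your pandas version: df['B'].iloc[5] = 5 is chained assignment. Under Copy-on-Write, which pandas 3.0 enables by default, df['B'] behaves as a copy and the write no longer reaches df (recent 2.x releases warn about this). A sketch that should behave the same across versions converts the position into a label with df.index[5] and then assigns through a single loc call; the fixed amounts are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'three', 'one', 'two'],
                   'amount': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]})
df = df.loc[df.B != 'three'].copy()

label = df.index[5]                  # translate position 5 into its row label (7)
df.loc[label, 'B'] = 'three'         # one indexing call: writes to df itself
print(label, df.loc[label, 'B'])     # 7 three
```

This assumes the index labels are unique; with duplicate labels, loc would set every row sharing that label, and the iloc/iat forms are the safer choice.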

