Proper way to use iloc in Pandas
Note: this page is a side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/42139624/
Asked by supernovaee
I have the following dataframe df:
print(df)
         Food  Taste
0       Apple    NaN
1      Banana    NaN
2       Candy    NaN
3        Milk    NaN
4       Bread    NaN
5  Strawberry    NaN
I am trying to replace values in a range of rows using iloc:
df.Taste.iloc[0:2] = 'good'
df.Taste.iloc[2:6] = 'bad'
But it returned the following SettingWithCopyWarning message:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame
So I found this Stackoverflow page and tried this:
df.iloc[0:2, 'Taste'] = 'good'
df.iloc[2:6, 'Taste'] = 'bad'
Unfortunately, it returned the following error:
ValueError: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]
What would be the proper way to use iloc in this situation? Also, is there a way to combine these two lines above?
Answered by jezrael
You can use Index.get_loc to get the position of the column Taste, because DataFrame.iloc selects by position:
# returns the position of the column; Taste is the second column, so 1 (Python counts from 0)
print(df.columns.get_loc('Taste'))
1
df.iloc[0:2, df.columns.get_loc('Taste')] = 'good'
df.iloc[2:6, df.columns.get_loc('Taste')] = 'bad'
print(df)
         Food Taste
0       Apple  good
1      Banana  good
2       Candy   bad
3        Milk   bad
4       Bread   bad
5  Strawberry   bad
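To make the answer above reproducible, here is a minimal sketch that rebuilds the frame from the question and applies the same Index.get_loc approach. The variable names taste_pos and combined are illustrative, and the Taste column is created with object dtype (an assumption, so that strings can be stored without a dtype conflict). The last step shows one possible way to merge the two assignments into a single call by passing one value per selected row.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question. Taste is given object dtype
# (an assumption) so that string values can be written into it.
df = pd.DataFrame({
    "Food": ["Apple", "Banana", "Candy", "Milk", "Bread", "Strawberry"],
    "Taste": pd.Series([np.nan] * 6, dtype=object),
})
combined = df.copy()  # untouched copy for the single-call variant below

# Look up the integer position of the column once and reuse it.
taste_pos = df.columns.get_loc("Taste")

# Two assignments, as in the answer: rows 0-1, then rows 2-5.
df.iloc[0:2, taste_pos] = "good"
df.iloc[2:6, taste_pos] = "bad"

# One possible way to combine both lines: assign a list with
# exactly one value for each of the six selected rows.
combined.iloc[0:6, taste_pos] = ["good"] * 2 + ["bad"] * 4

print(df)
print(combined)
```

Both frames should end up with the same Taste values; the list form only works when its length matches the number of selected rows.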
A possible solution with ix is not recommended, because ix is deprecated in the next version of pandas (it was later removed entirely in pandas 1.0):
df.ix[0:2, 'Taste'] = 'good'
df.ix[2:6, 'Taste'] = 'bad'
print(df)
         Food Taste
0       Apple  good
1      Banana  good
2       Candy   bad
3        Milk   bad
4       Bread   bad
5  Strawberry   bad
Answered by Jared Stufft
.iloc uses integer location, whereas .loc uses labels (names). Both also take row AND column identifiers (for DataFrames). Your initial code didn't work because you didn't specify within the .iloc call which column you're selecting. The second code you tried didn't work because you mixed integer location with a column name, and .iloc only accepts integer locations. If you don't know the column's integer location, you can use Index.get_loc in its place, as suggested above. Otherwise, use the integer position, in this case 1.
df.iloc[0:2, df.columns.get_loc('Taste')] = 'good'
df.iloc[2:6, df.columns.get_loc('Taste')] = 'bad'
is equivalent to:
df.iloc[0:2, 1] = 'good'
df.iloc[2:6, 1] = 'bad'
in this particular situation, since Taste is the column at position 1.
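The difference between label-based and position-based selection is easier to see when the row labels are not 0, 1, 2. The following is a small sketch under that assumption: the string index and the names by_label and by_position are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical frame with string row labels instead of the default 0..n-1.
df = pd.DataFrame(
    {"Food": ["Apple", "Banana", "Candy"], "Taste": [None, None, None]},
    index=["a", "b", "c"],
)

# .loc selects by label; a label slice includes BOTH endpoints.
by_label = df.copy()
by_label.loc["a":"b", "Taste"] = "good"

# .iloc selects by position; the end of the slice is EXCLUDED.
by_position = df.copy()
by_position.iloc[0:2, 1] = "good"

print(by_label)
print(by_position)
```

Both calls touch the same two rows here, but note that the label slice "a":"b" covers two labels inclusively, while the position slice 0:2 stops before position 2.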
Answered by HeadAndTail
.iloc is purely integer-location based indexing for selection by position, e.g.:
lang_sets = {}
lang_sets['en'] = train[train.lang == 'en'].iloc[:,:-1]
lang_sets['ja'] = train[train.lang == 'ja'].iloc[:,:-1]
lang_sets['de'] = train[train.lang == 'de'].iloc[:,:-1]
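The snippet above depends on a train DataFrame that the answer does not show. As a rough sketch, assuming train has a lang column in the last position (the data below is entirely made up for illustration), iloc[:, :-1] keeps every row of the filtered frame and drops its last column:

```python
import pandas as pd

# Hypothetical stand-in for `train`; the real data is not shown in the answer.
train = pd.DataFrame({
    "text": ["hello", "konnichiwa", "hallo", "world"],
    "length": [5, 10, 5, 5],
    "lang": ["en", "ja", "de", "en"],  # assumed to be the last column
})

# Filter rows by language, then keep all columns except the last one.
lang_sets = {
    lang: train[train.lang == lang].iloc[:, :-1]
    for lang in ["en", "ja", "de"]
}

print(lang_sets["en"])
```

The dict comprehension is just a compact form of the three assignments in the answer; the selection logic is the same.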
Answered by Rob
I prefer to use .loc in such cases, and explicitly use the index of the DataFrame if you want to select by position:
df.loc[df.index[0:2], 'Taste'] = 'good'
df.loc[df.index[2:6], 'Taste'] = 'bad'
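A short sketch of why this pattern selects by position even when the row labels are not 0 to 5: df.index[0:2] slices the index positionally and returns the labels found there, which .loc then looks up. The integer labels 10 to 60 below are a made-up example, not part of the question.

```python
import pandas as pd

# Same foods as in the question, but with hypothetical non-default row labels.
df = pd.DataFrame(
    {
        "Food": ["Apple", "Banana", "Candy", "Milk", "Bread", "Strawberry"],
        "Taste": [None] * 6,
    },
    index=[10, 20, 30, 40, 50, 60],
)

# df.index[0:2] yields the labels at positions 0 and 1 (here 10 and 20),
# so .loc receives labels while the rows are still chosen by position.
df.loc[df.index[0:2], "Taste"] = "good"
df.loc[df.index[2:6], "Taste"] = "bad"

print(df)
```

This relies on the index labels being unique; with duplicate labels, .loc would also match any other rows that share the selected labels.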