Original URL: http://stackoverflow.com/questions/43838999/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas: Replacement for .ix
Question by elPastor
Given the update to pandas 0.20.0 and the deprecation of .ix, I am wondering what the most efficient way is to get the same result using the remaining .loc and .iloc. I just answered this question, but the second option (not using .ix) seems inefficient and verbose.
Snippet:
print(df.iloc[df.loc[df['cap'].astype(float) > 35].index, :-1])
Is this the proper way to go when using both conditional and index position filtering?
Accepted answer by piRSquared
You can stay in the world of a single loc by getting the column labels you need from df.columns with a positional slice.
df.loc[
df['cap'].astype(float) > 35,
df.columns[:-1]
]
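A minimal sketch of why the single .loc is the safer form, using hypothetical data (the columns name and extra and the non-default index are assumptions for illustration). The question's snippet passes index labels to .iloc, which expects positions, so it only works when labels and positions coincide (as with the default RangeIndex).

```python
import pandas as pd

# Hypothetical data: 'cap' holds numeric strings (hence .astype(float)),
# and the index labels deliberately differ from row positions.
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "cap": ["10", "40", "36", "5"], "extra": [1, 2, 3, 4]},
    index=[10, 20, 30, 40],
)
mask = df["cap"].astype(float) > 35

# Single .loc: boolean mask for rows, column labels sliced positionally from df.columns
result = df.loc[mask, df.columns[:-1]]
print(result)

# The question's snippet feeds index *labels* (20, 30) to .iloc, which expects positions
iloc_failed = False
try:
    df.iloc[df.loc[mask].index, :-1]
except IndexError as exc:
    iloc_failed = True
    print("iloc with labels fails:", exc)
```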
Answer by Ken Wei
Generally, you would prefer to avoid chained indexing in pandas (though, strictly speaking, you're actually using two different indexing methods). You can't modify your dataframe this way (details in the docs), and the docs cite performance as another reason (indexing once vs. twice).
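A small sketch of the "can't modify" point, on a hypothetical three-column frame: boolean indexing with df.loc[mask] returns a new frame, so the chained .iloc[...] = ... assignment writes into that temporary copy and df is left unchanged. Depending on the pandas version, a chained-assignment warning may be emitted; it is silenced here only to keep the output short.

```python
import warnings

import pandas as pd

# Hypothetical frame: only column 'a' is used for the condition
df = pd.DataFrame({"a": [0.1, 0.9, 0.7, 0.2], "b": 0.0, "c": 0.0})
mask = df["a"] > 0.5

# Chained indexing: df.loc[mask] builds a copied frame, and .iloc[...] = ...
# then writes into that temporary copy, not into df.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about chained assignment here
    df.loc[mask].iloc[:, 1:] = 1.0

print(df)  # columns 'b' and 'c' are still all 0.0
```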
For the latter, it's usually insignificant (or rather, unlikely to be a bottleneck in your code), and in this case a single indexing call does not actually appear to be faster (at least in the following example):
# Run in IPython/Jupyter (%timeit is an IPython magic)
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.uniform(size=(100000, 10)), columns=list('abcdefghij'))
# Get columns number 2:5 where value in 'a' is greater than 0.5
# (i.e. Boolean mask along axis 0, position slice of axis 1)
# Deprecated .ix method (removed entirely in pandas 1.0)
%timeit df.ix[df['a'] > 0.5,2:5]
100 loops, best of 3: 2.14 ms per loop
# Boolean, then position
%timeit df.loc[df['a'] > 0.5,].iloc[:,2:5]
100 loops, best of 3: 2.14 ms per loop
# Position, then Boolean
%timeit df.iloc[:,2:5].loc[df['a'] > 0.5,]
1000 loops, best of 3: 1.75 ms per loop
# .loc
%timeit df.loc[df['a'] > 0.5, df.columns[2:5]]
100 loops, best of 3: 2.64 ms per loop
# .iloc
%timeit df.iloc[np.where(df['a'] > 0.5)[0],2:5]
100 loops, best of 3: 9.91 ms per loop
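The timings above come from an old pandas version and will differ on newer releases. A plain-Python sketch of the same comparison (using the standard timeit module instead of the IPython magic, and leaving out .ix, which no longer exists in pandas 1.0 and later) could first check that all replacements select the same data. The random seed is an added assumption for reproducibility; no expected timings are given because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as the answer; the seed is an assumption for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(size=(100_000, 10)), columns=list("abcdefghij"))
mask = df["a"] > 0.5

# .ix is gone in pandas >= 1.0, so only the replacements are compared
candidates = {
    "loc then iloc": lambda: df.loc[mask].iloc[:, 2:5],
    "iloc then loc": lambda: df.iloc[:, 2:5].loc[mask],
    "single loc": lambda: df.loc[mask, df.columns[2:5]],
    "single iloc": lambda: df.iloc[np.where(mask)[0], 2:5],
}

# Sanity check: every variant selects the same rows and columns
reference = candidates["single loc"]()
for name, fn in candidates.items():
    pd.testing.assert_frame_equal(fn(), reference)

# Best-of-3 average time per call, 20 calls per run
for name, fn in candidates.items():
    seconds = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{name:>14}: {seconds * 1e3:.2f} ms per call")
```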
Bottom line: If you really want to avoid .ix, and you're not intending to modify values in your dataframe, just go with chained indexing. On the other hand (the 'proper' but arguably messier way), if you do need to modify values, either use .iloc with np.where(), or use .loc with positional slices of df.index or df.columns.
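A minimal sketch of the two assignment-safe options from the bottom line, on a hypothetical frame with a non-default index (so labels and positions differ): np.where() converts the boolean mask into row positions for .iloc, while .loc takes the mask directly and gets its column labels from a positional slice of df.columns. Both write into the frame and produce the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index, so labels and positions differ
df = pd.DataFrame(
    {"a": [0.2, 0.8, 0.6, 0.1, 0.9], "b": 0.0, "c": 0.0, "d": 0.0},
    index=[100, 101, 102, 103, 104],
)
mask = df["a"] > 0.5

# Option 1: .iloc with np.where() turning the boolean mask into row positions
df1 = df.copy()
df1.iloc[np.where(mask)[0], 1:3] = 1.0

# Option 2: .loc with column labels obtained by slicing df.columns positionally
df2 = df.copy()
df2.loc[mask, df2.columns[1:3]] = 1.0

print(df1.equals(df2))  # both approaches modify the same cells
```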
Answer by Psidom
How about breaking this into a two-step indexing:
df[df['cap'].astype(float) > 35].iloc[:,:-1]
or even:
df[df['cap'].astype(float) > 35].drop(columns='cap')
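Note that the two forms above are interchangeable only when 'cap' is the last column: .iloc[:, :-1] drops the last column by position, while .drop() removes 'cap' by name. A quick check on hypothetical data (the columns name and size are assumptions):

```python
import pandas as pd

# Hypothetical data; 'cap' is deliberately the last column
df = pd.DataFrame({"name": ["a", "b", "c"], "size": [1, 2, 3], "cap": ["10", "40", "50"]})
mask = df["cap"].astype(float) > 35

by_position = df[mask].iloc[:, :-1]      # drop the last column by position
by_label = df[mask].drop(columns="cap")  # drop the 'cap' column by name

print(by_position.equals(by_label))  # True here only because 'cap' is the last column
```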
Answer by KhanJr
Pandas deprecated .ix in 0.20.0 (and removed it in 1.0) and encourages you to use .iloc and .loc instead.
For this, you can refer to the definitions of iloc and loc and how they differ from ix; this might help you.
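The core difference can be seen on a hypothetical Series whose integer labels run in reverse, so labels and positions disagree: .loc always looks up labels (and label slices include the end label), while .iloc always uses positions (and position slices exclude the end, as in plain Python).

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from positions
s = pd.Series(["x", "y", "z"], index=[2, 1, 0])

print(s.loc[0])              # label-based: the element labelled 0 -> 'z'
print(s.iloc[0])             # position-based: the first element -> 'x'
print(s.loc[2:1].tolist())   # label slice, end label included -> ['x', 'y']
print(s.iloc[0:1].tolist())  # position slice, end excluded -> ['x']
```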