Pandas: Replacement for .ix

Question

Asked by elPastor

Given the update to pandas 0.20.0 and the deprecation of .ix, I am wondering what the most efficient way is to get the same result using the remaining .loc and .iloc. I just answered this question, but the second option (not using .ix) seems inefficient and verbose.

Snippet:

print(df.iloc[df.loc[df['cap'].astype(float) > 35].index, :-1])

Is this the proper way to go when using both conditional and index position filtering?

Answer 1

Accepted answer by piRSquared

You can stay in the world of a single loc by getting at the index values you need: slice that particular index (here, df.columns) by position and pass the resulting labels to loc.

df.loc[
    df['cap'].astype(float) > 35,
    df.columns[:-1]
]
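
As a minimal sketch of the pattern above on a toy frame: the column names and values here are made up for illustration, with 'cap' assumed to hold numeric strings and to sit in the last column, as the question's astype(float) and :-1 suggest.

```python
import pandas as pd

# Hypothetical data: 'cap' holds numeric strings and is the last column.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "score": [1, 2, 3],
        "cap": ["30", "40", "50"],
    }
)

# Boolean mask for the rows, labels sliced by position for the columns.
mask = df["cap"].astype(float) > 35
result = df.loc[mask, df.columns[:-1]]
print(result)
```

Because df.columns[:-1] yields labels rather than positions, everything stays inside one .loc call, and the same expression can also be used on the left-hand side of an assignment.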

Answer 2

Answered by Ken Wei

Generally, you would prefer to avoid chained indexing in pandas (though, strictly speaking, you're actually using two different indexing methods). You can't modify your dataframe this way (details in the docs), and the docs cite performance as another reason (indexing once vs. twice).

As for performance, the difference is usually insignificant (or rather, unlikely to be a bottleneck in your code), and chained indexing does not actually appear to be slower, at least in the following example:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=(100000, 10)), columns=list('abcdefghij'))
# Get columns number 2:5 where value in 'a' is greater than 0.5
# (i.e. Boolean mask along axis 0, position slice of axis 1)

# Deprecated .ix method
%timeit df.ix[df['a'] > 0.5,2:5]
100 loops, best of 3: 2.14 ms per loop

# Boolean, then position
%timeit df.loc[df['a'] > 0.5,].iloc[:,2:5]
100 loops, best of 3: 2.14 ms per loop

# Position, then Boolean
%timeit df.iloc[:,2:5].loc[df['a'] > 0.5,]
1000 loops, best of 3: 1.75 ms per loop

# .loc
%timeit df.loc[df['a'] > 0.5, df.columns[2:5]]
100 loops, best of 3: 2.64 ms per loop

# .iloc
%timeit df.iloc[np.where(df['a'] > 0.5)[0],2:5]
100 loops, best of 3: 9.91 ms per loop

Bottom line: If you really want to avoid .ix, and you're not intending to modify values in your dataframe, just go with chained indexing. On the other hand (the 'proper' but arguably messier way), if you do need to modify values, either use .iloc with np.where() or use .loc with integer slices of df.index or df.columns.
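
To make the two 'proper' options concrete, here is a small sketch that assigns through each of them. The frame, the threshold and the value written (0.0) are arbitrary choices for illustration, not part of the benchmark above.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: column 'a' holds 0, 5, 10, 15.
df = pd.DataFrame(
    np.arange(20, dtype=float).reshape(4, 5),
    columns=list("abcde"),
)
mask = df["a"] > 5  # True for the last two rows

# Option 1: .loc with column labels taken from a positional slice.
by_loc = df.copy()
by_loc.loc[mask, by_loc.columns[2:5]] = 0.0

# Option 2: .iloc with integer row positions obtained from np.where.
by_iloc = df.copy()
by_iloc.iloc[np.where(mask)[0], 2:5] = 0.0

print(by_loc.equals(by_iloc))
```

Both forms index the frame once, so the assignment reaches the frame itself rather than a temporary copy produced by a first indexing step.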

Answer 3

Answered by Psidom

How about breaking this into two-step indexing:

df[df['cap'].astype(float) > 35].iloc[:,:-1]

or even:

df[df['cap'].astype(float) > 35].drop('cap', axis=1)
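
One caveat worth sketching: the two variants above only agree when 'cap' happens to be the last column. The toy frame below is hypothetical and deliberately puts 'cap' in the middle to show the difference; drop(columns=...) is used here as the keyword form of dropping along axis 1.

```python
import pandas as pd

# Hypothetical frame where 'cap' is NOT the last column.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "cap": ["30", "40", "50"],
        "score": [1, 2, 3],
    }
)

filtered = df[df["cap"].astype(float) > 35]

by_position = filtered.iloc[:, :-1]       # drops whatever column is last
by_label = filtered.drop(columns="cap")   # drops 'cap' wherever it sits

print(list(by_position.columns))
print(list(by_label.columns))
```

Which one is right depends on whether the intent is "everything but the last column" or "everything but 'cap'".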

Answer 4

Answered by KhanJr

pandas has removed .ix and encourages you to use .iloc and .loc instead.

For this, you can refer to the definitions of iloc and loc and how they differ from ix. This might help you:

How are iloc, ix and loc different?
