Pandas equivalent of R's which()

Question

Asked by user2643394

Variations of this question have been asked before, but I'm still having trouble understanding how to slice a Python Series or pandas DataFrame based on conditions that I set.

In R, what I'm trying to do is:

df[which(df[,colnumber] > somenumberIchoose),]

The which() function finds the indices of the row entries in a column of the dataframe that are greater than somenumberIchoose, and returns them as a vector. I then slice the dataframe with these row indices to select the rows I want to look at.

Is there an equivalent way to do this in Python? I've seen references to enumerate, which I still don't fully understand after reading the documentation. My current attempt at getting the row indices looks like this:

indexfuture = [ x.index(), x in enumerate(df['colname']) if x > yesterday]

However, I keep getting an invalid syntax error. I can hack a workaround by looping through the values with a for loop and doing the search manually, but that seems extremely non-Pythonic and inefficient.

What exactly does enumerate() do? What is the pythonic way of finding indices of values in a vector that fulfill desired parameters?

Note: I'm using Pandas for the dataframes

Answer 1

Accepted answer by fdeheeger

I may not have understood the question clearly, but the answer looks simpler than you think:

Using a pandas DataFrame:

df['colname'] > somenumberIchoose

returns a pandas Series of True/False values that carries the original index of the DataFrame.

Then you can use that boolean series on the original DataFrame and get the subset you are looking for:

df[df['colname'] > somenumberIchoose]

should be enough.

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
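
To make the two steps above concrete, here is a minimal sketch of boolean indexing. The sample data, the index labels, and the threshold value are made up for illustration; only the df['colname'] > somenumberIchoose pattern comes from the answer.

```python
import pandas as pd

# Hypothetical sample data; the values and index labels are placeholders.
df = pd.DataFrame({"colname": [5, 12, 7, 20]}, index=["a", "b", "c", "d"])
somenumberIchoose = 10

# Step 1: the comparison yields a boolean Series that shares df's index.
mask = df["colname"] > somenumberIchoose

# Step 2: indexing df with that Series keeps only the rows where it is True.
subset = df[mask]

print(mask.tolist())          # [False, True, False, True]
print(subset.index.tolist())  # ['b', 'd']
```

Unlike R's which(), no intermediate vector of row numbers is needed here: the boolean Series itself selects the rows.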

Answer 2

Answered by Dunes

From what I know of R, you might be more comfortable working with numpy, a scientific computing package similar to MATLAB.

If you want the indices of the array elements whose values are divisible by two, then the following would work.

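import numpy

arr = numpy.arange(10)
truth_table = arr % 2 == 0          # boolean array, True where the value is even
indices = numpy.where(truth_table)  # tuple holding one array of matching positions
values = arr[indices]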

It's also easy to work with multi-dimensional arrays:

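arr2d = arr.reshape(2, 5)
row_index = 0  # arr2d[row_index] selects one row of the 2-D array
col_indices = numpy.where(arr2d[row_index] % 2 == 0)[0]  # columns of that row holding even values
row_values = arr2d[row_index, col_indices]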
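
If you specifically want the vector of row positions that which() would give you, one possible sketch combines numpy with pandas as below. The DataFrame contents and the threshold name are assumptions made for illustration. Note that the positions are 0-based, whereas R's which() is 1-based.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "colname" and threshold are placeholders.
df = pd.DataFrame({"colname": [5, 12, 7, 20]})
threshold = 10

# Convert the boolean Series to a plain array, then collect the positions
# of the True entries (0-based, unlike R's 1-based which()).
mask = (df["colname"] > threshold).to_numpy()
positions = np.flatnonzero(mask)

# .iloc selects rows by integer position.
rows = df.iloc[positions]

print(positions.tolist())        # [1, 3]
print(rows["colname"].tolist())  # [12, 20]
```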

Answer 3

Answered by Tim Pietzcker

enumerate() returns an iterator that yields an (index, item) tuple in each iteration, so you can't (and don't need to) call .index() again.

Furthermore, your list comprehension syntax is wrong. It should be:

indexfuture = [(index, x) for (index, x) in enumerate(df['colname']) if x > yesterday]

Test case:

>>> [(index, x) for (index, x) in enumerate("abcdef") if x > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]

Of course, you don't need to unpack the tuple:

>>> [tup for tup in enumerate("abcdef") if tup[1] > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]

unless you're only interested in the indices, in which case you could do something like

>>> [index for (index, x) in enumerate("abcdef") if x > "c"]
[3, 4, 5]
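
Putting this together, a small helper in the spirit of R's which() could look like the sketch below. The function name which, the predicate argument, and the sample list are illustrative choices, not part of any library.

```python
def which(values, predicate):
    """Return the 0-based positions of the items for which predicate(item) is true."""
    return [index for index, item in enumerate(values) if predicate(item)]


# Hypothetical sample data.
temperatures = [3, 11, 8, 15]
hits = which(temperatures, lambda t: t > 9)

print(hits)                             # [1, 3]
print([temperatures[i] for i in hits])  # [11, 15]
```

This works on any iterable, but for large pandas or numpy data the vectorized comparisons are likely to be much faster than a Python-level loop.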

Answer 4

Answered by Manuel

If you need an additional condition, pandas.Series supports element-wise operations between Series (+, -, /, *).

Just multiply the boolean indexes (for boolean Series the product acts as a logical AND):

idx1 = df['lat'] == 49
idx2 = df['lng'] > 15 
idx = idx1 * idx2

new_df = df[idx]
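
The same combination is more commonly written with the & operator, which states the logical AND explicitly. A minimal sketch, with made-up coordinates standing in for real data:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({"lat": [49, 49, 50, 49], "lng": [10, 16, 20, 30]})

# Each comparison is wrapped in parentheses before being combined with &.
idx = (df["lat"] == 49) & (df["lng"] > 15)
new_df = df[idx]

print(new_df.index.tolist())  # [1, 3]
```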

Answer 5

Answered by wdwd

Instead of enumerate, I usually just use .iteritems (spelled .items() in current pandas, where .iteritems has been removed). This saves a .index(). Namely,

[k for k, v in (df['c'] > t).items() if v]

Otherwise, one has to do

df[df['c'] > t].index

This duplicates the typing of the data frame name, which can be very long and painful to type.
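
As a quick check that the two approaches agree, here is a small sketch. The data, the index labels, and the value of t are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with string index labels.
df = pd.DataFrame({"c": [1, 8, 3, 9]}, index=["w", "x", "y", "z"])
t = 5

# Walk the (label, value) pairs of the boolean Series and keep the True labels.
labels_from_items = [k for k, v in (df["c"] > t).items() if v]

# Filter the DataFrame first, then read its index attribute.
labels_from_index = df[df["c"] > t].index.tolist()

print(labels_from_items)  # ['x', 'z']
print(labels_from_index)  # ['x', 'z']
```

Both return index labels rather than integer positions, which matters when the DataFrame index is not the default 0..n-1 range.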

Answer 6

Answered by Adr

A simple, neat way of doing this is the following:

SlicedData1 = df[df.colname > somenumber]

This can easily be extended to include other criteria, such as non-numeric data:

SlicedData2 = df[(df.colname1 > somenumber) & (df.colname2 == '24/08/2018')]

And so on...
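
When several criteria are combined, each comparison needs its own parentheses, because & and | bind more tightly than > and == in Python. The sketch below uses invented values for colname1 and colname2 to show AND, OR, and NOT combinations.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "colname1": [1, 5, 9, 12],
    "colname2": ["25/08/2018", "24/08/2018", "25/08/2018", "24/08/2018"],
})
somenumber = 4

is_big = df.colname1 > somenumber
is_date = df.colname2 == "24/08/2018"

both = df[is_big & is_date]        # rows meeting both criteria
either = df[is_big | is_date]      # rows meeting at least one criterion
neither = df[~(is_big | is_date)]  # rows meeting none of them

print(both.index.tolist())     # [1, 3]
print(either.index.tolist())   # [1, 2, 3]
print(neither.index.tolist())  # [0]
```

Naming the individual masks first, as done here, is one way to avoid precedence mistakes entirely.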
